Update README.md

This commit is contained in:
FaceZhao
2023-08-21 09:42:03 +00:00
parent 43da671952
commit ef97120203


The **MS-Image2Video** project aims to address the task of generating high-definition videos from input images. Developed by Alibaba Cloud, **MS-Image2Video** is a foundation model for high-definition video generation. Its core consists of two stages that address semantic consistency and clarity respectively, totaling approximately 3.7 billion parameters. The model is pre-trained on a large-scale mix of video and image data and fine-tuned on a small amount of high-quality data with a wide distribution and diverse categories; it generalizes well to different kinds of data. Compared to existing video generation models, **MS-Image2Video** has significant advantages in clarity, texture, semantics, and temporal continuity.

Additionally, many of the design concepts of **MS-Image2Video** are inherited from our published work **VideoComposer**; please refer to [VideoComposer](https://videocomposer.github.io) and this project's GitHub repository for details.
## Introduction
**MS-Image2Video** is built on Stable Diffusion, as shown in Fig. 2: a specially designed spatio-temporal UNet performs spatio-temporal modeling in the latent space, and a decoder reconstructs the final video. To generate 720P videos, we split **MS-Image2Video** into two stages: the first stage ensures semantic consistency at low resolution; the second stage applies DDIM inversion and denoises with a new VLDM to raise the video resolution while also improving temporal and spatial consistency. Through joint optimization of the model, training, and data, this project has the following features:
- High definition & widescreen: it directly generates 720P (1280*720) videos. Compared with existing open-source projects, not only is the resolution effectively improved, but the widescreen videos it produces also suit more scenarios.
- Watermark-free: the model is trained on our internal large-scale watermark-free video/image data and fine-tuned on high-quality data. The generated watermark-free videos are suitable for more video platforms and subject to fewer restrictions.
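To make the two-stage design above concrete, here is a toy numpy sketch of the data flow only. All shapes, function names, and operations are invented for illustration; the real model uses a spatio-temporal UNet, DDIM inversion, and a VLDM, none of which are reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

def stage1_semantic(image, num_frames=16, latent_res=32):
    """Toy stand-in for stage 1: produce a low-resolution latent video
    tied to the input image (here just its mean plus noise)."""
    seed = image.mean()
    return seed + 0.1 * rng.standard_normal((num_frames, 4, latent_res, latent_res))

def stage2_refine(latents, scale=4):
    """Toy stand-in for stage 2: upsample the latents toward the target
    resolution and 'refine' them (a placeholder for DDIM inversion +
    VLDM denoising)."""
    up = latents.repeat(scale, axis=2).repeat(scale, axis=3)
    return 0.9 * up

image = rng.random((3, 256, 256))
low_res = stage1_semantic(image)       # semantically consistent, low resolution
refined = stage2_refine(low_res)       # higher resolution, refined
print(low_res.shape, refined.shape)    # (16, 4, 32, 32) (16, 4, 128, 128)
```

The key point the sketch mirrors is that resolution is added in a separate second pass rather than generated directly.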
<p>
</center>
**For display purposes, this page shows low-resolution GIFs, and the GIF format reduces video quality. For the 720P results, please refer to the corresponding video links below.**
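For reference, a 720P MP4 can be turned into a small preview GIF with ffmpeg. The helper below only builds the command string (the `-vf scale=W:-1` and `-r` flags are standard ffmpeg options; the exact width/fps used for this page are an assumption, and `gif_preview_cmd` is a hypothetical name):

```python
import shlex

def gif_preview_cmd(src_mp4, dst_gif, width=480, fps=8):
    """Build an ffmpeg command that downscales a video into a small GIF.
    Lower width/fps shrink the file, at the cost of the quality loss
    mentioned above. scale=W:-1 keeps the aspect ratio."""
    return (
        f"ffmpeg -i {shlex.quote(src_mp4)} "
        f"-vf scale={width}:-1 -r {fps} {shlex.quote(dst_gif)}"
    )

print(gif_preview_cmd("output.mp4", "preview.gif"))
```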
</tr>
<tr>
<td ><center>
<a href="https://cloud.video.taobao.com/play/u/null/p/1/e/6/t/1/424319402790.mp4">HQ Video</a>
</center></td>
<td ><center>
<a href="https://cloud.video.taobao.com/play/u/null/p/1/e/6/t/1/423628044217.mp4">HQ Video</a>
</center></td>
</tr>
<tr>
</tr>
<tr>
<td ><center>
<a href="https://cloud.video.taobao.com/play/u/null/p/1/e/6/t/1/423965629168.mp4">HQ Video</a>
</center></td>
<td ><center>
<a href="https://cloud.video.taobao.com/play/u/null/p/1/e/6/t/1/423969933887.mp4">HQ Video</a>
</center></td>
</tr>
<tr>
</tr>
<tr>
<td ><center>
<a href="https://cloud.video.taobao.com/play/u/null/p/1/e/6/t/1/423966661082.mp4">HQ Video</a>
</center></td>
<td ><center>
<a href="https://cloud.video.taobao.com/play/u/null/p/1/e/6/t/1/424613631285.mp4">HQ Video</a>
</center></td>
</tr>
<tr>
</tr>
<tr>
<td ><center>
<a href="https://cloud.video.taobao.com/play/u/null/p/1/e/6/t/1/424612211915.mp4">HQ Video</a>
</center></td>
<td ><center>
<a href="https://cloud.video.taobao.com/play/u/null/p/1/e/6/t/1/424613123188.mp4">HQ Video</a>
</center></td>
</tr>
<tr>
</tr>
<tr>
<td ><center>
<a href="https://cloud.video.taobao.com/play/u/null/p/1/e/6/t/1/424616459162.mp4">HQ Video</a>
</center></td>
<td ><center>
<a href="https://cloud.video.taobao.com/play/u/null/p/1/e/6/t/1/424614735831.mp4">HQ Video</a>
</center></td>
</tr>
<tr>
</tr>
<tr>
<td ><center>
<a href="https://cloud.video.taobao.com/play/u/null/p/1/e/6/t/1/424617591002.mp4">HQ Video</a>
</center></td>
<td ><center>
<a href="https://cloud.video.taobao.com/play/u/null/p/1/e/6/t/1/423631572030.mp4">HQ Video</a>
</center></td>
</tr>
<tr>
</tr>
<tr>
<td ><center>
<a href="https://cloud.video.taobao.com/play/u/null/p/1/e/6/t/1/423629092176.mp4">HQ Video</a>
</center></td>
<td ><center>
<a href="https://cloud.video.taobao.com/play/u/null/p/1/e/6/t/1/424616071017.mp4">HQ Video</a>
</center></td>
</tr>
<tr>
</tr>
<tr>
<td ><center>
<a href="https://cloud.video.taobao.com/play/u/null/p/1/e/6/t/1/424317682762.mp4">HQ Video</a>
</center></td>
<td ><center>
<a href="https://cloud.video.taobao.com/play/u/null/p/1/e/6/t/1/424313138794.mp4">HQ Video</a>
</center></td>
</tr>
<tr>
</tr>
<tr>
<td ><center>
<a href="https://cloud.video.taobao.com/play/u/null/p/1/e/6/t/1/423631376023.mp4">HQ Video</a>
</center></td>
<td ><center>
<a href="https://cloud.video.taobao.com/play/u/null/p/1/e/6/t/1/424616459198.mp4">HQ Video</a>
</center></td>
</tr>
<tr>
</tr>
<tr>
<td ><center>
<a href="https://cloud.video.taobao.com/play/u/null/p/1/e/6/t/1/424314646086.mp4">HQ Video</a>
</center></td>
<td ><center>
<a href="https://cloud.video.taobao.com/play/u/null/p/1/e/6/t/1/424610479196.mp4">HQ Video</a>
</center></td>
</tr>
<tr>
</tr>
<tr>
<td ><center>
<a href="https://cloud.video.taobao.com/play/u/null/p/1/e/6/t/1/424321438157.mp4">HQ Video</a>
</center></td>
<td ><center>
<a href="https://cloud.video.taobao.com/play/u/null/p/1/e/6/t/1/424614283086.mp4">HQ Video</a>
</center></td>
</tr>
</table>
### Dependency
First, make sure the *ffmpeg* command is available on your system. If it is not installed, you can install it with:
```bash
sudo apt-get update && sudo apt-get install -y ffmpeg libsm6 libxext6
```
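Before installing the Python dependencies, a quick environment check like the following can save a failed run later. This is a small sketch (the `check_env` helper is not part of the project); it only reports what is present and installs nothing:

```python
import importlib.util
import shutil

def check_env(packages=("modelscope", "torch", "einops")):
    """Report whether ffmpeg and the key Python packages are available."""
    status = {"ffmpeg": shutil.which("ffmpeg") is not None}
    for name in packages:
        # find_spec returns None when a top-level package is not installed
        status[name] = importlib.util.find_spec(name) is not None
    return status

print(check_env())
```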
Second, the **MS-Image2Video** project is built on the ModelScope codebase; the following are some of the dependencies that need to be installed for this project.
```bash
pip install modelscope==1.4.2
pip install -U xformers
pip install torch==2.0.1
pip install "open_clip_torch>=2.0.2"
pip install easydict
pip install numpy
pip install opencv-python-headless
pip install opencv-python
pip install "einops>=0.4"
```
### Code example
```python
from modelscope.pipelines import pipeline
from modelscope.outputs import OutputKeys

IMG_PATH = 'test.jpg'  # your image path (URL or local file)
pipe = pipeline("image-to-video", 'damo/Image-to-Video')
output_video_path = pipe(IMG_PATH, output_video='./output.mp4')[OutputKeys.OUTPUT_VIDEO]
print(output_video_path)
```
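If the pipeline returns a local file path, as in the example above, a small guard can verify the output before it is used further. This is a sketch with arbitrary thresholds, not part of the ModelScope API:

```python
import os

def ensure_video(path, min_bytes=1024):
    """Basic sanity check on a generated video file: it must exist and
    not be trivially small (the 1 KiB threshold is arbitrary)."""
    if not os.path.isfile(path):
        raise FileNotFoundError(f"no video at {path}")
    size = os.path.getsize(path)
    if size < min_bytes:
        raise ValueError(f"suspiciously small video ({size} bytes)")
    return path
```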
Our code and model weights are only available for personal/academic research use and are currently not supported for commercial use.
## Contact Us
If you would like to contact our algorithm/product team, or join our algorithm team (internship or full-time), please feel free to email us at <yingya.zyy@alibaba-inc.com>.