mirror of
https://modelscope.cn/models/iic/Image-to-Video
synced 2026-04-02 19:42:53 +08:00
Update README.md
76
README.md
@ -2,7 +2,7 @@
|
|||||||
|
|
||||||
本项目**MS-Image2Video**旨在解决根据输入图像生成高清视频任务。**MS-Image2Video**由达摩院研发的高清视频生成基础模型,其核心部分包含两个阶段,分别解决语义一致性和清晰度的问题,参数量共计约37亿,模型经过在大规模视频和图像数据混合预训练,并在少量精品数据上微调得到,该数据分布广泛、类别多样化,模型对不同的数据均有良好的泛化性。项目于现有的视频生成模型,**MS-Image2Video**在清晰度、质感、语义、时序连续性等方面均具有明显的优势。
|
本项目**MS-Image2Video**旨在解决根据输入图像生成高清视频任务。**MS-Image2Video**由达摩院研发的高清视频生成基础模型,其核心部分包含两个阶段,分别解决语义一致性和清晰度的问题,参数量共计约37亿,模型经过在大规模视频和图像数据混合预训练,并在少量精品数据上微调得到,该数据分布广泛、类别多样化,模型对不同的数据均有良好的泛化性。项目于现有的视频生成模型,**MS-Image2Video**在清晰度、质感、语义、时序连续性等方面均具有明显的优势。
|
||||||
|
|
||||||
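In addition, many of the design ideas in **MS-Image2Video** are inherited from our public work **VideoComposer**. For details, see [VideoComposer](https://videocomposer.github.io) and this project's GitHub repository.
|
In addition, many of the design ideas in **MS-Image2Video** are inherited from our previously published work **VideoComposer**. For details, see [VideoComposer](https://videocomposer.github.io) and this project's GitHub repository.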
|
||||||
|
|
||||||
The **MS-Image2Video** project addresses the task of generating high-definition videos from input images. Developed by Alibaba Cloud, **MS-Image2Video** is a foundation model for high-definition video generation. Its core consists of two stages that address semantic consistency and clarity respectively, totaling approximately 3.7 billion parameters. The model is pre-trained on a large-scale mix of video and image data and fine-tuned on a small amount of high-quality data with a wide distribution and diverse categories, and it generalizes well across different kinds of data. Compared to existing video generation models, **MS-Image2Video** has clear advantages in clarity, texture, semantics, and temporal continuity.
|
The **MS-Image2Video** project addresses the task of generating high-definition videos from input images. Developed by Alibaba Cloud, **MS-Image2Video** is a foundation model for high-definition video generation. Its core consists of two stages that address semantic consistency and clarity respectively, totaling approximately 3.7 billion parameters. The model is pre-trained on a large-scale mix of video and image data and fine-tuned on a small amount of high-quality data with a wide distribution and diverse categories, and it generalizes well across different kinds of data. Compared to existing video generation models, **MS-Image2Video** has clear advantages in clarity, texture, semantics, and temporal continuity.
|
||||||
|
|
||||||
@ -17,7 +17,7 @@ Additionally, many of the design concepts for **MS-Image2Video** are inherited f
|
|||||||
|
|
||||||
## 模型介绍 (Introduction)
|
## 模型介绍 (Introduction)
|
||||||
|
|
||||||
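**MS-Image2Video** is built on Stable Diffusion. As shown in Fig. 2, a specially designed spatio-temporal UNet performs spatio-temporal modeling in latent space, and a decoder reconstructs the final video. To generate 720P video, **MS-Image2Video** is split into two stages: the first stage ensures semantic consistency at low resolution, and the second stage applies DDIM inversion and then denoises on a new VLDM to raise the video resolution and its temporal and spatial consistency. Through joint optimization of the model, training, and data, this project has the following main features:
|
**MS-Image2Video** is built on Stable Diffusion. As shown in Fig. 2, a specially designed spatio-temporal UNet performs spatio-temporal modeling in latent space, and a decoder reconstructs the final video. To generate 720P video, **MS-Image2Video** is split into two stages: the first stage ensures semantic consistency at low resolution, and the second stage applies DDIM inversion and then denoises on a new VLDM to raise the video resolution while also improving temporal and spatial consistency. Through joint optimization of the model, training, and data, this project has the following main features: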
|
||||||
|
|
||||||
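- HD and widescreen: directly generates video at 720P (1280*720) resolution. Compared with existing open-source projects, the resolution is substantially higher, and the widescreen output suits more scenarios.
|
- HD and widescreen: directly generates video at 720P (1280*720) resolution. Compared with existing open-source projects, the resolution is substantially higher, and the widescreen output suits more scenarios.
|
||||||
- Watermark-free: the model is trained on our large-scale internal watermark-free video/image data and fine-tuned on high-quality data, so the generated watermark-free videos can be used on more video platforms with fewer restrictions.
|
- Watermark-free: the model is trained on our large-scale internal watermark-free video/image data and fine-tuned on high-quality data, so the generated watermark-free videos can be used on more video platforms with fewer restrictions.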
|
||||||
@ -44,7 +44,7 @@ Below are some examples generated by the model:
|
|||||||
<p>
|
<p>
|
||||||
</center>
|
</center>
|
||||||
|
|
||||||
**为方便展示,本页面展示为低分辨率GIF格式,但是GIF会下降视频质量,具体效果可以参下面的视频链接**
|
**为方便展示,本页面展示为低分辨率GIF格式,但是GIF会下降视频质量,720P的视频效果可以参下面对应的视频链接**
|
||||||
|
|
||||||
**This page shows low-resolution GIFs for display purposes, but the GIF format reduces video quality. For the actual results, please follow the video links below.**
|
**This page shows low-resolution GIFs for display purposes, but the GIF format reduces video quality. For the actual results, please follow the video links below.**
|
||||||
|
|
||||||
@ -60,10 +60,10 @@ Below are some examples generated by the model:
|
|||||||
</tr>
|
</tr>
|
||||||
<tr>
|
<tr>
|
||||||
<td ><center>
|
<td ><center>
|
||||||
<a href="https://cloud.video.taobao.com/play/u/null/p/1/e/6/t/1/424529635078.mp4">Video</a>
|
<a href="https://cloud.video.taobao.com/play/u/null/p/1/e/6/t/1/424319402790.mp4">HQ Video</a>
|
||||||
</center></td>
|
</center></td>
|
||||||
<td ><center>
|
<td ><center>
|
||||||
<a href="https://cloud.video.taobao.com/play/u/null/p/1/e/6/t/1/424518475338.mp4">Video</a>
|
<a href="https://cloud.video.taobao.com/play/u/null/p/1/e/6/t/1/423628044217.mp4">HQ Video</a>
|
||||||
</center></td>
|
</center></td>
|
||||||
</tr>
|
</tr>
|
||||||
<tr>
|
<tr>
|
||||||
@ -76,10 +76,10 @@ Below are some examples generated by the model:
|
|||||||
</tr>
|
</tr>
|
||||||
<tr>
|
<tr>
|
||||||
<td ><center>
|
<td ><center>
|
||||||
<a href="https://cloud.video.taobao.com/play/u/null/p/1/e/6/t/1/424533127167.mp4">Video</a>
|
<a href="https://cloud.video.taobao.com/play/u/null/p/1/e/6/t/1/423965629168.mp4">HQ Video</a>
|
||||||
</center></td>
|
</center></td>
|
||||||
<td ><center>
|
<td ><center>
|
||||||
<a href="https://cloud.video.taobao.com/play/u/null/p/1/e/6/t/1/424521315462.mp4">Video</a>
|
<a href="https://cloud.video.taobao.com/play/u/null/p/1/e/6/t/1/423969933887.mp4">HQ Video</a>
|
||||||
</center></td>
|
</center></td>
|
||||||
</tr>
|
</tr>
|
||||||
<tr>
|
<tr>
|
||||||
@ -92,10 +92,10 @@ Below are some examples generated by the model:
|
|||||||
</tr>
|
</tr>
|
||||||
<tr>
|
<tr>
|
||||||
<td ><center>
|
<td ><center>
|
||||||
<a href="https://cloud.video.taobao.com/play/u/null/p/1/e/6/t/1/423542860291.mp4">Video</a>
|
<a href="https://cloud.video.taobao.com/play/u/null/p/1/e/6/t/1/423966661082.mp4">HQ Video</a>
|
||||||
</center></td>
|
</center></td>
|
||||||
<td ><center>
|
<td ><center>
|
||||||
<a href="https://cloud.video.taobao.com/play/u/null/p/1/e/6/t/1/424524367930.mp4">Video</a>
|
<a href="https://cloud.video.taobao.com/play/u/null/p/1/e/6/t/1/424613631285.mp4">HQ Video</a>
|
||||||
</center></td>
|
</center></td>
|
||||||
</tr>
|
</tr>
|
||||||
<tr>
|
<tr>
|
||||||
@ -108,10 +108,10 @@ Below are some examples generated by the model:
|
|||||||
</tr>
|
</tr>
|
||||||
<tr>
|
<tr>
|
||||||
<td ><center>
|
<td ><center>
|
||||||
<a href="https://cloud.video.taobao.com/play/u/null/p/1/e/6/t/1/424528199927.mp4">Video</a>
|
<a href="https://cloud.video.taobao.com/play/u/null/p/1/e/6/t/1/424612211915.mp4">HQ Video</a>
|
||||||
</center></td>
|
</center></td>
|
||||||
<td ><center>
|
<td ><center>
|
||||||
<a href="https://cloud.video.taobao.com/play/u/null/p/1/e/6/t/1/423539648760.mp4">Video</a>
|
<a href="https://cloud.video.taobao.com/play/u/null/p/1/e/6/t/1/424613123188.mp4">HQ Video</a>
|
||||||
</center></td>
|
</center></td>
|
||||||
</tr>
|
</tr>
|
||||||
<tr>
|
<tr>
|
||||||
@ -124,10 +124,10 @@ Below are some examples generated by the model:
|
|||||||
</tr>
|
</tr>
|
||||||
<tr>
|
<tr>
|
||||||
<td ><center>
|
<td ><center>
|
||||||
<a href="https://cloud.video.taobao.com/play/u/null/p/1/e/6/t/1/423884045304.mp44">Video</a>
|
<a href="https://cloud.video.taobao.com/play/u/null/p/1/e/6/t/1/424616459162.mp4">HQ Video</a>
|
||||||
</center></td>
|
</center></td>
|
||||||
<td ><center>
|
<td ><center>
|
||||||
<a href="https://cloud.video.taobao.com/play/u/null/p/1/e/6/t/1/423884473037.mp4">Video</a>
|
<a href="https://cloud.video.taobao.com/play/u/null/p/1/e/6/t/1/424614735831.mp4">HQ Video</a>
|
||||||
</center></td>
|
</center></td>
|
||||||
</tr>
|
</tr>
|
||||||
<tr>
|
<tr>
|
||||||
@ -140,10 +140,10 @@ Below are some examples generated by the model:
|
|||||||
</tr>
|
</tr>
|
||||||
<tr>
|
<tr>
|
||||||
<td ><center>
|
<td ><center>
|
||||||
<a href="https://cloud.video.taobao.com/play/u/null/p/1/e/6/t/1/423549068330.mp4">Video</a>
|
<a href="https://cloud.video.taobao.com/play/u/null/p/1/e/6/t/1/424617591002.mp4">HQ Video</a>
|
||||||
</center></td>
|
</center></td>
|
||||||
<td ><center>
|
<td ><center>
|
||||||
<a href="https://cloud.video.taobao.com/play/u/null/p/1/e/6/t/1/423551840372.mp4">Video</a>
|
<a href="https://cloud.video.taobao.com/play/u/null/p/1/e/6/t/1/423631572030.mp4">HQ Video</a>
|
||||||
</center></td>
|
</center></td>
|
||||||
</tr>
|
</tr>
|
||||||
<tr>
|
<tr>
|
||||||
@ -156,10 +156,10 @@ Below are some examples generated by the model:
|
|||||||
</tr>
|
</tr>
|
||||||
<tr>
|
<tr>
|
||||||
<td ><center>
|
<td ><center>
|
||||||
<a href="https://cloud.video.taobao.com/play/u/null/p/1/e/6/t/1/423874845411.mp4">Video</a>
|
<a href="https://cloud.video.taobao.com/play/u/null/p/1/e/6/t/1/423629092176.mp4">HQ Video</a>
|
||||||
</center></td>
|
</center></td>
|
||||||
<td ><center>
|
<td ><center>
|
||||||
<a href="https://cloud.video.taobao.com/play/u/null/p/1/e/6/t/1/424528199907.mp4">Video</a>
|
<a href="https://cloud.video.taobao.com/play/u/null/p/1/e/6/t/1/424616071017.mp4">HQ Video</a>
|
||||||
</center></td>
|
</center></td>
|
||||||
</tr>
|
</tr>
|
||||||
<tr>
|
<tr>
|
||||||
@ -172,10 +172,10 @@ Below are some examples generated by the model:
|
|||||||
</tr>
|
</tr>
|
||||||
<tr>
|
<tr>
|
||||||
<td ><center>
|
<td ><center>
|
||||||
<a href="https://cloud.video.taobao.com/play/u/null/p/1/e/6/t/1/423534560646.mp4">Video</a>
|
<a href="https://cloud.video.taobao.com/play/u/null/p/1/e/6/t/1/424317682762.mp4">HQ Video</a>
|
||||||
</center></td>
|
</center></td>
|
||||||
<td ><center>
|
<td ><center>
|
||||||
<a href="https://cloud.video.taobao.com/play/u/null/p/1/e/6/t/1/423545600161.mp4">Video</a>
|
<a href="https://cloud.video.taobao.com/play/u/null/p/1/e/6/t/1/424313138794.mp4">HQ Video</a>
|
||||||
</center></td>
|
</center></td>
|
||||||
</tr>
|
</tr>
|
||||||
<tr>
|
<tr>
|
||||||
@ -188,10 +188,10 @@ Below are some examples generated by the model:
|
|||||||
</tr>
|
</tr>
|
||||||
<tr>
|
<tr>
|
||||||
<td ><center>
|
<td ><center>
|
||||||
<a href="https://cloud.video.taobao.com/play/u/null/p/1/e/6/t/1/424529635090.mp4">Video</a>
|
<a href="https://cloud.video.taobao.com/play/u/null/p/1/e/6/t/1/423631376023.mp4">HQ Video</a>
|
||||||
</center></td>
|
</center></td>
|
||||||
<td ><center>
|
<td ><center>
|
||||||
<a href="https://cloud.video.taobao.com/play/u/null/p/1/e/6/t/1/423536504779.mp4">Video</a>
|
<a href="https://cloud.video.taobao.com/play/u/null/p/1/e/6/t/1/424616459198.mp4">HQ Video</a>
|
||||||
</center></td>
|
</center></td>
|
||||||
</tr>
|
</tr>
|
||||||
<tr>
|
<tr>
|
||||||
@ -204,10 +204,10 @@ Below are some examples generated by the model:
|
|||||||
</tr>
|
</tr>
|
||||||
<tr>
|
<tr>
|
||||||
<td ><center>
|
<td ><center>
|
||||||
<a href="https://cloud.video.taobao.com/play/u/null/p/1/e/6/t/1/424235610879.mp4">Video</a>
|
<a href="https://cloud.video.taobao.com/play/u/null/p/1/e/6/t/1/424314646086.mp4">HQ Video</a>
|
||||||
</center></td>
|
</center></td>
|
||||||
<td ><center>
|
<td ><center>
|
||||||
<a href="https://cloud.video.taobao.com/play/u/null/p/1/e/6/t/1/424533391396.mp4">Video</a>
|
<a href="https://cloud.video.taobao.com/play/u/null/p/1/e/6/t/1/424610479196.mp4">HQ Video</a>
|
||||||
</center></td>
|
</center></td>
|
||||||
</tr>
|
</tr>
|
||||||
<tr>
|
<tr>
|
||||||
@ -220,10 +220,10 @@ Below are some examples generated by the model:
|
|||||||
</tr>
|
</tr>
|
||||||
<tr>
|
<tr>
|
||||||
<td ><center>
|
<td ><center>
|
||||||
<a href="https://cloud.video.taobao.com/play/u/null/p/1/e/6/t/1/423551840396.mp4">Video</a>
|
<a href="https://cloud.video.taobao.com/play/u/null/p/1/e/6/t/1/424321438157.mp4">HQ Video</a>
|
||||||
</center></td>
|
</center></td>
|
||||||
<td ><center>
|
<td ><center>
|
||||||
<a href="https://cloud.video.taobao.com/play/u/null/p/1/e/6/t/1/424535435146.mp4">Video</a>
|
<a href="https://cloud.video.taobao.com/play/u/null/p/1/e/6/t/1/424614283086.mp4">HQ Video</a>
|
||||||
</center></td>
|
</center></td>
|
||||||
</tr>
|
</tr>
|
||||||
</table>
|
</table>
|
||||||
@ -232,17 +232,26 @@ Below are some examples generated by the model:
|
|||||||
|
|
||||||
### 依赖项 (Dependency)
|
### 依赖项 (Dependency)
|
||||||
|
|
||||||
本**MS-Image2Video**项目适配ModelScope代码库,以下是本项目需要安装的部分依赖项:
|
|
||||||
|
首先你需要确定你的系统安装了*ffmpeg*命令,如果没有,可以通过以下命令来安装:
|
||||||
|
|
||||||
|
First, make sure the ffmpeg command is installed on your system. If it is not, you can install it with the following command:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
sudo apt-get update && sudo apt-get install -y ffmpeg libsm6 libxext6
|
||||||
|
```
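
As a minimal sketch (not part of the original README), the ffmpeg prerequisite can also be checked from Python before running the model. The helper name `ffmpeg_available` is hypothetical; it relies only on the standard-library `shutil.which`.

```python
import shutil


def ffmpeg_available(command: str = "ffmpeg") -> bool:
    """Return True if `command` resolves to an executable on PATH."""
    return shutil.which(command) is not None


if __name__ == "__main__":
    if ffmpeg_available():
        print("ffmpeg found at", shutil.which("ffmpeg"))
    else:
        print("ffmpeg not found; install it before generating videos")
```

Running the check up front may give a clearer error than a failure during video encoding.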
|
||||||
|
|
||||||
|
|
||||||
|
其次,本**MS-Image2Video**项目适配ModelScope代码库,以下是本项目需要安装的部分依赖项。
|
||||||
|
|
||||||
The **MS-Image2Video** project is compatible with the ModelScope codebase, and the following are some of the dependencies that need to be installed for this project.
|
The **MS-Image2Video** project is compatible with the ModelScope codebase, and the following are some of the dependencies that need to be installed for this project.
|
||||||
|
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
pip install modelscope==1.4.2
|
pip install modelscope==1.4.2
|
||||||
pip install -U xformers
|
pip install -U xformers
|
||||||
pip install torch==2.0.1
|
pip install torch==2.0.1
|
||||||
pip install "open_clip_torch>=2.0.2"
|
pip install "open_clip_torch>=2.0.2"
|
||||||
pip install easydict
|
|
||||||
pip install numpy
|
|
||||||
pip install opencv-python-headless
|
pip install opencv-python-headless
|
||||||
pip install opencv-python
|
pip install opencv-python
|
||||||
pip install "einops>=0.4"
|
pip install "einops>=0.4"
|
||||||
@ -263,14 +272,7 @@ For more experiments, please stay tuned for our upcoming technical report and op
|
|||||||
|
|
||||||
### 代码范例 (Code example)
|
### 代码范例 (Code example)
|
||||||
```python
|
```python
|
||||||
from modelscope.pipelines import pipeline
|
|
||||||
from modelscope.outputs import OutputKeys
|
|
||||||
|
|
||||||
pipe = pipeline("image-to-video", 'damo/Image-to-Video')
|
|
||||||
|
|
||||||
# IMG_PATH: your image path (URL or local file)
|
|
||||||
output_video_path = pipe(IMG_PATH, output_video='./output.mp4')[OutputKeys.OUTPUT_VIDEO]
|
|
||||||
print(output_video_path)
|
|
||||||
```
|
```
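
The pipeline call takes an image path that may be a URL or a local file. The sketch below assumes the `modelscope` pipeline API exactly as it is used in this README and validates the input before loading the model. `classify_image_source` and `generate_video` are hypothetical helper names, not part of ModelScope, and the `modelscope` import is deferred so that the validation step runs without it.

```python
import os
from urllib.parse import urlparse


def classify_image_source(img_path: str) -> str:
    """Return "url" for http(s) inputs and "file" for existing local files."""
    if urlparse(img_path).scheme in ("http", "https"):
        return "url"
    if os.path.isfile(img_path):
        return "file"
    raise FileNotFoundError(f"not a URL or an existing file: {img_path}")


def generate_video(img_path: str, output_video: str = "./output.mp4") -> str:
    """Validate the input, then run the image-to-video pipeline (hypothetical wrapper)."""
    classify_image_source(img_path)  # fail early, before the model is loaded
    # Deferred imports: modelscope is only needed once the input is valid.
    from modelscope.outputs import OutputKeys
    from modelscope.pipelines import pipeline

    pipe = pipeline("image-to-video", "damo/Image-to-Video")
    return pipe(img_path, output_video=output_video)[OutputKeys.OUTPUT_VIDEO]
```

Failing before the model is loaded avoids initializing the pipeline only to hit a missing input file.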
|
||||||
|
|
||||||
|
|
||||||
@ -343,6 +345,4 @@ The relevant technical report is currently being written, and we welcome you to
|
|||||||
Our code and model weights are only available for personal/academic research use and are currently not supported for commercial use.
|
Our code and model weights are only available for personal/academic research use and are currently not supported for commercial use.
|
||||||
|
|
||||||
## 联系我们 (Contact Us)
|
## 联系我们 (Contact Us)
|
||||||
如果你想联系我们的算法/产品同学, 或者想加入我们的算法团队(实习/正式), 欢迎发邮件至: <yingya.zyy@alibaba-inc.com>。
|
如果你想联系我们的算法/产品同学, 或者想加入我们的算法团队(实习/正式), 欢迎发邮件至: <yingya.zyy@alibaba-inc.com>。
|
||||||
|
|
||||||
If you would like to contact us, or join our team (internship/formal), please feel free to email us at <yingya.zyy@alibaba-inc.com>.
|
|
||||||