Update README.md

This commit is contained in:
FaceZhao
2023-08-21 09:42:03 +00:00
parent 43da671952
commit ef97120203


The **MS-Image2Video** project aims to address the task of generating high-definition videos from input images. Developed by Alibaba Cloud, **MS-Image2Video** is a foundation model for high-definition video generation. Its core consists of two stages that address semantic consistency and clarity respectively, totaling approximately 3.7 billion parameters. The model is pre-trained on a large-scale mix of video and image data and fine-tuned on a small amount of high-quality data with a wide distribution and diverse categories; it generalizes well to different kinds of data. Compared to existing video generation models, **MS-Image2Video** has significant advantages in clarity, texture, semantics, and temporal continuity.

Additionally, many of the design concepts of **MS-Image2Video** are inherited from our published work **VideoComposer**; please refer to [VideoComposer](https://videocomposer.github.io) and this project's GitHub repository for details.
## Introduction
**MS-Image2Video** is built on Stable Diffusion, as shown in Fig. 2: a specially designed spatio-temporal UNet performs spatio-temporal modeling in the latent space, and a decoder reconstructs the final video. To generate 720P videos, we split **MS-Image2Video** into two stages: the first stage ensures semantic consistency at low resolution; the second stage applies DDIM inversion and denoises with a new VLDM to raise the video resolution while also improving temporal and spatial consistency. Through joint optimization of the model, training, and data, this project has the following features:
- High definition & widescreen: it directly generates 720P (1280*720) videos. Compared with existing open-source projects, not only is the resolution effectively improved, but the widescreen videos it produces also suit more scenarios.
- Watermark-free: the model is trained on our internal large-scale watermark-free video/image data and fine-tuned on high-quality data. The generated watermark-free videos are suitable for more video platforms and subject to fewer restrictions.
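To make the two-stage design above concrete, here is a toy numpy sketch of the data flow only. All shapes, function names, and operations are invented for illustration; the real model uses a spatio-temporal UNet, DDIM inversion, and a VLDM, none of which are reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

def stage1_semantic(image, num_frames=16, latent_res=32):
    """Toy stand-in for stage 1: produce a low-resolution latent video
    tied to the input image (here just its mean plus noise)."""
    seed = image.mean()
    return seed + 0.1 * rng.standard_normal((num_frames, 4, latent_res, latent_res))

def stage2_refine(latents, scale=4):
    """Toy stand-in for stage 2: upsample the latents toward the target
    resolution and 'refine' them (a placeholder for DDIM inversion +
    VLDM denoising)."""
    up = latents.repeat(scale, axis=2).repeat(scale, axis=3)
    return 0.9 * up

image = rng.random((3, 256, 256))
low_res = stage1_semantic(image)       # semantically consistent, low resolution
refined = stage2_refine(low_res)       # higher resolution, refined
print(low_res.shape, refined.shape)    # (16, 4, 32, 32) (16, 4, 128, 128)
```

The key point the sketch mirrors is that resolution is added in a separate second pass rather than generated directly.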
<p>
</center>
**For display purposes, this page shows low-resolution GIFs, and the GIF format reduces video quality. For the 720P results, please refer to the corresponding video links below.**
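For reference, a 720P MP4 can be turned into a small preview GIF with ffmpeg. The helper below only builds the command string (the `-vf scale=W:-1` and `-r` flags are standard ffmpeg options; the exact width/fps used for this page are an assumption, and `gif_preview_cmd` is a hypothetical name):

```python
import shlex

def gif_preview_cmd(src_mp4, dst_gif, width=480, fps=8):
    """Build an ffmpeg command that downscales a video into a small GIF.
    Lower width/fps shrink the file, at the cost of the quality loss
    mentioned above. scale=W:-1 keeps the aspect ratio."""
    return (
        f"ffmpeg -i {shlex.quote(src_mp4)} "
        f"-vf scale={width}:-1 -r {fps} {shlex.quote(dst_gif)}"
    )

print(gif_preview_cmd("output.mp4", "preview.gif"))
```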
</tr>
<tr>
<td ><center>
<a href="https://cloud.video.taobao.com/play/u/null/p/1/e/6/t/1/424319402790.mp4">HQ Video</a>
</center></td>
<td ><center>
<a href="https://cloud.video.taobao.com/play/u/null/p/1/e/6/t/1/423628044217.mp4">HQ Video</a>
</center></td>
</tr>
<tr>
</tr>
<tr>
<td ><center>
<a href="https://cloud.video.taobao.com/play/u/null/p/1/e/6/t/1/423965629168.mp4">HQ Video</a>
</center></td>
<td ><center>
<a href="https://cloud.video.taobao.com/play/u/null/p/1/e/6/t/1/423969933887.mp4">HQ Video</a>
</center></td>
</tr>
<tr>
</tr>
<tr>
<td ><center>
<a href="https://cloud.video.taobao.com/play/u/null/p/1/e/6/t/1/423966661082.mp4">HQ Video</a>
</center></td>
<td ><center>
<a href="https://cloud.video.taobao.com/play/u/null/p/1/e/6/t/1/424613631285.mp4">HQ Video</a>
</center></td>
</tr>
<tr>
</tr>
<tr>
<td ><center>
<a href="https://cloud.video.taobao.com/play/u/null/p/1/e/6/t/1/424612211915.mp4">HQ Video</a>
</center></td>
<td ><center>
<a href="https://cloud.video.taobao.com/play/u/null/p/1/e/6/t/1/424613123188.mp4">HQ Video</a>
</center></td>
</tr>
<tr>
</tr>
<tr>
<td ><center>
<a href="https://cloud.video.taobao.com/play/u/null/p/1/e/6/t/1/424616459162.mp4">HQ Video</a>
</center></td>
<td ><center>
<a href="https://cloud.video.taobao.com/play/u/null/p/1/e/6/t/1/424614735831.mp4">HQ Video</a>
</center></td>
</tr>
<tr>
</tr>
<tr>
<td ><center>
<a href="https://cloud.video.taobao.com/play/u/null/p/1/e/6/t/1/424617591002.mp4">HQ Video</a>
</center></td>
<td ><center>
<a href="https://cloud.video.taobao.com/play/u/null/p/1/e/6/t/1/423631572030.mp4">HQ Video</a>
</center></td>
</tr>
<tr>
</tr>
<tr>
<td ><center>
<a href="https://cloud.video.taobao.com/play/u/null/p/1/e/6/t/1/423629092176.mp4">HQ Video</a>
</center></td>
<td ><center>
<a href="https://cloud.video.taobao.com/play/u/null/p/1/e/6/t/1/424616071017.mp4">HQ Video</a>
</center></td>
</tr>
<tr>
</tr>
<tr>
<td ><center>
<a href="https://cloud.video.taobao.com/play/u/null/p/1/e/6/t/1/424317682762.mp4">HQ Video</a>
</center></td>
<td ><center>
<a href="https://cloud.video.taobao.com/play/u/null/p/1/e/6/t/1/424313138794.mp4">HQ Video</a>
</center></td>
</tr>
<tr>
</tr>
<tr>
<td ><center>
<a href="https://cloud.video.taobao.com/play/u/null/p/1/e/6/t/1/423631376023.mp4">HQ Video</a>
</center></td>
<td ><center>
<a href="https://cloud.video.taobao.com/play/u/null/p/1/e/6/t/1/424616459198.mp4">HQ Video</a>
</center></td>
</tr>
<tr>
</tr>
<tr>
<td ><center>
<a href="https://cloud.video.taobao.com/play/u/null/p/1/e/6/t/1/424314646086.mp4">HQ Video</a>
</center></td>
<td ><center>
<a href="https://cloud.video.taobao.com/play/u/null/p/1/e/6/t/1/424610479196.mp4">HQ Video</a>
</center></td>
</tr>
<tr>
</tr>
<tr>
<td ><center>
<a href="https://cloud.video.taobao.com/play/u/null/p/1/e/6/t/1/424321438157.mp4">HQ Video</a>
</center></td>
<td ><center>
<a href="https://cloud.video.taobao.com/play/u/null/p/1/e/6/t/1/424614283086.mp4">HQ Video</a>
</center></td>
</tr>
</table>
### Dependency
First, make sure the *ffmpeg* command is available on your system. If it is not installed, you can install it with:
```bash
sudo apt-get update && sudo apt-get install -y ffmpeg libsm6 libxext6
```
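Before installing the Python dependencies, a quick environment check like the following can save a failed run later. This is a small sketch (the `check_env` helper is not part of the project); it only reports what is present and installs nothing:

```python
import importlib.util
import shutil

def check_env(packages=("modelscope", "torch", "einops")):
    """Report whether ffmpeg and the key Python packages are available."""
    status = {"ffmpeg": shutil.which("ffmpeg") is not None}
    for name in packages:
        # find_spec returns None when a top-level package is not installed
        status[name] = importlib.util.find_spec(name) is not None
    return status

print(check_env())
```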
Second, the **MS-Image2Video** project is built on the ModelScope codebase; the following are some of the dependencies that need to be installed for this project.
```bash
pip install modelscope==1.4.2
pip install -U xformers
pip install torch==2.0.1
pip install "open_clip_torch>=2.0.2"
pip install easydict
pip install numpy
pip install opencv-python-headless
pip install opencv-python
pip install "einops>=0.4"
```
### Code example
```python
from modelscope.pipelines import pipeline
from modelscope.outputs import OutputKeys

IMG_PATH = 'test.jpg'  # your image path (URL or local file)
pipe = pipeline("image-to-video", 'damo/Image-to-Video')
output_video_path = pipe(IMG_PATH, output_video='./output.mp4')[OutputKeys.OUTPUT_VIDEO]
print(output_video_path)
```
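If the pipeline returns a local file path, as in the example above, a small guard can verify the output before it is used further. This is a sketch with arbitrary thresholds, not part of the ModelScope API:

```python
import os

def ensure_video(path, min_bytes=1024):
    """Basic sanity check on a generated video file: it must exist and
    not be trivially small (the 1 KiB threshold is arbitrary)."""
    if not os.path.isfile(path):
        raise FileNotFoundError(f"no video at {path}")
    size = os.path.getsize(path)
    if size < min_bytes:
        raise ValueError(f"suspiciously small video ({size} bytes)")
    return path
```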
Our code and model weights are only available for personal/academic research use and are currently not supported for commercial use.
## Contact Us
If you would like to contact our algorithm/product team, or join our algorithm team (internship or full-time), please feel free to email us at <yingya.zyy@alibaba-inc.com>.