diff --git a/README.md b/README.md
index 6ad609b..81ffc6b 100644
--- a/README.md
+++ b/README.md
@@ -2,7 +2,7 @@
 本项目**MS-Image2Video**旨在解决根据输入图像生成高清视频任务。**MS-Image2Video**由达摩院研发的高清视频生成基础模型,其核心部分包含两个阶段,分别解决语义一致性和清晰度的问题,参数量共计约37亿,模型经过在大规模视频和图像数据混合预训练,并在少量精品数据上微调得到,该数据分布广泛、类别多样化,模型对不同的数据均有良好的泛化性。项目于现有的视频生成模型,**MS-Image2Video**在清晰度、质感、语义、时序连续性等方面均具有明显的优势。

-此外,**MS-Image2Video**的许多设计理念继承于我们以公开工作**VideoComposer**,您可以参考我们的[VideoComposer](https://videocomposer.github.io)和本项目的Github代码库了解详细细节
+此外,**MS-Image2Video**的许多设计理念继承于我们已经公开的工作**VideoComposer**,您可以参考我们的[VideoComposer](https://videocomposer.github.io)和本项目的Github代码库了解详细细节

 The **MS-Image2Video** project aims to address the task of generating high-definition videos based on input images. Developed by Alibaba Cloud, the **MS-Image2Video** is a fundamental model for generating high-definition videos. Its core components consist of two stages that address the issues of semantic consistency and clarity, totaling approximately 3.7 billion parameters. The model is pre-trained on a large-scale mix of video and image data and fine-tuned on a small number of high-quality data sets with a wide range of distributions and diverse categories. The model demonstrates good generalization capabilities for different data types. Compared to existing video generation models, **MS-Image2Video** has significant advantages in terms of clarity, texture, semantics, and temporal continuity.
@@ -17,7 +17,7 @@ Additionally, many of the design concepts for **MS-Image2Video** are inherited f
 ## 模型介绍 (Introduction)

-**MS-Image2Video**建立在Stable Diffusion之上,如图Fig.2所示,通过专门设计的时空UNet在隐空间中进行时空建模并通过解码器将重建出最终视频。为能够生成720P视频,我们将**MS-Image2Video**分为两个阶段,第一阶段保证语义一致性但低分辨率,第二阶段通过DDIM逆运算并在新的VLDM上进行去噪以提高视频分辨率已经时间和空间上的一致性。通过在模型、训练和数据上的联合优化,本项目主要具有以下几个特点:
+**MS-Image2Video**建立在Stable Diffusion之上,如图Fig.2所示,通过专门设计的时空UNet在隐空间中进行时空建模并通过解码器重建出最终视频。为能够生成720P视频,我们将**MS-Image2Video**分为两个阶段,第一阶段保证语义一致性但低分辨率,第二阶段通过DDIM逆运算并在新的VLDM上进行去噪以提高视频分辨率以及同时提升时间和空间上的一致性。通过在模型、训练和数据上的联合优化,本项目主要具有以下几个特点:

 - 高清&宽屏,可以直接生成720P(1280*720)分辨率的视频,且相比于现有的开源项目,不仅分辨率得到有效提高,其生产的宽屏视频可以适合更多的场景
 - 无水印,模型通过我们内部大规模无水印视频/图像训练,并在高质量数据微调得到,生成的无水印视频可适用更多视频平台,减少许多限制
@@ -44,7 +44,7 @@ Below are some examples generated by the model:

-**为方便展示,本页面展示为低分辨率GIF格式,但是GIF会下降视频质量,具体效果可以参下面的视频链接**
+**为方便展示,本页面展示为低分辨率GIF格式,但是GIF会下降视频质量,720P的视频效果可以参下面对应的视频链接**

 **For display purposes, this page shows low-resolution GIF format. However, GIF format may reduce video quality. For specific effects, please refer to the video link below.**
@@ -60,10 +60,10 @@ Below are some examples generated by the model:

- Video
+ HQ Video
- Video
+ HQ Video
@@ -76,10 +76,10 @@ Below are some examples generated by the model:
- Video
+ HQ Video
- Video
+ HQ Video
@@ -92,10 +92,10 @@ Below are some examples generated by the model:
- Video
+ HQ Video
- Video
+ HQ Video
@@ -108,10 +108,10 @@ Below are some examples generated by the model:
- Video
+ HQ Video
- Video
+ HQ Video
@@ -124,10 +124,10 @@ Below are some examples generated by the model:
- Video
+ HQ Video
- Video
+ HQ Video
@@ -140,10 +140,10 @@ Below are some examples generated by the model:
- Video
+ HQ Video
- Video
+ HQ Video
@@ -156,10 +156,10 @@ Below are some examples generated by the model:
- Video
+ HQ Video
- Video
+ HQ Video
@@ -172,10 +172,10 @@ Below are some examples generated by the model:
- Video
+ HQ Video
- Video
+ HQ Video
@@ -188,10 +188,10 @@ Below are some examples generated by the model:
- Video
+ HQ Video
- Video
+ HQ Video
@@ -204,10 +204,10 @@ Below are some examples generated by the model:
- Video
+ HQ Video
- Video
+ HQ Video
@@ -220,10 +220,10 @@ Below are some examples generated by the model:
- Video
+ HQ Video
- Video
+ HQ Video
@@ -232,17 +232,26 @@ Below are some examples generated by the model:

 ### 依赖项 (Dependency)

-本**MS-Image2Video**项目适配ModelScope代码库,以下是本项目需要安装的部分依赖项:
+
+首先你需要确定你的系统安装了*ffmpeg*命令,如果没有,可以通过以下命令来安装:
+
+First, you need to ensure that your system has installed the ffmpeg command. If it is not installed, you can install it using the following command:
+
+```bash
+sudo apt-get update && apt-get install ffmpeg libsm6 libxext6 -y
+```
+
+
+其次,本**MS-Image2Video**项目适配ModelScope代码库,以下是本项目需要安装的部分依赖项。

 The **MS-Image2Video** project is compatible with the ModelScope codebase, and the following are some of the dependencies that need to be installed for this project.

+
 ```bash
 pip install modelscope==1.4.2
 pip install -U xformers
 pip install torch==2.0.1
 pip install open_clip_torch>=2.0.2
-pip install easydict
-pip install numpy
 pip install opencv-python-headless
 pip install opencv-python
 pip install einops>=0.4
@@ -263,14 +272,7 @@ For more experiments, please stay tuned for our upcoming technical report and op
 ### 代码范例 (Code example)

 ```python
-from modelscope.pipelines import pipeline
-from modelscope.outputs import OutputKeys
-pipe = pipeline("image-to-video", 'damo/Image-to-Video')
-
-# IMG_PATH: your image path (url or loacl file)
-output_video_path = pipe(IMG_PATH, output_video='./output.mp4')[OutputKeys.OUTPUT_VIDEO]
-print(output_video_path)
 ```
@@ -343,6 +345,4 @@ The relevant technical report is currently being written, and we welcome you to
 Our code and model weights are only available for personal/academic research use and are currently not supported for commercial use.

 ## 联系我们 (Contact Us)
-如果你想联系我们的算法/产品同学, 或者想加入我们的算法团队(实习/正式), 欢迎发邮件至: 。
-
-If you would like to contact us, or join our team (internship/formal), please feel free to email us at .
\ No newline at end of file
+如果你想联系我们的算法/产品同学, 或者想加入我们的算法团队(实习/正式), 欢迎发邮件至: 。
\ No newline at end of file
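
The dependency hunk above adds a step asking the reader to confirm that the `ffmpeg` command is installed before the Python packages are set up. As a minimal sketch of how that check could be scripted, assuming only the Python standard library: `find_missing_tools` is a hypothetical helper name introduced here for illustration and is not part of ModelScope or this repository.

```python
import shutil


def find_missing_tools(tools):
    """Return the names in `tools` that cannot be found on the current PATH."""
    return [name for name in tools if shutil.which(name) is None]


if __name__ == "__main__":
    # Only ffmpeg is named as a command in the dependency section;
    # extend the list if your setup needs other command-line tools.
    missing = find_missing_tools(["ffmpeg"])
    if missing:
        print("Missing command-line tools:", ", ".join(missing))
    else:
        print("ffmpeg found at", shutil.which("ffmpeg"))
```

Running the script before installing the pip dependencies reports a missing `ffmpeg` up front, so the problem does not first surface when a video is being written.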