diff --git a/README.md b/README.md
index 1fd6774..d4a0057 100644
--- a/README.md
+++ b/README.md
@@ -84,7 +84,7 @@ Below are some examples generated by the model:


- Fig.2 VLDM
+ Fig.2 Architecture of the first stage.

@@ -354,7 +354,7 @@ Please visit

 ### Model Limitations

-The model of the **I2VGen-XL** project has limitations in handling the following situations:
+Currently, we have found that the **I2VGen-XL** method has certain limitations in handling the following situations:
 - Limited ability to generate small objects; some errors may occur when generating smaller objects
 - Limited ability to generate fast-moving objects; some artifacts and implausible results may appear when generating fast-moving objects
 - Slow generation speed; generating high-definition videos noticeably slows down generation
@@ -362,10 +362,10 @@ Please visit

 In addition, our research also found that the spatial quality of the generated videos and the speed of their temporal changes are, to some extent, mutually exclusive. In this project we chose a compromise model that balances the two.

-The model of the **I2VGen-XL** project has limitations in the following scenarios:
-- Limited ability to generate small objects: There may be some errors when generating smaller objects.
-- Limited ability to generate fast-moving objects: There may be some artifacts when generating fast-moving objects.
-- Slow generation speed: Generating high-definition videos significantly slows down the generation speed.
+Currently, we have found certain limitations of the I2VGen-XL method in handling the following situations:
+- Limited ability to generate small objects. There may be some errors when generating smaller objects.
+- Limited ability to generate fast-moving objects. There may be some artifacts when generating fast-moving objects.
+- Slow generation speed. Generating high-definition videos significantly slows down the generation speed.

 Additionally, our research has found that there is a trade-off between the spatial quality and temporal variability of the generated videos. In this project, we have chosen a model that strikes a balance between the two.

@@ -387,10 +387,10 @@ Additionally, our research has found that there is a trade-off between the spati

 Our training data mainly comes from various sources and has the following attributes:

-- Mixed training: The model is trained with a 7:1 ratio of video to image to ensure the quality of video generation.
-- Wide class distribution: The data set covers most real-world categories, including people, animals, locomotives, science fiction, scenes, etc. with a total volume of billions of data points.
-- Wide source distribution: The data comes from open-source data, video websites, and other internal sources, with varying resolutions and aspect ratios.
-- High-quality data construction: To improve the quality of the model-generated videos, we constructed approximately 200,000 high-quality data pairs for fine-tuning the pre-training model.
+- Mixed training. The model is trained with a 7:1 ratio of video to image to ensure the quality of video generation.
+- Wide class distribution. The data set covers most real-world categories, including people, animals, locomotives, science fiction, scenes, etc., with a total volume of billions of data points.
+- Wide source distribution. The data comes from open-source data, video websites, and other internal sources, with varying resolutions and aspect ratios.
+- High-quality data construction. To improve the quality of the model-generated videos, we constructed approximately 200,000 high-quality data pairs for fine-tuning the pre-trained model.

 Stronger and more flexible video generation models will continue to be released, and the technical report behind them is being written. Stay tuned.