mirror of https://www.modelscope.cn/Tencent-Hunyuan/HunyuanVideo-1.5.git
synced 2026-04-02 22:02:52 +08:00

update README (batch 1/1)

This commit is contained in:

README.md — 29 lines changed
@@ -42,9 +42,11 @@ HunyuanVideo-1.5 is a video generation model that delivers top-tier quality with
 <a href=https://github.com/Tencent-Hunyuan/HunyuanVideo-1.5 target="_blank"><img src= https://img.shields.io/badge/Page-bb8a2e.svg?logo=github height=22px></a>
 <a href="https://github.com/Tencent-Hunyuan/HunyuanVideo-1.5/blob/report/HunyuanVideo_1_5.pdf" target="_blank"><img src=https://img.shields.io/badge/Report-b5212f.svg?logo=arxiv height=22px></a>
 <a href=https://x.com/TencentHunyuan target="_blank"><img src=https://img.shields.io/badge/Hunyuan-black.svg?logo=x height=22px></a>
-<a href="https://doc.weixin.qq.com/doc/w3_AXcAcwZSAGgCNACVygLxeQjyn4FYS?scode=AJEAIQdfAAoSfXnTj0AAkA-gaeACk" target="_blank"><img src=https://img.shields.io/badge/📚-PromptHandBook-blue.svg?logo=book height=22px></a> <br/>
+<a href="https://github.com/Tencent-Hunyuan/HunyuanVideo-1.5/blob/main/assets/HunyuanVideo_1_5_Prompt_Handbook_EN.md" target="_blank"><img src=https://img.shields.io/badge/📚-PromptHandBook-blue.svg?logo=book height=22px></a> <br/>
 <a href="./ComfyUI/README.md" target="_blank"><img src=https://img.shields.io/badge/ComfyUI-blue.svg?logo=book height=22px></a>
 <a href="https://github.com/ModelTC/LightX2V" target="_blank"><img src=https://img.shields.io/badge/LightX2V-yellow.svg?logo=book height=22px></a>
+<a href="https://tusi.cn/models/933574988890423836" target="_blank"><img src=https://img.shields.io/badge/吐司-purple.svg?logo=book height=22px></a>
+<a href="https://tensor.art/models/933574988890423836" target="_blank"><img src=https://img.shields.io/badge/TensorArt-cyan.svg?logo=book height=22px></a>
 </div>

@@ -55,7 +57,8 @@ HunyuanVideo-1.5 is a video generation model that delivers top-tier quality with
 </p>

 ## 🔥🔥🔥 News
-👋 Nov 20, 2025: We release the inference code and model weights of HunyuanVideo-1.5.
+* 🚀 Nov 24, 2025: We now support cache inference, achieving approximately 2x speedup! Pull the latest code to try it.
+* 👋 Nov 20, 2025: We release the inference code and model weights of HunyuanVideo-1.5.

 ## 🎥 Demo
@@ -168,6 +171,7 @@ pip install -i https://mirrors.tencent.com/pypi/simple/ --upgrade tencentcloud-s
 ```bash
 git clone https://github.com/Tencent-Hunyuan/flex-block-attn.git
 cd flex-block-attn
+git submodule update --init --recursive
 python3 setup.py install
 ```

@@ -191,7 +195,7 @@ Download the pretrained models before generating videos. Detailed instructions a
 ### Prompt Writing Handbook
 Prompt enhancement plays a crucial role in enabling our model to generate high-quality videos. By writing longer and more detailed prompts, the generated video will be significantly improved. We encourage you to craft comprehensive and descriptive prompts to achieve the best possible video quality. we recommend community partners consulting our official guide on how to write effective prompts.

-**Reference:** **[HunyuanVideo-1.5 Prompt Handbook](https://doc.weixin.qq.com/doc/w3_AXcAcwZSAGgCNACVygLxeQjyn4FYS?scode=AJEAIQdfAAoSfXnTj0AAkA-gaeACk)**
+**Reference:** **[HunyuanVideo-1.5 Prompt Handbook](https://github.com/Tencent-Hunyuan/HunyuanVideo-1.5/blob/main/assets/HunyuanVideo_1_5_Prompt_Handbook_EN.md)**

 ### System Prompts for Automatic Prompt Enhancement
 For users seeking to optimize prompts for other large models, it is recommended to consult the definition of `t2v_rewrite_system_prompt` in the file `hyvideo/utils/rewrite/t2v_prompt.py` to guide text-to-video rewriting. Similarly, for image-to-video rewriting, refer to the definition of `i2v_rewrite_system_prompt` in `hyvideo/utils/rewrite/i2v_prompt.py`.
@@ -229,9 +233,10 @@ OUTPUT_PATH=./outputs/output.mp4
 N_INFERENCE_GPU=8 # Parallel inference GPU count
 CFG_DISTILLED=true # Inference with CFG distilled model, 2x speedup
 SPARSE_ATTN=false # Inference with sparse attention (only 720p models are equipped with sparse attention). Please ensure flex-block-attn is installed
-SAGE_ATTN=false # Inference with SageAttention
+SAGE_ATTN=true # Inference with SageAttention
 REWRITE=true # Enable prompt rewriting. Please ensure rewrite vLLM server is deployed and configured.
 OVERLAP_GROUP_OFFLOADING=true # Only valid when group offloading is enabled, significantly increases CPU memory usage but speeds up inference
+ENABLE_CACHE=true # Enable feature cache during inference. Significantly speeds up inference.
 MODEL_PATH=ckpts # Path to pretrained model

 torchrun --nproc_per_node=$N_INFERENCE_GPU generate.py \
@@ -243,6 +248,7 @@ torchrun --nproc_per_node=$N_INFERENCE_GPU generate.py \
     --cfg_distilled $CFG_DISTILLED \
     --sparse_attn $SPARSE_ATTN \
     --use_sageattn $SAGE_ATTN \
+    --enable_cache $ENABLE_CACHE \
     --rewrite $REWRITE \
     --output_path $OUTPUT_PATH \
     --overlap_group_offloading $OVERLAP_GROUP_OFFLOADING \
@@ -254,7 +260,11 @@ torchrun --nproc_per_node=$N_INFERENCE_GPU generate.py \
 > ```bash
 > export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True,max_split_size_mb:128
 > ```
+>
+> **Tips:** If you have limited CPU memory and encounter OOM during inference, you can try disable overlapped group offloading by adding the following argument:
+> ```bash
+> --overlap_group_offloading false
+> ```

 ### Command Line Arguments
@@ -283,6 +293,11 @@ torchrun --nproc_per_node=$N_INFERENCE_GPU generate.py \
 | `--use_sageattn` | bool | No | `false` | Enable SageAttention (use `--use_sageattn` or `--use_sageattn true/1` to enable, `--use_sageattn false/0` to disable) |
 | `--sage_blocks_range` | str | No | `0-53` | SageAttention blocks range (e.g., `0-5` or `0,1,2,3,4,5`) |
 | `--enable_torch_compile` | bool | No | `false` | Enable torch compile for transformer (use `--enable_torch_compile` or `--enable_torch_compile true/1` to enable, `--enable_torch_compile false/0` to disable) |
+| `--enable_cache` | bool | No | `false` | Enable cache for transformer (use `--enable_cache` or `--enable_cache true/1` to enable, `--enable_cache false/0` to disable) |
+| `--cache_start_step` | int | No | `11` | Start step to skip when using cache |
+| `--cache_end_step` | int | No | `45` | End step to skip when using cache |
+| `--total_steps` | int | No | `50` | Total inference steps |
+| `--cache_step_interval` | int | No | `4` | Step interval to skip when using cache |

 **Note:** Use `--nproc_per_node` to specify the number of GPUs. For example, `--nproc_per_node=8` uses 8 GPUs.
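The cache arguments added in this hunk describe a window of inference steps (`cache_start_step` to `cache_end_step`) inside which cached transformer features are reused at a fixed interval. As a minimal sketch of one plausible reading of that schedule (the real logic lives in the HunyuanVideo-1.5 code; the function name here is hypothetical), assuming the step at each interval boundary recomputes and the steps in between reuse the cache:

```python
def cached_steps(total_steps=50, start=11, end=45, interval=4):
    """Hypothetical cache schedule: within [start, end), only every
    `interval`-th step recomputes the transformer; the remaining steps
    reuse cached features. Returns the steps that reuse the cache."""
    skipped = []
    for step in range(total_steps):
        in_window = start <= step < end
        if in_window and (step - start) % interval != 0:
            skipped.append(step)
    return skipped

# With the defaults (11, 45, 4), 34 of the 50 steps fall in the cache
# window, and roughly three quarters of those reuse cached features.
skipped = cached_steps()
```

This is only an interpretation of the documented defaults, not the project's actual implementation; consult `generate.py` in the repository for the authoritative behavior.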
@@ -300,8 +315,8 @@ The following table provides the optimal inference configurations (CFG scale, em
 | 480p I2V CFG Distilled | 1 | None | 5 | 50 |
 | 720p T2V CFG Distilled | 1 | None | 9 | 50 |
 | 720p I2V CFG Distilled | 1 | None | 7 | 50 |
-| 720p T2V CFG Distilled Sparse | 1 | None | 7 | 50 |
-| 720p I2V CFG Distilled Sparse | 1 | None | 9 | 50 |
+| 720p T2V CFG Distilled Sparse | 1 | None | 9 | 50 |
+| 720p I2V CFG Distilled Sparse | 1 | None | 7 | 50 |
 | 480→720 SR Step Distilled | 1 | None | 2 | 6 |
 | 720→1080 SR Step Distilled | 1 | None | 2 | 8 |
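Several arguments in the table above follow a `true/1` to enable, `false/0` to disable convention, where the bare flag also enables the option. A common `argparse` pattern implements exactly this behavior; the sketch below is illustrative (the `str2bool` helper is an assumption, not code from the repository):

```python
import argparse

def str2bool(v):
    """Parse the true/1 / false/0 flag convention from the argument table."""
    if isinstance(v, bool):
        return v
    if v.lower() in ("true", "1", "yes"):
        return True
    if v.lower() in ("false", "0", "no"):
        return False
    raise argparse.ArgumentTypeError(f"boolean value expected, got {v!r}")

parser = argparse.ArgumentParser()
# nargs="?" with const=True lets a bare --enable_cache act as "true",
# while an explicit value ("true"/"1"/"false"/"0") is parsed by str2bool.
parser.add_argument("--enable_cache", type=str2bool, nargs="?",
                    const=True, default=False)
args = parser.parse_args(["--enable_cache", "true"])
```

The same pattern would apply to `--use_sageattn`, `--enable_torch_compile`, and the other boolean flags listed in the diff.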
README_CN.md — 29 lines changed
@@ -26,10 +26,11 @@ HunyuanVideo-1.5作为一款轻量级视频生成模型,仅需83亿参数即
 <a href=https://github.com/Tencent-Hunyuan/HunyuanVideo-1.5 target="_blank"><img src= https://img.shields.io/badge/Page-bb8a2e.svg?logo=github height=22px></a>
 <a href="https://github.com/Tencent-Hunyuan/HunyuanVideo-1.5/blob/report/HunyuanVideo_1_5.pdf" target="_blank"><img src=https://img.shields.io/badge/Report-b5212f.svg?logo=arxiv height=22px></a>
 <a href=https://x.com/TencentHunyuan target="_blank"><img src=https://img.shields.io/badge/Hunyuan-black.svg?logo=x height=22px></a>
-<a href="https://doc.weixin.qq.com/doc/w3_AXcAcwZSAGgCNACVygLxeQjyn4FYS?scode=AJEAIQdfAAoSfXnTj0AAkA-gaeACk" target="_blank"><img src=https://img.shields.io/badge/📚-PromptHandBook-blue.svg?logo=book height=22px></a> <br/>
+<a href="https://github.com/Tencent-Hunyuan/HunyuanVideo-1.5/blob/main/assets/HunyuanVideo_1_5_Prompt_Handbook_EN.md" target="_blank"><img src=https://img.shields.io/badge/📚-PromptHandBook-blue.svg?logo=book height=22px></a> <br/>
 <a href="./ComfyUI/README.md" target="_blank"><img src=https://img.shields.io/badge/ComfyUI-blue.svg?logo=book height=22px></a>
 <a href="https://github.com/ModelTC/LightX2V" target="_blank"><img src=https://img.shields.io/badge/LightX2V-yellow.svg?logo=book height=22px></a>
+<a href="https://tusi.cn/models/933574988890423836" target="_blank"><img src=https://img.shields.io/badge/吐司-purple.svg?logo=book height=22px></a>
+<a href="https://tensor.art/models/933574988890423836" target="_blank"><img src=https://img.shields.io/badge/TensorArt-cyan.svg?logo=book height=22px></a>
 </div>

@@ -39,7 +40,8 @@ HunyuanVideo-1.5作为一款轻量级视频生成模型,仅需83亿参数即
 </p>

 ## 🔥🔥🔥 最新动态
-👋 2025年11月20日: 我们开源了 HunyuanVideo-1.5的代码和推理权重
+* 🚀 Nov 24, 2025: 我们现已支持 cache 推理,可实现约两倍加速!请 pull 最新代码体验。
+* 👋 Nov 20, 2025: 我们开源了 HunyuanVideo-1.5的代码和推理权重

 ## 🎥 演示视频
 <div align="center">
@@ -151,6 +153,7 @@ pip install -i https://mirrors.tencent.com/pypi/simple/ --upgrade tencentcloud-s
 ```bash
 git clone https://github.com/Tencent-Hunyuan/flex-block-attn.git
 cd flex-block-attn
+git submodule update --init --recursive
 python3 setup.py install
 ```

@@ -175,7 +178,7 @@ pip install -i https://mirrors.tencent.com/pypi/simple/ --upgrade tencentcloud-s
 提示词增强在我们的模型生成高质量视频方面起着至关重要的作用。通过撰写更长、更详细的提示词,生成的视频质量将得到显著改善。我们鼓励您编写全面且描述性的提示词,以获得最佳的视频质量。我们建议社区伙伴参考我们的官方指南,了解如何撰写有效的提示词。

-**参考:** **[HunyuanVideo-1.5 提示词手册](https://doc.weixin.qq.com/doc/w3_AXcAcwZSAGgCNhei2zzNUS8O4mKop?scode=AJEAIQdfAAoE1dhviFAAkA-gaeACk)**
+**参考:** **[HunyuanVideo-1.5 提示词手册](https://github.com/Tencent-Hunyuan/HunyuanVideo-1.5/blob/main/assets/HunyuanVideo_1_5_Prompt_Handbook_EN.md)**

 ### 自动提示词增强的系统提示词
@@ -216,9 +219,10 @@ OUTPUT_PATH=./outputs/output.mp4
 N_INFERENCE_GPU=8 # 并行推理 GPU 数量
 CFG_DISTILLED=true # 使用 CFG 蒸馏模型进行推理,2倍加速
 SPARSE_ATTN=false # 使用稀疏注意力进行推理(仅 720p 模型配备了稀疏注意力)。请确保 flex-block-attn 已安装
-SAGE_ATTN=false # 使用 SageAttention 进行推理
+SAGE_ATTN=true # 使用 SageAttention 进行推理
 REWRITE=true # 启用提示词重写。请确保 rewrite vLLM server 已部署和配置。
 OVERLAP_GROUP_OFFLOADING=true # 仅在组卸载启用时有效,会显著增加 CPU 内存占用,但能够提速
+ENABLE_CACHE=true # 启用特征缓存进行推理。显著提升推理速度
 MODEL_PATH=ckpts # 预训练模型路径

 torchrun --nproc_per_node=$N_INFERENCE_GPU generate.py \
@@ -230,6 +234,7 @@ torchrun --nproc_per_node=$N_INFERENCE_GPU generate.py \
     --cfg_distilled $CFG_DISTILLED \
     --sparse_attn $SPARSE_ATTN \
     --use_sageattn $SAGE_ATTN \
+    --enable_cache $ENABLE_CACHE \
     --rewrite $REWRITE \
     --output_path $OUTPUT_PATH \
     --overlap_group_offloading $OVERLAP_GROUP_OFFLOADING \
@@ -241,6 +246,11 @@ torchrun --nproc_per_node=$N_INFERENCE_GPU generate.py \
 > ```bash
 > export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True,max_split_size_mb:128
 > ```
+>
+> **Tips:** 如果您有 CPU 内存有限并且遇到推理时的 OOM 错误,可以尝试禁用重叠组卸载,通过添加以下参数:
+> ```bash
+> --overlap_group_offloading false
+> ```

 ### 命令行参数

@@ -268,6 +278,11 @@ torchrun --nproc_per_node=$N_INFERENCE_GPU generate.py \
 | `--use_sageattn` | bool | 否 | `false` | 启用 SageAttention(使用 `--use_sageattn` 或 `--use_sageattn true/1` 来启用,`--use_sageattn false/0` 来禁用) |
 | `--sage_blocks_range` | str | 否 | `0-53` | SageAttention 块范围(例如:`0-5` 或 `0,1,2,3,4,5`) |
 | `--enable_torch_compile` | bool | 否 | `false` | 启用 torch compile 以优化 transformer(使用 `--enable_torch_compile` 或 `--enable_torch_compile true/1` 来启用,`--enable_torch_compile false/0` 来禁用) |
+| `--enable_cache` | bool | 否 | `false` | 启用 transformer 缓存(使用 `--enable_cache` 或 `--enable_cache true/1` 来启用,`--enable_cache false/0` 来禁用) |
+| `--cache_start_step` | int | 否 | `11` | 使用缓存时跳过的起始步数 |
+| `--cache_end_step` | int | 否 | `45` | 使用缓存时跳过的结束步数 |
+| `--total_steps` | int | 否 | `50` | 总推理步数 |
+| `--cache_step_interval` | int | 否 | `4` | 使用缓存时跳过的步数间隔 |

 **注意:** 使用 `--nproc_per_node` 指定使用的 GPU 数量。例如,`--nproc_per_node=8` 表示使用 8 个 GPU。
@@ -285,8 +300,8 @@ torchrun --nproc_per_node=$N_INFERENCE_GPU generate.py \
 | 480p I2V cfg 蒸馏 | 1 | None | 5 | 50 |
 | 720p T2V cfg 蒸馏 | 1 | None | 9 | 50 |
 | 720p I2V cfg 蒸馏 | 1 | None | 7 | 50 |
-| 720p T2V cfg 蒸馏稀疏 | 1 | None | 7 | 50 |
-| 720p I2V cfg 蒸馏稀疏 | 1 | None | 9 | 50 |
+| 720p T2V cfg 蒸馏稀疏 | 1 | None | 9 | 50 |
+| 720p I2V cfg 蒸馏稀疏 | 1 | None | 7 | 50 |
 | 480→720 超分 步数蒸馏 | 1 | None | 2 | 6 |
 | 720→1080 超分 步数蒸馏 | 1 | None | 2 | 8 |