update README (batch 1/1)

This commit is contained in:
Cherrytest
2025-11-24 16:32:02 +00:00
parent 9d296ee5e3
commit 726390b72a
2 changed files with 44 additions and 14 deletions


@ -42,9 +42,11 @@ HunyuanVideo-1.5 is a video generation model that delivers top-tier quality with
<a href=https://github.com/Tencent-Hunyuan/HunyuanVideo-1.5 target="_blank"><img src=https://img.shields.io/badge/Page-bb8a2e.svg?logo=github height=22px></a>
<a href="https://github.com/Tencent-Hunyuan/HunyuanVideo-1.5/blob/report/HunyuanVideo_1_5.pdf" target="_blank"><img src=https://img.shields.io/badge/Report-b5212f.svg?logo=arxiv height=22px></a>
<a href=https://x.com/TencentHunyuan target="_blank"><img src=https://img.shields.io/badge/Hunyuan-black.svg?logo=x height=22px></a>
<a href="https://doc.weixin.qq.com/doc/w3_AXcAcwZSAGgCNACVygLxeQjyn4FYS?scode=AJEAIQdfAAoSfXnTj0AAkA-gaeACk" target="_blank"><img src=https://img.shields.io/badge/📚-PromptHandBook-blue.svg?logo=book height=22px></a> <br/>
<a href="https://github.com/Tencent-Hunyuan/HunyuanVideo-1.5/blob/main/assets/HunyuanVideo_1_5_Prompt_Handbook_EN.md" target="_blank"><img src=https://img.shields.io/badge/📚-PromptHandBook-blue.svg?logo=book height=22px></a> <br/>
<a href="./ComfyUI/README.md" target="_blank"><img src=https://img.shields.io/badge/ComfyUI-blue.svg?logo=book height=22px></a>
<a href="https://github.com/ModelTC/LightX2V" target="_blank"><img src=https://img.shields.io/badge/LightX2V-yellow.svg?logo=book height=22px></a>
<a href="https://tusi.cn/models/933574988890423836" target="_blank"><img src=https://img.shields.io/badge/吐司-purple.svg?logo=book height=22px></a>
<a href="https://tensor.art/models/933574988890423836" target="_blank"><img src=https://img.shields.io/badge/TensorArt-cyan.svg?logo=book height=22px></a>
</div>
@ -55,7 +57,8 @@ HunyuanVideo-1.5 is a video generation model that delivers top-tier quality with
</p>
## 🔥🔥🔥 News
👋 Nov 20, 2025: We release the inference code and model weights of HunyuanVideo-1.5.
* 🚀 Nov 24, 2025: We now support cache inference, achieving approximately 2x speedup! Pull the latest code to try it.
* 👋 Nov 20, 2025: We release the inference code and model weights of HunyuanVideo-1.5.
## 🎥 Demo
@ -168,6 +171,7 @@ pip install -i https://mirrors.tencent.com/pypi/simple/ --upgrade tencentcloud-s
```bash
git clone https://github.com/Tencent-Hunyuan/flex-block-attn.git
cd flex-block-attn
git submodule update --init --recursive
python3 setup.py install
```
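
After building, a quick sanity check can confirm the extension is visible to Python. This is a sketch: the importable module name `flex_block_attn` is an assumption based on the repository name, so adjust it if the package exports a different name.

```shell
# Hedged check: confirm the extension imports in the current environment.
# The module name `flex_block_attn` is assumed from the repo name.
if python3 -c "import flex_block_attn" 2>/dev/null; then
  echo "flex-block-attn: import OK"
else
  echo "flex-block-attn: import failed; rerun 'python3 setup.py install'"
fi
```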
@ -191,7 +195,7 @@ Download the pretrained models before generating videos. Detailed instructions a
### Prompt Writing Handbook
Prompt enhancement plays a crucial role in enabling our model to generate high-quality videos: longer, more detailed prompts significantly improve the generated video. We encourage you to craft comprehensive, descriptive prompts to achieve the best possible video quality, and we recommend that community partners consult our official guide on writing effective prompts.
**Reference:** **[HunyuanVideo-1.5 Prompt Handbook](https://doc.weixin.qq.com/doc/w3_AXcAcwZSAGgCNACVygLxeQjyn4FYS?scode=AJEAIQdfAAoSfXnTj0AAkA-gaeACk)**
**Reference:** **[HunyuanVideo-1.5 Prompt Handbook](https://github.com/Tencent-Hunyuan/HunyuanVideo-1.5/blob/main/assets/HunyuanVideo_1_5_Prompt_Handbook_EN.md)**
### System Prompts for Automatic Prompt Enhancement
For users seeking to optimize prompts for other large models, it is recommended to consult the definition of `t2v_rewrite_system_prompt` in the file `hyvideo/utils/rewrite/t2v_prompt.py` to guide text-to-video rewriting. Similarly, for image-to-video rewriting, refer to the definition of `i2v_rewrite_system_prompt` in `hyvideo/utils/rewrite/i2v_prompt.py`.
@ -229,9 +233,10 @@ OUTPUT_PATH=./outputs/output.mp4
N_INFERENCE_GPU=8 # Parallel inference GPU count
CFG_DISTILLED=true # Inference with CFG distilled model, 2x speedup
SPARSE_ATTN=false # Inference with sparse attention (only 720p models are equipped with sparse attention). Please ensure flex-block-attn is installed
SAGE_ATTN=false # Inference with SageAttention
SAGE_ATTN=true # Inference with SageAttention
REWRITE=true # Enable prompt rewriting. Please ensure rewrite vLLM server is deployed and configured.
OVERLAP_GROUP_OFFLOADING=true # Only valid when group offloading is enabled; significantly increases CPU memory usage but speeds up inference
ENABLE_CACHE=true # Enable feature cache during inference. Significantly speeds up inference.
MODEL_PATH=ckpts # Path to pretrained model
torchrun --nproc_per_node=$N_INFERENCE_GPU generate.py \
@ -243,6 +248,7 @@ torchrun --nproc_per_node=$N_INFERENCE_GPU generate.py \
--cfg_distilled $CFG_DISTILLED \
--sparse_attn $SPARSE_ATTN \
--use_sageattn $SAGE_ATTN \
--enable_cache $ENABLE_CACHE \
--rewrite $REWRITE \
--output_path $OUTPUT_PATH \
--overlap_group_offloading $OVERLAP_GROUP_OFFLOADING \
@ -254,7 +260,11 @@ torchrun --nproc_per_node=$N_INFERENCE_GPU generate.py \
> ```bash
> export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True,max_split_size_mb:128
> ```
>
> **Tips:** If you have limited CPU memory and encounter OOM during inference, you can try disabling overlapped group offloading by adding the following argument:
> ```bash
> --overlap_group_offloading false
> ```
### Command Line Arguments
@ -283,6 +293,11 @@ torchrun --nproc_per_node=$N_INFERENCE_GPU generate.py \
| `--use_sageattn` | bool | No | `false` | Enable SageAttention (use `--use_sageattn` or `--use_sageattn true/1` to enable, `--use_sageattn false/0` to disable) |
| `--sage_blocks_range` | str | No | `0-53` | SageAttention blocks range (e.g., `0-5` or `0,1,2,3,4,5`) |
| `--enable_torch_compile` | bool | No | `false` | Enable torch compile for transformer (use `--enable_torch_compile` or `--enable_torch_compile true/1` to enable, `--enable_torch_compile false/0` to disable) |
| `--enable_cache` | bool | No | `false` | Enable cache for transformer (use `--enable_cache` or `--enable_cache true/1` to enable, `--enable_cache false/0` to disable) |
| `--cache_start_step` | int | No | `11` | Start step to skip when using cache |
| `--cache_end_step` | int | No | `45` | End step to skip when using cache |
| `--total_steps` | int | No | `50` | Total inference steps |
| `--cache_step_interval` | int | No | `4` | Step interval to skip when using cache |
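
Putting the cache flags together, a minimal sketch using the flag names and default values from the table above: with `start=11`, `end=45`, and `interval=4`, steps inside `[11, 45]` are skipped at every 4th step out of the 50 total inference steps.

```shell
# Sketch: cache-related arguments assembled from the documented defaults.
# Append $CACHE_ARGS to the torchrun generate.py invocation shown earlier.
CACHE_ARGS="--enable_cache true --cache_start_step 11 --cache_end_step 45 --cache_step_interval 4 --total_steps 50"
echo "$CACHE_ARGS"
```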
**Note:** Use `--nproc_per_node` to specify the number of GPUs. For example, `--nproc_per_node=8` uses 8 GPUs.
@ -300,8 +315,8 @@ The following table provides the optimal inference configurations (CFG scale, em
| 480p I2V CFG Distilled | 1 | None | 5 | 50 |
| 720p T2V CFG Distilled | 1 | None | 9 | 50 |
| 720p I2V CFG Distilled | 1 | None | 7 | 50 |
| 720p T2V CFG Distilled Sparse | 1 | None | 7 | 50 |
| 720p I2V CFG Distilled Sparse | 1 | None | 9 | 50 |
| 720p T2V CFG Distilled Sparse | 1 | None | 9 | 50 |
| 720p I2V CFG Distilled Sparse | 1 | None | 7 | 50 |
| 480→720 SR Step Distilled | 1 | None | 2 | 6 |
| 720→1080 SR Step Distilled | 1 | None | 2 | 8 |
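
For the 720p sparse rows above, the relevant toggles from the earlier inference script would look like the sketch below. Only 720p checkpoints are equipped with sparse attention, and `flex-block-attn` must be installed first; the variable names match the script shown earlier in this README.

```shell
# Sketch: switches for a 720p sparse-attention run with the distilled model,
# matching the variables used in the inference script above.
SPARSE_ATTN=true    # 720p models only; requires flex-block-attn
CFG_DISTILLED=true  # per the table: guidance 9 for sparse T2V, 7 for sparse I2V
echo "--sparse_attn $SPARSE_ATTN --cfg_distilled $CFG_DISTILLED"
```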