update readme (batch 1/1)

Cherrytest
2025-11-21 16:25:29 +00:00
parent d719c061ba
commit ec9d48dbd9
9 changed files with 203 additions and 55 deletions

3
.gitattributes vendored

@ -58,3 +58,6 @@ vae/diffusion_pytorch_model.safetensors filter=lfs diff=lfs merge=lfs -text
transformer/480p_t2v/diffusion_pytorch_model.safetensors filter=lfs diff=lfs merge=lfs -text
transformer/720p_sr_distilled/diffusion_pytorch_model.safetensors filter=lfs diff=lfs merge=lfs -text
transformer/480p_t2v_distilled/diffusion_pytorch_model.safetensors filter=lfs diff=lfs merge=lfs -text
upsampler/1080p_sr_distilled/diffusion_pytorch_model.safetensors filter=lfs diff=lfs merge=lfs -text
upsampler/720p_sr_distilled/diffusion_pytorch_model.safetensors filter=lfs diff=lfs merge=lfs -text

106
README.md

@ -1,3 +1,19 @@
---
library_name: HunyuanVideo-1.5
license: other
license_name: tencent-hunyuan-community
license_link: https://github.com/Tencent-Hunyuan/HunyuanVideo-1.5/blob/master/LICENSE
language:
- en
- zh
tags:
- text-to-video
- image-to-video
pipeline_tag: text-to-video
extra_gated_eu_disallowed: true
---
[中文文档](./README_CN.md)
# HunyuanVideo-1.5
@ -26,6 +42,10 @@ HunyuanVideo-1.5 is a video generation model that delivers top-tier quality with
<a href=https://github.com/Tencent-Hunyuan/HunyuanVideo-1.5 target="_blank"><img src= https://img.shields.io/badge/Page-bb8a2e.svg?logo=github height=22px></a>
<a href="https://github.com/Tencent-Hunyuan/HunyuanVideo-1.5/blob/main/assets/HunyuanVideo_1_5.pdf" target="_blank"><img src=https://img.shields.io/badge/Report-b5212f.svg?logo=arxiv height=22px></a>
<a href=https://x.com/TencentHunyuan target="_blank"><img src=https://img.shields.io/badge/Hunyuan-black.svg?logo=x height=22px></a>
<a href="https://doc.weixin.qq.com/doc/w3_AXcAcwZSAGgCNACVygLxeQjyn4FYS?scode=AJEAIQdfAAoSfXnTj0AAkA-gaeACk" target="_blank"><img src=https://img.shields.io/badge/📚-PromptHandBook-blue.svg?logo=book height=22px></a> <br/>
<a href="https://github.com/comfyanonymous/ComfyUI" target="_blank"><img src=https://img.shields.io/badge/ComfyUI-blue.svg?logo=book height=22px></a>
<a href="https://github.com/ModelTC/LightX2V" target="_blank"><img src=https://img.shields.io/badge/LightX2V-yellow.svg?logo=book height=22px></a>
</div>
@ -35,17 +55,17 @@ HunyuanVideo-1.5 is a video generation model that delivers top-tier quality with
</p>
## 🔥🔥🔥 News
👋 Nov 20, 2025: We release the inference code and model weights of HunyuanVideo.
👋 Nov 20, 2025: We release the inference code and model weights of HunyuanVideo-1.5.
## 🎥 Demo
<div align="center">
<video controls src="https://github.com/user-attachments/assets/d45ec78e-ea40-47f1-8d4d-f4d9a0682e2d" width="60%"> </video>
<video src="https://github.com/user-attachments/assets/d45ec78e-ea40-47f1-8d4d-f4d9a0682e2d" width="60%"> </video>
</div>
## 🧩 Community Contributions
If you develop/use HunyuanVideo in your projects, welcome to let us know.
If you develop or use HunyuanVideo-1.5 in your projects, please let us know.
- **ComfyUI** - [ComfyUI](https://github.com/comfyanonymous/ComfyUI): A powerful and modular diffusion model GUI with a graph/nodes interface. ComfyUI supports HunyuanVideo-1.5 with various engineering optimizations for fast inference.
@ -54,6 +74,8 @@ If you develop/use HunyuanVideo in your projects, welcome to let us know.
## 📑 Open-source Plan
- HunyuanVideo-1.5 (T2V/I2V)
- [x] Inference Code and checkpoints
- [x] ComfyUI Support
- [x] LightX2V Support
- [ ] Diffusers Support
- [ ] Release all model weights (Sparse attention, distill model, and SR models)
@ -82,7 +104,7 @@ If you develop/use HunyuanVideo in your projects, welcome to let us know.
## 📖 Introduction
We present HunyuanVideo 1.5, a lightweight yet powerful video generation model that achieves state-of-the-art visual quality and motion coherence with only 8.3 billion parameters, enabling efficient inference on consumer-grade GPUs. This achievement is built upon several key components, including meticulous data curation, an advanced DiT architecture with selective and sliding tile attention(SSTA), enhanced bilingual understanding through glyph-aware text encoding , progressive pre-training and post-training, and an efficient video super-resolution network. Leveraging these designs, we developed a unified framework capable of high-quality text-to-video and image-to-video generation across multiple durations and resolutions. Extensive experiments demonstrate that this compact and proficient model establishes a new state-of-the-art among open-source models. By releasing the code and weights of HunyuanVideo 1.5, we provide the community with a high-performance foundation that significantly lowers the cost of video creation and research, making advanced video generation more accessible to all.
We present HunyuanVideo-1.5, a lightweight yet powerful video generation model that achieves state-of-the-art visual quality and motion coherence with only 8.3 billion parameters, enabling efficient inference on consumer-grade GPUs. This achievement is built upon several key components, including meticulous data curation, an advanced DiT architecture with selective and sliding tile attention (SSTA), enhanced bilingual understanding through glyph-aware text encoding, progressive pre-training and post-training, and an efficient video super-resolution network. Leveraging these designs, we developed a unified framework capable of high-quality text-to-video and image-to-video generation across multiple durations and resolutions. Extensive experiments demonstrate that this compact and proficient model establishes a new state-of-the-art among open-source models. By releasing the code and weights of HunyuanVideo-1.5, we provide the community with a high-performance foundation that significantly lowers the cost of video creation and research, making advanced video generation more accessible to all.
## ✨ Key Features
@ -134,11 +156,11 @@ pip install -i https://mirrors.tencent.com/pypi/simple/ --upgrade tencentcloud-s
### Step 3: Install Attention Libraries
* Flash Attention
* Flash Attention:
It's recommended to install Flash Attention for faster inference and reduced GPU memory consumption.
Detailed installation instructions are available at [Flash Attention](https://github.com/Dao-AILab/flash-attention).
* Flex-Block-Attention
* Flex-Block-Attention:
flex-block-attn is required only for sparse attention, which enables faster inference. Install it with the following commands:
```bash
git clone https://github.com/Tencent-Hunyuan/flex-block-attn.git
@ -146,7 +168,7 @@ Detailed installation instructions are available at [Flash Attention](https://gi
python3 setup.py install
```
* SageAttention
* SageAttention:
```bash
git clone https://github.com/cooper1637/SageAttention.git
cd SageAttention
@ -156,13 +178,15 @@ Detailed installation instructions are available at [Flash Attention](https://gi
## 🧱 Download Pretrained Models
> 💡 Distilled models and sparse attention models are coming soon. Please stay tuned for the latest updates on the Hugging Face Model Card.
Download the pretrained models before generating videos. Detailed instructions are available at [checkpoints-download.md](checkpoints-download.md).
## 📝 Prompt Guide
### Prompt Writing Handbook
Prompt enhancement plays a crucial role in enabling our model to generate high-quality videos. Longer, more detailed prompts significantly improve the generated video. We encourage you to craft comprehensive and descriptive prompts to achieve the best possible video quality. We recommend that community partners consult our official guide on how to write effective prompts.
**Reference:** **[HunyuanVideo 1.5 Prompt Handbook](https://doc.weixin.qq.com/doc/w3_AXcAcwZSAGgCNACVygLxeQjyn4FYS?scode=AJEAIQdfAAoSfXnTj0AAkA-gaeACk)**
**Reference:** **[HunyuanVideo-1.5 Prompt Handbook](https://doc.weixin.qq.com/doc/w3_AXcAcwZSAGgCNACVygLxeQjyn4FYS?scode=AJEAIQdfAAoSfXnTj0AAkA-gaeACk)**
### System Prompts for Automatic Prompt Enhancement
For users seeking to optimize prompts for other large models, it is recommended to consult the definition of `t2v_rewrite_system_prompt` in the file `hyvideo/utils/rewrite/t2v_prompt.py` to guide text-to-video rewriting. Similarly, for image-to-video rewriting, refer to the definition of `i2v_rewrite_system_prompt` in `hyvideo/utils/rewrite/i2v_prompt.py`.
@ -178,7 +202,7 @@ For models with a vLLM API, note that T2V (text-to-video) and I2V (image-to-vide
- I2V: use [Qwen3-VL-235B-A22B-Instruct](https://huggingface.co/Qwen/Qwen3-VL-235B-A22B-Instruct), configure `I2V_REWRITE_BASE_URL` and `I2V_REWRITE_MODEL_NAME`
> You may set the above model names to any other vLLM-compatible models you have deployed (including HuggingFace models).
> Rewriting is enabled by default; to disable it explicitly, use the `--disable_rewrite` flag. If no vLLM endpoint is configured, the pipeline runs without remote rewriting.
> Rewriting is enabled by default (`--rewrite` defaults to `true`); to disable it explicitly, use `--rewrite false` or `--rewrite 0`. If no vLLM endpoint is configured, the pipeline runs without remote rewriting.
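
The note above says the pipeline already runs without remote rewriting when no vLLM endpoint is configured. If you prefer to make that choice explicit in a launch script, the following is a minimal sketch. It assumes a hypothetical helper named `rewrite_flag`, which is not part of the repository, and shows the I2V variables; the same pattern would apply to the T2V ones.

```bash
# Hypothetical helper (not part of the repository): print "true" only when
# both a rewrite base URL and a model name are provided, otherwise "false".
rewrite_flag() {
  base_url="$1"
  model_name="$2"
  if [ -n "$base_url" ] && [ -n "$model_name" ]; then
    echo true
  else
    echo false
  fi
}

# Example for I2V. The variables may be unset, hence the ":-" defaults.
REWRITE=$(rewrite_flag "${I2V_REWRITE_BASE_URL:-}" "${I2V_REWRITE_MODEL_NAME:-}")
echo "REWRITE=$REWRITE"
```

The resulting value can then be passed as `--rewrite $REWRITE`, matching the `--rewrite false` form described above.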
Example: Generate a video (works for both T2V and I2V; set `IMAGE_PATH=none` for T2V or provide an image path for I2V)
@ -188,7 +212,7 @@ export T2V_REWRITE_MODEL_NAME="<your_model_name>"
export I2V_REWRITE_BASE_URL="<your_vllm_server_base_url>"
export I2V_REWRITE_MODEL_NAME="<your_model_name>"
PROMPT="A close-up shot captures a scene on a polished, light-colored granite kitchen counter, illuminated by soft natural light from an unseen window. Initially, the frame focuses on a tall, clear glass filled with golden, translucent apple juice standing next to a single, shiny red apple with a green leaf still attached to its stem. The camera moves horizontally to the right. As the shot progresses, a white ceramic plate smoothly enters the frame, revealing a fresh arrangement of about seven or eight more apples, a mix of vibrant reds and greens, piled neatly upon it. A shallow depth of field keeps the focus sharply on the fruit and glass, while the kitchen backsplash in the background remains softly blurred. The scene is in a realistic style."
PROMPT='A girl holding a paper with words "Hello, world!"'
IMAGE_PATH=./data/reference_image.png # Optional, 'none' or <image path>
SEED=1
@ -202,6 +226,7 @@ CFG_DISTILLED=true # Inference with CFG distilled model, 2x speedup
SPARSE_ATTN=true # Inference with sparse attention
SAGE_ATTN=false # Inference with SageAttention
MODEL_PATH=ckpts # Path to pretrained model
REWRITE=true # Enable prompt rewriting
torchrun --nproc_per_node=$N_INFERENCE_GPU generate.py \
--prompt "$PROMPT" \
@ -212,6 +237,7 @@ torchrun --nproc_per_node=$N_INFERENCE_GPU generate.py \
--cfg_distilled $CFG_DISTILLED \
--sparse_attn $SPARSE_ATTN \
--use_sageattn $SAGE_ATTN \
--rewrite $REWRITE \
--output_path $OUTPUT_PATH \
--save_pre_sr_video \
--model_path $MODEL_PATH
@ -249,32 +275,33 @@ torchrun --nproc_per_node=$N_INFERENCE_GPU generate.py \
## 🧱 Model Cards
|Model Name| Download |
|-|---------------------------|
|HunyuanVideo 1.5-480P-T2V|[480P-T2V](https://huggingface.co/tencent/HunyuanVideo-1.5/tree/main/transformer/480p_t2v) |
|HunyuanVideo 1.5-480p-I2V |[480p-I2V](https://huggingface.co/tencent/HunyuanVideo-1.5/tree/main/transformer/480p_i2v) |
|HunyuanVideo 1.5-480p-T2V-distill | [480p-T2V-distill](https://huggingface.co/tencent/HunyuanVideo-1.5/tree/main/transformer/480p_t2v_distilled) |
|HunyuanVideo 1.5-480p-I2V-distill |[480p-I2V-distill](https://huggingface.co/tencent/HunyuanVideo-1.5/tree/main/transformer/480p_i2v_distilled) |
|HunyuanVideo 1.5-720P-T2V|[720P-T2V](https://huggingface.co/tencent/HunyuanVideo-1.5/tree/main/transformer/720p_t2v) |
|HunyuanVideo 1.5-720P-I2V |[720P-I2V](https://huggingface.co/tencent/HunyuanVideo-1.5/tree/main/transformer/720p_i2v) |
|HunyuanVideo 1.5-720P-T2V-distiill| Comming soon |
|HunyuanVideo 1.5-720P-I2V-distiill |[720P-I2V-distiill](https://huggingface.co/tencent/HunyuanVideo-1.5/tree/main/transformer/720p_i2v_distilled) |
|HunyuanVideo 1.5-720P-T2V-sparse-distiill| Comming soon |
|HunyuanVideo 1.5-720P-I2V-sparse-distiill |[720P-I2V-sparse-distiill](https://huggingface.co/tencent/HunyuanVideo-1.5/tree/main/transformer/720p_i2v_distilled_sparse) |
|HunyuanVideo 1.5-720p-sr |[720p-sr](https://huggingface.co/tencent/HunyuanVideo-1.5/tree/main/transformer/720p_sr_distilled) |
|HunyuanVideo 1.5-1080p-sr |[1080p-sr](https://huggingface.co/tencent/HunyuanVideo-1.5/tree/main/transformer/1080p_sr_distilled) |
|HunyuanVideo-1.5-480P-T2V|[480P-T2V](https://huggingface.co/tencent/HunyuanVideo-1.5/tree/main/transformer/480p_t2v) |
|HunyuanVideo-1.5-480P-I2V |[480P-I2V](https://huggingface.co/tencent/HunyuanVideo-1.5/tree/main/transformer/480p_i2v) |
|HunyuanVideo-1.5-480P-T2V-distill | [480P-T2V-distill](https://huggingface.co/tencent/HunyuanVideo-1.5/tree/main/transformer/480p_t2v_distilled) |
|HunyuanVideo-1.5-480P-I2V-distill |[480P-I2V-distill](https://huggingface.co/tencent/HunyuanVideo-1.5/tree/main/transformer/480p_i2v_distilled) |
|HunyuanVideo-1.5-720P-T2V|[720P-T2V](https://huggingface.co/tencent/HunyuanVideo-1.5/tree/main/transformer/720p_t2v) |
|HunyuanVideo-1.5-720P-I2V |[720P-I2V](https://huggingface.co/tencent/HunyuanVideo-1.5/tree/main/transformer/720p_i2v) |
|HunyuanVideo-1.5-720P-T2V-distill| Coming soon |
|HunyuanVideo-1.5-720P-I2V-distill |[720P-I2V-distill](https://huggingface.co/tencent/HunyuanVideo-1.5/tree/main/transformer/720p_i2v_distilled) |
|HunyuanVideo-1.5-720P-T2V-sparse-distill| Coming soon |
|HunyuanVideo-1.5-720P-I2V-sparse-distill |[720P-I2V-sparse-distill](https://huggingface.co/tencent/HunyuanVideo-1.5/tree/main/transformer/720p_i2v_distilled_sparse) |
|HunyuanVideo-1.5-720P-sr |[720P-sr](https://huggingface.co/tencent/HunyuanVideo-1.5/tree/main/transformer/720p_sr_distilled) |
|HunyuanVideo-1.5-1080P-sr |[1080P-sr](https://huggingface.co/tencent/HunyuanVideo-1.5/tree/main/transformer/1080p_sr_distilled) |
## 🎬 More Examples
|Features|Demo1|Demo2|
|------|------|------|
|Strong Instruction Following|<video controls src="https://github.com/user-attachments/assets/fdc3c27b-69f5-46a1-b707-0b57510fa32f" width="600"> </video> <details><summary>📋 Show input prompt</summary> ```一名哀伤的黑发中国女子凝望天空,复古胶片风格烘托出怀旧戏剧氛围``` </details> <details><summary>📋 Show rewrite prompt</summary> ```俯视角度一位有着深色略带凌乱的长卷发的年轻中国女性佩戴着闪耀的珍珠项链和圆形金色耳环她凌乱的头发被风吹散她微微抬头望向天空神情十分哀伤眼中含着泪水。嘴唇涂着红色口红。背景是带有华丽红色花纹的图案。画面呈现复古电影风格色调低饱和带着轻微柔焦烘托情绪氛围质感仿佛20世纪90年代的经典胶片风格营造出怀旧且富有戏剧性的感觉。``` </details>|<video controls src="https://github.com/user-attachments/assets/3fcb42cc-cdd3-4651-86a6-645a858561c4" width="600"> </video> <details><summary>📋 Show input prompt</summary> ```建筑蓝图上的线条化为实体,瞬间生长出一个完整的复古工业风办公空间。``` </details> <details><summary>📋 Show rewrite prompt</summary> ```一座空旷的现代阁楼里有一张铺展在地板中央的建筑蓝图。忽然间图纸上的线条泛起微光仿佛被某种无形的力量唤醒。紧接着那些发光的线条开始向上延伸从平面中挣脱勾勒出立体的轮廓——就像在空中进行一场无声的3D打印。随后奇迹在加速发生极简的橡木办公桌、优雅的伊姆斯风格皮质椅、高挑的工业风金属书架还有几盏爱迪生灯泡以光纹为骨架迅速“生长”出来。转瞬间线条被真实的材质填充——木材的温润、皮革的质感、金属的冷静都在眨眼间完整呈现。最终所有家具稳固落地蓝图的光芒悄然褪去。一个完整的办公空间就这样从二维的图纸中诞生。``` </details>|
|Smooth Motion Generation|<video controls src="https://github.com/user-attachments/assets/21f9da05-33d0-4521-b188-ea009e7fdd3f" width="600"> </video> <details><summary>📋 Show input prompt</summary> ```A cosmic loaf of bread, with a volcanic black crust, is precisely sliced open to reveal a swirling nebula interior.``` </details> <details><summary>📋 Show rewrite prompt</summary> ```Cinematic 8K footage, with a stark, moody aesthetic. Under a dramatic top-down spotlight, a loaf of what appears to be bread rests on a slab of polished marble, which is flecked with silver that glitters like a starfield. The loaf's crust is a deep, matte black, cracked like cooled volcanic rock. A sleek, modern santoku knife, its sharp edge gleaming under the single light source, begins a series of clean, rhythmic cuts. With each precise, repetitive slice that falls away, the loafs impossible interior is revealed: not dough, but a compressed, swirling nebula of deep purples and blues, alive with pinpricks of glittering light. As the knife continues its precise motion, a fine, shimmering dust of cosmic particles settles on the marble. The extreme macro view focuses on the mesmerizing contrast between the blades cold steel and the ethereal, galaxy-filled substance of the bread. This is hyper-realistic macro videography at its finest.``` </details>|<video controls src="https://github.com/user-attachments/assets/49057fe8-a102-4fd7-bd92-e9561abb9f45" width="600"> </video> <details><summary>📋 Show input prompt</summary> ```A figure skater performs a rapid, graceful Biellmann spin, captured from all angles.``` </details> <details><summary>📋 Show rewrite prompt</summary> ```The video captures a figure skater performing a Biellmann spin on ice. The subject is a female skater in a glittering costume. Initially, she spins on one leg. Then, she reaches back and pulls her free leg up. Next, she spins rapidly, becoming a blur of motion, with ice shavings spraying from her skate blade. 
The background is an ice rink with blurred advertising boards. The camera circles around the subject to capture the spin from all angles. The lighting is spotlit, creating lens flares and sparkles on her costume. The overall video presents a graceful artistic sports style.``` </details>|
|Cinematic Aesthetics|<video controls src="https://github.com/user-attachments/assets/4098cf72-357d-4b81-97df-6752064ce0c3" width="600"> </video> <details><summary>📋 Show input prompt</summary> ```固定镜头,焦点在图片里的挂钟上镜头轻微摇晃营造手持摄影感wjw,filmphotos,Film Grain,Reversal film photographyWong Kar-wai movies,cinematic photography, HK film style,neon lighting, in the style of Wong Kar Wai film``` </details> <details><summary>📋 Show rewrite prompt</summary> ```Handheld lens shooting, the camera focuses on the wall clock hanging on the green-toned wall, shaking slightly. The second hand sweeps steadily across the clock face, and the shadow of the clock cast on the wall shifts subtly with the movement of the lens.``` </details>|<video controls src="https://github.com/user-attachments/assets/2b4575e5-79f1-4011-bed0-e8380198f7c9" width="600"> </video> <details><summary>📋 Show input prompt</summary> ```The leaves of calamus shine in the sunlight, dotted with dewdrops that trickle down to the ground with the breeze.``` </details> <details><summary>📋 Show rewrite prompt</summary> ```A macro shot focuses on long, slender calamus leaves, rendered in a cinematic photography realistic style. The main leaf, a vibrant, deep green, is positioned diagonally across the frame. Its surface is covered in tiny, glistening spherical dewdrops that catch and refract the bright morning sunlight, creating sparkling highlights. Initially, a larger, perfectly round dewdrop clings to the upper section of the leaf, its surface tension holding it in place. Then, as the leaf sways almost imperceptibly, the dewdrop begins to slowly dislodge. Next, it starts to trickle down the central vein of the leaf, its shape elongating slightly as it moves, leaving a subtle, glistening wet trail in its path. Finally, it reaches the pointed tip of the leaf, hangs for a brief moment, and falls out of the bottom of the frame. 
In the background, other leaves and blades of grass are softly blurred, creating a beautiful bokeh effect with soft, out-of-focus circles of light. The environment is bathed in the warm, golden glow of early morning sunlight, which streams in from behind the leaves, backlighting them and causing their wet edges to shine brilliantly. The overall impression is one of serene, natural beauty, captured in a highly realistic and detailed manner. This is a macro shot. The camera tilts down very slowly, following the path of the main dewdrop as it travels down the leaf. The lighting is soft and natural, with strong backlighting to create a radiant, glowing effect on the dewdrops and leaf edges, characteristic of professional nature photography. The atmosphere is peaceful and serene. The overall video presents a cinematic photography realistic style.``` </details>|
|Text Rendering|<video controls src="https://github.com/user-attachments/assets/7c964fc5-c27e-4bd0-bf3f-eb8fca2caef6" width="600"> </video> <details><summary>📋 Show input prompt</summary> ```赛博朋克风格的夜晚街角,一个巨大的招牌上, “Hunyuan Video 1.5”的霓虹灯管轮廓已经安装好。镜头推进霓虹灯从“H”开始伴随着滋滋的电流声每个字母依次亮起粉紫色的光芒直到全部点亮照亮了潮湿的街道。赛博朋克城市美学``` </details> <details><summary>📋 Show rewrite prompt</summary> ```On a wet street corner in a cyberpunk city at night, a large neon sign reading "Hunyuan Video 1.5" lights up sequentially, illuminating the dark, rainy environment with a pinkish-purple glow. he scene is a dark, rain-slicked street corner in a futuristic, cinematic cyberpunk city. Mounted on the metallic, weathered facade of a building is a massive, unlit neon sign. The sign's glass tube framework clearly spells out the words "Hunyuan Video 1.5". Initially, the street is dimly lit, with ambient light from distant skyscrapers creating shimmering reflections on the wet asphalt below. Then, the camera zooms in slowly toward the sign. As it moves, a low electrical sizzling sound begins. In the background, the dense urban landscape of the cyberpunk metropolis is visible through a light atmospheric haze, with towering structures adorned with their own flickering advertisements. A complex web of cables and pipes crisscrosses between the buildings. The shot is at a low angle, looking up at the sign to emphasize its grand scale. The lighting is high-contrast and dramatic, dominated by the neon glow which creates sharp, specular reflections and deep shadows. The atmosphere is moody and tech-noir. 
The overall video presents a cinematic photography realistic style.,``` </details>|<video controls src="https://github.com/user-attachments/assets/94ce62d9-5788-4912-8e89-b7dc84d7bdc4" width="600"> </video> <details><summary>📋 Show input prompt</summary> ```黑色背景上展示着艺术字体"Hunyuan Video 1.5",每个字母都由不同的流体构成,持续缓慢流动。多种不同质地、不互溶的彩色液体(如金属、牛奶、透明凝胶)在无重力环境中漂浮、碰撞``` </details> <details><summary>📋 Show rewrite prompt</summary> ```The artistic words "Hunyuan Video 1.5" are rendered in the center of the screen, with each character composed of a unique, slowly moving fluid, set against a deep black background, while colorful, immiscible liquid blobs float and collide around them in a zero-gravity environment. The main subject is the text "Hunyuan Video 1.5". The characters for "Hunyuan" are filled with a lustrous, molten gold liquid that swirls slowly. The letters for "Video" are composed of a creamy, opaque white fluid resembling milk, with gentle currents visible beneath its surface. The numbers "1.5" are made from a viscous, transparent blue gel that subtly undulates. Each fluid moves independently within the confines of its character's shape, creating a mesmerizing internal motion. This high-quality 3D CGI animation presents the fluids with photorealistic textures. In the surrounding space, several immiscible liquid blobs drift in zero gravity. A large, spherical blob of pearlescent liquid slowly floats from the upper left. A smaller, amorphous blob of shimmering, metallic silver drifts from the lower right, and a translucent, pink gelatinous mass wobbles nearby. Initially, these blobs drift aimlessly. Then, the silver blob slowly collides with the larger pearlescent one. As they make contact, their surfaces deform and ripple dynamically, but the liquids do not mix, pushing against each other before gently bouncing off and continuing their slow, separate paths in the pristine black void. The shot is at an eye-level angle, presenting a front view of the text. 
The camera remains static, ensuring the entire text "Hunyuan Video 1.5" is fully visible throughout the shot. The scene is lit by a soft, diffused light that highlights the brilliant reflections on the metallic fluids and the inner glow of the translucent gels, enhancing the high-quality 3D CGI animation. The atmosphere is quiet, abstract, and mesmerizing. The overall video has the polished look of a high-quality 3D CGI animation with a focus on abstract fluid dynamics.``` </details>|
|Physics Compliance|<video controls src="https://github.com/user-attachments/assets/07fa4dcd-0bd1-4935-bb89-323428cce6fc" width="600"> </video> <details><summary>📋 Show input prompt</summary> ```The wind blows through the shabby bookshelf, and the pages flutter on it. ``` </details> <details><summary>📋 Show rewrite prompt</summary> ```In a dimly lit, dusty room, a gentle wind causes the pages of old books on a shabby wooden bookshelf to flutter. The bookshelf, made of dark, weathered wood, shows signs of age with peeling varnish, scratches, and a fine layer of settled dust on its surfaces. Several old books with faded, worn covers are arranged on the shelves; some stand upright while others lie on their sides. Initially, the scene is quiet. Then, a soft breeze enters the frame from the left, disturbing the dust on the shelves. Next, the yellowed, brittle pages of an open book lying flat begin to lift and ripple delicately. As the breeze continues, the pages of other books also start to flutter, some turning over slowly and gracefully, revealing aged text and faint illustrations within. In the background, the wall has faded, peeling wallpaper, and the overall atmosphere is one of quiet neglect and the passage of time. The shot is at an eye-level angle with the main subject. The camera pans to the left slowly. Soft, diffused sunlight filters through a dusty, off-camera window, creating distinct beams of light that cut through the dimness. This lighting highlights the texture of the old wood and the floating dust particles in the air, enhancing the photorealistic detail of the scene. The mood is melancholic and peaceful. 
The overall video presents a cinematic photography realistic style.``` </details>|<video controls src="https://github.com/user-attachments/assets/81065925-c008-421b-8cf0-b3cbf1e77eac" width="600"> </video> <details><summary>📋 Show input prompt</summary> ```An intact soda can is slowly crushed by a hand.``` </details> <details><summary>📋 Show rewrite prompt</summary> ```In a medium close-up, a hand slowly crushes an intact red and white soda can on a wooden table. A male hand with visible, realistic skin texture is wrapped firmly around the middle of an intact, pristine red and white aluminum soda can. The can, covered in glistening condensation droplets, rests on a dark, polished wooden surface. The cinematic realism captures every minute detail of the scene. Initially, the hand's grip is steady, with the can's cylindrical shape perfectly preserved. Then, the fingers begin to tighten slowly, the knuckles whitening slightly from the exertion. Next, the smooth aluminum surface starts to buckle under the controlled pressure, a sharp crease forming vertically down its side as the metallic sheen distorts. As the hand continues its deliberate squeeze, the can collapses inward progressively, the vibrant red paint wrinkling as the metal structure crumples. Finally, the can is left significantly crushed, its form now an irregular, crumpled shape held tightly in the fist. The scene takes place on a dark, polished wooden tabletop that catches soft, diffuse reflections. The grain of the wood is faintly discernible, adding a layer of texture to the foreground. The background is completely out of focus, rendered as a soft, dark, and non-descript blur, which isolates the main action and enhances the photorealistic quality of the shot. The shot is a medium close-up, presented in a cinematic photography realistic style. The camera remains static at a slightly high angle, looking down to provide a clear and unobstructed view of the can's deformation. 
Soft side lighting creates high contrast, sculpting the muscles and tendons of the hand while casting specular highlights on the metallic can and the water droplets. The atmosphere is focused and intense. The overall video presents a cinematic photography realistic style.``` </details>|
|Camera Movement|<video controls src="https://github.com/user-attachments/assets/6deacbfe-4cca-48d7-a2be-cb638a3e01cb" width="600"> </video> <details><summary>📋 Show input prompt</summary> ```圣诞节的家中,小女孩靠着妈妈听妈妈读书,背景是下着雪的窗外,镜头缓慢下移,一只可爱的长毛小白猫戴着圣诞帽趴在温暖的地摊上``` </details> <details><summary>📋 Show rewrite prompt</summary> ```In a cozy home on Christmas, a young girl leans against her mother as they read a book, and the camera moves down to reveal a fluffy white cat in a Santa hat resting on a warm rug. In a warmly lit living room on a snowy Christmas evening, a young mother and her little daughter are sitting together on a comfortable sofa. The mother, with a gentle expression and wearing a cream-colored knitted sweater, holds an open storybook with colorful illustrations. Her daughter, a small girl with brown hair in pigtails and a red pajama set, leans her head affectionately on her mother's shoulder, her eyes fixed on the book. On the floor below them, a fluffy, long-haired white cat is curled up on a plush, beige wool rug. The cat wears a tiny red and white Santa hat perched between its ears. Initially, the shot focuses on the mother and daughter, capturing their quiet, shared moment. The mothers finger gently rests on the page of the book. Then, the camera slowly moves downward, gliding past the book and their laps. Finally, the camera settles at a low angle, bringing the adorable white cat into sharp focus as the primary subject. The cat's chest gently rises and falls with each breath, its eyes peacefully closed. Through a large window in the background, large, soft snowflakes can be seen falling silently against the dark blue twilight sky, creating a peaceful and serene backdrop. Faint, out-of-focus golden Christmas lights twinkle in the corner of the room, adding to the warm, festive atmosphere. The scene is imbued with a sense of comfort and holiday warmth, creating a beautiful cinematic photography realistic image. The camera slowly moves downward. 
The shot uses soft, warm interior lighting that casts gentle shadows, creating a high-contrast, cinematic look. A shallow depth of field keeps the focus on the subjects while beautifully blurring the background elements. The mood is heartwarming, peaceful, and festive. The overall video presents a cinematic photography realistic style.``` </details>|<video controls src="https://github.com/user-attachments/assets/8e72ed0f-f8ac-445b-97e5-eb4b16fbc121" width="600"> </video> <details><summary>📋 Show input prompt</summary> ```The hiker begins walking forward along the trail, causing the water bottle to swing rhythmically with each step. The camera gradually pulls back and rises to reveal a vast desert landscape stretching out ahead.``` </details> <details><summary>📋 Show rewrite prompt</summary> ```The hiker begins walking forward along the trail, causing the water bottle to swing rhythmically with each step. The camera gradually pulls back and rises to reveal a vast desert landscape stretching out ahead, while the sun position shifts from afternoon to dusk, casting increasingly longer shadows across the terrain as the figure becomes smaller in the frame.``` </details>|
|Multi-Style Support|<video controls src="https://github.com/user-attachments/assets/65b2c5a5-e6ba-43be-9462-a98b03b675f1" width="600"> </video> <details><summary>📋 Show input prompt</summary> ```Have the cake man begin to take chunks out of himself and eat it.``` </details> <details><summary>📋 Show rewrite prompt</summary> ```The cake man sits on the chair, with his hands resting on his knees. Then, he slowly raises his right hand and breaks off a piece of cake from his left shoulder. Next, he brings the piece of cake to his mouth and begins to chew. At the same time, his eyes widen slightly, and his mouth parts gently. After that, he raises his right hand again, breaks off another piece of cake from his right arm, and repeats the action of bringing it to his mouth to chew.``` </details>|<video controls src="https://github.com/user-attachments/assets/de5f7480-b79c-4fc1-b345-c5880a3b5f9e" width="600"> </video> <details><summary>📋 Show input prompt</summary> ```A little girl, carrying a colorful handbag, skips through the garden. The video uses claymation style.``` </details> <details><summary>📋 Show rewrite prompt</summary> ```A little girl with a colorful handbag skips through a whimsical claymation garden. In a vibrant garden constructed entirely from clay, a young girl, meticulously crafted in a claymation style, skips joyfully. She has chunky, sculpted yellow clay hair tied in pigtails that bounce with a slight stiffness, simple black button eyes, and a wide, permanently etched smile. She wears a simple pink clay dress with a white collar. In her left hand, she carries a small handbag molded from bright red and blue clay, which swings in a slightly jerky arc as she moves. Initially, the girl lifts her right leg high, her body momentarily suspended in a classic stop-motion pose. Then, she hops forward, landing lightly as her left leg swings through for the next skip. Her arms move in an exaggerated, back-and-forth rhythm, characteristic of stop-motion animation. 
Her movements are intentionally not perfectly fluid, highlighting the frame-by-frame nature of the claymation technique. The garden around her is a whimsical, textured world. In the foreground and mid-ground, oversized flowers with swirled purple and orange petals stand on thick green stems. The ground is a textured mat of green clay, showing subtle fingerprints and tool marks that add to the handmade charm. In the background, a pale blue clay backdrop features a simplified, smiling sun molded from yellow clay. The shot is at an eye-level angle with the main subject. The camera follows the subject, moving smoothly to the right to keep her in the frame. The lighting is bright and even, casting soft shadows that emphasize the rounded, three-dimensional forms of the clay models. The overall video presents a charming and detailed claymation style.``` </details>|
|High Image-Video Consistency|<img src="https://github.com/user-attachments/assets/3bc8e55d-c211-454e-8067-128c0e215eb6"> <video controls src="https://github.com/user-attachments/assets/3e6b7ee9-ec66-4e46-a446-801b1c1a1c81" width="600"> </video> <details><summary>📋 Show input prompt</summary> ```女孩放下书,站起身,转身向屋内走去。镜头拉远。``` </details> <details><summary>📋 Show rewrite prompt</summary> ```女孩合上手中的书,将书放在身侧的窗台上。随后,她缓缓站起身,转身向屋内走去,身影逐渐没入门后的阴影中。镜头缓缓拉远,露出更多被绿植覆盖的屋檐和墙体。``` </details>|<img src="https://github.com/user-attachments/assets/7657ce60-90b5-4fdc-b713-0eaa55829b09"> <video controls src="https://github.com/user-attachments/assets/9ca24021-2353-40d5-8a4d-0f8e67d51826" width="600"> </video> <details><summary>📋 Show input prompt</summary> ```女人手上的鸟亲了女人一口``` </details> <details><summary>📋 Show rewrite prompt</summary> ```女人手臂上的白色鹦鹉缓缓转过头,将喙轻轻触碰女人的脸颊,随后收回头部。女人嘴角微微上扬,目光温柔地注视着鹦鹉。背景中的绿植保持静止。``` </details>|
|Strong Instruction Following|<video src="https://github.com/user-attachments/assets/fdc3c27b-69f5-46a1-b707-0b57510fa32f" width="600"> </video> <details><summary>📋 Show input prompt</summary> ```一名哀伤的黑发中国女子凝望天空,复古胶片风格烘托出怀旧戏剧氛围``` </details> <details><summary>📋 Show rewrite prompt</summary> ```俯视角度一位有着深色略带凌乱的长卷发的年轻中国女性佩戴着闪耀的珍珠项链和圆形金色耳环她凌乱的头发被风吹散她微微抬头望向天空神情十分哀伤眼中含着泪水。嘴唇涂着红色口红。背景是带有华丽红色花纹的图案。画面呈现复古电影风格色调低饱和带着轻微柔焦烘托情绪氛围质感仿佛20世纪90年代的经典胶片风格营造出怀旧且富有戏剧性的感觉。``` </details>|<video src="https://github.com/user-attachments/assets/3fcb42cc-cdd3-4651-86a6-645a858561c4" width="600"> </video> <details><summary>📋 Show input prompt</summary> ```建筑蓝图上的线条化为实体,瞬间生长出一个完整的复古工业风办公空间。``` </details> <details><summary>📋 Show rewrite prompt</summary> ```一座空旷的现代阁楼里有一张铺展在地板中央的建筑蓝图。忽然间图纸上的线条泛起微光仿佛被某种无形的力量唤醒。紧接着那些发光的线条开始向上延伸从平面中挣脱勾勒出立体的轮廓——就像在空中进行一场无声的3D打印。随后奇迹在加速发生极简的橡木办公桌、优雅的伊姆斯风格皮质椅、高挑的工业风金属书架还有几盏爱迪生灯泡以光纹为骨架迅速“生长”出来。转瞬间线条被真实的材质填充——木材的温润、皮革的质感、金属的冷静都在眨眼间完整呈现。最终所有家具稳固落地蓝图的光芒悄然褪去。一个完整的办公空间就这样从二维的图纸中诞生。``` </details>|
|Smooth Motion Generation|<video src="https://github.com/user-attachments/assets/447847f0-490a-45f9-a86d-a67ab1ff4231" width="600"> </video> <details><summary>📋 Show input prompt</summary> ```A DJ is immersed in his musical world. He wears a pair of professional, matte-black headphones, revealing a focused expression. He wears a black bomber jacket, zipped open to reveal a T-shirt underneath. His upper body sways back and forth rhythmically to the throbbing electronic beats, his head moving with precise movement. The mixing console in front of him serves as the primary source of light. In the distance, the cool white glow of several stadium floodlights casts a deep, dark haze across the vast field, casting long shadows across the emerald green grass, creating a stark contrast to the brightly lit area surrounding the DJ booth. His hands danced swiftly and precisely across the equipment. The entire scene was filled with high-tech dynamics and the solitary creative passion. Against the backdrop of the vast and silent night stadium, it created an atmosphere of high focus, energy, and a slightly surreal feeling.``` </details> <details><summary>📋 Show rewrite prompt</summary> ```slowly advancing medium shot, shot from a level angle, focuses on the center of an empty football field, where a DJ is immersed in his musical world. He wears a pair of professional, matte-black headphones, one earcup slightly removed, revealing a focused expression and a brow beaded with sweat from his intense concentration. He wears a black bomber jacket, zipped open to reveal a T-shirt underneath. His upper body sways back and forth rhythmically to the throbbing electronic beats, his head moving with precise movement. The mixing console in front of him serves as the primary source of light. 
In the distance, the cool white glow of several stadium floodlights casts a deep, dark haze across the vast field, casting long shadows across the emerald green grass, creating a stark contrast to the brightly lit area surrounding the DJ booth. His hands danced swiftly and precisely across the equipment, one hand steadily pushing and pulling a long volume fader, while the fingers of the other nimbly jumped between the illuminated knobs and pads, sometimes decisively cutting a bass line, sometimes triggering an echo effect. The entire scene was filled with high-tech dynamics and the solitary creative passion. Against the backdrop of the vast and silent night stadium, it created an atmosphere of high focus, energy, and a slightly surreal feeling.``` </details>|<video src="https://github.com/user-attachments/assets/49057fe8-a102-4fd7-bd92-e9561abb9f45" width="600"> </video> <details><summary>📋 Show input prompt</summary> ```A figure skater performs a rapid, graceful Biellmann spin, captured from all angles.``` </details> <details><summary>📋 Show rewrite prompt</summary> ```The video captures a figure skater performing a Biellmann spin on ice. The subject is a female skater in a glittering costume. Initially, she spins on one leg. Then, she reaches back and pulls her free leg up. Next, she spins rapidly, becoming a blur of motion, with ice shavings spraying from her skate blade. The background is an ice rink with blurred advertising boards. The camera circles around the subject to capture the spin from all angles. The lighting is spotlit, creating lens flares and sparkles on her costume. The overall video presents a graceful artistic sports style.``` </details>|
|Cinematic Aesthetics|<video src="https://github.com/user-attachments/assets/4098cf72-357d-4b81-97df-6752064ce0c3" width="600"> </video> <details><summary>📋 Show input prompt</summary> ```固定镜头,焦点在图片里的挂钟上镜头轻微摇晃营造手持摄影感wjw,filmphotos,Film Grain,Reversal film photographyWong Kar-wai movies,cinematic photography, HK film style,neon lighting, in the style of Wong Kar Wai film``` </details> <details><summary>📋 Show rewrite prompt</summary> ```Handheld lens shooting, the camera focuses on the wall clock hanging on the green-toned wall, shaking slightly. The second hand sweeps steadily across the clock face, and the shadow of the clock cast on the wall shifts subtly with the movement of the lens.``` </details>|<video src="https://github.com/user-attachments/assets/2b4575e5-79f1-4011-bed0-e8380198f7c9" width="600"> </video> <details><summary>📋 Show input prompt</summary> ```The leaves of calamus shine in the sunlight, dotted with dewdrops that trickle down to the ground with the breeze.``` </details> <details><summary>📋 Show rewrite prompt</summary> ```A macro shot focuses on long, slender calamus leaves, rendered in a cinematic photography realistic style. The main leaf, a vibrant, deep green, is positioned diagonally across the frame. Its surface is covered in tiny, glistening spherical dewdrops that catch and refract the bright morning sunlight, creating sparkling highlights. Initially, a larger, perfectly round dewdrop clings to the upper section of the leaf, its surface tension holding it in place. Then, as the leaf sways almost imperceptibly, the dewdrop begins to slowly dislodge. Next, it starts to trickle down the central vein of the leaf, its shape elongating slightly as it moves, leaving a subtle, glistening wet trail in its path. Finally, it reaches the pointed tip of the leaf, hangs for a brief moment, and falls out of the bottom of the frame. 
In the background, other leaves and blades of grass are softly blurred, creating a beautiful bokeh effect with soft, out-of-focus circles of light. The environment is bathed in the warm, golden glow of early morning sunlight, which streams in from behind the leaves, backlighting them and causing their wet edges to shine brilliantly. The overall impression is one of serene, natural beauty, captured in a highly realistic and detailed manner. This is a macro shot. The camera tilts down very slowly, following the path of the main dewdrop as it travels down the leaf. The lighting is soft and natural, with strong backlighting to create a radiant, glowing effect on the dewdrops and leaf edges, characteristic of professional nature photography. The atmosphere is peaceful and serene. The overall video presents a cinematic photography realistic style.``` </details>|
|Text Rendering|<video src="https://github.com/user-attachments/assets/7c964fc5-c27e-4bd0-bf3f-eb8fca2caef6" width="600"> </video> <details><summary>📋 Show input prompt</summary> ```赛博朋克风格的夜晚街角,一个巨大的招牌上, “Hunyuan Video 1.5”的霓虹灯管轮廓已经安装好。镜头推进霓虹灯从“H”开始伴随着滋滋的电流声每个字母依次亮起粉紫色的光芒直到全部点亮照亮了潮湿的街道。赛博朋克城市美学``` </details> <details><summary>📋 Show rewrite prompt</summary> ```On a wet street corner in a cyberpunk city at night, a large neon sign reading "Hunyuan Video 1.5" lights up sequentially, illuminating the dark, rainy environment with a pinkish-purple glow. he scene is a dark, rain-slicked street corner in a futuristic, cinematic cyberpunk city. Mounted on the metallic, weathered facade of a building is a massive, unlit neon sign. The sign's glass tube framework clearly spells out the words "Hunyuan Video 1.5". Initially, the street is dimly lit, with ambient light from distant skyscrapers creating shimmering reflections on the wet asphalt below. Then, the camera zooms in slowly toward the sign. As it moves, a low electrical sizzling sound begins. In the background, the dense urban landscape of the cyberpunk metropolis is visible through a light atmospheric haze, with towering structures adorned with their own flickering advertisements. A complex web of cables and pipes crisscrosses between the buildings. The shot is at a low angle, looking up at the sign to emphasize its grand scale. The lighting is high-contrast and dramatic, dominated by the neon glow which creates sharp, specular reflections and deep shadows. The atmosphere is moody and tech-noir. 
The overall video presents a cinematic photography realistic style.,``` </details>|<video src="https://github.com/user-attachments/assets/73e8b741-baec-4a40-9d36-a1435172ab64" width="600"> </video> <details><summary>📋 Show input prompt</summary> ```一张铺开的中国宣纸上,浓墨滴入水中,晕染出壮丽的山水画轮廓。山峰、云雾、孤舟在墨色中自然形成。随后,这些水墨元素巧妙地流动、重组,在画面的留白处汇聚成"Hunyuan Video 1.5"的书法字体。优雅,诗意,文化底蕴``` </details> <details><summary>📋 Show rewrite prompt</summary> ```A drop of black ink blooms on wet Chinese Xuan paper, forming a landscape painting before the ink elements fluidly reassemble into the calligraphic text "Hunyuan Video 1.5". On a flat, laid-out sheet of off-white Chinese Xuan paper with a subtle, fibrous texture, the scene unfolds. Initially, a single, concentrated drop of deep black ink falls into a clear, wet area at the center of the paper. Then, the ink instantly begins to bloom outwards in intricate, flowing tendrils of varying shades from jet-black to smoky grey. As it spreads, the ink wash naturally and rapidly forms the silhouette of a majestic mountain range with sharp, defined peaks. Next, softer, diluted grey tones billow around the mountains, creating layers of atmospheric mist and clouds, while a simple, dark stroke materializes as a lone boat on a tranquil, watery expanse at the base. As the landscape is formed, the ink elements—the lines of the mountains, wisps of cloud, and the shape of the boat—begin to deconstruct, dissolving into flowing streams of liquid ink. Finally, these streams move gracefully across the paper's empty white space, converging and elegantly reorganizing to form the text "Hunyuan Video 1.5" in a fluid, semi-cursive calligraphic style. The background is the minimalist expanse of the Xuan paper itself, its texture providing a subtle depth. The entire process is lit by soft, even, diffused light from above, which enhances the rich tonal variations of the ink and the delicate texture of the paper without creating harsh shadows. Bird's-eye view. 
The camera is positioned directly above the subject, capturing the entire process. The camera remains static. The aesthetic is a high-quality, dynamic Chinese ink wash animation style, perfectly simulating the real-world physics of ink spreading on wet paper. The entire sheet of paper and the final text are kept fully within the frame. Poetic, elegant, artistic. The overall video presents a dynamic Chinese ink wash animation style.``` </details>|
|Physics Compliance|<video src="https://github.com/user-attachments/assets/f1d74e48-cc03-415d-b75f-f7186a4fb41d" width="600"> </video> <details><summary>📋 Show input prompt</summary> ```In a sleek museum gallery, a woman pauses before a gilded oil painting. The painted man inside slowly comes alive, lifting a bottle and pouring real wine straight from the canvas into her glass. Surrounded by stylish art critics moving naturally through the hall, she accepts the pour with calm elegance, as if the impossible were routine. ``` </details> <details><summary>📋 Show rewrite prompt</summary> ```In a sleek museum gallery, a woman receives a glass of wine poured directly from an animated oil painting. A sophisticated woman with dark hair tied back elegantly stands in the mid-ground. She is wearing a simple, black silk sleeveless dress and holds a clear, crystal wine glass in her right hand. She is positioned before a large, baroque-style oil painting in an ornate, gilded frame. Inside the painting, an aristocratic man with a mustache, dressed in a dark velvet doublet with a white lace collar, is depicted. His form is defined by visible, impasto oil brushstrokes. Initially, the woman watches the painting with calm poise. Then, the painted man's arm slowly animates, his painted texture retained as he lifts a dark bottle. Next, a photorealistic stream of red wine emerges directly from the flat canvas surface, arcing through the air and splashing gently into the real crystal glass she holds. She remains perfectly still, accepting the impossible pour with a subtle, knowing smile. The setting is a modern art gallery with high white walls and polished dark concrete floors that reflect the ambient light. Focused track lighting from the high ceiling casts a warm, dramatic spotlight on the woman and the painting, creating soft shadows. 
In the background, two other gallery patrons, a man and a woman in stylish, modern attire, stroll slowly from right to left, their figures slightly blurred by a shallow depth of field, moving naturally through the hall. The shot is at an eye-level angle with the woman. The camera remains static, capturing the surreal event in a steady medium shot. The lighting is high-contrast and dramatic, reminiscent of a cinematic photography realistic style, using soft side lighting to accentuate the woman's features and the texture of the painting. The mood is surreal, elegant, and mysterious. The overall video presents a cinematic photography realistic style.``` </details>|<video src="https://github.com/user-attachments/assets/07bcce06-ff4f-4688-8c60-c02f600635ea" width="600"> </video> <details><summary>📋 Show input prompt</summary> ```An intact soda can is slowly crushed by a hand.``` </details> <details><summary>📋 Show rewrite prompt</summary> ```In a medium close-up, a hand slowly crushes an intact red and white soda can on a wooden table. A male hand with visible, realistic skin texture is wrapped firmly around the middle of an intact, pristine red and white aluminum soda can. The can, covered in glistening condensation droplets, rests on a dark, polished wooden surface. The cinematic realism captures every minute detail of the scene. Initially, the hand's grip is steady, with the can's cylindrical shape perfectly preserved. Then, the fingers begin to tighten slowly, the knuckles whitening slightly from the exertion. Next, the smooth aluminum surface starts to buckle under the controlled pressure, a sharp crease forming vertically down its side as the metallic sheen distorts. As the hand continues its deliberate squeeze, the can collapses inward progressively, the vibrant red paint wrinkling as the metal structure crumples. Finally, the can is left significantly crushed, its form now an irregular, crumpled shape held tightly in the fist. 
The scene takes place on a dark, polished wooden tabletop that catches soft, diffuse reflections. The grain of the wood is faintly discernible, adding a layer of texture to the foreground. The background is completely out of focus, rendered as a soft, dark, and non-descript blur, which isolates the main action and enhances the photorealistic quality of the shot. The shot is a medium close-up, presented in a cinematic photography realistic style. The camera remains static at a slightly high angle, looking down to provide a clear and unobstructed view of the can's deformation. Soft side lighting creates high contrast, sculpting the muscles and tendons of the hand while casting specular highlights on the metallic can and the water droplets. The atmosphere is focused and intense. The overall video presents a cinematic photography realistic style.``` </details>|
|Camera Movement|<video src="https://github.com/user-attachments/assets/6deacbfe-4cca-48d7-a2be-cb638a3e01cb" width="600"> </video> <details><summary>📋 Show input prompt</summary> ```圣诞节的家中,小女孩靠着妈妈听妈妈读书,背景是下着雪的窗外,镜头缓慢下移,一只可爱的长毛小白猫戴着圣诞帽趴在温暖的地摊上``` </details> <details><summary>📋 Show rewrite prompt</summary> ```In a cozy home on Christmas, a young girl leans against her mother as they read a book, and the camera moves down to reveal a fluffy white cat in a Santa hat resting on a warm rug. In a warmly lit living room on a snowy Christmas evening, a young mother and her little daughter are sitting together on a comfortable sofa. The mother, with a gentle expression and wearing a cream-colored knitted sweater, holds an open storybook with colorful illustrations. Her daughter, a small girl with brown hair in pigtails and a red pajama set, leans her head affectionately on her mother's shoulder, her eyes fixed on the book. On the floor below them, a fluffy, long-haired white cat is curled up on a plush, beige wool rug. The cat wears a tiny red and white Santa hat perched between its ears. Initially, the shot focuses on the mother and daughter, capturing their quiet, shared moment. The mothers finger gently rests on the page of the book. Then, the camera slowly moves downward, gliding past the book and their laps. Finally, the camera settles at a low angle, bringing the adorable white cat into sharp focus as the primary subject. The cat's chest gently rises and falls with each breath, its eyes peacefully closed. Through a large window in the background, large, soft snowflakes can be seen falling silently against the dark blue twilight sky, creating a peaceful and serene backdrop. Faint, out-of-focus golden Christmas lights twinkle in the corner of the room, adding to the warm, festive atmosphere. The scene is imbued with a sense of comfort and holiday warmth, creating a beautiful cinematic photography realistic image. The camera slowly moves downward. 
The shot uses soft, warm interior lighting that casts gentle shadows, creating a high-contrast, cinematic look. A shallow depth of field keeps the focus on the subjects while beautifully blurring the background elements. The mood is heartwarming, peaceful, and festive. The overall video presents a cinematic photography realistic style.``` </details>|<video src="https://github.com/user-attachments/assets/8e72ed0f-f8ac-445b-97e5-eb4b16fbc121" width="600"> </video> <details><summary>📋 Show input prompt</summary> ```The hiker begins walking forward along the trail, causing the water bottle to swing rhythmically with each step. The camera gradually pulls back and rises to reveal a vast desert landscape stretching out ahead.``` </details> <details><summary>📋 Show rewrite prompt</summary> ```The hiker begins walking forward along the trail, causing the water bottle to swing rhythmically with each step. The camera gradually pulls back and rises to reveal a vast desert landscape stretching out ahead, while the sun position shifts from afternoon to dusk, casting increasingly longer shadows across the terrain as the figure becomes smaller in the frame.``` </details>|
|Multi-Style Support|<video src="https://github.com/user-attachments/assets/65b2c5a5-e6ba-43be-9462-a98b03b675f1" width="600"> </video> <details><summary>📋 Show input prompt</summary> ```Have the cake man begin to take chunks out of himself and eat it.``` </details> <details><summary>📋 Show rewrite prompt</summary> ```The cake man sits on the chair, with his hands resting on his knees. Then, he slowly raises his right hand and breaks off a piece of cake from his left shoulder. Next, he brings the piece of cake to his mouth and begins to chew. At the same time, his eyes widen slightly, and his mouth parts gently. After that, he raises his right hand again, breaks off another piece of cake from his right arm, and repeats the action of bringing it to his mouth to chew.``` </details>|<video src="https://github.com/user-attachments/assets/de5f7480-b79c-4fc1-b345-c5880a3b5f9e" width="600"> </video> <details><summary>📋 Show input prompt</summary> ```A little girl, carrying a colorful handbag, skips through the garden. The video uses claymation style.``` </details> <details><summary>📋 Show rewrite prompt</summary> ```A little girl with a colorful handbag skips through a whimsical claymation garden. In a vibrant garden constructed entirely from clay, a young girl, meticulously crafted in a claymation style, skips joyfully. She has chunky, sculpted yellow clay hair tied in pigtails that bounce with a slight stiffness, simple black button eyes, and a wide, permanently etched smile. She wears a simple pink clay dress with a white collar. In her left hand, she carries a small handbag molded from bright red and blue clay, which swings in a slightly jerky arc as she moves. Initially, the girl lifts her right leg high, her body momentarily suspended in a classic stop-motion pose. Then, she hops forward, landing lightly as her left leg swings through for the next skip. Her arms move in an exaggerated, back-and-forth rhythm, characteristic of stop-motion animation. 
Her movements are intentionally not perfectly fluid, highlighting the frame-by-frame nature of the claymation technique. The garden around her is a whimsical, textured world. In the foreground and mid-ground, oversized flowers with swirled purple and orange petals stand on thick green stems. The ground is a textured mat of green clay, showing subtle fingerprints and tool marks that add to the handmade charm. In the background, a pale blue clay backdrop features a simplified, smiling sun molded from yellow clay. The shot is at an eye-level angle with the main subject. The camera follows the subject, moving smoothly to the right to keep her in the frame. The lighting is bright and even, casting soft shadows that emphasize the rounded, three-dimensional forms of the clay models. The overall video presents a charming and detailed claymation style.``` </details>|
|High Image-Video Consistency|<img src="https://github.com/user-attachments/assets/3bc8e55d-c211-454e-8067-128c0e215eb6"> <video src="https://github.com/user-attachments/assets/3e6b7ee9-ec66-4e46-a446-801b1c1a1c81" width="600"> </video> <details><summary>📋 Show input prompt</summary> ```女孩放下书,站起身,转身向屋内走去。镜头拉远。``` </details> <details><summary>📋 Show rewrite prompt</summary> ```女孩合上手中的书,将书放在身侧的窗台上。随后,她缓缓站起身,转身向屋内走去,身影逐渐没入门后的阴影中。镜头缓缓拉远,露出更多被绿植覆盖的屋檐和墙体。``` </details>|<img src="https://github.com/user-attachments/assets/7657ce60-90b5-4fdc-b713-0eaa55829b09"> <video src="https://github.com/user-attachments/assets/9ca24021-2353-40d5-8a4d-0f8e67d51826" width="600"> </video> <details><summary>📋 Show input prompt</summary> ```女人手上的鸟亲了女人一口``` </details> <details><summary>📋 Show rewrite prompt</summary> ```女人手臂上的白色鹦鹉缓缓转过头,将喙轻轻触碰女人的脸颊,随后收回头部。女人嘴角微微上扬,目光温柔地注视着鹦鹉。背景中的绿植保持静止。``` </details>|
@ -308,6 +335,17 @@ The GSB(Good/Same/Bad) approach is widely used to evaluate the relative performa
</div>
### Inference speed
We report inference speed on 8 H800 GPUs with basic engineering-level acceleration techniques enabled, to show the performance achievable in practical deployment.
In this experiment we do not pursue the most aggressive acceleration at the cost of generation quality; the goal is a notable speedup while keeping output quality nearly identical.
The total inference time for 50 diffusion steps with HunyuanVideo 1.5 is reported below:
<div align="center">
<img src="./assets/speed.png" alt="" width="100%">
</div>
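Because the chart reports a total over 50 diffusion steps, a rough per-step figure can be derived from it. The sketch below shows that arithmetic only. The function names and the 100-second input are placeholders chosen for illustration and are not values read from the chart, and the linear extrapolation is an assumption that ignores fixed costs such as text encoding and VAE decoding.

```python
def per_step_seconds(total_seconds: float, num_steps: int = 50) -> float:
    """Average wall-clock time of a single diffusion step."""
    if num_steps <= 0:
        raise ValueError("num_steps must be positive")
    return total_seconds / num_steps


def estimate_total_seconds(total_seconds_at_50: float, new_steps: int) -> float:
    """Linear estimate of total time for a different step count.

    Assumption: cost scales linearly with the number of steps.
    """
    return per_step_seconds(total_seconds_at_50, 50) * new_steps


# Placeholder input: 100 s for 50 steps (hypothetical, not a measured value).
print(per_step_seconds(100.0))
print(estimate_total_seconds(100.0, 25))
```

Treat the result as a back-of-the-envelope estimate; measured timings on the target hardware remain the reference.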
## 📚 Citation
```bibtex
@ -324,3 +362,11 @@ The GSB(Good/Same/Bad) approach is widely used to evaluate the relative performa
We would like to thank the contributors to the [Transformers](https://github.com/huggingface/transformers), [Diffusers](https://github.com/huggingface/diffusers), [HuggingFace](https://huggingface.co/), and [Qwen-VL](https://github.com/QwenLM/Qwen-VL) projects for their open research and exploration.
## 🌟 GitHub Star History
<a href="https://star-history.com/#Tencent-Hunyuan/HunyuanVideo-1.5&Date">
<picture>
<source media="(prefers-color-scheme: dark)" srcset="https://api.star-history.com/svg?repos=Tencent-Hunyuan/HunyuanVideo-1.5&type=Date1&theme=dark" />
<source media="(prefers-color-scheme: light)" srcset="https://api.star-history.com/svg?repos=Tencent-Hunyuan/HunyuanVideo-1.5&type=Date1" />
<img alt="Star History Chart" src="https://api.star-history.com/svg?repos=Tencent-Hunyuan/HunyuanVideo-1.5&type=Date1" />
</picture>
</a>


@ -26,6 +26,10 @@ HunyuanVideo-1.5作为一款轻量级视频生成模型仅需83亿参数即
<a href=https://github.com/Tencent-Hunyuan/HunyuanVideo-1.5 target="_blank"><img src= https://img.shields.io/badge/Page-bb8a2e.svg?logo=github height=22px></a>
<a href="https://github.com/Tencent-Hunyuan/HunyuanVideo-1.5/blob/main/assets/HunyuanVideo_1_5.pdf" target="_blank"><img src=https://img.shields.io/badge/Report-b5212f.svg?logo=arxiv height=22px></a>
<a href=https://x.com/TencentHunyuan target="_blank"><img src=https://img.shields.io/badge/Hunyuan-black.svg?logo=x height=22px></a>
<a href="https://doc.weixin.qq.com/doc/w3_AXcAcwZSAGgCNACVygLxeQjyn4FYS?scode=AJEAIQdfAAoSfXnTj0AAkA-gaeACk" target="_blank"><img src=https://img.shields.io/badge/📚-PromptHandBook-blue.svg?logo=book height=22px></a> <br/>
<a href="https://github.com/comfyanonymous/ComfyUI" target="_blank"><img src=https://img.shields.io/badge/ComfyUI-blue.svg?logo=book height=22px></a>
<a href="https://github.com/ModelTC/LightX2V" target="_blank"><img src=https://img.shields.io/badge/LightX2V-yellow.svg?logo=book height=22px></a>
</div>
@ -44,7 +48,7 @@ HunyuanVideo-1.5作为一款轻量级视频生成模型仅需83亿参数即
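## 🧩 Community Contributions
If you use or develop with HunyuanVideo in your projects, please let us know.
If you use or develop with HunyuanVideo-1.5 in your projects, please let us know.
- **ComfyUI** - [ComfyUI](https://github.com/comfyanonymous/ComfyUI): A powerful and modular graphical interface for diffusion models, built around node-based workflows. ComfyUI supports HunyuanVideo-1.5 and provides a range of engineering-level acceleration optimizations for fast inference.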
@ -53,6 +57,8 @@ HunyuanVideo-1.5作为一款轻量级视频生成模型仅需83亿参数即
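## 📑 Open-Source Plan
- HunyuanVideo-1.5 (text-to-video / image-to-video)
- [x] Inference code and model weights
- [x] ComfyUI support
- [x] LightX2V support
- [ ] Diffusers support
- [ ] Release all model weights (sparse attention, distilled, and super-resolution models)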
@ -80,7 +86,7 @@ HunyuanVideo-1.5作为一款轻量级视频生成模型仅需83亿参数即
## 📖 Introduction
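We present HunyuanVideo 1.5, a lightweight yet powerful video generation model. With only 8.3B parameters, it achieves state-of-the-art visual quality and motion coherence among open-source models and runs efficient inference on consumer-grade GPUs. This result rests on several key components: meticulous data curation, a DiT architecture with sparse attention (SSTA), bilingual understanding enhanced by dedicated OCR encoding, progressive pre-training and post-training, and an efficient video super-resolution network. Building on these designs, we developed a unified framework that generates high-quality text-to-video and image-to-video output across multiple durations and resolutions. Extensive experiments show that this compact, efficient model sets a new technical benchmark among open-source models. By releasing the code and weights of HunyuanVideo 1.5, we give the community a high-performance foundation that significantly lowers the cost of video creation and research and makes advanced video generation accessible to everyone.
We present HunyuanVideo-1.5, a lightweight yet powerful video generation model. With only 8.3B parameters, it achieves state-of-the-art visual quality and motion coherence among open-source models and runs efficient inference on consumer-grade GPUs. This result rests on several key components: meticulous data curation, a DiT architecture with sparse attention (SSTA), bilingual understanding enhanced by dedicated OCR encoding, progressive pre-training and post-training, and an efficient video super-resolution network. Building on these designs, we developed a unified framework that generates high-quality text-to-video and image-to-video output across multiple durations and resolutions. Extensive experiments show that this compact, efficient model sets a new technical benchmark among open-source models. By releasing the code and weights of HunyuanVideo-1.5, we give the community a high-performance foundation that significantly lowers the cost of video creation and research and makes advanced video generation accessible to everyone.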
## ✨ Key Features
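- **Lightweight, high-performance architecture**: We propose an efficient architecture that combines an 8.3-billion-parameter Diffusion Transformer (DiT) with a 3D causal VAE, achieving 16× compression in the spatial dimensions and 4× compression along the temporal axis. In addition, the novel SSTA mechanism prunes redundant spatiotemporal kv blocks, which significantly reduces the computational overhead of long video sequences and accelerates inference, delivering an end-to-end $1.87 \times$ speedup over FlashAttention-3 for 10-second 720p video synthesis.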
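
To make the compression factors concrete, the sketch below computes the latent grid a video would map to under 16× spatial and 4× temporal compression. Two things here are assumptions rather than statements from this README: the temporal formula `(T - 1) // 4 + 1`, which is a convention commonly used by causal video VAEs (the first frame is encoded on its own), and the 121-frame clip length used in the example. The function name is hypothetical.

```python
from typing import Tuple

SPATIAL_FACTOR = 16   # spatial compression stated above
TEMPORAL_FACTOR = 4   # temporal compression stated above


def latent_shape(num_frames: int, height: int, width: int) -> Tuple[int, int, int]:
    """Return (latent_frames, latent_height, latent_width) for a video."""
    if num_frames < 1:
        raise ValueError("num_frames must be at least 1")
    if height % SPATIAL_FACTOR or width % SPATIAL_FACTOR:
        raise ValueError("height and width must be multiples of 16")
    # Assumption: causal-VAE convention, first frame kept as its own latent
    # frame and the remaining frames grouped four at a time.
    latent_frames = (num_frames - 1) // TEMPORAL_FACTOR + 1
    return latent_frames, height // SPATIAL_FACTOR, width // SPATIAL_FACTOR


# Hypothetical 121-frame clip at 720p (1280x720).
print(latent_shape(121, 720, 1280))  # (31, 45, 80)
```

Under these assumptions the transformer attends over 31 × 45 × 80 latent positions instead of 121 × 720 × 1280 pixels, which is where most of the efficiency comes from before sparse attention is applied.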
@ -133,11 +139,11 @@ pip install -i https://mirrors.tencent.com/pypi/simple/ --upgrade tencentcloud-s
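### Step 3: Install Attention Libraries
* Flash Attention
* Flash Attention:
We recommend installing Flash Attention for faster inference and lower GPU memory consumption.
For detailed installation instructions, see [Flash Attention](https://github.com/Dao-AILab/flash-attention).
* Flex-Block-Attention
* Flex-Block-Attention:
flex-block-attn is required only when using sparse attention for faster inference. It can be installed with the following commands: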
```bash
git clone https://github.com/Tencent-Hunyuan/flex-block-attn.git
@ -145,7 +151,7 @@ pip install -i https://mirrors.tencent.com/pypi/simple/ --upgrade tencentcloud-s
python3 setup.py install
```
* SageAttention
* SageAttention:
```bash
git clone https://github.com/cooper1637/SageAttention.git
@ -156,6 +162,8 @@ pip install -i https://mirrors.tencent.com/pypi/simple/ --upgrade tencentcloud-s
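## 🧱 Download Pretrained Models
> 💡 The distilled and sparse-attention models are coming soon. Follow the Hugging Face model card for the latest updates.
Download the pretrained models before generating videos. For detailed instructions, see [checkpoints-download.md](checkpoints-download.md).
## 📝 Prompt Guide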
@ -163,7 +171,7 @@ pip install -i https://mirrors.tencent.com/pypi/simple/ --upgrade tencentcloud-s
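Prompt enhancement plays a crucial role in getting high-quality videos from our model. Longer, more detailed prompts significantly improve the generated video, so we encourage you to write comprehensive, descriptive prompts. We recommend consulting our official guide on how to write effective prompts.
**Reference:** **[HunyuanVideo 1.5 Prompt Handbook](https://doc.weixin.qq.com/doc/w3_AXcAcwZSAGgCNhei2zzNUS8O4mKop?scode=AJEAIQdfAAoE1dhviFAAkA-gaeACk)**
**Reference:** **[HunyuanVideo-1.5 Prompt Handbook](https://doc.weixin.qq.com/doc/w3_AXcAcwZSAGgCNhei2zzNUS8O4mKop?scode=AJEAIQdfAAoE1dhviFAAkA-gaeACk)**
### System Prompts for Automatic Prompt Enhancement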
@ -182,7 +190,7 @@ pip install -i https://mirrors.tencent.com/pypi/simple/ --upgrade tencentcloud-s
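> You can also replace the model names above with any vLLM-compatible model you have deployed (including HuggingFace models).
>
> Prompt rewriting is enabled by default. To disable it explicitly, use `--rewrite false` or `--rewrite 0`. If no vLLM prompt-rewriting service is configured, the pipeline generates locally without remote rewriting.
> Prompt rewriting is enabled by default (`--rewrite` defaults to `true`). To disable it explicitly, use `--rewrite false` or `--rewrite 0`. If no vLLM prompt-rewriting service is configured, the pipeline generates locally without remote rewriting.
Example: generate a video (both T2V and I2V are supported; set `IMAGE_PATH=none` for T2V mode, or specify an image path for I2V mode)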
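
The note above says `--rewrite` defaults to `true` and accepts `false` or `0` to switch it off. The sketch below shows one way a flag with that behavior could be parsed with `argparse`. It is an illustration only and not the code in `generate.py`; `str_to_bool` and `build_parser` are hypothetical names, and accepting `yes`/`no` is an extra assumption.

```python
import argparse


def str_to_bool(value: str) -> bool:
    """Convert a command-line string such as 'false' or '0' to a bool."""
    normalized = value.strip().lower()
    if normalized in {"true", "1", "yes"}:
        return True
    if normalized in {"false", "0", "no"}:
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean value, got {value!r}")


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with a single boolean flag that defaults to True."""
    parser = argparse.ArgumentParser(description="Illustrative parser only")
    parser.add_argument(
        "--rewrite",
        type=str_to_bool,
        default=True,
        help="enable prompt rewriting (pass 'false' or '0' to disable)",
    )
    return parser


# Omitting the flag keeps the default; passing '0' disables rewriting.
print(build_parser().parse_args([]).rewrite)
print(build_parser().parse_args(["--rewrite", "0"]).rewrite)
```

A converter function is used here instead of `action="store_true"` because the documented usage passes an explicit value after the flag.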
@ -192,7 +200,7 @@ export T2V_REWRITE_MODEL_NAME="<your_model_name>"
export I2V_REWRITE_BASE_URL="<your_vllm_server_base_url>"
export I2V_REWRITE_MODEL_NAME="<your_model_name>"
PROMPT="A close-up shot captures a scene on a polished, light-colored granite kitchen counter, illuminated by soft natural light from an unseen window. Initially, the frame focuses on a tall, clear glass filled with golden, translucent apple juice standing next to a single, shiny red apple with a green leaf still attached to its stem. The camera moves horizontally to the right. As the shot progresses, a white ceramic plate smoothly enters the frame, revealing a fresh arrangement of about seven or eight more apples, a mix of vibrant reds and greens, piled neatly upon it. A shallow depth of field keeps the focus sharply on the fruit and glass, while the kitchen backsplash in the background remains softly blurred. The scene is in a realistic style."
PROMPT='A girl holding a paper with words "Hello, world!"'
IMAGE_PATH=./data/reference_image.png # Optional: 'none' or <image path>
SEED=1
@ -206,6 +214,7 @@ CFG_DISTILLED=true # 使用 CFG 蒸馏模型进行推理2倍加速
SPARSE_ATTN=true # Run inference with sparse attention
SAGE_ATTN=false # Run inference with SageAttention
MODEL_PATH=ckpts # Path to the pretrained models
REWRITE=true # Enable prompt rewriting
torchrun --nproc_per_node=$N_INFERENCE_GPU generate.py \
--prompt "$PROMPT" \
@ -216,6 +225,7 @@ torchrun --nproc_per_node=$N_INFERENCE_GPU generate.py \
--cfg_distilled $CFG_DISTILLED \
--sparse_attn $SPARSE_ATTN \
--use_sageattn $SAGE_ATTN \
--rewrite $REWRITE \
--output_path $OUTPUT_PATH \
--save_pre_sr_video \
--model_path $MODEL_PATH
@ -253,34 +263,34 @@ torchrun --nproc_per_node=$N_INFERENCE_GPU generate.py \
## 🧱 Model Cards
|Model Name| Download Link |
|-|---------------------------|
|HunyuanVideo 1.5-480P-T2V|[480P-T2V](https://huggingface.co/tencent/HunyuanVideo-1.5/tree/main/transformer/480p_t2v) |
|HunyuanVideo 1.5-480p-I2V |[480p-I2V](https://huggingface.co/tencent/HunyuanVideo-1.5/tree/main/transformer/480p_i2v) |
|HunyuanVideo 1.5-480p-T2V-distill | [480p-T2V-distill](https://huggingface.co/tencent/HunyuanVideo-1.5/tree/main/transformer/480p_t2v_distilled) |
|HunyuanVideo 1.5-480p-I2V-distill |[480p-I2V-distill](https://huggingface.co/tencent/HunyuanVideo-1.5/tree/main/transformer/480p_i2v_distilled) |
|HunyuanVideo 1.5-720P-T2V|[720P-T2V](https://huggingface.co/tencent/HunyuanVideo-1.5/tree/main/transformer/720p_t2v) |
|HunyuanVideo 1.5-720P-I2V |[720P-I2V](https://huggingface.co/tencent/HunyuanVideo-1.5/tree/main/transformer/720p_i2v) |
|HunyuanVideo 1.5-720P-T2V-distill| Coming soon |
|HunyuanVideo 1.5-720P-I2V-distill |[720P-I2V-distill](https://huggingface.co/tencent/HunyuanVideo-1.5/tree/main/transformer/720p_i2v_distilled) |
|HunyuanVideo 1.5-720P-T2V-sparse-distill| Coming soon |
|HunyuanVideo 1.5-720P-I2V-sparse-distill |[720P-I2V-sparse-distill](https://huggingface.co/tencent/HunyuanVideo-1.5/tree/main/transformer/720p_i2v_distilled_sparse) |
|HunyuanVideo 1.5-720p-sr |[720p-sr](https://huggingface.co/tencent/HunyuanVideo-1.5/tree/main/transformer/720p_sr_distilled) |
|HunyuanVideo 1.5-1080p-sr |[1080p-sr](https://huggingface.co/tencent/HunyuanVideo-1.5/tree/main/transformer/1080p_sr_distilled) |
|HunyuanVideo-1.5-480P-T2V|[480P-T2V](https://huggingface.co/tencent/HunyuanVideo-1.5/tree/main/transformer/480p_t2v) |
|HunyuanVideo-1.5-480P-I2V |[480P-I2V](https://huggingface.co/tencent/HunyuanVideo-1.5/tree/main/transformer/480p_i2v) |
|HunyuanVideo-1.5-480P-T2V-distill | [480P-T2V-distill](https://huggingface.co/tencent/HunyuanVideo-1.5/tree/main/transformer/480p_t2v_distilled) |
|HunyuanVideo-1.5-480P-I2V-distill |[480P-I2V-distill](https://huggingface.co/tencent/HunyuanVideo-1.5/tree/main/transformer/480p_i2v_distilled) |
|HunyuanVideo-1.5-720P-T2V|[720P-T2V](https://huggingface.co/tencent/HunyuanVideo-1.5/tree/main/transformer/720p_t2v) |
|HunyuanVideo-1.5-720P-I2V |[720P-I2V](https://huggingface.co/tencent/HunyuanVideo-1.5/tree/main/transformer/720p_i2v) |
|HunyuanVideo-1.5-720P-T2V-distill| Coming soon |
|HunyuanVideo-1.5-720P-I2V-distill |[720P-I2V-distill](https://huggingface.co/tencent/HunyuanVideo-1.5/tree/main/transformer/720p_i2v_distilled) |
|HunyuanVideo-1.5-720P-T2V-sparse-distill| Coming soon |
|HunyuanVideo-1.5-720P-I2V-sparse-distill |[720P-I2V-sparse-distill](https://huggingface.co/tencent/HunyuanVideo-1.5/tree/main/transformer/720p_i2v_distilled_sparse) |
|HunyuanVideo-1.5-720P-sr |[720P-sr](https://huggingface.co/tencent/HunyuanVideo-1.5/tree/main/transformer/720p_sr_distilled) |
|HunyuanVideo-1.5-1080P-sr |[1080P-sr](https://huggingface.co/tencent/HunyuanVideo-1.5/tree/main/transformer/1080p_sr_distilled) |
## 🎬 More Examples
|Feature|Example 1|Example 2|
|------|------|------|
|指令跟随能力|<video src="https://github.com/user-attachments/assets/fdc3c27b-69f5-46a1-b707-0b57510fa32f" width="600"> </video> <details><summary>📋 Show input prompt</summary> ```一名哀伤的黑发中国女子凝望天空,复古胶片风格烘托出怀旧戏剧氛围``` </details> <details><summary>📋 Show rewrite prompt</summary> ```俯视角度一位有着深色略带凌乱的长卷发的年轻中国女性佩戴着闪耀的珍珠项链和圆形金色耳环她凌乱的头发被风吹散她微微抬头望向天空神情十分哀伤眼中含着泪水。嘴唇涂着红色口红。背景是带有华丽红色花纹的图案。画面呈现复古电影风格色调低饱和带着轻微柔焦烘托情绪氛围质感仿佛20世纪90年代的经典胶片风格营造出怀旧且富有戏剧性的感觉。``` </details>|<video src="https://github.com/user-attachments/assets/3fcb42cc-cdd3-4651-86a6-645a858561c4" width="600"> </video> <details><summary>📋 Show input prompt</summary> ```建筑蓝图上的线条化为实体,瞬间生长出一个完整的复古工业风办公空间。``` </details> <details><summary>📋 Show rewrite prompt</summary> ```一座空旷的现代阁楼里有一张铺展在地板中央的建筑蓝图。忽然间图纸上的线条泛起微光仿佛被某种无形的力量唤醒。紧接着那些发光的线条开始向上延伸从平面中挣脱勾勒出立体的轮廓——就像在空中进行一场无声的3D打印。随后奇迹在加速发生极简的橡木办公桌、优雅的伊姆斯风格皮质椅、高挑的工业风金属书架还有几盏爱迪生灯泡以光纹为骨架迅速“生长”出来。转瞬间线条被真实的材质填充——木材的温润、皮革的质感、金属的冷静都在眨眼间完整呈现。最终所有家具稳固落地蓝图的光芒悄然褪去。一个完整的办公空间就这样从二维的图纸中诞生。``` </details>|
|流畅运动生成|<video src="https://github.com/user-attachments/assets/21f9da05-33d0-4521-b188-ea009e7fdd3f" width="600"> </video> <details><summary>📋 Show input prompt</summary> ```A cosmic loaf of bread, with a volcanic black crust, is precisely sliced open to reveal a swirling nebula interior.``` </details> <details><summary>📋 Show rewrite prompt</summary> ```Cinematic 8K footage, with a stark, moody aesthetic. Under a dramatic top-down spotlight, a loaf of what appears to be bread rests on a slab of polished marble, which is flecked with silver that glitters like a starfield. The loaf's crust is a deep, matte black, cracked like cooled volcanic rock. A sleek, modern santoku knife, its sharp edge gleaming under the single light source, begins a series of clean, rhythmic cuts. With each precise, repetitive slice that falls away, the loafs impossible interior is revealed: not dough, but a compressed, swirling nebula of deep purples and blues, alive with pinpricks of glittering light. As the knife continues its precise motion, a fine, shimmering dust of cosmic particles settles on the marble. The extreme macro view focuses on the mesmerizing contrast between the blades cold steel and the ethereal, galaxy-filled substance of the bread. This is hyper-realistic macro videography at its finest.``` </details>|<video src="https://github.com/user-attachments/assets/49057fe8-a102-4fd7-bd92-e9561abb9f45" width="600"> </video> <details><summary>📋 Show input prompt</summary> ```A figure skater performs a rapid, graceful Biellmann spin, captured from all angles.``` </details> <details><summary>📋 Show rewrite prompt</summary> ```The video captures a figure skater performing a Biellmann spin on ice. The subject is a female skater in a glittering costume. Initially, she spins on one leg. Then, she reaches back and pulls her free leg up. Next, she spins rapidly, becoming a blur of motion, with ice shavings spraying from her skate blade. 
The background is an ice rink with blurred advertising boards. The camera circles around the subject to capture the spin from all angles. The lighting is spotlit, creating lens flares and sparkles on her costume. The overall video presents a graceful artistic sports style.``` </details>|
|流畅运动生成|<video src="https://github.com/user-attachments/assets/447847f0-490a-45f9-a86d-a67ab1ff4231" width="600"> </video> <details><summary>📋 Show input prompt</summary> ```A DJ is immersed in his musical world. He wears a pair of professional, matte-black headphones, revealing a focused expression. He wears a black bomber jacket, zipped open to reveal a T-shirt underneath. His upper body sways back and forth rhythmically to the throbbing electronic beats, his head moving with precise movement. The mixing console in front of him serves as the primary source of light. In the distance, the cool white glow of several stadium floodlights casts a deep, dark haze across the vast field, casting long shadows across the emerald green grass, creating a stark contrast to the brightly lit area surrounding the DJ booth. His hands danced swiftly and precisely across the equipment. The entire scene was filled with high-tech dynamics and the solitary creative passion. Against the backdrop of the vast and silent night stadium, it created an atmosphere of high focus, energy, and a slightly surreal feeling.``` </details> <details><summary>📋 Show rewrite prompt</summary> ```slowly advancing medium shot, shot from a level angle, focuses on the center of an empty football field, where a DJ is immersed in his musical world. He wears a pair of professional, matte-black headphones, one earcup slightly removed, revealing a focused expression and a brow beaded with sweat from his intense concentration. He wears a black bomber jacket, zipped open to reveal a T-shirt underneath. His upper body sways back and forth rhythmically to the throbbing electronic beats, his head moving with precise movement. The mixing console in front of him serves as the primary source of light. 
In the distance, the cool white glow of several stadium floodlights casts a deep, dark haze across the vast field, casting long shadows across the emerald green grass, creating a stark contrast to the brightly lit area surrounding the DJ booth. His hands danced swiftly and precisely across the equipment, one hand steadily pushing and pulling a long volume fader, while the fingers of the other nimbly jumped between the illuminated knobs and pads, sometimes decisively cutting a bass line, sometimes triggering an echo effect. The entire scene was filled with high-tech dynamics and the solitary creative passion. Against the backdrop of the vast and silent night stadium, it created an atmosphere of high focus, energy, and a slightly surreal feeling.``` </details>|<video src="https://github.com/user-attachments/assets/49057fe8-a102-4fd7-bd92-e9561abb9f45" width="600"> </video> <details><summary>📋 Show input prompt</summary> ```A figure skater performs a rapid, graceful Biellmann spin, captured from all angles.``` </details> <details><summary>📋 Show rewrite prompt</summary> ```The video captures a figure skater performing a Biellmann spin on ice. The subject is a female skater in a glittering costume. Initially, she spins on one leg. Then, she reaches back and pulls her free leg up. Next, she spins rapidly, becoming a blur of motion, with ice shavings spraying from her skate blade. The background is an ice rink with blurred advertising boards. The camera circles around the subject to capture the spin from all angles. The lighting is spotlit, creating lens flares and sparkles on her costume. The overall video presents a graceful artistic sports style.``` </details>|
|电影级美学|<video src="https://github.com/user-attachments/assets/4098cf72-357d-4b81-97df-6752064ce0c3" width="600"> </video> <details><summary>📋 Show input prompt</summary> ```固定镜头,焦点在图片里的挂钟上镜头轻微摇晃营造手持摄影感wjw,filmphotos,Film Grain,Reversal film photographyWong Kar-wai movies,cinematic photography, HK film style,neon lighting, in the style of Wong Kar Wai film``` </details> <details><summary>📋 Show rewrite prompt</summary> ```Handheld lens shooting, the camera focuses on the wall clock hanging on the green-toned wall, shaking slightly. The second hand sweeps steadily across the clock face, and the shadow of the clock cast on the wall shifts subtly with the movement of the lens.``` </details>|<video src="https://github.com/user-attachments/assets/2b4575e5-79f1-4011-bed0-e8380198f7c9" width="600"> </video> <details><summary>📋 Show input prompt</summary> ```The leaves of calamus shine in the sunlight, dotted with dewdrops that trickle down to the ground with the breeze.``` </details> <details><summary>📋 Show rewrite prompt</summary> ```A macro shot focuses on long, slender calamus leaves, rendered in a cinematic photography realistic style. The main leaf, a vibrant, deep green, is positioned diagonally across the frame. Its surface is covered in tiny, glistening spherical dewdrops that catch and refract the bright morning sunlight, creating sparkling highlights. Initially, a larger, perfectly round dewdrop clings to the upper section of the leaf, its surface tension holding it in place. Then, as the leaf sways almost imperceptibly, the dewdrop begins to slowly dislodge. Next, it starts to trickle down the central vein of the leaf, its shape elongating slightly as it moves, leaving a subtle, glistening wet trail in its path. Finally, it reaches the pointed tip of the leaf, hangs for a brief moment, and falls out of the bottom of the frame. 
In the background, other leaves and blades of grass are softly blurred, creating a beautiful bokeh effect with soft, out-of-focus circles of light. The environment is bathed in the warm, golden glow of early morning sunlight, which streams in from behind the leaves, backlighting them and causing their wet edges to shine brilliantly. The overall impression is one of serene, natural beauty, captured in a highly realistic and detailed manner. This is a macro shot. The camera tilts down very slowly, following the path of the main dewdrop as it travels down the leaf. The lighting is soft and natural, with strong backlighting to create a radiant, glowing effect on the dewdrops and leaf edges, characteristic of professional nature photography. The atmosphere is peaceful and serene. The overall video presents a cinematic photography realistic style.``` </details>|
|文字渲染|<video src="https://github.com/user-attachments/assets/7c964fc5-c27e-4bd0-bf3f-eb8fca2caef6" width="600"> </video> <details><summary>📋 Show input prompt</summary> ```赛博朋克风格的夜晚街角,一个巨大的招牌上, “Hunyuan Video 1.5”的霓虹灯管轮廓已经安装好。镜头推进霓虹灯从“H”开始伴随着滋滋的电流声每个字母依次亮起粉紫色的光芒直到全部点亮照亮了潮湿的街道。赛博朋克城市美学``` </details> <details><summary>📋 Show rewrite prompt</summary> ```On a wet street corner in a cyberpunk city at night, a large neon sign reading "Hunyuan Video 1.5" lights up sequentially, illuminating the dark, rainy environment with a pinkish-purple glow. he scene is a dark, rain-slicked street corner in a futuristic, cinematic cyberpunk city. Mounted on the metallic, weathered facade of a building is a massive, unlit neon sign. The sign's glass tube framework clearly spells out the words "Hunyuan Video 1.5". Initially, the street is dimly lit, with ambient light from distant skyscrapers creating shimmering reflections on the wet asphalt below. Then, the camera zooms in slowly toward the sign. As it moves, a low electrical sizzling sound begins. In the background, the dense urban landscape of the cyberpunk metropolis is visible through a light atmospheric haze, with towering structures adorned with their own flickering advertisements. A complex web of cables and pipes crisscrosses between the buildings. The shot is at a low angle, looking up at the sign to emphasize its grand scale. The lighting is high-contrast and dramatic, dominated by the neon glow which creates sharp, specular reflections and deep shadows. The atmosphere is moody and tech-noir. 
The overall video presents a cinematic photography realistic style.,``` </details>|<video src="https://github.com/user-attachments/assets/94ce62d9-5788-4912-8e89-b7dc84d7bdc4" width="600"> </video> <details><summary>📋 Show input prompt</summary> ```黑色背景上展示着艺术字体"Hunyuan Video 1.5",每个字母都由不同的流体构成,持续缓慢流动。多种不同质地、不互溶的彩色液体(如金属、牛奶、透明凝胶)在无重力环境中漂浮、碰撞``` </details> <details><summary>📋 Show rewrite prompt</summary> ```The artistic words "Hunyuan Video 1.5" are rendered in the center of the screen, with each character composed of a unique, slowly moving fluid, set against a deep black background, while colorful, immiscible liquid blobs float and collide around them in a zero-gravity environment. The main subject is the text "Hunyuan Video 1.5". The characters for "Hunyuan" are filled with a lustrous, molten gold liquid that swirls slowly. The letters for "Video" are composed of a creamy, opaque white fluid resembling milk, with gentle currents visible beneath its surface. The numbers "1.5" are made from a viscous, transparent blue gel that subtly undulates. Each fluid moves independently within the confines of its character's shape, creating a mesmerizing internal motion. This high-quality 3D CGI animation presents the fluids with photorealistic textures. In the surrounding space, several immiscible liquid blobs drift in zero gravity. A large, spherical blob of pearlescent liquid slowly floats from the upper left. A smaller, amorphous blob of shimmering, metallic silver drifts from the lower right, and a translucent, pink gelatinous mass wobbles nearby. Initially, these blobs drift aimlessly. Then, the silver blob slowly collides with the larger pearlescent one. As they make contact, their surfaces deform and ripple dynamically, but the liquids do not mix, pushing against each other before gently bouncing off and continuing their slow, separate paths in the pristine black void. The shot is at an eye-level angle, presenting a front view of the text. 
The camera remains static, ensuring the entire text "Hunyuan Video 1.5" is fully visible throughout the shot. The scene is lit by a soft, diffused light that highlights the brilliant reflections on the metallic fluids and the inner glow of the translucent gels, enhancing the high-quality 3D CGI animation. The atmosphere is quiet, abstract, and mesmerizing. The overall video has the polished look of a high-quality 3D CGI animation with a focus on abstract fluid dynamics.``` </details>|
|物理合理性|<video src="https://github.com/user-attachments/assets/07fa4dcd-0bd1-4935-bb89-323428cce6fc" width="600"> </video> <details><summary>📋 Show input prompt</summary> ```The wind blows through the shabby bookshelf, and the pages flutter on it. ``` </details> <details><summary>📋 Show rewrite prompt</summary> ```In a dimly lit, dusty room, a gentle wind causes the pages of old books on a shabby wooden bookshelf to flutter. The bookshelf, made of dark, weathered wood, shows signs of age with peeling varnish, scratches, and a fine layer of settled dust on its surfaces. Several old books with faded, worn covers are arranged on the shelves; some stand upright while others lie on their sides. Initially, the scene is quiet. Then, a soft breeze enters the frame from the left, disturbing the dust on the shelves. Next, the yellowed, brittle pages of an open book lying flat begin to lift and ripple delicately. As the breeze continues, the pages of other books also start to flutter, some turning over slowly and gracefully, revealing aged text and faint illustrations within. In the background, the wall has faded, peeling wallpaper, and the overall atmosphere is one of quiet neglect and the passage of time. The shot is at an eye-level angle with the main subject. The camera pans to the left slowly. Soft, diffused sunlight filters through a dusty, off-camera window, creating distinct beams of light that cut through the dimness. This lighting highlights the texture of the old wood and the floating dust particles in the air, enhancing the photorealistic detail of the scene. The mood is melancholic and peaceful. 
The overall video presents a cinematic photography realistic style.``` </details>|<video src="https://github.com/user-attachments/assets/81065925-c008-421b-8cf0-b3cbf1e77eac" width="600"> </video> <details><summary>📋 Show input prompt</summary> ```An intact soda can is slowly crushed by a hand.``` </details> <details><summary>📋 Show rewrite prompt</summary> ```In a medium close-up, a hand slowly crushes an intact red and white soda can on a wooden table. A male hand with visible, realistic skin texture is wrapped firmly around the middle of an intact, pristine red and white aluminum soda can. The can, covered in glistening condensation droplets, rests on a dark, polished wooden surface. The cinematic realism captures every minute detail of the scene. Initially, the hand's grip is steady, with the can's cylindrical shape perfectly preserved. Then, the fingers begin to tighten slowly, the knuckles whitening slightly from the exertion. Next, the smooth aluminum surface starts to buckle under the controlled pressure, a sharp crease forming vertically down its side as the metallic sheen distorts. As the hand continues its deliberate squeeze, the can collapses inward progressively, the vibrant red paint wrinkling as the metal structure crumples. Finally, the can is left significantly crushed, its form now an irregular, crumpled shape held tightly in the fist. The scene takes place on a dark, polished wooden tabletop that catches soft, diffuse reflections. The grain of the wood is faintly discernible, adding a layer of texture to the foreground. The background is completely out of focus, rendered as a soft, dark, and non-descript blur, which isolates the main action and enhances the photorealistic quality of the shot. The shot is a medium close-up, presented in a cinematic photography realistic style. The camera remains static at a slightly high angle, looking down to provide a clear and unobstructed view of the can's deformation. 
Soft side lighting creates high contrast, sculpting the muscles and tendons of the hand while casting specular highlights on the metallic can and the water droplets. The atmosphere is focused and intense. The overall video presents a cinematic photography realistic style.``` </details>|
|文字渲染|<video src="https://github.com/user-attachments/assets/7c964fc5-c27e-4bd0-bf3f-eb8fca2caef6" width="600"> </video> <details><summary>📋 Show input prompt</summary> ```赛博朋克风格的夜晚街角,一个巨大的招牌上, “Hunyuan Video 1.5”的霓虹灯管轮廓已经安装好。镜头推进霓虹灯从“H”开始伴随着滋滋的电流声每个字母依次亮起粉紫色的光芒直到全部点亮照亮了潮湿的街道。赛博朋克城市美学``` </details> <details><summary>📋 Show rewrite prompt</summary> ```On a wet street corner in a cyberpunk city at night, a large neon sign reading "Hunyuan Video 1.5" lights up sequentially, illuminating the dark, rainy environment with a pinkish-purple glow. he scene is a dark, rain-slicked street corner in a futuristic, cinematic cyberpunk city. Mounted on the metallic, weathered facade of a building is a massive, unlit neon sign. The sign's glass tube framework clearly spells out the words "Hunyuan Video 1.5". Initially, the street is dimly lit, with ambient light from distant skyscrapers creating shimmering reflections on the wet asphalt below. Then, the camera zooms in slowly toward the sign. As it moves, a low electrical sizzling sound begins. In the background, the dense urban landscape of the cyberpunk metropolis is visible through a light atmospheric haze, with towering structures adorned with their own flickering advertisements. A complex web of cables and pipes crisscrosses between the buildings. The shot is at a low angle, looking up at the sign to emphasize its grand scale. The lighting is high-contrast and dramatic, dominated by the neon glow which creates sharp, specular reflections and deep shadows. The atmosphere is moody and tech-noir. 
The overall video presents a cinematic photography realistic style.,``` </details>|<video src="https://github.com/user-attachments/assets/73e8b741-baec-4a40-9d36-a1435172ab64" width="600"> </video> <details><summary>📋 Show input prompt</summary> ```一张铺开的中国宣纸上,浓墨滴入水中,晕染出壮丽的山水画轮廓。山峰、云雾、孤舟在墨色中自然形成。随后,这些水墨元素巧妙地流动、重组,在画面的留白处汇聚成"Hunyuan Video 1.5"的书法字体。优雅,诗意,文化底蕴``` </details> <details><summary>📋 Show rewrite prompt</summary> ```A drop of black ink blooms on wet Chinese Xuan paper, forming a landscape painting before the ink elements fluidly reassemble into the calligraphic text "Hunyuan Video 1.5". On a flat, laid-out sheet of off-white Chinese Xuan paper with a subtle, fibrous texture, the scene unfolds. Initially, a single, concentrated drop of deep black ink falls into a clear, wet area at the center of the paper. Then, the ink instantly begins to bloom outwards in intricate, flowing tendrils of varying shades from jet-black to smoky grey. As it spreads, the ink wash naturally and rapidly forms the silhouette of a majestic mountain range with sharp, defined peaks. Next, softer, diluted grey tones billow around the mountains, creating layers of atmospheric mist and clouds, while a simple, dark stroke materializes as a lone boat on a tranquil, watery expanse at the base. As the landscape is formed, the ink elements—the lines of the mountains, wisps of cloud, and the shape of the boat—begin to deconstruct, dissolving into flowing streams of liquid ink. Finally, these streams move gracefully across the paper's empty white space, converging and elegantly reorganizing to form the text "Hunyuan Video 1.5" in a fluid, semi-cursive calligraphic style. The background is the minimalist expanse of the Xuan paper itself, its texture providing a subtle depth. The entire process is lit by soft, even, diffused light from above, which enhances the rich tonal variations of the ink and the delicate texture of the paper without creating harsh shadows. Bird's-eye view. 
The camera is positioned directly above the subject, capturing the entire process. The camera remains static. The aesthetic is a high-quality, dynamic Chinese ink wash animation style, perfectly simulating the real-world physics of ink spreading on wet paper. The entire sheet of paper and the final text are kept fully within the frame. Poetic, elegant, artistic. The overall video presents a dynamic Chinese ink wash animation style.``` </details>|
|物理合理性|<video src="https://github.com/user-attachments/assets/f1d74e48-cc03-415d-b75f-f7186a4fb41d" width="600"> </video> <details><summary>📋 Show input prompt</summary> ```In a sleek museum gallery, a woman pauses before a gilded oil painting. The painted man inside slowly comes alive, lifting a bottle and pouring real wine straight from the canvas into her glass. Surrounded by stylish art critics moving naturally through the hall, she accepts the pour with calm elegance, as if the impossible were routine. ``` </details> <details><summary>📋 Show rewrite prompt</summary> ```In a sleek museum gallery, a woman receives a glass of wine poured directly from an animated oil painting. A sophisticated woman with dark hair tied back elegantly stands in the mid-ground. She is wearing a simple, black silk sleeveless dress and holds a clear, crystal wine glass in her right hand. She is positioned before a large, baroque-style oil painting in an ornate, gilded frame. Inside the painting, an aristocratic man with a mustache, dressed in a dark velvet doublet with a white lace collar, is depicted. His form is defined by visible, impasto oil brushstrokes. Initially, the woman watches the painting with calm poise. Then, the painted man's arm slowly animates, his painted texture retained as he lifts a dark bottle. Next, a photorealistic stream of red wine emerges directly from the flat canvas surface, arcing through the air and splashing gently into the real crystal glass she holds. She remains perfectly still, accepting the impossible pour with a subtle, knowing smile. The setting is a modern art gallery with high white walls and polished dark concrete floors that reflect the ambient light. Focused track lighting from the high ceiling casts a warm, dramatic spotlight on the woman and the painting, creating soft shadows. 
In the background, two other gallery patrons, a man and a woman in stylish, modern attire, stroll slowly from right to left, their figures slightly blurred by a shallow depth of field, moving naturally through the hall. The shot is at an eye-level angle with the woman. The camera remains static, capturing the surreal event in a steady medium shot. The lighting is high-contrast and dramatic, reminiscent of a cinematic photography realistic style, using soft side lighting to accentuate the woman's features and the texture of the painting. The mood is surreal, elegant, and mysterious. The overall video presents a cinematic photography realistic style.``` </details>|<video src="https://github.com/user-attachments/assets/07bcce06-ff4f-4688-8c60-c02f600635ea" width="600"> </video> <details><summary>📋 Show input prompt</summary> ```An intact soda can is slowly crushed by a hand.``` </details> <details><summary>📋 Show rewrite prompt</summary> ```In a medium close-up, a hand slowly crushes an intact red and white soda can on a wooden table. A male hand with visible, realistic skin texture is wrapped firmly around the middle of an intact, pristine red and white aluminum soda can. The can, covered in glistening condensation droplets, rests on a dark, polished wooden surface. The cinematic realism captures every minute detail of the scene. Initially, the hand's grip is steady, with the can's cylindrical shape perfectly preserved. Then, the fingers begin to tighten slowly, the knuckles whitening slightly from the exertion. Next, the smooth aluminum surface starts to buckle under the controlled pressure, a sharp crease forming vertically down its side as the metallic sheen distorts. As the hand continues its deliberate squeeze, the can collapses inward progressively, the vibrant red paint wrinkling as the metal structure crumples. Finally, the can is left significantly crushed, its form now an irregular, crumpled shape held tightly in the fist. 
The scene takes place on a dark, polished wooden tabletop that catches soft, diffuse reflections. The grain of the wood is faintly discernible, adding a layer of texture to the foreground. The background is completely out of focus, rendered as a soft, dark, and non-descript blur, which isolates the main action and enhances the photorealistic quality of the shot. The shot is a medium close-up, presented in a cinematic photography realistic style. The camera remains static at a slightly high angle, looking down to provide a clear and unobstructed view of the can's deformation. Soft side lighting creates high contrast, sculpting the muscles and tendons of the hand while casting specular highlights on the metallic can and the water droplets. The atmosphere is focused and intense. The overall video presents a cinematic photography realistic style.``` </details>|
|摄像机运动|<video src="https://github.com/user-attachments/assets/6deacbfe-4cca-48d7-a2be-cb638a3e01cb" width="600"> </video> <details><summary>📋 Show input prompt</summary> ```圣诞节的家中,小女孩靠着妈妈听妈妈读书,背景是下着雪的窗外,镜头缓慢下移,一只可爱的长毛小白猫戴着圣诞帽趴在温暖的地摊上``` </details> <details><summary>📋 Show rewrite prompt</summary> ```In a cozy home on Christmas, a young girl leans against her mother as they read a book, and the camera moves down to reveal a fluffy white cat in a Santa hat resting on a warm rug. In a warmly lit living room on a snowy Christmas evening, a young mother and her little daughter are sitting together on a comfortable sofa. The mother, with a gentle expression and wearing a cream-colored knitted sweater, holds an open storybook with colorful illustrations. Her daughter, a small girl with brown hair in pigtails and a red pajama set, leans her head affectionately on her mother's shoulder, her eyes fixed on the book. On the floor below them, a fluffy, long-haired white cat is curled up on a plush, beige wool rug. The cat wears a tiny red and white Santa hat perched between its ears. Initially, the shot focuses on the mother and daughter, capturing their quiet, shared moment. The mothers finger gently rests on the page of the book. Then, the camera slowly moves downward, gliding past the book and their laps. Finally, the camera settles at a low angle, bringing the adorable white cat into sharp focus as the primary subject. The cat's chest gently rises and falls with each breath, its eyes peacefully closed. Through a large window in the background, large, soft snowflakes can be seen falling silently against the dark blue twilight sky, creating a peaceful and serene backdrop. Faint, out-of-focus golden Christmas lights twinkle in the corner of the room, adding to the warm, festive atmosphere. The scene is imbued with a sense of comfort and holiday warmth, creating a beautiful cinematic photography realistic image. The camera slowly moves downward. 
The shot uses soft, warm interior lighting that casts gentle shadows, creating a high-contrast, cinematic look. A shallow depth of field keeps the focus on the subjects while beautifully blurring the background elements. The mood is heartwarming, peaceful, and festive. The overall video presents a cinematic photography realistic style.``` </details>|<video src="https://github.com/user-attachments/assets/8e72ed0f-f8ac-445b-97e5-eb4b16fbc121" width="600"> </video> <details><summary>📋 Show input prompt</summary> ```The hiker begins walking forward along the trail, causing the water bottle to swing rhythmically with each step. The camera gradually pulls back and rises to reveal a vast desert landscape stretching out ahead.``` </details> <details><summary>📋 Show rewrite prompt</summary> ```The hiker begins walking forward along the trail, causing the water bottle to swing rhythmically with each step. The camera gradually pulls back and rises to reveal a vast desert landscape stretching out ahead, while the sun position shifts from afternoon to dusk, casting increasingly longer shadows across the terrain as the figure becomes smaller in the frame.``` </details>|
|多风格支持|<video src="https://github.com/user-attachments/assets/65b2c5a5-e6ba-43be-9462-a98b03b675f1" width="600"> </video> <details><summary>📋 Show input prompt</summary> ```Have the cake man begin to take chunks out of himself and eat it.``` </details> <details><summary>📋 Show rewrite prompt</summary> ```The cake man sits on the chair, with his hands resting on his knees. Then, he slowly raises his right hand and breaks off a piece of cake from his left shoulder. Next, he brings the piece of cake to his mouth and begins to chew. At the same time, his eyes widen slightly, and his mouth parts gently. After that, he raises his right hand again, breaks off another piece of cake from his right arm, and repeats the action of bringing it to his mouth to chew.``` </details>|<video src="https://github.com/user-attachments/assets/de5f7480-b79c-4fc1-b345-c5880a3b5f9e" width="600"> </video> <details><summary>📋 Show input prompt</summary> ```A little girl, carrying a colorful handbag, skips through the garden. The video uses claymation style.``` </details> <details><summary>📋 Show rewrite prompt</summary> ```A little girl with a colorful handbag skips through a whimsical claymation garden. In a vibrant garden constructed entirely from clay, a young girl, meticulously crafted in a claymation style, skips joyfully. She has chunky, sculpted yellow clay hair tied in pigtails that bounce with a slight stiffness, simple black button eyes, and a wide, permanently etched smile. She wears a simple pink clay dress with a white collar. In her left hand, she carries a small handbag molded from bright red and blue clay, which swings in a slightly jerky arc as she moves. Initially, the girl lifts her right leg high, her body momentarily suspended in a classic stop-motion pose. Then, she hops forward, landing lightly as her left leg swings through for the next skip. Her arms move in an exaggerated, back-and-forth rhythm, characteristic of stop-motion animation. 
Her movements are intentionally not perfectly fluid, highlighting the frame-by-frame nature of the claymation technique. The garden around her is a whimsical, textured world. In the foreground and mid-ground, oversized flowers with swirled purple and orange petals stand on thick green stems. The ground is a textured mat of green clay, showing subtle fingerprints and tool marks that add to the handmade charm. In the background, a pale blue clay backdrop features a simplified, smiling sun molded from yellow clay. The shot is at an eye-level angle with the main subject. The camera follows the subject, moving smoothly to the right to keep her in the frame. The lighting is bright and even, casting soft shadows that emphasize the rounded, three-dimensional forms of the clay models. The overall video presents a charming and detailed claymation style.``` </details>|
|高图视一致性|<img src="https://github.com/user-attachments/assets/3bc8e55d-c211-454e-8067-128c0e215eb6"> <video src="https://github.com/user-attachments/assets/3e6b7ee9-ec66-4e46-a446-801b1c1a1c81" width="600"> </video> <details><summary>📋 Show input prompt</summary> ```女孩放下书,站起身,转身向屋内走去。镜头拉远。``` </details> <details><summary>📋 Show rewrite prompt</summary> ```女孩合上手中的书,将书放在身侧的窗台上。随后,她缓缓站起身,转身向屋内走去,身影逐渐没入门后的阴影中。镜头缓缓拉远,露出更多被绿植覆盖的屋檐和墙体。``` </details>|<img src="https://github.com/user-attachments/assets/7657ce60-90b5-4fdc-b713-0eaa55829b09"> <video src="https://github.com/user-attachments/assets/9ca24021-2353-40d5-8a4d-0f8e67d51826" width="600"> </video> <details><summary>📋 Show input prompt</summary> ```女人手上的鸟亲了女人一口``` </details> <details><summary>📋 Show rewrite prompt</summary> ```女人手臂上的白色鹦鹉缓缓转过头,将喙轻轻触碰女人的脸颊,随后收回头部。女人嘴角微微上扬,目光温柔地注视着鹦鹉。背景中的绿植保持静止。``` </details>|
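## 📊 Evaluation
### Rating
We use a comprehensive rating methodology to evaluate text-to-video generation across five key dimensions: text-video consistency, visual quality, structural stability, motion effects, and the aesthetic quality of individual frames. For image-to-video generation, the evaluation covers image-video consistency, instruction responsiveness, visual quality, structural stability, and motion effects.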
@ -310,6 +320,16 @@ GSBGood/Same/Bad评估法被广泛用于基于整体视频感知质量来
<img src="./assets/I2V_GSB.png" alt="gsb result of i2v" width="800">
</div>
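### Inference Speed
We report inference speed on 8 H800 GPUs with basic engineering-level acceleration techniques enabled, to show the practical performance achievable in real deployment scenarios.
Note that in this experiment we do not pursue the most extreme acceleration at the cost of generation quality. Instead, we achieve a significant speedup while keeping output quality nearly identical.
Below we report the total inference time of HunyuanVideo-1.5 at 50 diffusion steps: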
<div align="center">
<img src="./assets/speed.png" alt="" width="100%">
</div>
## 📚 Citation
```bibtex
@ -326,3 +346,11 @@ GSBGood/Same/Bad评估法被广泛用于基于整体视频感知质量来
We would like to thank the contributors of [Transformers](https://github.com/huggingface/transformers), [Diffusers](https://github.com/huggingface/diffusers), [HuggingFace](https://huggingface.co/), and [Qwen-VL](https://github.com/QwenLM/Qwen-VL) for their open research and exploration.
## 🌟 GitHub Star History
<a href="https://star-history.com/#Tencent-Hunyuan/HunyuanVideo-1.5&Date">
<picture>
<source media="(prefers-color-scheme: dark)" srcset="https://api.star-history.com/svg?repos=Tencent-Hunyuan/HunyuanVideo-1.5&type=Date1&theme=dark" />
<source media="(prefers-color-scheme: light)" srcset="https://api.star-history.com/svg?repos=Tencent-Hunyuan/HunyuanVideo-1.5&type=Date1" />
<img alt="Star History Chart" src="https://api.star-history.com/svg?repos=Tencent-Hunyuan/HunyuanVideo-1.5&type=Date1" />
</picture>
</a>

BIN
assets/speed.png Normal file

Binary file not shown.


44
config.json Normal file

@ -0,0 +1,44 @@
{
"_class_name": "HunyuanVideo_1_5_Pipeline",
"_diffusers_version": "0.35.0",
"byt5_max_length": 256,
"byt5_model": [
"transformers",
"T5Stack"
],
"byt5_tokenizer": [
"transformers",
"ByT5Tokenizer"
],
"default_negative_prompt": null,
"embedded_guidance_scale": null,
"flow_shift": 7.0,
"glyph_byT5_v2": true,
"guidance_scale": 6.0,
"scheduler": [
"hyvideo.schedulers.scheduling_flow_match_discrete",
"FlowMatchDiscreteScheduler"
],
"text_encoder": [
"hyvideo.models.text_encoders",
"TextEncoder"
],
"text_encoder_2": [
null,
null
],
"transformer": [
"hyvideo.models.transformers.hunyuanvideo_1_5_transformer",
"HunyuanVideo_1_5_DiffusionTransformer"
],
"vae": [
"hyvideo.models.autoencoders.hunyuanvideo_15_vae",
"AutoencoderKLConv3D"
],
"vision_encoder": [
"hyvideo.models.vision_encoder",
"VisionEncoder"
],
"vision_num_semantic_tokens": 729,
"vision_states_dim": 1152
}
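
The component entries in the `config.json` above are written as `[module, class]` pairs, with `[null, null]` apparently marking an unused slot. As a minimal sketch (not the project's actual loader), the following Python snippet separates those component pairs from the scalar settings. The helper name `split_config` is hypothetical, and the embedded JSON is an abbreviated copy of the file above.

```python
import json

# Abbreviated copy of the config.json shown above (a subset of its keys).
CONFIG_TEXT = """
{
  "_class_name": "HunyuanVideo_1_5_Pipeline",
  "flow_shift": 7.0,
  "guidance_scale": 6.0,
  "byt5_model": ["transformers", "T5Stack"],
  "scheduler": [
    "hyvideo.schedulers.scheduling_flow_match_discrete",
    "FlowMatchDiscreteScheduler"
  ],
  "text_encoder_2": [null, null],
  "vae": [
    "hyvideo.models.autoencoders.hunyuanvideo_15_vae",
    "AutoencoderKLConv3D"
  ]
}
"""


def is_component_pair(value):
    """True for a two-element list whose items are strings or None."""
    return (
        isinstance(value, list)
        and len(value) == 2
        and all(item is None or isinstance(item, str) for item in value)
    )


def split_config(config):
    """Hypothetical helper: split a config dict into components and settings."""
    components, settings = {}, {}
    for key, value in config.items():
        if is_component_pair(value):
            module, cls = value
            if module is None or cls is None:
                continue  # assumption: [null, null] means the slot is unused
            components[key] = (module, cls)
        else:
            settings[key] = value
    return components, settings


components, settings = split_config(json.loads(CONFIG_TEXT))
for name, (module, cls) in components.items():
    print(f"{name}: {module}.{cls}")
```

Listing the pairs this way makes it easy to see which components come from `transformers` and which come from the repository's own `hyvideo` package before attempting to load anything.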


@ -0,0 +1,12 @@
{
"_class_name": "SRTo1080pUpsampler",
"_diffusers_version": "0.35.0",
"block_out_channels": [
256,
512
],
"is_residual": false,
"num_res_blocks": 2,
"out_channels": 32,
"z_channels": 32
}


@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:a3630a7e0ab21084aa69e5a649f59fba34a7110733c9ca3d4fee781323758198
size 201404760
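
The three lines above are a Git LFS pointer: the spec version, the SHA-256 object ID of the real file, and its size in bytes. A minimal sketch of how a downloaded weight file could be checked against such a pointer is shown below. The function names `parse_lfs_pointer` and `matches_pointer` are illustrative, not part of any Git LFS tooling.

```python
import hashlib

# Pointer text copied from the LFS pointer file shown above.
POINTER_TEXT = """version https://git-lfs.github.com/spec/v1
oid sha256:a3630a7e0ab21084aa69e5a649f59fba34a7110733c9ca3d4fee781323758198
size 201404760
"""


def parse_lfs_pointer(text):
    """Parse the 'key value' lines of a Git LFS pointer into a dict."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path, pointer, chunk_size=1 << 20):
    """Illustrative check: compare a local file's size and SHA-256 to the pointer."""
    hasher = hashlib.sha256()
    size = 0
    with open(path, "rb") as handle:
        # Read in chunks so large weight files are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
            size += len(chunk)
    return size == pointer["size"] and hasher.hexdigest() == pointer["digest"]


pointer = parse_lfs_pointer(POINTER_TEXT)
print(pointer["algo"], pointer["size"])
```

Calling `matches_pointer(local_path, pointer)` on a downloaded file returns `True` only when both the byte count and the hash agree, which catches truncated downloads as well as corrupted ones.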


@ -0,0 +1,9 @@
{
"_class_name": "SRTo720pUpsampler",
"_diffusers_version": "0.35.0",
"global_residual": false,
"hidden_channels": 128,
"in_channels": 32,
"num_blocks": 16,
"out_channels": 32
}
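
Both upsampler configs in this commit declare `out_channels: 32`. The 720p config names its input width `in_channels`, while the 1080p config has `z_channels` instead. The sketch below reads the channel counts from both, under the assumption (not stated in the source) that `z_channels` plays the role of the input channel count and that both upsamplers operate on the same 32-channel latent space. The helper `latent_channels` is hypothetical.

```python
import json

# Copies of the two upsampler configs shown in this commit.
SR_1080P = json.loads("""
{
  "_class_name": "SRTo1080pUpsampler",
  "block_out_channels": [256, 512],
  "is_residual": false,
  "num_res_blocks": 2,
  "out_channels": 32,
  "z_channels": 32
}
""")

SR_720P = json.loads("""
{
  "_class_name": "SRTo720pUpsampler",
  "global_residual": false,
  "hidden_channels": 128,
  "in_channels": 32,
  "num_blocks": 16,
  "out_channels": 32
}
""")


def latent_channels(config):
    """Hypothetical helper returning (input, output) channel counts.

    Assumption: when "in_channels" is absent, "z_channels" is the input width.
    """
    in_channels = config.get("in_channels", config.get("z_channels"))
    return in_channels, config["out_channels"]


for config in (SR_720P, SR_1080P):
    print(config["_class_name"], latent_channels(config))
```

If that assumption holds, matching channel counts would be a cheap precondition to verify before chaining the two super-resolution stages.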


@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:9f04d9306f4f2159bd6351f069c0d0e7e0b8c1dc047eb7fd3954f0d357f85203
size 85854616