Upload folder using ModelScope SDK

This commit is contained in:
Cherrytest
2025-10-16 01:34:34 +00:00
parent bdba42b777
commit 5abc4d1b09
11 changed files with 258 additions and 42 deletions

.gitattributes vendored

@@ -45,3 +45,11 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
seedvr2_ema_3b_fp8_e4m3fn.safetensors filter=lfs diff=lfs merge=lfs -text
seedvr2_ema_7b_fp8_e4m3fn.safetensors filter=lfs diff=lfs merge=lfs -text
seedvr2_ema_7b_sharp_fp8_e4m3fn.safetensors filter=lfs diff=lfs merge=lfs -text
seedvr2_ema_3b_fp16.safetensors filter=lfs diff=lfs merge=lfs -text
seedvr2_ema_7b_fp16.safetensors filter=lfs diff=lfs merge=lfs -text
seedvr2_ema_7b_sharp_fp16.safetensors filter=lfs diff=lfs merge=lfs -text
ema_vae_fp16.safetensors filter=lfs diff=lfs merge=lfs -text

README.md

@@ -1,48 +1,234 @@
---
license: apache-2.0
pipeline_tag: video-to-video
library_name: diffusers
tags:
- art
base_model:
- ByteDance-Seed/SeedVR2-7B
- ByteDance-Seed/SeedVR2-3B
---
# ComfyUI-SeedVR2_VideoUpscaler

[![View Code](https://img.shields.io/badge/📂_View_Code-GitHub-181717?style=for-the-badge&logo=github)](https://github.com/numz/ComfyUI-SeedVR2_VideoUpscaler)

Official release of [SeedVR2](https://github.com/ByteDance-Seed/SeedVR) for ComfyUI that enables upscaling of videos and images.
<img src="https://raw.githubusercontent.com/numz/ComfyUI-SeedVR2_VideoUpscaler/refs/heads/main/docs/demo_01.jpg">
<img src="https://raw.githubusercontent.com/numz/ComfyUI-SeedVR2_VideoUpscaler/refs/heads/main/docs/demo_02.jpg">
<img src="https://raw.githubusercontent.com/numz/ComfyUI-SeedVR2_VideoUpscaler/refs/heads/main/docs/usage.png">
## 📋 Quick Access
- [🆙 Note and future releases](#-note-and-future-releases)
- [🚀 Updates](#-updates)
- [🎯 Features](#-features)
- [🔧 Requirements](#-requirements)
- [📦 Installation](#-installation)
- [📖 Usage](#-usage)
- [📊 Benchmarks](#-benchmarks)
- [⚠️ Limitations](#-limitations)
- [🤝 Contributing](#-contributing)
- [🙏 Credits](#-credits)
- [📄 License](#-license)
## 🆙 Note and future releases
- Improve FP8 integration; we are losing some of the FP8 advantages during the process.
- Tile-VAE integration, if it works for video; I still have tests to run, and if some dev wants to help, you are welcome.
- The 7B FP8 model seems to have quality issues; use 7B FP16 instead (if FP8 doesn't give OOM, FP16 will work too). I have to review this.
## 🚀 Updates

**2025.06.30**
- 🚀 Sped up the process with less VRAM used (see new benchmark).
- 🛠️ Fixed a memory leak on 3B models.
- ❌ The process can now be interrupted if needed.
- ✅ Refactored the code for better sharing with the community; feel free to propose pull requests.
- 🛠️ Removed the flash attention dependency.

**2025.06.24**
- 🚀 Sped up the process by up to 4x (see new benchmark).

**2025.06.22**
- 💪 FP8 compatibility!
- 🚀 Sped up the whole process.
- 🚀 Less VRAM consumption (still high: batch_size=1 max for an RTX 4090, I'm trying to fix that).
- 🛠️ Better benchmark coming soon.

**2025.06.20**
- 🛠️ Initial push.
## 🎯 Features
- High-quality upscaling
- Suitable for any video length once the right settings are found
- Models are downloaded automatically from [Models](https://huggingface.co/numz/SeedVR2_comfyUI/tree/main)
## 🔧 Requirements
- Huge VRAM capacity is better; from my tests, even the 3B version needs a lot of VRAM, at least 18GB.
- Latest ComfyUI version with Python 3.12.9 (may work with older versions, but I haven't tested them).
## 📦 Installation
1. Clone this repository into your ComfyUI custom nodes directory:
```bash
cd ComfyUI/custom_nodes
git clone https://github.com/numz/ComfyUI-SeedVR2_VideoUpscaler.git
```

2. Install the required dependencies:
Load your venv and run:
```bash
pip install -r ComfyUI-SeedVR2_VideoUpscaler/requirements.txt
```

Optionally install flash_attn/triton; it makes the process about 6% faster but is not mandatory:
```bash
pip install flash_attn
pip install triton
```

or, with the portable build:
```bash
python_embeded\python.exe -m pip install flash_attn
```

For Windows wheels, check https://github.com/loscrossos/lib_flashattention/releases and https://github.com/woct0rdho/triton-windows
3. Models

Models will be downloaded automatically into:
`models/SEEDVR2`

or can be found here: [MODELS](https://huggingface.co/numz/SeedVR2_comfyUI/tree/main)
## 📖 Usage
1. In ComfyUI, locate the **SeedVR2 Video Upscaler** node in the node menu.
<img src="https://raw.githubusercontent.com/numz/ComfyUI-SeedVR2_VideoUpscaler/refs/heads/main/docs/node.png" width="100%">
2. ⚠️ **THINGS TO KNOW!!**

**Temporal consistency**: a **batch_size** of at least 5 is required to activate temporal consistency; SeedVR2 needs at least 5 frames to compute it. A higher batch_size gives better performance/results but needs more than 24GB of VRAM.

**VRAM usage**: the input video resolution impacts VRAM consumption during the process; the larger the input video, the more VRAM is consumed. So if you experience OOMs with a batch_size of at least 5, try reducing the input video resolution until it resolves.
Of course, the output resolution also has an impact, so if your hardware doesn't allow it, reduce the output resolution.
3. Configure the node parameters:
- `model`: Select your 3B or 7B model
- `seed`: a seed; another seed is generated from this one
- `new_resolution`: desired short edge in px; the other edge keeps the aspect ratio
- `batch_size`: VERY IMPORTANT! This model consumes a lot of VRAM (all of it, even for the 3B model), so for GPUs under 24GB VRAM keep this value low. A good value is "1" without temporal consistency and "5" with it; the higher this value, the better the result.
- `preserve_vram`: for VRAM < 24GB. If true, unused models are unloaded during the process; slower, but it works. Otherwise you will probably hit an OOM.
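To make `new_resolution` and `batch_size` concrete, here are two hypothetical helpers (not part of the node, just illustrations of the behavior described above):

```python
def scale_to_short_edge(width, height, new_short):
    """Scale (width, height) so the short edge equals new_short, keeping the aspect ratio."""
    scale = new_short / min(width, height)
    return round(width * scale), round(height * scale)

def batch_sizes(n_frames, batch_size):
    """Sizes of the consecutive batches a clip is split into; temporal
    consistency needs every batch to hold at least 5 frames."""
    return [min(batch_size, n_frames - i) for i in range(0, n_frames, batch_size)]

print(scale_to_short_edge(512, 768, 1080))  # -> (1080, 1620), as in the benchmarks
print(batch_sizes(27, 9))                   # -> [9, 9, 9], every batch >= 5 frames
```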
## 📊 Benchmarks
**7B models on NVIDIA H100 93GB VRAM** (values in parentheses are from the previous benchmark):
| nb frames | Resolution (in → out) | Batch Size | execution time fp8 (s) | FPS fp8 | execution time fp16 (s) | FPS fp16 | perf progress since start |
| --------- | --------------------- | ---------- | ---------------------- | ------- | ----------------------- | -------- | ------------------------- |
| 15 | 512×768 1080×1620 | 5 | 23.75 (26.71) | 0.63 (0.56) | 24.23 (27.75) | 0.61 (0.54) (0.10) | x6.1 |
| 27 | 512×768 1080×1620 | 9 | 27.75 (33.97) | 0.97 (0.79) | 28.48 (35.08) | 0.94 (0.77) (0.15) | x6.2 |
| 39 | 512×768 1080×1620 | 13 | 32.02 (41.01) | 1.21 (0.95) | 32.62 (42.08) | 1.19 (0.93) (0.19) | x6.2 |
| 51 | 512×768 1080×1620 | 17 | 36.39 (48.12) | 1.40 (1.06) | 37.30 (49.44) | 1.36 (1.03) (0.21) | x6.4 |
| 63 | 512×768 1080×1620 | 21 | 40.80 (55.40) | 1.54 (1.14) | 41.32 (56.70) | 1.52 (1.11) (0.23) | x6.6 |
| 75 | 512×768 1080×1620 | 25 | 45.37 (62.60) | 1.65 (1.20) | 45.79 (63.80) | 1.63 (1.18) (0.24) | x6.8 |
| 123 | 512×768 1080×1620 | 41 | 62.44 (91.38) | 1.96 (1.35) | 62.28 (92.90) | 1.97 (1.32) (0.28) | x7.0 |
| 243 | 512×768 1080×1620 | 81 | 106.13 (164.25) | 2.28 (1.48) | 104.68 (166.09) | 2.32 (1.46) (0.31) | x7.4 |
| 363 | 512×768 1080×1620 | 121 | 151.01 (238.18) | 2.40 (1.52) | 148.67 (239.80) | 2.44 (1.51) (0.33) | x7.4 |
| 453 | 512×768 1080×1620 | 151 | 186.98 (296.52) | 2.42 (1.53) | 184.11 (298.65) | 2.46 (1.52) (0.33) | x7.4 |
| 633 | 512×768 1080×1620 | 211 | 253.77 (406.65) | 2.49 (1.56) | 249.43 (409.44) | 2.53 (1.55) (0.34) | x7.4 |
| 903 | 512×768 1080×1620 | 301 | OOM (OOM) | (OOM) | OOM (OOM) | (OOM) (OOM) | |
| 149 | 854x480 1920x1080 | 149 | | | 450.22 | 0.41 | |
**3B FP8 models on NVIDIA H100 93GB VRAM** (values in parentheses are from the previous benchmark):
| nb frames | Resolution (in → out) | Batch Size | execution time fp8 (s) | FPS fp8 | execution time fp16 (s) | FPS fp16 |
| --------- | --------------------- | ---------- | ---------------------- | ------- | ----------------------- | -------- |
| 149 | 854x480 1920x1080 | 149 | 361.22 | 0.41 | | |
**NVIDIA RTX4090 24GB VRAM**
| Model | nb frames | Resolution (in → out) | Batch Size | execution time (seconds) | FPS | Note |
| ----- | --------- | --------------------- | ---------- | ------------------------ | --- | ---- |
| 3B fp8 | 5 | 512x768 1080x1620 | 1 | 14.66 (22.52) | 0.34 (0.22) | |
| 3B fp16 | 5 | 512x768 1080x1620 | 1 | 17.02 (27.84) | 0.29 (0.18) | |
| 7B fp8 | 5 | 512x768 1080x1620 | 1 | 46.23 (75.51) | 0.11 (0.07) | preserve_memory=on |
| 7B fp16 | 5 | 512x768 1080x1620 | 1 | 43.58 (78.93) | 0.11 (0.06) | preserve_memory=on |
| 3B fp8 | 10 | 512x768 1080x1620 | 5 | 39.75 | 0.25 | preserve_memory=on |
| 3B fp8 | 100 | 512x768 1080x1620 | 5 | 322.77 | 0.31 | preserve_memory=on |
| 3B fp8 | 1000 | 512x768 1080x1620 | 5 | 3624.08 | 0.28 | preserve_memory=on |
| 3B fp8 | 20 | 512x768 1080x1620 | 1 | 40.71 (65.40) | 0.49 (0.31) | |
| 3B fp16 | 20 | 512x768 1080x1620 | 1 | 44.76 (91.12) | 0.45 (0.22) | |
| 3B fp8 | 20 | 512x768 1280x1920 | 1 | 61.14 (89.10) | 0.33 (0.22) | |
| 3B fp8 | 20 | 512x768 1480x2220 | 1 | 79.66 (136.08) | 0.25 (0.15) | |
| 3B fp8 | 20 | 512x768 1620x2430 | 1 | 125.79 (191.28) | 0.16 (0.10) | preserve_memory=off (preserve_memory=on) |
| 3B fp8 | 149 | 854x480 1920x1080 | 5 | 782.76 | 0.19 | preserve_memory=on |
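The FPS columns above are just frame count divided by execution time; a quick sanity check (figures taken from the tables above; small differences come from rounding):

```python
def fps(n_frames, seconds):
    """Throughput in frames per second, as reported in the benchmark tables."""
    return n_frames / seconds

# 7B fp8 on H100: 243 frames in 106.13 s
print(round(fps(243, 106.13), 2))  # ~2.29, vs 2.28 in the table
# 3B fp8 on RTX 4090: 20 frames in 40.71 s
print(round(fps(20, 40.71), 2))    # ~0.49, matching the table
```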
## ⚠️ Limitations
- Uses a lot of VRAM; it will take all of it!
- Processing speed depends on GPU capabilities
## 🤝 Contributing
Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.
Please make sure to update tests as appropriate.
### How to contribute:
1. Fork the repository
2. Create your feature branch (`git checkout -b feature/AmazingFeature`)
3. Commit your changes (`git commit -m 'Add some AmazingFeature'`)
4. Push to the branch (`git push origin feature/AmazingFeature`)
5. Open a Pull Request
### Development Setup:
1. Clone the repository
2. Install dependencies
3. Make your changes
4. Test your changes
5. Submit a pull request
### Code Style:
- Follow the existing code style
- Add comments for complex logic
- Update documentation if needed
- Ensure all tests pass
### Reporting Issues:
When reporting issues, please include:
- Your system specifications
- ComfyUI version
- Python version
- Error messages
- Steps to reproduce the issue
## 🙏 Credits
- Original [SeedVR2](https://github.com/ByteDance-Seed/SeedVR) implementation
## 📄 License
- The code in this repository is released under the MIT license as found in the [LICENSE file](LICENSE).

config.json Normal file

configuration.json Normal file

@@ -0,0 +1 @@
{"framework": "pytorch", "task": "video-to-video", "allow_remote": true}

ema_vae_fp16.safetensors Normal file

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:20678548f420d98d26f11442d3528f8b8c94e57ee046ef93dbb7633da8612ca1
size 501324814

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:2fd0e03a3dad24e07086750360727ca437de4ecd456f769856e960ae93e2b304
size 6783018808

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:3bf1e43ebedd570e7e7a0b1b60d6a02e105978f505c8128a241cde99a8240cff
size 3391544696

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:7b8241aa957606ab6cfb66edabc96d43234f9819c5392b44d2492d9f0b0bbe4a
size 16479334424

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:1fdbf3877b7d1eb266038d3a165a977f17dbb4daa4a0f0d334d5461476963037
size 8239729704

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:20a93e01ff24beaeebc5de4e4e5be924359606c356c9c51509fba245bd2d77dd
size 16479334424

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:4261d18fd9c331f4c8f14b475c9148bd8c3f1512240ace55fe31a179e0a960b0
size 8239729704