update

2026-07-16 14:32:52 +08:00 · 2025-12-15 06:49:33 +00:00
parent be169d77b5
commit 60ce579dc2
1 changed files with 20 additions and 24 deletions
--- a/README.md
+++ b/README.md
@ -1,8 +1,3 @@
 ---
 frameworks:
 - ""
 tasks: []
 ---
 [![SVG Banners](https://svg-banners.vercel.app/api?type=origin&text1=CosyVoice🤠&text2=Text-to-Speech%20💖%20Large%20Language%20Model&width=800&height=210)](https://github.com/Akshay090/svg-banners)
 ## 👉🏻 CosyVoice 👈🏻
@ -17,7 +12,7 @@ tasks: []
 **Fun-CosyVoice 3.0** is an advanced text-to-speech (TTS) system based on large language models (LLM), surpassing its predecessor (CosyVoice 2.0) in content consistency, speaker similarity, and prosody naturalness. It is designed for zero-shot multilingual speech synthesis in the wild.
 ### Key Features
- **Language Coverage**: Covers 9 common languages (Chinese, English, Japanese, Korean, German, Spanish, French, Italian, Russian), 18+ Chinese dialects/accents (Guangdong, Minnan, Sichuan, Dongbei, Shan3xi, Shan1xi, Shanghai, Tianjin, Shan1dong, Ningxia, Gansu, etc.) and meanwhile supports both multi-lingual/cross-lingual zero-shot voice cloning.
+- **Language Coverage**: Covers 9 common languages (Chinese, English, Japanese, Korean, German, Spanish, French, Italian, Russian), 18+ Chinese dialects/accents (Guangdong, Minnan, Sichuan, Dongbei, Shan3xi, Shan1xi, Shanghai, Tianjin, Shandong, Ningxia, Gansu, etc.) and meanwhile supports both multi-lingual/cross-lingual zero-shot voice cloning.
 - **Content Consistency & Naturalness**: Achieves state-of-the-art performance in content consistency, speaker similarity, and prosody naturalness.
 - **Pronunciation Inpainting**: Supports pronunciation inpainting of Chinese Pinyin and English CMU phonemes, providing more controllability and thus suitable for production use.
 - **Text Normalization**: Supports reading of numbers, special symbols and various text formats without a traditional frontend module.
@ -65,23 +60,25 @@ tasks: []
    - [x] Fastapi server and client
 ## Evaluation
-| Model | CER (%) ↓ (test-zh) | WER (%) ↓ (test-en) | CER (%) ↓ (test-hard) |
+
-|-----|------------------|------------------|------------------|
+| Model | Open-Source | Model Size | test-zh<br>CER (%) ↓ | test-zh<br>Speaker Similarity (%) ↑ | test-en<br>WER (%) ↓ | test-en<br>Speaker Similarity (%) ↑ | test-hard<br>CER (%) ↓ | test-hard<br>Speaker Similarity (%) |
-| Human | 1.26 | 2.14 | - |
+| :--- | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
-| F5-TTS | 1.53 | 2.00 | 8.67 |
+| Human | - | - | 1.26 | 75.5 | 2.14 | 73.4 | - | - |
-| SparkTTS | 1.20 | 1.98 | - |
+| Seed-TTS | ❌ | - | 1.12 | 79.6 | 2.25 | 76.2 | 7.59 | 77.6 |
-| Seed-TTS | 1.12 | 2.25 | 7.59 |
+| MiniMax-Speech | ❌ | - | 0.83 | 78.3 | 1.65 | 69.2 | - | - |
-| CosyVoice2 | 1.45 | 2.57 | 6.83 |
+| F5-TTS | ✅ | 0.3B | 1.52 | 74.1 | 2.00 | 64.7 | 8.67 | 71.3 |
-| FireRedTTS-2 | 1.14 | 1.95 | - |
+| Spark TTS | ✅ | 0.5B | 1.2 | 66.0 | 1.98 | 57.3 | - | - |
-| IndexTTS2 | 1.01 | 1.52 | 7.12 |
+| CosyVoice2 | ✅ | 0.5B | 1.45 | 75.7 | 2.57 | 65.9 | 6.83 | 72.4 |
-| VibeVoice | 1.16 | 3.04 | - |
+| FireRedTTS 2 | ✅ | 1.5B | 1.14 | 73.2 | 1.95 | 66.5 | - | - |
-| HiggsAudio | 1.79 | 2.44 | - |
+| Index-TTS2 | ✅ | 1.5B | 1.03 | 76.5 | 2.23 | 70.6 | 7.12 | 75.5 |
-| MiniMax-Speech | 0.83 | 1.65 | - |
+| VibeVoice-1.5B | ✅ | 1.5B | 1.16 | 74.4 | 3.04 | 68.9 | - | - |
-| VoxPCM | 0.93 | 1.85 | 8.87 |
+| VibeVoice-Realtime | ✅ | 0.5B | - | - | 2.05 | 63.3 | - | - |
-| GLM-TTS | 1.03 | - | - |
+| HiggsAudio-v2 | ✅ | 3B | 1.50 | 74.0 | 2.44 | 67.7 | - | - |
-| GLM-TTS_RL | 0.89 | - | - |
+| VoxCPM | ✅ | 0.5B | 0.93 | 77.2 | 1.85 | 72.9 | 8.87 | 73.0 |
-| Fun-CosyVoice3-0.5B-2512 | 1.21 |  2.24 | 6.71 |
+| GLM-TTS | ✅ | 1.5B | 1.03 | 76.1 | - | - | - | - |
-| Fun-CosyVoice3-0.5B-2512_RL | 0.81 | 1.68 | 5.44 |
+| GLM-TTS RL | ✅ | 1.5B | 0.89 | 76.4 | - | - | - | - |
 | Fun-CosyVoice3-0.5B | ✅ | 0.5B | 1.21 | 78.0 | 2.24 | 71.8 | 6.71 | 75.8 |
 | Fun-CosyVoice3-0.5B-2512 | ✅ | 0.5B | 0.81 | 77.4 | 1.68 | 69.5 | 5.44 | 75.0 |
 ## Install
@ -252,4 +249,3 @@ You can also scan the QR code to join our official Dingding chat group.
 ## Disclaimer
 The content provided above is for academic purposes only and is intended to demonstrate technical capabilities. Some examples are sourced from the internet. If any content infringes on your rights, please contact us to request its removal.