Update README.md

Cherrytest
2025-09-05 07:37:09 +00:00
parent 026645669b
commit 2b946f9d29
5 changed files with 3763 additions and 4 deletions

2
.gitattributes vendored

@ -47,3 +47,5 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text *tfevents* filter=lfs diff=lfs merge=lfs -text
s3gen.safetensors filter=lfs diff=lfs merge=lfs -text s3gen.safetensors filter=lfs diff=lfs merge=lfs -text
t3_cfg.safetensors filter=lfs diff=lfs merge=lfs -text t3_cfg.safetensors filter=lfs diff=lfs merge=lfs -text
Cangjie5_TC.json filter=lfs diff=lfs merge=lfs -text

BIN
Cangjie5_TC.json (Stored with Git LFS) Normal file

Binary file not shown.


@ -1,12 +1,36 @@
--- ---
license: mit license: mit
language: language:
- ar
- da
- de
- el
- en - en
- es
- fi
- fr
- he
- hi
- it
- ja
- ko
- ms
- nl
- no
- pl
- pt
- ru
- sv
- sw
- tr
- zh
pipeline_tag: text-to-speech
tags: tags:
- text-to-speech - text-to-speech
- speech generation - speech
- speech-generation
- voice-cloning - voice-cloning
pipeline_tag: text-to-speech - multilingual-tts
library_name: chatterbox library_name: chatterbox
--- ---
@ -31,15 +55,17 @@ library_name: chatterbox
<img width="100" alt="resemble-logo-horizontal" src="https://github.com/user-attachments/assets/35cf756b-3506-4943-9c72-c05ddfa4e525" /> <img width="100" alt="resemble-logo-horizontal" src="https://github.com/user-attachments/assets/35cf756b-3506-4943-9c72-c05ddfa4e525" />
</div> </div>
**09/04 🔥 Introducing Chatterbox Multilingual in 23 Languages!**
We're excited to introduce Chatterbox, [Resemble AI's](https://resemble.ai) first production-grade open source TTS model. Licensed under MIT, Chatterbox has been benchmarked against leading closed-source systems like ElevenLabs, and is consistently preferred in side-by-side evaluations. We're excited to introduce **Chatterbox** and **Chatterbox Multilingual**, [Resemble AI's](https://resemble.ai) production-grade open source TTS models. Chatterbox Multilingual supports **Arabic, Danish, German, Greek, English, Spanish, Finnish, French, Hebrew, Hindi, Italian, Japanese, Korean, Malay, Dutch, Norwegian, Polish, Portuguese, Russian, Swedish, Swahili, Turkish, Chinese** out of the box. Licensed under MIT, Chatterbox has been benchmarked against leading closed-source systems like ElevenLabs, and is consistently preferred in side-by-side evaluations.
Whether you're working on memes, videos, games, or AI agents, Chatterbox brings your content to life. It's also the first open source TTS model to support **emotion exaggeration control**, a powerful feature that makes your voices stand out. Try it now on our [Hugging Face Gradio app.](https://huggingface.co/spaces/ResembleAI/Chatterbox) Whether you're working on memes, videos, games, or AI agents, Chatterbox brings your content to life. It's also the first open source TTS model to support **emotion exaggeration control**, a powerful feature that makes your voices stand out. Try it now on our [Hugging Face Gradio app.](https://huggingface.co/spaces/ResembleAI/Chatterbox)
If you like the model but need to scale or tune it for higher accuracy, check out our competitively priced TTS service (<a href="https://resemble.ai">link</a>). It delivers reliable performance with ultra-low latency of sub 200ms—ideal for production use in agents, applications, or interactive media. If you like the model but need to scale or tune it for higher accuracy, check out our competitively priced TTS service (<a href="https://resemble.ai">link</a>). It delivers reliable performance with ultra-low latency of sub 200ms—ideal for production use in agents, applications, or interactive media.
# Key Details # Key Details
- SoTA zeroshot TTS - Multilingual, zero-shot TTS supporting 23 languages
- SoTA zeroshot English TTS
- 0.5B Llama backbone - 0.5B Llama backbone
- Unique exaggeration/intensity control - Unique exaggeration/intensity control
- Ultra-stable with alignment-informed inference - Ultra-stable with alignment-informed inference
@ -57,6 +83,10 @@ If you like the model but need to scale or tune it for higher accuracy, check ou
- Try lower `cfg` values (e.g. `~0.3`) and increase `exaggeration` to around `0.7` or higher. - Try lower `cfg` values (e.g. `~0.3`) and increase `exaggeration` to around `0.7` or higher.
- Higher `exaggeration` tends to speed up speech; reducing `cfg` helps compensate with slower, more deliberate pacing. - Higher `exaggeration` tends to speed up speech; reducing `cfg` helps compensate with slower, more deliberate pacing.
***Note:*** Ensure that the reference clip matches the specified language tag. Otherwise, language transfer outputs may inherit the accent of the reference clip's language.
***To mitigate this, set the CFG weight to 0.***
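The note above can be sketched as a small helper that chooses generation arguments. This is a minimal sketch, not part of the library: `build_generate_kwargs` and `reference_language` are hypothetical names, the keyword `cfg_weight` and its default of `0.5` are assumptions about how `generate` exposes the `cfg` setting, and only `language_id` and `audio_prompt_path` appear in this README.

```python
from typing import Any, Dict, Optional


def build_generate_kwargs(
    language_id: str,
    reference_language: Optional[str] = None,
    audio_prompt_path: Optional[str] = None,
    cfg_weight: float = 0.5,  # assumed default, not confirmed by the README
) -> Dict[str, Any]:
    """Assemble keyword arguments for a multilingual generate() call.

    If a reference clip is supplied and its language differs from the
    requested language, the CFG weight is forced to 0 so the output is
    less likely to inherit the reference clip's accent.
    """
    kwargs: Dict[str, Any] = {"language_id": language_id}
    if audio_prompt_path is not None:
        kwargs["audio_prompt_path"] = audio_prompt_path
        # Mismatched reference language: apply the mitigation from the note.
        if reference_language is not None and reference_language != language_id:
            cfg_weight = 0.0
    kwargs["cfg_weight"] = cfg_weight  # hypothetical keyword name
    return kwargs
```

Under those assumptions, the result could be unpacked into a call such as `multilingual_model.generate(text, **build_generate_kwargs("fr", reference_language="en", audio_prompt_path="YOUR_FILE.wav"))`.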
# Installation # Installation
``` ```
@ -80,8 +110,25 @@ AUDIO_PROMPT_PATH="YOUR_FILE.wav"
wav = model.generate(text, audio_prompt_path=AUDIO_PROMPT_PATH) wav = model.generate(text, audio_prompt_path=AUDIO_PROMPT_PATH)
ta.save("test-2.wav", wav, model.sr) ta.save("test-2.wav", wav, model.sr)
``` ```
# Multilingual Quickstart
```python
import torchaudio as ta
from chatterbox.mtl_tts import ChatterboxMultilingualTTS

# Load the multilingual model on the GPU.
multilingual_model = ChatterboxMultilingualTTS.from_pretrained(device="cuda")

# French: the language is selected with its language_id code.
french_text = "Bonjour, comment ça va? Ceci est le modèle de synthèse vocale multilingue Chatterbox, il prend en charge 23 langues."
wav_french = multilingual_model.generate(french_text, language_id="fr")
# Save at the multilingual model's own sample rate.
ta.save("test-french.wav", wav_french, multilingual_model.sr)

# Chinese
chinese_text = "你好,今天天气真不错,希望你有一个愉快的周末。"
wav_chinese = multilingual_model.generate(chinese_text, language_id="zh")
ta.save("test-chinese.wav", wav_chinese, multilingual_model.sr)
```
See `example_tts.py` for more examples. See `example_tts.py` for more examples.
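Because `language_id` takes a short code, it may help to validate it before calling `generate`. The sketch below lists the 23 codes from the model card metadata alongside the language names given in the introduction; `SUPPORTED_LANGUAGES` and `normalize_language_id` are hypothetical helpers written for illustration, not part of the `chatterbox` package.

```python
# Codes and names taken from the README's language list (23 entries).
SUPPORTED_LANGUAGES = {
    "ar": "Arabic", "da": "Danish", "de": "German", "el": "Greek",
    "en": "English", "es": "Spanish", "fi": "Finnish", "fr": "French",
    "he": "Hebrew", "hi": "Hindi", "it": "Italian", "ja": "Japanese",
    "ko": "Korean", "ms": "Malay", "nl": "Dutch", "no": "Norwegian",
    "pl": "Polish", "pt": "Portuguese", "ru": "Russian", "sv": "Swedish",
    "sw": "Swahili", "tr": "Turkish", "zh": "Chinese",
}


def normalize_language_id(language_id: str) -> str:
    """Return a lower-cased, trimmed code, or raise if it is not listed."""
    code = language_id.strip().lower()
    if code not in SUPPORTED_LANGUAGES:
        supported = ", ".join(sorted(SUPPORTED_LANGUAGES))
        raise ValueError(
            f"Unsupported language_id {language_id!r}; expected one of: {supported}"
        )
    return code
```

For example, `normalize_language_id(" FR ")` yields `"fr"`, which can then be passed as `language_id`.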
# Acknowledgements # Acknowledgements
- [Cosyvoice](https://github.com/FunAudioLLM/CosyVoice) - [Cosyvoice](https://github.com/FunAudioLLM/CosyVoice)
- [HiFT-GAN](https://github.com/yl4579/HiFTNet) - [HiFT-GAN](https://github.com/yl4579/HiFTNet)

3704
mtl_tokenizer.json Normal file

File diff suppressed because it is too large

BIN
t3_23lang.safetensors (Stored with Git LFS) Normal file

Binary file not shown.