## 👉🏻 IndexTTS2 👈🏻
<center><h3>IndexTTS2: A Breakthrough in Emotionally Expressive and Duration-Controlled Auto-Regressive Zero-Shot Text-to-Speech</h3></center>
<div align="center">
<a href='https://arxiv.org/abs/2506.21619'>
<img src='https://img.shields.io/badge/ArXiv-2506.21619-red?logo=arxiv'/>
</a>
<br/>
<a href='https://github.com/index-tts/index-tts'>
<img src='https://img.shields.io/badge/GitHub-Code-orange?logo=github'/>
</a>
<a href='https://index-tts.github.io/index-tts2.github.io/'>
<img src='https://img.shields.io/badge/GitHub-Demo-orange?logo=github'/>
</a>
<br/>
<a href='https://huggingface.co/spaces/IndexTeam/IndexTTS'>
<img src='https://img.shields.io/badge/HuggingFace-Demo-blue?logo=huggingface'/>
</a>
<a href='https://huggingface.co/IndexTeam/IndexTTS-2.0'>
<img src='https://img.shields.io/badge/HuggingFace-Model-blue?logo=huggingface' />
</a>
<br/>
<a href='https://modelscope.cn/studios/IndexTeam/IndexTTS-Demo'>
<img src='https://img.shields.io/badge/ModelScope-Demo-purple?logo=modelscope'/>
</a>
<a href='https://modelscope.cn/models/IndexTeam/IndexTTS-2.0'>
<img src='https://img.shields.io/badge/ModelScope-Model-purple?logo=modelscope'/>
</a>
</div>

### Contact
QQ Group: 553460296 (No.1), 1048202584 (No.2), 764630270 (No.3) \
Discord: https://discord.gg/uT32E7KDmy \
Email: indexspeech@bilibili.com \
Everyone is welcome to join the discussion!
## 📣 Updates
- `2025/09/08` 🔥🔥🔥 We released **IndexTTS-2**.
    - The first autoregressive TTS model with precise control over synthesis duration, supporting both duration-controlled and uncontrolled modes.
    - The model achieves highly expressive emotional speech synthesis, with emotion control available through multiple input modalities.
- `2025/05/14` 🔥🔥 We released **IndexTTS-1.5**, which significantly improves the model's stability and its English-language performance.
- `2025/03/25` 🔥 We released the **IndexTTS-1.0** model parameters and inference code.
- `2025/02/12` 🔥 We submitted our paper to arXiv and released our demos and test sets.
## Acknowledgements
1. [tortoise-tts](https://github.com/neonbjb/tortoise-tts)
2. [XTTSv2](https://github.com/coqui-ai/TTS)
3. [BigVGAN](https://github.com/NVIDIA/BigVGAN)
4. [wenet](https://github.com/wenet-e2e/wenet/tree/main)
5. [icefall](https://github.com/k2-fsa/icefall)
6. [maskgct](https://github.com/open-mmlab/Amphion/tree/main/models/tts/maskgct)
7. [seed-vc](https://github.com/Plachtaa/seed-vc)
## 📚 Citation
🌟 If you find our work helpful, please leave us a star and cite our paper.

IndexTTS2
```
@article{zhou2025indextts2,
title={IndexTTS2: A Breakthrough in Emotionally Expressive and Duration-Controlled Auto-Regressive Zero-Shot Text-to-Speech},
author={Siyi Zhou and Yiquan Zhou and Yi He and Xun Zhou and Jinchao Wang and Wei Deng and Jingchen Shu},
journal={arXiv preprint arXiv:2506.21619},
year={2025}
}
```
IndexTTS
```
@article{deng2025indextts,
title={IndexTTS: An Industrial-Level Controllable and Efficient Zero-Shot Text-To-Speech System},
author={Wei Deng and Siyi Zhou and Jingchen Shu and Jinchao Wang and Lu Wang},
journal={arXiv preprint arXiv:2502.05512},
year={2025},
doi={10.48550/arXiv.2502.05512},
url={https://arxiv.org/abs/2502.05512}
}
```