mirror of
https://www.modelscope.cn/IndexTeam/IndexTTS-2.git
synced 2026-04-02 19:52:53 +08:00
## 👉🏻 IndexTTS2 👈🏻

<center><h3>IndexTTS2: A Breakthrough in Emotionally Expressive and Duration-Controlled Auto-Regressive Zero-Shot Text-to-Speech</h3></center>

<div align="center">
<a href='https://arxiv.org/abs/2506.21619'>
  <img src='https://img.shields.io/badge/ArXiv-2506.21619-red?logo=arxiv'/>
</a>
<br/>
<a href='https://github.com/index-tts/index-tts'>
  <img src='https://img.shields.io/badge/GitHub-Code-orange?logo=github'/>
</a>
<a href='https://index-tts.github.io/index-tts2.github.io/'>
  <img src='https://img.shields.io/badge/GitHub-Demo-orange?logo=github'/>
</a>
<br/>
<a href='https://huggingface.co/spaces/IndexTeam/IndexTTS'>
  <img src='https://img.shields.io/badge/HuggingFace-Demo-blue?logo=huggingface'/>
</a>
<a href='https://huggingface.co/IndexTeam/IndexTTS-2.0'>
  <img src='https://img.shields.io/badge/HuggingFace-Model-blue?logo=huggingface' />
</a>
<br/>
<a href='https://modelscope.cn/studios/IndexTeam/IndexTTS-Demo'>
  <img src='https://img.shields.io/badge/ModelScope-Demo-purple?logo=modelscope'/>
</a>
<a href='https://modelscope.cn/models/IndexTeam/IndexTTS-2.0'>
  <img src='https://img.shields.io/badge/ModelScope-Model-purple?logo=modelscope'/>
</a>
</div>

### Contact
QQ Group: 553460296 (No.1), 1048202584 (No.2), 764630270 (No.3) \
Discord: https://discord.gg/uT32E7KDmy \
Email: indexspeech@bilibili.com \
Everyone is welcome to join the discussion!

## 📣 Updates

- `2025/09/08` 🔥🔥🔥 We released **IndexTTS-2**.
    - The first autoregressive TTS model with precise synthesis duration control, supporting both duration-controlled and uncontrolled modes.
    - The model achieves highly expressive emotional speech synthesis, with emotion control available through multiple input modalities.
- `2025/05/14` 🔥🔥 We released **IndexTTS-1.5**, which significantly improves the model's stability and its performance in English.
- `2025/03/25` 🔥 We released the **IndexTTS-1.0** model parameters and inference code.
- `2025/02/12` 🔥 We submitted our paper to arXiv and released our demos and test sets.

## Acknowledgements

1. [tortoise-tts](https://github.com/neonbjb/tortoise-tts)
2. [XTTSv2](https://github.com/coqui-ai/TTS)
3. [BigVGAN](https://github.com/NVIDIA/BigVGAN)
4. [wenet](https://github.com/wenet-e2e/wenet/tree/main)
5. [icefall](https://github.com/k2-fsa/icefall)
6. [maskgct](https://github.com/open-mmlab/Amphion/tree/main/models/tts/maskgct)
7. [seed-vc](https://github.com/Plachtaa/seed-vc)

## 📚 Citation

🌟 If you find our work helpful, please leave us a star and cite our paper.

IndexTTS2
```
@article{zhou2025indextts2,
  title={IndexTTS2: A Breakthrough in Emotionally Expressive and Duration-Controlled Auto-Regressive Zero-Shot Text-to-Speech},
  author={Siyi Zhou and Yiquan Zhou and Yi He and Xun Zhou and Jinchao Wang and Wei Deng and Jingchen Shu},
  journal={arXiv preprint arXiv:2506.21619},
  year={2025}
}
```

IndexTTS
```
@article{deng2025indextts,
  title={IndexTTS: An Industrial-Level Controllable and Efficient Zero-Shot Text-To-Speech System},
  author={Wei Deng and Siyi Zhou and Jingchen Shu and Jinchao Wang and Lu Wang},
  journal={arXiv preprint arXiv:2502.05512},
  year={2025},
  doi={10.48550/arXiv.2502.05512},
  url={https://arxiv.org/abs/2502.05512}
}
```