mirror of https://www.modelscope.cn/IndexTeam/IndexTTS-2.git, synced 2026-04-02 19:52:53 +08:00
Update README.md
README.md (34 lines changed)
@@ -1,15 +1,8 @@
 
-<div align="center">
-<img src='assets/index_icon.png' width="250"/>
-</div>
-
-
 ## 👉🏻 IndexTTS2 👈🏻
 
 <center><h3>IndexTTS2: A Breakthrough in Emotionally Expressive and Duration-Controlled Auto-Regressive Zero-Shot Text-to-Speech</h3></center>
 
-[](assets/IndexTTS2_banner.png)
-
 <div align="center">
 <a href='https://arxiv.org/abs/2506.21619'>
 <img src='https://img.shields.io/badge/ArXiv-2506.21619-red?logo=arxiv'/>
@@ -43,17 +36,6 @@ Existing autoregressive large-scale text-to-speech (TTS) models have advantages
 
 **Tips:** Please contact authors for more detailed information. For commercial cooperation, please contact <u>indexspeech@bilibili.com</u>
 
-### Feel IndexTTS2
-<div align="center">
-
-**IndexTTS2: The Future of Voice, Now Generating**
-
-[](assets/IndexTTS2.mp4)
-
-*Click the image to watch IndexTTS2 video*
-
-</div>
-
 ### Contact
 QQ Group:553460296(No.1) 1048202584(No.2) 764630270(No.3)\
 Discord:https://discord.gg/uT32E7KDmy \
@@ -68,22 +50,6 @@ Emal:indexspeech@bilibili.com \
 - `2025/03/25` 🔥 We release **IndexTTS-1.0** model parameters and inference code.
 - `2025/02/12` 🔥 We submitted our paper on arXiv, and released our demos and test sets.
 
-## 🖥️ Method
-
-The overview of IndexTTS2 is shown as follows.
-
-<picture>
-<img src="assets/IndexTTS2.png" width="800"/>
-</picture>
-
-
-The key contributions of **indextts2** are summarized as follows:
-- We propose a duration adaptation scheme for autoregressive TTS models. IndexTTS2 is the first autoregressive zero-shot TTS model to combine precise duration control with natural duration generation, and the method is scalable for any autoregressive large-scale TTS model.
-- The emotional and speaker-related features are decoupled from the prompts, and a feature fusion strategy is designed to maintain semantic fluency and pronunciation clarity during emotionally rich expressions. Furthermore, a tool was developed for emotion control, utilising natural language descriptions for the benefit of users.
-- To address the lack of highly expressive speech data, we propose an effective training strategy, significantly enhancing the emotional expressiveness of zeroshot TTS to State-of-the-Art (SOTA) level.
-- We will publicly release the code and pre-trained weights to facilitate future research and practical applications.
-
-
 ## Acknowledge
 1. [tortoise-tts](https://github.com/neonbjb/tortoise-tts)
 2. [XTTSv2](https://github.com/coqui-ai/TTS)
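The 34-line figure reported for README.md can be cross-checked against the three hunk headers of this commit. The sketch below is not part of the repository; it assumes the headers are written in standard unified-diff form (`@@ -old_start,old_count +new_start,new_count @@`), and the helper names `parse_hunk_header` and `net_line_change` are hypothetical, chosen only for this illustration.

```python
import re

# Hunk headers of this commit, written in standard unified-diff form.
HUNK_HEADERS = [
    "@@ -1,15 +1,8 @@",
    "@@ -43,17 +36,6 @@",
    "@@ -68,22 +50,6 @@",
]

# Counts are optional in unified diffs; a missing count means 1.
HEADER_RE = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def parse_hunk_header(header):
    """Return (old_start, old_count, new_start, new_count) for one hunk header."""
    match = HEADER_RE.match(header)
    if match is None:
        raise ValueError(f"not a unified-diff hunk header: {header!r}")
    old_start, old_count, new_start, new_count = match.groups()
    return (
        int(old_start),
        int(old_count) if old_count is not None else 1,
        int(new_start),
        int(new_count) if new_count is not None else 1,
    )


def net_line_change(headers):
    """Sum new_count - old_count over all hunks (negative means lines were removed)."""
    total = 0
    for header in headers:
        _, old_count, _, new_count = parse_hunk_header(header)
        total += new_count - old_count
    return total


print(net_line_change(HUNK_HEADERS))  # -34
```

The result matches the 34 changed lines listed for README.md, which is consistent with a commit that only deletes lines. The start offsets agree as well: the second hunk moves from line 43 to 36 (7 lines removed before it) and the third from 68 to 50 (18 lines removed before it).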