<div align="center">
<img src='assets/index_icon.png' width="250"/>
</div>
## 👉🏻 IndexTTS2 👈🏻
<center><h3>IndexTTS2: A Breakthrough in Emotionally Expressive and Duration-Controlled Auto-Regressive Zero-Shot Text-to-Speech</h3></center>
![IndexTTS2 banner](assets/IndexTTS2_banner.png)
<div align="center">
<a href='https://arxiv.org/abs/2506.21619'>
<img src='https://img.shields.io/badge/ArXiv-2506.21619-red?logo=arxiv'/>
</a>
</div>

Existing autoregressive large-scale text-to-speech (TTS) models have advantages in speech naturalness, but their token-by-token generation mechanism makes it difficult to precisely control the duration of synthesized speech.

**Tips:** Please contact the authors for more detailed information. For commercial cooperation, please contact <u>indexspeech@bilibili.com</u>.
### Feel IndexTTS2
<div align="center">
**IndexTTS2: The Future of Voice, Now Generating**
[▶ Watch the IndexTTS2 demo video](assets/IndexTTS2.mp4)
*Click the link above to watch the IndexTTS2 video*
</div>
### Contact
QQ Group: 553460296 (No.1), 1048202584 (No.2), 764630270 (No.3)\
Discord: https://discord.gg/uT32E7KDmy \
Email: indexspeech@bilibili.com

## 📣 Updates
- `2025/03/25` 🔥 We released the **IndexTTS-1.0** model parameters and inference code.
- `2025/02/12` 🔥 We submitted our paper to arXiv and released our demos and test sets.
## 🖥️ Method
An overview of IndexTTS2 is shown below.
<picture>
<img src="assets/IndexTTS2.png" width="800"/>
</picture>
The key contributions of **IndexTTS2** are summarized as follows:
- We propose a duration adaptation scheme for autoregressive TTS models. IndexTTS2 is the first autoregressive zero-shot TTS model to combine precise duration control with natural duration generation, and the method generalizes to any large-scale autoregressive TTS model (see the usage sketch after this list).
- We decouple emotional and speaker-related features from the prompts, and design a feature fusion strategy that maintains semantic fluency and pronunciation clarity during emotionally rich expression. We also developed a tool that lets users control emotion through natural language descriptions.
- To address the scarcity of highly expressive speech data, we propose an effective training strategy that significantly enhances the emotional expressiveness of zero-shot TTS to the state-of-the-art (SOTA) level.
- We will publicly release the code and pre-trained weights to facilitate future research and practical applications.
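
As a rough illustration of the duration and emotion controls described above, the sketch below shows what inference *could* look like. The `IndexTTS2` class, its import path, and every keyword argument here are hypothetical placeholders rather than this repository's actual API; consult the released inference code for the real interface.

```python
# Hypothetical usage sketch -- all names and arguments are illustrative only.
from indextts.infer import IndexTTS2  # assumed import path, not verified

tts = IndexTTS2(cfg_path="checkpoints/config.yaml", model_dir="checkpoints")

# Zero-shot cloning with natural, model-chosen duration (the default mode).
tts.infer(
    spk_audio_prompt="examples/speaker.wav",   # timbre reference clip
    text="Hello, this is a zero-shot synthesis example.",
    output_path="out_natural.wav",
)

# Precise duration control: request a fixed number of speech tokens so the
# output fits a target time slot (e.g., for dubbing).
tts.infer(
    spk_audio_prompt="examples/speaker.wav",
    text="This sentence must fit a fixed time slot.",
    num_speech_tokens=220,                     # hypothetical duration knob
    output_path="out_fixed_duration.wav",
)

# Emotion control: with speaker and emotion features decoupled, the emotion
# can come from a natural language description instead of an audio clip.
tts.infer(
    spk_audio_prompt="examples/speaker.wav",
    emo_text="angry and frustrated",           # hypothetical emotion prompt
    text="I can't believe you did that!",
    output_path="out_emotional.wav",
)
```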
## Acknowledgements
1. [tortoise-tts](https://github.com/neonbjb/tortoise-tts)
2. [XTTSv2](https://github.com/coqui-ai/TTS)