From e6cb3a1f421cdb04c6809f0413f846516f2abb83 Mon Sep 17 00:00:00 2001
From: indextts
Date: Mon, 8 Sep 2025 08:46:29 +0000
Subject: [PATCH] Update README.md

---
 README.md | 34 ----------------------------------
 1 file changed, 34 deletions(-)

diff --git a/README.md b/README.md
index 3c5ca44..ef76487 100644
--- a/README.md
+++ b/README.md
@@ -1,15 +1,8 @@
-
-
-
-
 ## 👉🏻 IndexTTS2 👈🏻
 
 IndexTTS2: A Breakthrough in Emotionally Expressive and Duration-Controlled Auto-Regressive Zero-Shot Text-to-Speech
 
-[![IndexTTS2](assets/IndexTTS2_banner.png)](assets/IndexTTS2_banner.png)
-
@@ -43,17 +36,6 @@ Existing autoregressive large-scale text-to-speech (TTS) models have advantages
 
 **Tips:** Please contact authors for more detailed information. For commercial cooperation, please contact indexspeech@bilibili.com
 
-### Feel IndexTTS2
-
-
-**IndexTTS2: The Future of Voice, Now Generating**
-
-[![IndexTTS2 Demo](assets/IndexTTS2-video-pic.png)](assets/IndexTTS2.mp4)
-
-*Click the image to watch IndexTTS2 video*
-
-
-
 ### Contact
 QQ Group:553460296(No.1) 1048202584(No.2) 764630270(No.3)\
 Discord:https://discord.gg/uT32E7KDmy \
@@ -68,22 +50,6 @@ Emal:indexspeech@bilibili.com \
 - `2025/03/25` 🔥 We release **IndexTTS-1.0** model parameters and inference code.
 - `2025/02/12` 🔥 We submitted our paper on arXiv, and released our demos and test sets.
 
-## 🖥️ Method
-
-The overview of IndexTTS2 is shown as follows.
-
-
-
-
-
-
-The key contributions of **indextts2** are summarized as follows:
- We propose a duration adaptation scheme for autoregressive TTS models. IndexTTS2 is the first autoregressive zero-shot TTS model to combine precise duration control with natural duration generation, and the method is scalable for any autoregressive large-scale TTS model.
- The emotional and speaker-related features are decoupled from the prompts, and a feature fusion strategy is designed to maintain semantic fluency and pronunciation clarity during emotionally rich expressions. Furthermore, a tool was developed for emotion control, utilising natural language descriptions for the benefit of users.
- To address the lack of highly expressive speech data, we propose an effective training strategy, significantly enhancing the emotional expressiveness of zeroshot TTS to State-of-the-Art (SOTA) level.
- We will publicly release the code and pre-trained weights to facilitate future research and practical applications.
-
-
 ## Acknowledge
 1. [tortoise-tts](https://github.com/neonbjb/tortoise-tts)
 2. [XTTSv2](https://github.com/coqui-ai/TTS)