mirror of
https://www.modelscope.cn/IndexTeam/IndexTTS-2.git
synced 2026-04-02 11:42:53 +08:00
Update README.md
174
README.md
@@ -84,180 +84,6 @@ The key contributions of **indextts2** are summarized as follows:
- We will publicly release the code and pre-trained weights to facilitate future research and practical applications.

## Model Download

| **HuggingFace** | **ModelScope** |
|----------------------------------------------------------|----------------------------------------------------------|
| [😁 IndexTTS2](https://huggingface.co/IndexTeam/IndexTTS-2.0) | [IndexTTS-2](https://modelscope.cn/models/IndexTeam/IndexTTS-2.0) |
| [IndexTTS-1.5](https://huggingface.co/IndexTeam/IndexTTS-1.5) | [IndexTTS-1.5](https://modelscope.cn/models/IndexTeam/IndexTTS-1.5) |
| [IndexTTS](https://huggingface.co/IndexTeam/Index-TTS) | [IndexTTS](https://modelscope.cn/models/IndexTeam/Index-TTS) |

## Usage Instructions

### Environment Setup

1. Download this repository:

```bash
git clone https://github.com/index-tts/index-tts.git
cd index-tts
```

2. Install dependencies:

```bash
conda create -n indextts2 python=3.10
conda activate indextts2
pip install -r requirements.txt
```

3. Download models:

Download with `huggingface-cli`:

```bash
huggingface-cli download IndexTeam/IndexTTS-1.5 \
  config.yaml bigvgan_discriminator.pth bigvgan_generator.pth bpe.model dvae.pth gpt.pth unigram_12000.vocab \
  --local-dir checkpoints
```

Recommended for users in China: if downloads are slow, you can use a mirror:

```bash
export HF_ENDPOINT="https://hf-mirror.com"
```

Or download with `wget`:

```bash
wget https://huggingface.co/IndexTeam/IndexTTS-1.5/resolve/main/bigvgan_discriminator.pth -P checkpoints
wget https://huggingface.co/IndexTeam/IndexTTS-1.5/resolve/main/bigvgan_generator.pth -P checkpoints
wget https://huggingface.co/IndexTeam/IndexTTS-1.5/resolve/main/bpe.model -P checkpoints
wget https://huggingface.co/IndexTeam/IndexTTS-1.5/resolve/main/dvae.pth -P checkpoints
wget https://huggingface.co/IndexTeam/IndexTTS-1.5/resolve/main/gpt.pth -P checkpoints
wget https://huggingface.co/IndexTeam/IndexTTS-1.5/resolve/main/unigram_12000.vocab -P checkpoints
wget https://huggingface.co/IndexTeam/IndexTTS-1.5/resolve/main/config.yaml -P checkpoints
```
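
Before moving on, it can help to confirm that every file named in the download commands actually landed in `checkpoints`. The following is a minimal sketch, assuming the file list in those commands is complete for the model version you downloaded; the helper name `missing_checkpoints` is hypothetical and not part of the repository.

```python
from pathlib import Path

# File names copied from the download commands in this step.
REQUIRED_FILES = [
    "config.yaml",
    "bigvgan_discriminator.pth",
    "bigvgan_generator.pth",
    "bpe.model",
    "dvae.pth",
    "gpt.pth",
    "unigram_12000.vocab",
]


def missing_checkpoints(model_dir="checkpoints"):
    """Return the required files that are absent or empty in model_dir."""
    root = Path(model_dir)
    missing = []
    for name in REQUIRED_FILES:
        path = root / name
        # Treat an empty file as missing, since it cannot be a valid checkpoint.
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = missing_checkpoints()
    if missing:
        print("Missing or empty files:", ", ".join(missing))
    else:
        print("All checkpoint files are present.")
```

This only checks presence and non-zero size; it does not verify file integrity.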

4. Run a test script:

Do a quick test run:

```python
from indextts.infer_indextts2 import IndexTTS2

# Load the model from the downloaded checkpoints.
tts = IndexTTS2(cfg_path="checkpoints/config.yaml", model_dir="checkpoints", is_fp16=False, use_cuda_kernel=False)
# Text to synthesize (roughly: "This is a large autoregressive speech generation model with
# strong emotional expressiveness; it can also control the duration of the synthesized speech.")
text = "这是一个有很好情感表现力的自回归语音生成大模型,它还可以控制合成语音的时长,希望能受到大家的喜欢。"
tts.infer(spk_audio_prompt='test_data/input.wav', text=text, output_path="gen.wav", verbose=True)
```

Specify an additional emotional reference audio:

```python
from indextts.infer_indextts2 import IndexTTS2

tts = IndexTTS2(cfg_path="checkpoints/config.yaml", model_dir="checkpoints", is_fp16=False, use_cuda_kernel=False)
text = "这是一个有很好情感表现力的自回归语音生成大模型,它还可以控制合成语音的时长,希望能受到大家的喜欢。"
# emo_audio_prompt supplies the emotional reference audio.
tts.infer(spk_audio_prompt='test_data/input.wav', text=text, output_path="gen.wav", emo_audio_prompt="test_data/low.wav", verbose=True)
```

When an emotional reference audio is specified, you can also set the `emo_alpha` parameter, which controls how strongly the emotional reference is applied. The default is 1.0:

```python
from indextts.infer_indextts2 import IndexTTS2

tts = IndexTTS2(cfg_path="checkpoints/config.yaml", model_dir="checkpoints", is_fp16=False, use_cuda_kernel=False)
text = "这是一个有很好情感表现力的自回归语音生成大模型,它还可以控制合成语音的时长,希望能受到大家的喜欢。"
# emo_alpha scales how strongly the emotional reference is applied (default 1.0).
tts.infer(spk_audio_prompt='test_data/input.wav', text=text, output_path="gen.wav", emo_audio_prompt="test_data/low.wav", emo_alpha=0.5, verbose=True)
```

Alternatively, omit the emotional reference audio and pass the intensity of each basic emotion (happy | angry | sad | afraid | disgusted | low-spirited | surprised | calm) as a list of 8 floats via `emo_vector`:

```python
from indextts.infer_indextts2 import IndexTTS2

tts = IndexTTS2(cfg_path="checkpoints/config.yaml", model_dir="checkpoints", is_fp16=False, use_cuda_kernel=False)
text = "这是一个有很好情感表现力的自回归语音生成大模型,它还可以控制合成语音的时长,希望能受到大家的喜欢。"
# Second position set to 1.0: full intensity for "angry", zero for the other emotions.
tts.infer(spk_audio_prompt='test_data/input.wav', text=text, output_path="gen.wav", emo_vector=[0, 1.0, 0, 0, 0, 0, 0, 0], verbose=True)
```
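
Writing the 8-float list by hand makes it easy to put a value in the wrong slot. The sketch below builds the list from named intensities, following the order of basic emotions given in the description of `emo_vector`. The English key names are this sketch's own translations and the helper `make_emo_vector` is hypothetical; neither is part of the IndexTTS2 API.

```python
# Position order follows the list of basic emotions in the README;
# the English key names are assumptions made for this sketch.
EMOTION_ORDER = [
    "happy",
    "angry",
    "sad",
    "afraid",
    "disgusted",
    "low_spirited",
    "surprised",
    "calm",
]


def make_emo_vector(**intensities):
    """Return an 8-float list with the given intensities in README order."""
    unknown = set(intensities) - set(EMOTION_ORDER)
    if unknown:
        raise ValueError(f"Unknown emotion(s): {sorted(unknown)}")
    # Emotions that are not mentioned default to 0.0.
    return [float(intensities.get(name, 0.0)) for name in EMOTION_ORDER]


print(make_emo_vector(angry=1.0))
# [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
```

The result can be passed as `emo_vector=make_emo_vector(angry=1.0)`, which matches the hand-written list in the example above.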

You can also guide the emotion of the synthesized speech with a text description by setting the `use_emo_text` parameter:

```python
from indextts.infer_indextts2 import IndexTTS2

tts = IndexTTS2(cfg_path="checkpoints/config.yaml", model_dir="checkpoints", is_fp16=False, use_cuda_kernel=False)
text = "这是一个有很好情感表现力的自回归语音生成大模型,它还可以控制合成语音的时长,希望能受到大家的喜欢。"
# No emo_text is given, so the emotion is inferred from `text` itself.
tts.infer(spk_audio_prompt='test_data/input.wav', text=text, output_path="gen.wav", use_emo_text=True, verbose=True)
```

When `emo_text` is not specified, the emotion is inferred from the text being synthesized; when it is specified, the emotion is inferred from the given `emo_text` instead:

```python
from indextts.infer_indextts2 import IndexTTS2

tts = IndexTTS2(cfg_path="checkpoints/config.yaml", model_dir="checkpoints", is_fp16=False, use_cuda_kernel=False)
text = "这是一个有很好情感表现力的自回归语音生成大模型,它还可以控制合成语音的时长,希望能受到大家的喜欢。"
# emo_text here means roughly "a little bit sad".
tts.infer(spk_audio_prompt='test_data/input.wav', text=text, output_path="gen.wav", use_emo_text=True, emo_text='有一丢丢伤心', verbose=True)
```

Specify the duration of the synthesized speech:

```python
from indextts.infer_indextts2 import IndexTTS2

tts = IndexTTS2(cfg_path="checkpoints/config.yaml", model_dir="checkpoints", is_fp16=False, use_cuda_kernel=False)
text = "这是一个有很好情感表现力的自回归语音生成大模型,它还可以控制合成语音的时长,希望能受到大家的喜欢。"
# use_speed=True together with target_dur requests a target duration for the output.
tts.infer(spk_audio_prompt='test_data/input.wav', text=text, output_path="gen.wav", use_speed=True, target_dur=7.5, verbose=True)
```
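
The calls in this step differ only in a few keyword arguments. The following is a minimal sketch of a helper that assembles them in one place, using only the parameter names that appear in the examples of this step. The helper name `build_infer_kwargs` is hypothetical, and the rule that at most one emotion source is passed per call is an assumption of this sketch, not a documented constraint of `IndexTTS2.infer`.

```python
def build_infer_kwargs(text, spk_audio_prompt, output_path="gen.wav",
                       emo_audio_prompt=None, emo_alpha=None,
                       emo_vector=None, use_emo_text=False, emo_text=None,
                       target_dur=None, verbose=True):
    """Assemble keyword arguments for tts.infer from the options above."""
    # Assumption of this sketch: pick at most one emotion source per call.
    sources = [emo_audio_prompt is not None, emo_vector is not None, use_emo_text]
    if sum(sources) > 1:
        raise ValueError("Choose one of emo_audio_prompt, emo_vector, use_emo_text")

    kwargs = {
        "spk_audio_prompt": spk_audio_prompt,
        "text": text,
        "output_path": output_path,
        "verbose": verbose,
    }
    if emo_audio_prompt is not None:
        kwargs["emo_audio_prompt"] = emo_audio_prompt
        if emo_alpha is not None:
            kwargs["emo_alpha"] = emo_alpha
    elif emo_alpha is not None:
        raise ValueError("emo_alpha only applies together with emo_audio_prompt")
    if emo_vector is not None:
        if len(emo_vector) != 8:
            raise ValueError("emo_vector must contain exactly 8 floats")
        kwargs["emo_vector"] = [float(v) for v in emo_vector]
    if use_emo_text:
        kwargs["use_emo_text"] = True
        if emo_text is not None:
            kwargs["emo_text"] = emo_text
    if target_dur is not None:
        # The duration example passes use_speed=True alongside target_dur.
        kwargs["use_speed"] = True
        kwargs["target_dur"] = target_dur
    return kwargs


# Usage, with `tts` created as in the examples above:
# tts.infer(**build_infer_kwargs(text, "test_data/input.wav", target_dur=7.5))
```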

5. Use as a command-line tool:

```bash
# Make sure PyTorch has been installed before running this command
pip install -e .
indextts "大家好,我现在正在bilibili 体验 ai 科技,说实话,来之前我绝对想不到!AI技术已经发展到这样匪夷所思的地步了!" \
  --voice reference_voice.wav \
  --model_dir checkpoints \
  --config checkpoints/config.yaml \
  --output output.wav
```

Use `--help` to see more options.

```bash
indextts --help
```
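
If you want to drive the command-line tool from a script, one option is to build the argument list in Python and hand it to `subprocess`. This is a minimal sketch that uses only the flags shown in the command above (`--voice`, `--model_dir`, `--config`, `--output`); the function names are hypothetical, and it assumes the `indextts` executable is on your `PATH` after `pip install -e .`.

```python
import shlex
import subprocess


def build_indextts_command(text, voice, output="output.wav",
                           model_dir="checkpoints",
                           config="checkpoints/config.yaml"):
    """Return the argument list for one `indextts` invocation."""
    return [
        "indextts", text,
        "--voice", voice,
        "--model_dir", model_dir,
        "--config", config,
        "--output", output,
    ]


def synthesize(text, voice, **options):
    """Run the CLI and raise CalledProcessError if it exits with an error."""
    subprocess.run(build_indextts_command(text, voice, **options), check=True)


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch has no side effects.
    print(shlex.join(build_indextts_command("大家好", "reference_voice.wav")))
```

Passing a list rather than a single shell string means the text reaches the tool unchanged, with no shell quoting to worry about.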

#### Web Demo

```bash
pip install -e ".[webui]"
python webui.py

# use another model version:
python webui.py --model_dir IndexTTS-1.5
```

Open your browser and visit `http://127.0.0.1:7860` to see the demo.

#### Note for Windows Users

On Windows, you may encounter [an error](https://github.com/index-tts/index-tts/issues/61) when installing `pynini`:

`ERROR: Failed building wheel for pynini`

In this case, please install `pynini` via `conda`:

```bash
# after activating your conda environment (e.g. `conda activate indextts2`)
conda install -c conda-forge pynini==2.1.5
pip install WeTextProcessing==1.0.3
pip install -e ".[webui]"
```

#### Sample Code

```python
from indextts.infer import IndexTTS

tts = IndexTTS(model_dir="checkpoints", cfg_path="checkpoints/config.yaml")
voice = "reference_voice.wav"
text = "大家好,我现在正在bilibili 体验 ai 科技,说实话,来之前我绝对想不到!AI技术已经发展到这样匪夷所思的地步了!比如说,现在正在说话的其实是B站为我现场复刻的数字分身,简直就是平行宇宙的另一个我了。如果大家也想体验更多深入的AIGC功能,可以访问 bilibili studio,相信我,你们也会吃惊的。"
# output_path was undefined in the original snippet; any writable .wav path works.
output_path = "output.wav"
tts.infer(voice, text, output_path)
```
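
To synthesize several sentences with one loaded model, the same call can be wrapped in a loop. This is a minimal sketch, assuming `tts` is an object whose `infer` method accepts `(voice, text, output_path)` positionally as in the sample above; the helper `synthesize_batch` and the `utt_001.wav` naming scheme are hypothetical.

```python
from pathlib import Path


def synthesize_batch(tts, voice, texts, out_dir="outputs"):
    """Call tts.infer once per text and return the list of output paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for index, text in enumerate(texts, start=1):
        output_path = str(out / f"utt_{index:03d}.wav")
        # Same positional call as in the sample: (voice, text, output_path).
        tts.infer(voice, text, output_path)
        paths.append(output_path)
    return paths


# Usage, with the model loaded as in the sample:
# from indextts.infer import IndexTTS
# tts = IndexTTS(model_dir="checkpoints", cfg_path="checkpoints/config.yaml")
# synthesize_batch(tts, "reference_voice.wav", ["大家好", "再见"])
```

Loading the model once and reusing it avoids reloading the checkpoints for every sentence.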

## 👉🏻 IndexTTS 👈🏻

### IndexTTS2: [[Paper]](https://arxiv.org/abs/2506.21619); [[Demo]](https://index-tts.github.io/index-tts2.github.io/); [[ModelScope]](); [[HuggingFace]]()

### IndexTTS1: [[Paper]](https://arxiv.org/abs/2502.05512); [[Demo]](https://index-tts.github.io/); [[ModelScope]](https://huggingface.co/spaces/IndexTeam/IndexTTS); [[HuggingFace]](https://huggingface.co/spaces/IndexTeam/IndexTTS)

## Acknowledgements

1. [tortoise-tts](https://github.com/neonbjb/tortoise-tts)
2. [XTTSv2](https://github.com/coqui-ai/TTS)