diff --git a/README.md b/README.md
index 384df69..3c5ca44 100644
--- a/README.md
+++ b/README.md
@@ -84,180 +84,6 @@ The key contributions of **indextts2** are summarized as follows:
 - We will publicly release the code and pre-trained weights to facilitate future research and practical applications.
 
 
-
-## Model Download
-| **HuggingFace** | **ModelScope** |
-|----------------------------------------------------------|----------------------------------------------------------|
-| [😁 IndexTTS2](https://huggingface.co/IndexTeam/IndexTTS-2.0) | [IndexTTS-2](https://modelscope.cn/models/IndexTeam/IndexTTS-2.0) |
-| [IndexTTS-1.5](https://huggingface.co/IndexTeam/IndexTTS-1.5) | [IndexTTS-1.5](https://modelscope.cn/models/IndexTeam/IndexTTS-1.5) |
-| [IndexTTS](https://huggingface.co/IndexTeam/Index-TTS) | [IndexTTS](https://modelscope.cn/models/IndexTeam/Index-TTS) |
-
-
-## Usage Instructions
-### Environment Setup
-1. Download this repository:
-```bash
-git clone https://github.com/index-tts/index-tts.git
-```
-2. Install dependencies:
-```bash
-conda create -n indextts2 python=3.10
-conda activate indextts2
-pip install -r requirements.txt
-```
-
-3. Download models:
-
-Download by `huggingface-cli`:
-
-```bash
-huggingface-cli download IndexTeam/IndexTTS-1.5 \
-  config.yaml bigvgan_discriminator.pth bigvgan_generator.pth bpe.model dvae.pth gpt.pth unigram_12000.vocab \
-  --local-dir checkpoints
-```
-
-Recommended for China users. 如果下载速度慢,可以使用镜像:
-```bash
-export HF_ENDPOINT="https://hf-mirror.com"
-```
-
-Or by `wget`:
-
-```bash
-wget https://huggingface.co/IndexTeam/IndexTTS-1.5/resolve/main/bigvgan_discriminator.pth -P checkpoints
-wget https://huggingface.co/IndexTeam/IndexTTS-1.5/resolve/main/bigvgan_generator.pth -P checkpoints
-wget https://huggingface.co/IndexTeam/IndexTTS-1.5/resolve/main/bpe.model -P checkpoints
-wget https://huggingface.co/IndexTeam/IndexTTS-1.5/resolve/main/dvae.pth -P checkpoints
-wget https://huggingface.co/IndexTeam/IndexTTS-1.5/resolve/main/gpt.pth -P checkpoints
-wget https://huggingface.co/IndexTeam/IndexTTS-1.5/resolve/main/unigram_12000.vocab -P checkpoints
-wget https://huggingface.co/IndexTeam/IndexTTS-1.5/resolve/main/config.yaml -P checkpoints
-```
-
-4. Run test script:
-
-Do a quick test run
-
-```bash
-from indextts.infer_indextts2 import IndexTTS2
-tts = IndexTTS2(cfg_path="checkpoints/config.yaml", model_dir="checkpoints", is_fp16=False, use_cuda_kernel=False)
-text="这是一个有很好情感表现力的自回归语音生成大模型,它还可以控制合成语音的时长,希望能受到大家的喜欢。"
-tts.infer(spk_audio_prompt='test_data/input.wav', text=text, output_path="gen.wav", verbose=True)
-```
-
-额外指定一个情感参考音频 Specify an additional emotional reference audio
-
-```bash
-from indextts.infer_indextts2 import IndexTTS2
-tts = IndexTTS2(cfg_path="checkpoints/config.yaml", model_dir="checkpoints", is_fp16=False, use_cuda_kernel=False)
-text="这是一个有很好情感表现力的自回归语音生成大模型,它还可以控制合成语音的时长,希望能受到大家的喜欢。"
-tts.infer(spk_audio_prompt='test_data/input.wav', text=text, output_path="gen.wav", emo_audio_prompt="test_data/low.wav", verbose=True)
-```
-
-当指定情感参考音频时,还可以额外指定参数emo_alpha,emo_alpha代表参考情感音频的程度,默认为1.0
-
-```bash
-from indextts.infer_indextts2 import IndexTTS2
-tts = IndexTTS2(cfg_path="checkpoints/config.yaml", model_dir="checkpoints", is_fp16=False, use_cuda_kernel=False)
-text="这是一个有很好情感表现力的自回归语音生成大模型,它还可以控制合成语音的时长,希望能受到大家的喜欢。"
-tts.infer(spk_audio_prompt='test_data/input.wav', text=text, output_path="gen.wav", emo_audio_prompt="test_data/low.wav", emo_alpha=0.5, verbose=True)
-```
-
-
-也可以不指定情感参考音频,而给定各基础情感(喜|怒|哀|惧|厌恶|低落|惊喜|平静)的强度,包括8个float的list
-
-```bash
-from indextts.infer_indextts2 import IndexTTS2
-tts = IndexTTS2(cfg_path="checkpoints/config.yaml", model_dir="checkpoints", is_fp16=False, use_cuda_kernel=False)
-text="这是一个有很好情感表现力的自回归语音生成大模型,它还可以控制合成语音的时长,希望能受到大家的喜欢。"
-tts.infer(spk_audio_prompt='test_data/input.wav', text=text, output_path="gen.wav", emo_vector=[0, 1.0, 0, 0, 0, 0, 0, 0], verbose=True)
-```
-
-可以使用文本情感描述指导情感的合成,使用参数use_emo_text
-
-```bash
-from indextts.infer_indextts2 import IndexTTS2
-tts = IndexTTS2(cfg_path="checkpoints/config.yaml", model_dir="checkpoints", is_fp16=False, use_cuda_kernel=False)
-text="这是一个有很好情感表现力的自回归语音生成大模型,它还可以控制合成语音的时长,希望能受到大家的喜欢。"
-tts.infer(spk_audio_prompt='test_data/input.wav', text=text, output_path="gen.wav", use_emo_text=True, verbose=True)
-```
-
-当不指定emo_text,根据输入的合成文案内容推理,指定时根据指定的文案推
-
-```bash
-from indextts.infer_indextts2 import IndexTTS2
-tts = IndexTTS2(cfg_path="checkpoints/config.yaml", model_dir="checkpoints", is_fp16=False, use_cuda_kernel=False)
-text="这是一个有很好情感表现力的自回归语音生成大模型,它还可以控制合成语音的时长,希望能受到大家的喜欢。"
-tts.infer(spk_audio_prompt='test_data/input.wav', text=text, output_path="gen.wav", use_emo_text=True, emo_text='有一丢丢伤心', verbose=True)
-```
-
-
-
-Specify the duration of the synthesized speech
-
-```bash
-from indextts.infer_indextts2 import IndexTTS2
-tts = IndexTTS2(cfg_path="checkpoints/config.yaml", model_dir="checkpoints", is_fp16=False, use_cuda_kernel=False)
-text="这是一个有很好情感表现力的自回归语音生成大模型,它还可以控制合成语音的时长,希望能受到大家的喜欢。"
-tts.infer(spk_audio_prompt='test_data/input.wav', text=text, output_path="gen.wav", use_speed=True, target_dur=7.5, verbose=True)
-```
-
-
-5. Use as command line tool:
-
-```bash
-# Make sure pytorch has been installed before running this command
-pip install -e .
-indextts "大家好,我现在正在bilibili 体验 ai 科技,说实话,来之前我绝对想不到!AI技术已经发展到这样匪夷所思的地步了!" \
-  --voice reference_voice.wav \
-  --model_dir checkpoints \
-  --config checkpoints/config.yaml \
-  --output output.wav
-```
-
-Use `--help` to see more options.
-```bash
-indextts --help
-```
-
-#### Web Demo
-```bash
-pip install -e ".[webui]"
-python webui.py
-
-# use another model version:
-python webui.py --model_dir IndexTTS-1.5
-```
-Open your browser and visit `http://127.0.0.1:7860` to see the demo.
-
-#### Note for Windows Users
-
-On Windows, you may encounter [an error](https://github.com/index-tts/index-tts/issues/61) when installing `pynini`:
-`ERROR: Failed building wheel for pynini`
-
-In this case, please install `pynini` via `conda`:
-
-```bash
-# after conda activate index-tts
-conda install -c conda-forge pynini==2.1.5
-pip install WeTextProcessing==1.0.3
-pip install -e ".[webui]"
-```
-
-#### Sample Code
-```python
-from indextts.infer import IndexTTS
-tts = IndexTTS(model_dir="checkpoints",cfg_path="checkpoints/config.yaml")
-voice="reference_voice.wav"
-text="大家好,我现在正在bilibili 体验 ai 科技,说实话,来之前我绝对想不到!AI技术已经发展到这样匪夷所思的地步了!比如说,现在正在说话的其实是B站为我现场复刻的数字分身,简直就是平行宇宙的另一个我了。如果大家也想体验更多深入的AIGC功能,可以访问 bilibili studio,相信我,你们也会吃惊的。"
-tts.infer(voice, text, output_path)
-```
-
-## 👉🏻 IndexTTS 👈🏻
-### IndexTTS2: [[Paper]](https://arxiv.org/abs/2506.21619); [[Demo]](https://index-tts.github.io/index-tts2.github.io/); [[ModelScope]](); [[HuggingFace]]()
-
-### IndexTTS1: [[Paper]](https://arxiv.org/abs/2502.05512); [[Demo]](https://index-tts.github.io/); [[ModelScope]](https://huggingface.co/spaces/IndexTeam/IndexTTS); [[HuggingFace]](https://huggingface.co/spaces/IndexTeam/IndexTTS)
-
-
 ## Acknowledge
 1. [tortoise-tts](https://github.com/neonbjb/tortoise-tts)
 2. [XTTSv2](https://github.com/coqui-ai/TTS)