# Real-Time Voice Cloning
This repository is an implementation of [Transfer Learning from Speaker Verification to
Multispeaker Text-To-Speech Synthesis](https://arxiv.org/pdf/1806.04558.pdf) (SV2TTS) with a vocoder that works in real-time. Feel free to check [my thesis](https://matheo.uliege.be/handle/2268.2/6801) if you're curious or if you're looking for info I haven't documented. In particular, I recommend taking a quick look at the figures beyond the introduction.

SV2TTS is a three-stage deep learning framework that creates a numerical representation of a voice from a few seconds of audio, then uses it to condition a text-to-speech model trained to generalize to new voices.
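As a purely illustrative sketch of those three stages (toy stand-ins, not the repo's actual models or API), the data flow looks like this:

```python
# Toy sketch of the SV2TTS pipeline. The function names and bodies are
# hypothetical stand-ins; the real repo implements each stage as a
# separate neural network (encoder, synthesizer, vocoder).

def embed_speaker(reference_audio: list) -> list:
    """Stage 1 (encoder): compress a few seconds of reference audio
    into a fixed-size speaker embedding (trained with the GE2E loss)."""
    n = 4  # real embeddings are much larger (256-dimensional)
    chunk = max(1, len(reference_audio) // n)
    return [sum(reference_audio[i * chunk:(i + 1) * chunk]) / chunk
            for i in range(n)]

def synthesize_mel(text: str, speaker_embedding: list) -> list:
    """Stage 2 (synthesizer, Tacotron 2): generate a mel spectrogram
    from text, conditioned on the speaker embedding."""
    return [[ord(c) % 7 + e for e in speaker_embedding] for c in text]

def vocode(mel: list) -> list:
    """Stage 3 (vocoder, WaveRNN): turn the mel spectrogram
    back into a waveform."""
    return [sum(frame) for frame in mel]

reference = [0.1, 0.2, -0.1, 0.05, 0.3, -0.2, 0.0, 0.1]
embedding = embed_speaker(reference)
waveform = vocode(synthesize_mel("Hello", embedding))
print(len(embedding), len(waveform))
```

The key design point is that only the embedding passes between stages, so the synthesizer never needs to be retrained for a new voice.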

**Video demonstration** (click the picture):

[![Toolbox demo](https://i.imgur.com/8lFUlgz.png)](https://www.youtube.com/watch?v=-O_hYhToKoA)



### Papers implemented  
| URL | Designation | Title | Implementation source |
| --- | ----------- | ----- | --------------------- |
|[**1806.04558**](https://arxiv.org/pdf/1806.04558.pdf) | **SV2TTS** | **Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis** | This repo |
|[1802.08435](https://arxiv.org/pdf/1802.08435.pdf) | WaveRNN (vocoder) | Efficient Neural Audio Synthesis | [fatchord/WaveRNN](https://github.com/fatchord/WaveRNN) |
|[1712.05884](https://arxiv.org/pdf/1712.05884.pdf) | Tacotron 2 (synthesizer) | Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions | [Rayhane-mamah/Tacotron-2](https://github.com/Rayhane-mamah/Tacotron-2) |
|[1710.10467](https://arxiv.org/pdf/1710.10467.pdf) | GE2E (encoder) | Generalized End-To-End Loss for Speaker Verification | This repo |

## News
**13/11/19:** I'm now working full time and I will no longer maintain this repo. To anyone who reads this:
- **If you just want to clone your voice (and not someone else's):** I recommend our free plan on [Resemble.AI](https://www.resemble.ai/). Firstly because you will get better voice quality and fewer prosody errors, and secondly because it does not require a complex setup like this repo does.
- **If this is not your case:** proceed with this repository, but be warned: not only is the environment a mess to set up, but you might end up being disappointed by the results. If you're planning to work on a serious project, my strong advice: find another TTS repo. Go [here](https://github.com/CorentinJ/Real-Time-Voice-Cloning/issues/364) for more info.

**20/08/19:** I'm working on [resemblyzer](https://github.com/resemble-ai/Resemblyzer), an independent package for the voice encoder. You can use your trained encoder models from this repo with it.

**06/07/19:** Need to run within a docker container on a remote server? See [here](https://sean.lane.sh/posts/2019/07/Running-the-Real-Time-Voice-Cloning-project-in-Docker/).

**25/06/19:** Experimental support for low-memory GPUs (~2 GB) added for the synthesizer. Pass `--low_mem` to `demo_cli.py` or `demo_toolbox.py` to enable it. It adds a significant overhead, so it's not recommended if you have enough VRAM.


## Setup

### 1. Install Requirements

**Python 3.6 or 3.7** is needed to run the toolbox.

* Install [PyTorch](https://pytorch.org/get-started/locally/) (>=1.0.1).
* Install [ffmpeg](https://ffmpeg.org/download.html#get-packages).
* Run `pip install -r requirements.txt` to install the remaining necessary packages.
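Since the toolbox targets a specific Python range, a quick interpreter check before installing can save a failed setup (illustrative only; the version tuple comparison is the whole trick):

```python
import sys

# The toolbox targets Python 3.6/3.7; newer interpreters may hit
# dependency issues with the pinned requirements.
major, minor = sys.version_info[:2]
supported = (major, minor) in ((3, 6), (3, 7))
print(f"Python {major}.{minor}: "
      f"{'supported' if supported else 'not officially supported'}")
```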

### 2. Download Pretrained Models
Download the latest [here](https://github.com/CorentinJ/Real-Time-Voice-Cloning/wiki/Pretrained-models).

### 3. (Optional) Test Configuration
Before you download any dataset, you can begin by testing your configuration with:

`python demo_cli.py`

If all tests pass, you're good to go.

### 4. (Optional) Download Datasets
For playing with the toolbox alone, I only recommend downloading [`LibriSpeech/train-clean-100`](http://www.openslr.org/resources/12/train-clean-100.tar.gz). Extract the contents as `<datasets_root>/LibriSpeech/train-clean-100` where `<datasets_root>` is a directory of your choosing. Other datasets are supported in the toolbox, see [here](https://github.com/CorentinJ/Real-Time-Voice-Cloning/wiki/Training#datasets). You're free not to download any dataset, but then you will need your own data as audio files or you will have to record it with the toolbox.
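If you want to double-check the layout before launching the toolbox, a small sketch like this works (the `check_dataset` helper is hypothetical, not part of the repo):

```python
from pathlib import Path

def check_dataset(datasets_root: str) -> bool:
    """Return True if train-clean-100 sits where the toolbox
    expects it: <datasets_root>/LibriSpeech/train-clean-100"""
    return (Path(datasets_root) / "LibriSpeech" / "train-clean-100").is_dir()

# Example usage (replace with your actual datasets_root):
print(check_dataset("/data/datasets"))
```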

### 5. Launch the Toolbox
You can then try the toolbox:

`python demo_toolbox.py -d <datasets_root>`  
or  
`python demo_toolbox.py`  

depending on whether you downloaded any datasets. If you are running an X-server or if you have the error `Aborted (core dumped)`, see [this issue](https://github.com/CorentinJ/Real-Time-Voice-Cloning/issues/11#issuecomment-504733590).

### 6. (Optional) Enable GPU Support
Note: Enabling GPU support is a lot of work. You will want to set this up if you are going to train your own models. Somebody took the time to make [a better guide](https://poorlydocumented.com/2019/11/installing-corentinjs-real-time-voice-cloning-project-on-windows-10-from-scratch/) on how to install everything. I recommend using it.

This command installs additional GPU dependencies and recommended packages: `pip install -r requirements_gpu.txt`

Additionally, you will need to ensure your GPU drivers are properly installed and that your CUDA version matches your PyTorch and TensorFlow installations.
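One way to sanity-check the PyTorch side of this is a short script like the following (a sketch; `cuda_status` is a hypothetical helper, and TensorFlow can be checked analogously):

```python
def cuda_status() -> str:
    """Report whether PyTorch is installed and can see a CUDA device."""
    try:
        import torch
    except ImportError:
        return "PyTorch is not installed"
    if torch.cuda.is_available():
        # Drivers, CUDA runtime, and the torch build all line up.
        return "CUDA available: " + torch.cuda.get_device_name(0)
    return "PyTorch installed, but no CUDA device detected"

print(cuda_status())
```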