diff --git a/README.md b/README.md
index d19003cff244152d900e652fdcc55a2d4d156cde..5160581d86fa1f6e66f42eb240a08a3162f7cf2f 100644
--- a/README.md
+++ b/README.md
@@ -31,31 +31,29 @@ SV2TTS is a three-stage deep learning framework that allows to create a numerica


 ## Setup
-Note: setup up this project is a lot of work. Somebody took the time to make [a better guide](https://poorlydocumented.com/2019/11/installing-corentinjs-real-time-voice-cloning-project-on-windows-10-from-scratch/) on how to install everything. I recommend using it.

-### Requirements
-You will need the following whether you plan to use the toolbox only or to retrain the models.
+### 1. Install Requirements

-**Python 3.6+**.
+**Python 3.6 or 3.7** is needed to run the toolbox.

-Run `pip install -r requirements.txt` to install the necessary packages. Additionally you will need [PyTorch](https://pytorch.org/get-started/locally/) (>=1.0.1).
+* Install [PyTorch](https://pytorch.org/get-started/locally/) (>=1.0.1).
+* Install [ffmpeg](https://ffmpeg.org/download.html#get-packages).
+* Run `pip install -r requirements.txt` to install the remaining necessary packages.

-If you have a GPU, run `pip install -r requirements_gpu.txt` to enable GPU support. A GPU is recommended, but it is not required to use the toolbox.
-
-### Pretrained models
+### 2. Download Pretrained Models
 Download the latest [here](https://github.com/CorentinJ/Real-Time-Voice-Cloning/wiki/Pretrained-models).

-### Preliminary
+### 3. (Optional) Test Configuration
 Before you download any dataset, you can begin by testing your configuration with:

 `python demo_cli.py`

 If all tests pass, you're good to go.

-### Datasets
+### 4. (Optional) Download Datasets
 For playing with the toolbox alone, I only recommend downloading [`LibriSpeech/train-clean-100`](http://www.openslr.org/resources/12/train-clean-100.tar.gz). Extract the contents as `/LibriSpeech/train-clean-100` where `` is a directory of your choosing. Other datasets are supported in the toolbox, see [here](https://github.com/CorentinJ/Real-Time-Voice-Cloning/wiki/Training#datasets). You're free not to download any dataset, but then you will need your own data as audio files or you will have to record it with the toolbox.

-### Toolbox
+### 5. Launch the Toolbox
 You can then try the toolbox:

 `python demo_toolbox.py -d `
@@ -63,3 +61,10 @@ or
 `python demo_toolbox.py`

 depending on whether you downloaded any datasets. If you are running an X-server or if you have the error `Aborted (core dumped)`, see [this issue](https://github.com/CorentinJ/Real-Time-Voice-Cloning/issues/11#issuecomment-504733590).
+
+### 6. (Optional) Enable GPU Support
+Note: Enabling GPU support is a lot of work. You will want to set this up if you are going to train your own models. Somebody took the time to make [a better guide](https://poorlydocumented.com/2019/11/installing-corentinjs-real-time-voice-cloning-project-on-windows-10-from-scratch/) on how to install everything. I recommend using it.
+
+This command installs additional GPU dependencies and recommended packages: `pip install -r requirements_gpu.txt`
+
+Additionally, you will need to ensure GPU drivers are properly installed and that your CUDA version matches your PyTorch and Tensorflow installations.
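
Note on the new step 1: the three requirements it lists (Python 3.6 or 3.7, PyTorch >=1.0.1, ffmpeg) can be checked before running `python demo_cli.py`. The following is a minimal sketch, not part of this patch or of the repository. The names `python_ok`, `version_at_least`, and `preflight` are hypothetical, and looking for an `ffmpeg` executable on `PATH` is an assumption about how the install is expected to be found.

```python
import importlib
import re
import shutil
import sys


def python_ok(version_info=sys.version_info):
    """True if the interpreter is Python 3.6 or 3.7, as step 1 requires."""
    return tuple(version_info[:2]) in {(3, 6), (3, 7)}


def version_at_least(version, minimum):
    """Compare dotted version strings numerically.

    Local suffixes such as '+cu101' are dropped, and parsing stops at the
    first component that does not start with a digit.
    """
    def parse(text):
        parts = []
        for piece in text.split("+")[0].split("."):
            match = re.match(r"\d+", piece)
            if match is None:
                break
            parts.append(int(match.group()))
        return tuple(parts)

    return parse(version) >= parse(minimum)


def preflight():
    """Return a list of problems found; an empty list means all checks passed."""
    problems = []
    if not python_ok():
        found = "{}.{}".format(*sys.version_info[:2])
        problems.append("Python 3.6 or 3.7 is required, found " + found)
    # Assumption: ffmpeg is expected to be reachable through PATH.
    if shutil.which("ffmpeg") is None:
        problems.append("ffmpeg was not found on PATH")
    try:
        torch = importlib.import_module("torch")
    except ImportError:
        problems.append("PyTorch is not installed")
    else:
        if not version_at_least(torch.__version__, "1.0.1"):
            problems.append("PyTorch >=1.0.1 is required, found " + torch.__version__)
    return problems


if __name__ == "__main__":
    for problem in preflight():
        print("WARNING:", problem)
```

Run as a script, it prints one warning per unmet requirement and nothing when all three checks pass. It does not replace `python demo_cli.py`, which exercises the models themselves.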