# PaddleSpeech

## What is PaddleSpeech?
PaddleSpeech is an open-source toolkit built on the PaddlePaddle platform for two critical tasks in speech processing: Speech-to-Text (Automatic Speech Recognition, ASR) and Text-to-Speech Synthesis (TTS). It includes modules implementing state-of-the-art and influential models.

## What can PaddleSpeech do?

### Speech-to-Text
PaddleSpeech ASR consists mainly of the following components:
- Implementation of models and commonly used neural network layers.
- Dataset abstraction and common data preprocessing pipelines.
- Ready-to-run experiments.

PaddleSpeech ASR provides you with a complete ASR pipeline, including:
- Data Preparation
    - Build vocabulary
    - Compute cepstral mean and variance normalization (CMVN) statistics
    - Feature extraction
        - linear
        - fbank (Kaldi-style features are also supported)
        - mfcc
- Acoustic Models
    - DeepSpeech2 (Streaming and Non-Streaming)
    - Transformer (Streaming and Non-Streaming)
    - Conformer (Streaming and Non-Streaming)
- Decoder
    - CTC greedy search (used in DeepSpeech2, Transformer and Conformer)
    - CTC beam search (used in DeepSpeech2, Transformer and Conformer)
    - Attention decoding (used in Transformer and Conformer)
    - Attention rescoring (used in Transformer and Conformer)
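The CMVN step in data preparation accumulates per-dimension statistics over the training features and then normalizes every utterance with them. The following is a minimal NumPy sketch of global CMVN, assuming features are stored as `(frames, dim)` arrays. `compute_cmvn_stats` and `apply_cmvn` are hypothetical names for illustration, not PaddleSpeech functions, and the random arrays stand in for real fbank output.

```python
import numpy as np


def compute_cmvn_stats(utterances):
    """Accumulate global mean/std over a list of (frames, dim) feature arrays."""
    dim = utterances[0].shape[1]
    frame_count = 0
    feat_sum = np.zeros(dim, dtype=np.float64)
    feat_sq_sum = np.zeros(dim, dtype=np.float64)
    for feats in utterances:
        frame_count += feats.shape[0]
        feat_sum += feats.sum(axis=0)
        feat_sq_sum += np.square(feats).sum(axis=0)
    mean = feat_sum / frame_count
    # Var[x] = E[x^2] - E[x]^2; clip so near-constant dimensions do not divide by ~0.
    var = np.maximum(feat_sq_sum / frame_count - np.square(mean), 1e-10)
    return mean, np.sqrt(var)


def apply_cmvn(feats, mean, std):
    """Normalize each feature dimension to zero mean and unit variance."""
    return (feats - mean) / std


# Stand-in "fbank" features: three utterances of different lengths, 80 dims each.
rng = np.random.default_rng(0)
train_feats = [rng.normal(5.0, 2.0, size=(n, 80)) for n in (120, 300, 75)]
mean, std = compute_cmvn_stats(train_feats)
normalized = apply_cmvn(train_feats[0], mean, std)
print(normalized.shape)  # (120, 80)
```

Keeping only the running sum, sum of squares, and frame count lets the statistics be gathered in a single pass over a large corpus, without holding all features in memory.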
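CTC greedy search, the simplest of the decoders listed above, picks the most likely token at each frame, collapses consecutive repeats, and then removes blanks. Below is a minimal NumPy sketch, assuming the blank token has index 0 and the model emits per-frame log-probabilities of shape `(frames, vocab_size)`. The function name and the toy vocabulary are hypothetical.

```python
import numpy as np


def ctc_greedy_search(log_probs, blank_id=0):
    """Greedy (best-path) CTC decoding for one utterance.

    log_probs: (frames, vocab_size) per-frame log-probabilities.
    Returns the decoded token ids.
    """
    best_path = np.argmax(log_probs, axis=-1)  # most likely token per frame
    tokens = []
    prev = None
    for token in best_path.tolist():
        # Keep a token only if it differs from the previous frame and is not blank.
        if token != prev and token != blank_id:
            tokens.append(token)
        prev = token
    return tokens


# Hypothetical toy vocabulary; index 0 is the CTC blank.
vocab = ["<blank>", "a", "b", "c"]
path = [1, 1, 0, 1, 2, 2, 0, 0, 3]
# Fake model output: probability 0.7 on the token in `path`, 0.1 on each other token.
log_probs = np.log(np.full((len(path), len(vocab)), 0.1))
log_probs[np.arange(len(path)), path] = np.log(0.7)

token_ids = ctc_greedy_search(log_probs)
text = "".join(vocab[t] for t in token_ids)
print(text)  # aabc
```

The blank between the two runs of `a` is what allows a genuinely repeated character to survive the collapse step. Because greedy search never revisits earlier choices, it is fast but can be less accurate than CTC beam search, which keeps several candidate prefixes per frame.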

PaddleSpeech ASR makes it simple to train ASR models.

### Text-to-Speech
PaddleSpeech TTS consists mainly of the following components:
- Implementation of models and commonly used neural network layers.
- Dataset abstraction and common data preprocessing pipelines.
- Ready-to-run experiments.

PaddleSpeech TTS provides you with a complete TTS pipeline, including:
- Text FrontEnd
    - Rule-based Chinese frontend
- Acoustic Models
    - FastSpeech2
    - SpeedySpeech
    - TransformerTTS
    - Tacotron2
- Vocoders
    - Multi Band MelGAN
    - Parallel WaveGAN
    - WaveFlow
- Voice Cloning
    - Transfer Learning from Speaker Verification to Multispeaker Text-to-Speech Synthesis
    - GE2E
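FastSpeech2 predicts a duration (a number of mel frames) for each phoneme and uses a length regulator to expand the phoneme-level encoder outputs to frame level before decoding. The following is a minimal NumPy sketch of that expansion, assuming integer durations are already available (for example, extracted durations during training or rounded predictions at inference). `length_regulator` is an illustrative name, not a PaddleSpeech API.

```python
import numpy as np


def length_regulator(phoneme_hidden, durations):
    """Expand phoneme-level hidden states to frame level.

    phoneme_hidden: (num_phonemes, hidden_dim) encoder outputs.
    durations: (num_phonemes,) non-negative integer frame counts.
    """
    durations = np.asarray(durations, dtype=np.int64)
    if durations.shape != (phoneme_hidden.shape[0],):
        raise ValueError("need exactly one duration per phoneme")
    if np.any(durations < 0):
        raise ValueError("durations must be non-negative")
    # Row i of phoneme_hidden is repeated durations[i] times along the time axis.
    return np.repeat(phoneme_hidden, durations, axis=0)


hidden = np.arange(6, dtype=np.float32).reshape(3, 2)  # 3 phonemes, hidden_dim = 2
frames = length_regulator(hidden, [2, 0, 3])
print(frames.shape)  # (5, 2)
```

A duration of 0 drops that phoneme from the frame sequence entirely, and the output length (here 5) equals the sum of the durations, which is how the model controls speaking rate.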
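The GE2E (generalized end-to-end) loss trains a speaker encoder so that each utterance embedding lies close to its own speaker's centroid and far from other speakers' centroids; in the voice cloning setup above, that encoder supplies the speaker embedding to the TTS model. Below is a minimal NumPy sketch of the softmax variant, assuming a batch of shape `(num_speakers, num_utterances, dim)`. The function name is hypothetical, and `w`/`b` are fixed illustrative values here, although they are learnable parameters in the actual loss. Following the GE2E formulation, each utterance is compared with its own speaker's centroid computed without that utterance.

```python
import numpy as np


def _unit(x):
    """Scale vectors along the last axis to unit L2 norm."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def ge2e_softmax_loss(embeddings, w=10.0, b=-5.0):
    """GE2E softmax loss for embeddings of shape (num_speakers, num_utts, dim).

    w (> 0) and b are learned during training; fixed here for illustration.
    """
    n_spk, n_utt, _ = embeddings.shape
    if n_utt < 2:
        raise ValueError("need at least 2 utterances per speaker")
    e = _unit(embeddings)
    centroids = e.mean(axis=1)  # (n_spk, dim)
    # Exclusive centroid for e_ji: mean of speaker j's other utterances.
    excl = (e.sum(axis=1, keepdims=True) - e) / (n_utt - 1)  # (n_spk, n_utt, dim)

    # cos(e_ji, c_k) against every speaker centroid: (n_spk, n_utt, n_spk)
    sim = np.einsum("jid,kd->jik", e, _unit(centroids))
    # cos(e_ji, c_j^(-i)) against the speaker's own exclusive centroid: (n_spk, n_utt)
    own = np.einsum("jid,jid->ji", e, _unit(excl))
    # Use the exclusive-centroid similarity in the k == j column.
    is_own = np.eye(n_spk, dtype=bool)[:, None, :]
    sim = np.where(is_own, own[:, :, None], sim)

    logits = w * sim + b
    # Numerically stable log-sum-exp over the speaker axis k.
    peak = logits.max(axis=-1, keepdims=True)
    log_norm = peak[..., 0] + np.log(np.exp(logits - peak).sum(axis=-1))
    # L(e_ji) = -S_ji,j + log sum_k exp(S_ji,k), summed over all utterances.
    return float(np.sum(log_norm - (w * own + b)))


rng = np.random.default_rng(0)
speaker_dirs = rng.normal(size=(4, 1, 16))
# Utterances of each speaker cluster tightly around that speaker's direction.
clustered = speaker_dirs + 0.05 * rng.normal(size=(4, 5, 16))
# Embeddings with no speaker structure at all.
unstructured = rng.normal(size=(4, 5, 16))
print(ge2e_softmax_loss(clustered) < ge2e_softmax_loss(unstructured))
```

Excluding the utterance from its own centroid avoids the trivial solution in which an embedding scores highly simply because it contributed to the centroid it is compared against.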

PaddleSpeech TTS lets you train TTS models with simple commands.