([简体中文](./README_cn.md)|English)
# Speaker Verification

## Introduction

Speaker verification is the task of confirming a speaker's identity from their voice. It is typically done by extracting a speaker embedding from audio and comparing embeddings.

This demo extracts a speaker embedding from a given audio file. It can be run with a single command or a few lines of Python using `PaddleSpeech`.

## Usage
### 1. Installation
See [installation](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/install.md).

You can choose one of the easy, medium, or hard methods to install paddlespeech.

### 2. Prepare Input File
The input of this demo should be a WAV file (`.wav`), and its sample rate must match the sample rate the model expects.

A sample file for this demo can be downloaded with:
```bash
wget -c https://paddlespeech.bj.bcebos.com/vector/audio/85236145389.wav
```
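
Because the sample rate must match the model, it can help to check a file before running the demo. The following is a minimal sketch using only Python's standard `wave` module. The helper names `wav_sample_rate` and `matches_model_rate` are hypothetical and not part of PaddleSpeech, and the default of 16000 Hz is taken from the `sample_rate` default listed in this README. Note that `wave` reads uncompressed PCM WAV files only.

```python
import wave


def wav_sample_rate(path):
    """Return the sample rate (in Hz) stored in a PCM WAV file header."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate()


def matches_model_rate(path, model_sample_rate=16000):
    """Return True if the file's sample rate equals the rate the model expects.

    The 16000 default mirrors the `sample_rate` default in this README;
    pass a different value if your model was trained at another rate.
    """
    return wav_sample_rate(path) == model_sample_rate
```

For example, `matches_model_rate("85236145389.wav")` returns `True` only if the downloaded file is stored at 16 kHz.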

### 3. Usage
- Command Line (Recommended)
  ```bash
  paddlespeech vector --task spk --input 85236145389.wav

  echo -e "demo1 85236145389.wav" > vec.job
  paddlespeech vector --task spk --input vec.job

  echo -e "demo2 85236145389.wav \n demo3 85236145389.wav" | paddlespeech vector --task spk

  paddlespeech vector --task score --input "./85236145389.wav ./123456789.wav"
  
  echo -e "demo4 85236145389.wav 85236145389.wav \n demo5 85236145389.wav 123456789.wav" > vec.job
  paddlespeech vector --task score --input vec.job
  ```
  
  Usage:
  ```bash
  paddlespeech vector --help
  ```
  Arguments:
  - `input` (required): Audio file to extract the embedding from.
  - `task` (required): The `vector` task to run, e.g. `spk` or `score`. Default: `spk`.
  - `model`: Model type of the vector task. Default: `ecapatdnn_voxceleb12`.
  - `sample_rate`: Sample rate of the model. Default: `16000`.
  - `config`: Config of the vector task. The pretrained model is used when it is None. Default: `None`.
  - `ckpt_path`: Model checkpoint. The pretrained model is used when it is None. Default: `None`.
  - `device`: Device on which to run model inference. Default: the default device of paddlepaddle in the current environment.

  Output:

  ```bash
    demo [  1.4217498    5.626253    -5.342073     1.1773866    3.308055
    1.756596     5.167894    10.80636     -3.8226728   -5.6141334
    2.623845    -0.8072968    1.9635103   -7.3128724    0.01103897
    -9.723131     0.6619743   -6.976803    10.213478     7.494748
    2.9105635    3.8949256    3.7999806    7.1061673   16.905321
    -7.1493764    8.733103     3.4230042   -4.831653   -11.403367
    11.232214     7.1274667   -4.2828417    2.452362    -5.130748
    -18.177666    -2.6116815  -11.000337    -6.7314315    1.6564683
    0.7618269    1.1253023   -2.083836     4.725744    -8.782597
    -3.539873     3.814236     5.1420674    2.162061     4.096431
    -6.4162116   12.747448     1.9429878  -15.152943     6.417416
    16.097002    -9.716668    -1.9920526   -3.3649497   -1.871939
    11.567354     3.69788     11.258265     7.442363     9.183411
    4.5281515   -1.2417862    4.3959084    6.6727695    5.8898783
    7.627124    -0.66919386 -11.889693    -9.208865    -7.4274073
    -3.7776625    6.917234    -9.848748    -2.0944717   -5.135116
    0.49563864   9.317534    -5.9141874   -1.8098574   -0.11738578
    -7.169265    -1.0578263   -5.7216787   -5.1173844   16.137651
    -4.473626     7.6624317   -0.55381083   9.631587    -6.4704556
    -8.548508     4.3716145   -0.79702514   4.478997    -2.9758704
    3.272176     2.8382776    5.134597    -9.190781    -0.5657382
    -4.8745747    2.3165567   -5.984303    -2.1798875    0.35541576
    -0.31784213   9.493548     2.1144536    4.358092   -12.089823
    8.451689    -7.925461     4.6242585    4.4289427   18.692003
    -2.6204622   -5.149185    -0.35821092   8.488551     4.981496
    -9.32683     -2.2544234    6.6417594    1.2119585   10.977129
    16.555033     3.3238444    9.551863    -1.6676947   -0.79539716
    -8.605674    -0.47356385   2.6741948   -5.359179    -2.6673796
    0.66607     15.443222     4.740594    -3.4725387   11.592567
    -2.054497     1.7361217   -8.265324    -9.30447      5.4068313
    -1.5180256   -7.746615    -6.089606     0.07112726  -0.34904733
    -8.649895    -9.998958    -2.564841    -0.53999114   2.601808
    -0.31927416  -1.8815292   -2.07215     -3.4105783   -8.2998085
    1.483641   -15.365992    -8.288208     3.8847756   -3.4876456
    7.3629923    0.4657332    3.132599    12.438889    -1.8337058
    4.532936     2.7264361   10.145339    -6.521951     2.897153
    -3.3925855    5.079156     7.759716     4.677565     5.8457737
    2.402413     7.7071047    3.9711342   -6.390043     6.1268735
    -3.7760346  -11.118123  ]
  ```

- Python API
  ```python
  import paddle
  from paddlespeech.cli import VectorExecutor

  vector_executor = VectorExecutor()
  audio_emb = vector_executor(
      model='ecapatdnn_voxceleb12',
      sample_rate=16000,
      config=None,  # Set `config` and `ckpt_path` to None to use pretrained model.
      ckpt_path=None,
      audio_file='./85236145389.wav',
      device=paddle.get_device())
  print('Audio embedding Result: \n{}'.format(audio_emb))

  test_emb = vector_executor(
      model='ecapatdnn_voxceleb12',
      sample_rate=16000,
      config=None,  # Set `config` and `ckpt_path` to None to use pretrained model.
      ckpt_path=None,
      audio_file='./123456789.wav',
      device=paddle.get_device())
  print('Test embedding Result: \n{}'.format(test_emb))
  score = vector_executor.get_embeddings_score(audio_emb, test_emb)
  print(f"Embeddings Score: {score}")
  ```

  Output:

  ```bash
  # Vector Result:
   Audio embedding Result:
    [  1.4217498    5.626253    -5.342073     1.1773866    3.308055
    1.756596     5.167894    10.80636     -3.8226728   -5.6141334
    2.623845    -0.8072968    1.9635103   -7.3128724    0.01103897
    -9.723131     0.6619743   -6.976803    10.213478     7.494748
    2.9105635    3.8949256    3.7999806    7.1061673   16.905321
    -7.1493764    8.733103     3.4230042   -4.831653   -11.403367
    11.232214     7.1274667   -4.2828417    2.452362    -5.130748
    -18.177666    -2.6116815  -11.000337    -6.7314315    1.6564683
    0.7618269    1.1253023   -2.083836     4.725744    -8.782597
    -3.539873     3.814236     5.1420674    2.162061     4.096431
    -6.4162116   12.747448     1.9429878  -15.152943     6.417416
    16.097002    -9.716668    -1.9920526   -3.3649497   -1.871939
    11.567354     3.69788     11.258265     7.442363     9.183411
    4.5281515   -1.2417862    4.3959084    6.6727695    5.8898783
    7.627124    -0.66919386 -11.889693    -9.208865    -7.4274073
    -3.7776625    6.917234    -9.848748    -2.0944717   -5.135116
    0.49563864   9.317534    -5.9141874   -1.8098574   -0.11738578
    -7.169265    -1.0578263   -5.7216787   -5.1173844   16.137651
    -4.473626     7.6624317   -0.55381083   9.631587    -6.4704556
    -8.548508     4.3716145   -0.79702514   4.478997    -2.9758704
    3.272176     2.8382776    5.134597    -9.190781    -0.5657382
    -4.8745747    2.3165567   -5.984303    -2.1798875    0.35541576
    -0.31784213   9.493548     2.1144536    4.358092   -12.089823
    8.451689    -7.925461     4.6242585    4.4289427   18.692003
    -2.6204622   -5.149185    -0.35821092   8.488551     4.981496
    -9.32683     -2.2544234    6.6417594    1.2119585   10.977129
    16.555033     3.3238444    9.551863    -1.6676947   -0.79539716
    -8.605674    -0.47356385   2.6741948   -5.359179    -2.6673796
    0.66607     15.443222     4.740594    -3.4725387   11.592567
    -2.054497     1.7361217   -8.265324    -9.30447      5.4068313
    -1.5180256   -7.746615    -6.089606     0.07112726  -0.34904733
    -8.649895    -9.998958    -2.564841    -0.53999114   2.601808
    -0.31927416  -1.8815292   -2.07215     -3.4105783   -8.2998085
    1.483641   -15.365992    -8.288208     3.8847756   -3.4876456
    7.3629923    0.4657332    3.132599    12.438889    -1.8337058
    4.532936     2.7264361   10.145339    -6.521951     2.897153
    -3.3925855    5.079156     7.759716     4.677565     5.8457737
    2.402413     7.7071047    3.9711342   -6.390043     6.1268735
    -3.7760346  -11.118123  ]
    # get the test embedding
    Test embedding Result:
    [ -1.902964     2.0690894   -8.034194     3.5472693    0.18089125
      6.9085927    1.4097427   -1.9487704  -10.021278    -0.20755845
      -8.04332      4.344489     2.3200977  -14.306299     5.184692
    -11.55602     -3.8497238    0.6444722    1.2833948    2.6766639
      0.5878921    0.7946299    1.7207596    2.5791872   14.998469
      -1.3385371   15.031221    -0.8006958    1.99287     -9.52007
      2.435466     4.003221    -4.33817     -4.898601    -5.304714
    -18.033886    10.790787   -12.784645    -5.641755     2.9761686
    -10.566622     1.4839455    6.152458    -5.7195854    2.8603241
      6.112133     8.489869     5.5958056    1.2836679   -1.2293907
      0.89927405   7.0288725   -2.854029    -0.9782962    5.8255906
      14.905906    -5.025907     0.7866458   -4.2444224  -16.354029
      10.521315     0.9604709   -3.3257897    7.144871   -13.592733
      -8.568869    -1.7953678    0.26313916  10.916714    -6.9374123
      1.857403    -6.2746415    2.8154466   -7.2338667   -2.293357
      -0.05452765   5.4287076    5.0849075   -6.690375    -1.6183422
      3.654291     0.94352573  -9.200294    -5.4749465   -3.5235846
      1.3420814    4.240421    -2.772944    -2.8451524   16.311104
      4.2969875   -1.762936   -12.5758915    8.595198    -0.8835239
      -1.5708797    1.568961     1.1413603    3.5032008   -0.45251232
      -6.786333    16.89443      5.3366146   -8.789056     0.6355629
      3.2579517   -3.328322     7.5969577    0.66025066  -6.550468
      -9.148656     2.020372    -0.4615173    1.1965656   -3.8764873
      11.6562195   -6.0750933   12.182899     3.2218833    0.81969476
      5.570001    -3.8459578   -7.205299     7.9262037   -7.6611166
      -5.249467    -2.2671914    7.2658715  -13.298164     4.821147
      -2.7263982   11.691089    -3.8918593   -2.838112    -1.0336838
      -3.8034165    2.8536487   -5.60398     -1.1972581    1.3455094
      -3.4903061    2.2408795    5.5010734   -3.970756    11.99696
      -7.8858757    0.43160373  -5.5059714    4.3426995   16.322706
      11.635366     0.72157705  -9.245714    -3.91465     -4.449838
      -1.5716927    7.713747    -2.2430465   -6.198303   -13.481864
      2.8156567   -5.7812386    5.1456156    2.7289324  -14.505571
      13.270688     3.448231    -7.0659585    4.5886116   -4.466099
      -0.296428   -11.463529    -2.6076477   14.110243    -6.9725137
      -1.9962958    2.7119343   19.391657     0.01961198  14.607133
      -1.6695905   -4.391516     1.3131028   -6.670972    -5.888604
      12.0612335    5.9285784    3.3715196    1.492534    10.723728
      -0.95514804 -12.085431  ]
    # get the score between enroll and test
    Embeddings Score: 0.4292638301849365
  ```
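
The score printed above compares the enrollment and test embeddings. This README does not state the formula behind `get_embeddings_score`. A common choice for comparing speaker embeddings is cosine similarity, and the sketch below assumes that measure purely for illustration. `cosine_similarity` is a hypothetical helper written in plain Python, not a PaddleSpeech function.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (assumed scoring rule)."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)
```

Under this assumption the score lies in [-1, 1], with values closer to 1 suggesting that the two recordings are more likely to come from the same speaker. The decision threshold depends on the model and data and is not given here.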

### 4. Pretrained Models

Here is a list of pretrained models released by PaddleSpeech that can be used from the command line and the Python API:

| Model | Sample Rate |
| :--- | :---: |
| ecapatdnn_voxceleb12 | 16k |