([简体中文](./README_cn.md)|English)
# Speaker Verification

## Introduction

Speaker verification, as used here, refers to the problem of extracting a speaker embedding from an audio file.

This demo extracts a speaker embedding from a given audio file. It can be run with a single command or a few lines of Python using `PaddleSpeech`.

## Usage
### 1. Installation
See [installation](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/install.md).

You can choose one of the easy, medium, or hard methods to install PaddleSpeech.

### 2. Prepare Input File
The input of this CLI demo should be a WAV file (`.wav`), and its sample rate must match the sample rate of the model.

Here are sample files for this demo that can be downloaded:
```bash
wget -c https://paddlespeech.bj.bcebos.com/vector/audio/85236145389.wav
wget -c https://paddlespeech.bj.bcebos.com/vector/audio/123456789.wav
```
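
Because the sample rate of the input must match the model, it can help to check a file before running the demo. The following is a minimal sketch using only the Python standard library; the helper names `wav_sample_rate` and `matches_model_rate` are hypothetical and not part of PaddleSpeech, and the `wave` module only reads uncompressed PCM WAV files.

```python
import wave


def wav_sample_rate(path):
    """Return the sample rate (in Hz) stored in a PCM WAV file header."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate()


def matches_model_rate(path, model_rate=16000):
    """Check whether the WAV file uses the sample rate the model expects.

    16000 is the default `sample_rate` listed for this demo; pass another
    value if you use a model with a different sample rate.
    """
    return wav_sample_rate(path) == model_rate
```

For example, `matches_model_rate("85236145389.wav")` returns `True` only if the downloaded file is stored at 16 kHz; a file that fails the check would need to be resampled first.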

### 3. Usage
- Command Line (Recommended)
  ```bash
  paddlespeech vector --task spk --input 85236145389.wav

  echo -e "demo1 85236145389.wav" > vec.job
  paddlespeech vector --task spk --input vec.job

  echo -e "demo2 85236145389.wav \n demo3 85236145389.wav" | paddlespeech vector --task spk

  paddlespeech vector --task score --input "./85236145389.wav ./123456789.wav"
  
  echo -e "demo4 85236145389.wav 85236145389.wav \n demo5 85236145389.wav 123456789.wav" > vec.job
  paddlespeech vector --task score --input vec.job
  ```
  
  Usage:
  ```bash
  paddlespeech vector --help
  ```
  Arguments:
  - `input` (required): Audio file to process.
  - `task` (required): The `vector` task to run. Default: `spk`.
  - `model`: Model type of the vector task. Default: `ecapatdnn_voxceleb12`.
  - `sample_rate`: Sample rate of the model. Default: `16000`.
  - `config`: Config of the vector task. The pretrained model is used when it is `None`. Default: `None`.
  - `ckpt_path`: Model checkpoint. The pretrained model is used when it is `None`. Default: `None`.
  - `device`: Device on which to run model inference. Default: the default device of PaddlePaddle in the current environment.

  Output:

  ```bash
    demo [ -1.3251206    7.8606825   -4.620626     0.3000721    2.2648535
    -1.1931441    3.0647137    7.673595    -6.0044727  -12.02426
    -1.9496069    3.1269536    1.618838    -7.6383104   -1.2299773
  -12.338331     2.1373026   -5.3957124    9.717328     5.6752305
    3.7805123    3.0597172    3.429692     8.97601     13.174125
    -0.53132284   8.9424715    4.46511     -4.4262476   -9.726503
    8.399328     7.2239175   -7.435854     2.9441683   -4.3430395
  -13.886965    -1.6346735  -10.9027405   -5.311245     3.8007221
    3.8976038   -2.1230774   -2.3521194    4.151031    -7.4048667
    0.13911647   2.4626107    4.9664545    0.9897574    5.4839754
    -3.3574002   10.1340065   -0.6120171  -10.403095     4.6007543
    16.00935     -7.7836914   -4.1945305   -6.9368606    1.1789556
    11.490801     4.2380238    9.550931     8.375046     7.5089145
    -0.65707296  -0.30051577   2.8406055    3.0828028    0.730817
    6.148354     0.13766119 -13.424735    -7.7461405   -2.3227983
    -8.305252     2.9879124  -10.995229     0.15211068  -2.3820348
    -1.7984174    8.495629    -5.8522367   -3.755498     0.6989711
    -5.2702994   -2.6188622   -1.8828466   -4.64665     14.078544
    -0.5495333   10.579158    -3.2160501    9.349004    -4.381078
  -11.675817    -2.8630207    4.5721755    2.246612    -4.574342
    1.8610188    2.3767874    5.6257877   -9.784078     0.64967257
    -1.4579505    0.4263264   -4.9211264   -2.454784     3.4869802
    -0.42654222   8.341269     1.356552     7.0966883  -13.102829
    8.016734    -7.1159344    1.8699781    0.208721    14.699384
    -1.025278    -2.6107233   -2.5082312    8.427193     6.9138527
    -6.2912464    0.6157366    2.489688    -3.4668267    9.921763
    11.200815    -0.1966403    7.4916005   -0.62312716  -0.25848144
    -9.947997    -0.9611041    1.1649219   -2.1907122   -1.5028487
    -0.51926106  15.165954     2.4649463   -0.9980445    7.4416637
    -2.0768049    3.5896823   -7.3055434   -7.5620847    4.323335
    0.0804418   -6.56401     -2.3148053   -1.7642345   -2.4708817
    -7.675618    -9.548878    -1.0177554    0.16986446   2.5877135
    -1.8752296   -0.36614323  -6.0493784   -2.3965611   -5.9453387
    0.9424033  -13.155974    -7.457801     0.14658108  -3.742797
    5.8414927   -1.2872906    5.5694313   12.57059      1.0939219
    2.2142086    1.9181576    6.9914207   -5.888139     3.1409824
    -2.003628     2.4434285    9.973139     5.03668      2.0051203
    2.8615603    5.860224     2.9176188   -1.6311141    2.0292206
    -4.070415    -6.831437  ]
  ```
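
The job-file input used in the command-line examples above (`vec.job`, one `<utterance_id> <wav_path>` pair per line) can also be generated for a whole directory of recordings. The sketch below is only an illustration: the helper names are hypothetical, the file stem is used as the utterance id, and it assumes paths without spaces, since the examples separate fields with a space.

```python
from pathlib import Path


def build_job_lines(wav_dir):
    """Return one 'utt_id wav_path' line per .wav file found in wav_dir."""
    wav_paths = sorted(Path(wav_dir).glob("*.wav"))
    # Use the file name without its extension as the utterance id.
    return [f"{wav_path.stem} {wav_path}" for wav_path in wav_paths]


def write_job_file(wav_dir, job_path="vec.job"):
    """Write the job file and return the number of entries written."""
    lines = build_job_lines(wav_dir)
    text = "".join(line + "\n" for line in lines)
    Path(job_path).write_text(text, encoding="utf-8")
    return len(lines)
```

After calling `write_job_file(".")`, the resulting file can be passed to the `spk` task with `--input vec.job`, as in the examples above.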

- Python API
  ```python
  import paddle
  from paddlespeech.cli.vector import VectorExecutor

  vector_executor = VectorExecutor()
  audio_emb = vector_executor(
      model='ecapatdnn_voxceleb12',
      sample_rate=16000,
      config=None,  # Set `config` and `ckpt_path` to None to use pretrained model.
      ckpt_path=None,
      audio_file='./85236145389.wav',
      device=paddle.get_device())
  print('Audio embedding Result: \n{}'.format(audio_emb))

  test_emb = vector_executor(
      model='ecapatdnn_voxceleb12',
      sample_rate=16000,
      config=None,  # Set `config` and `ckpt_path` to None to use pretrained model.
      ckpt_path=None,
      audio_file='./123456789.wav',
      device=paddle.get_device())
  print('Test embedding Result: \n{}'.format(test_emb))

  # score range [0, 1]
  score = vector_executor.get_embeddings_score(audio_emb, test_emb)
  print(f"Eembeddings Score: {score}")
  ```

  Output:

  ```bash
  # Vector Result:
   Audio embedding Result:
    [ -1.3251206    7.8606825   -4.620626     0.3000721    2.2648535
      -1.1931441    3.0647137    7.673595    -6.0044727  -12.02426
      -1.9496069    3.1269536    1.618838    -7.6383104   -1.2299773
    -12.338331     2.1373026   -5.3957124    9.717328     5.6752305
      3.7805123    3.0597172    3.429692     8.97601     13.174125
      -0.53132284   8.9424715    4.46511     -4.4262476   -9.726503
      8.399328     7.2239175   -7.435854     2.9441683   -4.3430395
    -13.886965    -1.6346735  -10.9027405   -5.311245     3.8007221
      3.8976038   -2.1230774   -2.3521194    4.151031    -7.4048667
      0.13911647   2.4626107    4.9664545    0.9897574    5.4839754
      -3.3574002   10.1340065   -0.6120171  -10.403095     4.6007543
      16.00935     -7.7836914   -4.1945305   -6.9368606    1.1789556
      11.490801     4.2380238    9.550931     8.375046     7.5089145
      -0.65707296  -0.30051577   2.8406055    3.0828028    0.730817
      6.148354     0.13766119 -13.424735    -7.7461405   -2.3227983
      -8.305252     2.9879124  -10.995229     0.15211068  -2.3820348
      -1.7984174    8.495629    -5.8522367   -3.755498     0.6989711
      -5.2702994   -2.6188622   -1.8828466   -4.64665     14.078544
      -0.5495333   10.579158    -3.2160501    9.349004    -4.381078
    -11.675817    -2.8630207    4.5721755    2.246612    -4.574342
      1.8610188    2.3767874    5.6257877   -9.784078     0.64967257
      -1.4579505    0.4263264   -4.9211264   -2.454784     3.4869802
      -0.42654222   8.341269     1.356552     7.0966883  -13.102829
      8.016734    -7.1159344    1.8699781    0.208721    14.699384
      -1.025278    -2.6107233   -2.5082312    8.427193     6.9138527
      -6.2912464    0.6157366    2.489688    -3.4668267    9.921763
      11.200815    -0.1966403    7.4916005   -0.62312716  -0.25848144
      -9.947997    -0.9611041    1.1649219   -2.1907122   -1.5028487
      -0.51926106  15.165954     2.4649463   -0.9980445    7.4416637
      -2.0768049    3.5896823   -7.3055434   -7.5620847    4.323335
      0.0804418   -6.56401     -2.3148053   -1.7642345   -2.4708817
      -7.675618    -9.548878    -1.0177554    0.16986446   2.5877135
      -1.8752296   -0.36614323  -6.0493784   -2.3965611   -5.9453387
      0.9424033  -13.155974    -7.457801     0.14658108  -3.742797
      5.8414927   -1.2872906    5.5694313   12.57059      1.0939219
      2.2142086    1.9181576    6.9914207   -5.888139     3.1409824
      -2.003628     2.4434285    9.973139     5.03668      2.0051203
      2.8615603    5.860224     2.9176188   -1.6311141    2.0292206
      -4.070415    -6.831437  ]
    # get the test embedding
    Test embedding Result:
    [  2.5247195    5.119042    -4.335273     4.4583654    5.047907
      3.5059214    1.6159848    0.49364898 -11.6899185   -3.1014526
      -5.6589785   -0.42684984   2.674276   -11.937654     6.2248464
    -10.776924    -5.694543     1.112041     1.5709964    1.0961034
      1.3976512    2.324352     1.339981     5.279319    13.734659
      -2.5753925   13.651442    -2.2357535    5.1575427   -3.251567
      1.4023279    6.1191974   -6.0845175   -1.3646189   -2.6789894
    -15.220778     9.779349    -9.411551    -6.388947     6.8313975
      -9.245996     0.31196198   2.5509644   -4.413065     6.1649427
      6.793837     2.6328635    8.620976     3.4832475    0.52491665
      2.9115407    5.8392377    0.6702376   -3.2726715    2.6694255
      16.91701     -5.5811176    0.23362345  -4.5573606  -11.801059
      14.728292    -0.5198082   -3.999922     7.0927105   -7.0459595
      -5.4389      -0.46420583  -5.1085467   10.376568    -8.889225
      -0.37705845  -1.659806     2.6731026   -7.1909504    1.4608804
      -2.163136    -0.17949677   4.0241547    0.11319201   0.601279
      2.039692     3.1910992  -11.649526    -8.121584    -4.8707457
      0.3851982    1.4231744   -2.3321972    0.99332285  14.121717
      5.899413     0.7384519  -17.760096    10.555021     4.1366534
      -0.3391071   -0.20792882   3.208204     0.8847948   -8.721497
      -6.432868    13.006379     4.8956      -9.155822    -1.9441519
      5.7815638   -2.066733    10.425042    -0.8802383   -2.4314315
      -9.869258     0.35095334  -5.3549943    2.1076174   -8.290468
      8.4433365   -4.689333     9.334139    -2.172678    -3.0250976
      8.394216    -3.2110903   -7.93868      2.3960824   -2.3213403
      -1.4963245   -3.476059     4.132903   -10.893354     4.362673
      -0.45456508  10.258634    -1.1655927   -6.7799754    0.22885278
      -4.399287     2.333433    -4.84745     -4.2752337   -1.3577863
      -1.0685898    9.505196     7.3062205    0.08708266  12.927811
      -9.57974      1.3936648   -1.9444873    5.776769    15.251903
      10.6118355   -1.4903594   -9.535318    -3.6553776   -1.6699586
      -0.5933151    7.600357    -4.8815503   -8.698617   -15.855757
      0.25632986  -7.2235737    0.9506656    0.7128582   -9.051738
      8.74869     -1.6426028   -6.5762258    2.506905    -6.7431564
      5.129912   -12.189555    -3.6435068   12.068113    -6.0059533
      -2.3535995    2.9014351   22.3082      -1.5563312   13.193291
      2.7583609   -7.468798     1.3407065   -4.599617    -6.2345777
      10.7689295    7.137627     5.099476     0.3473359    9.647881
      -2.0484571   -5.8549366 ]
    # get the score between enroll and test
    Eembeddings Score: 0.45332613587379456
  ```
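
To give an intuition for what a score between two embeddings measures, the sketch below computes cosine similarity, a common way to compare speaker embeddings. This is an assumption for illustration only: the source does not state how `get_embeddings_score` is computed, and plain cosine similarity ranges over [-1, 1], so its values may differ from the score printed above. The function name `cosine_score` is hypothetical.

```python
import math


def cosine_score(emb_a, emb_b):
    """Cosine similarity between two 1-D embeddings (illustrative only)."""
    a = [float(x) for x in emb_a]
    b = [float(x) for x in emb_b]
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero vectors")
    return dot / (norm_a * norm_b)
```

Embeddings that point in the same direction score close to 1, while unrelated embeddings score close to 0. The function accepts any 1-D sequence of numbers, so it can be applied to `audio_emb` and `test_emb` from the example above if they are 1-D arrays.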

### 4. Pretrained Models

Here is a list of pretrained models released by PaddleSpeech that can be used from both the command line and the Python API:

| Model | Sample Rate |
| :--- | :---: |
| ecapatdnn_voxceleb12 | 16k |