([简体中文](./README_cn.md)|English)

# Audio Tagging

## Introduction
Audio tagging is the task of assigning one or more labels (tags) to an audio clip. It covers music tagging, acoustic scene classification, audio event classification, and similar tasks.

This demo tags an audio file with the 527 [AudioSet](https://research.google.com/audioset/) labels. It can be run with a single command or a few lines of Python using `PaddleSpeech`.

## Usage
### 1. Installation
See [installation](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/install.md).

You can install PaddleSpeech using one of three methods: easy, medium, or hard.

### 2. Prepare Input File
The input of this demo should be a WAV file (`.wav`).

Sample files for this demo can be downloaded with:
```bash
wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/cat.wav https://paddlespeech.bj.bcebos.com/PaddleAudio/dog.wav
```
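
Before running the demo, it can help to confirm that a file really is a readable WAV file. The sketch below uses only Python's standard `wave` module; the helper name `describe_wav` is hypothetical and not part of PaddleSpeech. Note that `wave` handles PCM-encoded WAV only, so other encodings may raise `wave.Error` even if the demo itself could read them.

```python
import wave


def describe_wav(path):
    """Return basic header fields of a PCM WAV file (hypothetical helper)."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate": rate,                      # in Hz
            "sample_width_bytes": wav.getsampwidth(),
            "duration_s": frames / rate,              # clip length in seconds
        }
```

For example, `print(describe_wav('./cat.wav'))` prints the channel count, sample rate, sample width, and duration of the downloaded sample.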

### 3. Usage
- Command Line (Recommended)
  ```bash
  paddlespeech cls --input ./cat.wav --topk 10
  ```
  Usage:
  ```bash
  paddlespeech cls --help
  ```
  Arguments:
  - `input` (required): The audio file to tag.
  - `model`: Model type of tagging task. Default: `panns_cnn14`.
  - `config`: Config of the tagging task. A pretrained model is used when it is `None`. Default: `None`.
  - `ckpt_path`: Model checkpoint. A pretrained model is used when it is `None`. Default: `None`.
  - `label_file`: Label file of the tagging task. The AudioSet labels are used when it is `None`. Default: `None`.
  - `topk`: Number of top-scoring labels to show in the result. Default: `1`.
  - `device`: The device on which to run model inference. Default: the default device of PaddlePaddle in the current environment.

  Output:
  ```bash
  [2021-12-08 14:49:40,671] [    INFO] [utils.py] [L225] - CLS Result:
  Cat: 0.8991316556930542
  Domestic animals, pets: 0.8806838393211365
  Meow: 0.8784668445587158
  Animal: 0.8776564598083496
  Caterwaul: 0.2232048511505127
  Speech: 0.03101264126598835
  Music: 0.02870696596801281
  Inside, small room: 0.016673989593982697
  Purr: 0.008387474343180656
  Bird: 0.006304860580712557
  ```

- Python API
  ```python
  import paddle
  from paddlespeech.cli.cls import CLSExecutor

  cls_executor = CLSExecutor()
  result = cls_executor(
      model='panns_cnn14',
      config=None,  # Set `config` and `ckpt_path` to None to use pretrained model.
      label_file=None,
      ckpt_path=None,
      audio_file='./cat.wav',
      topk=10,
      device=paddle.get_device())
  print('CLS Result: \n{}'.format(result))
  ```
  Output:
  ```bash
  CLS Result:
  Cat: 0.8991316556930542
  Domestic animals, pets: 0.8806838393211365
  Meow: 0.8784668445587158
  Animal: 0.8776564598083496
  Caterwaul: 0.2232048511505127
  Speech: 0.03101264126598835
  Music: 0.02870696596801281
  Inside, small room: 0.016673989593982697
  Purr: 0.008387474343180656
  Bird: 0.006304860580712557
  ```
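
Both outputs above are plain text with one `label: score` pair per line. If the scores are needed programmatically, a small parser along the following lines may help. This is a sketch that assumes the text format shown above; `parse_cls_result` and `filter_by_score` are hypothetical helpers, not PaddleSpeech functions. Because some labels contain commas or spaces (for example `Domestic animals, pets`), each line is split on the last `": "`.

```python
def parse_cls_result(text):
    """Parse 'label: score' lines into a list of (label, score) tuples."""
    pairs = []
    for line in text.splitlines():
        line = line.strip()
        if ": " not in line:
            continue  # skips blank lines and the 'CLS Result:' header
        # Split on the LAST ': ' so labels containing punctuation stay intact.
        label, _, score = line.rpartition(": ")
        try:
            pairs.append((label, float(score)))
        except ValueError:
            continue  # not a score line (e.g. unrelated log output)
    return pairs


def filter_by_score(pairs, threshold=0.5):
    """Keep only the labels whose score is at least `threshold`."""
    return [(label, score) for label, score in pairs if score >= threshold]
```

With the sample output above, `filter_by_score(parse_cls_result(result))` would keep only `Cat`, `Domestic animals, pets`, `Meow`, and `Animal`, since the remaining scores fall below the default threshold of 0.5.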

### 4. Pretrained Models

The following pretrained models released by PaddleSpeech can be used from both the command line and the Python API:

| Model | Sample Rate (Hz) |
| :--- | :---: |
| panns_cnn6 | 32000 |
| panns_cnn10 | 32000 |
| panns_cnn14 | 32000 |
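
To see whether an input file's sample rate matches the rate listed for a model, a check like the one below can be used. It is a sketch built on the standard `wave` module: the dictionary simply copies the table above, and `matches_model_rate` is a hypothetical helper. This document does not say how the demo treats files recorded at other rates, so a mismatch here is informational only.

```python
import wave

# Sample rates (Hz) copied from the table above.
MODEL_SAMPLE_RATES = {
    "panns_cnn6": 32000,
    "panns_cnn10": 32000,
    "panns_cnn14": 32000,
}


def matches_model_rate(wav_path, model="panns_cnn14"):
    """Return (is_match, actual_rate, expected_rate) for a PCM WAV file."""
    expected = MODEL_SAMPLE_RATES[model]
    with wave.open(wav_path, "rb") as wav:
        actual = wav.getframerate()
    return actual == expected, actual, expected
```

For example, `matches_model_rate('./cat.wav', 'panns_cnn14')` returns a tuple whose first element tells whether the file is already at 32000 Hz.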