Commit b3e4e26c authored by: K KP

Update asr and audio tagging demo.

Parent f053dce3
......@@ -7,7 +7,7 @@ This demo is an implementation to tag an audio file with 527 [AudioSet](https://
## Usage
### 1. Installation
```sh
```bash
pip install paddlespeech
```
......@@ -15,16 +15,20 @@ pip install paddlespeech
Input of this demo should be a WAV file (`.wav`).
Here are sample files for this demo that can be downloaded:
```sh
```bash
wget https://paddlespeech.bj.bcebos.com/PaddleAudio/cat.wav https://paddlespeech.bj.bcebos.com/PaddleAudio/dog.wav
```
### 3. Usage
- Command Line (Recommended)
```sh
```bash
paddlespeech cls --input ~/cat.wav --topk 10
```
Command usage:
Usage:
```bash
paddlespeech cls --help
```
Arguments:
- `input`(required): Audio file to tag.
- `model`: Model type of tagging task. Default: `panns_cnn14`.
- `config`: Config of tagging task. Use pretrained model when it is None. Default: `None`.
......@@ -34,7 +38,7 @@ wget https://paddlespeech.bj.bcebos.com/PaddleAudio/cat.wav https://paddlespeech
- `device`: Choose device to execute model inference. Default: default device of paddlepaddle in current environment.
Output:
```sh
```bash
[2021-12-08 14:49:40,671] [ INFO] [utils.py] [L225] - CLS Result:
Cat: 0.8991316556930542
Domestic animals, pets: 0.8806838393211365
......@@ -49,11 +53,23 @@ wget https://paddlespeech.bj.bcebos.com/PaddleAudio/cat.wav https://paddlespeech
```
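The tagging output shown above is plain text with one `label: score` pair per line. As a minimal sketch (the helper name `parse_cls_result` is hypothetical and not part of PaddleSpeech), the text could be turned into structured data like this, assuming the format stays exactly as shown in the sample output:

```python
def parse_cls_result(text):
    """Parse 'Label: score' lines into a list of (label, score) tuples.

    Assumes the text layout shown in the demo output. Header lines such
    as 'CLS Result:' carry no score and are skipped.
    """
    pairs = []
    for line in text.splitlines():
        # Split on the last ': ' because labels may contain commas or spaces.
        label, sep, score = line.strip().rpartition(": ")
        if not sep:
            continue  # header or blank line
        try:
            pairs.append((label, float(score)))
        except ValueError:
            continue  # trailing part was not a number
    return pairs


sample = """CLS Result:
Cat: 0.8991316556930542
Domestic animals, pets: 0.8806838393211365"""

pairs = parse_cls_result(sample)
for label, score in pairs:
    print("{}: {:.3f}".format(label, score))
```

Splitting on the last `: ` rather than the first keeps multi-word labels such as `Domestic animals, pets` intact.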
- Python API
```sh
python tag.py --input ~/cat.wav
```python
import paddle
from paddlespeech.cli import CLSExecutor

cls_executor = CLSExecutor()
result = cls_executor(
    model_type='panns_cnn14',
    cfg_path=None,  # Set `cfg_path` and `ckpt_path` to None to use pretrained model.
    label_file=None,
    ckpt_path=None,
    audio_file='./cat.wav',
    topk=10,
    device=paddle.get_device())
print('CLS Result: \n{}'.format(result))
```
Output:
```sh
```bash
CLS Result:
Cat: 0.8991316556930542
Domestic animals, pets: 0.8806838393211365
......
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import argparse

import paddle
from paddlespeech.cli import CLSExecutor

# yapf: disable
parser = argparse.ArgumentParser()
parser.add_argument(
    '--input', type=str, required=True, help='Audio file to tag.')
args = parser.parse_args()
# yapf: enable

if __name__ == '__main__':
    cls_executor = CLSExecutor()
    result = cls_executor(
        model_type='panns_cnn14',
        cfg_path=None,  # Set `cfg_path` and `ckpt_path` to None to use pretrained model.
        label_file=None,
        ckpt_path=None,
        audio_file=args.input,
        topk=10,
        device=paddle.get_device())
    print('CLS Result: \n{}'.format(result))
......@@ -7,7 +7,7 @@ This demo is an implementation to recognize text from a specific audio file. It
## Usage
### 1. Installation
```sh
```bash
pip install paddlespeech
```
......@@ -15,16 +15,20 @@ pip install paddlespeech
Input of this demo should be a WAV file (`.wav`), and the sample rate must be the same as the model's.
Here are sample files for this demo that can be downloaded:
```sh
```bash
wget https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespeech.bj.bcebos.com/PaddleAudio/en.wav
```
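Because the sample rate of the input must match the model's, it can help to check a file before running recognition. The following is a minimal sketch using only Python's standard `wave` module. The helper names are hypothetical, and the 16000 Hz default is an assumption taken from the `sample_rate=16000` value used in this demo's Python API call.

```python
import os
import struct
import tempfile
import wave


def wav_sample_rate(path):
    """Return the sample rate (Hz) stored in a WAV file header."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate()


def matches_model_rate(path, expected_rate=16000):
    """Check whether a WAV file uses the expected sample rate.

    expected_rate defaults to 16000 as an assumption; use the value
    that matches the model actually selected.
    """
    return wav_sample_rate(path) == expected_rate


# Self-contained demonstration: write 0.1 s of 16-bit mono silence at 16 kHz.
demo_path = os.path.join(tempfile.mkdtemp(), "silence.wav")
with wave.open(demo_path, "wb") as wav_file:
    wav_file.setnchannels(1)
    wav_file.setsampwidth(2)
    wav_file.setframerate(16000)
    wav_file.writeframes(struct.pack("<h", 0) * 1600)

print(wav_sample_rate(demo_path))
print(matches_model_rate(demo_path))
```

For a downloaded sample, the same check would be `matches_model_rate("zh.wav")`.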
### 3. Usage
- Command Line (Recommended)
```sh
```bash
paddlespeech asr --input ~/zh.wav
```
Command usage:
Usage:
```bash
paddlespeech asr --help
```
Arguments:
- `input`(required): Audio file to recognize.
- `model`: Model type of asr task. Default: `conformer_wenetspeech`.
- `lang`: Model language. Default: `zh`.
......@@ -34,16 +38,29 @@ wget https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespeech.
- `device`: Choose device to execute model inference. Default: default device of paddlepaddle in current environment.
Output:
```sh
```bash
[2021-12-08 13:12:34,063] [ INFO] [utils.py] [L225] - ASR Result: 我认为跑步最重要的就是给我带来了身体健康
```
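When the command-line output is captured by another script, the transcript has to be separated from the logging prefix. A minimal sketch, assuming the log line keeps the `ASR Result: <text>` layout shown above (the function name `extract_transcript` is hypothetical):

```python
import re

# Matches everything after the 'ASR Result:' marker on a log line.
ASR_RESULT_PATTERN = re.compile(r"ASR Result:\s*(.*)$")


def extract_transcript(log_line):
    """Return the transcript from a log line, or None if no result is present."""
    match = ASR_RESULT_PATTERN.search(log_line)
    return match.group(1).strip() if match else None


line = ("[2021-12-08 13:12:34,063] [ INFO] [utils.py] [L225] - "
        "ASR Result: 我认为跑步最重要的就是给我带来了身体健康")
print(extract_transcript(line))
```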
- Python API
```sh
python asr.py --input ~/zh.wav
```python
import paddle
from paddlespeech.cli import ASRExecutor

asr_executor = ASRExecutor()
text = asr_executor(
    model='conformer_wenetspeech',
    lang='zh',
    sample_rate=16000,
    config=None,  # Set `config` and `ckpt_path` to None to use pretrained model.
    ckpt_path=None,
    audio_file='./zh.wav',
    device=paddle.get_device())
print('ASR Result: \n{}'.format(text))
```
Output:
```sh
```bash
ASR Result:
我认为跑步最重要的就是给我带来了身体健康
```
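The Python API call above handles one file at a time. To process several files, the executor can be wrapped in a small loop. The sketch below is an illustration only: `transcribe_files` is a hypothetical helper, and a stand-in callable replaces `ASRExecutor` so the example runs without PaddleSpeech installed. With the real library, an `ASRExecutor()` instance would be passed instead, along with the keyword arguments shown in the Python API example.

```python
def transcribe_files(executor, audio_files, **options):
    """Call `executor` once per audio file and collect the results.

    `executor` is any callable accepting `audio_file=` plus extra keyword
    options, which is the calling convention used in the demo above.
    """
    results = {}
    for path in audio_files:
        results[path] = executor(audio_file=path, **options)
    return results


def fake_executor(audio_file, **options):
    """Stand-in for an ASR executor, used only to keep this sketch runnable."""
    return "transcript of {}".format(audio_file)


results = transcribe_files(
    fake_executor, ["./a.wav", "./b.wav"], sample_rate=16000)
for path, text in results.items():
    print("{} -> {}".format(path, text))
```

Passing the executor in as an argument keeps the loop independent of the model and makes it easy to test with a stub.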
......
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import argparse

import paddle
from paddlespeech.cli import ASRExecutor

# yapf: disable
parser = argparse.ArgumentParser()
parser.add_argument(
    '--input', type=str, required=True, help='Audio file to recognize.')
args = parser.parse_args()
# yapf: enable

if __name__ == '__main__':
    asr_executor = ASRExecutor()
    text = asr_executor(
        model='conformer_wenetspeech',
        lang='zh',
        sample_rate=16000,
        config=None,  # Set `config` and `ckpt_path` to None to use pretrained model.
        ckpt_path=None,
        audio_file=args.input,
        device=paddle.get_device())
    print('ASR Result: \n{}'.format(text))