提交 1a0c2bea 编写于 作者: X xiongxinlei

add streaming asr demo, test=doc

上级 0b46e423
([简体中文](./README_cn.md)|English)
# Speech Server
## Introduction
This demo is an implementation of starting the voice service and accessing the service. It can be achieved with a single command using `paddlespeech_server` and `paddlespeech_client` or a few lines of code in python.
## Usage
### 1. Installation
see [installation](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/install.md).
It is recommended to use **paddlepaddle 2.2.1** or above.
You can choose one way from meduim and hard to install paddlespeech.
### 2. Prepare config File
The configuration file can be found in `conf/application.yaml` .
Among them, `engine_list` indicates the speech engine that will be included in the service to be started, in the format of `<speech task>_<engine type>`.
At present, the speech tasks integrated by the service include: asr (speech recognition), tts (text to sppech) and cls (audio classification).
Currently the engine type supports two forms: python and inference (Paddle Inference)
The input of ASR client demo should be a WAV file(`.wav`), and the sample rate must be the same as the model.
Here are sample files for thisASR client demo that can be downloaded:
```bash
wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespeech.bj.bcebos.com/PaddleAudio/en.wav
```
### 3. Server Usage
- Command Line (Recommended)
```bash
# start the service
paddlespeech_server start --config_file ./conf/application.yaml
```
Usage:
```bash
paddlespeech_server start --help
```
Arguments:
- `config_file`: yaml file of the app, defalut: ./conf/application.yaml
- `log_file`: log file. Default: ./log/paddlespeech.log
Output:
```bash
[2022-02-23 11:17:32] [INFO] [server.py:64] Started server process [6384]
INFO: Waiting for application startup.
[2022-02-23 11:17:32] [INFO] [on.py:26] Waiting for application startup.
INFO: Application startup complete.
[2022-02-23 11:17:32] [INFO] [on.py:38] Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:8090 (Press CTRL+C to quit)
[2022-02-23 11:17:32] [INFO] [server.py:204] Uvicorn running on http://0.0.0.0:8090 (Press CTRL+C to quit)
```
- Python API
```python
from paddlespeech.server.bin.paddlespeech_server import ServerExecutor
server_executor = ServerExecutor()
server_executor(
config_file="./conf/application.yaml",
log_file="./log/paddlespeech.log")
```
Output:
```bash
INFO: Started server process [529]
[2022-02-23 14:57:56] [INFO] [server.py:64] Started server process [529]
INFO: Waiting for application startup.
[2022-02-23 14:57:56] [INFO] [on.py:26] Waiting for application startup.
INFO: Application startup complete.
[2022-02-23 14:57:56] [INFO] [on.py:38] Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:8090 (Press CTRL+C to quit)
[2022-02-23 14:57:56] [INFO] [server.py:204] Uvicorn running on http://0.0.0.0:8090 (Press CTRL+C to quit)
```
### 4. ASR Client Usage
**Note:** The response time will be slightly longer when using the client for the first time
- Command Line (Recommended)
```
paddlespeech_client asr --server_ip 127.0.0.1 --port 8090 --input ./zh.wav
```
Usage:
```bash
paddlespeech_client asr --help
```
Arguments:
- `server_ip`: server ip. Default: 127.0.0.1
- `port`: server port. Default: 8090
- `input`(required): Audio file to be recognized.
- `sample_rate`: Audio ampling rate, default: 16000.
- `lang`: Language. Default: "zh_cn".
- `audio_format`: Audio format. Default: "wav".
Output:
```bash
[2022-02-23 18:11:22,819] [ INFO] - {'success': True, 'code': 200, 'message': {'description': 'success'}, 'result': {'transcription': '我认为跑步最重要的就是给我带来了身体健康'}}
[2022-02-23 18:11:22,820] [ INFO] - time cost 0.689145 s.
```
- Python API
```python
from paddlespeech.server.bin.paddlespeech_client import ASRClientExecutor
import json
asrclient_executor = ASRClientExecutor()
res = asrclient_executor(
input="./zh.wav",
server_ip="127.0.0.1",
port=8090,
sample_rate=16000,
lang="zh_cn",
audio_format="wav")
print(res.json())
```
Output:
```bash
{'success': True, 'code': 200, 'message': {'description': 'success'}, 'result': {'transcription': '我认为跑步最重要的就是给我带来了身体健康'}}
```
### 5. TTS Client Usage
**Note:** The response time will be slightly longer when using the client for the first time
- Command Line (Recommended)
```bash
paddlespeech_client tts --server_ip 127.0.0.1 --port 8090 --input "您好,欢迎使用百度飞桨语音合成服务。" --output output.wav
```
Usage:
```bash
paddlespeech_client tts --help
```
Arguments:
- `server_ip`: server ip. Default: 127.0.0.1
- `port`: server port. Default: 8090
- `input`(required): Input text to generate.
- `spk_id`: Speaker id for multi-speaker text to speech. Default: 0
- `speed`: Audio speed, the value should be set between 0 and 3. Default: 1.0
- `volume`: Audio volume, the value should be set between 0 and 3. Default: 1.0
- `sample_rate`: Sampling rate, choice: [0, 8000, 16000], the default is the same as the model. Default: 0
- `output`: Output wave filepath. Default: None, which means not to save the audio to the local.
Output:
```bash
[2022-02-23 15:20:37,875] [ INFO] - {'description': 'success.'}
[2022-02-23 15:20:37,875] [ INFO] - Save synthesized audio successfully on output.wav.
[2022-02-23 15:20:37,875] [ INFO] - Audio duration: 3.612500 s.
[2022-02-23 15:20:37,875] [ INFO] - Response time: 0.348050 s.
```
- Python API
```python
from paddlespeech.server.bin.paddlespeech_client import TTSClientExecutor
import json
ttsclient_executor = TTSClientExecutor()
res = ttsclient_executor(
input="您好,欢迎使用百度飞桨语音合成服务。",
server_ip="127.0.0.1",
port=8090,
spk_id=0,
speed=1.0,
volume=1.0,
sample_rate=0,
output="./output.wav")
response_dict = res.json()
print(response_dict["message"])
print("Save synthesized audio successfully on %s." % (response_dict['result']['save_path']))
print("Audio duration: %f s." %(response_dict['result']['duration']))
```
Output:
```bash
{'description': 'success.'}
Save synthesized audio successfully on ./output.wav.
Audio duration: 3.612500 s.
```
### 6. CLS Client Usage
**Note:** The response time will be slightly longer when using the client for the first time
- Command Line (Recommended)
```
paddlespeech_client cls --server_ip 127.0.0.1 --port 8090 --input ./zh.wav
```
Usage:
```bash
paddlespeech_client cls --help
```
Arguments:
- `server_ip`: server ip. Default: 127.0.0.1
- `port`: server port. Default: 8090
- `input`(required): Audio file to be classified.
- `topk`: topk scores of classification result.
Output:
```bash
[2022-03-09 20:44:39,974] [ INFO] - {'success': True, 'code': 200, 'message': {'description': 'success'}, 'result': {'topk': 1, 'results': [{'class_name': 'Speech', 'prob': 0.9027184844017029}]}}
[2022-03-09 20:44:39,975] [ INFO] - Response time 0.104360 s.
```
- Python API
```python
from paddlespeech.server.bin.paddlespeech_client import CLSClientExecutor
import json
clsclient_executor = CLSClientExecutor()
res = clsclient_executor(
input="./zh.wav",
server_ip="127.0.0.1",
port=8090,
topk=1)
print(res.json())
```
Output:
```bash
{'success': True, 'code': 200, 'message': {'description': 'success'}, 'result': {'topk': 1, 'results': [{'class_name': 'Speech', 'prob': 0.9027184844017029}]}}
```
## Models supported by the service
### ASR model
Get all models supported by the ASR service via `paddlespeech_server stats --task asr`, where static models can be used for paddle inference inference.
### TTS model
Get all models supported by the TTS service via `paddlespeech_server stats --task tts`, where static models can be used for paddle inference inference.
### CLS model
Get all models supported by the CLS service via `paddlespeech_server stats --task cls`, where static models can be used for paddle inference inference.
此差异已折叠。
# This is the parameter configuration file for PaddleSpeech Serving.
#################################################################################
# SERVER SETTING #
#################################################################################
host: 0.0.0.0
port: 8090
# The task format in the engin_list is: <speech task>_<engine type>
# task choices = ['asr_online', 'tts_online']
# protocol = ['websocket', 'http'] (only one can be selected).
# websocket only support online engine type.
protocol: 'websocket'
engine_list: ['asr_online']
#################################################################################
# ENGINE CONFIG #
#################################################################################
################################### ASR #########################################
################### speech task: asr; engine_type: online #######################
asr_online:
model_type: 'deepspeech2online_aishell'
am_model: # the pdmodel file of am static model [optional]
am_params: # the pdiparams file of am static model [optional]
lang: 'zh'
sample_rate: 16000
cfg_path:
decode_method:
force_yes: True
am_predictor_conf:
device: # set 'gpu:id' or 'cpu'
switch_ir_optim: True
glog_info: False # True -> print glog
summary: True # False -> do not show predictor config
chunk_buffer_conf:
frame_duration_ms: 80
shift_ms: 40
sample_rate: 16000
sample_width: 2
window_n: 7 # frame
shift_n: 4 # frame
window_ms: 20 # ms
shift_ms: 10 # ms
# This is the parameter configuration file for PaddleSpeech Serving.
#################################################################################
# SERVER SETTING #
#################################################################################
host: 0.0.0.0
port: 8090
# The task format in the engin_list is: <speech task>_<engine type>
# task choices = ['asr_online', 'tts_online']
# protocol = ['websocket', 'http'] (only one can be selected).
# websocket only support online engine type.
protocol: 'websocket'
engine_list: ['asr_online']
#################################################################################
# ENGINE CONFIG #
#################################################################################
################################### ASR #########################################
################### speech task: asr; engine_type: online #######################
asr_online:
model_type: 'conformer_online_multicn'
am_model: # the pdmodel file of am static model [optional]
am_params: # the pdiparams file of am static model [optional]
lang: 'zh'
sample_rate: 16000
cfg_path:
decode_method:
force_yes: True
device: 'cpu' # cpu or gpu:id
am_predictor_conf:
device: # set 'gpu:id' or 'cpu'
switch_ir_optim: True
glog_info: False # True -> print glog
summary: True # False -> do not show predictor config
chunk_buffer_conf:
window_n: 7 # frame
shift_n: 4 # frame
window_ms: 25 # ms
shift_ms: 10 # ms
sample_rate: 16000
sample_width: 2
\ No newline at end of file
# start the streaming asr service
paddlespeech_server start --config_file ./conf/ws_conformer_application.yaml
\ No newline at end of file
# download the test wav
wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav
# read the wav and pass it to service
python3 websocket_client.py --wavfile ./zh.wav
......@@ -16,102 +16,24 @@
import argparse
import asyncio
import codecs
import json
import logging
import os
import numpy as np
import soundfile
import websockets
class ASRAudioHandler:
def __init__(self, url="127.0.0.1", port=8090):
self.url = url
self.port = port
self.url = "ws://" + self.url + ":" + str(self.port) + "/ws/asr"
def read_wave(self, wavfile_path: str):
samples, sample_rate = soundfile.read(wavfile_path, dtype='int16')
x_len = len(samples)
chunk_size = 85 * 16 #80ms, sample_rate = 16kHz
if x_len % chunk_size!= 0:
padding_len_x = chunk_size - x_len % chunk_size
else:
padding_len_x = 0
padding = np.zeros((padding_len_x), dtype=samples.dtype)
padded_x = np.concatenate([samples, padding], axis=0)
assert (x_len + padding_len_x) % chunk_size == 0
num_chunk = (x_len + padding_len_x) / chunk_size
num_chunk = int(num_chunk)
for i in range(0, num_chunk):
start = i * chunk_size
end = start + chunk_size
x_chunk = padded_x[start:end]
yield x_chunk
async def run(self, wavfile_path: str):
logging.info("send a message to the server")
# self.read_wave()
# send websocket handshake protocal
async with websockets.connect(self.url) as ws:
# server has already received handshake protocal
# client start to send the command
audio_info = json.dumps(
{
"name": "test.wav",
"signal": "start",
"nbest": 5
},
sort_keys=True,
indent=4,
separators=(',', ': '))
await ws.send(audio_info)
msg = await ws.recv()
logging.info("receive msg={}".format(msg))
# send chunk audio data to engine
for chunk_data in self.read_wave(wavfile_path):
await ws.send(chunk_data.tobytes())
msg = await ws.recv()
msg = json.loads(msg)
logging.info("receive msg={}".format(msg))
# finished
audio_info = json.dumps(
{
"name": "test.wav",
"signal": "end",
"nbest": 5
},
sort_keys=True,
indent=4,
separators=(',', ': '))
await ws.send(audio_info)
msg = await ws.recv()
# decode the bytes to str
msg = json.loads(msg)
logging.info("final receive msg={}".format(msg))
result = msg
return result
from paddlespeech.cli.log import logger
from paddlespeech.server.utils.audio_handler import ASRAudioHandler
def main(args):
logging.basicConfig(level=logging.INFO)
logging.info("asr websocket client start")
logger.info("asr websocket client start")
handler = ASRAudioHandler("127.0.0.1", 8090)
loop = asyncio.get_event_loop()
# support to process single audio file
if args.wavfile and os.path.exists(args.wavfile):
logging.info(f"start to process the wavscp: {args.wavfile}")
logger.info(f"start to process the wavscp: {args.wavfile}")
result = loop.run_until_complete(handler.run(args.wavfile))
result = result["asr_results"]
logging.info(f"asr websocket client finished : {result}")
logger.info(f"asr websocket client finished : {result}")
# support to process batch audios from wav.scp
if args.wavscp and os.path.exists(args.wavscp):
......@@ -126,6 +48,7 @@ def main(args):
if __name__ == "__main__":
logger.info("Start to do streaming asr client")
parser = argparse.ArgumentParser()
parser.add_argument(
"--wavfile",
......
......@@ -30,11 +30,14 @@ from ..executor import BaseExecutor
from ..util import cli_client_register
from ..util import stats_wrapper
from paddlespeech.cli.log import logger
from paddlespeech.server.tests.asr.online.websocket_client import ASRAudioHandler
from paddlespeech.server.utils.audio_handler import ASRAudioHandler
from paddlespeech.server.utils.audio_process import wav2pcm
from paddlespeech.server.utils.util import wav2base64
__all__ = ['TTSClientExecutor', 'ASRClientExecutor', 'CLSClientExecutor']
__all__ = [
'TTSClientExecutor', 'ASRClientExecutor', 'ASRClientExecutor',
'CLSClientExecutor'
]
@cli_client_register(
......@@ -236,11 +239,11 @@ class ASRClientExecutor(BaseExecutor):
@cli_client_register(
name='paddlespeech_client.asr_online',
description='visit asr online service')
class ASRClientExecutor(BaseExecutor):
class ASROnlineClientExecutor(BaseExecutor):
def __init__(self):
super(ASRClientExecutor, self).__init__()
super(ASROnlineClientExecutor, self).__init__()
self.parser = argparse.ArgumentParser(
prog='paddlespeech_client.asr', add_help=True)
prog='paddlespeech_client.asr_online', add_help=True)
self.parser.add_argument(
'--server_ip', type=str, default='127.0.0.1', help='server ip')
self.parser.add_argument(
......@@ -305,6 +308,7 @@ class ASRClientExecutor(BaseExecutor):
return res['asr_results']
@cli_client_register(
name='paddlespeech_client.cls', description='visit cls service')
class CLSClientExecutor(BaseExecutor):
......
......@@ -29,7 +29,7 @@ asr_online:
cfg_path:
decode_method:
force_yes: True
device: 'cpu' # cpu or gpu:id
am_predictor_conf:
device: # set 'gpu:id' or 'cpu'
switch_ir_optim: True
......
......@@ -1028,6 +1028,17 @@ class ASREngine(BaseEngine):
self.output = ""
self.executor = ASRServerExecutor()
self.config = config
try:
if self.config.get("device", None):
self.device = self.config.device
else:
self.device = paddle.get_device()
logger.info(f"paddlespeech_server set the device: {self.device}")
paddle.set_device(self.device)
except BaseException:
logger.error(
"Set device failed, please check if device is already used and the parameter 'device' in the yaml file"
)
self.executor._init_from_path(
model_type=self.config.model_type,
......
([简体中文](./README_cn.md)|English)
# 语音服务
## 介绍
本文档介绍如何使用流式ASR的三种不同客户端:网页、麦克风、Python模拟流式服务。
## 使用方法
### 1. 安装
请看 [安装文档](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/install.md).
推荐使用 **paddlepaddle 2.2.1** 或以上版本。
你可以从 medium,hard 三中方式中选择一种方式安装 PaddleSpeech。
### 2. 准备测试文件
这个 ASR client 的输入应该是一个 WAV 文件(`.wav`),并且采样率必须与模型的采样率相同。
可以下载此 ASR client的示例音频:
```bash
wget -c https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav https://paddlespeech.bj.bcebos.com/PaddleAudio/en.wav
```
### 2. 流式 ASR 客户端使用方法
- Python模拟流式服务命令行
```
# 流式ASR
paddlespeech_client asr_online --server_ip 127.0.0.1 --port 8091 --input ./zh.wav
```
- 麦克风
```
# 直接调用麦克风设备
python microphone_client.py
```
- 网页
```
# 进入web目录后参考相关readme.md
```
# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
record wave from the mic
"""
import asyncio
import json
import logging
import threading
import wave
from signal import SIGINT
from signal import SIGTERM
import pyaudio
import websockets
class ASRAudioHandler(threading.Thread):
def __init__(self, url="127.0.0.1", port=8091):
threading.Thread.__init__(self)
self.url = url
self.port = port
self.url = "ws://" + self.url + ":" + str(self.port) + "/ws/asr"
self.fileName = "./output.wav"
self.chunk = 5120
self.format = pyaudio.paInt16
self.channels = 1
self.rate = 16000
self._running = True
self._frames = []
self.data_backup = []
def startrecord(self):
"""
start a new thread to record wave
"""
threading._start_new_thread(self.recording, ())
def recording(self):
"""
recording wave
"""
self._running = True
self._frames = []
p = pyaudio.PyAudio()
stream = p.open(
format=self.format,
channels=self.channels,
rate=self.rate,
input=True,
frames_per_buffer=self.chunk)
while (self._running):
data = stream.read(self.chunk)
self._frames.append(data)
self.data_backup.append(data)
stream.stop_stream()
stream.close()
p.terminate()
def save(self):
"""
save wave data
"""
p = pyaudio.PyAudio()
wf = wave.open(self.fileName, 'wb')
wf.setnchannels(self.channels)
wf.setsampwidth(p.get_sample_size(self.format))
wf.setframerate(self.rate)
wf.writeframes(b''.join(self.data_backup))
wf.close()
p.terminate()
def stoprecord(self):
"""
stop recording
"""
self._running = False
async def run(self):
aa = input("是否开始录音? (y/n)")
if aa.strip() == "y":
self.startrecord()
logging.info("*" * 10 + "开始录音,请输入语音")
async with websockets.connect(self.url) as ws:
# 发送开始指令
audio_info = json.dumps(
{
"name": "test.wav",
"signal": "start",
"nbest": 5
},
sort_keys=True,
indent=4,
separators=(',', ': '))
await ws.send(audio_info)
msg = await ws.recv()
logging.info("receive msg={}".format(msg))
# send bytes data
logging.info("结束录音请: Ctrl + c。继续请按回车。")
try:
while True:
while len(self._frames) > 0:
await ws.send(self._frames.pop(0))
msg = await ws.recv()
logging.info("receive msg={}".format(msg))
except asyncio.CancelledError:
# quit
# send finished
audio_info = json.dumps(
{
"name": "test.wav",
"signal": "end",
"nbest": 5
},
sort_keys=True,
indent=4,
separators=(',', ': '))
await ws.send(audio_info)
msg = await ws.recv()
logging.info("receive msg={}".format(msg))
self.stoprecord()
logging.info("*" * 10 + "录音结束")
self.save()
elif aa.strip() == "n":
exit()
else:
print("无效输入!")
exit()
if __name__ == "__main__":
logging.basicConfig(level=logging.INFO)
logging.info("asr websocket client start")
handler = ASRAudioHandler("127.0.0.1", 8091)
loop = asyncio.get_event_loop()
main_task = asyncio.ensure_future(handler.run())
for signal in [SIGINT, SIGTERM]:
loop.add_signal_handler(signal, main_task.cancel)
try:
loop.run_until_complete(main_task)
finally:
loop.close()
logging.info("asr websocket client finished")
Markdown is supported
0% .
You are about to add 0 people to the discussion. Proceed with caution.
先完成此消息的编辑!
想要评论请 注册