What could be causing such poor recognition results?
Created by: xylcbd
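I tested two models, aishell and baidu_cn1.2k. Both appear to work well during the warm_up stage, but the results are very poor in actual use. What might be the cause? Thanks!
I record on an Android tablet at a 16000 Hz sample rate, then convert the 16-bit signed PCM audio to a 32-bit signed PCM WAV file.
Listening to the recordings, I can hear some background noise.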
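One way to rule out a format mismatch is to compare the header of a recorded file against one of the warm-up files. The snippet below is a minimal sketch using only Python's standard `wave` module; the helper name `describe_wav` and the file name `my_recording.wav` are made up for illustration, and it only reads header fields, it does not check what the model itself expects.

```python
import wave


def describe_wav(path):
    """Return basic header fields of a PCM WAV file (hypothetical helper)."""
    with wave.open(path, "rb") as wav:
        sample_rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate": sample_rate,
            # getsampwidth() is in bytes per sample, so 2 means 16-bit PCM
            "bits_per_sample": wav.getsampwidth() * 8,
            "duration_sec": wav.getnframes() / float(sample_rate),
        }


if __name__ == "__main__":
    import sys

    # Usage: python describe_wav.py my_recording.wav some_warmup_file.wav
    for wav_path in sys.argv[1:]:
        print(wav_path, describe_wav(wav_path))
```

If the channel count, sample rate, or bit depth differs between the tablet recording and the warm-up file, that difference is worth eliminating before looking at the background noise.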
The log is as follows:
----------- Configuration Arguments -----------
alpha: 2.5
beam_size: 500
beta: 0.3
cutoff_prob: 1.0
cutoff_top_n: 40
decoding_method: ctc_beam_search
host_ip: 0.0.0.0
host_port: 8086
lang_model_path: ./models/lm/zh_giga.no_cna_cmn.prune01244.klm
mean_std_path: ./models/aishell/mean_std.npz
model_path: ./models/aishell/params.tar.gz
num_conv_layers: 2
num_rnn_layers: 3
rnn_layer_size: 1024
share_rnn_weights: True
specgram_type: linear
speech_save_dir: demo_cache
use_gpu: True
use_gru: True
vocab_path: ./models/aishell/vocab.txt
warmup_manifest: data/aishell/manifest.test
------------------------------------------------
I0511 17:51:47.988626 26012 Util.cpp:166] commandline: --use_gpu=True --trainer_count=1
[INFO 2018-05-11 17:51:55,003 layers.py:2714] output for __conv_0__: c = 32, h = 81, w = 54, size = 139968
[INFO 2018-05-11 17:51:55,004 layers.py:3282] output for __batch_norm_0__: c = 32, h = 81, w = 54, size = 139968
[INFO 2018-05-11 17:51:55,005 layers.py:7454] output for __scale_sub_region_0__: c = 32, h = 81, w = 54, size = 139968
[INFO 2018-05-11 17:51:55,005 layers.py:2714] output for __conv_1__: c = 32, h = 41, w = 54, size = 70848
[INFO 2018-05-11 17:51:55,006 layers.py:3282] output for __batch_norm_1__: c = 32, h = 41, w = 54, size = 70848
[INFO 2018-05-11 17:51:55,007 layers.py:7454] output for __scale_sub_region_1__: c = 32, h = 41, w = 54, size = 70848
[INFO 2018-05-11 17:52:00,335 model.py:243] begin to initialize the external scorer for decoding
[INFO 2018-05-11 17:52:00,453 model.py:253] language model: is_character_based = 1, max_order = 5, dict_size = 0
[INFO 2018-05-11 17:52:00,453 model.py:254] end initializing scorer
-----------------------------------------------------------
Warming up ...
('Warm-up Test Case %d: %s', 0, u'/home/xxx/.cache/paddle/dataset/speech/Aishell/data_aishell/wav/test/S0913/BAC009S0913W0265.wav')
Response Time: 2.252022, Transcript: 导致技术支持世界超过了预期分配时间
('Warm-up Test Case %d: %s', 1, u'/home/xxx/.cache/paddle/dataset/speech/Aishell/data_aishell/wav/test/S0912/BAC009S0912W0378.wav')
Response Time: 1.892228, Transcript: 早在申办北京冬奥会的时候
('Warm-up Test Case %d: %s', 2, u'/home/xxx/.cache/paddle/dataset/speech/Aishell/data_aishell/wav/test/S0902/BAC009S0902W0485.wav')