Design and optimization of a low resource speech recognition system

Tsinghua University Furen Net (清华大学辅仁网) / 2017-07-07

Design and optimization of a low resource speech recognition system

ZHANG Pengyuan¹, JI Zhe², HOU Wei², JIN Xin², HAN Weisheng¹

1. Key Laboratory of Speech Acoustics and Content Understanding, Institute of Acoustics, Chinese Academy of Sciences, Beijing 100190, China;
2. National Computer Network Emergency Response Technical Team/Coordination Center of China, Beijing 100029, China

Abstract: Wearable devices and smart home systems need speech recognition engines that use very few resources and reject out-of-set speech reliably. Traditional speech recognition algorithms cannot meet these requirements. This paper proposes improvements to both the decoding strategy and the rejection algorithm of a low resource speech recognition system. In decoding, modifying the re-entry of filler (garbage) phonemes raises the rejection rate for out-of-set speech to 64.8%, while memory usage increases by only 8.5 kB. For rejection, the background probability is computed offline and looked up from a table during online decoding. Compared with the baseline system, this gives an out-of-set rejection rate of 93.8% with a slight loss in the in-set recognition rate, and memory usage and computational speed are also improved.
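The abstract does not spell out how the offline background probabilities are stored or how the online lookup feeds the rejection decision. The following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes that each frame is quantized to an integer codeword, that the offline table maps codewords to background log-probabilities, and that the confidence is the mean per-frame log-likelihood ratio compared against a threshold. All function names, the floor value, and the threshold are hypothetical.

```python
import math
from typing import Dict, Sequence


def build_background_table(codeword_counts: Dict[int, int]) -> Dict[int, float]:
    """Offline step (assumed): convert codeword counts gathered from
    background audio into a table of log-probabilities."""
    total = sum(codeword_counts.values())
    if total <= 0:
        raise ValueError("need at least one background observation")
    # Codewords never seen in the background data are left out of the table.
    return {cw: math.log(n / total) for cw, n in codeword_counts.items() if n > 0}


def confidence(
    frame_log_likes: Sequence[float],
    frame_codewords: Sequence[int],
    table: Dict[int, float],
    floor: float = math.log(1e-6),  # hypothetical floor for unseen codewords
) -> float:
    """Online step (assumed): mean per-frame log-likelihood ratio between the
    decoded command path and the looked-up background probability."""
    if not frame_log_likes or len(frame_log_likes) != len(frame_codewords):
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for log_like, codeword in zip(frame_log_likes, frame_codewords):
        # Table lookup replaces evaluating a background model at run time.
        total += log_like - table.get(codeword, floor)
    return total / len(frame_log_likes)


def accept(conf: float, threshold: float = 0.0) -> bool:
    """Accept the command when the confidence reaches the (hypothetical) threshold."""
    return conf >= threshold
```

In this sketch the online cost per frame is one dictionary lookup and one subtraction, which is the kind of saving an offline table is meant to provide; on a real embedded target the table would more likely be a fixed-size array of quantized values.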

Key words: speech recognition; low resource; confidence measure

Received: 2016-06-29; Published: 2017-02-21

Chinese Library Classification (ZTFLH): TN912.34

Cite this article:

ZHANG Pengyuan, JI Zhe, HOU Wei, JIN Xin, HAN Weisheng. Design and optimization of a low resource speech recognition system. Journal of Tsinghua University (Science and Technology), 2017, 57(2): 147-152.

Link to this article:

http://jst.tsinghuajournals.com/CN/10.16511/j.cnki.qhdxxb.2017.22.006 or http://jst.tsinghuajournals.com/CN/Y2017/V57/I2/147

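Figures and tables:

Fig. 1 Command word network

Fig. 2 Filler (garbage phoneme) network

Fig. 3 Flowchart of the filler-based rejection algorithm

Fig. 4 Example of filler re-entry

Fig. 5 Flowchart of the online confidence computation

Table 1 Performance comparison of different filler handling strategies

Table 2 Performance comparison of different confidence measure strategies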
