Transformer-based position coding selection of Mandarin speech recognition model

Xu Dongdong. Transformer-based position coding selection of Mandarin speech recognition model[J]. 2021, 40(2): 194-199
Received: 2020-05-23; Revised: 2021-03-01
Abstract:
The Transformer network with a self-attention mechanism has gradually gained wide attention in speech recognition research. Focusing on how position information is embedded and combined with speech features, this paper studies position coding methods better suited to a Mandarin speech recognition model. The trained recognition system is based on the Transformer model and compares four different position coding methods. The experimental results show that replacing sinusoidal position coding with a convolutional encoding of the input representation better integrates the contextual relationships and relative position information of the speech features and yields better recognition results. Combined with a 3-gram language model, the proposed convolutional position coding method reduces the character error rate on the Chinese speech corpus AISHELL-1 to 8.16%.
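The abstract contrasts sinusoidal position coding with a convolutional input representation. The sketch below is a minimal PyTorch illustration of that general idea, not the authors' implementation; the module names, kernel sizes, and layer counts are assumptions. It shows a fixed sine/cosine table added to the features versus a small stack of convolutions whose overlapping receptive fields carry relative-position information implicitly.

```python
import math
import torch
import torch.nn as nn

class SinusoidalPositionalEncoding(nn.Module):
    """Fixed sine/cosine position table added to the input features."""
    def __init__(self, d_model, max_len=5000):
        super().__init__()
        pe = torch.zeros(max_len, d_model)
        pos = torch.arange(0, max_len, dtype=torch.float).unsqueeze(1)
        div = torch.exp(torch.arange(0, d_model, 2).float()
                        * (-math.log(10000.0) / d_model))
        pe[:, 0::2] = torch.sin(pos * div)   # even dimensions
        pe[:, 1::2] = torch.cos(pos * div)   # odd dimensions
        self.register_buffer("pe", pe.unsqueeze(0))  # (1, max_len, d_model)

    def forward(self, x):                    # x: (batch, time, d_model)
        return x + self.pe[:, : x.size(1)]

class ConvolutionalPositionEncoding(nn.Module):
    """Convolutional input representation: overlapping 1-D convolutions over
    the feature sequence let the model infer relative position from local
    context, so no explicit sinusoidal term is added."""
    def __init__(self, d_model, kernel_size=3):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(d_model, d_model, kernel_size, padding=kernel_size // 2),
            nn.ReLU(),
            nn.Conv1d(d_model, d_model, kernel_size, padding=kernel_size // 2),
            nn.ReLU(),
        )

    def forward(self, x):                    # x: (batch, time, d_model)
        return self.conv(x.transpose(1, 2)).transpose(1, 2)
```

Either module would be applied to the acoustic feature sequence before it enters the Transformer encoder; the paper's comparison of four position coding schemes covers variants of these two families.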
DOI:10.11684/j.issn.1000-310X.2021.02.004
Keywords: self-attention, position coding, convolution
Funding:
Author: Xu Dongdong, Graduate School, Second Academy of China Aerospace Science and Industry Corporation, Beijing; E-mail: 329696974@qq.com

Full-text PDF download:

http://yysx.cnjournals.cn/ch/reader/create_pdf.aspx?file_no=20084&flag=1&journal_id=yysx&year_id=2021