
Spatial Smoothing Regularization for Bi-direction Long Short-term Memory Model


Wenjie LI1, 2,
Fengpei GE1, 2,
Pengyuan ZHANG1, 2,
Yonghong YAN1, 2, 3
1. Key Laboratory of Speech Acoustics and Content Understanding, Institute of Acoustics, Chinese Academy of Sciences, Beijing 100190, China
2. University of Chinese Academy of Sciences, Beijing 100049, China
3. Xinjiang Laboratory of Minority Speech and Language Information Processing, Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi 830011, China
Funds: The National Key Research and Development Plan (2016YFB0801203, 2016YFB0801200), The National Natural Science Foundation of China (11590770-4, U1536117, 11504406, 11461141004), The Key Science and Technology Project of the Xinjiang Uygur Autonomous Region (2016A03007-1)

Details
About the authors:
LI Wenjie: female, born in 1993, Ph.D. candidate; research interests include speech signal processing, speech recognition, acoustic modeling, and far-field speech recognition
GE Fengpei: female, born in 1982, associate professor; research interests include speech recognition, pronunciation quality assessment, acoustic modeling, and adaptation
ZHANG Pengyuan: male, born in 1978, professor and master's supervisor; research interests include large-vocabulary speaker-independent continuous speech recognition, keyword search, acoustic modeling, and robust speech recognition
YAN Yonghong: male, born in 1967, professor and Ph.D. supervisor; research interests include speech signal processing, speech recognition, spoken dialogue and multimodal systems, and human-machine interface technology
Corresponding author: ZHANG Pengyuan, pzhang@hccl.ioa.ac.cn
CLC number: TN912.34

Publication history

Received: 2018-04-03
Revised: 2018-11-22
Available online: 2018-12-03
Issue published: 2019-03-01

Abstract
The Bi-directional Long Short-Term Memory (BLSTM) model has become the mainstream acoustic model architecture in speech recognition, owing to its strong capability for modeling temporal sequences and its good training stability. However, its larger computational cost and parameter count make the network prone to overfitting during training, which keeps it from reaching the desired recognition performance. In practice, a number of techniques are commonly used to alleviate overfitting; adding an L2 regularization term to the objective function being optimized is one of them. This paper proposes a spatial smoothing method: the vector of BLSTM activations is reorganized into a 2-D grid, a filtering transform is applied to extract the grid's spatial information, and smoothing this spatial information is taken as an auxiliary optimization objective that is combined with the conventional loss function to form the learning criterion for the network parameters. Experiments on a conversational telephone speech recognition task show that this method yields a relative 4% reduction in Word Error Rate (WER) over the baseline model. The complementarity between L2-norm regularization and spatial smoothing is further explored; applying the two techniques together achieves a relative 8.6% WER reduction.
Keywords: speech signal processing / spatial smoothing / Bi-directional Long Short-Term Memory (LSTM) model / regularization / overfitting
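The training criterion is described only verbally in the abstract. Spelled out under illustrative notation (the symbols $\lambda$, $G$, $K$ and the Laplacian choice below are assumptions, not taken from the paper), the idea is a conventional loss augmented with a smoothness penalty on the grid-reshaped activations:

$$\mathcal{L} = \mathcal{L}_{\mathrm{CE}} + \lambda\,\mathcal{L}_{\mathrm{smooth}},\qquad \mathcal{L}_{\mathrm{smooth}} = \frac{1}{T}\sum_{t=1}^{T}\bigl\|K * G(\mathbf{h}_t)\bigr\|_2^2,$$

where $\mathbf{h}_t$ is the BLSTM activation vector at frame $t$, $G(\cdot)$ reorganizes it into a 2-D grid, $K$ is a fixed smoothing filter (for example a discrete Laplacian) applied by convolution, and $\lambda$ balances the auxiliary objective against the conventional cross-entropy loss $\mathcal{L}_{\mathrm{CE}}$.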
Abstract: The Bi-directional Long Short-Term Memory (BLSTM) model is widely used in large-scale acoustic modeling. It is superior to many other neural networks in performance and stability, probably because its cells and gates give it a richer structure and computation that take more context and time dependence into account during training. However, one of the biggest problems of BLSTM is overfitting; common ways to counter it include multitask learning and L2 model regularization. A spatial smoothing method is proposed for the BLSTM model to relieve the overfitting problem. First, the activations of the hidden layer are reorganized into a 2-D grid; then a filter transform is used to induce smoothness over the grid; finally, this smoothness information is added to the objective function used to train the BLSTM network. Experimental results show that the proposed spatial smoothing method achieves a relative 4% reduction in Word Error Rate (WER), and applying it together with L2-norm regularization lowers WER by a relative 8.6%.
Key words: Speech signal processing / Spatial smoothing / Long Short-Term Memory (LSTM) / Regularization / Overfitting
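
To make the procedure concrete, here is a minimal Python/PyTorch sketch of such a regularizer, written under stated assumptions rather than as the paper's exact implementation: the grid shape, the fixed Laplacian kernel, and the weight smooth_weight are hypothetical choices, and the helper names are invented for illustration.

import torch
import torch.nn.functional as F

def spatial_smoothness_penalty(activations: torch.Tensor, grid_h: int, grid_w: int) -> torch.Tensor:
    """Reorganize BLSTM activations (batch, time, hidden) into 2-D grids and
    penalize rough grids; hidden must equal grid_h * grid_w."""
    b, t, hidden = activations.shape
    assert hidden == grid_h * grid_w
    # Each frame's activation vector becomes one single-channel "image".
    grid = activations.reshape(b * t, 1, grid_h, grid_w)
    # Discrete Laplacian: responds strongly where a unit deviates from its neighbours.
    kernel = torch.tensor([[0., 1., 0.],
                           [1., -4., 1.],
                           [0., 1., 0.]], device=activations.device).view(1, 1, 3, 3)
    response = F.conv2d(grid, kernel, padding=1)
    # Mean squared filter response: small when the grid is spatially smooth.
    return response.pow(2).mean()

def training_loss(logits, targets, blstm_activations, smooth_weight=0.01, grid_h=32, grid_w=32):
    """Conventional cross-entropy loss plus the weighted spatial smoothness term."""
    ce = F.cross_entropy(logits.reshape(-1, logits.size(-1)), targets.reshape(-1))
    smooth = spatial_smoothness_penalty(blstm_activations, grid_h, grid_w)
    return ce + smooth_weight * smooth

In this form the penalty simply discourages neighbouring units on the grid from taking very different values; the weight smooth_weight would be tuned on a development set, much as an L2 regularization coefficient would be.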



Full-text PDF download:

https://jeit.ac.cn/article/exportPdf?id=40cbedfe-857a-46f4-8d26-edea283b0698