张晓宇, 张华熊, 高强. Multi-modal weighted network for speech emotion recognition based on deep learning [J]. Journal of Dalian University of Technology, 2022, 62(5): 526-534.
Multi-modal weighted network for speech emotion recognition based on deep learning
|
DOI: 10.7511/dllgxb202205011
Keywords: multi-modal speech emotion recognition; weighted network; spectrum; machine learning; deep learning
Funding: Zhejiang Provincial Key Research and Development Program (Grant No. 2020C03104).
|
Abstract:
A single speech feature, such as acoustic information, often cannot cover all the characteristics of speech, which results in low speech emotion recognition accuracy. To address this, a multi-modal weighted network model based on deep learning is proposed. The model first uses a recurrent neural network and a ResNet-50 network to build three classifiers based on the acoustic features of speech, the semantic features of the text transcribed from speech, and the spectrum features of speech. The classification results of these three classifiers are then used as the input of a weight network, which is trained to obtain the optimal weights, and a weighted network classifier built on these weights finally performs speech emotion recognition. Experimental results show that the model reaches a speech emotion recognition accuracy of 75.4%, more than 5% higher than single-feature speech emotion recognition models and 10.4% higher than an ensemble learning model combining the three features. Compared with existing speech emotion recognition models, the multi-modal weighted network model also achieves higher recognition accuracy.
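The late-fusion scheme described in the abstract (three modality classifiers whose outputs are combined by a learned weight network) can be illustrated with a minimal PyTorch sketch. This is an assumption-laden illustration, not the authors' code: the class name WeightNet, the number of emotion categories (NUM_CLASSES = 4), and the use of softmax-normalized fusion weights are hypothetical choices made for the example.

import torch
import torch.nn as nn

NUM_CLASSES = 4  # assumed number of emotion categories; not specified in the abstract

class WeightNet(nn.Module):
    """Learns fusion weights over the outputs of the three modality classifiers."""
    def __init__(self, num_modalities=3, num_classes=NUM_CLASSES):
        super().__init__()
        # Map the concatenated modality predictions to one weight per modality.
        self.fc = nn.Linear(num_modalities * num_classes, num_modalities)

    def forward(self, preds):
        # preds: (batch, num_modalities, num_classes) class probabilities
        weights = torch.softmax(self.fc(preds.flatten(start_dim=1)), dim=-1)
        fused = (weights.unsqueeze(-1) * preds).sum(dim=1)  # weighted sum of modality predictions
        return fused  # (batch, num_classes) fused emotion scores

# Placeholder predictions standing in for the three trained classifiers:
acoustic_p = torch.rand(8, NUM_CLASSES).softmax(dim=-1)   # RNN on acoustic features
semantic_p = torch.rand(8, NUM_CLASSES).softmax(dim=-1)   # RNN on text transcribed from speech
spectrum_p = torch.rand(8, NUM_CLASSES).softmax(dim=-1)   # ResNet-50 on spectrum features
fusion = WeightNet()
scores = fusion(torch.stack([acoustic_p, semantic_p, spectrum_p], dim=1))
print(scores.argmax(dim=-1))  # predicted emotion index per utterance

The sketch only shows the forward fusion step; in the paper the weight network is trained on the classifiers' outputs to obtain the optimal weights, so a training loop (e.g., cross-entropy loss against emotion labels) would be layered on top.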