
Speaker Recognition Based on Multimodal Generative Adversarial Networks and Triplet Loss


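Ying CHEN,
Huangkang CHEN
Key Laboratory of Advanced Process Control for Light Industry (Ministry of Education), Jiangnan University, Wuxi 214122, China
Funding: National Natural Science Foundation of China (61573168)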

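Details
Author biographies: Ying CHEN: female, born in 1976, professor, Ph.D.; research interests include information fusion and pattern recognition.
Huangkang CHEN: male, born in 1994, master's student; research interest is speaker recognition.
Corresponding author: Ying CHEN, chenying@jiangnan.edu.cn
Chinese Library Classification (CLC) number: TN912.3, TP391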

Publication history

Received: 2019-03-15
Revised: 2019-09-09
Published online: 2019-09-19
Issue published: 2020-02-19

Speaker Recognition Based on Multimodal Generative Adversarial Nets with Triplet-loss

Ying CHEN,
Huangkang CHEN
Key Laboratory of Advanced Process Control for Light Industry (Ministry of Education), Jiangnan University, Wuxi 214122, China
Funds: The National Natural Science Foundation of China (61573168)


Abstract
To exploit the correlation between faces and voices in speaker recognition, a multimodal Generative Adversarial Network (GAN) is designed to map face features and audio features into a common space in which the two modalities are more closely related. A triplet loss then further constrains the relationship between the two modalities: it reduces the feature distance between cross-modal samples of the same individual and increases the feature distance between cross-modal samples of different individuals. Finally, the cross-modal cosine distance between common-space features is computed to judge whether a face and a voice match, and Softmax is used to recognize the speaker's identity. Experimental results show that the method effectively improves speaker recognition accuracy.
Key words: Speaker recognition / Cross-modal / Generative Adversarial Network (GAN) / Triplet loss
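
The abstract names three computations on common-space features: a triplet loss across modalities, a cosine-distance match test, and a Softmax over speaker identities. The sketch below illustrates these in plain Python under stated assumptions; it is not the authors' implementation. The function names, the margin of 0.2, the match threshold of 0.5, and the choice of cosine distance inside the triplet loss are all hypothetical, since the abstract does not specify them. The GAN that produces the common-space features is omitted, and the feature vectors are made-up toy values.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cos(u, v) for two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def cross_modal_triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge-style triplet loss on cosine distance (assumed distance).

    anchor:   feature from one modality (e.g. a face)
    positive: feature from the other modality, same person
    negative: feature from the other modality, different person
    The loss is zero once the negative is farther from the anchor
    than the positive by at least `margin` (hypothetical value).
    """
    d_pos = cosine_distance(anchor, positive)
    d_neg = cosine_distance(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


def is_match(face_feat, voice_feat, threshold=0.5):
    """Declare a face/voice pair matched when their cosine distance
    falls below `threshold` (hypothetical value)."""
    return cosine_distance(face_feat, voice_feat) < threshold


def softmax(logits):
    """Numerically stable softmax over a list of speaker scores."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Toy common-space features (invented for illustration only).
face_a = [1.0, 0.0]    # face of speaker A
voice_a = [0.8, 0.6]   # voice of speaker A, close to face_a
voice_b = [0.0, 1.0]   # voice of speaker B, orthogonal to face_a

well_separated = cross_modal_triplet_loss(face_a, voice_a, voice_b)
violated = cross_modal_triplet_loss(face_a, voice_b, voice_a)
speaker_probs = softmax([2.0, 1.0, 0.1])
predicted_speaker = speaker_probs.index(max(speaker_probs))
```

In this toy setup the first triplet already satisfies the margin, so it contributes no loss, while the second (with positive and negative swapped) is penalized. During training, minimizing such a loss would pull same-identity face and voice features together and push different-identity pairs apart, which is the behavior the abstract describes.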



Full-text PDF download:

https://jeit.ac.cn/article/exportPdf?id=b1c89c52-2983-442d-9e06-e846c4a77741