Speaker verification based on SVM and total variability




GUO Wu¹, ZHANG Sheng¹, XU Jie², HU Guoping³, MA Xiaokong¹

1. Department of Electronic Engineering and Information Science, University of Science and Technology of China, Hefei 230026, China;
2. National Computer Network Emergency Response Technical Team Coordination Center of China, Beijing 100029, China;
3. IFLYTEK Corporation, Hefei 230088, China


Abstract: Algorithms based on total variability factor analysis and probabilistic linear discriminant analysis (PLDA) are currently the mainstream approach to text-independent speaker verification. This study combines total variability analysis with a support vector machine (SVM). The low-dimensional total variability factors (i-vectors) are used as the SVM input features, and a cosine kernel function is used to enhance the discriminability of these low-dimensional features. The resulting system achieves performance comparable to that of the current mainstream PLDA approach. Furthermore, fusing the scores of the SVM system with those of the PLDA system yields a clear performance improvement. Tests were conducted on the female interview trials of the NIST 2012 speaker recognition evaluation core test. Compared with the best single system, the detection cost function (DCF) of the fused system was reduced by 25.1% for common condition 1 and by 25.2% for common condition 3.
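The abstract states that i-vectors are fed to an SVM with a cosine kernel but does not give the training recipe. The following is a minimal sketch under stated assumptions: each target speaker gets its own SVM, trained with that speaker's enrollment i-vectors as positive examples and a set of background (impostor) i-vectors as negative examples; the SVM is a bias-free kernel SVM trained by dual coordinate ascent (a simplification chosen to keep the example dependency-free, not necessarily what the authors used); and toy 2-D vectors stand in for real i-vectors, which typically have a few hundred dimensions. The names `cosine_kernel`, `train_svm`, and `svm_score` and all data values are hypothetical.

```python
import math
import random


def cosine_kernel(x, y):
    """Cosine kernel k(x, y) = <x, y> / (||x|| * ||y||); assumes non-zero vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)


def train_svm(vectors, labels, c=1.0, epochs=100, seed=0):
    """Train a bias-free kernel SVM by dual coordinate ascent (a sketch).

    Maximizes sum_i a_i - 1/2 * sum_ij a_i a_j y_i y_j k(x_i, x_j)
    subject to 0 <= a_i <= c. Labels must be +1 (target) or -1 (impostor).
    Returns the support vectors as (alpha, label, vector) triples.
    """
    n = len(vectors)
    gram = [[cosine_kernel(vectors[i], vectors[j]) for j in range(n)] for i in range(n)]
    alpha = [0.0] * n
    order = list(range(n))
    rng = random.Random(seed)  # fixed seed keeps the visiting order reproducible
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            # Current decision value on training point i.
            f_i = sum(alpha[j] * labels[j] * gram[j][i] for j in range(n))
            # Gradient of the negated dual objective with respect to alpha_i.
            grad = labels[i] * f_i - 1.0
            # Exact one-variable maximization, then clip to the box [0, c].
            alpha[i] = min(max(alpha[i] - grad / gram[i][i], 0.0), c)
    return [(a, y, x) for a, y, x in zip(alpha, labels, vectors) if a > 0.0]


def svm_score(model, test_vector):
    """Verification score: the SVM decision value for a test i-vector."""
    return sum(a * y * cosine_kernel(x, test_vector) for a, y, x in model)


# Toy 2-D stand-ins for i-vectors (hypothetical values).
enrollment = [[1.0, 0.2], [1.0, -0.2]]                 # target speaker
background = [[-1.0, 0.3], [-1.0, -0.3], [-0.2, 1.0]]  # impostors
model = train_svm(enrollment + background,
                  [1] * len(enrollment) + [-1] * len(background),
                  c=10.0, epochs=200)

genuine_score = svm_score(model, [0.9, 0.0])    # trial close to the target
impostor_score = svm_score(model, [-0.9, 0.1])  # trial close to the impostors
```

Because the cosine kernel equals a linear kernel applied to length-normalized vectors, k(x, x) = 1 for every non-zero input, so the coordinate update divides by a value that is always 1 up to rounding. A trial would then be accepted when its score exceeds a threshold tuned on development data.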

Key words: speaker verification; total variability; support vector machine; kernel function

Received: 2016-06-21; Published: 2017-03-25

CLC number: TN912.34

Cite this article:

GUO Wu, ZHANG Sheng, XU Jie, HU Guoping, MA Xiaokong. Speaker verification based on SVM and total variability[J]. Journal of Tsinghua University (Science and Technology), 2017, 57(3): 240-243.

Link to this article:

http://jst.tsinghuajournals.com/CN/10.16511/j.cnki.qhdxxb.2017.26.003 or http://jst.tsinghuajournals.com/CN/Y2017/V57/I3/240

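Figures and tables:

Table 1 Experimental comparison of different input feature vectors

Table 2 Results of a series of SVM system experiments

Table 3 Comparison of different normalization methods with the cosine-kernel SVM

Table 4 System performance before and after score fusion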
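Table 4 compares performance before and after score fusion, but the abstract does not say how the SVM and PLDA scores were fused or how the DCF was parameterized. The sketch below assumes a simple linear fusion of z-normalized scores with a hand-set weight (a common baseline; practical systems usually learn fusion weights on a development set, for example with logistic regression) and computes a minimum normalized DCF with placeholder cost parameters. The NIST 2012 evaluation plan defines its own cost parameters, which are not reproduced here. All function names and scores are hypothetical.

```python
import math


def z_normalize(scores):
    """Shift and scale one system's scores to zero mean and unit variance.

    Simplification: statistics come from the scores themselves; a real
    system would estimate them on held-out development data.
    """
    mean = sum(scores) / len(scores)
    std = math.sqrt(sum((s - mean) ** 2 for s in scores) / len(scores))
    return [(s - mean) / std if std > 0 else 0.0 for s in scores]


def fuse_scores(svm_scores, plda_scores, weight=0.5):
    """Linear fusion: weight * SVM + (1 - weight) * PLDA, after normalization."""
    a = z_normalize(svm_scores)
    b = z_normalize(plda_scores)
    return [weight * x + (1.0 - weight) * y for x, y in zip(a, b)]


def min_normalized_dcf(scores, is_target, p_target=0.01, c_miss=1.0, c_fa=1.0):
    """Minimum over thresholds of the normalized detection cost function.

    DCF(t) = c_miss * P_miss(t) * p_target + c_fa * P_fa(t) * (1 - p_target),
    divided by min(c_miss * p_target, c_fa * (1 - p_target)).
    A trial is accepted when its score >= t.
    """
    n_target = sum(1 for y in is_target if y)
    n_nontarget = len(is_target) - n_target
    best = math.inf
    for t in sorted(set(scores)) + [math.inf]:
        misses = sum(1 for s, y in zip(scores, is_target) if y and s < t)
        false_alarms = sum(1 for s, y in zip(scores, is_target) if not y and s >= t)
        dcf = (c_miss * (misses / n_target) * p_target
               + c_fa * (false_alarms / n_nontarget) * (1.0 - p_target))
        best = min(best, dcf)
    return best / min(c_miss * p_target, c_fa * (1.0 - p_target))


def relative_reduction(baseline, new):
    """Relative cost reduction in percent when comparing two systems."""
    return 100.0 * (baseline - new) / baseline


# Hypothetical scores for six trials (three target, three non-target).
is_target = [True, True, True, False, False, False]
svm = [2.1, 0.4, 1.3, 0.6, -0.8, -1.5]
plda = [3.0, 2.2, -0.5, -1.0, 0.9, -2.4]
fused = fuse_scores(svm, plda, weight=0.5)
for name, s in [("SVM", svm), ("PLDA", plda), ("fused", fused)]:
    print(name, min_normalized_dcf(s, is_target))
```

The 25.1% and 25.2% figures in the abstract are relative reductions of the kind computed by `relative_reduction`, taken against the best single system.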

