
THUYG-20: A free Uyghur speech database


Aisikaer Rouzi (1), YIN Shi (1), ZHANG Zhiyong (1), WANG Dong (1), Askar Hamdulla (2), ZHENG Fang (1)
1. Research Institute of Information Technology, Tsinghua National Laboratory for Information Science and Technology, Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China;
2. School of Information Science and Engineering, Xinjiang University, Urumqi 830046, China

Abstract: Speech data is the foundation of speech recognition research. However, few open speech databases are freely available to researchers in China, and resources are especially scarce for minority languages such as Uyghur. This paper releases a completely free Uyghur continuous speech database consisting of about 20 h of training speech and 1 h of test speech. It also describes the resources needed to build a full Uyghur speech recognition system, including a phone set, a lexicon, and text data, together with the scripts used to build a baseline system. The recognition performance of the baseline system is reported on both clean and noisy test data. The database provides a standard reference database for Uyghur speech recognition research.
Key words: speech recognition, Uyghur language, corpus, deep neural network (DNN)
Received: 2016-06-24 Published: 2017-02-21
CLC number (ZTFLH): TP391.4
Corresponding author: ZHENG Fang, Professor, E-mail: fzheng@tsinghua.edu.cn
Cite this article:
Aisikaer Rouzi, YIN Shi, ZHANG Zhiyong, WANG Dong, Askar Hamdulla, ZHENG Fang. THUYG-20: A free Uyghur speech database[J]. Journal of Tsinghua University (Science and Technology), 2017, 57(2): 182-187.
Link to this article:
http://jst.tsinghuajournals.com/CN/10.16511/j.cnki.qhdxxb.2017.22.012
http://jst.tsinghuajournals.com/CN/Y2017/V57/I2/182


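Figures and tables:
Table 1 Parameters of the THUYG-20 speech corpus
Table 2 Parameters of the THUYG-20 text corpus
Fig. 1 Framework of the DNN-HMM model
Table 3 Recognition results of the two language models on TEST-A
Table 4 Recognition results of the baseline system on TEST-N
Table 5 Recognition results of the baseline system on TEST-N after noisy training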



