Prosodic Structure Prediction for Chinese Based on a Pretrained Language Representation Model

Posted by the site editor, Free考研考试, 2022-01-16

Authors: Zhang Pengyuan (张鹏远) 1,2, Lu Chunhui (卢春晖) 1,2, Wang Ruimin (王睿敏) 1,2

Affiliations:
1. Key Laboratory of Speech Acoustics and Content Understanding, Institute of Acoustics, Chinese Academy of Sciences, Beijing 100190, China
2. School of Electronic, Electrical and Communication Engineering, University of Chinese Academy of Sciences, Beijing 100049, China

Abstract: Prosodic structure prediction is a key step in a text-to-speech system, and its results directly affect the naturalness and intelligibility of the synthesized speech. This paper proposes a prosodic structure prediction method based on a pretrained language representation model. With the character as the modeling unit, a separate output layer is placed on top of the pretrained model for each prosodic level, and the model is fine-tuned on prosody-labeled data. A word segmentation task is additionally introduced, and multitask learning is used to model both the relationships among the prosodic levels and the relationship between prosody and lexical words, so that the boundaries of all prosodic levels in the input text are predicted simultaneously. The experiments first demonstrate that the multi-output structure is reasonable and that using a pretrained model is effective, and they verify that adding the word segmentation task further improves model performance. Comparing the best result with the two baseline models, the F1 scores for prosodic word and prosodic phrase prediction show absolute improvements of 2.48% and 4.50%, respectively, over the conditional random field model, and of 6.2% and 5.4%, respectively, over the bidirectional long short-term memory model. Finally, the experiments show that the method reduces the amount of training data required while maintaining prediction performance.

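The multi-output setup in the abstract (one output per prosodic level, with the character as the modeling unit) can be illustrated with a small data-preparation sketch. This is not the authors' code. The `#1`/`#2`/`#3` boundary markers, the level names `PW`/`PPH`/`IPH`, the rule that a higher-level boundary also counts as a lower-level one, and the example sentence are all assumptions made for illustration; the pretrained encoder and its output layers are not shown. The helper `boundary_f1` computes the F1 score on boundary labels, the metric the abstract reports.

```python
import re

# Assumed annotation scheme (not taken from the paper): a marker follows the
# character it applies to. "#1" = prosodic word, "#2" = prosodic phrase,
# "#3" = intonational phrase.
LEVELS = ("PW", "PPH", "IPH")
MARK = re.compile(r"#([1-3])")


def to_multilevel_labels(annotated):
    """Return the characters plus one 0/1 boundary label row per prosodic level."""
    chars = []
    labels = {level: [] for level in LEVELS}

    def add_chars(segment):
        for ch in segment:
            chars.append(ch)
            for level in LEVELS:
                labels[level].append(0)

    pos = 0
    for match in MARK.finditer(annotated):
        add_chars(annotated[pos:match.start()])
        if chars:
            rank = int(match.group(1))
            # Assumption: a boundary at one level is also a boundary at
            # every lower level.
            for k, level in enumerate(LEVELS, start=1):
                if k <= rank:
                    labels[level][-1] = 1
        pos = match.end()
    add_chars(annotated[pos:])
    return chars, labels


def boundary_f1(gold, pred):
    """F1 score of predicted boundaries (label 1) against gold boundaries."""
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Made-up example sentence, for illustration only.
chars, labels = to_multilevel_labels("今天#1天气#2很好#3")
```

Each row in `labels` would be the target of one output layer; a word segmentation row could be added in the same way as one more task. An "absolute improvement" in the abstract's sense is the plain difference between two such F1 scores, expressed in percentage points.
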
Keywords: prosodic structure prediction; pretrained language representation model; multitask learning; speech synthesis


Full-text PDF download: http://xbzrb.tju.edu.cn/#/digest?ArticleID=6421