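Voice Conversion for Enhancing Mandarin Electro-Laryngeal Speech Based on Semantic Information

QIAN Zhao-peng; XIAO Ke-jing; LIU Chan; SUN Yue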

doi:10.3969/j.issn.0372-2112.2020.05.002

Research article | Last updated: 2025-07-16

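- Voice Conversion for Enhancing Mandarin Electro-Laryngeal Speech Based on Semantic Information
- Acta Electronica Sinica, 2020, Vol. 48, No. 5, pp. 840-845
- Author affiliations:
  
  1. School of Biological Science and Medical Engineering, Beihang University, Beijing 100191
  2. Beijing Institute of Mechanical Equipment, Beijing 100854
  4. School of Information, Renmin University of China, Beijing 100872
- Funding:
  
  Beijing Natural Science Foundation (No.4194079); Open Project of the State Key Laboratory of Virtual Reality Technology and Systems, Beihang University (No.VRLAB2018B06); Open Project of the National Engineering Laboratory for Agri-product Quality Traceability, Beijing Technology and Business University (No.AQT-2018-YB4)
- DOI: 10.3969/j.issn.0372-2112.2020.05.002
  CLC number: TP391.42
- Published online: 2020-05-25
  
  Print publication: 2020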

QIAN Zhao-peng, XIAO Ke-jing, LIU Chan, et al. Voice Conversion for Enhancing Mandarin Electro-Laryngeal Speech Based on Semantic Information[J]. Acta Electronica Sinica, 2020, 48(5): 840-845. DOI: 10.3969/j.issn.0372-2112.2020.05.002.

Abstract

Electro-laryngeal (EL) speech suffers from several defects, including a monotonous fundamental frequency, a mechanical sound quality, and strong radiation noise, which severely reduce its intelligibility and naturalness. The problem is especially serious for tonal languages such as Mandarin. Mandarin EL speech recognition confuses consonants, and its output carries no tones. To address this, this paper designs a pinyin spelling corrector and a tone labelling tool that operate on the recognition results, and combines them with a Tacotron-2-based text-to-speech (TTS) system to convert EL speech into normal speech. Objective evaluation shows that the pinyin spelling corrector improves pinyin accuracy and that tone labelling achieves high accuracy when semantic context is available. Subjective listening tests show that the proposed method improves the intelligibility and naturalness of Mandarin EL speech at different linguistic levels. The results indicate that the proposed method can convert toneless EL speech into normal speech and that it outperforms traditional voice conversion methods based on speech signal processing.
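The abstract names the stages of the pipeline (recognition, pinyin spelling correction, tone labelling, TTS) but does not describe how each is built. The Python sketch below is only a toy illustration of the two text-level stages, under stated assumptions: the syllable inventory, the consonant confusion table, the context lexicon, the fallback tones, and the function names `correct_syllable`, `label_tones`, and `convert` are all hypothetical and are not taken from the paper. Tones are written as trailing digits, with 5 standing for the neutral tone, which is a common convention rather than the authors' stated format.

```python
# Toy sketch: toneless recognised pinyin -> corrected pinyin -> toned pinyin.
# Every table here is a made-up illustration, not data from the paper.

# Hypothetical inventory of valid toneless pinyin syllables.
VALID_SYLLABLES = {"ni", "hao", "wo", "men", "ma", "shi", "si", "zhong", "guo"}

# Hypothetical consonant confusions: recognised initial -> candidate initials.
CONFUSABLE_INITIALS = {"l": ["n"], "b": ["m"], "s": ["sh"], "z": ["zh"]}

# Hypothetical context lexicon: toneless syllable pair -> toned syllables.
TONE_LEXICON = {
    ("ni", "hao"): ["ni3", "hao3"],
    ("wo", "men"): ["wo3", "men5"],
    ("zhong", "guo"): ["zhong1", "guo2"],
}

# Hypothetical fallback tone for a syllable with no known context.
DEFAULT_TONE = {"ma": "ma5", "shi": "shi4", "ni": "ni3", "hao": "hao3", "wo": "wo3"}

# Pinyin initials, two-letter ones first so "zh" is tried before "z".
INITIALS = sorted(
    ["zh", "ch", "sh", "b", "p", "m", "f", "d", "t", "n", "l", "g", "k", "h",
     "j", "q", "x", "r", "z", "c", "s", "y", "w"],
    key=len,
    reverse=True,
)


def split_syllable(syllable):
    """Split a toneless syllable into (initial, final)."""
    for initial in INITIALS:
        if syllable.startswith(initial):
            return initial, syllable[len(initial):]
    return "", syllable


def correct_syllable(syllable):
    """Return a valid syllable, swapping in a confusable initial if needed."""
    if syllable in VALID_SYLLABLES:
        return syllable
    initial, final = split_syllable(syllable)
    for candidate in CONFUSABLE_INITIALS.get(initial, []):
        if candidate + final in VALID_SYLLABLES:
            return candidate + final
    return syllable  # unknown syllables are left unchanged


def label_tones(syllables):
    """Label tones by pair lookup in the lexicon, else per-syllable fallback."""
    toned, i = [], 0
    while i < len(syllables):
        pair = tuple(syllables[i:i + 2])
        if pair in TONE_LEXICON:
            toned.extend(TONE_LEXICON[pair])
            i += 2
        else:
            toned.append(DEFAULT_TONE.get(syllables[i], syllables[i] + "5"))
            i += 1
    return toned


def convert(recognised):
    """Correct each recognised syllable, then attach tones."""
    return label_tones([correct_syllable(s) for s in recognised])


if __name__ == "__main__":
    print(convert(["li", "hao"]))
    print(convert(["zong", "guo", "ma"]))
```

In a full system the toned pinyin sequence would then be passed to a TTS front end such as the Tacotron-2-based synthesizer the abstract mentions. The pair lookup is only a stand-in for the context-dependent tone labelling reported in the abstract, which achieves its accuracy from semantic context rather than from a fixed table.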
