Acta Electronica Sinica (电子学报) ›› 2018, Vol. 46 ›› Issue (10): 2359-2366. DOI: 10.3969/j.issn.0372-2112.2018.10.008

• Academic Paper •

A Deep Convolutional Neural Network Based Speech Enhancement Approach Incorporating Phase Estimation

YUAN Wen-hao (袁文浩), LIANG Chun-yan (梁春燕), XIA Bin (夏斌), SUN Wen-zhu (孙文珠)

  1. College of Computer Science and Technology, Shandong University of Technology, Zibo, Shandong 255000, China
  • Received: 2017-10-21 Revised: 2018-02-20 Online: 2018-10-25 Published: 2018-10-25
    • Corresponding author: YUAN Wen-hao
    • About the author: LIANG Chun-yan, female, born in 1986 in Zibo, Shandong. She received her Ph.D. from the Institute of Acoustics, Chinese Academy of Sciences, in 2014 and is currently a lecturer at the College of Computer Science and Technology, Shandong University of Technology. Her main research interests are speech signal processing and speaker recognition. E-mail: liangchunyan@sdut.edu.cn
    • Supported by: National Natural Science Foundation of China (No. 61701286, No. 11704229); Natural Science Foundation of Shandong Province (No. ZR2015FL003, No. ZR2017MF047, No. ZR2017LA011)

Abstract: In time-frequency-domain speech enhancement, both amplitude estimation and phase estimation are important factors affecting enhancement performance. To incorporate phase estimation into deep-learning-based speech enhancement, this paper treats the real and imaginary parts of the short-time Fourier transform (STFT) of noisy speech as two channels and feeds them into a deep convolutional neural network (DCNN). A multi-task learning model that simultaneously estimates the real and imaginary parts of the STFT of clean speech then yields a joint estimate of amplitude and phase. Experimental results show that, compared with approaches that consider only amplitude estimation, the proposed approach suppresses noise better and significantly improves speech enhancement performance at low signal-to-noise ratio (SNR).

Key words: speech enhancement, phase estimation, amplitude estimation, deep convolutional neural network
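
The data representation described in the abstract can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the authors' implementation: the frame length, hop size, Hann window, 16 kHz test tone, and the summed mean-squared-error objective are all hypothetical choices, and the DCNN itself is omitted. The sketch shows how the real and imaginary parts of an STFT form a two-channel input or target, and why estimating both parts amounts to estimating amplitude and phase together.

```python
import numpy as np


def stft(signal, frame_len=256, hop=128):
    """Hann-windowed STFT (assumed settings); shape (frames, frame_len // 2 + 1)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop: i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)


def to_two_channels(spec):
    """Stack real and imaginary parts as two channels: shape (2, frames, bins)."""
    return np.stack([spec.real, spec.imag], axis=0)


def amplitude_and_phase(channels):
    """Recover amplitude and phase from a (2, frames, bins) real/imaginary pair."""
    real, imag = channels
    return np.hypot(real, imag), np.arctan2(imag, real)


def multitask_mse(estimate, target):
    """Hypothetical two-task objective: MSE on the real part plus MSE on the imaginary part."""
    real_loss = np.mean((estimate[0] - target[0]) ** 2)
    imag_loss = np.mean((estimate[1] - target[1]) ** 2)
    return float(real_loss + imag_loss)


# Synthetic example: a 440 Hz tone in white noise, 1 s at an assumed 16 kHz rate.
rng = np.random.default_rng(0)
clean = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)
noisy = clean + 0.5 * rng.standard_normal(clean.shape)

noisy_input = to_two_channels(stft(noisy))    # what the network would receive
clean_target = to_two_channels(stft(clean))   # what the network would be trained to predict

# Amplitude and phase follow directly from the two estimated channels.
amplitude, phase = amplitude_and_phase(clean_target)
```

Because amplitude and phase are deterministic functions of the real and imaginary parts, a model that predicts both channels needs no separate phase estimator; the enhanced complex spectrum is simply `amplitude * np.exp(1j * phase)`.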

CLC number: