In time-frequency-domain speech enhancement, both amplitude estimation and phase estimation are important factors affecting enhancement performance. To incorporate phase estimation into deep-learning-based speech enhancement, this paper treats the real and imaginary parts of the short-time Fourier transform (STFT) of noisy speech as two channels and feeds them into a deep convolutional neural network (DCNN). By establishing a multi-task learning model that simultaneously estimates the real and imaginary parts of the STFT of clean speech, the amplitude and phase are estimated jointly. Experimental results show that, compared with approaches that consider only amplitude estimation, the proposed approach suppresses noise better and significantly improves speech enhancement performance under low-SNR conditions.
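
As a rough illustration of the two-channel input representation described above, the following sketch builds a real/imaginary feature tensor from a waveform and shows that amplitude and phase can both be recovered from it. It is a minimal sketch under stated assumptions, not the paper's implementation: the function names, the 256-sample Hann window, the 128-sample hop, and the random test signal are all hypothetical choices, and the DCNN itself is not shown.

```python
import numpy as np


def stft(signal, frame_len=256, hop=128):
    """Complex STFT with a Hann window (frame_len and hop are assumed values).

    Returns an array of shape (n_frames, frame_len // 2 + 1).
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    return np.fft.rfft(frames, axis=-1)


def to_two_channels(spec):
    """Stack the real and imaginary parts as two channels: (2, n_frames, n_bins)."""
    return np.stack([spec.real, spec.imag], axis=0)


def amplitude_and_phase(two_ch):
    """Recover amplitude and phase from a two-channel real/imaginary tensor."""
    real, imag = two_ch
    return np.hypot(real, imag), np.arctan2(imag, real)


# Stand-in for a noisy utterance (hypothetical data: 1 s of noise at 16 kHz).
rng = np.random.default_rng(0)
noisy = rng.standard_normal(16000)

spec = stft(noisy)
features = to_two_channels(spec)  # what a two-channel network input would look like
amp, phase = amplitude_and_phase(features)
```

Because amplitude and phase are both determined by the real and imaginary parts, a network that predicts the two channels of the clean-speech STFT estimates amplitude and phase at the same time, which is the motivation stated in the abstract.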