Acta Electronica Sinica, 2021, Vol. 49, Issue (1): 29-39. DOI: 10.12263/DZXB.20200644

• Academic Paper

Auto-Regressive Coefficient Estimation Method Based on Generalized Analysis-by-Synthesis and Deep Neural Networks

CUI Zi-hao, BAO Chang-chun   

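  1. Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China
  • Received: 2020-06-30  Revised: 2020-08-18  Published: 2021-01-25  Online: 2020-09-09
  • Corresponding author: BAO Chang-chun
  • About the author: CUI Zi-hao, male, born in 1991 in Kunming, Yunnan. He is currently a Ph.D. candidate at Beijing University of Technology. His main research interest is speech enhancement. E-mail: cuizihao@emails.bjut.edu.cn
  • Supported by:
    National Natural Science Foundation of China (No. 61831019, No. 61471014)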

Auto-Regressive Coefficient Estimation Based on the GABS and DNN

CUI Zi-hao, BAO Chang-chun   

  1. Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China
  • Received: 2020-06-30  Revised: 2020-08-18  Published: 2021-01-25  Online: 2020-09-09

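Abstract: The auto-regressive (AR) model is an effective way to describe the correlation of time series. Classic AR coefficient estimation methods make simple assumptions about the residual signal and therefore struggle to estimate AR coefficients accurately in complex scenarios such as noise interference, while the deep neural network (DNN)-based AR (DNN-AR) coefficient estimation method is easily affected during training by the numerical stability of the Levinson-Durbin recursion (LDR). To improve the training stability and overall performance of DNN-AR coefficient estimation, this paper follows the idea of using precision conversion to speed up computation while preserving system stability, and proposes a deep network structure improvement based on the generalized analysis-by-synthesis (GABS) model. The method improves both the accuracy of AR coefficient estimation in noisy environments and the stability of network training. The GABS model combined with a DNN (GABS-DNN) consists of three main parts: a spectrum enhancement network in the modifier, DNN preprocessing and LDR parameter estimation in the encoder, and conversion from AR coefficients to the power spectrum in the decoder. When optimizing the objective function, the error between the enhanced spectrum and the observed spectrum is introduced, which reduces the influence of the LDR gradient on the enhancement network during back-propagation and enables stable estimation of the AR coefficients of noisy speech.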
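The Levinson-Durbin recursion used in the encoder solves the Yule-Walker equations one model order at a time. The abstract does not give the authors' exact formulation, so the following is only a minimal pure-Python sketch, assuming the common convention A(z) = 1 + a_1 z^-1 + ... + a_p z^-p and a biased autocorrelation estimate; the names `autocorrelation` and `levinson_durbin` are illustrative. It also shows where the numerical sensitivity comes from: every order update divides by the current prediction-error power, which approaches zero when the autocorrelation matrix is close to singular.

```python
def autocorrelation(x, max_lag):
    """Biased autocorrelation estimate r[0..max_lag] of a real sequence x."""
    n = len(x)
    return [sum(x[t] * x[t - lag] for t in range(lag, n)) / n
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the Yule-Walker equations for A(z) = 1 + a_1 z^-1 + ... + a_p z^-p.

    Returns (a, err, reflection): a[0] == 1.0, err is the final
    prediction-error power, reflection holds k_1..k_p.
    """
    if r[0] <= 0:
        raise ValueError("r[0] must be positive")
    a = [1.0] + [0.0] * order
    err = r[0]
    reflection = []
    for i in range(1, order + 1):
        # Correlation between the order-(i-1) predictor and lag i.
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        # Numerically sensitive step: err -> 0 for an ill-conditioned
        # autocorrelation sequence, so k (and its gradient) can blow up.
        k = -acc / err
        reflection.append(k)
        # Order update: a_j <- a_j + k * a_{i-j}, then a_i <- k.
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a, err, reflection
```

For an AR(1) process with r = [1.0, 0.9, 0.81], `levinson_durbin(r, 2)` returns a_1 ≈ -0.9, a_2 ≈ 0 and an error power of about 0.19. A stable model keeps every reflection coefficient strictly inside (-1, 1).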

 

Key words: AR coefficients, generalized analysis-by-synthesis, deep neural networks, Levinson-Durbin recursion

Abstract: The auto-regressive (AR) model is an effective method for describing the correlation of time series. Classic AR coefficient estimation methods rely on simple assumptions about the residual signal, which makes it difficult to estimate the AR coefficients accurately in complex environments with noise or interference. Although the deep neural network (DNN)-based AR (DNN-AR) coefficient estimation method can estimate AR coefficients in such environments, it is easily affected by the numerical stability of the Levinson-Durbin recursion (LDR) during training. This paper aims to improve the stability and overall performance of the DNN-AR method. A precision transform is used to improve computational efficiency while maintaining system stability, and a generalized analysis-by-synthesis model combined with a DNN (GABS-DNN) is proposed to improve the accuracy of AR coefficient estimation and the stability of DNN training in noisy environments. The GABS-DNN model consists of three main parts: a spectrum enhancement network in the modifier, DNN preprocessing and LDR parameter estimation in the encoder, and conversion from AR coefficients to the power spectrum in the decoder. When optimizing the objective function, the error between the enhanced spectrum and the observed spectrum is added to reduce the influence of the LDR gradient on the enhancement network during back-propagation, which yields a stable estimation of the AR coefficients of noisy speech.
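The decoder maps the estimated AR coefficients back to a power spectrum, so that errors can be measured in the spectral domain. The sketch below is a minimal illustration, assuming the convention A(z) = 1 + a_1 z^-1 + ... + a_p z^-p with a_0 = 1 and the prediction-error power as the gain, giving P(ω) = g / |A(e^{jω})|². The abstract names the two error terms but not the distance measure or their weighting, so the mean squared error and the weight `lam` in the hypothetical `hypothetical_gabs_loss` are placeholders, not the authors' objective. In a real DNN these would be tensor operations in an autodiff framework; this pure-Python version only shows the arithmetic.

```python
import cmath
import math


def ar_to_power_spectrum(a, gain, n_bins):
    """Evaluate P(w) = gain / |A(e^{jw})|^2 on n_bins frequencies in [0, pi].

    Assumes a[0] == 1.0 and A(z) = sum_k a[k] z^-k; gain is the
    prediction-error power of the AR model.
    """
    if n_bins < 2:
        raise ValueError("n_bins must be at least 2")
    spectrum = []
    for b in range(n_bins):
        w = math.pi * b / (n_bins - 1)
        # Frequency response of the inverse (whitening) filter A(z).
        resp = sum(coef * cmath.exp(-1j * w * k) for k, coef in enumerate(a))
        spectrum.append(gain / abs(resp) ** 2)
    return spectrum


def mse(x, y):
    """Mean squared error between two equal-length spectra."""
    return sum((u - v) ** 2 for u, v in zip(x, y)) / len(x)


def hypothetical_gabs_loss(p_decoded, p_target, p_enhanced, p_observed, lam=0.5):
    """Hypothetical two-term objective (NOT the paper's exact loss).

    Term 1: error of the spectrum rebuilt from the AR coefficients.
    Term 2: error between the enhanced and observed spectra, a term that
    does not pass through the LDR, weighted by the placeholder lam.
    """
    return mse(p_decoded, p_target) + lam * mse(p_enhanced, p_observed)
```

For example, `ar_to_power_spectrum([1.0, -0.9], 0.19, 3)` evaluates an AR(1) model at 0, π/2 and π and gives about 19.0, 0.105 and 0.0526: a low-pass shape, as expected for a pole at z = 0.9.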

Key words: auto-regressive (AR) coefficients, generalized analysis-by-synthesis (GABS), deep neural networks (DNN), Levinson-Durbin recursion (LDR)
