Acta Electronica Sinica ›› 2016, Vol. 44 ›› Issue (12): 2924-2931. DOI: 10.3969/j.issn.0372-2112.2016.12.016

• Research Paper •

A Discriminative Segmental Feature Transform Method Based on Uncorrelated Matching Pursuit

CHEN Bin1,2, NIU Tong1, ZHANG Lian-hai1, QU Dan1, LI Bi-cheng1

  1. Institute of Information System Engineering, Information Engineering University, Zhengzhou, Henan 450001, China;
  2. Shanghai Branch of Southwest Electronics and Telecommunication Technology Research Institute, Shanghai 200434, China
  • Received: 2015-05-17 Revised: 2015-11-24 Online: 2016-12-25 Published: 2016-12-25
  • About the authors: CHEN Bin, male, born in 1987 in Pingxiang, Jiangxi. He is a Ph.D. candidate at the Institute of Information System Engineering, PLA Information Engineering University, and an engineer at the Shanghai Branch of Southwest Electronics and Telecommunication Technology Research Institute. His main research interests are continuous speech recognition, discriminative training, and machine learning. E-mail: chenbin873335@163.com; NIU Tong, male, born in 1982 in Zhengzhou, Henan. He is a Ph.D. candidate at the Institute of Information System Engineering, PLA Information Engineering University. His main research interests are speech recognition and speech enhancement. E-mail: niutong0072@gmail.com
  • Supported by:

    National Natural Science Foundation of China (No.61175017, No.61403415); National High Technology Research and Development Program of China (863 Program) (No.2012AA011603)

Abstract:

A segment-based discriminative feature transform method is proposed to improve the stability of frame-based feature transform methods. Feature transformation is treated as a sparse approximation problem for high-dimensional signals. First, a set of feature transform matrices is estimated by tied-state training of RDLT (Region Dependent Linear Transform) and m-fMPE (mean-offset feature Minimum Phone Error), and the two sets of transform matrices are combined into an over-complete dictionary. Then, the speech signal is segmented by forced alignment. Finally, with likelihood maximization as the objective function, matching pursuit is used to optimize the objective iteratively, so that the transform matrices for each speech segment are selected from the dictionary and their coefficients are determined automatically during optimization. To guarantee the stability of the feature transform, a correlation measure is introduced during the selection of transform matrices to remove correlated basis vectors. Experimental results show that, compared with the traditional RDLT method, recognition performance improves by 1.63% and 2.23% when the acoustic model is trained with the maximum likelihood criterion and the discriminative training criterion, respectively. The method can also be applied to speech enhancement and to discriminative model training.

Key words: feature transform, speech recognition, discriminative training, speech enhancement, matching pursuit
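
The selection step described in the abstract (greedy matching pursuit that rejects correlated basis vectors) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes plain unit-norm vector atoms rather than RDLT/m-fMPE transform matrices, and it swaps the likelihood objective for an ordinary least-squares residual. The function name `uncorrelated_matching_pursuit` and the threshold `max_corr` are hypothetical, introduced only for illustration.

```python
import numpy as np

def uncorrelated_matching_pursuit(signal, dictionary, n_atoms, max_corr=0.9):
    """Greedy sparse approximation that skips atoms correlated with earlier picks.

    signal:     (d,) target vector
    dictionary: (d, k) matrix whose columns are unit-norm atoms
    n_atoms:    maximum number of atoms to select
    max_corr:   hypothetical threshold on |cosine similarity| between atoms
    """
    residual = signal.astype(float).copy()
    selected, coeffs = [], []
    for _ in range(n_atoms):
        # Score every atom by its inner product with the current residual.
        scores = dictionary.T @ residual
        # Visit candidates from best to worst match.
        for idx in np.argsort(-np.abs(scores)):
            if idx in selected:
                continue
            atom = dictionary[:, idx]
            # Reject atoms too correlated with any atom already chosen.
            if selected and np.max(np.abs(dictionary[:, selected].T @ atom)) > max_corr:
                continue
            break
        else:
            break  # no admissible atom is left in the dictionary
        if abs(scores[idx]) < 1e-12:
            break  # the residual is already explained; stop early
        selected.append(int(idx))
        coeffs.append(float(scores[idx]))
        # Remove the selected atom's contribution from the residual.
        residual = residual - scores[idx] * atom
    return selected, coeffs, residual
```

In this sketch, an atom that nearly duplicates an already selected one is never chosen, even when it matches the residual well. This mirrors the stated motivation of removing correlated basis vectors to keep the transform stable.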

CLC Number: