Bayesian Principal Component Analysis for I-Vector Speaker Verification

RONG Ya-feng; CHEN Chen; CHEN De-yun; HE Yong-jun

doi:10.12263/DZXB.20200476

Research article | Updated: 2025-12-08

- Acta Electronica Sinica, 2021, Vol. 49, No. 11, pp. 2186-2194
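- Affiliations:
  
  1. School of Computer Science and Technology, Harbin University of Science and Technology, Harbin, Heilongjiang 150080, China
  2. Postdoctoral Research Station of Computer Science and Technology, Harbin University of Science and Technology, Harbin, Heilongjiang 150080, China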
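- About the authors:
  
  RONG Ya-feng, female, born in 1997. She is a master's student at the School of Computer Science and Technology, Harbin University of Science and Technology. Her main research interests include speaker recognition and speech signal processing. E-mail: rongyafeng908@163.com
  CHEN Chen, female, born in 1990. She is a lecturer, postdoctoral researcher, and master's supervisor at the School of Computer Science and Technology, Harbin University of Science and Technology. Her main research interests include speech signal processing, audio information analysis, and speaker recognition. E-mail: chenc@hrbust.edu.cn
  CHEN De-yun (corresponding author), male, born in 1962. He is a professor and doctoral supervisor at the School of Computer Science and Technology, Harbin University of Science and Technology. His main research interests include pattern recognition and machine learning. E-mail: chendeyun@hrbust.edu.cn
  HE Yong-jun, male, born in 1980. He is a professor and doctoral supervisor at the School of Computer Science and Technology, Harbin University of Science and Technology. His main research interests include speech signal processing and image processing. E-mail: holywit@163.com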
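- Funding:
  
  National Natural Science Foundation of China (62101163; 61673142); Natural Science Foundation of Heilongjiang Province (JJ2019JQ0013); Heilongjiang Postdoctoral Special Fund (LBH-Z20020); Fundamental Research Funds for Provincial Universities of Heilongjiang Province (2020-KYYWF-0341); Harbin Outstanding Young Talents Fund (2017RAYXJ013)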
- CLC number: TP391.4
- Received: 2020-05-19; revised: 2020-11-09; published in print: 2021-11-25

RONG Ya-feng, CHEN Chen, CHEN De-yun, et al. Bayesian Principal Component Analysis for I-Vector Speaker Verification[J]. Acta Electronica Sinica, 2021, 49(11): 2186-2194. DOI: 10.12263/DZXB.20200476.

Abstract

The identity-vector (i-vector) approach is one of the mainstream methods in speaker verification. It obtains an effective low-dimensional speaker representation, the i-vector, by learning a total variability space (TVS). When the development data are insufficient, however, the learned TVS model has a large error, and it is difficult to determine whether the TVS has learned redundant information because its preset dimension is too high. To address these problems, this paper introduces Bayesian principal component analysis (BPCA) into the learning of the TVS. BPCA brings additional prior information into the TVS, which supplements the information contained in the development data, and the constraint imposed by the prior weakens the influence of invalid dimensions in the TVS. Experimental results show that, when the development data are insufficient, the BPCA method effectively improves the recognition performance of the speaker verification system compared with traditional TVS learning methods.
