Acta Electronica Sinica ›› 2015, Vol. 43 ›› Issue (7): 1286-1293. DOI: 10.3969/j.issn.0372-2112.2015.07.006

• Research Paper •

A Layered Coding Method for Speech and Audio Based on Signal Warping and Sparse Transform

LI Xiao-ming, BAO Chang-chun, JIA Mao-shen

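  1. Speech and Audio Signal Processing Laboratory, School of Electronic Information and Control Engineering, Beijing University of Technology, Beijing 100124
  • Received: 2014-01-03 Revised: 2014-03-17 Published: 2015-07-25
    • About the authors:
    • LI Xiao-ming, male, born in 1983 in Chifeng, Inner Mongolia, is a Ph.D. candidate at Beijing University of Technology. His main research interest is speech and audio coding. E-mail: lixiaoming@emails.bjut.edu.cn; BAO Chang-chun, male, born in 1965 in Chifeng, Inner Mongolia, Ph.D., is a professor and doctoral supervisor at Beijing University of Technology, an IEEE Senior Member, a member of the International Speech Communication Association (ISCA) and of the Asia-Pacific Signal and Information Processing Association (APSIPA), a council member of the Chinese Institute of Electronics and of the Acoustical Society of China, a committee member of the Signal Processing Branch, and an editorial board member of the Journal of Signal Processing and the Journal of Data Acquisition and Processing. His main research interest is speech and audio signal processing. E-mail: baochch@bjut.edu.cn
    • Supported by:
    • National Natural Science Foundation of China (No.61072089, No.61201197); General Program of the Science and Technology Plan of the Beijing Municipal Education Commission (No.KM201310005008); New Teacher Fund of the Specialized Research Fund for the Doctoral Program of the Ministry of Education (No.20121103120017); The 12th Graduate Science and Technology Fund of Beijing University of Technology (No.ykj-2013-9563)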

The Layered Coding of Speech and Audio Signals Based on Signal Warp and Sparse Transform

LI Xiao-ming, BAO Chang-chun, JIA Mao-shen   

  1. Speech and Audio Signal Processing Laboratory, School of Electronic Information and Control Engineering, Beijing University of Technology, Beijing 100124, China
  • Received: 2014-01-03 Revised: 2014-03-17 Online: 2015-07-25 Published: 2015-07-25
    • Supported by:
    • National Natural Science Foundation of China (No.61072089, No.61201197); Science and Technology Project of Beijing Municipal Education Commission (No.KM201310005008); New Teacher Fund of Special Research Fund for Doctoral Programs of Ministry of Education (No.20121103120017); The 12th Graduate Science and Technology Foundation of Beijing University of Technology (No.ykj-2013-9563)

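Abstract:

Based on the inherent periodicity of speech and audio signals, this paper constructs a unified analysis/synthesis model suitable for both speech and audio, and achieves high-quality layered coding of wideband speech and audio signals at bit rates of 24 kbps and 32 kbps. First, the input signal with a time-varying period is warped into a signal with a fixed period, and a warped matrix is constructed from the warped periodic signal. Second, the modulated lapped transform (MLT) and the discrete cosine transform (DCT) are applied to the rows and columns of the warped matrix, respectively, to make the matrix sparse. Finally, sub-band quantization and vector Huffman coding are used to quantize and encode the elements of the sparse matrix. Subjective and objective test results show that, for speech, audio, and mixed speech/audio signals, the proposed method achieves better coding quality than the ITU-T G.722.1 and AMR-WB codecs at the same bit rates.

Keywords: speech coding, audio coding, signal warping, sparse transform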

Abstract:

Based on the periodic characteristics of speech and audio signals, this paper proposes a layered coding method built on a unified analysis/synthesis model. The resulting coder performs equally well on speech and audio at bit rates of 24 kbps and 32 kbps. First, the input signal, which has a time-varying period, is warped into a signal with a constant period. Second, a sparse representation of the warped signal is obtained by applying the modulated lapped transform (MLT) and the discrete cosine transform (DCT) to the warped matrix derived from it. Finally, sub-band quantization and Huffman coding are applied to the transform coefficients. Both the objective PESQ/PEAQ results and the subjective A/B listening tests show that the proposed coder outperforms the ITU-T G.722.1 and AMR-WB codecs.

Key words: speech coding, audio coding, signal warping, sparse transform
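
The warp-then-transform idea summarized in the abstract can be illustrated with a small sketch. This is not the authors' implementation, and several things in it are assumptions made only for illustration: the cycle boundaries are taken as known, warping is done by simple linear interpolation, each row of the matrix holds one warped cycle, a plain orthonormal DCT-II stands in for the MLT, and the test signal is synthetic. All function and variable names (`warp_cycle`, `transform_2d`, `energy_compaction`, and so on) are hypothetical.

```python
import math

def dct2(x):
    """Orthonormal DCT-II of a 1-D sequence (pure Python, O(n^2))."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        out.append(s * math.sqrt((1.0 if k == 0 else 2.0) / n))
    return out

def warp_cycle(cycle, length):
    """Resample one cycle to `length` samples by linear interpolation
    (an assumed stand-in for the paper's signal warping)."""
    m = len(cycle)
    out = []
    for j in range(length):
        pos = j * m / length      # fractional index into the original cycle
        i0 = int(pos)
        frac = pos - i0
        i1 = (i0 + 1) % m         # wrap around: the cycle is treated as periodic
        out.append((1.0 - frac) * cycle[i0] + frac * cycle[i1])
    return out

def make_cycles(periods):
    """Synthetic test signal: one harmonic waveform per cycle, with a
    drifting period and a slowly rising gain."""
    cycles = []
    for idx, p in enumerate(periods):
        gain = 1.0 + 0.1 * idx
        cycles.append([
            gain * (math.sin(2 * math.pi * n / p)
                    + 0.5 * math.sin(4 * math.pi * n / p)
                    + 0.25 * math.sin(6 * math.pi * n / p))
            for n in range(p)
        ])
    return cycles

def transform_2d(matrix):
    """Apply the DCT along every row, then along every column.
    (The paper uses an MLT in one direction; a DCT is substituted here.)"""
    rows = [dct2(r) for r in matrix]
    cols = [dct2(list(c)) for c in zip(*rows)]   # cols[j] is transformed column j
    return [list(r) for r in zip(*cols)]         # transpose back to row-major

def energy_compaction(coeffs, keep):
    """Share of total energy held by the `keep` largest coefficients."""
    energies = sorted((c * c for row in coeffs for c in row), reverse=True)
    return sum(energies[:keep]) / sum(energies)

PERIODS = [40, 42, 44, 46, 48, 50, 52, 54]   # time-varying period, in samples
WARPED_LEN = 48                              # fixed period after warping

warped_matrix = [warp_cycle(c, WARPED_LEN) for c in make_cycles(PERIODS)]
coeffs = transform_2d(warped_matrix)
print("energy share of the 32 largest coefficients:",
      round(energy_compaction(coeffs, 32), 4))
```

Because every warped cycle has the same length and phase, the rows of the matrix are nearly identical up to a gain, so the column transform concentrates most of the energy in a few low-order coefficients. That concentration is what makes the subsequent quantization and entropy coding stage efficient.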

CLC Number: