School of Information Science and Engineering, Dalian Polytechnic University, Dalian, Liaoning 116034, China
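[ "LIAN Xiao-yu, male, born in July 2001 in Fushun, Liaoning Province. He is currently a master's student at Dalian Polytechnic University. His main research interest is speech signal processing. E-mail: 709242393@qq.com" ]
[ "XIA Nan, male, born in May 1983 in Dalian, Liaoning Province. He received a Ph.D. in engineering from Dalian University of Technology in 2013 and then worked on radio monitoring and positioning research at the State Radio Monitoring Center as a senior engineer. He is currently an associate professor at the School of Information Science and Engineering, Dalian Polytechnic University. His main research interests include array signal processing and speech signal processing. E-mail: xia_nan0520@aliyun.com" ]
[ "DAI Gao-le, male, born in November 2002 in Ningbo, Zhejiang Province. He is currently an undergraduate student at Dalian Polytechnic University. His main research interest is speech signal processing. E-mail: 2050491891@qq.com" ]
[ "YANG Hong-qin, female, born in December 2000 in Kunming, Yunnan Province. She is currently a master's student at Dalian Polytechnic University. Her main research interests are speech signal processing and speech emotion recognition. E-mail: 1909847594@qq.com" ]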
Received: 2023-09-26,
Revised: 2024-01-28,
Published in print: 2024-04-25
LIAN Xiao-yu, XIA Nan, DAI Gao-le, et al. An In-Vehicle Interaction Speech Enhancement and Recognition Method Based on Lightweight Models in Complex Environment[J]. Acta Electronica Sinica, 2024, 52(04): 1282-1287. DOI:10.12263/DZXB.20230905
To address the low recognition rate of in-vehicle voice interaction in complex noise environments and the difficulty of deployment on devices with limited computing resources, this paper designs a lightweight speech enhancement model and a lightweight speech recognition model and trains them jointly. The speech enhancement model introduces a multi-scale channel time-frequency attention module to extract multi-scale time-frequency features and the key information along each dimension. In the speech recognition model, multi-head element-wise linear attention is proposed, which significantly reduces the computational complexity of the attention module. Experiments show that the jointly trained model exhibits good noise robustness on a self-built dataset.
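The abstract does not spell out the form of the multi-head element-wise linear attention, so the following is not the authors' method. It is a minimal single-head sketch of generic kernelized linear attention, included only to illustrate why a linear formulation can lower attention cost. The feature map `elu(x) + 1` and all function names are assumptions chosen for illustration. Computing the weights directly costs on the order of n²·d for sequence length n and feature size d, while reordering the product as φ(Q)(φ(K)ᵀV) costs on the order of n·d² and gives the same output.

```python
import math
import random


def phi(matrix):
    """Assumed feature map elu(x) + 1, applied element-wise (always positive)."""
    return [[x + 1.0 if x > 0 else math.exp(x) for x in row] for row in matrix]


def quadratic_attention(Q, K, V):
    """Kernel attention computed directly: every query scores every key."""
    fq, fk = phi(Q), phi(K)
    d_v = len(V[0])
    out = []
    for q in fq:
        weights = [sum(a * b for a, b in zip(q, k)) for k in fk]  # n scores
        total = sum(weights)
        out.append([sum(w * v[j] for w, v in zip(weights, V)) / total
                    for j in range(d_v)])
    return out


def linear_attention(Q, K, V):
    """Same kernel attention, reordered so no n-by-n weight matrix is built."""
    fq, fk = phi(Q), phi(K)
    d_k, d_v = len(fk[0]), len(V[0])
    kv = [[0.0] * d_v for _ in range(d_k)]  # running sum of phi(k)^T v
    k_sum = [0.0] * d_k                     # running sum of phi(k)
    for k, v in zip(fk, V):
        for i in range(d_k):
            k_sum[i] += k[i]
            for j in range(d_v):
                kv[i][j] += k[i] * v[j]
    out = []
    for q in fq:
        norm = sum(q[i] * k_sum[i] for i in range(d_k))
        out.append([sum(q[i] * kv[i][j] for i in range(d_k)) / norm
                    for j in range(d_v)])
    return out


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
n, d = 6, 4  # toy sequence length and feature size
Q = random_matrix(n, d, rng)
K = random_matrix(n, d, rng)
V = random_matrix(n, d, rng)

slow = quadratic_attention(Q, K, V)
fast = linear_attention(Q, K, V)
max_diff = max(abs(a - b) for ra, rb in zip(slow, fast) for a, b in zip(ra, rb))
print("largest difference between the two orderings:", max_diff)
```

The two functions agree up to floating-point rounding because matrix multiplication is associative. The saving comes only from the order of evaluation, which matters when the sequence length is much larger than the feature size, as is typical for speech frames.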