An In-Vehicle Interaction Speech Enhancement and Recognition Method Based on Lightweight Models in Complex Environment

LIAN Xiao-yu, XIA Nan, DAI Gao-le, YANG Hong-qin

Acta Electronica Sinica, 2024, Vol. 52, Issue (4): 1282-1287. DOI: 10.12263/DZXB.20230905

Research Article

Abstract

To address the low recognition rate of in-vehicle voice interaction in complex noise environments and the difficulty of deployment on devices with limited computing resources, this article designs lightweight speech enhancement and speech recognition models and trains them jointly. The speech enhancement model introduces a multi-scale channel time-frequency attention module to extract multi-scale time-frequency features and the key information along each dimension. In the speech recognition model, a multi-head element-wise linear attention mechanism is proposed, which significantly reduces the computational complexity of the attention module. Experiments show that the jointly trained model exhibits good noise robustness on a self-made dataset.
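The complexity reduction claimed for the linear attention module can be illustrated with a generic kernelized linear-attention sketch. This is not the paper's exact multi-head element-wise formulation (which is not reproduced here); the feature map `phi` and the normalization are assumptions, chosen to show how the quadratic (N, N) score matrix of standard attention is avoided:

```python
import numpy as np

def softmax_attention(Q, K, V):
    # Standard scaled dot-product attention: O(N^2 * d) time and memory.
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)            # (N, N) -- quadratic in sequence length
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

def linear_attention(Q, K, V, eps=1e-6):
    # Kernelized linear attention: O(N * d^2), never forms the (N, N) matrix.
    # Feature map phi(x) = elu(x) + 1 keeps values positive (an assumption;
    # the paper's element-wise variant may use a different map).
    phi = lambda x: np.where(x > 0, x + 1.0, np.exp(x))
    Qf, Kf = phi(Q), phi(K)
    KV = Kf.T @ V                            # (d, d) summary of all keys/values
    Z = Qf @ Kf.sum(axis=0)                  # (N,) per-query normalizer
    return (Qf @ KV) / (Z[:, None] + eps)

rng = np.random.default_rng(0)
N, d = 128, 16
Q, K, V = rng.standard_normal((3, N, d))
print(softmax_attention(Q, K, V).shape)      # (128, 16)
print(linear_attention(Q, K, V).shape)       # (128, 16)
```

By reassociating the product as phi(Q)(phi(K)ᵀV), the cost grows linearly with the number of frames N, which is what makes such a module attractive on devices with limited computing resources.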

Key words

deep learning / speech enhancement / speech recognition / attention mechanism / joint training
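The joint training of the enhancement and recognition models described above can be sketched as a weighted sum of an enhancement loss and a recognition loss. This is a generic sketch under stated assumptions: the MSE spectrogram loss, the per-frame cross-entropy, and the weight `alpha` are all hypothetical stand-ins, not the paper's actual loss terms or weighting:

```python
import numpy as np

def joint_loss(enhanced, clean, asr_logits, target_ids, alpha=0.5):
    # Joint training objective: weighted sum of an enhancement loss
    # (MSE between enhanced and clean spectrograms) and an ASR loss
    # (per-frame cross-entropy). alpha is a hypothetical trade-off weight.
    enh_loss = np.mean((enhanced - clean) ** 2)
    # Softmax cross-entropy over per-frame token logits.
    probs = np.exp(asr_logits - asr_logits.max(axis=-1, keepdims=True))
    probs /= probs.sum(axis=-1, keepdims=True)
    picked = probs[np.arange(len(target_ids)), target_ids]
    asr_loss = -np.mean(np.log(picked + 1e-9))
    return alpha * enh_loss + (1 - alpha) * asr_loss

rng = np.random.default_rng(1)
T, F, V = 50, 80, 100                        # frames, freq bins, vocab size
enhanced = rng.standard_normal((T, F))
clean = rng.standard_normal((T, F))
logits = rng.standard_normal((T, V))
targets = rng.integers(0, V, size=T)
loss = joint_loss(enhanced, clean, logits, targets)
print(loss > 0)                              # True
```

Training both models against one combined objective lets the enhancement front end learn distortions that the recognizer tolerates, rather than optimizing signal fidelity in isolation.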

Cite this article

LIAN Xiao-yu, XIA Nan, DAI Gao-le, YANG Hong-qin. An In-Vehicle Interaction Speech Enhancement and Recognition Method Based on Lightweight Models in Complex Environment[J]. Acta Electronica Sinica, 2024, 52(4): 1282-1287. https://doi.org/10.12263/DZXB.20230905


Funding

Industry-University Cooperative Education Program of the Ministry of Education, China (220603231024713)