An In-Vehicle Interaction Speech Enhancement and Recognition Method Based on Lightweight Models in Complex Environment

LIAN Xiao-yu; XIA Nan; DAI Gao-le; YANG Hong-qin

doi:10.12263/DZXB.20230905

PAPERS | Updated: 2025-12-08

- ACTA ELECTRONICA SINICA Vol. 52, Issue 4, Pages: 1282-1287 (2024)
- Author affiliation: School of Information Science and Engineering, Dalian Polytechnic University, Dalian, Liaoning 116034
- Funding: Industry-University Cooperation Education Project of the Ministry of Education (220603231024713)
- CLC: TN912.3
- Received: 26 September 2023; Revised: 28 January 2024; Published: 25 April 2024

LIAN Xiao-yu, XIA Nan, DAI Gao-le, et al. An In-Vehicle Interaction Speech Enhancement and Recognition Method Based on Lightweight Models in Complex Environment[J]. Acta Electronica Sinica, 2024, 52(04): 1282-1287. DOI: 10.12263/DZXB.20230905

Abstract

To address the low recognition rate of in-vehicle voice interaction in complex noise environments and the difficulty of deploying models on devices with limited computing resources, this article designs a lightweight speech enhancement model and a lightweight speech recognition model and trains them jointly. The speech enhancement model introduces a multi-scale channel time-frequency attention module to extract multi-scale time-frequency features and key information along each dimension. In the speech recognition model, multi-head element-wise linear attention is proposed, which significantly reduces the computational complexity of the attention module. Experiments show that the jointly trained model exhibits good noise robustness on a self-made dataset.
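
The abstract does not give the formulation of the proposed multi-head element-wise linear attention, so the block below is not the authors' method. It is a minimal sketch of generic kernelized linear attention, assuming an `elu(x) + 1` feature map, to illustrate why this family of attention is cheaper: reordering the matrix products avoids building the n-by-n weight matrix, so the cost grows linearly rather than quadratically with sequence length n. All function names are hypothetical.

```python
import numpy as np

def feature_map(x):
    # elu(x) + 1, a positive feature map (an assumption, not taken from the paper)
    return np.where(x > 0, x + 1.0, np.exp(x))

def linear_attention(q, k, v):
    """Generic kernelized linear attention for one head.

    q, k: (n, d); v: (n, d_v). Cost is O(n * d * d_v): no (n, n) matrix is formed.
    """
    phi_q, phi_k = feature_map(q), feature_map(k)
    kv = phi_k.T @ v             # (d, d_v) summary of keys and values
    z = phi_k.sum(axis=0)        # (d,) normalizer summary
    return (phi_q @ kv) / (phi_q @ z)[:, None]

def quadratic_reference(q, k, v):
    """Same result computed through an explicit (n, n) weight matrix, O(n^2)."""
    phi_q, phi_k = feature_map(q), feature_map(k)
    w = phi_q @ phi_k.T
    w = w / w.sum(axis=1, keepdims=True)
    return w @ v

def multi_head(q, k, v, num_heads, attn=linear_attention):
    """Split the feature dimension into heads, attend per head, concatenate."""
    n = q.shape[0]

    def split(x):
        # (n, d) -> (num_heads, n, d / num_heads)
        return x.reshape(n, num_heads, -1).transpose(1, 0, 2)

    heads = [attn(qh, kh, vh) for qh, kh, vh in zip(split(q), split(k), split(v))]
    return np.concatenate(heads, axis=-1)
```

Calling `multi_head(q, k, v, num_heads=4)` on arrays of shape `(n, 16)` returns an array of the same shape. The two single-head functions agree numerically, which is the point of the sketch: the saving comes only from the order of multiplication.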
