An In-Vehicle Interaction Speech Enhancement and Recognition Method Based on Lightweight Models in Complex Environment

LIAN Xiao-yu (廉筱峪); XIA Nan (夏楠); DAI Gao-le (戴高乐); YANG Hong-qin (杨红琴)

doi:10.12263/DZXB.20230905

Research article | Updated: 2025-12-08

- Title (Chinese): 复杂噪声环境下基于轻量化模型的车内交互语音增强和识别方法
- Title (English): An In-Vehicle Interaction Speech Enhancement and Recognition Method Based on Lightweight Models in Complex Environment
- Journal: Acta Electronica Sinica (电子学报), 2024, Vol. 52, No. 4, pp. 1282-1287
- Affiliation: School of Information Science and Engineering, Dalian Polytechnic University, Dalian, Liaoning 116034
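- Author biographies:
  - LIAN Xiao-yu, male, was born in Fushun, Liaoning Province, in July 2001. He is currently a master's student at Dalian Polytechnic University. His main research interest is speech signal processing. E-mail: 709242393@qq.com
  - XIA Nan, male, was born in Dalian, Liaoning Province, in May 1983. He received his Ph.D. in engineering from Dalian University of Technology in 2013 and then worked on radio monitoring and positioning research at the State Radio Monitoring Center, where he was a senior engineer. He is currently an associate professor at the School of Information Science and Engineering, Dalian Polytechnic University. His main research interests include array signal processing and speech signal processing. E-mail: xia_nan0520@aliyun.com
  - DAI Gao-le, male, was born in Ningbo, Zhejiang Province, in November 2002. He is currently an undergraduate student at Dalian Polytechnic University. His main research interest is speech signal processing. E-mail: 2050491891@qq.com
  - YANG Hong-qin, female, was born in Kunming, Yunnan Province, in December 2000. She is currently a master's student at Dalian Polytechnic University. Her main research interests are speech signal processing and speech emotion recognition. E-mail: 1909847594@qq.com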
- Funding: Ministry of Education Industry-University Cooperative Education Program (220603231024713)
- DOI: 10.12263/DZXB.20230905
- Chinese Library Classification (CLC) number: TN912.3
- Received: 2023-09-26
- Revised: 2024-01-28
- Published in print: 2024-04-25
LIAN Xiao-yu, XIA Nan, DAI Gao-le, et al. An In-Vehicle Interaction Speech Enhancement and Recognition Method Based on Lightweight Models in Complex Environment[J]. Acta Electronica Sinica, 2024, 52(04): 1282-1287. DOI: 10.12263/DZXB.20230905

Abstract

To address the low recognition rate of in-vehicle voice interaction in complex noise environments and the difficulty of deployment on devices with limited computing resources, this paper designs a lightweight speech enhancement model and a lightweight speech recognition model and trains them jointly. The speech enhancement model introduces a multi-scale channel time-frequency attention module to extract multi-scale time-frequency features and the key information along each dimension. In the speech recognition model, multi-head element-wise linear attention is proposed, which significantly reduces the computational complexity of the attention module. Experiments show that the jointly trained model exhibits good noise robustness on a self-made dataset.
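The abstract states that the proposed attention lowers computational complexity but does not give its formulation on this page. As general background only, the sketch below illustrates the generic kernel-based linear attention idea, not the paper's multi-head element-wise linear attention. With a positive feature map φ, the output φ(Q)(φ(K)ᵀV) can be computed without forming the n×n score matrix, so the cost grows linearly rather than quadratically with sequence length n. The feature map elu(x) + 1, all function names, and the toy sizes are illustrative assumptions; plain Python lists are used to keep the example dependency-free.

```python
import math
import random


def feature_map(matrix):
    """Apply elu(x) + 1 element-wise so every entry is positive (assumed choice)."""
    return [[x + 1.0 if x > 0 else math.exp(x) for x in row] for row in matrix]


def quadratic_attention(q, k, v):
    """Reference form: one score per (query, key) pair, O(n^2 * d)."""
    fq, fk = feature_map(q), feature_map(k)
    out = []
    for q_row in fq:
        # n similarity scores for this query, one per key position.
        scores = [sum(a * b for a, b in zip(q_row, k_row)) for k_row in fk]
        norm = sum(scores)
        out.append([
            sum(s * v_row[c] for s, v_row in zip(scores, v)) / norm
            for c in range(len(v[0]))
        ])
    return out


def linear_attention(q, k, v):
    """Same result via associativity: summarise keys/values once, O(n * d * d_v)."""
    fq, fk = feature_map(q), feature_map(k)
    n, d, d_v = len(fk), len(fk[0]), len(v[0])
    # kv[i][c] = sum over positions p of fk[p][i] * v[p][c]  (d x d_v summary)
    kv = [[sum(fk[p][i] * v[p][c] for p in range(n)) for c in range(d_v)]
          for i in range(d)]
    # k_sum[i] = sum over positions p of fk[p][i]  (normaliser summary)
    k_sum = [sum(fk[p][i] for p in range(n)) for i in range(d)]
    out = []
    for q_row in fq:
        norm = sum(a * b for a, b in zip(q_row, k_sum))
        out.append([
            sum(q_row[i] * kv[i][c] for i in range(d)) / norm
            for c in range(d_v)
        ])
    return out


def random_matrix(rows, cols, rng):
    """Hypothetical helper: rows x cols matrix with entries in [-1, 1]."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
n, d, d_v = 6, 4, 3  # toy sequence length and feature sizes
q = random_matrix(n, d, rng)
k = random_matrix(n, d, rng)
v = random_matrix(n, d_v, rng)

slow = quadratic_attention(q, k, v)
fast = linear_attention(q, k, v)
max_diff = max(abs(a - b) for r1, r2 in zip(slow, fast) for a, b in zip(r1, r2))
print(max_diff < 1e-9)
```

Both functions compute the same normalized weighted average of the value rows; only the order of multiplication differs. The linear form pays for the `kv` and `k_sum` summaries once and reuses them for every query, which is where the savings for long sequences come from.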
