Acta Electronica Sinica ›› 2023, Vol. 51 ›› Issue (1): 202-212. DOI: 10.12263/DZXB.20211630

• Research Paper •

A Cough-Based COVID-19 Diagnosis Algorithm Using a Dual-Input Neural Network with Dynamic and Static Features

张永梅 (ZHANG Yong-mei), 孙捷 (SUN Jie)

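  1. School of Information Science and Technology, North China University of Technology, Beijing 100144
  • Received: 2021-12-08 Revised: 2022-05-17 Publication date: 2023-01-25
    • About the authors:
    • ZHANG Yong-mei, female, was born in January 1967 in Taiyuan, Shanxi Province. She is a professor and doctoral supervisor at the School of Information Science and Technology, North China University of Technology. She has received one first prize and three second prizes in provincial and ministerial science and technology progress awards, and two second prizes in provincial teaching achievement awards. She holds 19 granted invention patents and has applied for 63 software copyrights. She has published 212 academic papers in China and abroad, along with 5 monographs and textbooks. E-mail: zhangym@ncut.edu.cn
      SUN Jie, male, was born in June 1997 in Linyi, Shandong Province. He is a master's student at the School of Information Science and Technology, North China University of Technology. His research interests are image processing and artificial intelligence. E-mail: sunjie0627@163.com
    • Funding:
    • National Key R&D Program of China (2020YFC0811004)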

A Dynamic-Static Dual Input Deep Neural Network Algorithm for Diagnosing COVID-19 by Cough

ZHANG Yong-mei, SUN Jie   

  1. School of Information Science and Technology, North China University of Technology, Beijing 100144, China
  • Received: 2021-12-08 Revised: 2022-05-17 Online: 2023-01-25 Published: 2023-02-23
    • Supported by:
    • National Key R&D Program of China (2020YFC0811004)

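Abstract (translated from the Chinese):

Coronavirus disease 2019 (COVID-19) has had a severe impact worldwide, and researchers have carried out extensive work on epidemic prevention and control. Using cough sounds to identify the site of the lesion, and thereby diagnose COVID-19, is contactless, low-cost, and easy to obtain, but such research is relatively scarce in China. Mel frequency cepstral coefficient (MFCC) features represent only the static characteristics of a sound, whereas first-order differential MFCC features also reflect its dynamic characteristics. To better prevent and treat COVID-19, this paper proposes an algorithm that diagnoses COVID-19 from cough sounds using a dual-input neural network with dynamic and static features. Starting from the Coswara dataset, the cough audio is clipped, and MFCC and first-order differential MFCC features are extracted to train a dual-input neural network model on dynamic and static features. The model uses a statistics pooling layer, so it can accept MFCC features of different lengths. Experimental results show that, compared with existing models, the proposed algorithm clearly improves recognition accuracy, recall, specificity, and F1-score.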

Keywords: deep learning, cough sound, COVID-19, Mel frequency cepstral coefficients, audio technology, convolutional neural network
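
The relationship between static MFCC features and their first-order differential (delta) counterpart can be illustrated with a small sketch. The abstract does not state which differencing scheme the authors use, so the regression formula, the window width of 2, and the edge-repetition padding below are assumptions taken from common speech-processing practice, and `delta_features` is a hypothetical helper name. The input values are made up and stand in for real MFCC frames.

```python
def delta_features(frames, width=2):
    """First-order delta of a (num_frames x num_coeffs) feature sequence.

    Assumed scheme (not confirmed by the paper): the regression formula
        d[t] = sum_{n=1..width} n * (c[t+n] - c[t-n]) / (2 * sum_{n=1..width} n^2)
    with the first and last frames repeated at the boundaries.
    """
    if not frames:
        return []
    num_frames = len(frames)
    num_coeffs = len(frames[0])
    denom = 2 * sum(n * n for n in range(1, width + 1))

    def frame_at(t):
        # Clamp the index so edge frames are repeated outside the sequence.
        return frames[min(max(t, 0), num_frames - 1)]

    deltas = []
    for t in range(num_frames):
        row = []
        for k in range(num_coeffs):
            num = sum(
                n * (frame_at(t + n)[k] - frame_at(t - n)[k])
                for n in range(1, width + 1)
            )
            row.append(num / denom)
        deltas.append(row)
    return deltas


# Toy "static" sequence: 5 frames, 2 coefficients (illustrative values only).
# Coefficient 0 rises steadily over time; coefficient 1 is constant.
static = [[0.0, 1.0], [1.0, 1.0], [2.0, 1.0], [3.0, 1.0], [4.0, 1.0]]
dynamic = delta_features(static)
```

In this toy case the rising coefficient gets a non-zero delta and the constant one gets a delta of zero, which is the sense in which delta features carry dynamic information that the static frames alone do not. A dual-input model would receive `static` and `dynamic` as two separate inputs.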

Abstract:

Coronavirus disease 2019 (COVID-19) has had a serious impact worldwide, and researchers have carried out extensive work on epidemic prevention and control. Diagnosing COVID-19 from cough sounds is contactless, low-cost, and easy to access, yet such research remains relatively scarce in China. Mel frequency cepstral coefficient (MFCC) features represent only the static characteristics of a sound, whereas first-order differential MFCC features also reflect its dynamic characteristics. To better prevent and treat COVID-19, this paper proposes a dynamic-static dual-input deep neural network algorithm for diagnosing COVID-19 from cough sounds. Cough audio from the Coswara dataset is clipped, MFCC and first-order differential MFCC features are extracted, and a dual-input neural network model is trained on these dynamic and static features. The model adopts a statistics pooling layer so that MFCC features of different lengths can be used as input. Experimental results show that, compared with existing models, the proposed algorithm significantly improves recognition accuracy, recall, specificity, and F1-score.

Key words: deep learning, cough, COVID-19, Mel frequency cepstral coefficients, audio technology, convolutional neural network (CNN)
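
The claim that a statistics pooling layer lets the model accept inputs of different lengths can be made concrete with a minimal sketch. It assumes the common form of statistics pooling, which concatenates the per-dimension mean and standard deviation over the time axis; the abstract does not specify the exact statistics or where in the network the layer sits, so this is an illustration on raw frame sequences, and `statistics_pooling` is a hypothetical helper name.

```python
import math


def statistics_pooling(frames):
    """Collapse a (num_frames x dim) sequence into one vector of length 2 * dim.

    Output layout (assumed): [mean_0, ..., mean_{dim-1}, std_0, ..., std_{dim-1}],
    using the population standard deviation over the time axis.
    """
    if not frames:
        raise ValueError("need at least one frame")
    num_frames = len(frames)
    dim = len(frames[0])
    means = [sum(f[k] for f in frames) / num_frames for k in range(dim)]
    stds = [
        math.sqrt(sum((f[k] - means[k]) ** 2 for f in frames) / num_frames)
        for k in range(dim)
    ]
    return means + stds


# Two clips of different lengths (illustrative values only).
short_clip = [[1.0, 2.0], [3.0, 2.0]]
long_clip = [[0.0, 1.0], [2.0, 1.0], [4.0, 1.0], [6.0, 1.0], [8.0, 1.0]]

pooled_short = statistics_pooling(short_clip)
pooled_long = statistics_pooling(long_clip)
```

Both pooled vectors have length 4 even though the clips contain 2 and 5 frames, so any fixed-size layer placed after the pooling step sees the same input shape regardless of the duration of the cough recording.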

CLC Number: