

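1. Department of Electronic Engineering, Tsinghua University, Beijing 100084
2. School of Linguistic Sciences and Arts, Jiangsu Normal University, Xuzhou, Jiangsu 221009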
Received: 10 February 2022
Revised: 30 July 2022
Published: 25 December 2023
CHEN Xu-chu, ZHANG Wei-qiang, MA Yong. Raw Waveform-Based End-to-End Alzheimer's Disease Detection Method[J]. ACTA ELECTRONICA SINICA, 2023, 51(12): 3582-3590. DOI: 10.12263/DZXB.20220162.
Alzheimer's disease (AD) is a degenerative disease; as it progresses, the patient's language ability gradually declines. Researchers have already used acoustic features such as the Mel spectrogram and Mel frequency cepstral coefficients (MFCCs) to distinguish AD patients from healthy individuals, but using neural networks to extract features directly from raw waveforms for AD detection remains underexplored. In this paper, we propose an end-to-end AD detection method based on raw waveforms. The method uses one-dimensional convolution to extract temporal features from the raw waveform, and residual blocks containing dilated convolutions to extract more complex features. To further improve performance, a squeeze-and-excitation (SE) block is introduced into the residual blocks. On the National Conference on Man-Machine Speech Communication (NCMMSC) 2021 AD dataset, the proposed model achieves accuracies of 86.55% and 81.35% on the long-audio and short-audio test sets, respectively, which are 6.75 and 7.35 percentage points higher than the baseline system. On the INTERSPEECH 2020 ADReSS dataset, the model achieves an accuracy of 66.67%, 4.17 percentage points higher than the baseline system.
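The abstract names three building blocks: a one-dimensional convolutional front end applied directly to raw samples, residual blocks built around dilated convolutions, and squeeze-and-excitation (SE) channel reweighting inside those blocks. The plain-Python sketch below shows one way these pieces can fit together in a forward pass, without a deep-learning framework. It is a minimal illustration under stated assumptions, not the authors' model. The following are all hypothetical choices: the channel count, kernel size 3, front-end stride, dilation schedule (1, 2, 4), SE reduction ratio, placing SE on the residual branch before the skip addition, and the global-average-pooling plus softmax classifier. Batch normalization and SE biases are omitted, the weights are random rather than trained, and the helper names (`conv1d`, `squeeze_excite`, `se_dilated_res_block`, `init_params`, `forward`) are illustrative.

```python
import math
import random


def conv1d(x, weight, bias, stride=1, dilation=1):
    """1-D convolution with zero padding that keeps the length when stride == 1 (odd K).

    x: C_in lists of T samples; weight: C_out x C_in x K; bias: C_out values.
    """
    k = len(weight[0][0])
    pad = dilation * (k - 1) // 2
    padded = [[0.0] * pad + list(ch) + [0.0] * pad for ch in x]
    t_out = (len(x[0]) + 2 * pad - dilation * (k - 1) - 1) // stride + 1
    out = []
    for w_o, b_o in zip(weight, bias):
        row = []
        for t in range(t_out):
            acc = b_o
            for ch, w_oc in zip(padded, w_o):
                for j, w in enumerate(w_oc):
                    acc += w * ch[t * stride + j * dilation]  # taps spaced `dilation` apart
            row.append(acc)
        out.append(row)
    return out


def relu(x):
    return [[max(0.0, v) for v in ch] for ch in x]


def squeeze_excite(x, w1, w2):
    """SE on a C x T map; w1: (C/r) x C, w2: C x (C/r); biases omitted (assumption)."""
    z = [sum(ch) / len(ch) for ch in x]  # squeeze: average each channel over time
    h = [max(0.0, sum(w * v for w, v in zip(row, z))) for row in w1]  # bottleneck FC + ReLU
    gates = [1.0 / (1.0 + math.exp(-sum(w * v for w, v in zip(row, h)))) for row in w2]  # FC + sigmoid
    return [[v * g for v in ch] for ch, g in zip(x, gates)], gates  # rescale each channel


def se_dilated_res_block(x, p, dilation):
    """conv -> ReLU -> dilated conv -> SE on the residual branch, then add the identity path."""
    h = relu(conv1d(x, p["w_a"], p["b_a"]))
    h = conv1d(h, p["w_b"], p["b_b"], dilation=dilation)
    h, _ = squeeze_excite(h, p["se1"], p["se2"])
    return relu([[a + b for a, b in zip(cx, ch)] for cx, ch in zip(x, h)])


def rand_matrix(rows, cols, rng, scale=0.1):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def init_params(rng, channels=8, reduction=4, dilations=(1, 2, 4), n_classes=2, stride=3):
    """Random (untrained) weights; every size here is a hypothetical placeholder."""
    def conv(c_out, c_in):
        return [rand_matrix(c_in, 3, rng) for _ in range(c_out)]

    return {
        "stride": stride,
        "front": (conv(channels, 1), [0.0] * channels),
        "blocks": [
            (d, {"w_a": conv(channels, channels), "b_a": [0.0] * channels,
                 "w_b": conv(channels, channels), "b_b": [0.0] * channels,
                 "se1": rand_matrix(channels // reduction, channels, rng),
                 "se2": rand_matrix(channels, channels // reduction, rng)})
            for d in dilations
        ],
        "fc": (rand_matrix(n_classes, channels, rng), [0.0] * n_classes),
    }


def forward(waveform, params):
    """Raw samples -> strided 1-D conv front end -> SE dilated residual blocks -> class probabilities."""
    w_front, b_front = params["front"]
    x = relu(conv1d([waveform], w_front, b_front, stride=params["stride"]))
    for dilation, p in params["blocks"]:
        x = se_dilated_res_block(x, p, dilation)
    pooled = [sum(ch) / len(ch) for ch in x]  # global average pooling over time
    w_fc, b_fc = params["fc"]
    logits = [bo + sum(wi * v for wi, v in zip(row, pooled)) for row, bo in zip(w_fc, b_fc)]
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]  # numerically stable softmax
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    params = init_params(random.Random(0))
    wave = [math.sin(2 * math.pi * 200 * n / 16000) for n in range(1600)]  # 0.1 s synthetic tone
    print(forward(wave, params))  # two probabilities summing to 1 (weights are untrained)
```

Increasing the dilation from block to block widens the span of input samples each output depends on without adding parameters, which is the usual motivation for dilated convolutions on long raw waveforms. The SE gates, each in (0, 1), let a block emphasize or suppress whole channels according to their time-averaged activation.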