

Department of Electronic Engineering, Tsinghua University, Beijing 100084, China
Received: 13 January 2023
Revised: 19 July 2023
Published: 25 September 2024
CHEN Xu-chu, PU Yu, ZHANG Wei-qiang. Detection of Alzheimer's Disease Based on dVAE-BERT Model[J]. Acta Electronica Sinica, 2024, 52(09): 2971-2978. DOI:10.12263/DZXB.20230050
Alzheimer's disease (AD) is a neurodegenerative disease whose symptoms include aphasia and reduced speech fluency. Researchers have detected AD using articulatory features, paralinguistic features such as fluency and pauses, or features extracted from transcribed text. However, traditional acoustic-feature methods struggle to capture semantic information, while transcribing speech into text is time-consuming and laborious, and transcription quality degrades markedly because of the accents and illness of elderly speakers. This paper proposes a dVAE-BERT model: a discrete Variational Autoencoder (dVAE) converts speech into pseudo-phoneme sequences, and a BERT (Bidirectional Encoder Representations from Transformers) model then models the sequential relations among the pseudo-phonemes, extracting a language-level representation of the audio. On the ADReSSo (Alzheimer's Dementia Recognition through Spontaneous Speech only) dataset, the model reaches an accuracy of 70.42%, 5.63 percentage points above the baseline system; after fusion with the Wav2vec2.0 and HuBERT (Hidden-unit BERT) models, accuracy is 76.06% and 71.83%, respectively.
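The two-stage idea in the abstract (discretize speech into pseudo-phoneme tokens, then model the token sequence with BERT) can be illustrated with a toy sketch. This is not the authors' implementation: the per-frame logits, the special-token IDs, the function names, and the simplified masking rule are all assumptions made for illustration, and a real system would take the logits from a trained dVAE encoder and feed the tokens to a BERT model.

```python
import random

# Hypothetical special-token IDs; the abstract does not specify a vocabulary.
PAD, CLS, SEP, MASK = 0, 1, 2, 3
NUM_SPECIAL = 4
IGNORE = -100  # label for positions that are not predicted


def frames_to_pseudo_phonemes(frame_logits):
    """Map each frame to the index of its highest-scoring codebook entry."""
    return [max(range(len(row)), key=row.__getitem__) for row in frame_logits]


def build_bert_input(code_ids):
    """Shift code IDs past the special tokens and wrap them in [CLS] ... [SEP]."""
    return [CLS] + [c + NUM_SPECIAL for c in code_ids] + [SEP]


def mask_tokens(input_ids, mask_prob=0.15, rng=None):
    """Replace a random subset of non-special tokens with [MASK].

    Returns (masked_ids, labels); labels hold the original ID at masked
    positions and IGNORE elsewhere. BERT's 80/10/10 replacement rule is
    omitted for brevity.
    """
    rng = rng or random.Random(0)
    masked, labels = [], []
    for tok in input_ids:
        if tok >= NUM_SPECIAL and rng.random() < mask_prob:
            masked.append(MASK)
            labels.append(tok)
        else:
            masked.append(tok)
            labels.append(IGNORE)
    return masked, labels


if __name__ == "__main__":
    # Toy encoder output: 4 frames, codebook of 3 entries (made-up numbers).
    logits = [[0.1, 2.0, -1.0], [3.0, 0.0, 0.5], [0.2, 0.1, 0.9], [0.0, 1.5, 1.0]]
    codes = frames_to_pseudo_phonemes(logits)
    ids = build_bert_input(codes)
    print(codes, ids, mask_tokens(ids, mask_prob=0.5))
```

The point of the sketch is that, once speech is reduced to a discrete token sequence, the masked-prediction objective used for text can be applied without any transcription step, which is the motivation the abstract gives for avoiding ASR.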