Department of Electronic Engineering, Tsinghua University, Beijing 100084
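CHEN Xu-chu, male, born in December 1992 in Zhumadian, Henan Province. He is currently a master's student in the Department of Electronic Engineering, Tsinghua University. His main research interests are audio event detection and emotion recognition. E-mail: chen-xc20@mails.tsinghua.edu.cn
PU Yu, male, born in June 2001 in Chengdu, Sichuan Province. He is currently a master's student in the Department of Electronic Engineering, Tsinghua University. His main research interest is Alzheimer's disease detection. E-mail: puy19@mails.tsinghua.edu.cn
ZHANG Wei-qiang, male, born in January 1979 in Xiong County, Hebei Province. He received a bachelor's degree from the Department of Applied Physics, China University of Petroleum, in 2002, a master's degree from the Department of Electronic Engineering, Beijing Institute of Technology, in 2005, and a doctoral degree from the Department of Electronic Engineering, Tsinghua University, in 2009. He was a visiting scholar at Stanford University in 2017 and is currently an associate researcher in the Department of Electronic Engineering, Tsinghua University. His main research interest is speech and audio signal processing. E-mail: wqzhang@tsinghua.edu.cn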
Received: 2023-01-13
Revised: 2023-07-19
Print publication: 2024-09-25
CHEN Xu-chu, PU Yu, ZHANG Wei-qiang. Detection of Alzheimer's Disease Based on dVAE-BERT Model[J]. Acta Electronica Sinica, 2024, 52(09): 2971-2978. DOI:10.12263/DZXB.20230050
Alzheimer's disease (AD) is a neurodegenerative disease whose symptoms include aphasia and reduced speech fluency. Researchers have detected AD using articulatory features, paralinguistic features such as fluency and pauses, or features extracted from transcribed text. However, traditional acoustic-feature methods have difficulty capturing semantic information, while transcribing speech into text is time-consuming and laborious, and transcription quality degrades markedly because of the accents and illness of elderly speakers. This paper proposes a dVAE-BERT model: a discrete Variational Autoencoder (dVAE) converts speech into pseudo-phoneme sequences, and a Bidirectional Encoder Representations from Transformers (BERT) model then models the connections within those sequences, extracting a representation of the audio in the linguistic dimension. On the ADReSSo (Alzheimer's Dementia Recognition through Spontaneous Speech only) dataset, the model reaches an accuracy of 70.42%, 5.63 percentage points higher than the baseline system; after fusion with the Wav2vec2.0 and HuBERT (Hidden-unit BERT) models, the accuracy is 76.06% and 71.83%, respectively.
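The two ideas named in the abstract, discretizing speech into pseudo-phoneme tokens and fusing the resulting classifier with other models, can be illustrated with a toy sketch. Everything below is an assumption made for illustration, not the authors' implementation: the abstract does not say how the dVAE assigns tokens or how fusion is done, so the sketch uses a per-frame argmax over codebook logits (a common inference-time choice for discrete autoencoders) and a weighted average of per-model AD probabilities. The function names `to_pseudo_phonemes`, `fuse_scores`, and `predict_label`, as well as all numeric values, are hypothetical.

```python
from typing import List, Optional, Sequence


def to_pseudo_phonemes(frame_logits: Sequence[Sequence[float]]) -> List[int]:
    """Hard-assign each frame to its highest-scoring codebook entry.

    Assumption: the encoder emits one row of codebook logits per frame.
    """
    return [max(range(len(row)), key=lambda k: row[k]) for row in frame_logits]


def fuse_scores(scores: Sequence[float],
                weights: Optional[Sequence[float]] = None) -> float:
    """Weighted average of per-model AD probabilities (equal weights by default).

    Assumption: score-level fusion; the source does not specify the fusion scheme.
    """
    if weights is None:
        weights = [1.0] * len(scores)
    if len(weights) != len(scores) or sum(weights) <= 0:
        raise ValueError("weights must match scores and sum to a positive value")
    return sum(w * s for w, s in zip(weights, scores)) / sum(weights)


def predict_label(score: float, threshold: float = 0.5) -> int:
    """Return 1 (AD) if the fused score reaches the threshold, else 0 (control)."""
    return 1 if score >= threshold else 0


if __name__ == "__main__":
    # Toy logits: 3 frames, codebook of size 3 (made-up values).
    logits = [[0.1, 2.0, -1.0], [0.3, 0.2, 0.9], [1.5, 0.0, 0.0]]
    print(to_pseudo_phonemes(logits))
    # Made-up probabilities from two hypothetical models.
    print(predict_label(fuse_scores([0.8, 0.4])))
```

In a full system the token sequence would be fed to a BERT-style classifier rather than printed; that step is omitted here because its configuration is not given in the abstract.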