Detection of Alzheimer's Disease Based on dVAE-BERT Model

CHEN Xu-chu; PU Yu; ZHANG Wei-qiang

doi:10.12263/DZXB.20230050


Research article | Updated: 2025-12-08

- Journal: Acta Electronica Sinica, 2024, Vol. 52, No. 9, pp. 2971-2978
- Affiliation: Department of Electronic Engineering, Tsinghua University, Beijing 100084, China
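- About the authors:
  CHEN Xu-chu (male) was born in December 1992 in Zhumadian, Henan Province. He is currently a master's student in the Department of Electronic Engineering, Tsinghua University. His main research interests are audio event detection and emotion recognition. E-mail: chen-xc20@mails.tsinghua.edu.cn
  PU Yu (male) was born in June 2001 in Chengdu, Sichuan Province. He is currently a master's student in the Department of Electronic Engineering, Tsinghua University. His main research interest is Alzheimer's disease detection. E-mail: puy19@mails.tsinghua.edu.cn
  ZHANG Wei-qiang (male) was born in January 1979 in Xiong County, Hebei Province. He received a bachelor's degree from the Department of Applied Physics, China University of Petroleum, in 2002, a master's degree from the Department of Electronic Engineering, Beijing Institute of Technology, in 2005, and a doctoral degree from the Department of Electronic Engineering, Tsinghua University, in 2009. He was a visiting scholar at Stanford University in 2017 and is currently an associate researcher in the Department of Electronic Engineering, Tsinghua University. His main research interest is speech and audio signal processing. E-mail: wqzhang@tsinghua.edu.cn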
- Funding: National Natural Science Foundation of China (62276153)
- DOI: 10.12263/DZXB.20230050
  Chinese Library Classification number: TP391.5
- Received: 2023-01-13; revised: 2023-07-19; published in print: 2024-09-25
CHEN Xu-chu, PU Yu, ZHANG Wei-qiang. Detection of Alzheimer's Disease Based on dVAE-BERT Model[J]. Acta Electronica Sinica, 2024, 52(09): 2971-2978. DOI: 10.12263/DZXB.20230050

Abstract

Alzheimer's disease (AD) is a neurodegenerative disease whose symptoms include aphasia and reduced speech fluency. Researchers have detected AD using articulatory features, paralinguistic features such as fluency and pauses, or features extracted from transcribed text. However, traditional acoustic-feature methods struggle to capture semantic information, while transcribing speech into text is time-consuming and laborious, and transcription quality degrades markedly because of the accents and illness of elderly speakers. This paper proposes a dVAE-BERT model: a discrete variational autoencoder (dVAE) converts speech into pseudo-phoneme sequences, and a BERT (Bidirectional Encoder Representations from Transformers) model then models the relations between adjacent pseudo-phonemes, extracting a language-level representation of the audio. On the ADReSSo (Alzheimer's Dementia Recognition through Spontaneous Speech only) dataset, the model reaches an accuracy of 70.42%, which is 5.63 percentage points higher than the baseline system. After fusion with the Wav2vec2.0 and HuBERT (Hidden-unit BERT) models, the accuracy is 76.06% and 71.83%, respectively.
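The two stages named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that, at inference time, a discrete encoder emits one vector of codebook logits per audio frame and that the pseudo-phoneme is the index of the largest logit. It also uses a simplified BERT-style masking step in which every selected token is replaced by a mask symbol. The function names, the codebook size of 3, the toy logits, and the masking probability are all hypothetical.

```python
import random

MASK = "[MASK]"  # hypothetical mask symbol, as in BERT-style pre-training


def frames_to_pseudo_phonemes(frame_logits):
    """Map each frame's codebook logits to the index of its largest logit.

    Assumption: hard argmax assignment at inference time; the paper's dVAE
    may quantize differently.
    """
    return [max(range(len(logits)), key=logits.__getitem__)
            for logits in frame_logits]


def mask_tokens(tokens, mask_prob=0.15, seed=0):
    """Replace a random subset of tokens with MASK.

    Returns (inputs, targets); targets[i] is the original token where
    position i was masked and None elsewhere. This is a simplification:
    the original BERT recipe also keeps or randomizes some selected tokens.
    """
    rng = random.Random(seed)
    inputs, targets = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            inputs.append(MASK)
            targets.append(tok)
        else:
            inputs.append(tok)
            targets.append(None)
    return inputs, targets


# Toy example: 4 frames scored against a hypothetical 3-entry codebook.
toy_logits = [
    [0.1, 2.0, -1.0],
    [0.3, 1.5, 0.2],
    [3.0, 0.0, 0.5],
    [-0.2, 0.1, 0.9],
]
tokens = frames_to_pseudo_phonemes(toy_logits)  # [1, 1, 0, 2]
inputs, targets = mask_tokens(tokens, mask_prob=0.5, seed=0)
print(tokens, inputs, targets)
```

In a full system, the masked sequence would be fed to a Transformer encoder trained to recover the masked pseudo-phonemes, and the resulting sequence representation would be passed to a classifier. Those parts are omitted here.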


