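1. School of Computing and Artificial Intelligence, Southwest Jiaotong University, Chengdu, Sichuan 611756, China
2. Engineering Research Center of Sustainable Urban Intelligent Transportation, Ministry of Education, Chengdu, Sichuan 611756, China
3. National Engineering Laboratory of Integrated Transportation Big Data Application Technology, Chengdu, Sichuan 611756, China
4. Manufacturing Industry Chains Collaboration and Information Support Technology Key Laboratory of Sichuan Province, Chengdu, Sichuan 611756, China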
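FAN Lin, female, PhD candidate. Her main research interests are computer vision and medical image analysis. E-mail: linfan@my.swjtu.edu.cn
GONG Xun, male, PhD, professor and doctoral supervisor. His main research interests are computer vision, artificial intelligence, and medical image analysis. E-mail: xgong@home.swjtu.edu.cn
ZHENG Cen-yang, male, master's student. His main research interests are computer vision and medical image analysis. E-mail: Z_C_Y@my.swjtu.edu.cn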
Received: 2023-12-05
Revised: 2024-04-20
Published in print: 2024-07-25
FAN Lin, GONG Xun, ZHENG Cen-yang. A Multi-Modal Medical Image Analysis Algorithm Based on Text Guidance[J]. Acta Electronica Sinica, 2024, 52(07): 2341-2355. DOI:10.12263/DZXB.20231135
Combining gastroscopic ultrasound with white-light endoscopy allows gastrointestinal stromal tumors (GISTs) to be identified more accurately. However, existing multi-modal methods often focus solely on image features and overlook the semantic information contained in diagnostic text, which is important for the precise understanding and diagnosis of medical images. To address this issue, we propose a novel text-guided multi-modal medical image analysis framework (TMM-Net). TMM-Net uses multi-stage diagnostic text to guide model learning so that the key diagnostic features are extracted from the images, and then promotes interaction among the multi-modal features through a cross-modal attention mechanism. Notably, TMM-Net simulates the clinical diagnostic process by predicting lesion attributes, which enhances interpretability. Validation experiments were conducted on a two-center dataset containing 10,025 modality data pairs. The results show that the proposed method improves accuracy by 7.7% over the current state-of-the-art GISTs diagnostic method and achieves the highest area under the curve (AUC) value, 0.927; its interpretability may also better suit clinical needs.
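The cross-modal attention step mentioned in the abstract can be illustrated with a minimal sketch of generic scaled dot-product attention, in which tokens from one modality query tokens from the other. This is not the authors' implementation: the function names, the token counts, the feature width, and the random projection matrices below are all hypothetical and chosen only for illustration.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_modal_attention(query_feats, context_feats, w_q, w_k, w_v):
    """Let tokens of one modality attend to tokens of another.

    query_feats:   (n_q, d) features of the querying modality
    context_feats: (n_c, d) features of the modality being attended to
    w_q, w_k, w_v: (d, d) projection matrices (learned in a real model)
    """
    q = matmul(query_feats, w_q)
    k = matmul(context_feats, w_k)
    v = matmul(context_feats, w_v)
    scale = math.sqrt(len(q[0]))
    k_t = [list(col) for col in zip(*k)]           # transpose of k: (d, n_c)
    scores = matmul(q, k_t)                        # (n_q, n_c) similarities
    weights = [softmax([s / scale for s in row]) for row in scores]
    return matmul(weights, v), weights             # fused features, attention map


# Hypothetical inputs: 4 ultrasound tokens and 6 white-light tokens, width 8.
rng = random.Random(0)


def rand_matrix(rows, cols):
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


d = 8
ultrasound_tokens = rand_matrix(4, d)
white_light_tokens = rand_matrix(6, d)
w_q, w_k, w_v = rand_matrix(d, d), rand_matrix(d, d), rand_matrix(d, d)

fused, weights = cross_modal_attention(
    ultrasound_tokens, white_light_tokens, w_q, w_k, w_v
)
```

Each row of `weights` sums to 1 and shows how strongly one ultrasound token draws on each white-light token; swapping the two inputs would give attention in the opposite direction.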