MGRW-Transformer: Multi-Granularity Random Walk Transformer Model for Interpretable Learning

GENG Yu; DING Wei-ping; HUANG Jia-shuang; JU Heng-rong; SUN Ying; WANG Hai-peng

doi:10.12263/DZXB.20221181

Updated: 2024-09-09

- MGRW-Transformer: Multi-Granularity Random Walk Transformer Model for Interpretable Learning
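- Acta Electronica Sinica, 2024, pages 1-15
- Author affiliation:
  
  School of Information Science and Technology, Nantong University, Nantong, Jiangsu 226019, China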
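- Author biographies:
  
  GENG Yu, male, born in 1998 in Yangzhou, Jiangsu. Master's student at the School of Information Science and Technology, Nantong University. His main research interests are granular computing and deep learning.
  DING Wei-ping, male, born in 1979 in Jintan, Jiangsu. Ph.D., professor, doctoral supervisor. He received his Ph.D. in engineering from Nanjing University of Aeronautics and Astronautics in 2013. His main research interests are artificial intelligence and computational intelligence, multimodal machine learning and optimization, deep neural networks, and granular computing and its applications to uncertain big data. E-mail: dwp9988@163.com
  HUANG Jia-shuang, male, born in 1988 in Nantong, Jiangsu. Ph.D., lecturer. He received his master's degree from Nanjing Tech University in 2015 and his Ph.D. from Nanjing University of Aeronautics and Astronautics in 2020. His main research interests are brain network analysis and deep learning.
  JU Heng-rong, male, born in 1989 in Taizhou, Jiangsu. Ph.D., associate professor. He received his master's degree from Jiangsu University of Science and Technology in 2015 and his Ph.D. from Nanjing University in 2019. His main research interests are granular computing, rough sets, machine learning, and knowledge discovery.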
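- Funding:
  
  National Natural Science Foundation of China (61976120; 62006128; 62102199); Natural Science Foundation of Jiangsu Province (BK20191445); Jiangsu Province Double-Innovation Doctor Program (No. (2020)30986); Major Project of Natural Science Research of Jiangsu Higher Education Institutions (21KJA510004); Basic Science Research Project of the Nantong Science and Technology Bureau (JC2021122)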
- DOI: 10.12263/DZXB.20221181
  Chinese Library Classification (CLC) number: TP18
- Received: 2022-10-20,
  
  Revised: 2023-03-13,
  
  Published online: 2024-09-09

GENG Yu, DING Wei-ping, HUANG Jia-shuang, et al. MGRW-Transformer: Multi-Granularity Random Walk Transformer Model for Interpretable Learning[J/OL]. ACTA ELECTRONICA SINICA, 2024, 1-15.

Abstract (translated from the Chinese)

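Deep learning models are applied to image recognition tasks by virtue of their feature learning ability, but because their working mechanism lacks a semantic interpretation, they have difficulty recognizing complex medical images. The self-attention mechanism of the Vision Transformer model is interpretable. However, lesion regions in medical images often vary in position and size, which makes it difficult for deep learning models that rely solely on the self-attention module to provide an effective semantic interpretation. To this end, this paper proposes an interpretable Transformer model based on multi-granularity random walk (Multi-Granularity Random Walk Transformer Model for Interpretable Learning, MGRW-Transformer) to find the regions that are important to the recognition task. Specifically, the image is first divided into multiple sub-image blocks, which are fed into the multi-head attention layer of the Vision Transformer to output an attention matrix. The image blocks are then used as nodes to construct an undirected graph, and the attention-guided nodes serve as starting points for a coarse-grained random walk. Next, each coarse information granule is divided into finer image blocks for a fine-grained random walk. Finally, the optimal sets of coarse and fine information granules are selected according to information importance, then reduced and fused. In this way, a visual semantic interpretation of the input image is obtained. The MGRW-Transformer model is validated on two types of datasets, natural images and medical images: on the ImageNet-segmentation dataset, pixel accuracy improves by 8.09% and mIoU by 13.82% over existing methods, and reasonable semantic interpretations are obtained on the medical image dataset.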
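The coarse-grained stage described above (image blocks as graph nodes, an attention-guided start node, a random walk over the graph) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify how edges or transition probabilities are defined, so the 4-neighbour grid graph, the attention-weighted transition rule, the restart probability, and the function names `grid_adjacency` and `random_walk_scores` are all assumptions made for illustration. The fine-grained stage would plausibly repeat a similar walk inside each selected coarse block and is omitted here.

```python
import numpy as np

def grid_adjacency(h, w):
    """4-neighbour undirected graph over an h x w grid of image patches.

    Assumption: the paper's actual graph construction is not given in the abstract.
    """
    n = h * w
    adj = np.zeros((n, n))
    for r in range(h):
        for c in range(w):
            i = r * w + c
            if c + 1 < w:                      # edge to the right neighbour
                adj[i, i + 1] = adj[i + 1, i] = 1.0
            if r + 1 < h:                      # edge to the bottom neighbour
                adj[i, i + w] = adj[i + w, i] = 1.0
    return adj

def random_walk_scores(attn, h, w, restart=0.15, iters=200):
    """Random walk with restart, started from the highest-attention patch.

    attn: length h*w vector of non-negative per-patch attention scores
          (hypothetically, one row of an attention matrix). Requires h*w > 1.
    Returns the visiting distribution over patches, which sums to 1.
    """
    attn = np.asarray(attn, dtype=float)
    adj = grid_adjacency(h, w)
    # Assumption: a step to a neighbour is weighted by that neighbour's attention.
    weights = adj * (attn[None, :] + 1e-8)
    trans = weights / weights.sum(axis=1, keepdims=True)   # row-stochastic matrix
    start = np.zeros(h * w)
    start[np.argmax(attn)] = 1.0                           # attention-guided start node
    p = start.copy()
    for _ in range(iters):
        p = (1 - restart) * p @ trans + restart * start
    return p

# Usage on a hypothetical 4 x 4 patch grid with made-up attention scores.
attn = np.random.default_rng(0).random(16)
scores = random_walk_scores(attn, 4, 4)
top_coarse = np.argsort(scores)[::-1][:3]   # indices of the three highest-scoring patches
```

Patches with a high visiting probability would then be candidates for the coarse information granules; how the paper ranks, reduces, and fuses them is not stated in the abstract.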

Abstract

Deep learning models are applied to image recognition tasks thanks to their feature learning ability, but they struggle to recognize complex medical images because their working mechanism lacks a semantic interpretation. The Vision Transformer model, with its self-attention mechanism, offers great interpretability. However, medical images often contain lesions of variable size in different locations, which makes it difficult for a deep learning model that relies on a self-attention module alone to reach correct and explainable conclusions. We propose a multi-granularity random walk Transformer model (MGRW-Transformer), guided by an attention mechanism, to find the regions that influence the recognition task. Our method divides the image into multiple sub-image blocks and passes them to the Vision Transformer module for classification. The image blocks are used as nodes to construct an undirected graph, and the attention node serves as the starting node that guides a coarse-grained random walk. We then divide the coarse blocks into finer ones to manage the computational cost, and combine the results based on the importance of the discovered features. As a result, the model offers a semantic interpretation of the input image together with a visualization of that interpretation. The MGRW-Transformer model is verified on natural image and medical image datasets: on the ImageNet-segmentation dataset, pixel accuracy and mIoU improve by 8.09% and 13.82%, respectively, and reasonable semantic interpretations are obtained on the medical image dataset.
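For readers unfamiliar with the two reported metrics, the sketch below computes pixel accuracy and mIoU using their standard definitions. It assumes the explanation heatmap has already been thresholded into a binary foreground/background mask and that mIoU is averaged over those two classes; the function names and the toy masks are illustrative and do not come from the paper.

```python
import numpy as np

def pixel_accuracy(pred, target):
    """Fraction of pixels whose predicted label equals the ground-truth label."""
    pred = np.asarray(pred)
    target = np.asarray(target)
    return float((pred == target).mean())

def mean_iou(pred, target, num_classes=2):
    """Mean intersection-over-union across classes (here: background and foreground)."""
    pred = np.asarray(pred).astype(int)
    target = np.asarray(target).astype(int)
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, target == c).sum()
        union = np.logical_or(pred == c, target == c).sum()
        if union > 0:                 # skip classes absent from both masks
            ious.append(inter / union)
    return float(np.mean(ious))

# Toy 2 x 2 example: one foreground pixel is over-predicted.
pred_mask = np.array([[1, 1], [0, 0]])
true_mask = np.array([[1, 0], [0, 0]])
acc = pixel_accuracy(pred_mask, true_mask)   # 3 of 4 pixels agree
miou = mean_iou(pred_mask, true_mask)        # mean of 2/3 (background) and 1/2 (foreground)
```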
