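1. School of Artificial Intelligence and Advanced Computing, Hunan University of Technology and Business, Changsha, Hunan 410205
2. Xiangjiang Laboratory, Changsha, Hunan 410205
3. School of Computer Science, Hunan University of Technology and Business, Changsha, Hunan 410205
4. School of Intelligent Engineering and Intelligent Manufacturing, Hunan University of Technology and Business, Changsha, Hunan 410205
5. Changsha Social Laboratory of Artificial Intelligence, Hunan University of Technology and Business, Changsha, Hunan 410205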
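ZHOU Xin-min, male, was born in Shaoyang, Hunan Province, in May 1977. He received his Ph.D. in Computer Application Technology from Tongji University in May 2010 and completed postdoctoral research in Management Science and Engineering at the National University of Defense Technology in June 2014. He is currently vice dean, professor, and master's supervisor at the School of Artificial Intelligence and Advanced Computing, Hunan University of Technology and Business. His main research interests are new-type smart cities, business intelligence, and big data. E-mail: zhouxinmin2699@163.com
XIONG Zhi-mou, male, was born in Xinyang, Henan Province, in March 1995. He is currently pursuing an academic master's degree in Software Engineering at the School of Computer Science, Hunan University of Technology and Business. His main research interests are medical image processing and analysis, and computer vision. E-mail: Xzhimou@163.com
SHI Chang-fa, male, was born in Zhuzhou, Hunan Province, in February 1985. He received his Ph.D. in Mechatronic Engineering from Harbin Institute of Technology in 2016. He is currently vice dean, associate professor, and master's supervisor at the School of Intelligent Engineering and Intelligent Manufacturing, Hunan University of Technology and Business. His main research interests are medical image processing and analysis, and deep learning. E-mail: ivanhanks@yeah.net
YANG Jian, male, was born in Yiyang, Hunan Province, in July 2000. He is currently pursuing a master's degree in Electronic Information at the School of Computer Science, Hunan University of Technology and Business. His main research interests are smart healthcare and computer vision. E-mail: 1468554194@qq.com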
Received: 2023-11-15
Revised: 2024-01-28
Published in print: 2024-09-25
ZHOU Xin-min, XIONG Zhi-mou, SHI Chang-fa, et al. Medical Image Segmentation Based on Multi‑Scale Convolution Modulation[J]. Acta Electronica Sinica, 2024, 52(09): 3159-3171. DOI:10.12263/DZXB.20231068
A growing number of medical image segmentation models adopt the Transformer as their backbone. However, the computational complexity of the Transformer is quadratic in the length of the input sequence, and it requires pre-training on large amounts of data to achieve good results, so its advantages cannot be realized when data are scarce. In addition, the Transformer often fails to extract local image information effectively. Convolutional neural networks (CNNs), by contrast, avoid both problems. To combine the respective strengths of CNNs and Transformers and to further exploit the potential of CNNs, this paper proposes the Multi-Scale Convolution Modulation Network (MSCMNet), which brings the architectural design methods of vision Transformers into a conventional convolutional network. Using convolution modulation and a multi-scale feature extraction strategy, a feature extraction module based on Multi-Scale Convolution Modulation (MSCM) is constructed. Efficient patch combination and patch decomposition strategies are also proposed for downsampling and upsampling feature maps, respectively, which further improves the model's representational capacity. On four medical image segmentation datasets of different types and scales (abdominal multi-organ, heart, skin cancer, and cell nucleus), MSCMNet obtains mDice scores of 0.8057, 0.9233, 0.9239, and 0.8548, respectively. It achieves the best segmentation performance with relatively low computational cost and parameter count, offering a novel and efficient model design paradigm for CNNs and Transformers in medical image segmentation.
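For readers unfamiliar with the mDice metric reported in the abstract, the following is a minimal sketch of how a mean Dice score over label maps is commonly computed. The function names `dice_per_class` and `mean_dice`, the exclusion of the background label 0, and the choice to skip classes absent from both masks are assumptions made for illustration; the abstract does not state the exact evaluation protocol used by the authors.

```python
import numpy as np


def dice_per_class(pred, target, num_classes):
    """Dice score for each foreground label 1..num_classes-1.

    Assumption: label 0 is background and is excluded; a class that is
    absent from both prediction and ground truth is skipped.
    """
    pred = np.asarray(pred)
    target = np.asarray(target)
    scores = []
    for c in range(1, num_classes):
        p = pred == c
        t = target == c
        denom = p.sum() + t.sum()
        if denom == 0:
            continue  # class not present in either mask
        inter = np.logical_and(p, t).sum()
        scores.append(2.0 * inter / denom)  # Dice = 2|P∩T| / (|P| + |T|)
    return scores


def mean_dice(pred, target, num_classes):
    """Average of the per-class Dice scores (NaN if no class is present)."""
    scores = dice_per_class(pred, target, num_classes)
    return float(np.mean(scores)) if scores else float("nan")


if __name__ == "__main__":
    # Toy 2x3 label maps with two foreground classes (1 and 2).
    pred = [[0, 1, 1], [2, 2, 0]]
    target = [[0, 1, 0], [2, 2, 2]]
    print(mean_dice(pred, target, num_classes=3))
```

In the toy example, class 1 scores 2/3 and class 2 scores 4/5, so the mean is 11/15. Other conventions (for example, averaging per case before averaging per class) would give different numbers on real datasets.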