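A Medical Image Segmentation Network Based on Cross-Visual State Space and Multi-Branch Interactive Attention

XUE Wei; CHEN Chuang-hui; DU Ming-yang; ZHONG Ping; ZHENG Xiao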

doi:10.12263/DZXB.20250642

Research article | Updated: 2025-12-27

- A Medical Image Segmentation Network Based on Cross-Visual State Space and Multi-Branch Interactive Attention
- Acta Electronica Sinica, 2025, Vol. 53, No. 9, pp. 3331-3344
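- Author affiliations:
  
  1. School of Computer Science and Technology, Anhui University of Technology, Ma'anshan, Anhui 243032
  2. College of Electronic Countermeasures, National University of Defense Technology, Hefei, Anhui 230037
  3. College of Electronic Science, National University of Defense Technology, Changsha, Hunan 410073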
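- About the authors:
  
  XUE Wei, male, was born in November 1986 in Nantong, Jiangsu Province. He is currently vice dean, associate professor, and doctoral supervisor at the School of Computer Science and Technology, Anhui University of Technology. His main research interests are machine learning, computer vision, and data mining. Chinese Institute of Electronics member No. E190188441M. E-mail: xuewei@ahut.edu.cn
  CHEN Chuang-hui, female, was born in September 2000 in Maoming, Guangdong Province. She is currently a master's student at the School of Computer Science and Technology, Anhui University of Technology. Her main research interest is medical image segmentation. E-mail: chenchuanghui16@foxmail.com
  DU Ming-yang, male, was born in July 1994 in Bengbu, Anhui Province. He is currently a lecturer at the College of Electronic Countermeasures, National University of Defense Technology. His main research interest is intelligent radar perception and countermeasures. Chinese Institute of Electronics member No. E190087642M. E-mail: dumingyang17@nudt.edu.cn
  ZHONG Ping, male, was born in June 1979 in Neijiang, Sichuan Province. He is currently a research professor and doctoral supervisor at the College of Electronic Science, National University of Defense Technology. His main research interests are computer vision, machine learning, and pattern recognition. E-mail: zhongping@nudt.edu.cn
  ZHENG Xiao, male, was born in November 1975 in Putian, Fujian Province. He is currently vice president, professor, and doctoral supervisor at Anhui University of Technology. His main research interests are the industrial Internet, crowdsensing networks, and data privacy protection. E-mail: xzheng@ahut.edu.cn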
- Funding:
  
  National Natural Science Foundation of China (62441207); Ma'anshan Science and Technology Innovation Key Program (2024RGZN001)
- CLC number: TP391.4
- Received: 2025-07-23
  
  Accepted: 2025-09-22
  
  Published in print: 2025-09-25

XUE Wei, CHEN Chuang-hui, DU Ming-yang, et al. A Medical Image Segmentation Network Based on Cross-Visual State Space and Multi-Branch Interactive Attention[J]. Acta Electronica Sinica, 2025, 53(09): 3331-3344. DOI: 10.12263/DZXB.20250642

Abstract

Medical image segmentation is a key technology in smart healthcare. It aims to accurately identify and segment organs or pathological regions in images, thereby providing reliable quantitative evidence for clinical diagnosis and treatment decision-making. In recent years, medical image segmentation methods based on convolutional neural networks (CNNs) have been widely adopted because of their strong local feature extraction. However, because of the inherent local receptive field of convolution operations, CNNs remain limited in modeling long-range spatial dependencies and global contextual information. Transformer-based methods model global features through the self-attention mechanism, but their computational complexity grows quadratically with sequence length, which limits their efficiency in practical applications. To address these issues, this paper proposes a new medical image segmentation network built around two core modules: cross-vision state space (C-VSS) and multi-branch interactive attention (MBIA). The C-VSS module combines the local perception of convolution with the long-sequence modeling capability of the state space model. Through a dual-branch collaborative strategy, it extracts and fuses local and global features while maintaining linear computational complexity. The MBIA module strengthens the representation of multi-scale contextual information through a multi-branch architecture and establishes bidirectional information interaction pathways between the encoder and the decoder to enable dynamic fusion of cross-level features, thereby improving the model's ability to perceive complex structures. Experiments on four public medical image segmentation datasets (CVC-ColonDB, ISIC2017, ISIC2018, and COVID-19) show that the proposed method outperforms the second-best approach by approximately 0.94, 0.83, 1.04, and 2.28 percentage points in intersection over union (IoU) and by approximately 0.63, 0.50, 1.56, and 1.51 percentage points in Dice similarity coefficient (DSC), respectively. In addition, the proposed method achieves average (Avg) scores of 91.51%, 91.74%, 91.30%, and 88.78% on the four datasets, respectively, all higher than those of the compared methods. Ablation studies show that removing the C-VSS module alone decreases IoU by 3.62, 2.15, 1.69, and 2.13 percentage points and DSC by 2.25, 1.29, 1.02, and 1.40 percentage points, respectively. Removing the MBIA module alone decreases IoU by 10.11, 0.50, 1.08, and 1.97 percentage points and DSC by 6.54, 0.30, 0.65, and 1.30 percentage points, respectively. These results verify the effectiveness of the C-VSS and MBIA modules, indicate that the MBIA module contributes more to the performance improvement, and show that the two modules work together to further improve model performance.
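The two headline metrics reported above, IoU and DSC, can be illustrated with a minimal sketch for a single pair of binary masks. This is not the authors' evaluation code: the function name `iou_and_dsc`, the smoothing term `eps`, and the toy masks are assumptions made for illustration, and the paper may aggregate scores over a dataset differently.

```python
import numpy as np


def iou_and_dsc(pred, target, eps=1e-7):
    """Return (IoU, DSC) for two binary masks of the same shape.

    eps is a hypothetical smoothing term that avoids division by zero
    when both masks are empty; it is not taken from the paper.
    """
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)

    intersection = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    total = pred.sum() + target.sum()

    iou = (intersection + eps) / (union + eps)    # |A ∩ B| / |A ∪ B|
    dsc = (2 * intersection + eps) / (total + eps)  # 2|A ∩ B| / (|A| + |B|)
    return float(iou), float(dsc)


# Toy example: 2 overlapping pixels, 4 pixels in the union, 3 + 3 in total.
pred = np.array([[1, 1, 0],
                 [0, 1, 0]])
target = np.array([[1, 0, 0],
                   [0, 1, 1]])
iou, dsc = iou_and_dsc(pred, target)
print(f"IoU = {iou:.4f}, DSC = {dsc:.4f}")
```

For a single mask pair the two metrics are linked by DSC = 2·IoU / (1 + IoU), so DSC is never lower than IoU. A gain of one percentage point in IoU therefore does not correspond to the same gain in DSC, which is consistent with the two sets of improvements in the abstract differing in size.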

