

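State Key Laboratory of Ocean Dynamics-Physical Environment and Intelligent Perception, Ocean University of China; Faculty of Information Science and Engineering, Ocean University of China, Qingdao, Shandong 266100, China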
Received: 18 December 2025
Accepted: 16 January 2026
Published: 25 January 2026
SHANG Yihan, DONG Xinghui. DGD-SAM: A Dynamically-Guided SAM for Underwater Image Instance Segmentation[J]. Acta Electronica Sinica, 2026, 54(01): 368-380. DOI:10.12263/DZXB.20251002
With the growing demand for deep-sea exploration and marine resource exploitation, underwater vision technologies have become a critical enabler for applications such as robotic operations and marine biological monitoring. Among various vision tasks, underwater image instance segmentation (UIIS) is particularly challenging, as it requires both precise object localization and pixel-level mask prediction. In recent years, vision foundation models, in particular the Segment Anything Model (SAM), have demonstrated remarkable zero-shot generalization capabilities in generic scenes. However, their performance remains unsatisfactory in complex underwater environments. Severe light absorption and scattering in underwater environments lead to significant image degradation, including color distortion, extremely low contrast, and blurred boundaries, which substantially hinders effective feature extraction. Moreover, the segmentation performance of SAM relies heavily on manually provided explicit prompts (e.g., points, boxes, and masks). This dependency not only increases manual effort but also limits its applicability in unattended or complex underwater scenarios. To address these challenges, we propose a dynamically-guided SAM (DGD-SAM). By introducing a dynamic guidance mechanism and combining feature aggregation with a multi-scale feature enhancement module, DGD-SAM establishes a complete pipeline for automatic prompt generation and refined segmentation. First, to mitigate the feature distribution discrepancy between detection and segmentation tasks, an adaptive feature aggregator (AFA) is designed. This module re-models inter-channel dependencies through a channel attention mechanism, achieving task alignment across both spatial and channel dimensions and enhancing the model’s sensitivity to weak underwater targets. Second, considering the large variation in underwater target scales and the complexity of background interference, a multi-scale feature enhancement module is constructed. By building a cross-resolution feature pyramid, this module improves the model’s ability to capture targets of various scales in complex scenes. Finally, in the decoding stage, a dynamically-guided decoder (DGD) is proposed, which first fuses the initial segmentation mask with image features to generate dynamic guidance information, and then performs refined mask prediction through bidirectional attention interactions between the prompts and image features. Experimental results demonstrate that DGD-SAM outperforms state-of-the-art methods on four public underwater data sets (LIACI, USIS10K, UIIS, and UIIS10K) as well as two terrestrial scene data sets (COME15K-E and COME15K-H). These results indicate that the proposed method not only performs well in underwater environments but also maintains stable and competitive segmentation performance in terrestrial scenes, suggesting that the model does not overly rely on scene-specific characteristics and exhibits good generalizability and scalability.
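The abstract states only that the AFA re-models channel dependencies through channel attention and gives no further detail. As a rough illustration of what channel attention generally means, the following is a minimal pure-Python sketch of a generic squeeze-and-excitation style gate. It is an assumption for illustration, not the authors' AFA; all function names, the weight shapes, and the random toy input are hypothetical.

```python
import math
import random


def global_average_pool(feature_map):
    """Collapse a C x H x W feature map (nested lists) to one mean value per channel."""
    return [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_map
    ]


def linear(vector, weights):
    """Multiply a vector by a weight matrix given as a list of output rows."""
    return [sum(w * v for w, v in zip(row, vector)) for row in weights]


def channel_attention(feature_map, w_reduce, w_expand):
    """Scale each channel by a gate in (0, 1) derived from global channel statistics.

    Generic squeeze-and-excitation pattern (assumed here, not taken from the paper):
    squeeze (average pool) -> reduce + ReLU -> expand + sigmoid -> rescale channels.
    """
    squeezed = global_average_pool(feature_map)
    hidden = [max(0.0, h) for h in linear(squeezed, w_reduce)]  # ReLU
    gates = [1.0 / (1.0 + math.exp(-z)) for z in linear(hidden, w_expand)]  # sigmoid
    reweighted = [
        [[gate * value for value in row] for row in channel]
        for gate, channel in zip(gates, feature_map)
    ]
    return reweighted, gates


def random_matrix(rows, cols, rng):
    """Hypothetical stand-in for learned weights."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


# Toy input: 4 channels of 3 x 3 features, bottleneck of 2 hidden units.
rng = random.Random(0)
channels, height, width, reduced = 4, 3, 3, 2
features = [
    [[rng.random() for _ in range(width)] for _ in range(height)]
    for _ in range(channels)
]
w_reduce = random_matrix(reduced, channels, rng)
w_expand = random_matrix(channels, reduced, rng)
output, gates = channel_attention(features, w_reduce, w_expand)
```

Here `gates` holds one weight per channel and `output` has the same shape as `features`. In a trained network the two weight matrices would be learned, so that channels carrying useful evidence (for example, for faint underwater targets) receive larger gates.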