DGD-SAM: A Dynamically-Guided SAM for Underwater Image Instance Segmentation

SHANG Yihan; DONG Xinghui

doi:10.12263/DZXB.20251002


PAPERS | Updated: 2026-06-04

- ACTA ELECTRONICA SINICA, Vol. 54, Issue 1, Pages: 368-380 (2026)
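- Affiliations: State Key Laboratory of Ocean Dynamics-Physical Environment and Intelligent Sensing, Ocean University of China; Faculty of Information Science and Engineering, Ocean University of China, Qingdao, Shandong 266100, China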
- Funding: National Natural Science Foundation of China (NSFC) (42576200); Key Research and Development Program of Shandong Province, China (2024ZLGX06)
- DOI: 10.12263/DZXB.20251002
- CLC: TP391
- Received: 18 December 2025; Accepted: 16 January 2026; Published: 25 January 2026

SHANG Yihan, DONG Xinghui. DGD-SAM: A Dynamically-Guided SAM for Underwater Image Instance Segmentation[J]. Acta Electronica Sinica, 2026, 54(01): 368-380. DOI: 10.12263/DZXB.20251002


Abstract

With the growing demand for deep-sea exploration and marine resource exploitation, underwater vision technologies have become a critical enabler for applications such as robotic operations and marine biological monitoring. Among vision tasks, underwater image instance segmentation (UIIS) is particularly challenging, as it requires both precise object localization and pixel-level mask prediction. In recent years, vision foundation models, in particular the segment anything model (SAM), have demonstrated remarkable zero-shot generalization in generic scenes. However, their performance remains unsatisfactory in complex underwater environments. Severe light absorption and scattering underwater cause significant image degradation, including color distortion, extremely low contrast, and blurred boundaries, which substantially hinder feature extraction. Moreover, the segmentation performance of SAM relies heavily on manually provided explicit prompts (e.g., points, boxes, and masks). This dependency not only increases annotation costs but also limits SAM's applicability in unattended or complex underwater scenarios. To address these challenges, we propose a dynamically-guided SAM (DGD-SAM). By introducing a dynamic guidance mechanism and combining feature aggregation with a multi-scale feature enhancement module, DGD-SAM establishes a complete pipeline for automatic prompt generation and refined segmentation. First, to mitigate the discrepancy between the feature distributions of the detection and segmentation tasks, we design an adaptive feature aggregator (AFA). This module re-models inter-channel dependencies through a channel attention mechanism, aligning the two tasks along both the spatial and channel dimensions and effectively increasing the model's sensitivity to weak underwater targets. Second, given the large variation in underwater target scales and the complexity of background interference, we construct a multi-scale feature enhancement module. By building a feature pyramid across spatial resolutions, this module significantly improves the model's ability to capture targets of various scales in complex scenes. Finally, in the decoding stage, we propose a dynamically-guided decoder (DGD), which first fuses the initial segmentation mask with image features to generate dynamic guidance information and then performs refined mask prediction through bidirectional attention between the prompts and the image features. Experimental results show that DGD-SAM outperforms state-of-the-art methods on all four public underwater data sets, namely LIACI, USIS10K, UIIS, and UIIS10K, as well as on both terrestrial scene data sets, COME15K-E and COME15K-H. These results indicate that the proposed method not only performs well in underwater environments but also maintains stable and competitive segmentation performance in terrestrial scenes, suggesting that the model does not overly rely on scene-specific characteristics and exhibits good generalization ability and scalability.
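The abstract states only that the AFA re-models inter-channel dependencies with a channel attention mechanism. To illustrate what such a mechanism typically computes, the sketch below implements squeeze-and-excitation-style channel attention in plain Python: global average pooling per channel, a two-layer bottleneck with ReLU and sigmoid, and per-channel rescaling. This is one common form of channel attention, not the paper's AFA; the function names, the nested-list tensor layout, and the weight arguments `w1`, `b1`, `w2`, `b2` are hypothetical, and the AFA's spatial alignment component is not modeled.

```python
import math
from typing import List

Matrix = List[List[float]]      # one H x W feature channel
FeatureMap = List[Matrix]       # C channels, each H x W


def sigmoid(x: float) -> float:
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def affine(w: Matrix, v: List[float], b: List[float]) -> List[float]:
    # Computes w @ v + b for a small dense layer.
    return [sum(wij * vj for wij, vj in zip(row, v)) + bi for row, bi in zip(w, b)]


def channel_attention(x: FeatureMap, w1: Matrix, b1: List[float],
                      w2: Matrix, b2: List[float]) -> FeatureMap:
    """Squeeze-and-excitation-style channel reweighting (hypothetical stand-in for AFA).

    w1: (C/r) x C reduction weights; w2: C x (C/r) expansion weights.
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    pooled = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in x]
    # Excitation: a bottleneck MLP models dependencies between channels.
    hidden = [max(0.0, h) for h in affine(w1, pooled, b1)]
    weights = [sigmoid(z) for z in affine(w2, hidden, b2)]
    # Rescale: every value in channel c is multiplied by its channel weight.
    return [[[weights[c] * v for v in row] for row in ch] for c, ch in enumerate(x)]
```

With all weights and biases set to zero, every channel weight is sigmoid(0) = 0.5, so the output is exactly half the input; a positive entry in `b2` for one channel raises that channel's weight relative to the others, which is how such a block can emphasize channels that respond to weak targets.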
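The multi-scale feature enhancement module is described only as building a feature pyramid across spatial resolutions. The sketch below shows the general idea under simple assumptions: a single-channel map is repeatedly downsampled by 2 x 2 average pooling, each level is upsampled back with nearest-neighbour interpolation, and the levels are averaged into one fused map. The helper names and the mean fusion rule are illustrative choices only; the abstract does not specify the paper's pooling, upsampling, or fusion operators.

```python
from typing import List

Matrix = List[List[float]]  # H x W single-channel feature map


def avg_pool2(x: Matrix) -> Matrix:
    # 2 x 2 average pooling with stride 2 (H and W must be even).
    return [[(x[2 * i][2 * j] + x[2 * i][2 * j + 1]
              + x[2 * i + 1][2 * j] + x[2 * i + 1][2 * j + 1]) / 4.0
             for j in range(len(x[0]) // 2)]
            for i in range(len(x) // 2)]


def upsample_nearest(x: Matrix, factor: int) -> Matrix:
    # Nearest-neighbour upsampling by an integer factor.
    return [[x[i // factor][j // factor] for j in range(len(x[0]) * factor)]
            for i in range(len(x) * factor)]


def build_pyramid(x: Matrix, levels: int) -> List[Matrix]:
    # Level k has resolution (H / 2**k) x (W / 2**k);
    # H and W must be divisible by 2**(levels - 1).
    pyramid = [x]
    for _ in range(levels - 1):
        pyramid.append(avg_pool2(pyramid[-1]))
    return pyramid


def fuse_pyramid(pyramid: List[Matrix]) -> Matrix:
    # Bring every level back to full resolution and average them.
    h, w = len(pyramid[0]), len(pyramid[0][0])
    fused = [[0.0] * w for _ in range(h)]
    for k, level in enumerate(pyramid):
        up = upsample_nearest(level, 2 ** k)
        for i in range(h):
            for j in range(w):
                fused[i][j] += up[i][j] / len(pyramid)
    return fused
```

Coarse levels summarize large regions (useful for big or low-contrast objects), while the finest level preserves detail for small targets; averaging is the simplest way to combine them, and a trained module would normally learn the combination instead.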
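The dynamically-guided decoder is summarized in two steps: fuse the initial mask with image features into guidance information, then refine the mask through bidirectional attention between prompt tokens and image features. The sketch below illustrates one plausible reading under explicit assumptions: the guidance is a hypothetical mask-weighted average of pixel features (masked average pooling), and the bidirectional step follows the token-to-image then image-to-token cross-attention pattern of SAM's two-way transformer mask decoder, with learned projections, self-attention, MLPs, and multiple heads omitted. All function names are hypothetical and this is not the paper's DGD implementation.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def sigmoid(x: float) -> float:
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries: List[Vector], keys: List[Vector], values: List[Vector]) -> List[Vector]:
    # Single-head scaled dot-product attention without learned projections.
    scale = 1.0 / math.sqrt(len(keys[0]))
    out = []
    for q in queries:
        w = softmax([dot(q, k) * scale for k in keys])
        out.append([sum(wi * v[d] for wi, v in zip(w, values)) for d in range(len(values[0]))])
    return out


def mask_guidance(pixels: List[Vector], mask_probs: List[float]) -> Vector:
    # Hypothetical guidance token: mask-weighted average of pixel features.
    total = sum(mask_probs)
    if total == 0:
        return [0.0] * len(pixels[0])
    return [sum(p * f[d] for p, f in zip(mask_probs, pixels)) / total
            for d in range(len(pixels[0]))]


def add(a: List[Vector], b: List[Vector]) -> List[Vector]:
    # Element-wise residual connection.
    return [[x + y for x, y in zip(u, v)] for u, v in zip(a, b)]


def two_way_step(tokens: List[Vector], pixels: List[Vector]):
    # Token-to-image: prompt/guidance tokens gather evidence from the pixels.
    tokens = add(tokens, attend(tokens, pixels, pixels))
    # Image-to-token: pixel features are updated from the refined tokens.
    pixels = add(pixels, attend(pixels, tokens, tokens))
    return tokens, pixels


def refine_mask(pixels: List[Vector], initial_mask: List[float],
                prompt_tokens: List[Vector]) -> List[float]:
    guidance = mask_guidance(pixels, initial_mask)
    tokens, pixels = two_way_step(prompt_tokens + [guidance], pixels)
    g = tokens[-1]  # refined guidance token
    # Per-pixel foreground probability from token-pixel similarity.
    return [sigmoid(dot(p, g)) for p in pixels]
```

Because the guidance token starts as the average of the features under the initial mask, pixels that resemble those features receive higher refined probabilities; in a trained decoder, learned projections and MLPs would replace the raw dot products used here.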


References

Zhou Dingfu, Fang Jin, Song Xibin, et al. Joint 3D instance segmentation and object detection for autonomous driving[C]//2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway: IEEE, 2020: 1836-1846. DOI: 10.1109/cvpr42600.2020.00191.

Zhou Sihang, Nie D, Adeli E, et al. Semantic instance segmentation with discriminative deep supervision for medical images[J]. Medical Image Analysis, 2022, 82: 102626. DOI: 10.1016/j.media.2022.102626.

Chen Keyan, Liu Chenyang, Chen Hao, et al. RSPrompter: Learning to prompt for remote sensing instance segmentation based on visual foundation model[J]. IEEE Transactions on Geoscience and Remote Sensing, 2024, 62: 4701117. DOI: 10.1109/tgrs.2024.3356074.

Su Hao, Wei Shunjun, Liu Shan, et al. HQ-ISNet: High-quality instance segmentation for remote sensing imagery[J]. Remote Sensing, 2020, 12(6): 989. DOI: 10.3390/rs12060989.

Niu Yuzhen, Zhang Lingxin, Lan Jie, et al. FD-GAN: Frequency-decomposed generative adversarial network for unpaired underwater image enhancement[J]. Acta Electronica Sinica, 2025, 53(2): 527-544. (in Chinese)

Lian Shijie, Li Hua, Cong Runmin, et al. WaterMask: Instance segmentation for underwater imagery[C]//2023 IEEE/CVF International Conference on Computer Vision (ICCV). Piscataway: IEEE, 2023: 1305-1315. DOI: 10.1109/iccv51070.2023.00126.

Lian Shijie, Zhang Ziyi, Li Hua, et al. Diving into underwater: Segment anything model guided underwater salient instance segmentation and a large-scale dataset[PP/OL]. V1. arXiv (2024-06-10)[2025-12-19]. https://arxiv.org/abs/2406.06039.

Kirillov A, Mintun E, Ravi N, et al. Segment anything[C]//2023 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2023: 3992-4003. DOI: 10.1109/iccv51070.2023.00371.

Zhang Xin, Liu Yu, Lin Yuming, et al. UV-SAM: Adapting segment anything model for urban village identification[J]. Proceedings of the AAAI Conference on Artificial Intelligence, 2024, 38(20): 22520-22528. DOI: 10.1609/aaai.v38i20.30260.

Wu Junde, Wang Ziyue, Hong Mingxuan, et al. Medical SAM adapter: Adapting segment anything model for medical image segmentation[J]. Medical Image Analysis, 2025, 102: 103547. DOI: 10.1016/j.media.2025.103547.

Huang Jiaxing, Jiang Kai, Zhang Jingyi, et al. Learning to prompt segment anything models[PP/OL]. V1. arXiv (2024-01-09)[2025-12-18]. https://doi.org/10.48550/arXiv.2401.04651.

Chen Tianrun, Zhu Lanyun, Ding Chaotao, et al. SAM fails to segment anything? SAM-adapter: Adapting SAM in underperformed scenes: Camouflage, shadow, medical image segmentation, and more[PP/OL]. V3. arXiv (2023-05-02)[2025-12-18]. https://doi.org/10.48550/arXiv.2304.09148.

Hu E J, Shen Yelong, Wallis P, et al. LoRA: Low-rank adaptation of large language models[PP/OL]. V2. arXiv (2021-10-16)[2025-12-18]. https://arxiv.org/abs/2106.09685.

Liang Xinyu, Lin Xikun, Quan Jichuan, et al. Research on the progress of image instance segmentation based on deep learning[J]. Acta Electronica Sinica, 2020, 48(12): 2476-2486. DOI: 10.3969/j.issn.0372-2112.2020.12.025. (in Chinese)

He Kaiming, Gkioxari G, Dollár P, et al. Mask R-CNN[C]//2017 IEEE International Conference on Computer Vision. Piscataway: IEEE, 2017: 2980-2988. DOI: 10.1109/iccv.2017.322.

Cai Zhaowei, Vasconcelos N. Cascade R-CNN: High quality object detection and instance segmentation[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021, 43(5): 1483-1498. DOI: 10.1109/tpami.2019.2956516.

Chen Kai, Pang Jiangmiao, Wang Jiaqi, et al. Hybrid task cascade for instance segmentation[C]//2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2019: 4969-4978. DOI: 10.1109/cvpr.2019.00511.

Ren Shaoqing, He Kaiming, Girshick R, et al. Faster R-CNN: Towards real-time object detection with region proposal networks[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(6): 1137-1149. DOI: 10.1109/tpami.2016.2577031.

Bolya D, Zhou Chong, Xiao Fanyi, et al. YOLACT: Real-time instance segmentation[C]//2019 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2019: 9156-9165. DOI: 10.1109/iccv.2019.00925.

Chen Hao, Sun Kunyang, Tian Zhi, et al. BlendMask: Top-down meets bottom-up for instance segmentation[C]//2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2020: 8570-8578. DOI: 10.1109/cvpr42600.2020.00860.

Tian Zhi, Shen Chunhua, Chen Hao. Conditional convolutions for instance segmentation[C]//Computer Vision - ECCV 2020. Cham: Springer, 2020: 282-298. DOI: 10.1007/978-3-030-58452-8_17.

Wang Xinlong, Zhang Rufeng, Kong Tao, et al. SOLOv2: Dynamic and fast instance segmentation[J]. Advances in Neural Information Processing Systems, 2020, 33: 17721-17732.

Cheng Bowen, Schwing A G, Kirillov A. Per-pixel classification is not all you need for semantic segmentation[J]. Advances in Neural Information Processing Systems, 2021, 34: 17864-17875.

Cheng Bowen, Misra I, Schwing A G, et al. Masked-attention mask transformer for universal image segmentation[C]//2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2022: 1280-1289. DOI: 10.1109/cvpr52688.2022.00135.

Li Feng, Zhang Hao, Xu Huaizhe, et al. Mask DINO: Towards a unified transformer-based framework for object detection and segmentation[C]//2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2023: 3041-3050. DOI: 10.1109/cvpr52729.2023.00297.

Zhang Wenwei, Pang Jiangmiao, Chen Kai, et al. K-Net: Towards unified image segmentation[J]. Advances in Neural Information Processing Systems, 2021, 34: 10326-10338.

Dosovitskiy A, Beyer L, Kolesnikov A, et al. An image is worth 16x16 words: Transformers for image recognition at scale[PP/OL]. V2. arXiv (2021-06-03)[2025-12-18]. https://doi.org/10.48550/arXiv.2010.11929.

Li K, Rajpurkar P. Adapting segment anything models to medical imaging via fine-tuning without domain pretraining[C/OL]//AAAI 2024 Spring Symposium on Clinical Foundation Models. OpenReview, 2024. https://openreview.net/forum?id=Fxi7pRmnYJ.

Jia Menglin, Tang Luming, Chen Bochun, et al. Visual prompt tuning[C]//Computer Vision - ECCV 2022. Cham: Springer, 2022: 709-727. DOI: 10.1007/978-3-031-19827-4_41.

Radford A, Kim J W, Hallacy C, et al. Learning transferable visual models from natural language supervision[C]//International Conference on Machine Learning. PMLR, 2021: 8748-8763. DOI: 10.48550/arXiv.2103.00020.

Waszak M, Cardaillac A, Elvesæter B, et al. Semantic segmentation in underwater ship inspections: Benchmark and data set[J]. IEEE Journal of Oceanic Engineering, 2023, 48(2): 462-473. DOI: 10.1109/joe.2022.3219129.

Li Hua, Lian Shijie, Li Zhiyuan, et al. UWSAM: Segment anything model guided underwater instance segmentation and a large-scale benchmark dataset[PP/OL]. V1. arXiv (2025-05-21)[2025-12-18]. https://arxiv.org/html/2505.15581v1. DOI: 10.2139/ssrn.5197295.

Zhang Jing, Fan Dengping, Dai Yuchao, et al. RGB-D saliency detection via cascaded mutual information minimization[C]//2021 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2021: 4318-4327. DOI: 10.1109/iccv48922.2021.00430.

Lin T Y, Maire M, Belongie S, et al. Microsoft COCO: Common objects in context[M]//Computer Vision - ECCV 2014. Cham: Springer International Publishing, 2014: 740-755. DOI: 10.1007/978-3-319-10602-1_48.

Chen Kai, Wang Jiaqi, Pang Jiangmiao, et al. MMDetection: Open MMLab detection toolbox and benchmark[PP/OL]. V1. arXiv (2019-06-17)[2025-12-18]. https://doi.org/10.48550/arXiv.1906.07155.

Fang Yuxin, Yang Shusheng, Wang Xinggang, et al. Instances as queries[C]//2021 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2021: 6890-6899. DOI: 10.1109/iccv48922.2021.00683.

He Junjie, Li Pengyu, Geng Yifeng, et al. FastInst: A simple query-based model for real-time instance segmentation[C]//2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2023: 23663-23672. DOI: 10.1109/cvpr52729.2023.02266.

Tian Zhi, Shen Chunhua, Wang Xinlong, et al. BoxInst: High-performance instance segmentation with box annotations[C]//2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2021: 5439-5448. DOI: 10.1109/cvpr46437.2021.00540.

Xiong Yunyang, Varadarajan B, Wu Lemeng, et al. EfficientSAM: Leveraged masked image pretraining for efficient segment anything[C]//2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2024: 16111-16121. DOI: 10.1109/cvpr52733.2024.01525.

Wu Yuhuan, Liu Yun, Zhang Le, et al. Regularized densely-connected pyramid network for salient instance segmentation[J]. IEEE Transactions on Image Processing, 2021, 30: 3897-3907. DOI: 10.1109/tip.2021.3065822.

Pei Jialun, Cheng Tianyang, Tang He, et al. Transformer-based efficient salient instance segmentation networks with orientative query[J]. IEEE Transactions on Multimedia, 2023, 25: 1964-1978. DOI: 10.1109/tmm.2022.3141891.

Fan Ruochen, Cheng Mingming, Hou Qibin, et al. S4Net: Single stage salient-instance segmentation[C]//2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2019: 6096-6105.

Kirillov A, Wu Yuxin, He Kaiming, et al. PointRend: Image segmentation as rendering[C]//2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2020: 9796-9805. DOI: 10.1109/cvpr42600.2020.00982.
