RGB-D Salient Object Detection Based on Cross-Modal Fusion and Boundary Deformable Convolution Guidance

MENG Ling-bing; YUAN Meng-ya; SHI Xue-han; ZHANG Le; WU Jin-hua; CHENG Fei

doi:10.12263/DZXB.20230042


Research article | Updated: 2025-12-08

- Original title (Chinese): 跨模态融合和边界可变形卷积引导的RGB-D显著性目标检测
- English title: RGB-D Salient Object Detection Based on Cross-Modal Fusion and Boundary Deformable Convolution Guidance
- Journal: Acta Electronica Sinica (电子学报), 2023, Vol. 51, No. 11, pp. 3155-3166
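- Author affiliations:
  1. School of Computer and Software Engineering, Anhui Institute of Information Technology, Wuhu, Anhui 241000
  2. School of Electrical and Electronic Engineering, Anhui Institute of Information Technology, Wuhu, Anhui 241000
  3. School of Management, Hangzhou Dianzi University, Hangzhou, Zhejiang 310000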
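- Author biographies:
  MENG Ling-bing, male, born in November 1994 in Huoqiu, Anhui. He is a teaching assistant at the School of Computer and Software Engineering, Anhui Institute of Information Technology. His main research interest is computer vision (salient object detection, medical image segmentation, etc.). E-mail: lbmeng@iflytek.com
  YUAN Meng-ya, female, born in April 2003 in Hefei, Anhui. She is an undergraduate student at Anhui Institute of Information Technology. Her main research interests are computer vision and sensor networks. E-mail: 1464616739@qq.com
  SHI Xue-han, female, born in November 1986 in Fuyang, Anhui. She is a senior engineer at the School of Computer and Software Engineering, Anhui Institute of Information Technology. Her main research interests are computer vision, information security, and software testing. E-mail: xhshi3@iflytek.com
  ZHANG Le, female, born in October 1997 in Chizhou, Anhui. She is a teaching assistant at the School of Computer and Software Engineering, Anhui Institute of Information Technology. Her main research interests are intelligent perception and object recognition. E-mail: 1ezhang7@iflytek.com
  WU Jin-hua, male, born in December 1991 in Zongyang, Anhui. He is a lecturer at the School of Computer and Software Engineering, Anhui Institute of Information Technology. His main research interest is pattern recognition. E-mail: jhwu3@iflytek.com
  CHENG Fei (corresponding author), female, born in July 1968 in Huangshan, Anhui. She is an associate professor at the School of Big Data and Artificial Intelligence, Anhui Institute of Information Technology. Her main research interest is intelligent control.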
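- Funding:
  Anhui Provincial Natural Science Foundation (2008085MF201); Key Natural Science Project of the Anhui Provincial Department of Education (2022AH051894; 2022AH051887); Anhui Provincial Outstanding Young Talents Support Program for Universities (gxyq2022147)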
- DOI: 10.12263/DZXB.20230042
  CLC number: TP751
- Received: 2023-01-10; revised: 2023-08-28; published in print: 2023-11-25
- Citation: MENG Ling-bing, YUAN Meng-ya, SHI Xue-han, et al. RGB-D Salient Object Detection Based on Cross-Modal Fusion and Boundary Deformable Convolution Guidance[J]. Acta Electronica Sinica, 2023, 51(11): 3155-3166. DOI: 10.12263/DZXB.20230042.

Abstract

RGB-Depth (RGB-D) salient object detection (SOD) is a meaningful and challenging task. Existing methods based on convolutional neural networks achieve good detection performance in simple scenes, but cannot effectively handle cluttered backgrounds, low-quality depth maps, and complex object contours. To address these problems, this paper proposes an RGB-D SOD method based on cross-modal fusion and boundary deformable convolution guidance. First, the Swin Transformer is used as the feature extractor for the RGB and depth modalities separately, and a cross-modal attention enhancement feature (CMAEF) module fuses the features of the two modalities to mine the common and complementary features of salient objects. Then, the proposed adjacent multi-scale feature enhancement (AMFE) module is embedded in the deep layers of the encoder to obtain rich global contextual information and locate salient objects more accurately. Next, a boundary feature extraction decoder (U-Net architecture) is constructed to generate boundary cue maps of salient objects, and the cross-modal fusion features are reused to ensure the completeness of the generated boundaries. Finally, a boundary deformable convolution guidance (BDCG) module is designed, which uses the boundary cue maps together with deformable convolution to guide the decoding of the cross-modal fusion features and obtain more accurate saliency maps. In comparisons with 25 mainstream methods on six public benchmark datasets, the proposed model shows clear improvements on multiple metrics, which demonstrates the effectiveness of the proposed method.
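The abstract names deformable convolution as the operation behind the BDCG module but does not say how the module derives its sampling offsets from the boundary cue maps. As a rough illustration only, the following pure-Python sketch shows standard single-channel 3x3 deformable convolution: each kernel tap samples the feature map at its regular grid position plus a per-tap offset, using bilinear interpolation. The function names `bilinear_sample` and `deformable_conv2d` are hypothetical, the offsets are set by hand rather than predicted by a network, and none of this is the authors' implementation.

```python
import math


def bilinear_sample(feature, y, x):
    """Sample a 2D feature map at fractional (y, x); positions outside the map count as 0."""
    height, width = len(feature), len(feature[0])
    y0, x0 = math.floor(y), math.floor(x)
    value = 0.0
    for yi in (y0, y0 + 1):
        for xi in (x0, x0 + 1):
            if 0 <= yi < height and 0 <= xi < width:
                # Bilinear weight falls off linearly with distance to the neighbour.
                weight = (1.0 - abs(y - yi)) * (1.0 - abs(x - xi))
                value += weight * feature[yi][xi]
    return value


def deformable_conv2d(feature, kernel, offsets):
    """Single-channel 3x3 deformable convolution, stride 1, zero padding 1.

    offsets[i][j] holds nine (dy, dx) pairs, one per kernel tap, that are
    added to the regular sampling grid at output position (i, j).
    """
    height, width = len(feature), len(feature[0])
    output = [[0.0] * width for _ in range(height)]
    for i in range(height):
        for j in range(width):
            acc = 0.0
            for k in range(9):
                ky, kx = divmod(k, 3)  # tap row and column inside the 3x3 kernel
                dy, dx = offsets[i][j][k]
                acc += kernel[ky][kx] * bilinear_sample(
                    feature, i + ky - 1 + dy, j + kx - 1 + dx
                )
            output[i][j] = acc
    return output


# Toy 4x4 feature map and an identity kernel (only the centre tap is non-zero).
feature = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
kernel = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]

# Zero offsets reduce the operation to an ordinary convolution.
zero_offsets = [[[(0.0, 0.0)] * 9 for _ in range(4)] for _ in range(4)]
# Hand-set offsets (an assumption for illustration): shift every tap half a pixel to the right.
shift_offsets = [[[(0.0, 0.5)] * 9 for _ in range(4)] for _ in range(4)]

plain = deformable_conv2d(feature, kernel, zero_offsets)
shifted = deformable_conv2d(feature, kernel, shift_offsets)
```

With zero offsets the output equals the input, since the identity kernel just copies the centre pixel. With the half-pixel shift each output becomes the average of a pixel and its right-hand neighbour, which shows how offsets move the sampling locations. In a trained network the offsets would be predicted per position.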


