Self-Supervised Monocular Depth Estimation for Traffic Scenes Based on Dual Attention Mechanism and Adaptive Cost Volume

WU Gang; LIU Wei; HU Jun; CHENG Shuai; YANG Wenxing; SUN Lingkui

doi:10.12263/DZXB.20220710

Research article | Updated: 2025-12-11

- Self-Supervised Monocular Depth Estimation for Traffic Scenes Based on Dual Attention Mechanism and Adaptive Cost Volume
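- Acta Electronica Sinica, 2024, Vol. 52, No. 5, pp. 1670-1678
- Author affiliations:
  
  1. School of Information Science and Engineering, Northeastern University, Shenyang, Liaoning 110167, China
  2. Neusoft Reach Automotive Technology Co., Ltd., Shenyang, Liaoning 110179, China
  3. School of Computer Science and Engineering, Northeastern University, Shenyang, Liaoning 110167, China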
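- Author biographies:
  
  WU Gang, male, was born in Linfen, Shanxi Province, in February 1997. He is a student at the School of Information Science and Engineering, Northeastern University. His main research interests include depth estimation and object detection. E-mail: 914766938@qq.com
  LIU Wei, male, was born in Shenyang, Liaoning Province, in June 1975. He works at the School of Computer Science and Engineering, Northeastern University. His main research interests include computer vision, deep learning, multi-sensor fusion, and path planning and control. E-mail: lwei@neusoft.com
  HU Jun, male, was born in Chuzhou, Anhui Province, in December 1985. He is a student at the School of Computer Science and Engineering, Northeastern University. His main research interests include computer vision and autonomous driving. E-mail: hu.jun@reachauto.com
  CHENG Shuai, male, was born in Hulunbuir, Inner Mongolia, in August 1987. He works in the autonomous driving business line of Neusoft Reach Automotive Technology Co., Ltd. His main research interests include computer vision and deep learning. E-mail: cheng.shuai@reachauto.com
  YANG Wenxing, male, was born in Hulunbuir, Inner Mongolia, in November 1992. He is a part-time Ph.D. candidate at the School of Computer Science and Engineering, Northeastern University, and works in the autonomous driving business line of Neusoft Reach Automotive Technology Co., Ltd. His research interests are visual perception and prediction for autonomous driving. E-mail: yang.wx@reachauto.com
  SUN Lingkui, male, was born in Jining, Shandong Province, in December 1998. He is a student at the School of Information Science and Engineering, Northeastern University. His main research interest is perception algorithms for autonomous driving. E-mail: sunlingkui@163.com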
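- Funding:
  
  Liaoning Province "Xingliao Talent Program" Project (XLYC1902029); Liaoning Province "Jiebang Guashuai" (open competition) Major Science and Technology Special Project (2022JH1/10400030); National Natural Science Foundation of China (U22A2043)
- DOI: 10.12263/DZXB.20220710
  CLC number: TP391.4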
- Received: 2022-06-20; Revised: 2023-02-06; Published in print: 2024-05-25

WU Gang, LIU Wei, HU Jun, et al. Self-Supervised Monocular Depth Estimation for Traffic Scenes Based on Dual Attention Mechanism and Adaptive Cost Volume[J]. Acta Electronica Sinica, 2024, 52(05): 1670-1678.

Abstract

Self-supervised monocular depth estimation in traffic scenes currently suffers from weak feature representation, blurred local detail in the depth map, and low estimation accuracy. To address these problems, this paper proposes a self-supervised monocular depth estimation method based on a dual attention mechanism and an adaptive cost volume. First, a feature extraction network with a dual attention mechanism combines channel attention and spatial attention to adaptively weight the extracted scene features, which strengthens feature representation. Second, a cost volume is constructed adaptively from the extracted global features. It guides the network to learn fine-grained depth features, improves the model's ability to learn local detail in the depth map, and thereby addresses the low depth estimation accuracy of existing methods. Experimental results on the public autonomous-driving datasets KITTI and Cityscapes show that the proposed method outperforms current mainstream methods.
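
The adaptive feature weighting described in the abstract can be illustrated with a minimal sketch. This page does not give the network's actual layers, so everything below is an assumption for illustration: the channel branch follows the common squeeze-and-excitation pattern (global average pooling, a small two-layer mapping, a sigmoid gate), the spatial branch uses a per-pixel gate computed from the channel-wise mean and maximum, and the two are applied one after the other. The function names and the weight matrices `w1` and `w2` are hypothetical.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def channel_attention(feat, w1, w2):
    """Reweight each channel of feat (C x H x W nested lists).

    Assumed squeeze-and-excitation style gate:
    w1 has shape (hidden, C), w2 has shape (C, hidden).
    """
    # Squeeze: one global average per channel.
    pooled = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in feat]
    # Excitation: linear -> ReLU -> linear -> sigmoid.
    hidden = [max(0.0, sum(w * p for w, p in zip(row, pooled))) for row in w1]
    gates = [sigmoid(sum(w * h for w, h in zip(row, hidden))) for row in w2]
    return [[[v * g for v in row] for row in ch] for ch, g in zip(feat, gates)]


def spatial_attention(feat, w_avg=1.0, w_max=1.0):
    """Reweight each pixel using the channel-wise mean and max at that pixel.

    The two scalar weights stand in for the small convolution that
    spatial-attention modules usually apply (an assumption, not the paper's design).
    """
    c, h, w = len(feat), len(feat[0]), len(feat[0][0])
    out = [[[0.0] * w for _ in range(h)] for _ in range(c)]
    for i in range(h):
        for j in range(w):
            vals = [feat[k][i][j] for k in range(c)]
            gate = sigmoid(w_avg * sum(vals) / c + w_max * max(vals))
            for k in range(c):
                out[k][i][j] = feat[k][i][j] * gate
    return out


def dual_attention(feat, w1, w2):
    """Channel attention followed by spatial attention (assumed ordering)."""
    return spatial_attention(channel_attention(feat, w1, w2))


# Toy feature map: 2 channels, 2 x 2 pixels.
feat = [[[1.0, 2.0], [3.0, 4.0]], [[0.0, 0.0], [0.0, 0.0]]]
w1 = [[0.0, 0.0]]        # hidden size 1, all-zero weights
w2 = [[0.0], [0.0]]      # every channel gate becomes sigmoid(0) = 0.5
ch = channel_attention(feat, w1, w2)
out = dual_attention(feat, w1, w2)
```

With all-zero weights every channel gate is 0.5, so `ch` is the input halved; the spatial gate then scales each pixel by a further factor strictly between 0 and 1. In a trained network both sets of weights would be learned, which is what makes the weighting adaptive.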
