Unsupervised Monocular Depth Estimation Based on Scale Clue Enhancement

QU Yi; CHEN Ying

doi:10.12263/DZXB.20230767

PAPERS | Updated: 2025-12-08

- ACTA ELECTRONICA SINICA Vol. 52, Issue 9, Pages: 3217-3227 (2024)
- Author affiliation: Key Laboratory of Advanced Process Control for Light Industry (Ministry of Education), Jiangnan University, Wuxi, Jiangsu 214122, China
- Funding: National Natural Science Foundation of China (62173160)
- DOI: 10.12263/DZXB.20230767
- CLC: TP391
- Received: 08 August 2023; Revised: 11 June 2024; Published: 25 September 2024
QU Yi, CHEN Ying. Unsupervised Monocular Depth Estimation Based on Scale Clue Enhancement[J]. Acta Electronica Sinica, 2024, 52(09): 3217-3227. DOI: 10.12263/DZXB.20230767

Abstract

Because a single image can correspond to many depth maps, monocular depth estimation is inherently scale-ambiguous. To reduce this ambiguity in geometric modeling, this paper introduces a monocular multi-frame depth estimation method based on multi-view stereo (MVS) that constructs moving depth and mines scale clues, organically combining traditional monocular depth estimation with MVS depth estimation. On this basis, two channel attention modules are designed to improve, respectively, the network's perception of scene structure and its processing of local information, so that features at different scales are fused more fully and the predicted depth maps are more accurate and sharper. On the KITTI dataset, the average relative error and squared relative error improve by up to 4.7% and 8.0%, respectively, over the baseline network, and all error and accuracy metrics surpass those of other mainstream unsupervised monocular depth estimation methods.
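The two error metrics named in the abstract are commonly computed on KITTI as per-pixel means over pixels with valid ground truth. The sketch below assumes those usual definitions (the abstract does not spell them out) and assumes that "improved by" means a relative reduction in error against the baseline. The function names and all sample numbers are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence, Tuple


def _valid_pairs(pred: Sequence[float], gt: Sequence[float]) -> List[Tuple[float, float]]:
    """Keep only pixels whose ground-truth depth is positive (i.e. valid)."""
    pairs = [(p, g) for p, g in zip(pred, gt) if g > 0]
    if not pairs:
        raise ValueError("no pixels with valid ground-truth depth")
    return pairs


def abs_rel(pred: Sequence[float], gt: Sequence[float]) -> float:
    """Average relative error: mean of |pred - gt| / gt."""
    pairs = _valid_pairs(pred, gt)
    return sum(abs(p - g) / g for p, g in pairs) / len(pairs)


def sq_rel(pred: Sequence[float], gt: Sequence[float]) -> float:
    """Squared relative error: mean of (pred - gt)^2 / gt."""
    pairs = _valid_pairs(pred, gt)
    return sum((p - g) ** 2 / g for p, g in pairs) / len(pairs)


def relative_improvement(baseline_error: float, new_error: float) -> float:
    """Percentage by which new_error is lower than baseline_error (assumed meaning)."""
    return 100.0 * (baseline_error - new_error) / baseline_error


# Hypothetical depths in meters; the third pixel has no ground truth and is skipped.
gt_depth = [10.0, 20.0, 0.0, 40.0]
pred_depth = [11.0, 18.0, 5.0, 40.0]

print(f"abs_rel = {abs_rel(pred_depth, gt_depth):.4f}")
print(f"sq_rel  = {sq_rel(pred_depth, gt_depth):.4f}")
# Hypothetical baseline and new error values, chosen only to illustrate the formula.
print(f"improvement = {relative_improvement(0.1000, 0.0953):.1f}%")
```

Because both metrics divide by the ground-truth depth, an error of one meter at close range counts more than the same error far away.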
