Binocular Vision Depth Estimation Method Based on Geometric Prior Knowledge Constraints

ZHANG Zehui; WANG Yang; CHEN Boyang; ZHANG Haoxuan; XU Xiaobin; WU Fulong; CHENG Shenglong; SHAO Haibin; LI Hao

doi:10.12263/DZXB.20250504

PAPERS | Updated: 2026-06-04

- ACTA ELECTRONICA SINICA Vol. 54, Issue 1, Pages: 195-205(2026)
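- Author affiliations:
  1. Hangzhou Dianzi University, Hangzhou, Zhejiang 310018
  2. China Auto Information Technology (Tianjin) Co., Ltd., Tianjin 300399
  3. Shanghai Jiao Tong University, Shanghai 200240
  4. Ningxia Petrochemical Yinjun Safety Technology Consulting Co., Ltd. (宁夏石化银骏安全技术咨询有限公司), Yinchuan, Ningxia 750000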
- Funding:
  National Natural Science Foundation of China (52401376); Zhejiang Provincial Natural Science Foundation (LTGG24F030004); “Pioneer” and “Leading Goose” R&D Program of Zhejiang (2025C04005); Science and Technology Project of Quzhou (2024K154)
- DOI: 10.12263/DZXB.20250504
- CLC: TP391
- Received: 12 June 2025; Accepted: 12 January 2026; Published: 25 January 2026
ZHANG Zehui, WANG Yang, CHEN Boyang, et al. Binocular Vision Depth Estimation Method Based on Geometric Prior Knowledge Constraints[J]. Acta Electronica Sinica, 2026, 54(01): 195-205. DOI: 10.12263/DZXB.20250504

Abstract

In recent years, with the rapid development of autonomous driving, robot navigation, and 3D reconstruction, depth estimation has attracted widespread attention as a key means of perceiving the three-dimensional structure of the environment. However, although existing supervised depth estimation methods perform well on specific datasets, they generalize poorly and rely on large-scale, high-quality labeled data, which severely limits their application in real industrial scenarios. This study therefore proposes a binocular vision depth estimation method based on geometric prior knowledge constraints. First, residual convolution is combined with a context encoder to extract multi-scale features from the image data. Next, a feature pyramid structure captures matching information at different scales while retaining the edge structure details of the image. Then, multi-level gated recurrent units (GRUs) update the feature matching parameters using feature information at different scales, refining the disparity matching results to achieve binocular depth estimation. Notably, this paper constructs a hybrid loss function that combines supervised signals with physical priors. On top of the traditional supervised loss, the function introduces geometric constraints derived from the self-supervised learning paradigm as regularization terms, namely a left-right disparity consistency constraint and a disparity structure consistency constraint. The left-right consistency constraint forces the disparities predicted for the left and right views to satisfy their geometric correspondence, which strengthens the model's geometric understanding and mitigates mismatches in occluded regions. The structure consistency constraint guides the disparity map to remain smooth in texture-flat regions and sharp at object edges, improving the structural integrity and visual quality of the depth map. Together, these constraints enhance the generalization capability of the binocular depth estimation model. To verify the effectiveness of the proposed method, training and evaluation are conducted on public datasets including KITTI 2015 and Middlebury, and the SceneFlow dataset is used to test cross-dataset generalization. Experimental results show that introducing the geometric prior constraints consistently improves the baseline model: on the KITTI dataset, the end-point error (EPE) is reduced by 3% to 5% and the overall mismatch rate (D1-all) by 5% to 8%. Results on the Middlebury dataset further confirm the method's generalization and robustness across different scenarios. Ablation experiments verify the contribution of each module, and hyperparameter sensitivity experiments determine the optimal configuration of the loss function weights. In addition, transfer experiments demonstrate that the proposed geometric prior constraint mechanism is portable: it adapts to a variety of mainstream depth estimation network architectures and generally yields performance gains.
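To make the structure of the hybrid loss concrete, the following is a minimal NumPy sketch of a supervised term regularized by a left-right consistency term and an edge-aware smoothness term. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not give the exact formulas, so the L1 forms used here follow the common self-supervised stereo formulation, and the function names, the weights `w_lr` and `w_sm`, the positive-disparity convention (a left pixel at column x matches the right pixel at column x - d), and the use of a single-channel image are all hypothetical choices.

```python
import numpy as np


def warp_horizontal(src, disp):
    """Sample src at column (x - disp) in each row, with linear interpolation."""
    h, w = src.shape
    xs = np.arange(w, dtype=np.float64)[None, :] - disp
    xs = np.clip(xs, 0.0, w - 1.0)          # clamp samples to the image border
    x0 = np.floor(xs).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    frac = xs - x0
    rows = np.arange(h)[:, None]
    return (1.0 - frac) * src[rows, x0] + frac * src[rows, x1]


def left_right_consistency(disp_l, disp_r):
    """Mean |d_L(x, y) - d_R(x - d_L(x, y), y)| over the left view."""
    disp_r_in_left = warp_horizontal(disp_r, disp_l)
    return float(np.abs(disp_l - disp_r_in_left).mean())


def edge_aware_smoothness(disp, image):
    """Penalize disparity gradients, down-weighted where the image has edges."""
    dx_d = np.abs(np.diff(disp, axis=1))
    dy_d = np.abs(np.diff(disp, axis=0))
    dx_i = np.abs(np.diff(image, axis=1))
    dy_i = np.abs(np.diff(image, axis=0))
    return float((dx_d * np.exp(-dx_i)).mean() + (dy_d * np.exp(-dy_i)).mean())


def hybrid_loss(disp_l, disp_r, gt_l, image_l, w_lr=0.1, w_sm=0.1):
    """Supervised L1 loss plus the two geometric regularizers.

    w_lr and w_sm are placeholder weights (assumption); pixels with
    gt_l <= 0 are treated as unlabeled, and at least one labeled pixel
    is assumed to exist.
    """
    valid = gt_l > 0
    l_sup = np.abs(disp_l - gt_l)[valid].mean()
    l_lr = left_right_consistency(disp_l, disp_r)
    l_sm = edge_aware_smoothness(disp_l, image_l)
    return float(l_sup + w_lr * l_lr + w_sm * l_sm)
```

In this sketch the smoothness weight `exp(-|image gradient|)` is what lets a disparity step stay sharp at an object edge while being penalized in texture-flat regions, which is the behavior the abstract attributes to the structure consistency constraint. The consistency term does not mask occluded pixels; a practical version would likely need such a mask.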
