

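1. Hangzhou Dianzi University, Hangzhou, Zhejiang 310018
2. China Automotive Information Technology (Tianjin) Co., Ltd., Tianjin 300399
3. Shanghai Jiao Tong University, Shanghai 200240
4. Ningxia Petrochemical Yinjun Safety Technology Consulting Co., Ltd., Yinchuan, Ningxia 750000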
Received: 12 June 2025
Accepted: 12 January 2026
Published: 25 January 2026
ZHANG Zehui, WANG Yang, CHEN Boyang, et al. Binocular Vision Depth Estimation Method Based on Geometric Prior Knowledge Constraints[J]. Acta Electronica Sinica, 2026, 54(01): 195-205. DOI:10.12263/DZXB.20250504
In recent years, with the rapid development of fields such as autonomous driving, robot navigation, and 3D reconstruction, depth estimation, as a key means of perceiving the three-dimensional structure of the environment, has attracted widespread attention. However, although existing supervised depth estimation methods perform well on specific datasets, their generalization ability is weak and they rely on large-scale, high-quality labeled data, which severely limits their application in real industrial scenarios. Hence, this study proposes a binocular vision depth estimation method based on geometric prior knowledge constraints. First, residual convolution is combined with a context encoder to extract multi-scale features from the image data, and a feature pyramid structure is used to capture matching information at different scales while retaining the edge-structure details of the image.
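The abstract does not spell out how the feature pyramid captures matching information at different scales. One common construction for rectified stereo, used for example by RAFT-Stereo, is an all-pairs correlation volume along each image row that is repeatedly average-pooled along the disparity axis and then sampled around the current disparity estimate. The NumPy sketch that follows assumes that design; the function names `correlation_volume`, `build_corr_pyramid` and `lookup`, the four levels and the radius of 2 are illustrative choices, not the authors' implementation.

```python
import numpy as np


def correlation_volume(f_left, f_right):
    """All-pairs correlation along each row (rectified stereo: matches share a row).

    f_left, f_right: (C, H, W) feature maps from a shared encoder.
    Returns corr of shape (H, W, W) with
    corr[h, xl, xr] = <f_left[:, h, xl], f_right[:, h, xr]> / sqrt(C).
    """
    c = f_left.shape[0]
    return np.einsum("chi,chj->hij", f_left, f_right) / np.sqrt(c)


def build_corr_pyramid(corr, num_levels=4):
    """Average-pool the right-image column axis by 2 per level (coarser matching scales)."""
    pyramid = [corr]
    for _ in range(num_levels - 1):
        prev = pyramid[-1]
        even = prev.shape[-1] // 2 * 2  # drop an odd trailing column, if any
        pyramid.append(0.5 * (prev[..., 0:even:2] + prev[..., 1:even:2]))
    return pyramid


def lookup(pyramid, disparity, radius=2):
    """Sample every level in a window around the current match x_r = x_l - d."""
    h, w = disparity.shape
    rows = np.arange(h)[:, None]
    cols = np.arange(w)[None, :]
    target = cols - disparity
    feats = []
    for level, corr in enumerate(pyramid):
        w_r = corr.shape[-1]
        centre = target / 2**level  # same position in this level's coordinates
        for offset in range(-radius, radius + 1):
            # Simplification: clamp at the image border instead of zero padding.
            pos = np.clip(centre + offset, 0, w_r - 1)
            x0 = np.floor(pos).astype(int)
            x1 = np.minimum(x0 + 1, w_r - 1)
            a = pos - x0  # linear interpolation weight
            feats.append((1 - a) * corr[rows, cols, x0] + a * corr[rows, cols, x1])
    return np.stack(feats)  # (num_levels * (2 * radius + 1), H, W)


# Toy check: the left features are the right features shifted by d = 4 pixels.
rng = np.random.default_rng(0)
C, H, W, d = 32, 3, 16, 4
f_right = rng.normal(size=(C, H, W))
f_right /= np.linalg.norm(f_right, axis=0, keepdims=True)  # unit norm per pixel
f_left = np.zeros_like(f_right)
f_left[:, :, d:] = f_right[:, :, :-d]

corr = correlation_volume(f_left, f_right)
pyramid = build_corr_pyramid(corr, num_levels=4)
feats = lookup(pyramid, np.full((H, W), float(d)), radius=2)
```

Because the per-pixel features are unit vectors, the correlation for every left pixel with x >= d peaks exactly at right column x - d, and the level-0, zero-offset lookup returns that peak value 1/sqrt(C). Coarser levels trade positional precision for a wider effective search range.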
A multi-level gated recurrent unit (GRU) module then combines feature information from different scales to update the feature-matching parameters, refining the disparity matching results and thereby achieving binocular vision depth estimation.
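How a GRU can refine the matching estimate is illustrated here with a standard gated recurrent unit applied per pixel (equivalent to 1×1 convolutions) that repeatedly reads cost features at the current disparity and emits a residual disparity update, in the spirit of RAFT-Stereo. This is a single-scale sketch under stated assumptions: the cross-scale coupling of the paper's multi-level GRU is not described in the abstract and is not modelled, and `PixelGRU`, `refine`, `cost_features` and the random weights are hypothetical.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class PixelGRU:
    """Standard GRU cell applied at every pixel, i.e. with 1x1 convolutions.

    h: hidden state (ch, H, W); x: input features (cx, H, W).
    """

    def __init__(self, ch, cx, rng, scale=0.1):
        self.wz = rng.normal(0.0, scale, (ch, ch + cx))  # update gate weights
        self.wr = rng.normal(0.0, scale, (ch, ch + cx))  # reset gate weights
        self.wq = rng.normal(0.0, scale, (ch, ch + cx))  # candidate state weights

    def __call__(self, h, x):
        hx = np.concatenate([h, x], axis=0)
        z = sigmoid(np.einsum("oc,chw->ohw", self.wz, hx))
        r = sigmoid(np.einsum("oc,chw->ohw", self.wr, hx))
        rhx = np.concatenate([r * h, x], axis=0)
        q = np.tanh(np.einsum("oc,chw->ohw", self.wq, rhx))
        return (1.0 - z) * h + z * q  # convex mix of old state and candidate


def refine(disparity, cost_features, context, gru, w_delta, iters=8):
    """Iteratively refine disparity: look up cost, update hidden state, add a residual."""
    h = np.tanh(context)  # initial hidden state derived from the context features
    for _ in range(iters):
        x = np.concatenate([cost_features(disparity), disparity[None], context], axis=0)
        h = gru(h, x)
        disparity = disparity + np.einsum("c,chw->hw", w_delta, h)  # 1x1 conv head
    return disparity, h


# Toy run with random (untrained) weights; only shapes and mechanics are meaningful.
rng = np.random.default_rng(0)
H, W, ch, cc = 4, 6, 8, 5
context = rng.normal(size=(ch, H, W))


def cost_features(disp):
    # Hypothetical stand-in for a correlation lookup at the current disparity.
    return np.repeat(np.sin(disp)[None], cc, axis=0)


gru = PixelGRU(ch, cx=cc + 1 + ch, rng=rng)
d0 = np.zeros((H, W))
d_refined, h_final = refine(d0, cost_features, context, gru, w_delta=rng.normal(0.0, 0.1, ch))
d_frozen, _ = refine(d0, cost_features, context, gru, w_delta=np.zeros(ch))
```

With `w_delta` set to zero the disparity never changes, which isolates the role of the output head, and the hidden state stays inside (-1, 1) because each update is a convex combination of the previous state and a `tanh` candidate. In a trained network, all weights would be learned end to end.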
The paper further constructs a hybrid loss function that combines supervised signals with physical priors. On top of the conventional supervised loss, this function adds geometric constraints borrowed from the self-supervised learning paradigm as regularization terms, namely a left-right disparity consistency constraint and a disparity structure consistency constraint. The left-right consistency constraint enforces geometric correspondence between the disparities predicted for the left and right views, strengthening the model's geometric understanding and mitigating mismatches in occluded regions. The structure consistency constraint encourages the disparity map to remain smooth in textureless regions and sharp at object edges, improving the structural integrity and visual quality of the depth map and, ultimately, the generalization capability of the binocular vision depth estimation model.
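The two geometric regularizers can be sketched with their widely used self-supervised forms, as in Godard et al.'s left-right consistency work: the left disparity is compared with the right disparity sampled at x - d_L(x), and disparity gradients are weighted by exp(-|∇I|) so that smoothing is suppressed at image edges. The paper's exact formulations and loss weights are not given in the abstract; the plain L1 supervised term and the weights `lam_lr = lam_sm = 0.1` are placeholder assumptions, and all function names are hypothetical.

```python
import numpy as np


def warp_right_to_left(disp_right, disp_left):
    """Sample the right-view disparity at x - d_L(x) along each row (linear interpolation)."""
    h, w = disp_left.shape
    x = np.clip(np.arange(w)[None, :] - disp_left, 0, w - 1)  # clamp at image borders
    x0 = np.floor(x).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    a = x - x0
    rows = np.arange(h)[:, None]
    return (1 - a) * disp_right[rows, x0] + a * disp_right[rows, x1]


def lr_consistency_loss(disp_left, disp_right):
    """Left-right consistency: mean |d_L(x) - d_R(x - d_L(x))|."""
    return float(np.mean(np.abs(disp_left - warp_right_to_left(disp_right, disp_left))))


def edge_aware_smoothness(disp, image):
    """Disparity gradients weighted by exp(-|image gradient|); image is (3, H, W)."""
    dx_d = np.abs(np.diff(disp, axis=1))
    dy_d = np.abs(np.diff(disp, axis=0))
    wx = np.exp(-np.mean(np.abs(np.diff(image, axis=2)), axis=0))
    wy = np.exp(-np.mean(np.abs(np.diff(image, axis=1)), axis=0))
    return float(np.mean(dx_d * wx) + np.mean(dy_d * wy))


def supervised_l1(disp_pred, disp_gt, valid):
    """Assumed supervised term: L1 error on pixels with ground truth."""
    return float(np.mean(np.abs(disp_pred - disp_gt)[valid]))


def hybrid_loss(disp_left, disp_right, disp_gt, valid, image_left, lam_lr=0.1, lam_sm=0.1):
    """Supervised loss plus the two geometric regularizers (placeholder weights)."""
    return (supervised_l1(disp_left, disp_gt, valid)
            + lam_lr * lr_consistency_loss(disp_left, disp_right)
            + lam_sm * edge_aware_smoothness(disp_left, image_left))


# Toy checks on a 4 x 8 image.
H, W = 4, 8
gt = np.full((H, W), 2.0)
valid = np.ones((H, W), dtype=bool)
flat_image = np.zeros((3, H, W))
loss_consistent = hybrid_loss(gt, gt.copy(), gt, valid, flat_image)  # 0.0
lr_inconsistent = lr_consistency_loss(gt, np.zeros((H, W)))  # 2.0

step = np.zeros((H, W))
step[:, 4:] = 5.0  # disparity jumps between columns 3 and 4
edge_image = np.zeros((3, H, W))
edge_image[:, :, 4:] = 10.0  # image edge at the same place
smooth_flat = edge_aware_smoothness(step, flat_image)  # 5/7
smooth_edge = edge_aware_smoothness(step, edge_image)  # 5/7 * exp(-10)
```

A consistent constant pair gives zero loss, an inconsistent pair is penalized by its disparity gap, and a disparity step that coincides with an image edge costs far less than the same step on a flat image, which is the "smooth in flat regions, sharp at edges" behaviour the structure constraint aims for.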
The method is trained and evaluated on public datasets such as KITTI 2015 and Middlebury to verify its effectiveness, and the SceneFlow dataset is used to test cross-dataset generalization. Experimental results show that introducing the geometric prior constraints consistently improves the baseline model: on the KITTI dataset, the end-point error (EPE) is reduced by 3% to 5% and the overall mismatch rate (D1-all) by 5% to 8%. Results on the Middlebury dataset further confirm the method's generalization and robustness across different scenarios. Ablation experiments verify the contribution of each module, and hyperparameter sensitivity experiments determine the optimal weights of the loss terms. Additionally, transfer experiments show that the proposed geometric prior constraint mechanism is portable: it can be adapted to various mainstream depth estimation network architectures and generally yields performance gains.
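The two KITTI metrics quoted in the abstract can be computed as follows. EPE is the mean absolute disparity error in pixels over pixels that have ground truth; D1-all, as defined by the KITTI 2015 stereo benchmark, is the percentage of those pixels whose error exceeds both 3 px and 5% of the true disparity. A minimal NumPy sketch, assuming dense arrays with a boolean validity mask; the function names are illustrative.

```python
import numpy as np


def end_point_error(disp_pred, disp_gt, valid):
    """EPE: mean absolute disparity error (pixels) over pixels with ground truth."""
    return float(np.mean(np.abs(disp_pred - disp_gt)[valid]))


def d1_all(disp_pred, disp_gt, valid, abs_thresh=3.0, rel_thresh=0.05):
    """KITTI 2015 D1-all: % of valid pixels whose error is > 3 px AND > 5% of GT."""
    err = np.abs(disp_pred - disp_gt)[valid]
    outlier = (err > abs_thresh) & (err > rel_thresh * disp_gt[valid])
    return float(100.0 * np.mean(outlier))


# Toy example: 0 in the ground truth marks a pixel without a measurement.
gt = np.array([[10.0, 20.0], [100.0, 0.0]])
pred = np.array([[10.5, 24.0], [104.0, 7.0]])
valid = gt > 0
epe = end_point_error(pred, gt, valid)  # (0.5 + 4 + 4) / 3 ≈ 2.833 px
d1 = d1_all(pred, gt, valid)  # only the 20 -> 24 pixel is an outlier: ≈ 33.3 %
```

The 100 -> 104 pixel is not an outlier even though its error exceeds 3 px, because 4 px is below 5% of 100; the relative threshold keeps D1-all from over-penalizing large disparities on nearby objects.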