Adaptive Spatial Sparsification for Efficient Multi-View Stereo Matching

ZHOU Xiao-qing; WANG Xiang; ZHENG Jin; BAI Xiao

doi:10.12263/DZXB.20230353


PAPERS | Updated: 2025-12-08

- ACTA ELECTRONICA SINICA Vol. 51, Issue 11, Pages: 3079-3091 (2023)
- Author affiliations:
  1. School of Computer Science, Beihang University, Beijing 100191
  2. State Key Laboratory of Software Development Environment, Beihang University, Beijing 100191
  3. Jiangxi Research Institute, Beihang University, Nanchang, Jiangxi 330000
- Funding: National Natural Science Foundation of China (62276016; 62372029)
- DOI: 10.12263/DZXB.20230353
- CLC: TP391
- Received: 18 April 2023; Revised: 27 September 2023; Published: 25 November 2023

ZHOU Xiao-qing, WANG Xiang, ZHENG Jin, et al. Adaptive Spatial Sparsification for Efficient Multi-View Stereo Matching[J]. ACTA ELECTRONICA SINICA, 2023, 51(11): 3079-3091. DOI: 10.12263/DZXB.20230353.

Abstract

Constructing and aggregating matching cost volumes in multi-view stereo matching is computationally expensive, and existing methods commonly reduce this cost with cascaded architectures or iterative optimization. However, these approaches still face two challenges. Cascaded architectures narrow the depth sampling range in the refinement stages, so depth-discontinuous regions may remain trapped in erroneous estimates inherited from the low-resolution stage. The inference time of iterative optimization networks grows linearly with the number of iterations, making it difficult to meet the requirements of real-time systems. To address these challenges, this paper proposes an efficient multi-view stereo matching network based on adaptive spatial sparsification. We introduce a sparse matching cost volume construction method that samples sparsely within the complete depth range, reducing computational complexity while preserving the network's ability to model depth-discontinuous regions. We also propose a sparse iterative optimization method that uses adaptive variational Dropout to progressively prune regions whose depth values have converged, so that inference time grows sub-linearly with the number of iterations. Experimental results on the public DTU and Tanks & Temples datasets show that the proposed method improves inference speed by 1.2× and 0.35× over CasMVSNet and PatchmatchNet, respectively. It also produces high-quality point cloud reconstructions, handles details in depth-discontinuous regions effectively with significantly fewer edge artifacts, and generalizes well.
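The sub-linear growth claimed for the sparse iterative optimization can be illustrated with a toy sketch. This is not the paper's network: the update rule, the fixed convergence threshold `tol`, and the names `refine_with_pruning`, `step`, and `target_depth` are all assumptions made for illustration. In particular, the paper decides which regions to prune with adaptive variational Dropout, whereas this sketch substitutes a simple threshold on the size of each update. It only shows why skipping converged pixels makes the total work smaller than (pixels × iterations).

```python
def refine_with_pruning(init_depth, target_depth, num_iters=8, step=0.5, tol=1e-2):
    """Toy iterative depth refinement that skips pixels once they have converged.

    `target_depth` stands in for the value a learned update would move towards;
    it is a hypothetical placeholder, not part of the paper's method.
    Returns the refined depths and the number of pixels updated per iteration.
    """
    depth = list(init_depth)
    active = set(range(len(depth)))  # indices of pixels still being refined
    work_per_iter = []

    for _ in range(num_iters):
        work_per_iter.append(len(active))
        still_active = set()
        for i in active:
            # Hypothetical update: move a fixed fraction towards the target.
            delta = step * (target_depth[i] - depth[i])
            depth[i] += delta
            # Assumed pruning rule: a small update means the pixel has converged.
            if abs(delta) >= tol:
                still_active.add(i)
        active = still_active

    return depth, work_per_iter


if __name__ == "__main__":
    init = [1.0, 1.0, 1.0, 1.0]
    target = [1.0, 1.1, 2.0, 5.0]  # last pixel mimics a depth discontinuity
    depth, work = refine_with_pruning(init, target)
    print("pixels updated per iteration:", work)
    print("sparse cost:", sum(work), "dense cost:", len(init) * len(work))
```

In this toy setting, pixels that start close to their final depth drop out after the first few iterations, while the pixel at the simulated discontinuity stays active until the end, so the per-iteration cost shrinks as the iterations proceed.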
