3D Object Detection Based on Feature Distribution Convergence Guided by LiDar Point Cloud and Semantic Association

ZHENG Jin; JIANG Bo-tao; PENG Wei; WANG Sen

doi:10.12263/DZXB.20221141

PAPERS | Updated: 2025-12-11

- 3D Object Detection Based on Feature Distribution Convergence Guided by LiDar Point Cloud and Semantic Association
- ACTA ELECTRONICA SINICA Vol. 52, Issue 5, Pages: 1700-1715 (2024)
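- Author affiliations:
  
  1. School of Computer Science and Engineering, Beihang University, Beijing 100191
  2. State Key Laboratory of Virtual Reality Technology and Systems, Beijing 100191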
- Funding:
  
  National Natural Science Foundation of China (61876014)
- DOI: 10.12263/DZXB.20221141
  CLC: TP391.4
- Received: 11 October 2022
  
  Revised: 23 December 2022
  
  Published: 25 May 2024
ZHENG Jin, JIANG Bo-tao, PENG Wei, et al. 3D Object Detection Based on Feature Distribution Convergence Guided by LiDar Point Cloud and Semantic Association[J]. Acta Electronica Sinica, 2024, 52(05): 1700-1715. DOI: 10.12263/DZXB.20221141

Abstract

Existing 3D object detection algorithms based on Pseudo-LiDAR are far less accurate than those based on real LiDAR (Light Detection and Ranging) point clouds. This paper therefore studies the reconstruction of Pseudo-LiDAR and proposes a 3D object detection network suited to Pseudo-LiDAR. Because the Pseudo-LiDAR obtained by converting image depth is dense but becomes gradually sparser as depth increases, a depth-related Pseudo-LiDAR sparsification method is proposed. It reduces the subsequent computation while retaining more of the useful Pseudo-LiDAR points at middle and long range, and in this way reconstructs the Pseudo-LiDAR. Furthermore, a 3D object detection network based on feature distribution convergence guided by LiDAR point clouds and on semantic association is proposed. During network training, a LiDAR point cloud branch is introduced to guide the generation of Pseudo-LiDAR object features, so that the distribution of the generated Pseudo-LiDAR features converges to that of the LiDAR point cloud features, which reduces the loss of detection performance caused by the inconsistency between the two data sources. To address the insufficient semantic association among the Pseudo-LiDAR points inside the 3D candidate boxes obtained by the RPN (Region Proposal Network), an attention perception module is designed that embeds the semantic association between points into the Pseudo-LiDAR feature representation through an attention mechanism, improving the accuracy of 3D object detection. Experimental results on the KITTI 3D object detection dataset show that existing 3D object detection networks gain 2.61% in detection accuracy when they adopt the reconstructed Pseudo-LiDAR, and that the proposed network with feature distribution convergence and semantic association raises Pseudo-LiDAR-based detection accuracy by a further 0.57%. The proposed method also improves detection accuracy relative to other strong 3D object detection methods.
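
The depth-related sparsification idea can be illustrated with a minimal Python sketch. It assumes a pinhole camera model for converting image depth to points and a keep probability that rises linearly with depth. The abstract does not give the paper's actual sparsification rule, so `keep_probability`, its parameters (`z_near`, `z_far`, `p_near`), and all function names here are hypothetical.

```python
import random


def depth_to_points(depth, fx, fy, cx, cy):
    """Back-project a depth map (list of rows, in metres) to camera-frame points."""
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0:  # skip pixels without a valid depth estimate
                continue
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points


def keep_probability(z, z_near=10.0, z_far=40.0, p_near=0.1):
    """Assumed schedule: keep a fraction p_near of points up to z_near,
    every point beyond z_far, and interpolate linearly in between."""
    if z <= z_near:
        return p_near
    if z >= z_far:
        return 1.0
    t = (z - z_near) / (z_far - z_near)
    return p_near + t * (1.0 - p_near)


def sparsify(points, rng, **schedule):
    """Randomly drop points, thinning near points more aggressively than far ones."""
    return [p for p in points if rng.random() < keep_probability(p[2], **schedule)]


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic depth map: left half near (5 m), right half far (45 m).
    depth = [[5.0] * 50 + [45.0] * 50 for _ in range(20)]
    cloud = depth_to_points(depth, fx=700.0, fy=700.0, cx=50.0, cy=10.0)
    sparse = sparsify(cloud, rng)
    print(len(cloud), "points before,", len(sparse), "after sparsification")
```

With this schedule the far half of the synthetic scene is kept in full while most near points are dropped, which is the qualitative behaviour the abstract describes: less computation, with the scarce middle- and long-range points preserved.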

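The LiDAR-guided training described in the abstract can be sketched as an extra loss term that pulls Pseudo-LiDAR features toward the features of the LiDAR branch. The abstract does not state which distance the paper uses, so the mean squared error below is a stand-in, and `training_loss` and `guide_weight` are hypothetical names. The LiDAR branch is only needed during training; at inference the term disappears.

```python
def mean_squared_error(student, teacher):
    """Average squared difference between two equally sized feature vectors."""
    if len(student) != len(teacher):
        raise ValueError("feature vectors must have the same length")
    return sum((s - t) ** 2 for s, t in zip(student, teacher)) / len(student)


def training_loss(detection_loss, pseudo_feat, lidar_feat, guide_weight=1.0):
    """Detection loss plus an assumed LiDAR-guidance term (training only).

    pseudo_feat: object feature from the Pseudo-LiDAR branch.
    lidar_feat:  feature of the same object from the LiDAR branch,
                 treated as a fixed target.
    """
    guidance = mean_squared_error(pseudo_feat, lidar_feat)
    return detection_loss + guide_weight * guidance


if __name__ == "__main__":
    pseudo = [0.2, 0.9, -0.4]
    lidar = [0.0, 1.0, -0.5]
    print(training_loss(detection_loss=1.3, pseudo_feat=pseudo, lidar_feat=lidar))
```

When the two feature vectors coincide the guidance term is zero and only the detection loss remains, so minimizing the sum drives the two feature sets together without changing the detection objective itself.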

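One common way to embed associations between the points of a candidate box is scaled dot-product self-attention, sketched below in plain Python. This is a generic illustration rather than the paper's attention perception module, whose design the abstract does not describe. Learned query, key, and value projections are omitted (identity projections are assumed), and the function names are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(features):
    """Scaled dot-product self-attention with identity projections.

    features: list of N per-point feature vectors, each of length d.
    Returns N vectors, each a weighted average of all input vectors.
    """
    d = len(features[0])
    scale = math.sqrt(d)
    out = []
    for query in features:
        # Similarity of this point to every point in the box (itself included).
        scores = [sum(a * b for a, b in zip(query, key)) / scale for key in features]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, features)) for j in range(d)])
    return out


if __name__ == "__main__":
    box_points = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
    for row in self_attention(box_points):
        print([round(x, 3) for x in row])
```

Each output vector mixes information from all points in the box, weighted by feature similarity, so points with similar features reinforce one another in the resulting representation.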