Abstract
In autonomous-driving perception systems, cameras and LiDAR are the key sources of information. However, in current 3D object detection tasks, most LiDAR-only networks outperform networks that fuse images with LiDAR point clouds. Existing studies attribute this to the viewpoint misalignment between image and LiDAR data and to the difficulty of matching heterogeneous features, which make it hard for single-stage fusion algorithms to fully fuse the features of the two modalities. To address this, a novel 3D object detection method based on multilayer multimodal fusion (3DMMF) is presented. First, in the early-fusion stage, the point cloud inside the frustums formed by the 2D detection boxes is locally and sequentially painted with color (Red Green Blue, RGB) information, an encoding referred to as Frustum-RGB-PointPainting (FRP). Then, the encoded point cloud is fed into a channel-expanded PointPillars detection network that incorporates self-attention-based context awareness. In the late-fusion stage, the 2D and 3D candidate boxes are encoded as two sets of sparse tensors before non-maximum suppression, and the camera-LiDAR object candidates fusion (CLOCs) network produces the final 3D detection results. Experiments on the KITTI dataset show that the proposed fusion method significantly improves on the LiDAR-only baseline, raising the average mAP by 6.24%.
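The early-fusion idea of painting RGB values onto the points that fall inside 2D detection frustums can be illustrated with a minimal sketch. It is not the paper's implementation: the function name `paint_points_in_frustums`, the pinhole parameters `fx`, `fy`, `cx`, `cy`, the assumption that points are already expressed in the camera frame, and the choice of zero color for unpainted points are all hypothetical and introduced only for illustration.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]         # (x, y, z) in the camera frame (assumed)
Box2D = Tuple[float, float, float, float]  # (u_min, v_min, u_max, v_max) in pixels
RGB = Tuple[int, int, int]                 # 8-bit color of one pixel


def paint_points_in_frustums(
    points: Sequence[Point],
    image: Sequence[Sequence[RGB]],
    boxes: Sequence[Box2D],
    fx: float,
    fy: float,
    cx: float,
    cy: float,
) -> List[Tuple[float, ...]]:
    """Append normalized RGB channels to each point inside a 2D-box frustum.

    Points that are behind the camera, outside the image, or outside every
    2D box keep their coordinates and receive (0, 0, 0) as color channels.
    """
    height, width = len(image), len(image[0])
    painted = []
    for x, y, z in points:
        rgb = (0.0, 0.0, 0.0)  # default for points outside every frustum
        if z > 0:  # only points in front of the camera can be projected
            # Pinhole projection of the 3D point onto the image plane.
            u = fx * x / z + cx
            v = fy * y / z + cy
            in_image = 0 <= u < width and 0 <= v < height
            in_box = any(
                box[0] <= u <= box[2] and box[1] <= v <= box[3] for box in boxes
            )
            if in_image and in_box:
                red, green, blue = image[int(v)][int(u)]
                rgb = (red / 255.0, green / 255.0, blue / 255.0)
        painted.append((x, y, z) + rgb)
    return painted
```

Each output point has six channels (x, y, z, r, g, b), which is why a detector consuming it needs a wider input than a plain point-cloud network. On real KITTI data, the LiDAR points would first have to be transformed into the camera frame with the dataset's calibration matrices, a step omitted here.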