3D Object Detection Based on Multilayer Multimodal Fusion

ZHOU Zhi-guo; MA Wen-hao

doi:10.12263/DZXB.20220593


PAPERS | Updated: 2025-12-11

- ACTA ELECTRONICA SINICA Vol. 52, Issue 3, Pages: 696-708 (2024)
- Author affiliation: School of Integrated Circuits and Electronics, Beijing Institute of Technology, Beijing 100081
- Funding: Equipment Pre-Research Field Foundation (61403120109)
- DOI: 10.12263/DZXB.20220593
- CLC: TP391.4
- Received: 23 May 2022; Revised: 9 November 2022; Published: 25 March 2024

ZHOU Zhi-guo, MA Wen-hao. 3D Object Detection Based on Multilayer Multimodal Fusion[J]. Acta Electronica Sinica, 2024, 52(03): 696-708. DOI: 10.12263/DZXB.20220593

Abstract

Cameras and lidar are the key sources of information in the perception systems of autonomous vehicles (AVs). However, in current 3D object detection tasks, most lidar-only networks outperform networks that fuse images with lidar point clouds. Existing studies attribute this to the viewpoint misalignment between image and lidar data and to the difficulty of matching heterogeneous features, which makes it hard for single-stage fusion algorithms to fully fuse the features of the two modalities. To address this, a novel 3D object detection method based on multilayer multimodal fusion (3DMMF) is presented. First, in the early-fusion stage, Frustum-RGB-PointPainting (FRP) encodes the point cloud by painting RGB color information, in local order, onto the points that fall inside the frustums formed by the 2D detection boxes. The encoded point cloud is then fed into a channel-expanded PointPillars detection network that incorporates self-attention-based context awareness. In the late-fusion stage, the 2D and 3D candidate boxes are encoded as two sets of sparse tensors before non-maximum suppression (NMS), and the camera-lidar object candidates fusion (CLOCs) network produces the final 3D object detection results. Experiments on the KITTI dataset show that the proposed fusion method significantly improves on the lidar-only baseline, with an average mAP improvement of 6.24%.
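The early-fusion step described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation; it only shows the general idea of painting RGB values onto lidar points whose image projections fall inside a 2D detection box. The function name `paint_points_in_frustums`, the `(N, 4)` point layout, and the single `3x4` projection matrix `proj` (assumed to already combine the lidar-to-camera extrinsics and the camera intrinsics) are all assumptions made for illustration. The "local order" of the encoding mentioned in the abstract is not specified there and is not modeled here.

```python
import numpy as np


def paint_points_in_frustums(points, image, boxes_2d, proj):
    """Append RGB channels to lidar points that project inside any 2D box.

    Hypothetical helper for illustration only.
    points   : (N, 4) array of x, y, z, intensity
    image    : (H, W, 3) uint8 RGB image
    boxes_2d : (K, 4) array of x1, y1, x2, y2 pixel boxes
    proj     : (3, 4) matrix mapping homogeneous 3D points to pixels (assumed)
    returns  : (N, 7) float32 array; unpainted points keep RGB = 0
    """
    n = points.shape[0]
    h, w = image.shape[:2]

    # Project every point into the image plane.
    xyz1 = np.hstack([points[:, :3], np.ones((n, 1))])
    uvw = xyz1 @ proj.T
    depth = uvw[:, 2]

    # Only points in front of the camera have a valid projection.
    in_front = depth > 1e-6
    safe_depth = np.where(in_front, depth, 1.0)
    u = uvw[:, 0] / safe_depth
    v = uvw[:, 1] / safe_depth
    in_image = in_front & (u >= 0) & (u < w) & (v >= 0) & (v < h)

    # A point lies in a frustum if its projection lies inside a 2D box.
    in_any_box = np.zeros(n, dtype=bool)
    for x1, y1, x2, y2 in boxes_2d:
        in_any_box |= (u >= x1) & (u <= x2) & (v >= y1) & (v <= y2)

    # Sample the pixel color for the selected points, scaled to [0, 1].
    paint = in_image & in_any_box
    rgb = np.zeros((n, 3), dtype=np.float32)
    cols = u[paint].astype(int)
    rows = v[paint].astype(int)
    rgb[paint] = image[rows, cols] / 255.0

    return np.hstack([points.astype(np.float32), rgb])
```

In this sketch each point grows from 4 to 7 channels, which is one plausible reason a downstream pillar-based detector would need an expanded input channel count. Points outside every frustum are left with zero color rather than being discarded, which is a design choice of the sketch and not something the abstract states.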


