3D Object Detection Based on Multilayer Multimodal Fusion

ZHOU Zhi-guo; MA Wen-hao

doi:10.12263/DZXB.20220593

Research article | Updated: 2025-12-11

- 3D Object Detection Based on Multilayer Multimodal Fusion
- Acta Electronica Sinica, 2024, Vol. 52, No. 3, pp. 696-708
- Affiliation:
  School of Integrated Circuits and Electronics, Beijing Institute of Technology, Beijing 100081
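- About the authors:
  ZHOU Zhi-guo, male, was born in Wuhan, Hubei Province, in September 1977. He is an associate professor and master's supervisor at the Institute of Medical Image and Signal Processing, School of Integrated Circuits and Electronics, Beijing Institute of Technology. His research interests include intelligent unmanned systems, perception and navigation, and machine learning. He has published more than 20 academic papers in China and abroad. Chinese Institute of Electronics member No. E190015683M. E-mail: zhiguozhou@bit.edu.cn
  MA Wen-hao, male, was born in Yining, Xinjiang Uygur Autonomous Region, in June 1996. He is a master's student at the Institute of Medical Image and Signal Processing, School of Integrated Circuits and Electronics, Beijing Institute of Technology, working on fusion perception for unmanned vehicles. E-mail: 3220190552@bit.edu.cn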
- Funding:
  Equipment Pre-Research Field Fund (No. 61403120109)
- DOI: 10.12263/DZXB.20220593
  CLC number: TP391.4
- Received: 2022-05-23; revised: 2022-11-09; published in print: 2024-03-25
ZHOU Zhi-guo, MA Wen-hao. 3D Object Detection Based on Multilayer Multimodal Fusion[J]. Acta Electronica Sinica, 2024, 52(03): 696-708. DOI: 10.12263/DZXB.20220593

Abstract

Cameras and LiDAR are the key sources of information in the perception systems of autonomous vehicles. In current 3D object detection tasks, however, most pure point cloud networks still outperform networks that fuse images with LiDAR point clouds. Existing studies attribute this to the viewpoint misalignment between image and LiDAR information and to the difficulty of matching heterogeneous features, which makes it hard for a single-stage fusion algorithm to fuse the features of the two modalities fully. To address this, a novel 3D object detection method based on multilayer multimodal fusion (3DMMF) is proposed. First, in the early-fusion stage, the point cloud is encoded by Frustum-RGB-PointPainting (FRP): points inside the frustum formed by each 2D detection box are painted, locally and sequentially, with color (Red Green Blue, RGB) information. The encoded point cloud is then fed into a channel-expanded PointPillars detection network that incorporates self-attention-based context awareness. In the late-fusion stage, the 2D and 3D candidate boxes are encoded as two sets of sparse tensors before non-maximum suppression (NMS), and the camera-LiDAR object candidates fusion (CLOCs) network produces the final 3D detection results. Experiments on the KITTI dataset show that the proposed fusion method improves significantly on the pure point cloud baseline, raising the average mAP by 6.24%.
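The early-fusion step described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `paint_points_in_frustums`, the (N, 4) point layout, the 3x4 projection matrix `proj`, and the choice to leave unpainted points at zero RGB are all assumptions made for illustration. The abstract also does not define how the local order of painting is determined, so the sketch simply loops over the 2D boxes.

```python
import numpy as np


def paint_points_in_frustums(points, image, boxes_2d, proj):
    """Append RGB channels to LiDAR points that fall inside 2D detection frustums.

    points   : (N, 4) array of x, y, z, intensity (assumed to be in the
               coordinate frame that `proj` expects)
    image    : (H, W, 3) uint8 RGB image
    boxes_2d : (K, 4) array of pixel boxes as x1, y1, x2, y2
    proj     : (3, 4) matrix projecting homogeneous 3D points to pixels
    returns  : (N, 7) array of x, y, z, intensity, r, g, b with r, g, b in [0, 1]
    """
    n = points.shape[0]
    h, w = image.shape[:2]

    # Project every point into the image plane.
    xyz1 = np.hstack([points[:, :3], np.ones((n, 1))])
    uvw = xyz1 @ proj.T
    depth = uvw[:, 2]
    in_front = depth > 1e-6
    safe_depth = np.where(in_front, depth, 1.0)  # avoid dividing by zero
    u = uvw[:, 0] / safe_depth
    v = uvw[:, 1] / safe_depth

    # Keep only points in front of the camera and inside the image bounds.
    visible = in_front & (u >= 0) & (u < w) & (v >= 0) & (v < h)

    # A point lies in a frustum when its projection lies inside a 2D box.
    in_any_box = np.zeros(n, dtype=bool)
    for x1, y1, x2, y2 in boxes_2d:
        in_any_box |= (u >= x1) & (u < x2) & (v >= y1) & (v < y2)
    paint = visible & in_any_box

    # Sample the pixel colour under each painted point; the rest stay zero.
    rgb = np.zeros((n, 3), dtype=np.float32)
    cols = u[paint].astype(int)
    rows = v[paint].astype(int)
    rgb[paint] = image[rows, cols].astype(np.float32) / 255.0

    return np.hstack([points.astype(np.float32), rgb])


# Toy example with an identity-like projection (hypothetical values).
image = np.zeros((4, 4, 3), dtype=np.uint8)
image[1, 2] = (255, 0, 0)                      # one red pixel at row 1, column 2
proj = np.array([[1.0, 0.0, 0.0, 0.0],
                 [0.0, 1.0, 0.0, 0.0],
                 [0.0, 0.0, 1.0, 0.0]])
points = np.array([[2.0, 1.0, 1.0, 0.5],       # projects to (u=2, v=1), inside the box
                   [0.0, 0.0, 1.0, 0.5],       # projects to (u=0, v=0), outside the box
                   [2.0, 1.0, -1.0, 0.5]])     # behind the camera
boxes_2d = np.array([[1, 0, 4, 3]])
painted = paint_points_in_frustums(points, image, boxes_2d, proj)
```

The output has seven channels per point instead of four, which is one way to read the "channel-expanded" input that the abstract says the PointPillars network receives. With real KITTI data the projection would be composed from the dataset's calibration matrices rather than the toy matrix used here.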


