A Multimodal 3D Object Detection Method Based on Double-Fusion Framework

GE Tong-ao; LI Hui; GUO Ying; WANG Jun-yin; ZHOU Di

doi:10.12263/DZXB.20230414

- Acta Electronica Sinica, 2023, Vol. 51, No. 11, pp. 3100-3110
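- Affiliations:

  1. College of Data Science, Qingdao University of Science and Technology, Qingdao, Shandong 266000
  2. School of Computer Science and Artificial Intelligence, Wuhan University of Technology, Wuhan, Hubei 430000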
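- Author biographies:

  GE Tong-ao, male, born in 1999, from Heze, Shandong. Master's student at the College of Data Science, Qingdao University of Science and Technology. Research interests: computer vision, 3D object detection. E-mail: getongao24@163.com
  LI Hui (corresponding author), male, born in 1984, from Pingdingshan, Henan. Associate professor and master's supervisor at the College of Data Science, Qingdao University of Science and Technology. Research interests: computer vision, 3D object detection and tracking.
  GUO Ying, female, born in 1999, from Weihai, Shandong. Master's student at the College of Data Science, Qingdao University of Science and Technology. Research interests: computer vision, 3D object detection. E-mail: guoying_official@163.com
  WANG Jun-yin, male, born in 1997, from Tai'an, Shandong. Ph.D. candidate at the School of Computer Science and Artificial Intelligence, Wuhan University of Technology. Research interests: computer vision, 3D object detection. E-mail: wjy199708@163.com
  ZHOU Di, male, born in 2000, from Wuhan, Hubei. Master's student at the College of Data Science, Qingdao University of Science and Technology. Research interest: computer vision. Member of the Chinese Institute of Electronics (membership no. E190010986M). E-mail: 4022110030@mails.qust.edu.cn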
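- Funding:

  China University Industry-University-Research Innovation Fund (2021ITA05047); National Natural Science Foundation of China (62002190); Youth Innovation Science and Technology Support Program of Shandong Provincial Higher Education Institutions (2019KJN047)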
- CLC number: TP391.4
- Received: 2023-05-10; revised: 2023-07-19; published in print: 2023-11-25
GE Tong-ao, LI Hui, GUO Ying, et al. A Multimodal 3D Object Detection Method Based on Double-Fusion Framework[J]. ACTA ELECTRONICA SINICA, 2023, 51(11): 3100-3110. DOI: 10.12263/DZXB.20230414.

Abstract

Multimodal 3D object detection that fuses camera and LiDAR data can exploit the complementary strengths of the two sensors to improve detection accuracy and robustness. However, because of environmental complexity and the inherent differences between multimodal data, 3D object detection still faces many challenges. This paper proposes a multimodal 3D object detection algorithm based on a double-fusion framework. A voxel-level and grid-level double-fusion framework is designed to alleviate the semantic discrepancy between data from different modalities during fusion. An ABFF (Adaptive Bird-eye-view Features Fusion) module is proposed to strengthen the algorithm's perception of small-object features. Using voxel-level global fusion information to guide grid-level local fusion, a Transformer-based multimodal grid feature encoder is proposed that extracts richer contextual information from 3D detection scenes and improves the algorithm's runtime efficiency. Experimental results on the standard KITTI dataset show that the proposed algorithm reaches an average detection precision of 78.79%, indicating improved 3D object detection performance.
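The abstract does not state how image features are attached to voxels for voxel-level fusion. A common building block in camera-LiDAR fusion is to project each voxel center into the image with the camera projection matrix and read the image feature at the resulting pixel. The minimal sketch below illustrates only that step, assuming the voxel center is already expressed in rectified camera coordinates and that `proj` is a 3x4 projection matrix such as those in KITTI's calibration files. The function names and the nearest-pixel lookup are illustrative assumptions, not the authors' implementation.

```python
from typing import List, Optional, Tuple

Matrix3x4 = List[List[float]]
FeatureMap = List[List[List[float]]]  # hypothetical [H][W][C] image feature map


def project_to_image(point: Tuple[float, float, float],
                     proj: Matrix3x4) -> Optional[Tuple[float, float]]:
    """Project a 3D point (camera coordinates) to pixel (u, v) with a 3x4 matrix.

    Returns None for points at or behind the image plane (depth <= 0).
    """
    x, y, z = point
    hom = (x, y, z, 1.0)  # homogeneous coordinates
    u_h, v_h, w = (sum(r * c for r, c in zip(row, hom)) for row in proj)
    if w <= 0:
        return None
    return u_h / w, v_h / w


def gather_image_feature(voxel_center: Tuple[float, float, float],
                         proj: Matrix3x4,
                         image_feats: FeatureMap) -> Optional[List[float]]:
    """Nearest-pixel lookup of the image feature for one voxel center.

    Returns None if the voxel projects behind the camera or outside the image.
    """
    uv = project_to_image(voxel_center, proj)
    if uv is None:
        return None
    u, v = int(round(uv[0])), int(round(uv[1]))
    height, width = len(image_feats), len(image_feats[0])
    if 0 <= v < height and 0 <= u < width:
        return image_feats[v][u]
    return None
```

A real pipeline would first apply the LiDAR-to-camera extrinsic transform and would usually sample features bilinearly rather than taking the nearest pixel.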
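The abstract names the ABFF module but does not give its formulation. As a rough illustration only, the sketch below assumes one common design for adaptive fusion on a bird's-eye-view (BEV) grid: at each cell, a score is computed for the LiDAR feature and for the camera feature, the two scores are normalized with a softmax, and the fused feature is their weighted sum. The scoring vectors `w_lidar` and `w_camera` are hypothetical stand-ins for learned parameters; this is not the authors' ABFF module and makes no claim about its small-object behavior.

```python
import math
from typing import List, Tuple

Vector = List[float]
BEVMap = List[List[Vector]]  # [H][W][C] features on an aligned BEV grid


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax2(s1: float, s2: float) -> Tuple[float, float]:
    """Numerically stable softmax over two scores."""
    m = max(s1, s2)
    e1, e2 = math.exp(s1 - m), math.exp(s2 - m)
    return e1 / (e1 + e2), e2 / (e1 + e2)


def adaptive_bev_fusion(lidar: BEVMap, camera: BEVMap,
                        w_lidar: Vector, w_camera: Vector) -> BEVMap:
    """Fuse two aligned BEV maps cell by cell with per-cell adaptive weights.

    Hypothetical sketch: w_lidar / w_camera stand in for learned scoring
    parameters; each cell's output is a convex combination of its inputs.
    """
    fused: BEVMap = []
    for row_l, row_c in zip(lidar, camera):
        fused_row = []
        for f_l, f_c in zip(row_l, row_c):
            a_l, a_c = softmax2(dot(w_lidar, f_l), dot(w_camera, f_c))
            fused_row.append([a_l * x + a_c * y for x, y in zip(f_l, f_c)])
        fused.append(fused_row)
    return fused
```

Because the weights are computed per cell, the balance between the modalities can change across the scene; when both scores are equal, the result is simply the average of the two features.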
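The Transformer-based multimodal grid feature encoder is likewise described only at a high level. The core Transformer operation is scaled dot-product attention, softmax(QK^T / sqrt(d))V. The minimal sketch below assumes that grid points sampled inside a proposal act as queries and that LiDAR and camera feature tokens act as keys and values; the learned query/key/value projections and multi-head structure of a real encoder are omitted, and names such as `encode_grid` and `grid_queries` are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(queries: Matrix, keys: Matrix,
                                 values: Matrix) -> Matrix:
    """Return softmax(Q K^T / sqrt(d)) V for small dense matrices."""
    d = len(keys[0])
    out: Matrix = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out


def encode_grid(grid_queries: Matrix, lidar_feats: Matrix,
                camera_feats: Matrix) -> Matrix:
    """Hypothetical multimodal grid encoding: each grid point attends over the
    combined set of LiDAR and camera feature tokens (keys = values here)."""
    tokens = lidar_feats + camera_feats
    return scaled_dot_product_attention(grid_queries, tokens, tokens)
```

How the voxel-level global fusion guides this grid-level step is not described in the abstract; initializing the grid queries from voxel-level fused features is one plausible option, but that is an assumption.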
