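1. School of Data Science, Qingdao University of Science and Technology, Qingdao, Shandong 266000, China
2. School of Computer Science and Artificial Intelligence, Wuhan University of Technology, Wuhan, Hubei 430000, China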
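GE Tong-ao: male, born in 1999 in Heze, Shandong. Master's student at the School of Data Science, Qingdao University of Science and Technology. Research interests: computer vision and 3D object detection. E-mail: getongao24@163.com
LI Hui (corresponding author): male, born in 1984 in Pingdingshan, Henan. Associate professor and master's supervisor at the School of Data Science, Qingdao University of Science and Technology. Research interests: computer vision, 3D object detection and tracking.
GUO Ying: female, born in 1999 in Weihai, Shandong. Master's student at the School of Data Science, Qingdao University of Science and Technology. Research interests: computer vision and 3D object detection. E-mail: guoying_official@163.com
WANG Jun-yin: male, born in 1997 in Tai'an, Shandong. Doctoral student at the School of Computer Science and Artificial Intelligence, Wuhan University of Technology. Research interests: computer vision and 3D object detection. E-mail: wjy199708@163.com
ZHOU Di: male, born in 2000 in Wuhan, Hubei. Master's student at the School of Data Science, Qingdao University of Science and Technology. Research interest: computer vision. Member of the Chinese Institute of Electronics, membership No. E190010986M. E-mail: 4022110030@mails.qust.edu.cn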
Received: 2023-05-10
Revised: 2023-07-19
Published in print: 2023-11-25
GE Tong-ao, LI Hui, GUO Ying, et al. A Multimodal 3D Object Detection Method Based on Double-Fusion Framework[J]. ACTA ELECTRONICA SINICA, 2023, 51(11): 3100-3110. DOI: 10.12263/DZXB.20230414.
Fusing camera and LiDAR data for 3D object detection exploits the complementary strengths of the two sensors and improves detection accuracy and robustness. However, because of the complexity of real environments and the inherent differences between multimodal data, 3D object detection still faces many challenges. This paper proposes a multimodal 3D object detection algorithm built on a double-fusion framework. The framework fuses features at both the voxel level and the grid level, which effectively alleviates the semantic differences between modalities during fusion. An ABFF (Adaptive Bird-eye-view Features Fusion) module is proposed to strengthen the algorithm's perception of small-object features. Using voxel-level global fusion information to guide grid-level local fusion, a Transformer-based multimodal grid feature encoder is proposed that extracts richer contextual information from 3D detection scenes and improves the running efficiency of the algorithm. Experimental results on the KITTI standard dataset show that the proposed algorithm reaches an average detection accuracy of 78.79% and delivers better 3D object detection performance.
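The abstract does not describe the internals of the ABFF module, so the following is only a generic sketch of what adaptive fusion of two bird's-eye-view (BEV) feature maps can look like. It is not the authors' design. The function name `adaptive_bev_fusion`, the scoring vectors `w_lidar` and `w_camera` (which would normally be learned), and the nested-list layout are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def adaptive_bev_fusion(lidar_bev, camera_bev, w_lidar, w_camera):
    """Fuse two BEV feature maps cell by cell with per-cell weights.

    lidar_bev, camera_bev: H x W x C nested lists of the same shape.
    w_lidar, w_camera: length-C scoring vectors (hypothetical stand-ins
    for learned parameters) that rate each modality at every BEV cell.
    """
    fused = []
    for row_l, row_c in zip(lidar_bev, camera_bev):
        fused_row = []
        for f_l, f_c in zip(row_l, row_c):
            # One scalar score per modality at this cell.
            score_l = sum(w * x for w, x in zip(w_lidar, f_l))
            score_c = sum(w * x for w, x in zip(w_camera, f_c))
            # Softmax turns the two scores into weights that sum to 1.
            a_l, a_c = softmax([score_l, score_c])
            fused_row.append([a_l * x + a_c * y for x, y in zip(f_l, f_c)])
        fused.append(fused_row)
    return fused


# Usage: with all-zero scoring vectors both modalities get weight 0.5,
# so the result is the plain average of the two maps.
lidar = [[[2.0, 4.0]]]   # 1 x 1 x 2 toy feature map
camera = [[[0.0, 0.0]]]
print(adaptive_bev_fusion(lidar, camera, [0.0, 0.0], [0.0, 0.0]))
# -> [[[1.0, 2.0]]]
```

The point of the sketch is that the mixing weights depend on the features themselves, so a cell where one modality responds strongly can lean on that modality instead of being averaged uniformly.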