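1. School of Computer Science, Northwestern Polytechnical University, Xi'an, Shaanxi 710072
2. Xi'an ASN Technology Group Co., Ltd., Xi'an, Shaanxi 710065
3. Research & Development Institute of Northwestern Polytechnical University in Shenzhen, Shenzhen, Guangdong 518063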
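WANG Jin-zhong, male, born in 1995 in Minqin, Gansu. He is currently a master's student at the School of Computer Science, Northwestern Polytechnical University. His main research interests include image processing, multi-source object detection, and deep learning. E-mail: wangjinzhong@mail.nwpu.edu.cn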
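ZHANG Xiu-wei, female, born in 1981 in Tacheng, Xinjiang. She is currently a professor at the School of Computer Science, Northwestern Polytechnical University. Her main research interests include computer vision, multi-source information collaborative processing, and deep learning. E-mail: xwzhang@nwpu.edu.cn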
Received: 2024-06-26
Revised: 2024-09-29
Published in print: 2025-03-25
WANG Jin-zhong, DAI Shun, ZHANG Xiu-wei, et al. UAV-RGBT Multispectral Object Detection Dataset and Algorithm Benchmark[J]. Acta Electronica Sinica, 2025, 53(03): 686-704. DOI:10.12263/DZXB.20240602
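Multi-source object detection on unmanned aerial vehicle (UAV) platforms uses visible (Red Green Blue, RGB) and thermal infrared (T) imagery. It enables all-day, all-weather target reconnaissance and is valuable in both military and civilian applications. Acquiring and processing such data is complex, so few public RGB-T multi-source object detection datasets from the UAV perspective exist. This scarcity has to some extent limited research on, and application of, UAV-view RGB-T multi-source object detection algorithms. UAV application scenarios are also complex and variable. Flight altitude, speed, focal length, and background all change rapidly, so targets in the captured images have diverse scales, uneven dense/sparse distributions, and class imbalance, which makes detection challenging. In time-critical applications such as target reconnaissance and traffic monitoring, an algorithm must detect targets in real time while keeping accuracy high, so its design must carefully balance accuracy against speed.

To address these problems, this paper constructs UAV-RGBT, a large-scale UAV-view RGB-T multi-source image dataset that spans seasons, day and night, multiple categories, and multiple scales. It contains 20 categories, 5 117 RGB-T image pairs, and over 110 000 annotations, and is intended to advance research on UAV-view multi-source object detection algorithms. The paper also builds on YOLOv8n to propose the UAV-based Dual-branch Multispectral object Detection (UAV-DMDet) model. UAV-DMDet promotes deep fusion of multi-source features through multi-source cross-attention fusion and multi-source feature decomposition and combination. It achieves a good balance among parameter count, detection speed, and detection accuracy.

On the UAV-RGBT dataset, UAV-DMDet outperforms the single-source YOLOv8n model in both modalities. For the RGB and T modalities respectively, mAP@0.5 rises by 3.61% and 11.03%, and mAP@0.5:0.95 rises by 0.84% and 6.76%. On the DroneVehicle dataset, UAV-DMDet exceeds the mainstream algorithm I²MDet by 2.66% in mAP@0.5 and 12.36% in mAP@0.5:0.95. For detection speed, take 640×640 input images as an example. UAV-DMDet reaches 31 frames/s at FP32 precision on a single GeForce RTX 3090 GPU and 58 frames/s at FP16 precision on a Huawei Ascend 710 processor. It can therefore be applied effectively to real-time UAV-view RGB-T multi-source object detection tasks.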
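The abstract names a multi-source cross-attention fusion module but does not describe its internals. The sketch below is a rough illustration only. It uses generic scaled dot-product cross-attention, the mechanism introduced by Vaswani et al. in the reference list, between RGB and thermal token sequences. RGB tokens query thermal tokens and vice versa, each branch adds the attended features back residually, and the two enhanced branches are summed. The following are all illustrative assumptions, not the UAV-DMDet design:

- the identity projections (no learned W_Q, W_K, W_V);
- the residual-sum fusion rule;
- the helper names `softmax`, `cross_attend`, `add`, and `fuse_rgbt`.

```python
import math
from typing import List

# Rows are tokens (e.g. flattened feature-map positions); columns are channels.
Matrix = List[List[float]]


def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(queries: Matrix, context: Matrix) -> Matrix:
    """Scaled dot-product cross-attention: queries come from one modality,
    keys and values from the other. Assumption: identity projections,
    i.e. no learned W_Q, W_K, W_V matrices."""
    d = len(queries[0])
    scale = 1.0 / math.sqrt(d)
    out: Matrix = []
    for q in queries:
        # Similarity of this query token to every context token.
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in context]
        weights = softmax(scores)
        # Attention-weighted sum of the context tokens (values == keys here).
        out.append([sum(w * v[c] for w, v in zip(weights, context)) for c in range(d)])
    return out


def add(a: Matrix, b: Matrix) -> Matrix:
    """Element-wise sum of two equally shaped matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def fuse_rgbt(rgb: Matrix, thermal: Matrix) -> Matrix:
    """Hypothetical fusion rule: each branch is enhanced residually by
    attending to the other branch, then the two enhanced maps are summed.
    Assumes spatially aligned inputs with the same number of tokens."""
    rgb_enh = add(rgb, cross_attend(rgb, thermal))     # RGB queries thermal
    th_enh = add(thermal, cross_attend(thermal, rgb))  # thermal queries RGB
    return add(rgb_enh, th_enh)


if __name__ == "__main__":
    rgb = [[1.0, 0.0], [0.0, 1.0]]       # 2 tokens x 2 channels
    thermal = [[0.5, 0.5], [1.0, 0.0]]
    fused = fuse_rgbt(rgb, thermal)
    print(len(fused), len(fused[0]))     # 2 2
```

Attending in both directions lets each modality borrow evidence where the other is weak, for example thermal cues at night or RGB texture in daylight. The symmetric residual sum also makes this toy fusion order-independent. A real module would typically add learned projections, multiple heads, and normalization.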
Unmanned aerial vehicle (UAV)-based multispectral object detection uses both visible (RGB) and thermal infrared (T) images. It makes all-weather, all-day target monitoring possible and serves critical roles in military and civilian applications. However, data acquisition and processing are complex, so publicly available UAV-based RGB-T multispectral object detection datasets are scarce, which to some extent limits research and application in this area. UAV operational scenarios are also complex and variable, with rapid changes in flight altitude, speed, focal length, and background. Consequently, targets in the captured images exhibit diverse scales, uneven (dense/sparse) distributions, and category imbalances, which makes accurate detection significantly harder. Furthermore, applications such as reconnaissance and traffic monitoring require guaranteed real-time performance. Trading off accuracy against speed is therefore key to designing a UAV RGB-T object detector.

To address these issues, this paper introduces UAV-RGBT, a large-scale UAV-based RGB-T multispectral dataset that spans seasons and day-night cycles and covers multiple categories and scales. UAV-RGBT comprises 20 categories, 5 117 pairs of RGB-T images, and over 110 000 annotations, and is intended to advance research on UAV-based multispectral object detection algorithms. Based on the YOLOv8n model, the paper also proposes the UAV-based dual-branch multispectral object detection (UAV-DMDet) model. UAV-DMDet promotes deep fusion of multispectral features through a multi-modal cross-attention fusion module and a multi-modal feature decomposition combination module. This approach achieves a better trade-off among model parameter size, detection speed, and accuracy.

Experimental results on the UAV-RGBT dataset compare UAV-DMDet with the single-modality YOLOv8n model. For the visible and thermal modalities respectively, UAV-DMDet improves mAP@0.5 by 3.61% and 11.03%, and mAP@0.5:0.95 by 0.84% and 6.76%. On the DroneVehicle dataset, UAV-DMDet outperforms the mainstream algorithm I²MDet, improving mAP@0.5 by 2.66% and mAP@0.5:0.95 by 12.36%. With 640×640 input images, UAV-DMDet reaches 31 frames per second at FP32 precision on a GeForce RTX 3090 GPU and 58 frames per second at FP16 precision on a Huawei Ascend 710 processor. These speeds make it effectively applicable to real-time UAV-based RGB-T multispectral object detection tasks.
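The results above are reported as mAP@0.5 and mAP@0.5:0.95, which are commonly defined COCO-style. Under mAP@0.5, a detection counts as correct when its IoU with an unmatched ground-truth box of the same class is at least 0.5. mAP@0.5:0.95 averages AP over the ten IoU thresholds 0.50, 0.55, ..., 0.95, so it rewards tighter localization. The minimal single-class sketch below assumes the following:

- axis-aligned (x1, y1, x2, y2) boxes;
- greedy, confidence-ordered matching;
- all-point interpolated AP.

The paper's exact evaluation code, including any oriented-box protocol for DroneVehicle, may differ. The names `iou`, `average_precision`, `ap50`, `ap50_95`, and `IOU_THRESHOLDS` are illustrative. mAP is the mean of such per-class AP values over all categories.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # axis-aligned (x1, y1, x2, y2)
Detection = Tuple[float, Box]            # (confidence, box)

# COCO-style IoU thresholds 0.50, 0.55, ..., 0.95 (rounded to avoid float drift).
IOU_THRESHOLDS = [round(0.5 + 0.05 * i, 2) for i in range(10)]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(preds: Sequence[Detection], gts: Sequence[Box], thr: float) -> float:
    """All-point interpolated AP for a single class at one IoU threshold."""
    if not gts:
        return 0.0
    matched = [False] * len(gts)
    tps: List[int] = []
    # Visit detections from most to least confident.
    for _, box in sorted(preds, key=lambda p: p[0], reverse=True):
        # Greedy match: best still-unmatched ground truth with IoU >= thr.
        best_j, best_iou = -1, thr
        for j, g in enumerate(gts):
            overlap = iou(box, g)
            if not matched[j] and overlap >= best_iou:
                best_j, best_iou = j, overlap
        if best_j >= 0:
            matched[best_j] = True
        tps.append(1 if best_j >= 0 else 0)
    # Cumulative precision and recall after each detection.
    precisions: List[float] = []
    recalls: List[float] = []
    tp_sum = 0
    for k, tp in enumerate(tps, start=1):
        tp_sum += tp
        precisions.append(tp_sum / k)
        recalls.append(tp_sum / len(gts))
    # Interpolate: precision at recall r is the max precision at any recall >= r.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Sum interpolated precision over each recall increment.
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


def ap50(preds: Sequence[Detection], gts: Sequence[Box]) -> float:
    return average_precision(preds, gts, 0.5)


def ap50_95(preds: Sequence[Detection], gts: Sequence[Box]) -> float:
    """Mean of AP over the ten IoU thresholds."""
    return sum(average_precision(preds, gts, t) for t in IOU_THRESHOLDS) / len(IOU_THRESHOLDS)


if __name__ == "__main__":
    gts = [(0.0, 0.0, 10.0, 10.0)]
    preds = [(0.9, (0.0, 0.0, 10.0, 7.75))]  # IoU = 77.5 / 100 = 0.775
    print(ap50(preds, gts))     # 1.0
    print(ap50_95(preds, gts))  # 0.6
```

In this toy case, the single prediction overlaps the ground truth with IoU 0.775. It therefore counts as a hit only at the six thresholds from 0.50 to 0.75. This is why mAP@0.5:0.95 is typically much lower than mAP@0.5 for the same detector.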
GONZÁLEZ A, FANG Z J, SOCARRAS Y, et al. Pedestrian detection at day/night time with visible and FIR cameras: A comparison[J]. Sensors, 2016, 16(6): 820.
HWANG S, PARK J, KIM N, et al. Multispectral pedestrian detection: Benchmark dataset and baseline[C]//2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway: IEEE, 2015: 1037-1045.
SUN Y M, CAO B, ZHU P F, et al. Drone-based RGB-infrared cross-modality vehicle detection via uncertainty-aware learning[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2022, 32(10): 6700-6713.
HAN Y Q, LIU H P, WANG Y F, et al. A comprehensive review for typical applications based upon unmanned aerial vehicle platform[J]. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 2022, 15: 9654-9666.
JOCHER G, CHAURASIA A, QIU J, et al. YOLO by ultralytics[EB/OL]. (2023-01-23)[2024-06-26]. https://github.com/ultralytics/ultralytics.
JOCHER G. YOLOv5 by Ultralytics[EB/OL]. (2020-01-01)[2024-06-26]. https://github.com/ultralytics/yolov5.
LIU W, ANGUELOV D, ERHAN D, et al. SSD: Single Shot MultiBox Detector[M]//Computer Vision - ECCV 2016. Cham: Springer International Publishing, 2016: 21-37.
REN S Q, HE K M, GIRSHICK R, et al. Faster R-CNN: Towards real-time object detection with region proposal networks[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(6): 1137-1149.
CAI Z W, VASCONCELOS N. Cascade R-CNN: Delving into high quality object detection[C]//2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2018: 6154-6162.
DOSOVITSKIY A, BEYER L, KOLESNIKOV A, et al. An image is worth 16x16 words: Transformers for image recognition at scale[EB/OL]. (2021-06-03)[2024-06-26]. https://arxiv.org/abs/2010.11929.
CARION N, MASSA F, SYNNAEVE G, et al. End-to-End Object Detection with Transformers[M]//Computer Vision - ECCV 2020. Cham: Springer International Publishing, 2020: 213-229.
YUAN M X, WEI X X. C²Former: Calibrated and complementary transformer for RGB-infrared object detection[J]. IEEE Transactions on Geoscience and Remote Sensing, 2024, 62: 5403712.
ZHANG N, LIU Y M, LIU H, et al. Oriented infrared vehicle detection in aerial images via mining frequency and semantic information[J]. IEEE Transactions on Geoscience and Remote Sensing, 2023, 61: 5002315.
YUAN M X, WANG Y Y, WEI X X. Translation, scale and rotation: Cross-modal alignment meets RGB-infrared vehicle detection[M]//Computer Vision - ECCV 2022. Cham: Springer Nature Switzerland, 2022: 509-525.
HUANG Z C, LI W, TAO R. Multimodal knowledge distillation for arbitrary-oriented object detection in aerial images[C]//ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Piscataway: IEEE, 2023: 1-5.
WANG A, CHEN H, LIU L H, et al. YOLOv10: Real-time end-to-end object detection[EB/OL]. (2024-05-30)[2024-06-26]. https://arxiv.org/abs/2405.14458v2.
DU D W, QI Y K, YU H Y, et al. The unmanned aerial vehicle benchmark: Object detection and tracking[M]//Computer Vision - ECCV 2018. Cham: Springer International Publishing, 2018: 375-391.
BOZCAN I, KAYACAN E. AU-AIR: A multi-modal unmanned aerial vehicle dataset for low altitude traffic surveillance[C]//2020 IEEE International Conference on Robotics and Automation (ICRA). Piscataway: IEEE, 2020: 8504-8510.
HSIEH M R, LIN Y L, HSU W H. Drone-based object counting by spatially regularized regional proposal network[C]//2017 IEEE International Conference on Computer Vision (ICCV). Piscataway: IEEE, 2017: 4165-4173.
ZHANG W, LIU C S, CHANG F L, et al. Multi-scale and occlusion aware network for vehicle detection and segmentation on UAV aerial images[J]. Remote Sensing, 2020, 12(11): 1760.
ZHU P F, WEN L Y, BIAN X, et al. Vision meets drones: A challenge[EB/OL]. (2018-04-23)[2024-06-26]. https://arxiv.org/abs/1804.07437v2.
ZHANG H J, SUN M S, LI Q, et al. An empirical study of multi-scale object detection in high resolution UAV images[J]. Neurocomputing, 2021, 421: 173-182.
WANG J H, TENG X C, LI Z, et al. VSAI: A multi-view dataset for vehicle detection in complex scenarios using aerial images[J]. Drones, 2022, 6(7): 161.
PORTMANN J, LYNEN S, CHLI M, et al. People detection and tracking from aerial thermal views[C]//2014 IEEE International Conference on Robotics and Automation (ICRA). Piscataway: IEEE, 2014: 1794-1800.
SUO J S, WANG T Y, ZHANG X Z, et al. HIT-UAV: A high-altitude infrared thermal dataset for Unmanned Aerial Vehicle-based object detection[J]. Scientific Data, 2023, 10(1): 227.
ZHANG X W, LI Y P, QI Z S, et al. Learning multi-domain feature relation for visible and Long-wave Infrared image patch matching[EB/OL]. (2023-08-09)[2024-06-26]. https://arxiv.org/abs/2308.04880v1.
LIU S, QI L, QIN H F, et al. Path aggregation network for instance segmentation[C]//2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2018: 8759-8768.
VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[C]//Proceedings of the 31st International Conference on Neural Information Processing Systems. New York: ACM, 2017: 6000-6010.
BA J L, KIROS J R, HINTON G E. Layer normalization[EB/OL]. (2016-07-21)[2024-06-26]. https://arxiv.org/abs/1607.06450v1.
ZHAO Z X, BAI H W, ZHANG J S, et al. CDDFuse: Correlation-driven dual-branch feature decomposition for multi-modality image fusion[C]//2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway: IEEE, 2023: 5906-5916.
ZAMIR S W, ARORA A, KHAN S, et al. Restormer: Efficient transformer for high-resolution image restoration[C]//2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway: IEEE, 2022: 5718-5729.
DINH L, SOHL-DICKSTEIN J, BENGIO S. Density estimation using real NVP[EB/OL]. (2017-02-27)[2024-06-26]. https://arxiv.org/abs/1605.08803v3.
ZHOU M, HUANG J, FANG Y C, et al. Pan-sharpening with customized transformer and invertible neural network[J]. Proceedings of the AAAI Conference on Artificial Intelligence, 2022, 36(3): 3553-3561.
HOWARD A, SANDLER M, CHEN B, et al. Searching for MobileNetV3[C]//2019 IEEE/CVF International Conference on Computer Vision (ICCV). Piscataway: IEEE, 2019: 1314-1324.
HU J, SHEN L, SUN G. Squeeze-and-excitation networks[C]//2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2018: 7132-7141.
ZHAO Y A, LV W Y, XU S L, et al. DETRs beat YOLOs on real-time object detection[EB/OL]. (2024-04-03)[2024-06-26]. https://arxiv.org/abs/2304.08069v3.
SHEN J F, CHEN Y F, LIU Y, et al. ICAFusion: Iterative cross-attention guided feature fusion for multispectral object detection[J]. Pattern Recognition, 2024, 145: 109913.
WANG J Z, TIAN X T, DAI S, et al. RGB-T object detection via group shuffled multi-receptive attention and multi-modal supervision[EB/OL]. (2024-05-29)[2024-06-26]. https://arxiv.org/abs/2405.18955v1.
XIE X X, CHENG G, WANG J B, et al. Oriented R-CNN for object detection[C]//2021 IEEE/CVF International Conference on Computer Vision (ICCV). Piscataway: IEEE, 2021: 3500-3509.
DING J, XUE N, LONG Y, et al. Learning RoI transformer for oriented object detection in aerial images[C]//2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway: IEEE, 2019: 2844-2853.
ZHANG L, LIU Z Y, ZHANG S F, et al. Cross-modality interactive attention network for multispectral pedestrian detection[J]. Information Fusion, 2019, 50: 20-29.
ZHANG L, ZHU X Y, CHEN X Y, et al. Weakly aligned cross-modal learning for multispectral pedestrian detection[C]//2019 IEEE/CVF International Conference on Computer Vision (ICCV). Piscataway: IEEE, 2019: 5127-5137.
WANG D, ZHANG Q M, XU Y F, et al. Advancing plain vision transformer toward remote sensing foundation model[J]. IEEE Transactions on Geoscience and Remote Sensing, 2022, 61: 5607315.
WU Y F, GUAN X R, ZHAO B Y, et al. Vehicle detection based on adaptive multimodal feature fusion and cross-modal vehicle index using RGB-T images[J]. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 2023, 16: 8166-8177.