UAV-RGBT Multispectral Object Detection Dataset and Algorithm Benchmark

WANG Jin-zhong; DAI Shun; ZHANG Xiu-wei; TIAN Xue-tao; XING Ying-hui; WANG Fang; YIN Han-lin; ZHANG Yan-ning

doi:10.12263/DZXB.20240602

Intelligent Vision Algorithms for Unmanned Systems | Updated: 2025-12-08

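- UAV-RGBT Multispectral Object Detection Dataset and Algorithm Benchmark
- Acta Electronica Sinica, 2025, Vol. 53, No. 3, pp. 686-704
- Author affiliations:
  
  1. School of Computer Science, Northwestern Polytechnical University, Xi'an, Shaanxi 710072
  2. Xi'an ASN Technology Group Co., Ltd., Xi'an, Shaanxi 710065
  3. Shenzhen Research Institute of Northwestern Polytechnical University, Shenzhen, Guangdong 518063
- Author biographies:
  
  WANG Jin-zhong, male, born in 1995 in Minqin, Gansu. He is currently a master's student at the School of Computer Science, Northwestern Polytechnical University. His research interests include image processing, multispectral object detection, and deep learning. E-mail: wangjinzhong@mail.nwpu.edu.cn
  ZHANG Xiu-wei, female, born in 1981 in Tacheng, Xinjiang. She is currently a professor at the School of Computer Science, Northwestern Polytechnical University. Her research interests include computer vision, collaborative processing of multi-source information, and deep learning. E-mail: xwzhang@nwpu.edu.cn
- Funding:
  
  National Natural Science Foundation of China (61971356); Natural Science Basic Research Program of Shaanxi Province (2024JC-DXWT-07; 2024JC-YBQN-0719); Key Research and Development Program of Shaanxi Province (2023-YBGY-012); Guangdong Basic and Applied Basic Research Foundation (2024A1515030186)
- DOI: 10.12263/DZXB.20240602
  Chinese Library Classification (CLC) number: TP389.1; TP391.4
- Received: 2024-06-26; revised: 2024-09-29; published in print: 2025-03-25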

Cite as: WANG Jin-zhong, DAI Shun, ZHANG Xiu-wei, et al. UAV-RGBT Multispectral Object Detection Dataset and Algorithm Benchmark[J]. Acta Electronica Sinica, 2025, 53(03): 686-704. DOI: 10.12263/DZXB.20240602

Abstract

Unmanned aerial vehicle (UAV)-based multispectral object detection using both visible (RGB) and thermal infrared (T) images enables all-day, all-weather target reconnaissance and has important applications in both military and civilian domains. However, because data acquisition and processing are complex, few UAV-based RGB-T multispectral object detection datasets are publicly available, which to some extent limits research on and application of such algorithms. Meanwhile, UAV operating scenarios are complex and variable: flight altitude, speed, focal length, and background change rapidly, so the captured targets exhibit diverse scales, uneven (dense/sparse) distributions, and category imbalance, all of which make accurate detection challenging. Furthermore, time-critical applications such as reconnaissance and traffic monitoring require real-time detection at high accuracy, so the design of a UAV RGB-T object detector must balance accuracy against speed. To address these issues, this paper introduces UAV-RGBT, a large-scale UAV-based RGB-T multispectral dataset that spans seasons and day-night cycles and covers multiple categories and scales. UAV-RGBT comprises 20 categories, 5 117 pairs of RGB-T images, and more than 110 000 annotations, and is intended to advance research on UAV-based multispectral object detection algorithms. In addition, building on the YOLOv8n model, the paper proposes the UAV-based dual-branch multispectral object detection (UAV-DMDet) model, which promotes deep fusion of multispectral features through a multi-modal cross-attention fusion module and a multi-modal feature decomposition and combination module, achieving a better trade-off among parameter count, detection speed, and accuracy. Experimental results show that, on the UAV-RGBT dataset, UAV-DMDet improves mAP@0.5 over the single-modality YOLOv8n model by 3.61% and 11.03% for the visible and thermal modalities, respectively, and improves mAP@0.5:0.95 by 0.84% and 6.76%, respectively. On the DroneVehicle dataset, UAV-DMDet outperforms the mainstream algorithm I²MDet by 2.66% in mAP@0.5 and 12.36% in mAP@0.5:0.95. In terms of speed, with 640×640 input images, UAV-DMDet reaches 31 frames per second at FP32 precision on a single GeForce RTX 3090 GPU and 58 frames per second at FP16 precision on a Huawei Ascend 710 processor, making it suitable for real-time UAV-based RGB-T multispectral object detection.
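The two accuracy figures quoted in the abstract, mAP@0.5 and mAP@0.5:0.95, differ in how strictly a predicted box must overlap a ground-truth box before it counts as a match. The sketch below is only an illustration of that matching criterion, not the paper's evaluation code. It assumes axis-aligned boxes in `(x1, y1, x2, y2)` form and the common convention of ten IoU thresholds from 0.50 to 0.95 in steps of 0.05; the helper names `iou` and `thresholds_matched` are hypothetical. Full mAP additionally averages precision over recall levels and over classes, which is omitted here.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    # Corners of the overlapping rectangle (may be empty).
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Assumed convention: thresholds 0.50, 0.55, ..., 0.95 for mAP@0.5:0.95.
IOU_THRESHOLDS = [round(0.50 + 0.05 * i, 2) for i in range(10)]


def thresholds_matched(pred_box, gt_box):
    """Return the IoU thresholds at which the prediction counts as a match."""
    value = iou(pred_box, gt_box)
    return [t for t in IOU_THRESHOLDS if value >= t]


if __name__ == "__main__":
    gt = (0, 0, 100, 100)
    pred = (10, 0, 110, 100)  # same size, shifted 10 px to the right
    print(f"IoU = {iou(pred, gt):.4f}")
    print("matched at:", thresholds_matched(pred, gt))
```

A prediction shifted by 10 pixels on a 100-pixel box has an IoU of 9000/11000, so it counts as correct under mAP@0.5 but fails the strictest thresholds. This is why mAP@0.5:0.95 rewards tighter localization and is typically lower than mAP@0.5.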
