WANG Jin-zhong, DAI Shun, ZHANG Xiu-wei, TIAN Xue-tao, XING Yin-hui, WANG Fang, YIN Han-lin, ZHANG Yan-ning
Online available: 2025-04-21
Unmanned aerial vehicle (UAV)-based multispectral object detection, which uses both visible (RGB) and thermal infrared (T) images, makes all-weather, all-day target monitoring possible and plays a critical role in military and civilian applications. However, because data acquisition and processing are complex, publicly available UAV-based RGB-T multispectral object detection datasets are scarce, which limits research and application in this area. Meanwhile, UAV operating scenarios are complex and variable, with rapid changes in flight altitude, speed, focal length, and background. As a result, captured targets exhibit diverse scales, uneven (dense or sparse) distributions, and category imbalance, all of which pose significant challenges for accurate detection. Furthermore, real-time performance must be guaranteed in applications such as reconnaissance and traffic monitoring. Balancing accuracy against speed is therefore key to the design of a UAV RGB-T object detector. To address these issues, this paper introduces a large-scale UAV-based RGB-T multispectral dataset named UAV-RGBT, which spans seasons and day-night cycles and covers multiple categories and scales. Specifically, UAV-RGBT comprises 20 categories, 5 117 pairs of RGB-T images, and over 110 000 annotations, providing a foundation for research on UAV-based multispectral object detection algorithms. Moreover, building on the YOLOv8n model, a UAV-based dual-branch multispectral object detection (UAV-DMDet) model is proposed, which promotes deep fusion of multispectral features through a multi-modal cross-attention fusion module and a multi-modal feature decomposition combination module. This approach achieves a better trade-off among model parameter size, detection speed, and accuracy.
Experimental results demonstrate that the UAV-DMDet model improves mAP@0.5 on the UAV-RGBT dataset by 3.61% and 11.03% in the visible and thermal modalities, respectively, and improves mAP@0.5:0.95 by 0.84% and 6.76%, respectively. On the DroneVehicle dataset, the UAV-DMDet model outperforms the mainstream algorithm I2MDet, with mAP@0.5 and mAP@0.5:0.95 improvements of 2.66% and 12.36%, respectively. Furthermore, with 640×640 resolution images as input, the UAV-DMDet model achieves an FP32 inference speed of 31 frames per second on a GeForce RTX 3090 GPU and an FP16 inference speed of 58 frames per second on a Huawei Ascend 710 processor, making it well suited to real-time UAV-based RGB-T multispectral object detection tasks.
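As a reading aid for the metrics quoted above, the following minimal sketch shows the intersection-over-union (IoU) test that underlies mAP@0.5 and mAP@0.5:0.95. It assumes axis-aligned boxes in (x1, y1, x2, y2) form and the common COCO-style threshold grid of 0.50 to 0.95 in steps of 0.05. The function names are hypothetical and are not taken from the paper, and the sketch covers only the matching criterion, not the full average-precision computation or the paper's evaluation code.

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    # Corners of the overlap rectangle (may be empty).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Assumed COCO-style grid for mAP@0.5:0.95: 0.50, 0.55, ..., 0.95.
IOU_THRESHOLDS = [round(0.5 + 0.05 * i, 2) for i in range(10)]


def thresholds_passed(pred, gt):
    """IoU thresholds at which `pred` would count as a match for `gt`."""
    iou = box_iou(pred, gt)
    return [t for t in IOU_THRESHOLDS if iou >= t]


if __name__ == "__main__":
    pred = (0, 0, 100, 100)
    gt = (0, 0, 100, 82)
    print(box_iou(pred, gt))
    print(thresholds_passed(pred, gt))
```

Under this convention, mAP@0.5 counts a detection as correct once its IoU with a ground-truth box reaches 0.5, while mAP@0.5:0.95 averages over the stricter thresholds as well, which is why the latter rewards tighter localization.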