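1. College of Computer Science and Technology, Civil Aviation University of China, Tianjin 300300, China
2. School of Computer Science and Engineering, Beihang University, Beijing 100191, China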
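ZHANG Zhi, male, was born in Datong, Shanxi Province, in December 1993. He is an experimentalist at the College of Computer Science and Technology, Civil Aviation University of China. His main research interest is object detection in video images. E-mail: zhangz@cauc.edu.cn
YI Hua-hui, male, was born in Nanchong, Sichuan Province, in November 1997. He is an undergraduate student at Civil Aviation University of China. His main research interest is object detection. E-mail: huahui_yi@163.com
ZHENG Jin (corresponding author), female, was born in Leshan, Sichuan Province, in October 1978. She is an associate professor and doctoral supervisor at the School of Computer Science and Engineering, Beihang University. Her main research interests include computer vision and video image processing.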
Received: 2022-03-26
Revised: 2022-05-10
Published in print: 2023-04-25
ZHANG Zhi, YI Hua-hui, ZHENG Jin. Focusing on Small Objects Detector in Aerial Images[J]. Acta Electronica Sinica, 2023, 51(04): 944-955. DOI: 10.12263/DZXB.20220313.
Unlike general object detection in natural images, object detection in unmanned aerial vehicle (UAV) aerial images faces two main challenges: (1) long-range observation produces a large number of small objects that are difficult to distinguish from the background; (2) objects are densely packed and heavily occluded in many regions. Applying a general-purpose object detector directly to aerial images therefore degrades detection accuracy. This paper proposes an aerial image object detection algorithm that focuses on small objects (Focusing on Small objects Detector in aerial images, FocSDet). For small objects, a small object feature aggregation network is built as the backbone of FocSDet: two Swin-Transformer backbones are connected in the dense higher-level composition (DHLC) mode and combined with feature pyramid networks (FPN). This design enriches single-layer feature expression and improves the use of global image information, yielding better feature descriptions of small objects without losing the semantic information of large objects, and thus effectively improves small object detection. For dense regional occlusion, a task-balanced label assignment strategy is proposed. Unlike existing label assignment strategies, whose matching quality score depends only on localization, the proposed score combines localization information with the predicted classification score. Label assignment and the supervision of network optimization are updated iteratively on the basis of this score, giving higher-quality predictions. Finally, layer attention is introduced into the classification and regression branches of the detection head to form an enhanced detection head, which further improves small object detection. Experiments on the VisDrone UAV dataset and the CARPK aerial dataset show that, compared with the existing methods ATSS and VFNet, FocSDet improves average precision (AP) on VisDrone by 2% and 0.6%, respectively, and small-object AP (AP_s) by 2.6% and 1.2%; on CARPK, AP improves by 2.2% and 1.7%, and AP_s by 5.2% and 5.0%.
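The label assignment idea in the abstract, a matching quality score built from both localization and predicted classification, can be illustrated with a minimal Python sketch. The abstract does not give the formula, so the multiplicative form `cls_score**alpha * IoU**beta`, the exponents `alpha` and `beta`, the top-k selection, and all function names below are assumptions made for illustration, not the authors' method.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def matching_quality(cls_score: float, pred_box: Box, gt_box: Box,
                     alpha: float = 1.0, beta: float = 1.0) -> float:
    """Hypothetical quality score (assumed form): cls_score**alpha * IoU**beta."""
    return (cls_score ** alpha) * (iou(pred_box, gt_box) ** beta)


def assign_positives(cls_scores: List[float], pred_boxes: List[Box],
                     gt_box: Box, top_k: int = 2) -> List[int]:
    """Return indices of the top_k candidates ranked by matching quality."""
    scores = [matching_quality(s, b, gt_box)
              for s, b in zip(cls_scores, pred_boxes)]
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:top_k]


# Toy example: one ground-truth box and three candidate predictions.
gt = (0.0, 0.0, 10.0, 10.0)
boxes = [(0.0, 0.0, 10.0, 10.0),   # perfect localization, weak class score
         (1.0, 1.0, 10.0, 10.0),   # good localization, strong class score
         (5.0, 5.0, 15.0, 15.0)]   # poor localization, strong class score
cls = [0.2, 0.9, 0.95]
positives = assign_positives(cls, boxes, gt, top_k=2)
```

In this toy case a localization-only ranking would put candidate 0 first, while the combined score prefers candidate 1, which is good on both tasks. In an actual detector the score would be recomputed from the current predictions at each training iteration, so the assignment changes as the network improves.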