

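1. School of Computer and Software, Nanjing University of Information Science and Technology, Nanjing, Jiangsu 210044
2. School of Software Engineering, Xi'an Jiaotong University, Xi'an, Shaanxi 710049
3. School of Information Science and Engineering, Southeast University, Nanjing, Jiangsu 210096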
Received: 15 June 2020
Revised: 15 January 2021
Published: 25 July 2021
CHENG Xu, SONG Chen, SHI Jin-gang, et al. A Survey of Generic Object Detection Methods Based on Deep Learning[J]. Acta Electronica Sinica, 2021, 49(07): 1428-1438. DOI: 10.12263/DZXB.20200570.
Object detection is one of the most fundamental and important tasks in computer vision, and it underpins high-level vision tasks such as behavior recognition and human-computer interaction. With the development of deep learning, the accuracy and efficiency of object detectors have improved greatly. Compared with traditional object detection algorithms, deep learning exploits powerful hierarchical feature extraction and learning capabilities to achieve breakthroughs in detection performance. Meanwhile, the emergence of large-scale datasets and the tremendous increase in GPU computing power have also driven rapid progress in this field. This paper reviews existing research on deep-learning-based object detection in detail. First, we review traditional object detection algorithms and their problems. Next, we summarize the region-proposal-based and single-stage benchmark detectors built on deep learning. We then categorize and summarize current mainstream object detectors from eight perspectives: feature maps, context modeling, bounding box optimization, region proposals, class imbalance handling, training strategies, weakly supervised learning, and unsupervised learning. Finally, we discuss open problems in object detection and outline future research directions.