A Survey of Generic Object Detection Methods Based on Deep Learning

CHENG Xu; SONG Chen; SHI Jin-gang; ZHOU Lin; ZHANG Yi-feng; ZHENG Yu-hui

doi:10.12263/DZXB.20200570


SURVEYS AND REVIEWS | Updated: 2025-12-08

- ACTA ELECTRONICA SINICA Vol. 49, Issue 7, Pages: 1428-1438(2021)
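- Author affiliations:
  1. School of Computer and Software, Nanjing University of Information Science and Technology, Nanjing, Jiangsu 210044
  2. School of Software, Xi'an Jiaotong University, Xi'an, Shaanxi 710049
  3. School of Information Science and Engineering, Southeast University, Nanjing, Jiangsu 210096
- DOI: 10.12263/DZXB.20200570
  CLC: TP391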
- Received: 15 June 2020; Revised: 15 January 2021; Published: 25 July 2021

CHENG Xu, SONG Chen, SHI Jin-gang, et al. A Survey of Generic Object Detection Methods Based on Deep Learning[J]. ACTA ELECTRONICA SINICA, 2021, 49(07): 1428-1438. DOI: 10.12263/DZXB.20200570.

Abstract

Object detection is one of the most fundamental and important tasks in computer vision, and it underpins high-level vision tasks such as behavior recognition and human-computer interaction. With the development of deep learning, the accuracy and efficiency of object detectors have improved greatly. Compared with traditional object detection algorithms, deep learning exploits powerful hierarchical feature extraction and learning capabilities to achieve breakthroughs in detector performance. Meanwhile, the emergence of large-scale datasets and the tremendous growth in GPU computing power have also driven rapid progress in this field. This paper reviews existing research on deep-learning-based object detection in detail. First, we review traditional object detection algorithms and their problems. Then we summarize the region-proposal-based and single-stage benchmark detectors built on deep learning. After that, we categorize current mainstream object detectors from eight perspectives: feature maps, context modeling, bounding box optimization, region proposals, class imbalance handling, training strategies, weakly supervised learning, and unsupervised learning. Finally, we discuss open problems in object detection and outline future research directions.


