Rapid Search for Small Object in Reinforcement Learning by Combining Spatio-Temporal Contextual Information

JIANG Hong; MA Jiao-jiao; YAO Hong-ge; CHENG Si-yi; CHEN You; YU Jun

doi:10.12263/DZXB.20220617


PAPERS | Updated: 2025-12-08

- ACTA ELECTRONICA SINICA Vol. 51, Issue 11, Pages: 3176-3186(2023)
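- Author affiliations:
  1. School of Computer Science and Engineering, Xi'an Technological University, Xi'an, Shaanxi 710021
  2. Aviation Engineering School, Air Force Engineering University, Xi'an, Shaanxi 710038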
- DOI: 10.12263/DZXB.20220617
  CLC: TP391.4
- Received: 27 May 2022
  Revised: 31 August 2022
  Published: 25 November 2023
JIANG Hong, MA Jiao-jiao, YAO Hong-ge, et al. Rapid Search for Small Object in Reinforcement Learning by Combining Spatio-Temporal Contextual Information[J]. ACTA ELECTRONICA SINICA, 2023, 51(11): 3176-3186. DOI: 10.12263/DZXB.20220617.

Abstract

When searching for an object, the human eye first scans roughly, guided by previous scanning experience, to find locations where the object may be, and then searches those locations in detail. The former can be described as scanning based on temporal contextual information, the latter as searching based on location contextual information. Inspired by this search pattern, this paper proposes a rapid search method for small objects based on reinforcement learning that integrates spatio-temporal contextual information. The method builds a temporal context module on a reinforcement learning search strategy to simulate the human eye's ability to obtain and use empirical information, then constructs an adaptive multi-scale window to extract location context information, simulating the eye's ability to search carefully at candidate locations. The two kinds of information alternate and cooperate during the search to locate the object. Experimental results show that the proposed method achieves a gain of about 2.9% over the baseline method on the MS COCO benchmark and can find an object within five search steps.
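The alternation described in the abstract, a coarse step informed by search history followed by a fine step over windows of several sizes, can be illustrated with a toy loop. This is only a sketch under strong assumptions and is not the paper's method: the learned reinforcement learning policy is replaced by a greedy choice over a coarse grid, the windows use fixed rather than adaptive scales, and `score_map` is a hypothetical per-pixel objectness map. The names `multi_scale_windows` and `toy_search`, the cell size, and the threshold are all invented for illustration; only the five-step budget echoes the abstract.

```python
import numpy as np


def multi_scale_windows(center, shape, scales=(4, 8, 16)):
    """Square windows (top, left, bottom, right) of several sizes around
    `center`, clipped to the image. Fixed scales are an assumption here."""
    h, w = shape
    cy, cx = center
    windows = []
    for s in scales:
        half = s // 2
        windows.append((max(0, cy - half), max(0, cx - half),
                        min(h, cy + half), min(w, cx + half)))
    return windows


def toy_search(score_map, cell=8, max_steps=5, threshold=0.9):
    """Alternate a history-aware coarse step with a multi-scale fine step.

    Returns (location, steps_used); location is None if nothing scores at
    least `threshold` within `max_steps` steps.
    """
    h, w = score_map.shape
    # Coarse view: mean score per cell, standing in for the rough scan.
    cropped = score_map[:h - h % cell, :w - w % cell]
    coarse = cropped.reshape(h // cell, cell, w // cell, cell).mean(axis=(1, 3))
    visited = set()  # search history, a crude stand-in for temporal context
    for step in range(1, max_steps + 1):
        # Coarse step: best cell not yet visited (greedy, not a learned policy).
        masked = coarse.copy()
        for i, j in visited:
            masked[i, j] = -np.inf
        i, j = np.unravel_index(np.argmax(masked), masked.shape)
        visited.add((int(i), int(j)))
        center = (int(i) * cell + cell // 2, int(j) * cell + cell // 2)
        # Fine step: inspect windows of increasing size around the cell centre.
        for top, left, bottom, right in multi_scale_windows(center, score_map.shape):
            patch = score_map[top:bottom, left:right]
            if patch.max() >= threshold:
                y, x = np.unravel_index(np.argmax(patch), patch.shape)
                return (top + int(y), left + int(x)), step
    return None, max_steps


# Usage: a 32x32 map with a single strong response at row 20, column 5.
demo = np.zeros((32, 32))
demo[20, 5] = 1.0
location, steps = toy_search(demo)
```

In this toy setting the history only prevents revisiting a cell; a learned policy would instead condition its next move on what earlier glimpses revealed.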


