Towards Incremental Object Detection via Hierarchical Proposal and Decoupled Supervision

LIANG Jia-wei; LIANG Si-yuan; CHEN Ruo-yu; LIU Kuan-rong; HUANG Jian-jie; CAO Xiao-chun

doi:10.12263/DZXB.20250673

PAPERS | Updated: 2026-04-24

- ACTA ELECTRONICA SINICA Vol. 53, Issue 12, Pages: 4494-4506 (2025)
- Author affiliations:
  1. School of Cyber Science and Technology, Sun Yat-sen University, Shenzhen, Guangdong 518102, China
  2. College of Computing and Data Science, Nanyang Technological University, Singapore 639798
  3. Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100093, China
  4. School of Cyber Security, University of Chinese Academy of Sciences, Beijing 100049, China
- Funding: National Natural Science Foundation of China (62025604)
- CLC: TP391.4
- Received: 01 August 2025; Accepted: 22 December 2025; Published: 25 December 2025
LIANG Jia-wei, LIANG Si-yuan, CHEN Ruo-yu, et al. Towards Incremental Object Detection via Hierarchical Proposal and Decoupled Supervision[J]. Acta Electronica Sinica, 2025, 53(12): 4494-4506. DOI: 10.12263/DZXB.20250673

Abstract

Incremental object detection (IOD) aims to enable models to continuously learn to recognize and precisely localize new categories from streaming data while maintaining detection performance on previously learned classes. However, mainstream object detectors are prone to catastrophic forgetting during incremental training: when fine-tuned only on data annotated for new classes, their performance on old classes degrades significantly. Existing methods mostly rely on knowledge distillation or exemplar replay to mitigate forgetting, but generally overlook two critical challenges: first, label assignment conflicts in region proposal generation, and second, the overfitting risk induced by hard-label supervision on a limited set of old samples. This paper points out that existing methods adopt inconsistent label assignment strategies in the proposal generation stage: new-category and background proposals are matched by their intersection over union (IoU) with ground-truth annotations, whereas old-category proposals are inferred from the confidence of the old model. When the two types of proposals overlap spatially, the same candidate region may be assigned contradictory labels, so the classification and regression tasks receive conflicting supervision signals that interfere with effective training. Furthermore, even when a few old samples are replayed, supervising them with hard labels makes the model prone to overfitting on the small subset; the model then struggles to reproduce the generalization ability it gained on the original large-scale old dataset, which in turn weakens the preservation of old knowledge. To address these issues, we propose a decoupled learning framework for incremental object detection. First, a hierarchically decoupled region proposal assignment mechanism performs mutually exclusive screening of overlapping regions according to the priority order “new categories → old categories → background”, eliminating label conflicts at the source. Second, a dual-path decoupled supervision strategy is introduced: new-category and background regions are trained with ground-truth annotations, with background regions supervised under an unbiased background definition, while all old-category regions, regardless of whether they are explicitly labeled in replayed samples, are supervised solely through knowledge distillation, which aligns their prediction distributions with the old model’s outputs. This avoids the local overfitting induced by hard labels and helps keep supervision consistent and learning stable throughout detector training. Experiments on the Pascal VOC and MS COCO benchmarks show that the proposed method outperforms state-of-the-art (SOTA) methods in both single-step and multi-step incremental settings. Notably, in multi-step scenarios, it improves mean average precision (mAP) by more than 2.0% and 2.9%, respectively, validating its advantage in jointly preserving old knowledge and learning new tasks. This work not only enhances the continual learning capability of IOD but also reveals the critical role that the coordinated design of proposal generation and supervision strategies plays in mitigating catastrophic forgetting.
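The two mechanisms described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the IoU threshold of 0.5, the use of old-model detections as "pseudo boxes", and the plain cross-entropy used for new and background regions are all assumptions made for illustration, and the paper's unbiased background definition is not reproduced here. The sketch only shows the general idea: a proposal is labeled by priority (new, then old, then background), and old-class regions receive a distillation loss rather than a hard-label loss.

```python
import math


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def assign_proposals(proposals, new_gt_boxes, old_pseudo_boxes, iou_thr=0.5):
    """Give each proposal exactly one label, checked in priority order.

    Hypothetical helper: 'new' wins over 'old', and 'old' wins over
    'background', so a region overlapping both a new-class ground-truth box
    and an old-model detection never receives two contradictory labels.
    """
    labels = []
    for p in proposals:
        if any(iou(p, g) >= iou_thr for g in new_gt_boxes):
            labels.append("new")
        elif any(iou(p, o) >= iou_thr for o in old_pseudo_boxes):
            labels.append("old")
        else:
            labels.append("background")
    return labels


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def decoupled_loss(label, student_logits, teacher_logits=None, target_idx=None):
    """Pick the supervision path from the proposal label (assumed design).

    'old' regions: soft supervision only, KL between the old model's
    (teacher) distribution and the current model's (student) distribution.
    'new' / 'background' regions: hard-label cross-entropy on target_idx.
    """
    probs = softmax(student_logits)
    if label == "old":
        return kl_divergence(softmax(teacher_logits), probs)
    return -math.log(probs[target_idx] + 1e-12)


# Toy example: the last proposal overlaps both a new-class ground-truth box
# and an old-model detection; the priority order resolves it to 'new'.
proposals = [(0, 0, 10, 10), (20, 20, 30, 30), (50, 50, 60, 60), (1, 1, 10, 10)]
new_gt = [(0, 0, 10, 10)]
old_pseudo = [(20, 20, 30, 30), (0, 0, 9, 9)]
labels = assign_proposals(proposals, new_gt, old_pseudo)
print(labels)  # ['new', 'old', 'background', 'new']
```

In this toy setup, the fourth proposal has IoU 0.81 with the new-class box and roughly 0.65 with an old-model detection, so an assignment without a priority order could label it twice; the ordered check gives it a single label.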
