Open World Object Detection Based on Causal Prompt Distillation

ZHAO Jia-qi; WANG Ping-an; ZHOU Yong; DU Wen-liang; YAO Rui; LIU Bing

doi:10.12263/DZXB.20250211

PAPERS | Updated: 2025-10-16

- Acta Electronica Sinica, Vol. 53, Issue 6, Pages: 2079-2089 (2025)
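- Affiliations:
  
  1. School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, Jiangsu 221116, China
  2. Engineering Research Center of Mine Digitalization, Ministry of Education, Xuzhou, Jiangsu 221116, China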
- Funding:
  
  National Natural Science Foundation of China (62272461; 62172417; 62276266; 62277046); Double First-Class Project of China University of Mining and Technology for Independent Innovation and Social Service (2022ZZCX06); Six Talent Peaks Project in Jiangsu Province (2015-DZXX-010; 2018-XYDXX-044)
- CLC: TP391
- Received: 23 March 2025; Revised: 7 May 2025; Published: 25 June 2025
ZHAO Jia-qi, WANG Ping-an, ZHOU Yong, et al. Open World Object Detection Based on Causal Prompt Distillation[J]. Acta Electronica Sinica, 2025, 53(06): 2079-2089.
Abstract

Open world object detection aims to recognize both known and unknown categories in dynamic environments and, once labels for the unknown categories become available, to incrementally learn to recognize these newly added categories. However, existing methods lack the ability to represent the semantics of unknown categories, and the guidance information for known and unknown categories is mutually coupled, which limits detection performance. To address this problem, this paper proposes an open world object detection method based on causal prompt distillation, which combines a vision-language model with causal inference to resolve the semantic bias between categories in open scenes. Specifically, by constructing a structural causal model, the paper reveals, from a causal perspective, the semantic interference path between known and unknown categories. It then proposes causal prompt learning, which generates semantic vectors for unknown categories and thereby explicitly introduces a semantic prior of the open scene, strengthening the model's perception of unknown objects. Finally, to address semantic bias during knowledge transfer, it proposes a causal distillation mechanism that uses a dual distillation loss to decouple the teacher model's guidance for known categories from its guidance for unknown categories. Experimental results show that the method performs well on multiple datasets, improving the mean average precision (mAP) of known categories by 1.3% and the recall of unknown categories (U-Recall) by 6.5%. These results validate the effectiveness and robustness of the proposed approach.
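The abstract does not spell out the form of the dual distillation loss. As a minimal sketch, assuming the detector produces per-region class logits in which one index is reserved for the "unknown" class, the teacher's guidance can be split into two KL terms: one over the binary known-vs-unknown split, and one over the known classes alone after renormalization, so that the unknown score cannot leak into the known-class guidance. This is the algebraic split used by decoupled knowledge distillation (DKD), applied here to known/unknown rather than target/non-target classes; the function names, `unknown_index`, `alpha`, `beta`, and the temperature are hypothetical and are not taken from the paper.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax with temperature scaling."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """KL(p || q) for two discrete distributions over the same classes."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def dual_distillation_loss(
    teacher_logits: Sequence[float],
    student_logits: Sequence[float],
    unknown_index: int,
    temperature: float = 2.0,
    alpha: float = 1.0,  # hypothetical weight of the known-class term
    beta: float = 1.0,   # hypothetical weight of the known-vs-unknown term
) -> float:
    """Hypothetical decoupled KD loss for one region's class logits (not the paper's code)."""
    t = softmax(teacher_logits, temperature)
    s = softmax(student_logits, temperature)

    # Term 1: binary {known, unknown} distribution, i.e. how much
    # probability mass the teacher assigns to the unknown class.
    t_split = [1.0 - t[unknown_index], t[unknown_index]]
    s_split = [1.0 - s[unknown_index], s[unknown_index]]
    loss_split = kl_divergence(t_split, s_split)

    # Term 2: distribution renormalized over the known classes only, so the
    # unknown-class score does not enter the known-class guidance.
    t_known = [p for i, p in enumerate(t) if i != unknown_index]
    s_known = [p for i, p in enumerate(s) if i != unknown_index]
    t_sum, s_sum = sum(t_known), sum(s_known)
    t_known = [p / t_sum for p in t_known]
    s_known = [p / s_sum for p in s_known]
    loss_known = kl_divergence(t_known, s_known)

    # T^2 scaling keeps gradient magnitudes comparable across temperatures (Hinton et al.).
    return temperature ** 2 * (alpha * loss_known + beta * loss_split)
```

Setting `alpha = 1 - t_u` (the teacher's unknown probability) and `beta = 1` recovers ordinary temperature-scaled KD, because KL(t || s) = KL(t_split || s_split) + (1 - t_u) * KL(t_known || s_known). Choosing the two weights independently is what decouples the known-class signal from the unknown-class signal.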
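U-Recall, the unknown-category metric reported above, is the fraction of ground-truth unknown objects that the detector recovers with an "unknown" prediction. The sketch below is a minimal single-image version, assuming axis-aligned `(x1, y1, x2, y2)` boxes, an IoU threshold of 0.5, and greedy score-ordered matching; benchmark evaluation code aggregates over the whole test set and may differ in matching details, and the function names are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def unknown_recall(
    pred_unknown: List[Tuple[Box, float]],  # (box, score) predictions labeled "unknown"
    gt_unknown: List[Box],                  # ground-truth boxes of unknown objects
    iou_thr: float = 0.5,
) -> float:
    """Fraction of ground-truth unknown boxes matched by an 'unknown' prediction."""
    if not gt_unknown:
        return 0.0  # recall is undefined without ground truth; report 0 by convention here
    matched = [False] * len(gt_unknown)
    # Greedy matching: higher-scoring predictions claim ground truths first,
    # and each ground truth can be matched at most once.
    for box, _score in sorted(pred_unknown, key=lambda p: p[1], reverse=True):
        best_j, best_iou = -1, iou_thr
        for j, gt in enumerate(gt_unknown):
            if matched[j]:
                continue
            overlap = iou(box, gt)
            if overlap >= best_iou:
                best_j, best_iou = j, overlap
        if best_j >= 0:
            matched[best_j] = True
    return sum(matched) / len(gt_unknown)
```

Matching each ground truth at most once mirrors VOC-style evaluation, so duplicate predictions on the same unknown object do not inflate recall.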
