AI-DETR: Interpretable Object Detection Method Based on Adaptive Weighting

LU Yin-yuan; XU Sheng-quan; XIE Juan-ying

doi:10.12263/DZXB.20250038

Papers | Updated: 2025-12-10

- Acta Electronica Sinica, Vol. 53, Issue 7, pp. 2279-2304 (2025)
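- Affiliations:
  1. School of Computer Science, Shaanxi Normal University, Xi'an, Shaanxi 710119
  2. College of Life Sciences, Shaanxi Normal University, Xi'an, Shaanxi 710119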
- Funding: National Natural Science Foundation of China (62076159)
- CLC: TP391.4
- Received: 09 January 2025; Accepted: 30 April 2025; Published: 25 July 2025
Cite as: LU Yin-yuan, XU Sheng-quan, XIE Juan-ying. AI-DETR: Interpretable Object Detection Method Based on Adaptive Weighting[J]. Acta Electronica Sinica, 2025, 53(07): 2279-2304. DOI: 10.12263/DZXB.20250038

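Abstract

The DEtection TRansformer (DETR) is a research hotspot in computer vision, multimodal learning, and related fields. However, three defects severely degrade its performance: learning bias in the decoder propagates from layer to layer, the cross-attention of every decoder layer uses the same reference points, and the features output by the encoder are semantically ambiguous. To address these defects, this paper takes Conditional DETR as the baseline model and decouples its cross-attention mechanism into two parts, the attention weights and the value vectors. First, an Inter-layer Adaptive Attention Weight Refinement (IAAWR) method dynamically adjusts the cross-attention weights of the different decoder layers, weakening the inter-layer propagation of learning bias. Second, an Adaptive Feature Enhancement (AFE) method for the value vectors adopts a divide-and-conquer strategy to improve each encoder layer's ability to extract features from local regions of objects, markedly strengthening the semantics of the encoder output. Third, a parameter-free Iterative Reference Point Refinement (IRPR) method dynamically updates the reference points of the predicted boxes, making box regression more flexible and fine-grained. Integrating these three innovations into the baseline Conditional DETR yields the Adaptive and Interpretable DETR (AI-DETR). AI-DETR adds only 11 learnable parameters, yet its Average Precision (AP) exceeds that of Conditional DETR by 1.8 percentage points on the public MS-COCO (Microsoft Common Objects in Context) dataset, and by 1.3 and 0.8 percentage points, respectively, on Butterfly_2018 and Butterfly_2023, two more challenging datasets of butterflies photographed in the wild. Qualitative and quantitative analyses, together with visualizations of the results, explain and demonstrate the specific contribution of each innovation in AI-DETR.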
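The abstract states only that cross-attention is decoupled into weights and value vectors and that IAAWR adjusts the weights of different decoder layers; it does not give the update rule. The pure-Python sketch below illustrates the decoupling under one assumed rule: each decoder layer's attention-weight matrix is mixed with the previous layer's refined weights through a single sigmoid-gated scalar. The gating rule and all function names (`blend_with_previous`, `refine_across_layers`, and so on) are hypothetical illustrations, not the authors' formulation.

```python
import math

def softmax(scores):
    """Numerically stable softmax over one row of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weights(queries, keys):
    """Weight half of decoupled cross-attention: softmax(q . k / sqrt(d)) per query."""
    scale = 1.0 / math.sqrt(len(keys[0]))
    return [softmax([scale * sum(a * b for a, b in zip(q, k)) for k in keys])
            for q in queries]

def apply_weights(weights, values):
    """Value half: each output is the weighted sum of the value vectors."""
    dim = len(values[0])
    return [[sum(w * v[c] for w, v in zip(row, values)) for c in range(dim)]
            for row in weights]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def blend_with_previous(current, previous, alpha):
    """HYPOTHETICAL IAAWR-style rule: convex mix of this layer's weights with the
    previous layer's refined weights, gated by one learnable scalar alpha.
    A convex mix of row-stochastic matrices keeps every row summing to 1."""
    g = sigmoid(alpha)
    return [[g * c + (1.0 - g) * p for c, p in zip(row_c, row_p)]
            for row_c, row_p in zip(current, previous)]

def refine_across_layers(per_layer_weights, alphas):
    """Apply the blend decoder layer by layer; the first layer is left unchanged."""
    refined = [per_layer_weights[0]]
    for weights, alpha in zip(per_layer_weights[1:], alphas):
        refined.append(blend_with_previous(weights, refined[-1], alpha))
    return refined

# Toy example: one query, three encoder tokens with 2-D keys and values.
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
layer_weights = [attention_weights([[0.0, 2.0]], keys),  # decoder layer 1
                 attention_weights([[2.0, 0.0]], keys)]  # decoder layer 2
refined = refine_across_layers(layer_weights, alphas=[0.0])  # sigmoid(0) = 0.5: even mix
outputs = [apply_weights(w, values) for w in refined]
```

Keeping the weights separate from the values is what makes such an inter-layer adjustment possible without touching the value path; a gate of this kind costs one scalar per layer, which is in the spirit of the small parameter budget the abstract reports, though the actual parameter allocation in AI-DETR is not stated here.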

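The abstract likewise describes IRPR only as a parameter-free, iterative update of the predicted boxes' reference points, replacing the single reference point shared by all decoder layers. The sketch below assumes one plausible reading: the centre of the box predicted by each decoder layer becomes the reference point of the next layer. Box decoding follows the Conditional DETR pattern of adding the unnormalized (inverse-sigmoid) reference point to the predicted centre logits. The helper names and the toy offsets are hypothetical.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def inverse_sigmoid(p, eps=1e-5):
    """Map a normalized coordinate in (0, 1) back to logit space (clamped for stability)."""
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))

def decode_box(offset, ref):
    """Conditional-DETR-style decoding: the reference point (cx, cy) is added in logit
    space to the predicted centre offsets; width and height are predicted directly."""
    dcx, dcy, dw, dh = offset
    rx, ry = ref
    return (sigmoid(dcx + inverse_sigmoid(rx)),
            sigmoid(dcy + inverse_sigmoid(ry)),
            sigmoid(dw), sigmoid(dh))

def iterative_reference_refinement(initial_ref, per_layer_offsets):
    """HYPOTHETICAL IRPR-style loop: the centre of each layer's predicted box becomes
    the next layer's reference point. No learnable parameters are introduced."""
    ref = initial_ref
    boxes = []
    for offset in per_layer_offsets:
        box = decode_box(offset, ref)
        boxes.append(box)
        ref = (box[0], box[1])  # assumption: a training version might stop gradients here
    return boxes

# Toy example: six decoder layers, each nudging the centre x-logit by +0.5.
offsets = [(0.5, 0.0, 0.0, 0.0)] * 6
boxes = iterative_reference_refinement((0.5, 0.5), offsets)
shared = [decode_box(o, (0.5, 0.5)) for o in offsets]  # baseline: one shared reference point
```

With a shared reference point every layer decodes the same centre, `sigmoid(0.5)`, whereas under the iterative update the per-layer corrections accumulate in logit space, so the last layer's centre is `sigmoid(3.0)`. This is how reusing the previous prediction can make regression progressively finer without adding weights.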

