Cross-Modal Pedestrian Re-identification Pre-training Method Based on Catastrophic Forgetting and Combination Superimposed Erasure

SUN Rui; XIE Rui-rui; ZHANG Lei; ZHANG Xu-dong; GAO Jun

doi:10.12263/DZXB.20221190

PAPERS | Updated: 2025-12-08

- ACTA ELECTRONICA SINICA Vol. 51, Issue 10, Pages: 2925-2935 (2023)
- Author affiliations:
  
  1. School of Computer Science and Information Engineering, Hefei University of Technology, Hefei, Anhui 230601
  2. Anhui Province Key Laboratory of Industry Safety and Emergency Technology, Hefei, Anhui 230009
- Funding:
  
  National Natural Science Foundation of China (61876057); Natural Science Foundation of Anhui Province (2208085MF158); Key Research and Development Plan of Anhui Province (202004d07020012)
- DOI: 10.12263/DZXB.20221190
  CLC: TP391.4
- Received: 20 October 2022; Revised: 20 July 2023; Published: 25 October 2023
SUN Rui, XIE Rui-rui, ZHANG Lei, et al. Cross-Modal Pedestrian Re-identification Pre-training Method Based on Catastrophic Forgetting and Combination Superimposed Erasure [J]. ACTA ELECTRONICA SINICA, 2023, 51(10): 2925-2935. DOI: 10.12263/DZXB.20221190.

Abstract

To support 24-hour, round-the-clock video surveillance, cross-modal pedestrian re-identification based on visible and near-infrared images has attracted wide attention from industry and academia. Most current cross-modal pedestrian re-identification methods rely on models pre-trained on ImageNet to learn intra-modality common features in advance. However, ImageNet differs substantially from cross-modal pedestrian data, and ImageNet pre-training treats color as one of the discriminative cues, so the common features it learns are poorly suited to representing colorless infrared images. This paper proposes a self-supervised pre-training method for cross-modal pedestrian re-identification based on catastrophic forgetting and combination superimposed erasure. First, the pre-training data are filtered with the proposed catastrophic forgetting score, which narrows the domain gap between the pre-training data and the downstream task data and also reduces model training time. Second, to extract the key discriminative features needed for cross-modal identification, a strong channel-level data augmentation strategy is designed: erasing and combining the R, G and B channels generates multiple types of samples with markedly different colors, which encourages the model to focus on texture rather than color. Finally, a self-supervised learning framework for cross-modal tasks is built on the proposed cross-modal data filtering metric and channel augmentation strategy. Experiments show that a ResNet50 network trained with the proposed pre-training method outperforms current mainstream self-supervised pre-training methods when transferred to a range of cross-modal pedestrian re-identification methods; on top of the classic AGW method, Rank-1 and mAP improve by 8.02% and 5.81%, respectively.
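The channel-level erasure and combination described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the exact procedure, so everything here is an assumption made for illustration: the function name `erase_and_combine_channels`, the choice to zero out exactly one channel, and the use of a random permutation as the combination step are hypothetical and may differ from the authors' implementation.

```python
import numpy as np


def erase_and_combine_channels(image: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Return a colour-perturbed copy of an H x W x 3 image.

    Assumed procedure (not taken from the paper):
      1. erase: zero out one randomly chosen colour channel;
      2. combine: reorder the three channels with a random permutation.
    """
    if image.ndim != 3 or image.shape[2] != 3:
        raise ValueError("expected an H x W x 3 array")
    out = image.copy()                 # leave the caller's array untouched
    erased = rng.integers(0, 3)        # index of the channel to erase
    out[:, :, erased] = 0
    order = rng.permutation(3)         # new channel order
    return out[:, :, order]


# Small usage example on a synthetic 2 x 2 image with non-zero pixels.
rng = np.random.default_rng(0)
img = np.arange(1, 13, dtype=np.uint8).reshape(2, 2, 3)
aug = erase_and_combine_channels(img, rng)
```

Because one colour channel is removed and the remaining ones are reordered, two augmented views of the same pedestrian image keep the same texture and shape but differ in colour, which matches the stated goal of steering the model toward texture rather than colour.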
