Video-Based Person Re-Identification Using Long-Short Term Temporal Relationship Network

HE Zhi-min; QIAN Jiang-bo; YAN Di-qun; YE Xu-lun; WANG Chong

doi:10.12263/DZXB.20230342


PAPERS | Updated: 2025-12-08

- ACTA ELECTRONICA SINICA Vol. 52, Issue 8, Pages: 2746-2757 (2024)
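- Affiliations:
  1. School of Information Science and Engineering, Ningbo University, Ningbo, Zhejiang 315211
  2. Zhejiang Key Laboratory of Mobile Network Application Technology, Ningbo, Zhejiang 315211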
- Funding: National Natural Science Foundation of China (62271274); Ningbo Science and Technology Project (2024Z004; 2023Z059)
- DOI: 10.12263/DZXB.20230342
- CLC: TP391.41
- Received: 14 April 2023; Revised: 6 June 2024; Published: 25 August 2024
HE Zhi-min, QIAN Jiang-bo, YAN Di-qun, et al. Video-Based Person Re-Identification Using Long-Short Term Temporal Relationship Network[J]. Acta Electronica Sinica, 2024, 52(08): 2746-2757.

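Abstract (Chinese version, in translation)

Person re-identification is an important research direction in computer vision. Its goal is to recognize and track the same pedestrian across different surveillance cameras. Video frames are linked by multiple temporal relationships, from which an object's motion patterns and fine-grained features can be obtained. As a result, video-based re-identification offers richer spatio-temporal cues than image-based re-identification and is closer to practical applications. The key problem is how to mine these spatio-temporal cues as features for video re-identification. For video-based person re-identification, this paper proposes a Transformer-based long- and short-term temporal relationship network (Long and Short Time Transformer, LSTT). The network contains long-term and short-term temporal relationship modules, which extract important temporal information and strengthen the feature representation. The long-term temporal relationship module uses memory cues to store the information of every frame and establishes global connections at each frame. The short-term temporal relationship module considers the interaction between adjacent frames to learn fine-grained target information and improve the feature representation. In addition, this paper designs a multi-scale module containing convolution kernels of different sizes to improve the model's adaptability to different target features. The module has multiple convolutional receptive fields, so it covers the target region more completely and further improves the model's generalization performance. Experimental results on three datasets, MARS, MARS_DL and iLIDS-VID, show that the LSTT model achieves the best performance.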
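The abstract describes the two temporal modules only at a high level. One rough way to picture them is as scaled dot-product attention across per-frame features. The two modules differ only in which frames each frame may attend to:

- The long-term module lets every frame query a memory that holds all frames' features.
- The short-term module restricts interaction to neighbouring frames.

The sketch below is a minimal illustration under stated assumptions, not the LSTT implementation. It assumes the following:

- The "memory" is simply the list of per-frame feature vectors.
- Query, key and value projections are the identity.
- A hypothetical `window` parameter sets the short-term neighbourhood.
- The fusion step in the usage block is hypothetical.

Learned weights, multi-head attention, residual connections and normalization are omitted.

```python
import math
from typing import List, Optional

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax; scores of -inf (masked frames) get weight 0."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def temporal_attention(memory: List[Vector], window: Optional[int] = None) -> List[Vector]:
    """Scaled dot-product attention across the frames of one tracklet (illustrative sketch).

    memory: one feature vector per frame (assumed to act as the frame memory).
    window=None -> long-term style: each frame attends to every frame (global connections).
    window=k    -> short-term style: frame t attends only to frames s with |t - s| <= k.
    Queries, keys and values are the raw features (identity projections, an assumption).
    """
    if window is not None and window < 0:
        raise ValueError("window must be non-negative or None")
    d = len(memory[0])
    scale = math.sqrt(d)
    outputs = []
    for t, query in enumerate(memory):
        scores = []
        for s, key in enumerate(memory):
            if window is not None and abs(t - s) > window:
                scores.append(-math.inf)  # frame s lies outside the short-term neighbourhood
            else:
                scores.append(sum(q * k for q, k in zip(query, key)) / scale)
        weights = softmax(scores)
        # Output for frame t: attention-weighted sum of the frame features (the values).
        outputs.append([sum(w * frame[i] for w, frame in zip(weights, memory)) for i in range(d)])
    return outputs


if __name__ == "__main__":
    frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]
    long_term = temporal_attention(frames)             # every frame sees all frames
    short_term = temporal_attention(frames, window=1)  # every frame sees only its neighbours
    # Hypothetical fusion: add both views, then average over time into one clip descriptor.
    fused = [[a + b for a, b in zip(lt, st)] for lt, st in zip(long_term, short_term)]
    clip = [sum(v[i] for v in fused) / len(fused) for i in range(len(fused[0]))]
    print(clip)
```

With `window=1`, the output for the first frame depends only on frames 0 and 1. With `window=None`, it depends on every frame in the memory. This is the global-versus-local distinction the abstract draws between the two modules.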

Abstract

Person re-identification is an important research direction in computer vision, aiming to identify and track the same person across different surveillance cameras. Compared with image-based re-identification methods, video-based re-identification has richer temporal and spatial information, which brings it closer to real-world applications. Because various temporal relationships exist between video frames, valuable information such as motion patterns and fine-grained features can be obtained from them. Therefore, how to effectively extract these temporal and spatial clues has become a key issue in video-based re-identification. In this paper, a Transformer network based on long- and short-term temporal relationships, the Long and Short Time Transformer (LSTT), is proposed for video-based person re-identification. The network includes long-term and short-term relationship modules that extract important temporal information and enhance the feature representation. The long-term relationship module stores the information of each frame in a memory cue and establishes global connections for every video frame. The short-term relationship module considers the interaction between adjacent frames to learn fine-grained target information and improve the feature representation. Additionally, to improve the model's adaptability to different target features, a multi-scale module with convolution kernels of different sizes is designed. The module has multiple convolutional receptive fields and can cover the target area more comprehensively, further improving the model's generalization performance. Experimental results on three datasets, namely MARS, MARS_DL and iLIDS-VID, demonstrate that the LSTT model achieves state-of-the-art performance.
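The multi-scale module is described only as parallel convolutions with kernels of different sizes. The sketch below illustrates the receptive-field idea and is not the module used in LSTT. It assumes the following:

- The input is a single-channel 2D feature map.
- The kernel sizes are 1, 3 and 5 (hypothetical values).
- Fixed box (averaging) kernels stand in for learned weights.
- Branch outputs are fused by simple averaging.

The paper's actual kernel sizes, channel counts and fusion are not given in this abstract.

```python
from typing import List, Sequence

Grid = List[List[float]]


def conv2d_same(x: Grid, kernel: Grid) -> Grid:
    """Single-channel 2D cross-correlation with zero padding ("same" output size)."""
    k = len(kernel)
    if k % 2 == 0 or any(len(row) != k for row in kernel):
        raise ValueError("kernel must be square with an odd side length")
    pad = k // 2
    h, w = len(x), len(x[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in range(k):
                for dj in range(k):
                    r, c = i + di - pad, j + dj - pad
                    if 0 <= r < h and 0 <= c < w:  # positions outside the map count as zero
                        acc += kernel[di][dj] * x[r][c]
            out[i][j] = acc
    return out


def box_kernel(k: int) -> Grid:
    """Hypothetical fixed k x k averaging kernel standing in for learned weights."""
    return [[1.0 / (k * k)] * k for _ in range(k)]


def multi_scale(x: Grid, sizes: Sequence[int] = (1, 3, 5)) -> Grid:
    """Run parallel branches with different kernel sizes and fuse them by averaging."""
    branches = [conv2d_same(x, box_kernel(k)) for k in sizes]
    h, w = len(x), len(x[0])
    return [[sum(b[i][j] for b in branches) / len(branches) for j in range(w)] for i in range(h)]


if __name__ == "__main__":
    impulse = [[0.0] * 7 for _ in range(7)]
    impulse[3][3] = 1.0
    for k in (1, 3, 5):
        branch = conv2d_same(impulse, box_kernel(k))
        # Count the output positions whose receptive field covers the impulse.
        print(k, sum(1 for row in branch for v in row if v != 0.0))  # prints 1 1, 3 9, 5 25
```

A branch with a k x k kernel responds within a k x k neighbourhood. Fusing the branches therefore lets each output position mix local detail with wider context, which is the "more comprehensive coverage of the target area" the abstract refers to. In a trained network the kernels would be learned and the branches would normally be multi-channel.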

