

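1. School of Information Science and Engineering, Ningbo University, Ningbo, Zhejiang 315211
2. Zhejiang Key Laboratory of Mobile Network Application Technology, Ningbo, Zhejiang 315211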
Received: 14 April 2023
Revised: 6 June 2024
Published: 25 August 2024
HE Zhi-min, QIAN Jiang-bo, YAN Di-qun, et al. Video-Based Person Re-Identification Using Long-Short Term Temporal Relationship Network[J]. Acta Electronica Sinica, 2024, 52(08): 2746-2757. DOI:10.12263/DZXB.20230342
Person re-identification is an important research direction in computer vision. Its goal is to identify and track the same pedestrian across different surveillance cameras. Video frames are linked by a variety of temporal relationships, from which an object's motion patterns and fine-grained features can be obtained. As a result, video-based re-identification offers richer spatio-temporal cues than image-based re-identification and is closer to real-world applications, and the key problem becomes how to mine these cues as features. To address video-based person re-identification, this paper proposes a Transformer-based long-short term temporal relationship network, the Long and Short Time Transformer (LSTT). The network contains long-term and short-term temporal relationship modules that extract important temporal information and strengthen the feature representation. The long-term temporal relationship module uses a memory cue to store the information of every frame and establishes global connections at each frame. The short-term temporal relationship module models the interaction between adjacent frames to learn fine-grained target information and improve the representational power of the features. In addition, to improve the model's adaptability to different target features, a multi-scale module containing convolution kernels of different sizes is designed. Because it has several convolutional receptive fields, this module covers the target region more completely and further improves the model's generalization performance. Experimental results on three datasets, MARS, MARS_DL and iLIDS-VID, show that the LSTT model achieves state-of-the-art performance.
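The abstract only outlines the two temporal branches, so the following pure-Python toy sketch illustrates the general idea under explicit assumptions: each frame is a plain feature vector; attention uses those features directly as queries, keys and values (no learned projections, no multiple heads); the long-term "memory" is simply a list holding every frame's feature; the short-term branch restricts attention to frames within a hypothetical `radius` of the current one; and the two branches are fused by a residual sum. All function names are hypothetical, and this is not the authors' LSTT implementation.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query: Vector, keys: List[Vector], values: List[Vector]) -> Vector:
    """Scaled dot-product attention of one query over a set of keys/values."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(d)]


def long_term_relation(frames: List[Vector]) -> List[Vector]:
    """Hypothetical long-term branch: every frame is written to a memory
    bank, then each frame attends over the whole memory (global links)."""
    memory = [list(f) for f in frames]  # one memory entry per frame
    return [attend(f, memory, memory) for f in frames]


def short_term_relation(frames: List[Vector], radius: int = 1) -> List[Vector]:
    """Hypothetical short-term branch: each frame attends only to frames
    at most `radius` positions away (its temporal neighbours)."""
    out = []
    for t, f in enumerate(frames):
        lo, hi = max(0, t - radius), min(len(frames), t + radius + 1)
        window = frames[lo:hi]
        out.append(attend(f, window, window))
    return out


def lstt_block(frames: List[Vector], radius: int = 1) -> List[Vector]:
    """Fuse input, long-term and short-term features by a residual sum
    (the fusion rule is an assumption)."""
    long_feats = long_term_relation(frames)
    short_feats = short_term_relation(frames, radius)
    return [
        [x + l + s for x, l, s in zip(f, lf, sf)]
        for f, lf, sf in zip(frames, long_feats, short_feats)
    ]


def temporal_pool(frames: List[Vector]) -> Vector:
    """Average over time to obtain one clip-level descriptor."""
    n = len(frames)
    return [sum(f[i] for f in frames) / n for i in range(len(frames[0]))]
```

For example, `lstt_block([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])` returns one refined vector per frame, and `temporal_pool` averages them into a single descriptor that could be compared across cameras. Setting `radius` to the clip length makes the short-term branch identical to the long-term one, which shows that in this toy version the two branches differ only in the size of their attention window.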
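To illustrate why kernels of different sizes give different receptive fields, the sketch below runs a single-channel feature map through parallel zero-padded convolutions with 1×1, 3×3 and 5×5 kernels and averages the branch outputs. The averaging kernels stand in for learned weights and the averaging step stands in for whatever fusion the paper uses. Both choices, like the function names, are assumptions for illustration rather than the module described in the paper.

```python
from typing import List, Sequence

Grid = List[List[float]]


def mean_kernel(size: int) -> Grid:
    """A size x size averaging kernel; a placeholder for learned weights."""
    w = 1.0 / (size * size)
    return [[w] * size for _ in range(size)]


def conv2d_same(x: Grid, kernel: Grid) -> Grid:
    """2D cross-correlation (what deep-learning 'conv' layers compute) with
    zero padding so the output keeps the shape of x. Assumes an odd kernel."""
    h, w = len(x), len(x[0])
    k = len(kernel)
    pad = k // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in range(k):
                for dj in range(k):
                    r, c = i + di - pad, j + dj - pad
                    if 0 <= r < h and 0 <= c < w:  # outside = zero padding
                        acc += kernel[di][dj] * x[r][c]
            out[i][j] = acc
    return out


def multi_scale(x: Grid, sizes: Sequence[int] = (1, 3, 5)) -> Grid:
    """Run parallel branches with different receptive fields and fuse them
    by element-wise averaging (the fusion rule is an assumption)."""
    branches = [conv2d_same(x, mean_kernel(s)) for s in sizes]
    n = len(branches)
    return [
        [sum(b[i][j] for b in branches) / n for j in range(len(x[0]))]
        for i in range(len(x))
    ]
```

Feeding a 5×5 map with a single non-zero centre value shows the effect: the 1×1 branch responds at one position, the 3×3 branch at 9 positions and the 5×5 branch at all 25, so the fused output covers a wider area than a single small kernel would.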