School of Information Engineering, Chang'an University, Xi'an, Shaanxi 710064, China
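FANG Xin, male, was born in 2000 in Shangluo, Shaanxi Province. He is currently a master's student at the School of Information Engineering, Chang'an University. His main research interest is visual object tracking. E-mail: fangxin@chd.edu.cn
CHEN Zhe, male, was born in 1969 in Xi'an, Shaanxi Province. He holds a Ph.D. and is currently an associate professor and master's supervisor at the School of Information Engineering, Chang'an University. His main research interests are computer vision and machine learning. E-mail: zchen@chd.edu.cn
LIU Zhan-wen, female, was born in 1983 in Anyang, Henan Province. She holds a Ph.D. and is currently a professor and doctoral supervisor at the School of Information Engineering, Chang'an University. Her main research interests are vehicle-road-cloud integrated traffic environment perception and in-the-loop testing. E-mail: zwliu@chd.edu.cn
LI Xiao-peng, male, was born in 2000 in Fuzhou, Fujian Province. He is currently a master's student at the School of Information Engineering, Chang'an University. His main research interest is image super-resolution. E-mail: xiaopengli@chd.edu.cn
SU Yu-xin, female, was born in 1999 in Hohhot, Inner Mongolia Autonomous Region. She is currently a master's student at the School of Information Engineering, Chang'an University. Her main research interest is image restoration. E-mail: yuxinsu@chd.edu.cn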
Received: 2024-07-31
Revised: 2025-02-26
Published in print: 2025-03-25
FANG Xin, CHEN Zhe, LIU Zhan-wen, et al. Facing Different Challenges and Separating Homogeneous and Heterogeneous Information for RGBT Tracking[J]. Acta Electronica Sinica, 2025, 53(03): 910-925. DOI:10.12263/DZXB.20240713
RGB and thermal infrared (RGBT) tracking is a multi-modal object tracking method that combines information from visible-light and thermal infrared sensors. It aims to overcome the limitations of a single sensor under specific conditions and to improve the robustness and accuracy of object tracking by fusing data from multiple sensors. However, most existing RGBT tracking algorithms directly fuse the features extracted from visible-light and thermal infrared images, ignoring the homogeneity and heterogeneity between the two modalities. In addition, RGBT tracking is often affected by multiple challenge factors such as fast object motion, scale variation, illumination variation, thermal crossover, and occlusion. Existing work often relies on a single structure to handle all challenges simultaneously, which requires a sufficiently complex model and a sufficiently large amount of training data. This paper proposes a novel network, CMHHNet (facing different Challenges and combining Multi-modal Homogeneous and Heterogeneous information separation and integration Network), for RGBT tracking. In this network, a challenge-aware module is deployed in each layer of the backbone to fuse the visible-light and thermal infrared features under each challenge separately and to adaptively aggregate the fused features across all challenges. In addition, an attention enhancement module and a multi-scale auxiliary module are added to enhance the features extracted by the backbone network. Finally, according to the homogeneity and heterogeneity of visible light and thermal infrared, their modality-specific and shared features are extracted separately and adaptively fused. Extensive experiments on the GTOT, RGBT234, and LasHeR datasets show that the proposed tracker is highly competitive with existing RGBT tracking methods.
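The challenge-aware idea described in the abstract (fuse the two modalities once per challenge, then adaptively aggregate the per-challenge results) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how fusion or aggregation weights are computed, so the function names, the convex-combination fusion rule, and the softmax aggregation below are all assumptions chosen for illustration. In a real network the weights would be learned rather than passed in.

```python
import math

# Challenge names taken from the abstract; the identifiers are hypothetical.
CHALLENGES = [
    "fast_motion",
    "scale_variation",
    "illumination_variation",
    "thermal_crossover",
    "occlusion",
]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_for_challenge(rgb_feat, tir_feat, rgb_weight):
    """Assumed per-challenge fusion: convex combination of the two modalities."""
    return [rgb_weight * r + (1.0 - rgb_weight) * t
            for r, t in zip(rgb_feat, tir_feat)]


def aggregate(fused_by_challenge, challenge_scores):
    """Assumed adaptive aggregation: softmax-weighted sum over challenges."""
    weights = softmax(challenge_scores)
    dim = len(fused_by_challenge[0])
    return [sum(w * f[i] for w, f in zip(weights, fused_by_challenge))
            for i in range(dim)]


def challenge_aware_fusion(rgb_feat, tir_feat, rgb_weights, challenge_scores):
    """Fuse under each challenge separately, then aggregate all branches."""
    fused = [fuse_for_challenge(rgb_feat, tir_feat, w) for w in rgb_weights]
    return aggregate(fused, challenge_scores)
```

For instance, `challenge_aware_fusion(rgb, tir, [0.5] * len(CHALLENGES), [0.0] * len(CHALLENGES))` weights every challenge branch equally and averages the two modalities. Raising one entry of `challenge_scores` shifts the output toward that branch, which is the sense in which the aggregation is adaptive in this sketch.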