Cross-Modal Dual Hashing Based on Transfer Knowledge

ZHONG Jian-qi; LIN Qiu-bin; CAO Wen-ming

doi:10.12263/DZXB.20240032


PAPERS | Updated: 2025-12-08

- Acta Electronica Sinica, Vol. 53, Issue 1, Pages 209-220 (2025)
- Author affiliation: College of Electronics and Information Engineering, Shenzhen University, Shenzhen, Guangdong 518060
- Funding: The National Natural Science Foundation of China (617714322); Fundamental Research Foundation of Shenzhen (JCYJ20220531100814033)
- DOI: 10.12263/DZXB.20240032
- CLC: TP391
- Received: 08 January 2024; Revised: 19 April 2024; Published: 25 January 2025
ZHONG Jian-qi, LIN Qiu-bin, CAO Wen-ming. Cross-Modal Dual Hashing Based on Transfer Knowledge[J]. Acta Electronica Sinica, 2025, 53(01): 209-220. DOI: 10.12263/DZXB.20240032

Abstract

With the popularity of social networks and the rapid growth of multimedia data, effective cross-modal retrieval has attracted increasing attention. Hashing is widely used in cross-modal retrieval because of its high retrieval efficiency and low storage cost. However, most deep learning-based cross-modal hashing methods use an image network and a text network to generate the hash codes of their respective modalities separately, which makes it difficult to obtain more effective hash codes and to further narrow the gap between modalities. To improve cross-modal hashing retrieval, this paper proposes cross-modal dual hashing based on transfer knowledge (CDHTK). CDHTK combines an image network, a knowledge transfer network, and a text network. For the image modality, CDHTK fuses the hash codes generated separately by the image network and the knowledge transfer network to produce discriminative image hash codes. For the text modality, CDHTK fuses the hash codes generated separately by the text network and the knowledge transfer network to produce effective text hash codes. CDHTK jointly optimizes hash code generation with a cross-entropy loss on predicted labels, a joint triplet quantization loss on the generated hash codes, and a differential loss on the transferred knowledge, thereby improving retrieval performance. Experiments on two commonly used datasets (IAPR TC-12 and MIR-Flickr 25K) verify the effectiveness of CDHTK, which outperforms the current state-of-the-art cross-modal hashing method ALECH (Adaptive Label correlation based asymmEtric Cross-modal Hashing) by 6.82% and 5.13%, respectively.
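To make the retrieval setting concrete, the following is a minimal sketch of how fused binary hash codes could be used for cross-modal ranking. It is not the authors' implementation: the abstract does not state the fusion rule, so averaging the two continuous outputs before taking the sign is purely an assumption, and the function names (`binarize`, `fuse_codes`, `hamming_distance`, `rank_by_hamming`) and the toy values are hypothetical.

```python
from typing import List, Sequence


def binarize(values: Sequence[float]) -> List[int]:
    """Map real-valued network outputs to a +1/-1 hash code via the sign."""
    return [1 if v >= 0 else -1 for v in values]


def fuse_codes(primary: Sequence[float], transfer: Sequence[float]) -> List[int]:
    """Hypothetical fusion of two continuous outputs into one hash code.

    ASSUMPTION: the outputs are averaged and then binarized. The abstract
    only says the two codes are fused; it does not give the rule.
    """
    if len(primary) != len(transfer):
        raise ValueError("both outputs must have the same code length")
    return binarize([(p + t) / 2.0 for p, t in zip(primary, transfer)])


def hamming_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """Count the positions at which two equal-length codes differ."""
    if len(a) != len(b):
        raise ValueError("codes must have the same length")
    return sum(1 for x, y in zip(a, b) if x != y)


def rank_by_hamming(query: Sequence[int],
                    database: Sequence[Sequence[int]]) -> List[int]:
    """Return database indices ordered by increasing Hamming distance."""
    return sorted(range(len(database)),
                  key=lambda i: hamming_distance(query, database[i]))


# Toy example: an image query code ranked against three text codes.
image_code = fuse_codes([0.9, -0.2, 0.4, -0.7], [0.5, 0.6, -0.1, -0.9])
text_codes = [[1, 1, 1, -1], [-1, -1, 1, 1], [1, -1, 1, -1]]
ranking = rank_by_hamming(image_code, text_codes)
```

Because each code is a short binary vector, comparing a query with a database item only requires counting differing bits, which is the source of the retrieval efficiency and low storage cost that the abstract attributes to hashing.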


