Acta Electronica Sinica, 2022, Vol. 50, Issue (4): 909-920. DOI: 10.12263/DZXB.20210760

Special topic: Machine Learning Cross-Fusion Innovation

Multimodal Fusion Hash Learning Method Based on Relaxed Hadamard Matrix

YU Jun1, HUANG Wei1, ZHANG Xiao-bo1, YIN He-feng2

  1. College of Computer and Communication Engineering, Zhengzhou University of Light Industry, Zhengzhou, Henan 450000, China
  2. School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi, Jiangsu 214000, China
  • Received: 2021-06-16 Revised: 2022-02-12 Online: 2022-04-25 Published: 2022-04-25
    • About the authors:
    • YU Jun, male, born in 1990 in Shaoyang, Hunan. He holds a Ph.D. in engineering and currently conducts research and teaching at the College of Computer and Communication Engineering, Zhengzhou University of Light Industry. His research interests include multimedia analysis and retrieval, deep learning, and pattern recognition. E-mail: yujun@zzuli.edu.cn
      HUANG Wei, male, born in 1982 in Zhengzhou, Henan. He holds a Ph.D. in engineering and is an associate professor at the College of Computer and Communication Engineering, Zhengzhou University of Light Industry. His research interests include pattern recognition and deep learning. E-mail: hnhw235@163.com
      YIN He-feng, male, born in 1989 in Wuxi, Jiangsu. He holds a Ph.D. in engineering and is a postdoctoral researcher at the School of Artificial Intelligence and Computer Science, Jiangnan University. His research interests include pattern recognition and machine learning. E-mail: yin_hefeng@jiangnan.edu.cn
    • Supported by:
    • Science and Technology Research and Development Program of Henan Province (222102210064); Doctoral Research Fund for Zhengzhou University of Light Industry (2021BSJJ025); National Natural Science Foundation of China (61902361)

Abstract:

Hashing is an effective data representation technique that plays an important role in coping with the explosive growth of multimedia data. Owing to its low storage cost and high efficiency, it has attracted increasing attention in multimedia retrieval, and multimodal hashing methods have been studied extensively for this task. However, most existing methods preserve the structural information of the original data by reconstructing pairwise similarities from the inner products of hash codes, which leads to a complex optimization problem. In addition, some models lack discriminative ability, which limits further gains in retrieval performance. To overcome these problems, this paper proposes a new multimodal fusion hashing method. Under the supervision of category information, a Hadamard matrix is used to generate target codes for the data; the margin between classes is enlarged by relaxing the strict binary constraints, and graph embedding is used to promote intra-class compactness. The proposed method both gives the model strong discriminative ability and simplifies the optimization process. Experimental results on three public datasets show that the proposed method is effective for multimedia data retrieval, improving average performance by 8.47% over the best competing method.

Key words: hash learning, multimodal fusion, Hadamard matrix, multimedia retrieval, hash centers
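
To make the idea of Hadamard-based target codes concrete, the following is a minimal sketch and not the authors' implementation. It assumes a Sylvester-type Hadamard matrix (so the code length must be a power of two) and assigns the first rows to the classes; the abstract specifies neither choice. The helper names `sylvester_hadamard` and `class_target_codes` are hypothetical. The relaxation of the binary constraints and the graph embedding term are not covered here.

```python
import numpy as np


def sylvester_hadamard(k: int) -> np.ndarray:
    """Return a k x k Hadamard matrix with entries in {-1, +1}.

    Uses the Sylvester construction, so k must be a power of two.
    """
    if k < 1 or k & (k - 1):
        raise ValueError("k must be a power of two")
    h = np.array([[1]], dtype=np.int64)
    while h.shape[0] < k:
        # Doubling step: [[H, H], [H, -H]] is again a Hadamard matrix.
        h = np.block([[h, h], [h, -h]])
    return h


def class_target_codes(num_classes: int, code_length: int) -> np.ndarray:
    """Pick one Hadamard row per class as its target code.

    Assumption: the first `num_classes` rows are used; the paper may
    select or sample rows differently.
    """
    if num_classes > code_length:
        raise ValueError("need code_length >= num_classes")
    return sylvester_hadamard(code_length)[:num_classes]


code_length = 16
codes = class_target_codes(num_classes=10, code_length=code_length)

# Each sample inherits the target code of its class label.
labels = np.array([0, 3, 3, 9])
targets = codes[labels]

# Rows of a Hadamard matrix are mutually orthogonal, so any two distinct
# class codes differ in exactly code_length / 2 bits.
hamming = (code_length - codes @ codes.T) // 2
```

Because distinct rows are orthogonal, every pair of class codes is separated by the same Hamming distance (half the code length), which is what makes Hadamard rows attractive as well-separated class targets.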

CLC number: