Acta Electronica Sinica ›› 2020, Vol. 48 ›› Issue (2): 272-278. DOI: 10.3969/j.issn.0372-2112.2020.02.007

• Research Paper •

WQ: A Hashing Algorithm Based on Weight Solving

孙瑶, 钱江波, 辛宇, 谢锡炯, 董一鸿   

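  1. Faculty of Electrical Engineering and Computer Science, Ningbo University, Ningbo, Zhejiang 315211, China
  • Received: 2018-05-23 Revised: 2019-10-13 Published: 2020-02-25 Online: 2020-02-25
  • Corresponding author: QIAN Jiang-bo (钱江波)
  • About the author: SUN Yao (孙瑶), female, born in December 1991 in Fenyang, Shanxi. She is a master's student at the Faculty of Electrical Engineering and Computer Science, Ningbo University, working on big data technology, machine learning, and learning to hash. E-mail: sunyao54578@163.com
  • Supported by:
    Zhejiang Provincial Natural Science Foundation of China (No. LZ20F020001, No. LY20F020009); National Natural Science Foundation of China (No. 61472194, No. 61572266)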

WQ: Hashing Algorithm Based on Bits Weights

SUN Yao, QIAN Jiang-bo, XIN Yu, XIE Xi-jiong, DONG Yi-hong   

  1. Faculty of Electrical Engineering and Computer Science, Ningbo University, Ningbo, Zhejiang 315211, China
  • Received: 2018-05-23 Revised: 2019-10-13 Online: 2020-02-25 Published: 2020-02-25

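Abstract (translated from the Chinese): Nearest neighbor query algorithms generally incur high time and space costs, so they often cannot meet the needs of big-data queries. Hashing can substantially reduce query time and storage space. Its main idea is to map high-dimensional data in the original space into a set of codes while satisfying the similarity-preserving principle. Most existing hashing methods assume that every dimension of the hash code carries the same weight. In practice, however, different dimensions often carry different information. This paper therefore proposes a new algorithm that assigns a weight to each dimension of the code, together with a corresponding quantization and encoding scheme. A theoretical proof establishes the feasibility of the algorithm, and comparative experiments against other hashing algorithms on real datasets verify its effectiveness.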

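Keywords (translated from the Chinese): approximate nearest neighbor search, learning to hash, weighted hashing, high-dimensional data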

Abstract: Many nearest neighbor query algorithms fail to meet the query requirements of big data because of their high time and space costs. Hashing can significantly reduce both query time and storage cost. Its main principle is to map high-dimensional data into a set of binary codes while preserving locality. However, most existing hashing methods ignore the weight differences between bits when computing Hamming distances between these binary codes. In general, different hash bits may carry different amounts of information. To address this issue, this paper proposes WQ (Weighted Quantization), which assigns a different weight to each bit of the binary code, together with a corresponding quantization method. Experimental results show that WQ achieves better retrieval performance than several other hashing methods.

Key words: approximate nearest neighbor (ANN), learning to hash, weighted hashing, high-dimensional data
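
The motivation stated in the abstract, that bits of a binary code may deserve different weights when codes are compared, can be illustrated with a minimal sketch. This is not the WQ algorithm itself: the abstract does not specify how WQ derives its weights or quantizes the data, so the codes and weights below are hypothetical values chosen only to show how a weighted Hamming distance can separate candidates that the plain Hamming distance ranks equally.

```python
def hamming_distance(a, b):
    """Plain Hamming distance: every differing bit counts as 1."""
    return sum(x != y for x, y in zip(a, b))


def weighted_hamming_distance(a, b, weights):
    """Weighted Hamming distance: a differing bit i contributes weights[i]."""
    return sum(w for x, y, w in zip(a, b, weights) if x != y)


# Hypothetical 4-bit codes, for illustration only.
query = [1, 0, 1, 1]
item_a = [0, 0, 1, 1]  # differs from the query in bit 0
item_b = [1, 0, 1, 0]  # differs from the query in bit 3

# Assumed per-bit weights (not taken from the paper); bit 0 is treated
# as the most informative bit.
weights = [0.5, 0.25, 0.125, 0.125]

plain_a = hamming_distance(query, item_a)
plain_b = hamming_distance(query, item_b)
weighted_a = weighted_hamming_distance(query, item_a, weights)
weighted_b = weighted_hamming_distance(query, item_b, weights)

print(plain_a, plain_b)        # both items are tied under plain Hamming distance
print(weighted_a, weighted_b)  # the weights break the tie in favor of item_b
```

With uniform weights the two candidates are indistinguishable, each being one bit away from the query. Under the assumed weights, a mismatch in a heavily weighted bit costs more, so `item_b` ranks closer than `item_a`.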

CLC number: