Acta Electronica Sinica ›› 2019, Vol. 47 ›› Issue (1): 49-58. DOI: 10.3969/j.issn.0372-2112.2019.01.007

• Academic Paper •

An Improved VLAD Encoding Method Based on Fused Features for Action Recognition

LUO Hui-lan, WANG Chan-juan

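  1. School of Information Engineering, Jiangxi University of Science and Technology, Ganzhou, Jiangxi 341000, China
  • Received: 2017-10-18 Revised: 2018-02-12 Online: 2019-01-25 Published: 2019-01-25
  • About the authors: LUO Hui-lan, female, born in September 1974 in Shanggao, Jiangxi. She received her Ph.D. in engineering from Zhejiang University in 2008 and is currently a professor and master's supervisor at the Image Processing Laboratory, Jiangxi University of Science and Technology. Her research interests include machine learning and pattern recognition. E-mail: luohuilan@sina.com; WANG Chan-juan, female, born in May 1992 in Poyang, Jiangxi. She entered Jiangxi University of Science and Technology in 2015 and is currently a master's student. Her research interests include computer vision and machine learning. E-mail: 909748120@qq.com
  • Supported by:
    National Natural Science Foundation of China (No. 61862031, No. 61462035); Natural Science Foundation of Jiangxi Province, "Research on Self-Deep-Learning Models for Visual Feature Representation" (No. 20171BAB202014)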

An Improved VLAD Coding Method Based on Fusion Feature in Action Recognition

LUO Hui-lan, WANG Chan-juan   

  1. School of Information Engineering, Jiangxi University of Science and Technology, Ganzhou, Jiangxi 341000, China
  • Received: 2017-10-18 Revised: 2018-02-12 Online: 2019-01-25 Published: 2019-01-25

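Abstract: This paper proposes a new improved VLAD (Vector of Locally Aggregated Descriptors) encoding method based on fused features, named IVLAD (Improved Vector of Locally Aggregated Descriptors), and applies it to action recognition, where it yields a good performance improvement. Because a single feature descriptor describes the spatial information of a video inadequately, position information is mapped into the feature space and fused during encoding to obtain the representation vector. Traditional VLAD considers only the distance between features and cluster centers; to overcome this limitation, the encoding stage additionally computes the difference between each cluster center and its most similar feature. To further improve recognition accuracy, the representation vector is concatenated with itself to raise its dimension. The effects of different dictionary sizes and normalization methods on the recognition algorithm are also studied. Experimental comparisons on two large datasets, UCF101 and HMDB51, show that the proposed method achieves a considerable performance improvement over the traditional VLAD method.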

Keywords: action recognition, position information, concatenation, representation vector

Abstract: This paper proposes IVLAD (Improved Vector of Locally Aggregated Descriptors), a novel coding method based on fused features, and applies it to action recognition, where it achieves good performance. Because a single feature descriptor cannot express the spatial information of a video well, location information is mapped into the feature space and jointly coded with the features to obtain the video representation vector. Traditional VLAD methods consider only the distances between features and cluster centers; to address this deficiency, the coding stage also uses the difference between each cluster center and its most similar feature. Finally, the video representation vector is concatenated with itself to raise its dimension and further improve recognition accuracy. The influences of the visual dictionary size, the location dictionary size, and the normalization method on recognition accuracy are also studied. Experimental results on two large datasets, UCF101 and HMDB51, show that the proposed method performs better than the traditional VLAD method.
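
The encoding steps named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract gives no formulas, so several choices here are assumptions made only for illustration. Specifically, `fuse_position` fuses position with appearance by simple weighted concatenation (the paper's actual mapping may differ), the extra "center to most similar feature" residual is appended to the classic VLAD residuals rather than combined some other way, and signed square-root plus L2 normalization is used because it is a common choice for VLAD. The function names and the `weight` parameter are hypothetical.

```python
import numpy as np


def fuse_position(descriptors, positions, weight=1.0):
    """Append weighted positions to descriptors.

    Assumption: plain concatenation stands in for the paper's mapping of
    location information into the feature space.
    """
    return np.hstack([descriptors, weight * positions])


def ivlad_like_encode(features, centers):
    """Encode local features (N, D) against cluster centers (K, D)."""
    # Squared Euclidean distance between every feature and every center: (N, K).
    d2 = ((features[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)

    # Classic VLAD: sum the residuals of the features assigned to each center.
    assign = d2.argmin(axis=1)
    vlad = np.zeros_like(centers, dtype=float)
    np.add.at(vlad, assign, features - centers[assign])

    # Extra term: residual between each center and its nearest feature.
    nearest = d2.argmin(axis=0)
    extra = features[nearest] - centers

    # Assumption: the two parts are combined by concatenation.
    vec = np.concatenate([vlad.ravel(), extra.ravel()])

    # Assumption: signed square-root (power) normalization, then L2.
    vec = np.sign(vec) * np.sqrt(np.abs(vec))
    norm = np.linalg.norm(vec)
    if norm > 0:
        vec = vec / norm

    # Self-concatenation to raise the dimension, as the abstract describes.
    return np.concatenate([vec, vec])


# Usage with synthetic data; random centers stand in for a learned dictionary.
rng = np.random.default_rng(0)
local_desc = rng.normal(size=(50, 8))   # 50 local descriptors, 8-D
local_pos = rng.uniform(size=(50, 2))   # normalized (x, y) positions
fused = fuse_position(local_desc, local_pos, weight=0.5)
codebook = rng.normal(size=(4, fused.shape[1]))
video_vector = ivlad_like_encode(fused, codebook)
print(video_vector.shape)
```

In practice the dictionary would come from clustering training features (for example with k-means), and the dictionary size and normalization method are the parameters whose influence the paper reports studying.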

Key words: action recognition, position information, concatenation, representation vector

CLC Number: