Acta Electronica Sinica ›› 2015, Vol. 43 ›› Issue (2): 333-337. DOI: 10.3969/j.issn.0372-2112.2015.02.019

• Research Communications •

Rank Learning Based on Supervised Topic Model

DING Yu-xin1, YAN Ze-quan1, FENG Wei1, XUE Cheng-long1, ZHOU Di2

  1. Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong 518055, China;
  2. State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
  • Received: 2013-10-14 Revised: 2014-05-19 Online: 2015-02-25 Published: 2015-02-25
  • About the authors: DING Yu-xin, male, born in 1972 in Tianjin, Ph.D., is an associate professor at Harbin Institute of Technology Shenzhen Graduate School. His research interests are natural language processing and machine learning. E-mail: yxding@hitsz.edu.cn. YAN Ze-quan, male, born in 1989 in Tangshan, Hebei, is a master's student at Harbin Institute of Technology Shenzhen Graduate School. His research interests are natural language processing and machine learning. E-mail: saiboyan@163.com
  • Supported by:

    National Natural Science Foundation of China (No.61100192); Open Fund of the State Key Laboratory of Computer Architecture, Chinese Academy of Sciences; Harbin Institute of Technology Scientific Research Innovation Fund (No.HIT.NSRIF2010123); Key Laboratory of Network Intelligent Computing, Harbin Institute of Technology Shenzhen Graduate School

Abstract:

Document representation is a key issue in learning to rank. Most learning-to-rank algorithms represent documents and queries as a "bag of words" and assume that the words occur independently, which ignores the relationships between words. To capture the dependencies between words in a document, we build the ranking model on the topic features of documents and queries. We define the ranking function as the topic relation between a document and a query, and propose a novel rank learning algorithm based on a supervised topic model to learn this function automatically. To evaluate the ranking accuracy of the proposed algorithm, we ran experiments on three benchmark information retrieval datasets: OHSUMED, MQ2007, and MQ2008. The experimental results show that the topic-based rank learning algorithm can find the underlying semantic relation between a document and a query and can improve the ranking accuracy of the ranking model.

Key words: rank learning, machine learning, relational topic model, topic feature
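
The abstract states that the ranking function is defined as the topic relation between a document and a query, but it does not give the function itself. As a rough illustration only, the Python sketch below assumes a logistic link over the element-wise product of topic proportions, in the style of relational topic models. The names `topic_score` and `rank_documents`, the weights `eta`, the bias `nu`, and all topic proportions are hypothetical values chosen for the example; none of them come from the paper.

```python
import math


def topic_score(theta_q, theta_d, eta, nu=0.0):
    """Hypothetical relevance score from topic proportions.

    Assumed form: sigmoid(sum_k eta[k] * theta_q[k] * theta_d[k] + nu).
    This is an illustrative stand-in, not the paper's ranking function.
    """
    if not (len(theta_q) == len(theta_d) == len(eta)):
        raise ValueError("all vectors must have one entry per topic")
    z = sum(e * q * d for e, q, d in zip(eta, theta_q, theta_d)) + nu
    return 1.0 / (1.0 + math.exp(-z))


def rank_documents(theta_q, doc_thetas, eta, nu=0.0):
    """Return document ids sorted by descending score."""
    scores = {
        doc_id: topic_score(theta_q, theta, eta, nu)
        for doc_id, theta in doc_thetas.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Made-up topic proportions over three topics (each vector sums to 1).
query = [0.8, 0.1, 0.1]
docs = {
    "d1": [0.7, 0.2, 0.1],  # shares the query's dominant topic
    "d2": [0.1, 0.1, 0.8],  # dominated by a different topic
    "d3": [0.4, 0.5, 0.1],  # partial overlap with the query
}
eta = [4.0, 4.0, 4.0]  # assumed per-topic weights; in practice these would be learned

ranking = rank_documents(query, docs, eta, nu=-1.0)
print(ranking)
```

In this toy setting the documents are ordered by how much topic mass they share with the query, which is the kind of semantic match a bag-of-words overlap can miss when the query and document use different words for the same topic.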

CLC number: