Acta Electronica Sinica ›› 2018, Vol. 46 ›› Issue (3): 607-613. DOI: 10.3969/j.issn.0372-2112.2018.03.014

• Academic Paper •

Semi-supervised Entity Disambiguation Method Research Based on Biterm Topic Model

ZHANG Xiong, CHEN Fu-cai, HUANG Rui-yang

  1. National Digital Switching System Engineering and Technological R & D Center, Zhengzhou, Henan 450001, China
  • Received: 2016-07-11 Revised: 2016-10-24 Online: 2018-03-25 Published: 2018-03-25
    • About the authors:
    • ZHANG Xiong, male, born in 1992 in Wanyuan, Sichuan. He is a master's student at the National Digital Switching System Engineering and Technological R & D Center. His main research interests are text mining and information extraction. E-mail: 979644317@qq.com; CHEN Fu-cai, male, born in 1974 in Gao'an, Jiangxi. He is a research fellow and master's supervisor at the National Digital Switching System Engineering and Technological R & D Center. His main research interest is big data processing. E-mail: 13503827650@139.com; HUANG Rui-yang, male, born in 1986 in Zhangzhou, Fujian. He is an assistant research fellow at the National Digital Switching System Engineering and Technological R & D Center. His main research interest is data mining. E-mail: 277433109@qq.com
    • Supported by:
    • National Natural Science Foundation of China (No.61171108); National Key Basic Research Program of China (973 Program) (No.2012CB315901, No.2012CB315905); National Key Technology Research and Development Program of the Ministry of Science and Technology (No.2014BAH30B01)

Abstract: To address topic drift in entity context information, this paper proposes an entity disambiguation method based on the biterm topic model. The method takes into account that an entity has different topics in different semantic contexts, and that other entities co-occurring in the same document can, to some extent, help determine what the entity to be disambiguated refers to. Following the idea of constructing biterms from named entities, the method incorporates co-occurring entity relations into the topic model and, on this basis, performs semi-supervised disambiguation using the Wikipedia knowledge base. Experiments on web text data verify the effectiveness of the proposed algorithm and show that the method improves the precision of entity disambiguation.

Key words: entity disambiguation, Wikipedia, biterm topic model
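
The abstract's idea of building biterms from the named entities that co-occur in a document can be illustrated with a minimal sketch. It assumes the entities of each document have already been extracted, and it treats a biterm as an unordered pair of distinct entities from the same document. The function names and sample data are hypothetical, and the abstract does not specify the paper's exact construction, so this is only one plausible reading.

```python
from collections import Counter
from itertools import combinations


def entity_biterms(entities):
    """Return unordered pairs of distinct entities from one document.

    Assumption: a biterm is any pair of different named entities that
    co-occur in the same document (no window or distance limit).
    """
    unique = sorted(set(entities))  # de-duplicate and fix pair order
    return list(combinations(unique, 2))


def corpus_biterm_counts(docs):
    """Count how often each entity biterm occurs across all documents."""
    counts = Counter()
    for entities in docs:
        counts.update(entity_biterms(entities))
    return counts


# Hypothetical documents, each given as its list of extracted entities.
docs = [
    ["Apple", "Steve Jobs", "iPhone"],
    ["Apple", "orchard", "fruit"],
]
counts = corpus_biterm_counts(docs)
```

In a biterm topic model, pairs like these (rather than single words per document) are the units assigned to topics, so the ambiguous mention "Apple" is paired with different co-occurring entities in each document.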
