Acta Electronica Sinica, 2018, Vol. 46, Issue (12): 3029-3036. DOI: 10.3969/j.issn.0372-2112.2018.12.029




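    • About the authors:
    • HUANG Ming-xuan, male, born in 1966 in Leye County, Guangxi; M.Eng.; currently a professor in the Department of Computer Science, Guangxi University of Finance and Economics. His main research interests are data mining, information retrieval and machine learning. He has led 2 National Natural Science Foundation of China projects and 3 research projects of the Guangxi Department of Education, and has led 1 completed Guangxi Natural Science Foundation project. He received 1 grant under the 2011 Guangxi Universities Outstanding Talent Support Program and has participated in 1 completed National Natural Science Foundation of China project. He has published more than 60 academic papers, including more than 40 in Chinese core journals, of which 4 are indexed by EI and 1 by ISTP. He also holds 9 granted invention patents. E-mail: mingxh05@163.com. JIANG Cao-qing, male, born in 1973 in Yongzhou, Hunan Province; Ph.D.; currently a professor at Guangxi University of Finance and Economics. His main research interests are formal methods, program analysis and data mining. E-mail: jcqng@163.com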

Vietnamese-English Cross Language Query Post-Translation Expansion Based on All-Weighted Positive and Negative Association Patterns Mining

HUANG Ming-xuan1,2, JIANG Cao-qing1,2   

  1. Guangxi Key Laboratory Cultivation Base of Cross-border E-commerce Intelligent Information Processing, Guangxi University of Finance and Economics, Nanning, Guangxi 530003, China;
    2. School of Information and Statistics, Guangxi University of Finance and Economics, Nanning, Guangxi 530003, China
  • Received:2017-09-07 Revised:2018-05-28 Online:2018-12-25 Published:2018-12-25
    • Supported by:
    • National Natural Science Foundation of China (No.61762006, No.61662003, No.61262028)



Abstract: Topic drift and word mismatch are difficult problems in natural language processing, and combining text mining with information retrieval can help solve them. This paper therefore proposes a Vietnamese-English cross-language (VECL) query post-translation expansion algorithm based on all-weighted positive and negative association pattern mining. The algorithm uses a new method for computing the support and correlation degree of all-weighted positive and negative itemsets. Together with a pattern evaluation framework, this method mines all-weighted positive and negative association patterns related to the original query from the user relevance feedback documents of the initial VECL retrieval. Expansion terms are then extracted from these patterns to perform VECL query post-translation expansion. The proposed algorithm was compared with existing cross-language query expansion algorithms based on pseudo relevance feedback and weighted association pattern mining. It effectively reduces query topic drift and word mismatch and improves cross-language information retrieval performance. Moreover, the proposed pattern mining method can also be applied to recommender systems to improve their accuracy.
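The abstract describes the pipeline only at a high level and does not give the paper's formulas. The Python sketch below illustrates the general idea of mining positive and negative association patterns from weighted feedback documents and turning them into expansion terms. It is not the authors' method, and everything in it is an assumption:

- Each relevance-feedback document is treated as an all-weighted transaction that maps terms to per-document weights in [0, 1].
- Itemset support is the mean product of item weights.
- Correlation is the lift-style ratio sup(X∪Y) / (sup(X)·sup(Y)).
- Negative-itemset support uses sup(¬Y) = 1 - sup(Y) and sup(X∪¬Y) = sup(X) - sup(X∪Y).
- The names `support`, `correlation`, `expansion_terms` and `FEEDBACK_DOCS`, the thresholds and the toy data are all hypothetical.

```python
from itertools import combinations

# Hypothetical toy data: each relevance-feedback document is an "all-weighted"
# transaction mapping terms to a per-document weight in [0, 1].
FEEDBACK_DOCS = [
    {"tourism": 0.9, "hanoi": 0.8, "travel": 0.7},
    {"tourism": 0.8, "hanoi": 0.6, "travel": 0.9},
    {"tourism": 0.7, "war": 0.9},
    {"hanoi": 0.5, "war": 0.8, "history": 0.6},
]


def support(itemset, docs):
    """Assumed weighted support: mean over documents of the product of the
    item weights (a document missing any item contributes 0)."""
    if not docs:
        return 0.0
    total = 0.0
    for doc in docs:
        weight = 1.0
        for item in itemset:
            weight *= doc.get(item, 0.0)
        total += weight
    return total / len(docs)


def correlation(sup_xy, sup_x, sup_y):
    """Lift-style correlation: > 1 means positive, < 1 negative dependence."""
    denom = sup_x * sup_y
    return sup_xy / denom if denom > 0 else 0.0


def expansion_terms(query, docs, min_sup=0.05, min_conf=0.3):
    """Return (expansion terms, negatively associated terms) for the query."""
    query_terms = set(query)
    candidates = {t for doc in docs for t in doc} - query_terms
    positive, negative = set(), set()
    for k in range(1, len(query_terms) + 1):
        for x in combinations(sorted(query_terms), k):
            sup_x = support(x, docs)
            if sup_x <= 0 or sup_x < min_sup:
                continue
            for t in sorted(candidates):
                sup_t = support((t,), docs)
                sup_xt = support(x + (t,), docs)
                # Positive rule X -> t
                if (correlation(sup_xt, sup_x, sup_t) > 1
                        and sup_xt >= min_sup and sup_xt / sup_x >= min_conf):
                    positive.add(t)
                # Negative rule X -> not t, using sup(X u not-t) = sup(X) - sup(X u t)
                # and sup(not-t) = 1 - sup(t)
                sup_x_not_t = sup_x - sup_xt
                if (correlation(sup_x_not_t, sup_x, 1 - sup_t) > 1
                        and sup_x_not_t >= min_sup and sup_x_not_t / sup_x >= min_conf):
                    negative.add(t)
    # Assumption: drop any candidate negatively associated with part of the query
    return sorted(positive - negative), sorted(negative)


if __name__ == "__main__":
    print(expansion_terms(["tourism"], FEEDBACK_DOCS))
    # prints (['hanoi', 'travel'], ['history', 'war'])
```

With the toy data, "hanoi" and "travel" come out as expansion terms for the query "tourism". "history" and "war" are flagged as negatively associated and are excluded. The product-based support was chosen because it keeps sup(X∪Y) ≤ sup(X) when weights are at most 1, so the derived negative-itemset supports never become negative. The paper's own all-weighted support and correlation definitions may differ.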

Key words: natural language processing, information retrieval, text mining, pattern mining, query expansion, recommender system
