Acta Electronica Sinica, 2019, Vol. 47, Issue (5): 1121-1128. DOI: 10.3969/j.issn.0372-2112.2019.05.020

• Research Article •

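Keyphrase Extraction Using Sequential Patterns Mining Algorithm with One-Off and General Gaps Condition

LIU Hui-ting1,2, LIU Zhi-zhong1,2, WANG Li-li1,2, WU Xin-dong3,4

  1. Key Laboratory of Intelligent Computing and Signal Processing of the Ministry of Education, Anhui University, Hefei, Anhui 230601, China;
    2. School of Computer Science and Technology, Anhui University, Hefei, Anhui 230601, China;
    3. School of Computer Science and Information Engineering, Hefei University of Technology, Hefei, Anhui 230601, China;
    4. School of Computing and Informatics, University of Louisiana at Lafayette, Lafayette 70503, USA
  • Received: 2018-03-05 Revised: 2018-08-03 Online: 2019-05-25 Published: 2019-05-25
  • About the authors: LIU Hui-ting, female, born in 1978 in Fuyang, Anhui; Ph.D., associate professor, CCF professional member. Her main research interests are data mining and machine learning. E-mail: htliu@ahu.edu.cn; LIU Zhi-zhong, male, born in 1990; master's degree. His main research interest is data mining.
  • Supported by:
    National Key Research and Development Program of China (No.2016YFB1000901); National Natural Science Foundation of China (No.61202227); Natural Science Research Project of Anhui Universities (No.KJ2018A0013)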

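Summary: This paper proposes a supervised keyphrase extraction algorithm, KEING (Keyphrase Extraction using sequentIal patterns with oNe-off and General gaps condition). First, each document is treated as a sequence database, and the SPING (Sequential Patterns mIning with oNe-off and General gaps condition) algorithm is used to capture the relations between words and their multiple variant forms; candidate keyphrases are then described by statistical features of the mined patterns. Next, a naive Bayes classifier is trained on a large amount of labeled training data. Finally, the classifier identifies keyphrases in the test documents. Experiments verify the completeness of the SPING algorithm and the effectiveness of the KEING algorithm.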

Abstract: Keyphrases summarize a document, and high-quality keyphrases are important for text summarization, reading, and indexing. However, most keyphrase extraction studies place strict limits on the form of patterns and cannot capture the semantic relations between words and phrases, so they fail to extract keyphrases autonomously. This paper proposes KEING, a keyphrase extraction algorithm based on sequential pattern mining with the one-off and general gaps conditions. By taking the one-off condition and general gaps into account, SPING (Sequential Patterns mIning with oNe-off and General gaps condition) captures semantic relations between words and phrases more effectively. KEING therefore obtains effective candidate keyphrases and computes their features. A supervised machine learning method is then trained on these features to build a classification model, which is used to extract keyphrases. Experimental results demonstrate that KEING can effectively extract high-quality keyphrases.

Key words: general gap, sequential patterns mining, keyphrase extraction, machine learning
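
The idea of counting a gap-constrained word pattern under the one-off condition can be illustrated with a minimal Python sketch. It is not the SPING algorithm itself, whose details the abstract does not give. The function names `find_occurrence` and `one_off_support` are hypothetical, the gap is assumed to be the number of tokens between two consecutive matched words, and only non-negative gaps are handled, which may be narrower than the paper's notion of general gaps. The one-off condition is taken to mean that each token position may be used by at most one occurrence. The greedy leftmost search is a heuristic and does not guarantee the maximum number of occurrences.

```python
from typing import List, Optional, Sequence


def find_occurrence(tokens: Sequence[str], pattern: Sequence[str],
                    min_gap: int, max_gap: int,
                    used: List[bool]) -> Optional[List[int]]:
    """Depth-first search for one match that avoids already-used positions."""
    if min_gap < 0 or max_gap < min_gap:
        raise ValueError("this sketch assumes 0 <= min_gap <= max_gap")

    def extend(k: int, prev: int) -> Optional[List[int]]:
        if k == len(pattern):
            return []  # every pattern word has been placed
        if k == 0:
            candidates = range(len(tokens))
        else:
            # gap = pos - prev - 1 must lie in [min_gap, max_gap]
            start = prev + 1 + min_gap
            stop = min(len(tokens), prev + 2 + max_gap)
            candidates = range(start, stop)
        for pos in candidates:
            if not used[pos] and tokens[pos] == pattern[k]:
                rest = extend(k + 1, pos)
                if rest is not None:
                    return [pos] + rest
        return None

    return extend(0, -1)


def one_off_support(tokens: Sequence[str], pattern: Sequence[str],
                    min_gap: int, max_gap: int) -> List[List[int]]:
    """Greedily collect occurrences; each position is used at most once."""
    used = [False] * len(tokens)
    occurrences: List[List[int]] = []
    while True:
        occ = find_occurrence(tokens, pattern, min_gap, max_gap, used)
        if occ is None:
            return occurrences
        for pos in occ:
            used[pos] = True  # one-off condition: consume these positions
        occurrences.append(occ)


text = ("sequential pattern mining and frequent pattern mining "
        "use pattern growth for mining").split()
print(one_off_support(text, ["pattern", "mining"], 0, 1))
print(one_off_support(text, ["pattern", "mining"], 0, 2))
```

With a gap range of [0, 1] the third "pattern" is too far from the last "mining" to match; widening the range to [0, 2] admits it. The length of the returned list plays the role of the pattern's support in the document.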
