School of Computer Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China
Print publication: 2007
LIU Yuan-chao, WANG Xiao-long, XU Zhi-ming, et al. Mining Construction Rules of Chinese Keyphrase Based on Rough Set Theory[J]. Acta Electronica Sinica, 2007, 35(2): 371-374. (in Chinese)
Phrases carry more information than single words and better reflect the topic of a document; most of what are commonly called keywords are in fact phrases. However, automatic keyphrase indexing currently lacks unified rules to guide it. This paper exploits the strengths of rough set theory in data generalization and knowledge reduction to mine a manually annotated keyphrase corpus drawn from People's Daily, yielding a number of construction rules for Chinese keyphrases. These rules can be used for automatic keyword extraction and can also guide manual keyword indexing. Experimental results show that the acquired rules considerably improve the performance of automatic keyword extraction.