Acta Electronica Sinica, 2016, Vol. 44, Issue (10): 2459-2465. DOI: 10.3969/j.issn.0372-2112.2016.10.025

• Academic Paper

Chinese Evaluation Element Extraction Based on Cascaded Model

WANG Ya-shen, HUANG He-yan, FENG Chong, LIU Quan-chao

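  1. Beijing Engineering Research Center of High Volume Language Information Processing and Cloud Computing Applications, School of Computer, Beijing Institute of Technology, Beijing 100081, China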
  • Received: 2015-02-11 Revised: 2015-06-26 Online: 2016-10-25 Published: 2016-10-25
  • Corresponding author: HUANG He-yan
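  • About the author: WANG Ya-shen, male, born in 1989, is a Ph.D. candidate in Computer Science and Technology at Beijing Institute of Technology. His main research interests are social network analysis and information retrieval. E-mail: yswang@bit.edu.cn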
  • Funding: National Basic Research Program of China (973 Program) (No. 2013CB329605, No. 2013CB329303); National Natural Science Foundation of China (No. 61132009, No. 61201351)


Abstract:

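With the development and maturation of social media, a large volume of user reviews is generated on the Internet every day. Extracting evaluation elements such as evaluation phrases, comment targets, and opinion holders has become an important prerequisite for Chinese opinion mining and sentiment analysis. For the task of Chinese evaluation element extraction, this paper proposes a cascaded model that combines statistical and rule-based methods. The main contributions are: (1) for automobile-domain reviews, we construct an annotated corpus of evaluation elements and related lexicons; (2) for Chinese evaluation phrases, which have received little attention in previous research, we analyze and describe their definition and classification in detail; (3) combining statistics and rules, we propose cascaded extraction strategies for evaluation phrases and for evaluation elements, respectively. Experimental results demonstrate the effectiveness of the cascaded model: compared with other rule-based evaluation element extraction algorithms, it effectively improves recall, and it also provides strong support for subsequent sentiment analysis tasks on social media.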

Key words: information extraction, evaluation element, evaluation phrase, comment target, opinion holder
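The abstract describes the cascade only at a high level. The sketch below is a toy illustration of the general idea under stated assumptions, not the authors' method: input is assumed to be already word-segmented, a first statistical stage proposes evaluation phrases, and a second rule-based stage attaches a comment target and an opinion holder to each accepted phrase. The lexicons, the counts in `EVAL_COUNTS`, the 0.5 threshold, the holder-verb rule, the exact-match `precision_recall` evaluation, and all function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical toy resources, NOT the corpus or lexicons built in the paper.
# EVAL_COUNTS maps a word to (times annotated as evaluative, total occurrences).
EVAL_COUNTS = {"好": (90, 100), "省油": (45, 50), "差": (70, 80),
               "吵": (40, 50), "新": (5, 60)}
DEGREE_ADVERBS = {"很", "非常", "太", "比较"}
TARGET_NOUNS = {"动力", "油耗", "空间", "内饰", "发动机", "车"}
HOLDER_VERBS = {"认为", "觉得"}


@dataclass
class Element:
    phrase: str              # evaluation phrase, e.g. "很好"
    target: Optional[str]    # comment target, e.g. "动力"
    holder: Optional[str]    # opinion holder, e.g. "我"


def stage1_phrases(tokens, threshold=0.5):
    """Stage 1 (statistical stand-in): keep a sentiment word only if its
    maximum-likelihood estimate P(evaluative | word) reaches the threshold,
    then extend the span leftwards over one degree adverb."""
    spans = []
    for i, word in enumerate(tokens):
        if word not in EVAL_COUNTS:
            continue
        evaluative, total = EVAL_COUNTS[word]
        if evaluative / total < threshold:
            continue
        start = i - 1 if i > 0 and tokens[i - 1] in DEGREE_ADVERBS else i
        spans.append((start, i + 1))  # half-open token span [start, end)
    return spans


def stage2_elements(tokens, spans):
    """Stage 2 (hypothetical rules): the holder is the token right before a
    holder verb; the target is the nearest target noun left of each phrase."""
    holder = next((tokens[i - 1] for i in range(1, len(tokens))
                   if tokens[i] in HOLDER_VERBS), None)
    elements = []
    for start, end in spans:
        target = next((tokens[j] for j in range(start - 1, -1, -1)
                       if tokens[j] in TARGET_NOUNS), None)
        elements.append(Element("".join(tokens[start:end]), target, holder))
    return elements


def extract(tokens):
    """Cascade: stage 2 only ever sees the spans that stage 1 accepted."""
    return stage2_elements(tokens, stage1_phrases(tokens))


def precision_recall(predicted, gold):
    """Exact-match precision and recall over (phrase, target, holder) triples."""
    pred = {(e.phrase, e.target, e.holder) for e in predicted}
    ref = set(gold)
    hits = len(pred & ref)
    precision = hits / len(pred) if pred else 0.0
    recall = hits / len(ref) if ref else 0.0
    return precision, recall


if __name__ == "__main__":
    # Pre-segmented review: "I think the power is very good, but the engine is too noisy."
    review = ["我", "觉得", "动力", "很", "好", "，", "但", "发动机", "太", "吵"]
    for element in extract(review):
        print(element)
    # Element(phrase='很好', target='动力', holder='我')
    # Element(phrase='太吵', target='发动机', holder='我')
```

Because `stage2_elements` only receives what `stage1_phrases` accepted, a phrase rejected in the first stage (such as "新" here, with an estimated probability of 5/60) can never be paired with a target or holder, so in a cascade of this shape the first stage's threshold caps the recall of the full triples.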

