Acta Electronica Sinica ›› 2016, Vol. 44 ›› Issue (12): 3036-3043. DOI: 10.3969/j.issn.0372-2112.2016.12.032

• Academic Paper •

Research on (Aspect, Rating) Summary Generation Based on Topic Models

吕品, 汪鑫, 罗宜元, 计春雷   

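  1. School of Electronic and Information, Shanghai Dianji University, Shanghai 201306, China
  • Received: 2014-12-24 Revised: 2016-08-22 Published: 2016-12-25
    • About the authors:
    • LÜ Pin (吕品), female, born in March 1973 in Ezhou, Hubei, is an associate professor at Shanghai Dianji University and holds a Ph.D. Her research interests are data mining, opinion mining, and sentiment analysis. E-mail: lvp@sdju.edu.cn; WANG Xing (汪鑫), male, born in March 1978 in Yixian, Anhui, is a lecturer at Shanghai Dianji University and holds a master's degree. His research interests are data mining and cloud computing. E-mail: wangx@sdju.edu.cn; LUO Yi-yuan (罗宜元), male, born in September 1986 in Xinyang, Henan, is a lecturer at Shanghai Dianji University and holds a Ph.D. His research interests are cryptography and computer security. E-mail: luoyy@sdju.edu.cn; JI Chun-lei (计春雷), male, born in January 1964 in Shanghai, is a professor and master's supervisor at Shanghai Dianji University and holds a Ph.D. His research interests are big data and data mining. E-mail: jicl@sdju.edu.cn
    • Supported by:
    • Youth Fund of National Natural Science Foundation of China (No.61402280); Computer Science and Technology Preponderant Disciplines of Shanghai Dianji University (No.16YSXK04); Scientific Research Project of Shanghai Dianji University (No.B1-0227-16-032-031)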

(Aspect,Rating) Summarization Based on Topic Model

LÜ Pin, WANG Xing, LUO Yi-yuan, JI Chun-lei   

  1. School of Electronic and Information, Shanghai Dianji University, Shanghai 201306, China
  • Received:2014-12-24 Revised:2016-08-22 Online:2016-12-25 Published:2016-12-25
    • Supported by:
    • Youth Fund of National Natural Science Foundation of China (No.61402280); Computer Science and Technology Preponderant Disciplines of Shanghai DianJi University (No.16YSXK04); Scientific Research Project of Shanghai DianJi University (No.B1-0227-16-032-031)

Abstract:

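We propose TMPP (Topic Model based on Phrase Parameter), a topic model built on phrase-level parameter learning, to extract the aspects of evaluated entities in online reviews together with their corresponding ratings. TMPP has three features: 1) a review is represented as a "bag of phrases"; 2) the document-topic parameter of standard LDA is extended to a set of (aspect, rating) pairs; 3) prior knowledge is incorporated. We describe the physical meaning of the TMPP parameters, the model's generative process, and how the prior knowledge is obtained and represented. We also explain why aspect-set clustering is introduced into TMPP to exploit prior knowledge and what benefit it brings, and how TMPP extracts (aspect, rating) pairs to form an (aspect, rating) summary. Using a real-world dataset of online product reviews, the experiments include an analysis of aspect identification with prior knowledge and an analysis of rating prediction accuracy, and report the relevant aspects and opposing sentiment words for five product categories. Comparison with existing baseline methods shows that TMPP produces high-quality (aspect, rating) summaries when each review in the collection carries an overall rating.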

Keywords: topic model, (aspect, rating) summarization, bag of phrases, TMPP

Abstract:

This paper proposes TMPP (Topic Model based on Phrase Parameter), a topic model that extracts the aspects of evaluated entities in online reviews together with their associated ratings. TMPP has three characteristics: (1) a review is represented as a bag of phrases; (2) the document-topic parameter of standard LDA is extended to a set of (aspect, rating) pairs; (3) prior knowledge is incorporated. We describe the physical meaning of each TMPP parameter, the generative process of the model, and the representation of the prior knowledge. We also present the reason for and advantage of incorporating aspect clusters into TMPP, and explain how (aspect, rating) pairs are obtained by extracting aspects and their associated ratings from online product reviews. Extensive experiments on a very large real-life dataset from taobao.com, comparing TMPP with existing baseline models, show that TMPP produces high-quality (aspect, rating) summaries when each review has an overall rating.

Key words: topic model, (aspect,rating) summarization, bag-of-phrase, topic model based on phrase parameter(TMPP)
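
To make the target output concrete, the following is a minimal Python sketch of what an (aspect, rating) summary could look like once phrases have been assigned to aspects and ratings. It does not implement TMPP: the (head, modifier) phrase format, the 1-5 rating scale, the sample data, and the use of a simple mean are all assumptions made for illustration, and in the actual model the aspect and rating labels would be inferred rather than given.

```python
from collections import defaultdict

# Hypothetical input: opinion phrases as (head term, modifier) pairs, each already
# labelled with an aspect and a rating on an assumed 1-5 scale. In TMPP these labels
# would be inferred by the topic model; here they are hard-coded for illustration.
labelled_phrases = [
    ("screen", "sharp", "display", 5),
    ("display", "bright", "display", 4),
    ("battery", "short", "battery", 2),
    ("battery life", "poor", "battery", 1),
    ("charge", "slow", "battery", 2),
]


def summarize(phrases):
    """Group phrases by aspect; report the mean rating and the supporting phrases."""
    ratings = defaultdict(list)
    evidence = defaultdict(list)
    for head, modifier, aspect, rating in phrases:
        ratings[aspect].append(rating)
        evidence[aspect].append(f"{modifier} {head}")
    return {
        aspect: {
            "rating": sum(values) / len(values),  # simple mean, an assumption
            "phrases": evidence[aspect],
        }
        for aspect, values in ratings.items()
    }


summary = summarize(labelled_phrases)
for aspect, info in summary.items():
    print(aspect, round(info["rating"], 2), info["phrases"])
```

Each entry of `summary` pairs one aspect with an aggregate rating and the phrases that support it, which is the shape of output the abstract refers to as an (aspect, rating) summary.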

CLC Number: