Youth Fund of National Natural Science Foundation of China (No. 61402280); Computer Science and Technology Preponderant Disciplines of Shanghai DianJi University (No. 16YSXK04); Scientific Research Project of Shanghai DianJi University (No. B1-0227-16-032-031)
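This paper proposes TMPP (Topic Model based on Phrase Parameter), a topic model built on phrase-level parameter learning, to extract the aspects of evaluated entities in online reviews together with their corresponding ratings. TMPP has three characteristics: 1) reviews are represented as bags of phrases; 2) the document-topic parameter of standard LDA is extended to a set of (aspect, rating) pairs; 3) prior knowledge is incorporated. The paper describes the physical meaning of the TMPP model parameters, the generative process of the model, and how prior knowledge is obtained and represented. It also explains the reasons for and benefits of introducing aspect-set clustering to use prior knowledge in TMPP, and the principle by which TMPP extracts (aspect, rating) pairs to form (aspect, rating) summaries. Using real online product review datasets, the experiments include an analysis of aspect identification with prior knowledge and an analysis of rating prediction accuracy, and report results on related aspects and opposing sentiment words for five product categories. Comparison with existing baseline methods shows that TMPP produces high-quality (aspect, rating) summaries when every review in the collection has an overall rating.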
Abstract
This paper proposes TMPP (Topic Model based on Phrase Parameter), a topic model that extracts the aspects of evaluated entities in online reviews together with their associated ratings. TMPP has three characteristics: (1) it represents each review as a bag of phrases; (2) it extends the document-topic parameter of standard LDA to a set of (aspect, rating) pairs; (3) it incorporates prior knowledge. We introduce the physical meaning of each TMPP parameter, the generative process of TMPP, and the representation of the prior knowledge. Furthermore, we present the reason for and the advantage of incorporating aspect clusters into TMPP, and we describe the mechanism by which (aspect, rating) summaries are obtained by extracting aspects and their associated ratings from online product reviews. We conduct extensive experiments on a very large real-life dataset from taobao.com. A comparison with existing baseline models shows that TMPP produces high-quality (aspect, rating) summaries when each review has an overall rating.