Acta Electronica Sinica ›› 2016, Vol. 44 ›› Issue (11): 2780-2787. DOI: 10.3969/j.issn.0372-2112.2016.11.030

• Research Paper •

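Transfer Learning Based Sentiment Analysis for Poetry of the Tang Dynasty and Song Dynasty

WU Bin, JI Jia, MENG Lin, SHI Chuan, ZHAO Hui-dong, LI Yi-qing

  1. Beijing Key Laboratory of Intelligent Telecommunications Software and Multimedia, Beijing University of Posts and Telecommunications, Beijing 100876, China
  • Received: 2015-02-13 Revised: 2015-07-01 Online: 2016-11-25 Published: 2016-11-25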
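  • About the authors: WU Bin, male, born in 1969 in Changsha, Hunan, is a professor and doctoral supervisor. He received his Ph.D. from the Institute of Computing Technology, Chinese Academy of Sciences, in 2002. His research covers complex networks, data mining, parallel processing of massive data, visual analytics, and telecom customer relationship management. E-mail: wubin@bupt.edu.cn; JI Jia, female, born in 1989 in Anshan, Liaoning, is a master's student at Beijing University of Posts and Telecommunications. Her main research areas are data mining and big data for the Internet of Things; MENG Lin, female, born in 1993 in Laiwu, Shandong, received her bachelor's degree from Beijing University of Posts and Telecommunications in 2015 and is now a master's student in its School of Computer Science. Her main research area is data mining.
  • Supported by:

    National Basic Research Program of China (973 Program) (No.2013CB329606); National Natural Science Foundation of China (No.71231002, No.61375058)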

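Abstract (translation of the Chinese version):

With the rise of computational social science, using data mining to analyze social sentiment has become a recent research focus. Current work mainly targets modern text, and sentiment analysis of short texts such as ancient poetry remains comparatively rare. This paper proposes CATL-PCO, a transfer learning model based on short-text feature expansion, which analyzes the sentiment of poems to gain further insight into the society and culture of their time. The model first expands the feature vectors of classical Chinese texts using frequent word pairs, then uses transfer learning to build three classifiers whose votes determine the final sentiment result. CATL-PCO first addresses the feature sparsity of short classical texts and, building on that, addresses the difficulty that the scarcity of modern translations poses for sentiment analysis of ancient poetry. It can thus accurately identify the sentiment orientation of classical poems and, from the perspective of computational social science, improve the understanding of Chinese history. Experiments show that, when trained on Chinese Tang poems, the proposed method classifies the sentiment of Tang poetry accurately and can be applied to sentiment analysis of the individual periods of the Tang and Song dynasties and to the analysis of representative schools.

Keywords (translation of the Chinese version): sentiment analysis, computational social science, Tang poetry and Song ci, transfer learning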

Abstract:

With the rise of computational social science, analyzing social sentiment with data mining methods has attracted widespread attention and become a research hot spot in recent years. Existing research on sentiment analysis focuses mainly on modern text and rarely addresses short texts from ancient literature. This paper proposes CATL-PCO (Correlation Analysis Transfer Learning-Probability Co-occurrence), a transfer learning model based on short-text feature expansion. By analyzing sentiment in ancient literature, it helps reveal social and cultural development in the ancient era. CATL-PCO expands the feature vectors of ancient texts using frequent word pairs and uses transfer learning to train three sentiment classifiers. It thereby addresses both the sparsity of short-text feature vectors and the scarcity of modern translations, which improves the understanding of Chinese history. Experiments demonstrate the effectiveness of the proposed method on a dataset of Chinese poems from the Tang Dynasty. Moreover, different periods of the Tang and Song dynasties, as well as different genres, are analyzed in detail.

Key words: sentiment analysis, computational social science, poetry of the Tang and Song dynasties, transfer learning
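
The abstract names two mechanical steps, expanding sparse short-text features with frequent word pairs and combining three classifiers by vote, without giving their details. The following is only a minimal sketch of those two ideas, not the authors' CATL-PCO implementation: the function names, the document-level co-occurrence counting, and the `min_support` threshold are all assumptions made for illustration, and the transfer learning step itself is omitted.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(docs, min_support=2):
    """Return unordered token pairs that co-occur in at least
    `min_support` documents (hypothetical mining rule)."""
    counts = Counter()
    for tokens in docs:
        # Sort the distinct tokens so each pair has one canonical order.
        for pair in combinations(sorted(set(tokens)), 2):
            counts[pair] += 1
    return {pair for pair, n in counts.items() if n >= min_support}


def expand(tokens, pairs):
    """Append the missing partner of every frequent pair that has
    exactly one member present in the text."""
    present = set(tokens)
    extra = set()
    for a, b in pairs:
        if a in present and b not in present:
            extra.add(b)
        elif b in present and a not in present:
            extra.add(a)
    return list(tokens) + sorted(extra)


def majority_vote(labels):
    """Return the label predicted by most classifiers."""
    return Counter(labels).most_common(1)[0][0]


# Toy corpus of tokenized lines (invented data, for illustration only).
corpus = [
    ["moon", "frost", "home"],
    ["moon", "home", "sorrow"],
    ["flower", "spring"],
]
pairs = frequent_pairs(corpus)
expanded = expand(["moon", "wine"], pairs)
vote = majority_vote(["sad", "sad", "joy"])
```

In this toy run, only the pair of "home" and "moon" reaches the support threshold, so a short text containing "moon" gains "home" as an extra feature. With three voters and two labels, the vote cannot tie.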

CLC Number: