• Research Article •

Short Text Classification Algorithm Based on Entropy-Weighted Constrained Sparse Representation

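1. College of Computer Science and Engineering, Northwest Normal University, Lanzhou, Gansu 730070;
2. Guangxi Key Laboratory of Trusted Software, Guilin University of Electronic Technology, Guilin, Guangxi 541004;
3. Guangxi Key Lab of Multi-Source Information Mining and Security, Guangxi Normal University, Guilin, Guangxi 541004;
4. School of Computer, Central China Normal University, Wuhan, Hubei 430079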
• Received: 2018-07-30 Revised: 2020-07-01 Published: 2020-11-25
• Corresponding author:
• MA Hui-fang
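• About the authors:
• TUO Ting, female, born in September 1990 in Qingyang, Gansu. She entered the College of Computer Science and Engineering, Northwest Normal University in 2016 and is currently a master's student; her research focuses on natural language processing and classification algorithms. E-mail: nwnutuot@yeah.net; LI Zhi-xin, male, born in October 1971 in Guilin, Guangxi. Ph.D., doctoral supervisor. He is currently a professor at the College of Computer Science and Information Engineering, Guangxi Normal University; his research focuses on image understanding and machine learning. E-mail: lizx@gxnu.edu.cn; ZHAO Wei-zhong, male, born in October 1981 in Heze, Shandong. Ph.D., master's supervisor. He is currently an associate professor at the School of Computer, Central China Normal University; his research focuses on machine learning and data mining. E-mail: zhaoweizhong@gmail.com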
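• Funding:
• National Natural Science Foundation of China (No.61762078, No.61363058, No.61663004, No.61966004, No.61762079); Research Project of the Guangxi Key Laboratory of Trusted Software (No.kx202003); Open Fund of the Guangxi Key Lab of Multi-Source Information Mining and Security (No.MIMS18-08); Northwest Normal University 2019 Young Teachers' Research Capacity Improvement Program (No.NWNU-LKQN2019-2)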

Effectively Classifying Short Texts by Entropy Weighted Constraints Sparse Representation

TUO Ting1, MA Hui-fang1,2,3, LI Zhi-xin3, ZHAO Wei-zhong4

1. College of Computer Science and Engineering, Northwest Normal University, Lanzhou, Gansu 730070, China;
2. Guangxi Key Laboratory of Trusted Software, Guilin University of Electronic Technology, Guilin, Guangxi 541004, China;
3. Guangxi Key Lab of Multi-Source Information Mining and Security, Guangxi Normal University, Guilin, Guangxi 541004, China;
4. School of Computer, Central China Normal University, Wuhan, Hubei 430079, China
• Received: 2018-07-30 Revised: 2020-07-01 Online: 2020-11-25 Published: 2020-11-25

Abstract: To address the feature sparsity of short texts, a short text classification method based on entropy-weighted constrained sparse representation is proposed. Because the initial dictionary has high dimensionality, each word in the dictionary is first represented as a word vector using the Word2vec tool, and the original dictionary is then reduced according to the average weighted vectors. Second, a fast feature subset selection algorithm is applied to remove irrelevant and redundant short texts from the dictionary, yielding a filtered dictionary. Third, based on sparse representation theory, an improved entropy-weighted sparse representation objective function is designed, and the Lagrange multiplier method is used to obtain its optimal value, from which the subspace of each class is obtained. Finally, the distance between the short text to be classified and the short texts in each class is computed in the subspace, and the short text is classified according to three classification rules. Extensive experiments on real data sets show that the proposed method effectively alleviates the short text feature sparsity problem and outperforms existing short text classification methods.
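The abstract does not state the objective function, so the following is only a minimal sketch of the general idea behind entropy-weighted subspaces, not the authors' formulation. It assumes a commonly used objective, minimising sum_j w_j*D_j + gamma*sum_j w_j*log(w_j) subject to sum_j w_j = 1, whose Lagrange-multiplier solution is w_j = exp(-D_j/gamma) / sum_k exp(-D_k/gamma). The function names, the parameter `gamma`, the per-dimension dispersion D_j, the toy vectors, and the single nearest-centroid rule (in place of the paper's three classification rules) are all illustrative assumptions.

```python
import math

def entropy_weights(dispersions, gamma=1.0):
    """Closed-form weights for the assumed entropy-regularised objective.

    Setting the derivative of the Lagrangian to zero gives
    w_j = exp(-D_j / gamma) / sum_k exp(-D_k / gamma).
    """
    # Shift by the smallest dispersion for numerical stability; the
    # normalised weights are unchanged by this shift.
    d_min = min(dispersions)
    scores = [math.exp(-(d - d_min) / gamma) for d in dispersions]
    total = sum(scores)
    return [s / total for s in scores]

def class_subspace(vectors, gamma=1.0):
    """Return (centroid, weights) describing one class (hypothetical helper)."""
    n, dim = len(vectors), len(vectors[0])
    centroid = [sum(v[j] for v in vectors) / n for j in range(dim)]
    # Assumed dispersion D_j: squared deviation from the centroid in
    # dimension j, summed over the class's text vectors.
    dispersions = [sum((v[j] - centroid[j]) ** 2 for v in vectors)
                   for j in range(dim)]
    return centroid, entropy_weights(dispersions, gamma)

def weighted_distance(x, centroid, weights):
    """Squared distance in which each dimension is scaled by its weight."""
    return sum(w * (a - c) ** 2 for w, a, c in zip(weights, x, centroid))

def classify(x, subspaces):
    """Assign x to the class whose weighted subspace is closest."""
    return min(subspaces,
               key=lambda label: weighted_distance(x, *subspaces[label]))

# Toy two-dimensional "text vectors" (invented for illustration only).
training = {
    "sports": [[1.0, 0.0], [1.0, 1.0]],
    "tech": [[0.0, 0.0], [0.0, 1.0]],
}
subspaces = {label: class_subspace(vs) for label, vs in training.items()}
predicted = classify([0.9, 0.2], subspaces)
```

In this toy setup, dimensions along which a class is compact receive larger weights, so they dominate the distance; a larger `gamma` pushes the weights toward uniform.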