National Natural Science Foundation of China (No. 61762078, No. 61363058, No. 61663004, No. 61966004, No. 61762079); Research Project of Guangxi Key Laboratory of Trusted Software (No. kx202003); Guangxi Key Laboratory Foundation for Multi-source Information Mining and Security (No. MIMS18-08); 2019 Key Project of Young Teachers Research Ability Enhancement Program of Northwest Normal University (No. NWNU-LKQN2019-2)
TUO Ting, MA Hui-fang, LI Zhi-xin, et al. Effectively Classifying Short Texts by Entropy Weighted Constraints Sparse Representation[J]. Acta Electronica Sinica, 2020, 48(11): 2131-2137. DOI: 10.3969/j.issn.0372-2112.2020.11.006.
Effectively Classifying Short Texts by Entropy Weighted Constraints Sparse Representation
To address the feature sparsity of short texts, a short text classification method based on entropy-weighted constrained sparse representation is proposed. First, because the initial dictionary has a high dimension, each word in the dictionary is represented as a word vector using the Word2vec tool, and the original dictionary is then reduced according to the average weighted vectors. Second, a fast feature subset selection algorithm is adopted to remove irrelevant and redundant short texts from the dictionary, yielding a filtered dictionary. Third, based on sparse representation theory, an improved entropy-weighted sparse representation objective function is designed, and the Lagrange multiplier method is used to obtain its optimal value, from which the subspace of each class is obtained. Finally, the distance between the short text to be classified and the short texts in each class is computed in the subspace, and the short text is classified according to three classification rules. Extensive experiments on real data sets show that the proposed method effectively alleviates the feature sparsity problem of short texts and outperforms existing short text classification methods.
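
The abstract does not give the objective function, so the following is only a minimal sketch of the general idea of entropy-weighted subspaces, not the authors' formulation. It assumes a generic objective of the form sum_j w_j * D_j + gamma * sum_j w_j * log(w_j) subject to sum_j w_j = 1, whose Lagrange multiplier solution is w_j proportional to exp(-D_j / gamma). The per-feature dispersion D_j, the parameter gamma, the function names, and the single nearest-centre rule (standing in for the three classification rules) are all assumptions made for illustration.

```python
import math

def entropy_weights(dispersion, gamma=1.0):
    """Closed-form minimiser of sum_j w_j*D_j + gamma*sum_j w_j*log(w_j)
    subject to sum_j w_j = 1 (obtained with a Lagrange multiplier)."""
    d_min = min(dispersion)  # shift for numerical stability; the result is unchanged
    raw = [math.exp(-(d - d_min) / gamma) for d in dispersion]
    total = sum(raw)
    return [r / total for r in raw]

def class_subspace(samples, gamma=1.0):
    """Return the class centre and per-feature weights (hypothetical helper).
    Features with low within-class dispersion receive high weight."""
    n, dims = len(samples), len(samples[0])
    center = [sum(s[j] for s in samples) / n for j in range(dims)]
    dispersion = [sum((s[j] - center[j]) ** 2 for s in samples) for j in range(dims)]
    return center, entropy_weights(dispersion, gamma)

def classify(x, classes, gamma=1.0):
    """Assign x to the class with the smallest weighted squared distance to
    its centre. This is one assumed rule, not the paper's three rules."""
    def weighted_dist(samples):
        center, w = class_subspace(samples, gamma)
        return sum(wj * (xj - cj) ** 2 for wj, xj, cj in zip(w, x, center))
    return min(classes, key=lambda label: weighted_dist(classes[label]))

# Toy vectors standing in for reduced short-text representations (made up).
classes = {
    "a": [[0.0, 0.0], [0.0, 10.0]],
    "b": [[5.0, 0.0], [5.0, 10.0]],
}
print(classify([1.0, 9.0], classes))
```

In this toy data the second feature varies widely inside both classes, so its weight is close to zero and the decision rests almost entirely on the first feature; the query `[1.0, 9.0]` is therefore assigned to class `a`. A smaller `gamma` concentrates the weight on fewer features, while a larger one pushes the weights toward uniform.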