Effectively Classifying Short Texts by Entropy Weighted Constraints Sparse Representation

TUO Ting; MA Hui-fang; LI Zhi-xin; ZHAO Wei-zhong

doi:10.3969/j.issn.0372-2112.2020.11.006

Research article | Updated: 2025-12-08

- Acta Electronica Sinica, 2020, Vol. 48, No. 11, pp. 2131-2137
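- Author affiliations:
  
  1. School of Computer Science and Engineering, Northwest Normal University, Lanzhou, Gansu 730070
  2. Guangxi Key Laboratory of Trusted Software, Guilin University of Electronic Technology, Guilin, Guangxi 541004
  3. Guangxi Key Laboratory of Multi-source Information Mining and Security, Guangxi Normal University, Guilin, Guangxi 541004
  4. School of Computer Science, Central China Normal University, Wuhan, Hubei 430079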
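- Funding:
  
  National Natural Science Foundation of China (No. 61762078, No. 61363058, No. 61663004, No. 61966004, No. 61762079); Research Project of Guangxi Key Laboratory of Trusted Software (No. kx202003); Open Fund of Guangxi Key Laboratory of Multi-source Information Mining and Security (No. MIMS18-08); Northwest Normal University 2019 Young Teachers' Research Ability Improvement Program (No. NWNU-LKQN2019-2)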
- DOI: 10.3969/j.issn.0372-2112.2020.11.006
  CLC number: TP393.092
- Published online: 2020-11-25; print publication: 2020
TUO Ting, MA Hui-fang, LI Zhi-xin, et al. Effectively Classifying Short Texts by Entropy Weighted Constraints Sparse Representation[J]. Acta Electronica Sinica, 2020, 48(11): 2131-2137. DOI: 10.3969/j.issn.0372-2112.2020.11.006.

Abstract

To address the feature sparsity of short texts, a short text classification method based on entropy-weighted constrained sparse representation is proposed. Because the initial dictionary is high-dimensional, each word in the dictionary is first represented as a word vector using the Word2vec tool, and the dimensionality of the original dictionary is then reduced using weighted averages of these word vectors. Second, a fast feature subset selection algorithm removes irrelevant and redundant short texts from the dictionary, yielding a filtered dictionary. Third, building on sparse representation theory, an improved entropy-weighted constrained sparse representation objective is designed over the filtered dictionary, and the Lagrange multiplier method is used to find its optimum, which yields a subspace for each class. Finally, the distances between the short text to be classified and the short texts of each class are computed in the learned subspaces, and the text is labeled according to three classification rules. Extensive experiments on real-world datasets show that the proposed method effectively alleviates the feature sparsity of short texts and outperforms existing short text classification methods.
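
The abstract does not spell out how the weighted vector average is computed. A minimal sketch of one common reading follows, assuming that each dictionary entry (a short text) is replaced by the weighted mean of the Word2vec vectors of its words, so that its dimensionality drops from the vocabulary size to the embedding size. The word vectors are passed in as a plain dict, and the per-word weights (for example TF-IDF scores) are a hypothetical input; neither the weighting scheme nor the function names come from the paper.

```python
import numpy as np


def weighted_average_embedding(tokens, word_vectors, weights):
    """Represent one short text as the weighted mean of its word vectors.

    word_vectors: dict word -> 1-D numpy array (e.g. exported from a trained
        Word2vec model). Assumed non-empty, all vectors of the same length.
    weights: dict word -> non-negative weight (hypothetical, e.g. TF-IDF);
        words missing from it default to weight 1.0.
    Tokens without a word vector (out-of-vocabulary) are skipped.
    """
    dim = len(next(iter(word_vectors.values())))
    total = np.zeros(dim)
    weight_sum = 0.0
    for tok in tokens:
        if tok not in word_vectors:
            continue  # skip out-of-vocabulary words
        w = weights.get(tok, 1.0)
        total += w * word_vectors[tok]
        weight_sum += w
    # A text with no known words (or zero total weight) maps to the zero vector.
    return total / weight_sum if weight_sum > 0 else total


def reduce_dictionary(short_texts, word_vectors, weights):
    """Stack one embedding per short text: shape (n_texts, embedding_dim)."""
    return np.vstack([weighted_average_embedding(t, word_vectors, weights)
                      for t in short_texts])


if __name__ == "__main__":
    vecs = {"cheap": np.array([1.0, 0.0]), "flight": np.array([0.0, 1.0])}
    w = {"cheap": 1.0, "flight": 3.0}
    print(weighted_average_embedding(["cheap", "flight", "oov"], vecs, w))
    # [0.25 0.75]
    print(reduce_dictionary([["cheap"], ["flight", "flight"]], vecs, w).shape)
    # (2, 2)
```

Dividing by the total weight, rather than summing, keeps every entry on the same scale regardless of how many words the short text contains.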

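The abstract also does not give the entropy-weighted objective. As an illustration only, the sketch below borrows the entropy-weighted soft-subspace formulation known from entropy-weighted k-means; this is an assumption, not the paper's confirmed model. For class k, feature j receives a weight w_kj that minimises sum_j w_kj D_kj + gamma * sum_j w_kj log w_kj subject to sum_j w_kj = 1, where D_kj is the within-class dispersion on feature j and gamma > 0 controls how evenly the weight is spread. Setting the derivative of the Lagrangian sum_j w_kj D_kj + gamma * sum_j w_kj log w_kj + lambda * (sum_j w_kj - 1) to zero gives D_kj + gamma * (log w_kj + 1) + lambda = 0, hence the closed form w_kj = exp(-D_kj / gamma) / sum_l exp(-D_kl / gamma), so each class's weight vector defines its soft subspace. A text is then assigned to the class holding its nearest member under that class's weighted distance. The paper uses three classification rules that the abstract does not name; the minimum-distance rule here is only a placeholder, and all function names are hypothetical.

```python
import numpy as np


def entropy_weights(dispersion, gamma):
    """Minimise sum_j w_j*D_j + gamma*sum_j w_j*log(w_j) s.t. sum_j w_j = 1.

    The Lagrange-multiplier solution is a softmax of -D/gamma: features with
    small dispersion D_j receive large weights.
    """
    z = -np.asarray(dispersion, dtype=float) / gamma
    z -= z.max()  # shift before exp for numerical stability (result unchanged)
    e = np.exp(z)
    return e / e.sum()


def fit_class_subspaces(X, y, gamma=1.0):
    """Return {class label: feature-weight vector} from within-class dispersion."""
    subspaces = {}
    for k in np.unique(y):
        Xk = X[y == k]
        dispersion = ((Xk - Xk.mean(axis=0)) ** 2).sum(axis=0)  # D_kj per feature
        subspaces[k] = entropy_weights(dispersion, gamma)
    return subspaces


def classify_min_distance(x, X, y, subspaces):
    """Placeholder rule: class whose nearest member is closest in its weighted subspace."""
    best_label, best_dist = None, np.inf
    for k, w in subspaces.items():
        dists = (((X[y == k] - x) ** 2) * w).sum(axis=1)  # weighted squared distances
        if dists.min() < best_dist:
            best_label, best_dist = k, dists.min()
    return best_label


if __name__ == "__main__":
    # Toy data: both classes vary a lot on feature 1, so feature 0 dominates.
    X = np.array([[0.0, 0.0], [0.0, 4.0], [3.0, 0.0], [3.0, 4.0]])
    y = np.array([0, 0, 1, 1])
    subspaces = fit_class_subspaces(X, y, gamma=1.0)
    print(classify_min_distance(np.array([0.2, 2.0]), X, y, subspaces))  # 0
```

A small gamma concentrates the weight on the few most compact features; a large gamma pushes the weights toward uniform, which recovers an ordinary unweighted distance up to a constant factor.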