An Improved Co-training Text Categorization Algorithm Based on Diversity Measures

TANG Huan-ling; LIN Zheng-kui; LU Ming-yu


Research Communication | Updated: 2025-07-16

- An Improved Co-training Text Categorization Algorithm Based on Diversity Measures
- Acta Electronica Sinica, 2008, Vol. 36, No. S1, pp. 138-143
- Author affiliations:
  
  1. School of Information Science and Technology, Dalian Maritime University, Dalian, Liaoning 116026
  2. Department of Computer and Information Engineering, Yantai Vocational College, Yantai, Shandong 264670
- Funding:
  
  National Natural Science Foundation of China (No. 60773084, J0724003, 60603023); Doctoral Program Foundation of the Ministry of Education (No. 20070151009)
- CLC number: TP181
- Print publication: 2008

TANG Huan-ling, LIN Zheng-kui, LU Ming-yu. An Improved Co-training Text Categorization Algorithm Based on Diversity Measures[J]. Acta Electronica Sinica, 2008, 36(S1): 138-143.

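Abstract (translated from the Chinese)

The Co-training algorithm requires two feature views that satisfy the compatibility and independence assumptions. In many practical applications, however, no natural split into two views satisfying these assumptions exists, and directly evaluating the independence of two views is difficult. Based on an analysis of the theoretical assumptions of Co-training, this paper recasts the goal of finding two compatible and independent feature views as the problem of finding two base classifiers that are each reasonably accurate and differ substantially from each other. First, multiple feature views are built using feature evaluation functions, with each view containing enough information to train a base classifier. Then the independence of two base classifiers is assessed indirectly by measuring the diversity between them, and two base classifiers with adequate accuracy and relatively large diversity are selected for co-training. Depending on whether the same classification algorithm is used on each view, two improved algorithms, TV-SC and TV-DC, are proposed. Experiments show that TV-SC and TV-DC clearly outperform Co-Rnd, a co-training algorithm based on randomly split feature views, and that TV-DC classifies better than TV-SC.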

Abstract

The Co-training algorithm is constrained by its assumption that the features can be split into two compatible and independent subsets. However, this assumption is usually violated in real-world applications, especially the independence requirement. We find that its real purpose is to obtain two classifiers with a certain accuracy and sufficient diversity to co-train. First, multiple views are created using different term evaluation functions. Second, instead of directly computing the independence between two sub-views, this paper evaluates the independence between the two classifiers trained on them indirectly, by using diversity measures. Thus a pair of classifiers with a certain accuracy and greater diversity is selected. The experimental results show that the two improved algorithms, named TV-SC and TV-DC, both outperform another co-training algorithm, Co-Rnd, which is based on a random splitting method, and that TV-DC outperforms TV-SC.
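The selection step described in the abstract (keep classifiers that are accurate enough, then pick the most diverse pair) can be illustrated with a minimal sketch. The abstract does not say which diversity measure or accuracy threshold the authors use, so the pairwise disagreement rate, the `min_accuracy` parameter, the view names, and the toy predictions below are all assumptions made for illustration, not the paper's method.

```python
from itertools import combinations


def accuracy(preds, labels):
    """Fraction of validation samples a classifier labels correctly."""
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def disagreement(preds_a, preds_b):
    """Fraction of samples on which two classifiers predict different labels.

    Assumption: disagreement rate stands in for the paper's diversity measure.
    """
    return sum(a != b for a, b in zip(preds_a, preds_b)) / len(preds_a)


def select_pair(view_preds, labels, min_accuracy=0.6):
    """Pick the most diverse pair among sufficiently accurate classifiers.

    view_preds maps a (hypothetical) view name to the predictions of the
    base classifier trained on that view, on a shared labeled validation set.
    """
    eligible = [name for name, preds in view_preds.items()
                if accuracy(preds, labels) >= min_accuracy]
    if len(eligible) < 2:
        raise ValueError("need at least two classifiers above min_accuracy")
    return max(
        combinations(eligible, 2),
        key=lambda pair: disagreement(view_preds[pair[0]], view_preds[pair[1]]),
    )


# Toy data: view names and predictions are invented for illustration.
labels = [1, 1, 0, 0, 1, 0, 1, 0]
view_preds = {
    "info_gain":  [1, 1, 0, 0, 1, 0, 0, 1],
    "chi_square": [1, 1, 0, 0, 0, 1, 1, 0],
    "doc_freq":   [1, 1, 0, 0, 1, 0, 0, 0],
    "random":     [0, 0, 1, 1, 1, 0, 1, 0],
}
best = select_pair(view_preds, labels, min_accuracy=0.7)
print(best)
```

In this toy setting the `random` view falls below the accuracy threshold and is excluded before diversity is compared, which mirrors the abstract's requirement that the chosen classifiers be both reasonably accurate and diverse. The selected pair would then be handed to a co-training loop, which is not shown here.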
