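1. School of Information Science and Technology, Dalian Maritime University, Dalian, Liaoning 116026
2. Department of Computer and Information Engineering, Yantai Vocational College, Yantai, Shandong 264670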
Print publication: 2008
TANG Huan-ling, LIN Zheng-kui, LU Ming-yu. An Improved Co-training Text Categorization Algorithm Based on Diversity Measures[J]. Acta Electronica Sinica, 2008, 36(S1): 138-143.
The Co-training algorithm requires two feature views that satisfy the compatibility and independence assumptions. In many real-world applications, however, the features do not split naturally into two views that satisfy these assumptions, and the independence of two views is difficult to evaluate directly. By analyzing the theoretical assumptions of Co-training, this paper recasts the goal of finding two compatible and independent feature views as the problem of finding two base classifiers that are each sufficiently accurate and that differ substantially from each other. First, multiple feature views are built using different term evaluation functions, so that each view carries enough information to train a base classifier. Second, instead of computing the independence of two views directly, the paper evaluates it indirectly through diversity measures between the base classifiers trained on them, and selects a pair of classifiers with adequate accuracy and high diversity for co-training. Depending on whether the same classification algorithm is used on both views, two improved algorithms are proposed, TV-SC and TV-DC. Experimental results show that both TV-SC and TV-DC clearly outperform Co-Rnd, a co-training algorithm based on randomly split feature views, and that TV-DC outperforms TV-SC.
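The selection step described in the abstract can be illustrated with a minimal Python sketch. The abstract does not say which diversity measure or accuracy threshold the paper uses, so the disagreement measure (the fraction of samples on which two classifiers predict different labels), the `min_accuracy` threshold, the function names, the view names, and the toy predictions below are all assumptions made for illustration, not the authors' implementation.

```python
from itertools import combinations


def accuracy(preds, labels):
    """Fraction of validation samples a base classifier labels correctly."""
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def disagreement(preds_a, preds_b):
    """Disagreement measure: fraction of samples on which two classifiers differ.

    Assumption: used here as a stand-in for the paper's diversity measures.
    """
    return sum(a != b for a, b in zip(preds_a, preds_b)) / len(preds_a)


def select_pair(view_preds, labels, min_accuracy=0.7):
    """Pick the most diverse pair among classifiers that are accurate enough.

    view_preds maps a (hypothetical) view name to the predictions its base
    classifier makes on a labeled validation set.
    """
    eligible = [name for name, preds in view_preds.items()
                if accuracy(preds, labels) >= min_accuracy]
    best_pair, best_div = None, -1.0
    for a, b in combinations(eligible, 2):
        div = disagreement(view_preds[a], view_preds[b])
        if div > best_div:
            best_pair, best_div = (a, b), div
    return best_pair, best_div


# Toy validation labels and per-view predictions (made-up values).
labels = [1, 0, 1, 1, 0, 0, 1, 0]
view_preds = {
    "view_ig":  [1, 0, 1, 1, 0, 0, 0, 0],
    "view_chi": [1, 0, 1, 1, 0, 0, 0, 1],
    "view_mi":  [0, 0, 1, 1, 0, 1, 1, 0],
    "view_df":  [0, 1, 0, 1, 1, 0, 1, 1],
}

pair, diversity = select_pair(view_preds, labels)
print(pair, diversity)
```

In this toy data `view_df` falls below the accuracy threshold and is excluded, and among the remaining classifiers the pair that disagrees most often is chosen. The two selected classifiers would then be the ones handed to the co-training loop.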