Acta Electronica Sinica ›› 2021, Vol. 49 ›› Issue (3): 424-434. DOI: 10.12263/DZXB.20200337
YANG Yi1, JIANG Liang-xiao1,2, LI Chao-qun3, LI Hong-wei3
Received: 2020-04-07
Revised: 2020-10-28
Online: 2021-03-25
Published: 2021-03-25
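Abstract: In crowdsourcing learning, the integrated labels produced by label integration algorithms still contain a certain amount of label noise. Inspired by the idea of tri-training, this paper proposes a tri-training-based label noise correction algorithm for crowdsourcing (Tri-Training-based Label Noise Correction, TTLNC). TTLNC first uses a filter to split the data into a clean set and a noisy set. It then applies bagging on the clean set to train three different classifiers, uses these classifiers to relabel the instances in the noisy set, and assigns each relabeled instance to the corresponding training set according to an instance assignment strategy. Finally, it retrains three different classifiers on the new training sets and uses their classification results to relabel all instances. Experimental results on simulated benchmark datasets and real-world crowdsourced datasets show that TTLNC outperforms four other state-of-the-art noise correction algorithms on two metrics: noise ratio and model quality.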
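The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the filter, the base classifiers, or the instance assignment strategy, so the sketch assumes a cross-validated classification filter, a toy nearest-centroid learner, and the rule from the original tri-training scheme (a noisy instance joins a classifier's training set when the other two classifiers agree on its label). All function names (`fit_centroids`, `split_clean_noisy`, `ttlnc_sketch`) are hypothetical.

```python
import random
from collections import Counter

def fit_centroids(X, y):
    """Toy base learner (assumption): one mean vector per class."""
    sums, counts = {}, Counter(y)
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for j, v in enumerate(x):
            acc[j] += v
    return {c: [s / counts[c] for s in sums[c]] for c in sums}

def predict(model, x):
    """Return the class whose centroid is closest to x."""
    return min(model, key=lambda c: sum((a - b) ** 2 for a, b in zip(model[c], x)))

def split_clean_noisy(X, y, folds=5):
    """Assumed filter: an instance is 'noisy' if a model trained on the
    other folds disagrees with its integrated label."""
    clean, noisy = [], []
    for f in range(folds):
        train = [i for i in range(len(X)) if i % folds != f]
        model = fit_centroids([X[i] for i in train], [y[i] for i in train])
        for i in range(f, len(X), folds):
            (clean if predict(model, X[i]) == y[i] else noisy).append(i)
    return clean, noisy

def ttlnc_sketch(X, y, seed=0):
    """Return corrected labels for all instances (illustrative only)."""
    rng = random.Random(seed)
    clean, noisy = split_clean_noisy(X, y)
    if not clean:  # nothing trustworthy to learn from; leave labels unchanged
        return list(y)

    # Bagging: three bootstrap samples of the clean set, one per classifier.
    train_sets = [[(X[i], y[i]) for i in (rng.choice(clean) for _ in clean)]
                  for _ in range(3)]
    models = [fit_centroids(*zip(*ts)) for ts in train_sets]

    # Relabel noisy instances and assign them to training sets.
    # Assumed strategy: classifier k receives the instance only when the
    # other two classifiers agree on its label.
    for i in noisy:
        votes = [predict(m, X[i]) for m in models]
        for k in range(3):
            a, b = (votes[j] for j in range(3) if j != k)
            if a == b:
                train_sets[k].append((X[i], a))

    # Retrain on the enlarged training sets, then relabel every instance
    # by majority vote (ties go to the first label encountered).
    models = [fit_centroids(*zip(*ts)) for ts in train_sets]
    return [Counter(predict(m, x) for m in models).most_common(1)[0][0] for x in X]
```

Calling `ttlnc_sketch(X, y)` with a list of feature tuples `X` and their integrated labels `y` returns one corrected label per instance. The nearest-centroid learner is used only to keep the sketch dependency-free; any classifier with fit and predict steps could take its place.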
YANG Yi, JIANG Liang-xiao, LI Chao-qun, LI Hong-wei. A Tri-training-Based Label Noise Correction Algorithm for Crowdsourcing[J]. Acta Electronica Sinica, 2021, 49(3): 424-434.