Acta Electronica Sinica, 2021, Vol. 49, Issue 3: 424-434. DOI: 10.12263/DZXB.20200337
YANG Yi, JIANG Liang-xiao, LI Chao-qun, LI Hong-wei
Received: 2020-04-07
Revised: 2020-10-28
Online: 2021-03-25
Published: 2021-03-25
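Abstract: In crowdsourcing learning, the integrated labels produced by label integration algorithms still contain a certain amount of label noise. Inspired by the idea of tri-training, this paper proposes a tri-training-based label noise correction algorithm for crowdsourcing (Tri-Training-based Label Noise Correction, TTLNC). TTLNC first uses a filter to split the data into a clean set and a noise set. It then applies bagging on the clean set to train three different classifiers, uses these classifiers to relabel the instances in the noise set, and assigns each relabeled instance to the corresponding training set according to an instance assignment strategy. Finally, it retrains three different classifiers on the new training sets and uses their predictions to relabel all instances. Experimental results on simulated benchmark datasets and real-world crowdsourced datasets show that TTLNC outperforms four other state-of-the-art noise correction algorithms on two metrics: noise ratio and model quality.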
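The four steps named in the abstract (filter, bagging, relabel and assign, retrain and relabel) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the abstract does not specify the filter, the base classifiers, or the instance assignment strategy, so the sketch assumes a k-nearest-neighbour disagreement filter (`knn_filter`), nearest-centroid base classifiers (`fit_centroids`, `predict`), and the classic tri-training rule in which an instance joins a classifier's training set when the other two classifiers agree on its label. All function names are hypothetical, and the sketch assumes the filter leaves a non-empty clean set.

```python
import random
from collections import Counter

def fit_centroids(X, y):
    """Train a nearest-centroid classifier: one mean vector per class."""
    sums, counts = {}, Counter(y)
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for j, v in enumerate(x):
            acc[j] += v
    return {c: [s / counts[c] for s in sums[c]] for c in sums}

def predict(model, x):
    """Return the class whose centroid is closest to x (squared Euclidean)."""
    return min(model, key=lambda c: sum((a - b) ** 2 for a, b in zip(model[c], x)))

def knn_filter(X, y, k=3):
    """Assumed filter: an instance is 'noisy' when its label disagrees
    with the majority label of its k nearest neighbours."""
    clean, noisy = [], []
    for i, x in enumerate(X):
        others = sorted((j for j in range(len(X)) if j != i),
                        key=lambda j: sum((a - b) ** 2 for a, b in zip(X[j], x)))
        vote = Counter(y[j] for j in others[:k]).most_common(1)[0][0]
        (clean if vote == y[i] else noisy).append(i)
    return clean, noisy

def ttlnc_sketch(X, y, k=3, seed=0):
    """Return corrected labels for all instances (illustrative only)."""
    rng = random.Random(seed)
    # Step 1: split instance indices into a clean set and a noise set.
    clean, noisy = knn_filter(X, y, k)
    # Step 2: bagging on the clean set, one bootstrap sample per classifier.
    train_sets = [[(X[i], y[i]) for i in rng.choices(clean, k=len(clean))]
                  for _ in range(3)]
    models = [fit_centroids(*zip(*ts)) for ts in train_sets]
    # Step 3: relabel noisy instances; assumed assignment rule: classifier t
    # receives the instance when the other two classifiers agree on its label.
    for i in noisy:
        preds = [predict(m, X[i]) for m in models]
        for t in range(3):
            a, b = (preds[u] for u in range(3) if u != t)
            if a == b:
                train_sets[t].append((X[i], a))
    # Step 4: retrain on the enlarged training sets and relabel everything
    # by majority vote, keeping the original label when all three disagree.
    models = [fit_centroids(*zip(*ts)) for ts in train_sets]
    corrected = []
    for x, label in zip(X, y):
        top, n = Counter(predict(m, x) for m in models).most_common(1)[0]
        corrected.append(top if n >= 2 else label)
    return corrected

# Toy data: two well-separated clusters, with the label at index 4 flipped.
X = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
     (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5)]
y = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
corrected = ttlnc_sketch(X, y)
```

On this toy data the filter flags only the flipped instance, and the retrained classifiers are expected to restore its label. The bootstrap sampling and the two-of-three agreement rule are the parts that carry the tri-training idea; the filter and base learner can be swapped for whatever the full paper actually uses.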
YANG Yi, JIANG Liang-xiao, LI Chao-qun, et al. A Tri-training-Based Label Noise Correction Algorithm for Crowdsourcing[J]. Acta Electronica Sinica, 2021, 49(3): 424-434.