Attribute Reduction for Concepts and Concept Drifting Detection in Heterogeneous Data
DENG Da-yong1,3, LU Ke-wen1, HUANG Hou-kuan2, DENG Zhi-xuan1
1. College of Mathematics, Physics and Information Engineering, Zhejiang Normal University, Jinhua, Zhejiang 321004, China;
2. School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, China;
3. Xingzhi College, Zhejiang Normal University, Jinhua, Zhejiang 321004, China
Abstract: Rough set theory is one of the important methods of granular computing, and data heterogeneity is one of the remarkable characteristics of big data. To handle data heterogeneity, we define attribute reduction for concepts after investigating the intrinsic nature of attribute reducts; this notion can encompass value reducts, Pawlak attribute reducts and parallel reducts. After investigating the properties of concept attribute reduction, we present a new method for removing redundant attributes and a new method for detecting concept drift in heterogeneous concepts. Theoretical analysis and examples show that these methods are valid. This work provides a new way for rough set theory and granular computing to be integrated into big data.