Collections

Machine Learning—Feature Selection
  • YIN Jian-qin, TIAN Guo-hui, WEI Jun, LI Jin-ping, LIN Jia-ben
    Acta Electronica Sinica. 2015, 43(2): 248-254. https://doi.org/10.3969/j.issn.0372-2112.2015.02.007

    Frequent pattern mining is widely used in feature selection for classification problems. To provide a theoretical basis for this application, we establish the relationship between a feature's discriminative ability and its support. Information gain is adopted as the evaluation criterion, and the connection between the support of a feature and its discriminative ability is discussed. First, we prove that information gain is a concave function of the support of the feature. Second, we prove that a feature with too-high or too-low support has limited discriminative ability, in both the two-class and the multi-class cases. Finally, simulation experiments validate these conclusions, which provide a theoretical basis for applying frequent pattern mining to classification problems.
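The link between support and discriminative ability can be explored numerically. The sketch below is an illustration only, not the authors' proof: it assumes a two-class problem with a binary feature, and the names `max_info_gain`, `p`, `theta` and `q` are hypothetical. For a fixed class prior `p` and support `theta`, it searches a grid of feasible values of `q = P(c=1 | x=1)` for the largest achievable information gain.

```python
import math

def entropy(p):
    """Binary entropy in bits; defined as 0 at p = 0 and p = 1."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def info_gain(p, theta, q):
    """Information gain of a binary feature X about a binary class C.

    p     = P(c=1), the class prior
    theta = P(x=1), the support of the feature
    q     = P(c=1 | x=1)
    """
    r = (p - theta * q) / (1 - theta)  # P(c=1 | x=0), by total probability
    return entropy(p) - theta * entropy(q) - (1 - theta) * entropy(r)

def max_info_gain(p, theta, steps=1000):
    """Largest information gain over a grid of feasible q values."""
    if not 0.0 < theta < 1.0:
        raise ValueError("theta must lie strictly between 0 and 1")
    best = 0.0
    for i in range(steps + 1):
        q = i / steps
        r = (p - theta * q) / (1 - theta)
        if -1e-12 <= r <= 1 + 1e-12:  # keep only valid conditional probabilities
            best = max(best, info_gain(p, theta, q))
    return best

# With balanced classes, very low or very high support caps the gain
# well below the value reachable at moderate support.
low = max_info_gain(0.5, 0.01)
mid = max_info_gain(0.5, 0.50)
high = max_info_gain(0.5, 0.99)
```

Under these assumptions, `mid` reaches the full class entropy of 1 bit, while `low` and `high` stay far below it, which matches the abstract's statement that features with extreme support have limited discriminative ability.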

  • GAO Wen, QIAN Ya-guan, WU Chun-ming, GUO Ye, ZHU Kai, CHEN Shuang-xi
    Acta Electronica Sinica. 2015, 43(4): 795-799. https://doi.org/10.3969/j.issn.0372-2112.2015.04.024

    Feature selection, as an important preprocessing step, is a key factor in improving classification accuracy. Network traffic is characterized by huge volume and high dimensionality, so extracting the optimal feature subset in a short time is of practical value for traffic classification based on machine learning. A novel method is proposed that partitions the traffic dataset into several small subsets and applies a dedicated feature selection algorithm to each of them. The optimal feature subset is then obtained by voting on these candidate feature subsets. Experimental results show that the proposed method searches for optimal features with good time efficiency and helps to improve classification accuracy.
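The partition-then-vote idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' algorithm: the abstract does not name the per-subset selector, so a simple class-mean-difference score stands in for it, and `score_features`, `select_by_voting`, `make_demo_data` and the synthetic data are all hypothetical.

```python
import random
from collections import Counter

def score_features(rows, labels):
    """Stand-in scorer (an assumption): absolute difference of the
    per-class means of each feature, for binary labels 0/1."""
    scores = []
    for j in range(len(rows[0])):
        pos = [r[j] for r, y in zip(rows, labels) if y == 1]
        neg = [r[j] for r, y in zip(rows, labels) if y == 0]
        if not pos or not neg:
            scores.append(0.0)  # cannot score with a single class present
            continue
        scores.append(abs(sum(pos) / len(pos) - sum(neg) / len(neg)))
    return scores

def select_by_voting(rows, labels, n_parts=4, top_k=2, min_votes=None):
    """Select features on each partition, then keep those that win enough votes."""
    if min_votes is None:
        min_votes = n_parts // 2 + 1  # simple majority
    votes = Counter()
    for part in range(n_parts):
        idx = range(part, len(rows), n_parts)  # interleaved partition
        scores = score_features([rows[i] for i in idx], [labels[i] for i in idx])
        ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
        votes.update(ranked[:top_k])  # each partition votes for its top_k features
    return sorted(j for j, v in votes.items() if v >= min_votes)

def make_demo_data(n_rows=200, seed=0):
    """Synthetic data: features 0 and 2 depend on the label, 1 and 3 are noise."""
    rng = random.Random(seed)
    rows, labels = [], []
    for i in range(n_rows):
        y = (i // 4) % 2
        rows.append([5 * y + rng.gauss(0, 1), rng.gauss(0, 1),
                     5 * y + rng.gauss(0, 1), rng.gauss(0, 1)])
        labels.append(y)
    return rows, labels

rows, labels = make_demo_data()
selected = select_by_voting(rows, labels, n_parts=4, top_k=2)
```

Each partition is scored independently, so the per-subset work is small and could run in parallel; the majority threshold is one possible voting rule among many.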

  • SUN Xue, LI Kun-lun, HAN Lei, BAI Xiao-liang
    Acta Electronica Sinica. 2015, 43(7): 1356-1361. https://doi.org/10.3969/j.issn.0372-2112.2015.07.016

    Most existing concept drift algorithms focus on the classification model for data streams, and some overlook the distributions of the feature space and sample space as well as the importance of feature selection and weighting. To solve this problem, we propose a dynamic information entropy and feature weighting algorithm based on the distribution of feature items, taking the dynamic evolution of concept drift as the starting point. To capture concept transitions, we detect concept drift in the data stream using information entropy, according to the degree of fit between the sample space and the feature space. We improve the dynamic feature-weighting latent Dirichlet model to address the assignment of current and historical feature weights and to prune invalid features. The validity of the proposed algorithm is confirmed by tests on the open corpora CCERT and Trec06.

  • CHEN Xiao-hong, LI Xia, WANG Na
    Acta Electronica Sinica. 2015, 43(7): 1300-1307. https://doi.org/10.3969/j.issn.0372-2112.2015.07.008

    Objective reduction is an effective approach to many-objective optimization problems: it eliminates objectives that are redundant with respect to the original objective set. The geometrical structure and the Pareto-dominance relation of the approximation set represent different aspects of the original problem. This paper proposes a new algorithm based on sparse feature selection. It uses the geometrical structure to construct a graph representing the original problem. A sparse projection matrix that maps the high-dimensional data into a low-dimensional space is then learned with a sparse regression model and used to measure the importance of each objective. The change in the Pareto-dominance relation induced by the reduced set is also used to identify a minimum set whose error does not exceed a threshold value. Experimental comparisons show that the new algorithm is more accurate than other dimension reduction techniques and is scarcely affected by the quality of the approximation set.

  • SHEN Jian, XIA Jing-bo, ZHANG Xiao-yan, ZHAO Guang-hui, FU Kai
    Acta Electronica Sinica. 2017, 45(1): 128-134. https://doi.org/10.3969/j.issn.0372-2112.2017.01.018

    The diversified and high-speed development of network traffic presents a great challenge for traffic identification. Feature extraction, as an effective method for data dimensionality reduction, is therefore of great research significance. A secondary traffic feature extraction model is described as the foundation of a secondary feature extraction algorithm for network traffic. The algorithm divides the traffic data into several subsets and gathers the features extracted from the different subsets. An index of influence is proposed as the reference for feature ranking and extraction. Experimental results show that the secondary traffic feature extraction model performs better and that the algorithm identifies traffic more accurately with fewer features.

  • XI Xu-gang, TANG Min-yan, ZHANG Zi-hao, ZHANG Qi-zhong, LUO Zhi-zeng
    Acta Electronica Sinica. 2017, 45(11): 2735-2741. https://doi.org/10.3969/j.issn.0372-2112.2017.11.022
    To improve the recognition rate of lower limb motion patterns, a novel lower limb motion recognition method was designed by fusing the surface electromyography (sEMG) signal and the acceleration signal. First, the sEMG signal was decomposed into a set of product functions (PFs) by local mean decomposition (LMD), and the multiscale permutation entropy (MPE) of the PFs was calculated. Then, the permutation entropy at one scale was selected as the sEMG feature using the Laplacian score. The feature vector is composed of this sEMG feature and the permutation entropy of the acceleration signal. Finally, an improved support vector machine based binary tree (ISVM-BT) was designed from the combination of inter-class Euclidean distance and intra-class sample distribution. The feature vector was fed into this SVM to recognize the lower limb motion. Experimental results indicate that the proposed method achieved an average recognition rate of 98.62% for seven daily activities, with higher accuracy than other methods.
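Permutation entropy and its multiscale variant can be sketched in a few lines. The code below is a generic textbook-style illustration, not the authors' implementation: it applies to any 1-D sequence (the LMD step and Laplacian score are omitted), and the parameter defaults `order=3`, `delay=1` and `max_scale=3` are assumptions.

```python
import math
from collections import Counter

def permutation_entropy(x, order=3, delay=1):
    """Normalized permutation entropy in [0, 1] of a 1-D sequence."""
    n = len(x) - (order - 1) * delay
    if n <= 0:
        raise ValueError("sequence too short for this order and delay")
    patterns = Counter()
    for i in range(n):
        window = [x[i + k * delay] for k in range(order)]
        # The ordinal pattern is the index order that sorts the window.
        patterns[tuple(sorted(range(order), key=lambda k: window[k]))] += 1
    h = -sum((c / n) * math.log(c / n) for c in patterns.values())
    return h / math.log(math.factorial(order))

def coarse_grain(x, scale):
    """Average consecutive, non-overlapping blocks of length `scale`."""
    return [sum(x[i:i + scale]) / scale
            for i in range(0, len(x) - scale + 1, scale)]

def multiscale_permutation_entropy(x, order=3, max_scale=3):
    """Permutation entropy of the coarse-grained series at scales 1..max_scale."""
    return [permutation_entropy(coarse_grain(x, s), order)
            for s in range(1, max_scale + 1)]

ramp = list(range(20))            # strictly increasing: a single ordinal pattern
zigzag = [0, 1] * 10 + [0]        # alternating: both order-2 patterns equally often
pe_ramp = permutation_entropy(ramp)
pe_zigzag = permutation_entropy(zigzag, order=2)
mpe_ramp = multiscale_permutation_entropy(ramp)
```

A perfectly regular series gives an entropy of 0 and a series that uses all ordinal patterns equally gives 1, so the value at each scale summarizes how irregular the signal is at that time resolution.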
  • ZHOU Guang-bing, SONG Hua-jun, WU Yu-xing, REN Peng
    Acta Electronica Sinica. 2018, 46(10): 2384-2390. https://doi.org/10.3969/j.issn.0372-2112.2018.10.011
    3D image registration (IR) aims to map one image onto another image of the same scene and is widely used in medical diagnosis and other applications. Most existing methods rely on features for registration and impose specific constraints, which leads to problems such as long running times, strong randomness in feature extraction, and inflexibility under the constraints. To address these problems, this paper proposes an intensity-based, feature-free method for 3D rigid IR. The method uses a Taylor expansion and least squares (LS) to obtain the transformation parameters directly, and achieves high processing speed while processing little data. Numerous experiments show that the proposed IR method is highly accurate while using only a very small proportion of the data.
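The combination of a Taylor expansion with least squares can be illustrated in one dimension. The sketch below is a deliberately reduced analogue, not the paper's 3D rigid method: it assumes a pure translation `d` of a 1-D signal, so that `moved(x) ≈ reference(x) - d * reference'(x)`, and solves for `d` in closed form. The function name `estimate_shift` and the test signal are hypothetical.

```python
import math

def estimate_shift(reference, moved, spacing=1.0):
    """Least-squares estimate of d, assuming moved(x) = reference(x - d).

    A first-order Taylor expansion gives moved - reference ≈ -d * gradient,
    so the LS solution is d = -sum(g * diff) / sum(g * g).
    """
    num = 0.0
    den = 0.0
    for i in range(1, len(reference) - 1):
        # Central-difference estimate of the intensity gradient.
        grad = (reference[i + 1] - reference[i - 1]) / (2 * spacing)
        diff = moved[i] - reference[i]
        num += grad * diff
        den += grad * grad
    if den == 0.0:
        raise ValueError("reference signal has no gradient")
    return -num / den

# Hypothetical test signal: a sine wave sampled every 0.01 units,
# shifted by a small true displacement of 0.05.
step = 0.01
xs = [i * step for i in range(629)]
reference = [math.sin(x) for x in xs]
moved = [math.sin(x - 0.05) for x in xs]
shift = estimate_shift(reference, moved, spacing=step)
```

The linearization only holds for small displacements. The same construction extends to more parameters by stacking one linear equation per sample and solving the resulting least-squares system, and it also works when only a subset of samples is used.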
  • FANG Jia-yan, LIU Qiao
    Acta Electronica Sinica. 2020, 48(1): 44-58. https://doi.org/10.3969/j.issn.0372-2112.2020.01.006
    This paper proposes a new clustering algorithm with simultaneous feature selection, called iterative tighter nonparallel support vector clustering with simultaneous feature selection (IT-NHSVC-SFS). In learning the model with two nonparallel hyperplanes, we use an iterative (alternating) optimization algorithm to achieve clustering and introduce two types of regularizers, the Euclidean norm and the infinity norm. The Euclidean norm improves the generalization ability of the clustering model, while the infinity norm performs implicit feature extraction for the two nonparallel hyperplanes, reducing the noise from irrelevant features and safeguarding the clustering precision of the model. We also introduce a set of bounding variables to avoid the maximization operation of the infinity norm, converting the non-convex optimization problem into a convex quadratic optimization problem. Because the new model embodies the idea of "maximum margin", it has good generalization ability. IT-NHSVC-SFS uses the nonparallel hyperplanes SVM (NHSVM) as the basis of the algorithm model. Unlike TWSVM and its variants, only one quadratic programming (QP) problem needs to be solved to obtain the two optimal hyperplanes simultaneously. This property helps in designing a synchronous feature selection process for the two nonparallel hyperplanes. The new algorithm adds two sets of equality constraints to the constraint set of the original NHSVM model, which avoids inverting two large matrices and reduces the computational complexity. In addition, the IT-NHSVC-SFS model replaces the original hinge loss function of NHSVM with the Laplacian loss function to avoid premature convergence. Numerical experiments on a set of benchmark data sets show that IT-NHSVC-SFS achieves better clustering accuracy than other existing clustering algorithms.
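The role of bounding variables can be seen in the standard reformulation of an infinity-norm penalty. The following is a generic sketch, not the paper's actual model: the objective $f$, the weight vector $w \in \mathbb{R}^n$, the trade-off parameter $\lambda > 0$, and the single bound $s$ are all assumed for illustration.

```latex
\[
  \|w\|_\infty = \max_{1 \le j \le n} |w_j| ,
\]
\[
  \min_{w} \; f(w) + \lambda \|w\|_\infty
  \quad\Longleftrightarrow\quad
  \min_{w,\, s} \; f(w) + \lambda s
  \quad \text{s.t.} \quad -s \le w_j \le s , \quad j = 1, \dots, n .
\]
```

At the optimum $s$ equals $\max_j |w_j|$, so the explicit maximization is replaced by $2n$ linear inequality constraints; if $f$ is a convex quadratic, the right-hand problem is a convex QP.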
  • ZHOU Bo-yang, GUO Zhi-min, WANG Yan-song, RUAN Wei, WU Chun-ming, ZHOU Ning, ZHANG Wei, CHENG Guo-zhen
    Acta Electronica Sinica. 2020, 48(8): 1552-1557. https://doi.org/10.3969/j.issn.0372-2112.2020.08.013
    The security of the wireless access network of electric power grids is critical to power grid production. However, anomalies in control data are difficult to detect quickly and effectively, owing to the high dimensionality of the control protocol data in the IEC 60870-5-104 protocol and to the fluctuating quality of wireless channels. To this end, this paper proposes an anomaly traffic detector (ATD) for the wireless network of power grids based on a multi-resolution low rank (MRLR) model. First, the ATD applies the MRLR model to the protocol data to regularize and reduce the dimensionality of the security features. Second, it uses improved recursive feature selection and focused classification algorithms for accurate detection of data anomalies. The results demonstrate the accuracy of the data anomaly classification and the effectiveness of the dimensionality reduction.