FANG Jia-yan, LIU Qiao
In this paper, a new clustering algorithm with simultaneous feature selection is proposed, called iterative tighter nonparallel support vector clustering with simultaneous feature selection (IT-NHSVC-SFS). In the two-nonparallel-hyperplane learning model, an iterative (alternating) optimization algorithm is used to achieve clustering, and two types of regularizers are introduced: the Euclidean norm and the infinity norm. The Euclidean-norm regularizer improves the generalization ability of the model, while the infinity-norm regularizer performs implicit feature extraction for the two nonparallel hyperplanes, suppressing noise from irrelevant features so that the clustering accuracy of the model is preserved. A set of bounding variables is also introduced to avoid the maximization operation implied by the infinity norm, converting the non-convex optimization problem into a convex quadratic optimization problem. Meanwhile, because the new model embodies the idea of "maximum margin", it has good generalization ability. IT-NHSVC-SFS takes the nonparallel hyperplane SVM (NHSVM) as its base model: unlike TWSVM and its variant models, only a single quadratic programming (QP) problem needs to be solved to obtain the two optimal hyperplanes simultaneously, a property that makes it convenient to design a synchronous feature selection process for the two nonparallel hyperplanes. The new algorithm adds two sets of equality constraints to the constraint set of the original NHSVM model, which avoids the inversion of two large matrices and reduces the computational complexity. In addition, in the IT-NHSVC-SFS model the Laplacian loss function replaces the hinge loss of the original NHSVM to avoid premature convergence. Numerical experiments on a set of benchmark data sets show that the IT-NHSVC-SFS algorithm achieves higher clustering accuracy than other existing clustering algorithms.
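The bounding-variable device mentioned above can be illustrated in isolation: minimizing an infinity-norm term means minimizing max_i |w_i|, and introducing a scalar bound t with the linear constraints -t <= w_i <= t turns this non-smooth max into an ordinary convex program. The sketch below is a minimal toy, not the paper's model: the normalization constraint a·w = 1 is an illustrative stand-in for the actual NHSVM constraint set, and because the resulting toy objective is linear, `scipy.optimize.linprog` is used rather than a QP solver.

```python
import numpy as np
from scipy.optimize import linprog

# Toy problem:  min ||w||_inf  s.t.  a·w = 1   (a·w = 1 is an assumed
# normalization, standing in for the real model's constraints).
# Reformulation with a bounding variable t:
#   variables z = [w_1, ..., w_n, t],  objective: minimize t,
#   linear constraints:  w_i - t <= 0  and  -w_i - t <= 0  for all i.
a = np.array([3.0, 1.0])
n = len(a)

c = np.zeros(n + 1)
c[-1] = 1.0  # minimize t only

# Stack the two families of inequalities: w_i <= t and -w_i <= t.
A_ub = np.block([[np.eye(n), -np.ones((n, 1))],
                 [-np.eye(n), -np.ones((n, 1))]])
b_ub = np.zeros(2 * n)

# Equality constraint a·w = 1 (t has coefficient 0).
A_eq = np.concatenate([a, [0.0]])[None, :]
b_eq = [1.0]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
              bounds=[(None, None)] * n + [(0, None)])
w, t = res.x[:n], res.x[-1]
# For a = (3, 1) the optimum equalizes the components: w = (0.25, 0.25),
# giving t = ||w||_inf = 0.25 (better than w = (1/3, 0), where t = 1/3).
```

The same constraints carry over unchanged when the objective also contains quadratic (Euclidean-norm) terms, which is why the reformulated problem in the paper remains a single convex QP.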