
Bisecting K-Means Clustering Approach for High Dimensional Dataset

R. Indhumathi, Dr. S. Sathiyabama

Abstract


High-dimensional data is a phenomenon encountered in real-world data mining applications. Developing effective clustering methods for high-dimensional datasets is challenging because of the curse of dimensionality. The k-means clustering algorithm is commonly used, but it is time consuming and computationally expensive, and the quality of the resulting clusters depends on the selection of the initial centroids and on the dimension of the data. When the dimension of the dataset is high, the accuracy of the result may fall short of expectations, because the chosen dataset cannot be assumed to be free of noise and flaws. Hence, to improve the efficiency and accuracy of mining tasks on high-dimensional data, the data must be pre-processed by an efficient dimensionality reduction method. This paper proposes a method in which the high-dimensional data is first reduced through Principal Component Analysis, and bisecting k-means clustering is then performed on the reduced data, with no initialization of the centroids.
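The two-stage pipeline described above (reduce with PCA, then cluster by repeated bisection) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. All function names are hypothetical, and the principal components are approximated by power iteration with deflation rather than a full eigensolver. Three further choices belong to this sketch and not to the abstract: the cluster with the largest sum of squared errors (SSE) is the one split at each step, each bisection is seeded deterministically from a farthest pair of points, and the synthetic two-blob data is purely illustrative.

```python
import math
import random

def mean(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def sse(points):
    """Sum of squared distances from each point to the cluster mean."""
    if not points:
        return 0.0
    c = mean(points)
    return sum(sq_dist(p, c) for p in points)

def unit(v):
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]

def pca_reduce(points, n_components, iters=100, seed=0):
    """Project points onto approximate leading principal components."""
    rng = random.Random(seed)
    mu = mean(points)
    centered = [[x - m for x, m in zip(p, mu)] for p in points]
    d = len(mu)
    # Sample covariance matrix of the centered data.
    cov = [[sum(r[i] * r[j] for r in centered) / (len(points) - 1)
            for j in range(d)] for i in range(d)]
    components = []
    for _ in range(n_components):
        v = unit([rng.gauss(0, 1) for _ in range(d)])
        for _ in range(iters):  # power iteration
            w = [dot(row, v) for row in cov]
            if not any(w):
                break
            v = unit(w)
        lam = dot(v, [dot(row, v) for row in cov])  # Rayleigh quotient
        components.append(v)
        # Deflate so the next pass finds the next component.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)]
               for i in range(d)]
    return [[dot(r, c) for c in components] for r in centered]

def bisect(points, iters=20):
    """Split one cluster in two with 2-means.

    Assumption of this sketch: seeds are the point farthest from the
    cluster mean and the point farthest from that one (no random init).
    """
    c = mean(points)
    first = max(points, key=lambda p: sq_dist(p, c))
    second = max(points, key=lambda p: sq_dist(p, first))
    centers = [first, second]
    groups = None
    for _ in range(iters):
        new = ([], [])
        for p in points:
            nearer_second = sq_dist(p, centers[1]) < sq_dist(p, centers[0])
            new[int(nearer_second)].append(p)
        if not new[0] or not new[1]:
            break
        groups = new
        centers = [mean(g) for g in groups]
    return list(groups)

def bisecting_kmeans(points, k):
    """Repeatedly bisect the cluster with the largest SSE until k clusters."""
    clusters = [list(points)]
    while len(clusters) < k:
        i = max(range(len(clusters)), key=lambda j: sse(clusters[j]))
        if sse(clusters[i]) == 0:
            break  # every remaining cluster is a single repeated point
        clusters.extend(bisect(clusters.pop(i)))
    return clusters

# Illustrative synthetic data: two well-separated blobs in 5 dimensions.
rng = random.Random(1)
blob_a = [[rng.gauss(0, 1) for _ in range(5)] for _ in range(20)]
blob_b = [[rng.gauss(10, 1) for _ in range(5)] for _ in range(20)]
reduced = pca_reduce(blob_a + blob_b, n_components=2)
clusters = bisecting_kmeans(reduced, k=2)
```

Here `reduced` holds the 2-dimensional projections of the 40 points and `clusters` holds the groups produced by the bisection step. In practice a library eigensolver or SVD would replace the power iteration, and the number of retained components would be chosen from the explained variance.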

Keywords


Bisecting K-Means, Dimensionality Reduction, K-means, Principal Component Analysis, Principal Components

Creative Commons License
This work is licensed under a Creative Commons Attribution 3.0 License.