
K-Means Algorithm for Centroid Detection and Estimation of Number of Clusters-A Review

D. SharmilaRani, N. Kousika, G. Komarasamy

Abstract


Clustering is a form of unsupervised classification that partitions a data set into meaningful subsets. The objects in each subset share some common property, often proximity under a chosen distance measure. Among the many clustering techniques, K-Means is one of the most popular. The objective of the K-Means algorithm is to make the distances between objects in the same cluster as small as possible. Algorithms, systems, and frameworks that address clustering challenges have been developed extensively in recent years. In this review paper, we present the K-Means algorithm and techniques that improve on it.
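The objective stated above can be illustrated with a minimal sketch of the standard assign-and-update iteration. This is not the specific procedure of any technique surveyed in the paper. The function names, the squared Euclidean distance, the random choice of initial centroids, and the `max_iter` and `seed` parameters are all assumptions made for illustration.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [
        min(range(len(centroids)), key=lambda j: squared_distance(p, centroids[j]))
        for p in points
    ]


def within_cluster_ss(points, centroids, labels):
    """Objective value: total squared distance of points to their centroid."""
    return sum(squared_distance(p, centroids[l]) for p, l in zip(points, labels))


def k_means(points, k, max_iter=100, seed=0):
    """Alternate assignment and centroid update until labels stop changing."""
    rng = random.Random(seed)
    # Assumption: initial centroids are k points drawn at random from the data.
    centroids = rng.sample(points, k)
    labels = assign(points, centroids)
    for _ in range(max_iter):
        new_centroids = []
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            # Keep the previous centroid if a cluster ends up empty.
            new_centroids.append(mean_point(members) if members else centroids[j])
        new_labels = assign(points, new_centroids)
        centroids = new_centroids
        if new_labels == labels:
            break
        labels = new_labels
    return centroids, labels
```

Calling `k_means(points, k)` on a list of equal-length tuples returns the final centroids and one cluster label per point, and `within_cluster_ss` reports the quantity the iteration tries to reduce. Because the result depends on the initial centroids and on the chosen `k`, the sketch also shows why centroid initialization and estimation of the number of clusters are natural targets for improvement.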

Keywords


Classification, Clustering, K-Means Clustering, Partitioning Clustering.






Creative Commons License
This work is licensed under a Creative Commons Attribution 3.0 License.