Open Access Open Access  Restricted Access Subscription or Fee Access

Improved K-Means Clustering using Constraints and Centroid Initialization

P. Boopathi

Abstract


The rapid worldwide increase in the data available leads to the difficulty for analyzing those data. Organizing data into interesting collection is one of the most basic forms of understanding and learning. Thus, a proper data mining approach is required to organize those data for better understanding. Clustering is one of the standard approaches in the field of data mining. The main of this approach is to organize a dataset into a set of clusters, which consists of similar data items, as calculated by some distance function. K-Means algorithm is the widely used clustering algorithm because of its ability and simple nature. When the dataset is larger, K-Means will misclassify the data points. For overcoming this problem, some constraints must be included in the algorithm. The resulting algorithm is called as Constrained K-Means Clustering. The constraints used in this paper are Must-link constraint, Cannot-link constraint, δ-constraint and ε-constraint. For generating the must-link and cannot-link constraints, Self Organizing Map (SOM) is used in this paper. The accuracy of clustering can be further improved by initializing the centroid instead of random generation. For this purpose, this paper uses Ant Colony Optimization (ACO). The experimental result shows that the proposed algorithm results in better classification than the standard K-Means clustering technique.

Keywords


K-Means, Self Organizing Map (SOM), Constrained K-Means, Ant Colony Optimization (ACO)

Full Text:

PDF

References


Zhang Zhe, Zhang Junxi and Xue Huifeng, "Improved K-Means Clustering Algorithm", Congress on Image and Signal Processing, Vol. 5, Pp. 169-172, 2008.

Hai-xiang Guo, Ke-jun Zhu, Si-wei Gao and Ting Liu, "An Improved Genetic k-means Algorithm for Optimal Clustering", Sixth IEEE International Conference on Data Mining Workshops, Pp. 793-797, 2006.

Yanfeng Zhang, Xiaofei Xu and Yunming Ye, "NSS-AKmeans: An Agglomerative Fuzzy K-means clustering method with automatic selection of cluster number", 2nd International Conference on Advanced Computer Control, Vol. 2, Pp. 32-38, 2010.

Xiaoyun Chen, Youli Su, Yi Chen and Guohua Liu, "GK-means: an Efficient K-means Clustering Algorithm Based on Grid", International Symposium on Computer Network and Multimedia Technology, Pp. 1-4, 2009.

Trujillo, M., Izquierdo, E., "Combining K-means and semivariogram-based grid clustering", 47th International Symposium, Pp. 9-12, 2005.

Huang, J.Z., Ng, M.K., Hongqiang Rong and Zichen Li, "Automated variable weighting in k-means type clustering", IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 27, No. 5, Pp. 657-668, 2005.

Yi Hong and Sam Kwong “Learning Assignment Order of Instances for the constrained k-means clustering algorithm” IEEE Transactions on Systems, Man, and Cybernetics, Vol 39, No 2. April, 2009.

I. Davidson,M. Ester and S.S. Ravi, “Agglomerative hierarchical clustering with constraints: Theoretical and empirical results”, in Proc. of Principles of Knowledge Discovery from Databases, PKDD 2005.

Wagstaff, Kiri L., Basu, Sugato, Davidson, Ian “When is constrained clustering beneficial, and why?” National Conference on Aritficial Intelligence, Boston, Massachusetts 2006.

Kiri Wagstaff, Claire Cardie, Seth Rogers, Stefan Schrodl “Constrained K-means Clustering with Background Knowledge” ICML '01 Proceedings of the Eighteenth International Conference on Machine Learning, 2001.

I. Davidson, M. Ester and S.S. Ravi, “Efficient incremental constrained clustering”. In Thirteenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2007, August 12-15, San Jose, California, USA.

I. Davidson, M. Ester and S.S. Ravi, “Clustering with constraints: Feasibility issues and the K-means algorithm”, in proc. SIAM SDM 2005, Newport Beach, USA.

D. Klein, S.D. Kamvar and C.D. Manning, “From Instance-Level constraintes to space-level constraints: Making the most of Prior Knowledge in Data Clustering”, in proc. 19th Intl. on Machine Learning (ICML 2002), Sydney, Australia, July 2002, p. 307-314.

N. Nguyen and R. Caruana, “Improving classification with pairwise constraints: A margin-based approach”, in proc. of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD‟08).

K. Wagstaff, C. Cardie, S. Rogers and S. Schroedl, “Constrained Kmeans clustering with background knowledge”, in: Proc. Of 18th Int. Conf. on Machine Learning ICML‟01, p. 577 - 584.

Y. Hu, J. Wang, N. Yu and X.-S. Hua, “Maximum Margin Clustering with Pairwise Constraints”, in proc. of the Eighth IEEE International Conference on Data Mining (ICDM) , 253-262, 2008.

Merz C and Murphy P, UCI Repository of Machine Learning Databases, Available: ftp://ftp.ics.uci.edu/pub/machine-Learning-databases.

Nada M. A. Al Salami, "Ant Colony Optimization Algorithm", UbiCC Journal, Volume 4, Number 3, 2009.


Refbacks

  • There are currently no refbacks.


Creative Commons License
This work is licensed under a Creative Commons Attribution 3.0 License.