An Analytical Study on Behavior of Clusters Using EM and K-Means Algorithm
Abstract
Clustering is an unsupervised learning method that constitutes a cornerstone of intelligent data analysis. It is used to explore the inter-relationships among a collection of patterns by organizing them into homogeneous clusters. Clustering has been applied to a variety of tasks in the field of Information Retrieval (IR) and has become one of the most active areas of research and development. It attempts to discover a set of meaningful groups in which the members of each group are more closely related to one another than to members of other groups. The resulting clusters can provide a structure for organizing large bodies of text for efficient browsing and searching. A wide variety of clustering algorithms have been intensively studied. Among the most common and effective are the iterative optimization clustering algorithms, which have demonstrated reasonable performance, e.g. the Expectation Maximization (EM) algorithm and its variants, and the well-known k-means algorithm. This paper analyzes how two partition-based clustering techniques, EM and k-means, behave on the heartspect dataset with respect to the following features: purity, entropy, CPU time, cluster-wise analysis, mean value analysis, and inter-cluster distance. The paper provides experimental results for five clusters, which support the conclusion that the quality of the clusters produced by the EM algorithm is far better than that of the k-means algorithm.
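As a rough illustration of two of the evaluation features named above, the sketch below computes purity and entropy for a clustering against known class labels. The abstract does not give the formulas the paper uses, so the commonly used definitions are assumed here: purity is the fraction of points that fall in the majority class of their cluster, and entropy is the size-weighted average of the per-cluster class entropy. The function names and the toy data are hypothetical and not taken from the paper.

```python
import math
from collections import Counter


def _class_counts(clusters, labels):
    """Group true class labels by cluster id and count them."""
    by_cluster = {}
    for c, y in zip(clusters, labels):
        by_cluster.setdefault(c, Counter())[y] += 1
    return by_cluster


def purity(clusters, labels):
    """Assumed definition: share of points in their cluster's majority class.

    Ranges from 0 to 1; higher is better.
    """
    counts = _class_counts(clusters, labels)
    return sum(max(c.values()) for c in counts.values()) / len(labels)


def entropy(clusters, labels):
    """Assumed definition: size-weighted mean of per-cluster class entropy.

    Measured in bits; lower is better, 0 means every cluster is pure.
    """
    total = len(labels)
    result = 0.0
    for c in _class_counts(clusters, labels).values():
        n = sum(c.values())
        h = -sum((k / n) * math.log2(k / n) for k in c.values())
        result += (n / total) * h
    return result


# Hypothetical toy data: cluster assignments and true classes for six points.
clusters = [0, 0, 0, 1, 1, 1]
labels = ["a", "a", "b", "b", "b", "b"]

print(round(purity(clusters, labels), 4))
print(round(entropy(clusters, labels), 4))
```

Under these definitions, a clustering of higher quality shows higher purity and lower entropy, which is the direction in which the two algorithms would be compared.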
This work is licensed under a Creative Commons Attribution 3.0 License.