
A Survey on Data Clustering Algorithms

R. Shanmugasundaram, Dr. M. Punithavalli

Abstract


Clustering is an important technique in a range of fields, including data mining, statistical data analysis, image compression, and vector quantization. It has been formulated in different ways in the machine learning, pattern recognition, optimization, and statistics literature. The basic problem in clustering is to group together data streams that are similar to each other. A variety of algorithms have emerged that meet these requirements and have been successfully applied to real-life data clustering problems. This paper presents a general survey of the clustering algorithms previously proposed in the literature. In addition, its future enhancement section suggests modifications to earlier proposed methods to overcome their limitations.

Keywords


Clustering, Data Mining, Image Compression, Machine Learning, Optimization, Pattern Recognition, Statistical Data Analysis, Vector Quantization.





Creative Commons License
This work is licensed under a Creative Commons Attribution 3.0 License.