
A Survey on Data Clustering Algorithms

N. Kamalraj, V. Shobana


Clustering is a technique adopted in many real-world applications. In general, clustering can be thought of as partitioning data into groups or subsets that contain analogous objects. Many clustering techniques, such as the K-Means algorithm, the Fuzzy C-Means (FCM) algorithm, and spectral clustering, have been proposed in the literature. Recently, clustering algorithms have been used extensively on mixed data types to evaluate the performance of the clustering techniques. This paper presents a survey of clustering algorithms proposed in the literature, gives insight into the advantages and limitations of some of these techniques, and compares them. The future enhancement section outlines general ideas for improving existing clustering algorithms to achieve better clustering accuracy.


Artificial Intelligence, Clustering, Mixed dataset, Learning Algorithm, Image Processing







This work is licensed under a Creative Commons Attribution 3.0 License.