
Clustering Algorithms for Biological Data - A Survey Approach

M. Yasodha, M. Mohanraj


Data mining and knowledge extraction are significant problems in bioinformatics, and biological data mining is an emerging field of research and development. Data analysis techniques can extract essential knowledge from biological data. Clustering plays an important role in such analysis by organizing similar objects from a dataset into meaningful groups, and it has become an essential investigative technique for many problems in bioinformatics. Clustering is the task of grouping a set of objects into subsets such that objects belonging to the same cluster are highly similar to each other. Several clustering algorithms, such as K-Means clustering, Fuzzy C-Means clustering, and probabilistic clustering, have been proposed in the literature. A common approach is to partition the experimental data using supervised or unsupervised clustering. This paper surveys the clustering techniques that are employed for integrating biological data and gives an overview of the limitations of some of the techniques proposed in the literature. The future enhancement section discusses some fundamental ideas for improving the clustering accuracy of the earlier proposed algorithms.
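As a rough illustration of the grouping task described in the abstract, the following is a minimal sketch of K-Means, one of the algorithms the abstract names. It is not taken from the paper: the function names (`k_means`, `squared_distance`, `mean_point`), the seeded random initialization, and the toy two-dimensional points standing in for biological measurements are all assumptions made for this example.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def k_means(points, k, iterations=100, seed=0):
    """Assign each point to one of k clusters (hypothetical helper).

    Returns (labels, centroids), where labels[i] is the cluster index
    of points[i].
    """
    rng = random.Random(seed)
    # Assumption: start from k distinct data points chosen at random.
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = mean_point(members)
    return labels, centroids


if __name__ == "__main__":
    # Toy data: two well-separated groups of made-up measurements.
    data = [(1.0, 1.1), (0.9, 1.0), (1.2, 0.8),
            (8.0, 8.2), (7.9, 8.1), (8.3, 7.7)]
    labels, centroids = k_means(data, k=2)
    print(labels)
    print(centroids)
```

On well-separated data such as this toy set, the first three points should end up in one cluster and the last three in the other. Real biological data is usually high-dimensional and noisy, so the choice of distance measure and of `k` matters far more than this sketch suggests.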


Keywords: Bioinformatics, Biological Data, Clustering, Data Mining, Knowledge Extraction, Subsets.







This work is licensed under a Creative Commons Attribution 3.0 License.