
A Survey on Similarity Measures for Microarray Gene Expression Data Analysis

S.P. Vidhya Priya, N.S. Nithya


Microarray technology is a recent advance used to monitor the expression profiles of thousands of genes concurrently under different experimental conditions. This paper first briefly introduces the concepts of microarray technology, then surveys similarity measures and discusses the basic elements of clustering gene expression data. Groups of genes with similar expression are usually found by exploratory techniques such as cluster analysis. The survey concentrates mainly on similarity measures, since the choice of similarity measure is an important step in clustering gene expression data. Two similarity measures are considered. A Mutual Information similarity measure is applied first so that redundancy can be removed; Intuitionistic Fuzzy Sets are then used to obtain higher accuracy, and the approach is applicable to multiple data sets.
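
As an illustration of the first measure named above, the following is a minimal sketch of mutual information between two gene expression profiles. It is not the method of any surveyed paper: the equal-width binning, the default of three bins, and the function names `discretize` and `mutual_information` are assumptions made for this example only.

```python
import math
from collections import Counter


def discretize(profile, n_bins=3):
    """Map each expression value to an equal-width bin index in [0, n_bins).

    Equal-width binning is an assumption of this sketch; other
    discretization schemes are equally possible.
    """
    lo, hi = min(profile), max(profile)
    if hi == lo:
        # A constant profile falls entirely into one bin.
        return [0] * len(profile)
    width = (hi - lo) / n_bins
    # min(...) keeps the maximum value inside the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in profile]


def mutual_information(x, y):
    """Mutual information, in bits, between two equal-length discrete sequences."""
    if len(x) != len(y):
        raise ValueError("profiles must have the same number of samples")
    n = len(x)
    count_x, count_y = Counter(x), Counter(y)
    count_xy = Counter(zip(x, y))
    mi = 0.0
    for (a, b), c in count_xy.items():
        # p(a, b) * log2( p(a, b) / (p(a) * p(b)) ), written with raw counts
        mi += (c / n) * math.log2(c * n / (count_x[a] * count_y[b]))
    return mi


# Hypothetical expression values for two genes over four samples.
gene_a = [1.0, 2.0, 3.0, 4.0]
gene_b = [2.0, 4.1, 6.2, 8.0]
score = mutual_information(discretize(gene_a, 2), discretize(gene_b, 2))
print(score)
```

A high score between two genes suggests that one is largely redundant given the other, which is the sense in which a mutual information measure can support redundancy removal before clustering.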


Mutual Information, Intuitionistic Fuzzy Sets, Gene Based Clustering, Similarity Measure







Creative Commons License
This work is licensed under a Creative Commons Attribution 3.0 License.