
Gene Data Classification Using Hybrid Hierarchical Multi-label Classifier

Dr. Santhi Thilagam, Rama Sri Sindhura


Gene function prediction is a multi-label classification problem, since a gene typically plays several biological roles. It is therefore harder than typical binary classification, which only needs to determine whether a gene belongs to one particular functional class. The predictions can then be passed to biologists for experimental validation. The problem has previously been formulated using Predictive Clustering Trees, and an implementation of that formulation exists. We attempt to improve the prediction accuracy of that implementation by using additional single classifiers. We define an appropriate distance metric for hierarchical multi-classification and present experiments evaluating this approach on a number of data sets available for yeast.
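The abstract does not say which distance metric is used. As an illustration only, the sketch below shows one form such a metric can take: a weighted Euclidean distance between binary class vectors, where a class at depth d in the hierarchy gets weight w0^d, so disagreements near the top of the hierarchy cost more than disagreements near the leaves. The toy hierarchy, the function names, and the value w0 = 0.75 are assumptions made for this example and are not taken from the paper.

```python
import math

# Hypothetical toy class hierarchy: child -> parent (None marks a top-level class).
PARENT = {
    "1": None,
    "1/1": "1",
    "1/2": "1",
    "2": None,
    "2/1": "2",
}
CLASSES = sorted(PARENT)  # fixed order of the vector components


def depth(c):
    """Depth of class c in the hierarchy; top-level classes have depth 1."""
    d = 1
    while PARENT[c] is not None:
        c = PARENT[c]
        d += 1
    return d


def encode(labels):
    """Binary class vector for a gene; every ancestor of a label is also set."""
    active = set()
    for c in labels:
        while c is not None:
            active.add(c)
            c = PARENT[c]
    return [1 if c in active else 0 for c in CLASSES]


def weighted_distance(u, v, w0=0.75):
    """Weighted Euclidean distance; w0 is an illustrative, assumed value."""
    return math.sqrt(
        sum((w0 ** depth(c)) * (a - b) ** 2 for c, a, b in zip(CLASSES, u, v))
    )


gene_a = encode(["1/1"])
gene_b = encode(["1/2"])  # shares the parent class "1" with gene_a
gene_c = encode(["2/1"])  # lies in a different top-level branch
print(weighted_distance(gene_a, gene_b) < weighted_distance(gene_a, gene_c))
```

Under these assumptions, two genes that differ only in their leaf classes are closer together than two genes that fall in different top-level branches, which is the behaviour a hierarchy-aware metric is meant to capture.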


Hierarchical multi-label classification, Gene prediction, Predictive Clustering Trees.







This work is licensed under a Creative Commons Attribution 3.0 License.