
Support Vector Machine Approach for Isomerases Prediction Problem

Lavanya Rishishwar, Neha Mishra, Bhasker Pant, Kumud Pant, Kamal R. Pardasani

Abstract


As enzyme sequences enter the databases at a prodigious rate, the functional annotation of these sequences has become a major challenge in bioinformatics. The dispersion in the data makes this task even harder. In this paper the authors illustrate a simple yet efficient way of functionally characterizing a novel enzyme by applying support vector machines. The best accuracy obtained by this method on the generalization test is 91.55%, with a Matthews correlation coefficient (MCC) of 0.63. The method was further validated by three different types of testing. The accuracy of the leave-one-out (LOO) estimate was 91.05% with an MCC of 0.62, indicating that the reported performance is not an artifact of overfitting to the instance sets.

Keywords


Isomerases, Support Vector Machine (SVM), Leave-one-out Estimates, Amino Acid Composition
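
The abstract and keywords name three ingredients: amino acid composition as the sequence representation, the MCC as the quality measure, and leave-one-out estimation for validation. The following is a minimal Python sketch of those three pieces only, under stated assumptions: the function names are hypothetical, the 20-letter residue alphabet and the handling of non-standard residues are assumptions, and nothing here reproduces the authors' actual feature set, SVM kernel, or parameters, none of which are given on this page.

```python
from collections import Counter
from math import sqrt

# Assumption: features are fractions of the 20 standard residues.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence):
    """Return a 20-element list with the fraction of each standard residue."""
    counts = Counter(sequence.upper())
    # Non-standard symbols (e.g. X, B, Z) are ignored in the total.
    total = sum(counts[a] for a in AMINO_ACIDS)
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[a] / total for a in AMINO_ACIDS]


def matthews_corrcoef(tp, tn, fp, fn):
    """MCC from binary confusion-matrix counts; 0.0 if the denominator is 0."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


def leave_one_out(n):
    """Yield (train_indices, test_index) pairs, holding out one sample each time."""
    for i in range(n):
        yield [j for j in range(n) if j != i], i
```

In such a setup, each composition vector would be passed to an SVM trainer once per leave-one-out split, and the held-out predictions would be tallied into the four counts that `matthews_corrcoef` expects.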





This work is licensed under a Creative Commons Attribution 3.0 License.