Comparison of PCA and SVM for a West Indian Script-Gujarati

M.J. Baheti; A.V. Mane; M.S. Hannan; K.V. Kale

Abstract

Since the dawn of the technical era, translating scanned documents (handwritten or printed) into a machine-editable format has attracted many researchers. Gujarati is spoken and used as the official language in Gujarat, a western state of India. This paper compares two offline recognition approaches for isolated handwritten Gujarati numerals, using a database of 800 numerals. Because no such database was available, we created one by collecting samples from different people on a specially designed sheet. An affine invariant moments based model is used for feature extraction. The numerals are classified in two ways: with an SVM classifier, and with PCA (to reduce the dimensionality of the feature space) combined with a Euclidean similarity measure. The SVM classifier yielded a recognition rate of 92%, whereas PCA achieved a recognition rate of 84%. On this database, the SVM classifier therefore gave better results than the PCA-based classifier.

Keywords

Support Vector Machine, Principal Component Analysis, Gujarati handwritten numerals.
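
The PCA branch described in the abstract (project the feature vectors onto a few principal components, then assign each test sample the label of its closest training sample by Euclidean distance) might be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the data is synthetic and stands in for the affine invariant moment features, and the function names, the number of features, the number of retained components, and the nearest-neighbour decision rule are all hypothetical choices.

```python
import numpy as np


def fit_pca(X, n_components):
    """Return the feature mean and the top principal directions of X."""
    mean = X.mean(axis=0)
    # SVD of the centred data; the rows of Vt are the principal directions,
    # ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components]


def project(X, mean, components):
    """Project samples onto the retained principal directions."""
    return (X - mean) @ components.T


def classify_nearest(train_proj, train_labels, test_proj):
    """Label each test sample with the label of its nearest training sample."""
    # Pairwise Euclidean distances, shape (n_test, n_train).
    dists = np.linalg.norm(
        test_proj[:, None, :] - train_proj[None, :, :], axis=2
    )
    return train_labels[np.argmin(dists, axis=1)]


# Synthetic stand-in for moment-based features of ten numeral classes
# (assumption: 6 features per sample; the paper's feature count is not given).
rng = np.random.default_rng(0)
n_classes, n_features = 10, 6
centres = rng.normal(scale=5.0, size=(n_classes, n_features))


def sample(per_class):
    """Draw per_class noisy samples around each class centre."""
    X = np.vstack(
        [c + rng.normal(scale=0.1, size=(per_class, n_features)) for c in centres]
    )
    y = np.repeat(np.arange(n_classes), per_class)
    return X, y


X_train, y_train = sample(20)
X_test, y_test = sample(5)

mean, comps = fit_pca(X_train, n_components=4)
train_proj = project(X_train, mean, comps)
test_proj = project(X_test, mean, comps)

pred = classify_nearest(train_proj, y_train, test_proj)
accuracy = float((pred == y_test).mean())
print(f"recognition rate on synthetic data: {accuracy:.2%}")
```

The figure printed here reflects only the synthetic data and says nothing about the 84% and 92% rates reported in the abstract. An SVM counterpart would train a classifier on the same feature vectors in place of the distance-based rule.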

This work is licensed under a Creative Commons Attribution 3.0 License.
