
Neural Network Based Hybrid Technique towards Recognition of Distorted Historic Broken Tamil Scripts

R. Indra Gandhi, Dr. K. Iyakutti

Abstract


Document analysis and optical character recognition (OCR) systems have been under research for a few decades. Yet implementing an OCR system that works under all possible conditions and gives highly accurate results remains a highly challenging task, especially for distorted characters. In this paper, we investigate a hybrid technique for recognizing distorted, broken Tamil characters. The approach is carried out in two phases. The first phase applies Parker’s shape tracing together with line tracing features. The second phase uses neural network training together with a selective-thresholding minimum distance technique (MDT) for recognition. The results show that the proposed hybrid technique can improve character recognition accuracy on distorted documents containing broken characters, is feasible alongside other existing techniques, is easy to extend, and may be very effective for non-headline-based complex Indic scripts.
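As a rough illustration of the minimum distance idea mentioned in the abstract, the following is a minimal sketch only and not the authors' implementation. It assumes each character is already reduced to a fixed-length feature vector, that one template vector is stored per class, that Euclidean distance is the metric, and that a single rejection threshold is used. The function names, labels, feature values, and threshold are all hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def classify_min_distance(features, templates, threshold):
    """Return (label, distance) for the nearest template.

    If the nearest template is farther away than `threshold`, the label
    is None, meaning the sample is rejected rather than forced into a class.
    """
    best_label, best_dist = None, math.inf
    for label, template in templates.items():
        d = euclidean(features, template)
        if d < best_dist:
            best_label, best_dist = label, d
    if best_dist > threshold:
        return None, best_dist
    return best_label, best_dist


# Hypothetical class templates (illustrative values, not from the paper).
templates = {
    "a": [0.9, 0.1, 0.4],
    "aa": [0.2, 0.8, 0.5],
}

accepted = classify_min_distance([0.8, 0.2, 0.4], templates, threshold=0.5)
rejected = classify_min_distance([5.0, 5.0, 5.0], templates, threshold=0.5)
```

In this sketch, a sample close to a stored template is assigned that template's label, while a sample far from every template is rejected instead of being misclassified, which is one plausible reading of thresholding in this context.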

Keywords


Broken Character, Shape Tracing, Line Tracing, Minimum Distance Technique, Non-headline scripts.


References


L. Likforman-Sulem and M. Sigelle, “Recognition of Broken Characters from Historical Printed Books Using Dynamic Bayesian Networks”, 9th International Conference on Document Analysis and Recognition (ICDAR 2007), vol. 1, pp. 173-177, Sept. 23-26, 2007.

P. Stubberud, J. Kanai and V. Kalluri, “Adaptive image restoration of text images that contain touching or broken characters”, 3rd International Conference on Document Analysis and Recognition, vol. 2, pp. 778-781, Aug. 14-16, 1995.

A. Whichello and H. Yan, “Linking broken character borders with variable sized masks to improve recognition”, Pattern Recognition, vol. 29, pp. 1429-1435, Aug. 1996.

Y. Lu, B. Haist, L. Harmon, J. Trenkle and R. Vogt, “An accurate and efficient system for segmenting machine-printed text”, U.S. Postal Service 5th Advanced Technology Conference, Washington, vol. 3, pp. A93-A105, 1992.

O. Nakamura, M. Ujiie, N. Okamoto and T. Minami, “A character segmentation algorithm for mixed-mode communication”, Trans. IEICE, (D) 167-D, 11, pp. 1277-1285, 1984.

N. Okamoto, O. Nakamura and T. Minami, “Character segmentation for mixed-mode communication”, IFIP’83, pp. 681-685, 1983.

B.A. Yanikoglu, “Pitch-based segmentation and recognition of dot-matrix text”, International Journal of Document Analysis and Recognition (IJDAR), vol. 3, pp. 34-39, 2000.

M. Droettboom, “Correcting broken characters in the recognition of historical printed documents”, Proceedings of the 3rd ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL), Houston, Texas, USA, pp. 364-366, 2003.

J.R. Parker, “Vector Templates and Handprinted Character Recognition”, Proc. 12th IAPR Conference on Pattern Recognition, Jerusalem, Israel, Oct. 9-13, 1994.

J.R. Parker, “Handprinted Digit Recognition by Stroke Tracing”, Proceedings of the 3rd Australian and New Zealand Conference on Intelligent Information Systems (ANZIIS-95), Perth, Western Australia, Vol +325, pp. 64-69, Nov. 1995.

L. Lam and C.Y. Suen, “Structural Classification and Relaxation Matching of Totally Unconstrained Handwritten Zip-Code Numbers”, Pattern Recognition, vol. 21, no. 1, pp. 19-31, 1988.

C.Y. Suen, “Computer Recognition of Unconstrained Handwritten Numerals”, Proc. IEEE, vol. 80, no. 7, pp. 1162-1180, July 1992.

S. Mori, K. Yamamoto and M. Yasuda, “Research on Machine Recognition of Handprinted Characters”, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. PAMI-6, no. 4, pp. 386-405, July 1984.




Creative Commons License
This work is licensed under a Creative Commons Attribution 3.0 License.