
A Technique for Clinical Segmentation over Merged Characters on Non-Headline Based Distorted Tamil Scripts

R. Indra Gandhi, Dr. K. Iyakutti

Abstract


Segmentation is an important phase in the design of an optical character recognition (OCR) system. One of the main reasons for a poor recognition rate in an OCR system is incorrect segmentation of characters. Most segmentation algorithms primarily aim at segmenting text, graphics, pages, lines and words. Character segmentation is the fundamental process in character recognition approaches that rely on isolated characters. Sometimes during segmentation, characters of the same word touch each other, producing vertically overlapping characters. In Tamil scripts, applying the simple concept of vertical projection to segment the whole document into individual characters does not work well. As a first step in resolving this, this paper presents an intelligent technique for solving the key problems of segmenting distorted merged (touching) characters. The results show that the proposed algorithm yields promising segmentation output, is feasible alongside existing techniques, is easy to extend, and may be very effective for non-headline-based complex Indic scripts.
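To illustrate why plain vertical projection struggles with touching characters, the following is a minimal sketch, not the algorithm proposed in the paper. It assumes a binarized image stored as a list of rows with 1 for ink and 0 for background; the function names and the toy glyphs are hypothetical and chosen only for this illustration.

```python
def vertical_projection(image):
    """Count ink pixels (value 1) in each column of a binary image."""
    return [sum(column) for column in zip(*image)]


def split_on_blank_columns(image):
    """Return (start, end) column spans separated by ink-free columns.

    The end index is exclusive. This is the naive projection-based cut:
    a boundary is placed only where a column contains no ink at all.
    """
    profile = vertical_projection(image)
    spans, start = [], None
    for x, ink in enumerate(profile):
        if ink > 0 and start is None:
            start = x                      # a run of inked columns begins
        elif ink == 0 and start is not None:
            spans.append((start, x))       # the run ends at a blank column
            start = None
    if start is not None:
        spans.append((start, len(profile)))
    return spans


# Two toy glyphs separated by one blank column.
separated = [
    [1, 1, 0, 1, 1],
    [1, 0, 0, 0, 1],
    [1, 1, 0, 1, 1],
]

# The same glyphs joined by a single stroke in the middle row.
touching = [
    [1, 1, 0, 1, 1],
    [1, 1, 1, 1, 1],
    [1, 1, 0, 1, 1],
]

print(split_on_blank_columns(separated))  # [(0, 2), (3, 5)]
print(split_on_blank_columns(touching))   # [(0, 5)]
```

In the second case a single connecting stroke removes the blank column, so the projection reports one span instead of two. This is the kind of merged-character case the abstract describes.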

Keywords


Segmentation, Distorted Character Segmentation, Vertical Overlapping, Merging (touching) Character Segmentation, Non-headline scripts

