Automated Text Summarization: A Case Study for Marathi Language

Umakant Dakulge; S. C. Dharmadhikari

Abstract

The amount of information on the Web grows daily, causing information overload and making it increasingly difficult to find relevant, useful information. This growth has created strong demand for automatic text summarization methods and tools, and text summarization has become an active research area in Natural Language Processing. In this paper, we survey text summarization techniques, discuss the key morphological features of the Marathi language, and propose a framework for Marathi text summarization. Over the last decade, much work has been done on English text summarization, but only a few notable works address Marathi. The proposed framework summarizes a single document using an extractive method. Before the summary is created, the text is preprocessed by segmentation, tokenization, stop-word removal, and stemming. In the feature extraction step, countable features such as TF-ISF (term frequency-inverse sentence frequency), sentence length, sentence position, and SOV (subject-object-verb) verification are used to make the summary more relevant and precise. For stemming, we develop a rule-based as well as a dictionary-based Marathi stemmer.
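As a rough illustration of the extractive scoring idea, the following minimal Python sketch ranks sentences by TF-ISF only. It is not the framework proposed in the paper: stemming, sentence length, sentence position, and SOV verification are omitted, and the stop-word set, the function names, and the averaging of term weights per sentence are assumptions made for this example.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word set for illustration only; not the paper's list.
STOP_WORDS = {"आणि", "व", "हे", "आहे"}


def split_sentences(text):
    """Segment text on the danda, full stop, question mark, or exclamation mark."""
    parts = re.split(r"[।.?!]+", text)
    return [p.strip() for p in parts if p.strip()]


def tokenize(sentence):
    """Whitespace tokenization, stripping common punctuation and stop words."""
    words = (w.strip(",;:\"'()") for w in sentence.split())
    return [w for w in words if w and w not in STOP_WORDS]


def tf_isf_scores(sentences):
    """Score each sentence by the mean TF-ISF weight of its tokens.

    ISF(t) = log(N / n_t), where N is the number of sentences and
    n_t is the number of sentences that contain term t.
    """
    tokenized = [tokenize(s) for s in sentences]
    n = len(tokenized)
    sent_freq = Counter()
    for tokens in tokenized:
        sent_freq.update(set(tokens))
    scores = []
    for tokens in tokenized:
        if not tokens:
            scores.append(0.0)
            continue
        tf = Counter(tokens)
        total = sum(tf[t] * math.log(n / sent_freq[t]) for t in tf)
        scores.append(total / len(tokens))
    return scores


def summarize(text, k=2):
    """Return the k highest-scoring sentences in their original order."""
    sentences = split_sentences(text)
    scores = tf_isf_scores(sentences)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]
```

Calling `summarize(document_text, k=3)` would return the three sentences whose terms are most distinctive within the document. A fuller system along the lines described in the abstract would stem tokens before counting and combine this score with the other features.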

Keywords

Stemming, Stop Words, Text Summarization, Tokenization

This work is licensed under a Creative Commons Attribution 3.0 License.
