Syntactical Knowledge based Stemmer for Automatic Document Summarization
Abstract
With the rapid growth of data on the Internet, users are overloaded with information, and large volumes of documents are becoming harder to access. Automatic text summarization is an important step in the analysis of high-volume text documents. Text summarization condenses a source text into a shorter version while preserving its information content and overall meaning. The proposed system generates a summary of a given input document by identifying and extracting its important sentences. The model consists of four stages. In the first stage, the system decomposes the given text into its constituent sentences. The second stage removes stop words and stems the text. The third stage assigns part-of-speech (POS) tags using dependency grammar. Finally, the sentences are ranked according to feature terms. This paper presents our work up to the stemming stage. The stemmer implemented here promises good results.
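As a rough illustration of the stages described above, the following Python sketch splits a text into sentences, removes stop words, stems the remaining tokens, and ranks sentences by term frequency. It is not the stemmer proposed in the paper: the stop-word list, the suffix list, the function names, and the frequency-based scoring are all assumptions made for this example, and the POS-tagging stage is omitted.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; a real system would use a fuller one.
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "in", "on", "and", "to", "with", "for"}

# Hypothetical suffix list for a naive suffix-stripping stemmer, checked in this order.
SUFFIXES = ("ing", "ed", "es", "s")


def split_sentences(text):
    """Stage 1: split text into sentences at '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def naive_stem(word):
    """Strip the first matching suffix, keeping at least three characters of stem."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(sentence):
    """Stage 2: lowercase, tokenize, drop stop words, and stem the remaining tokens."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    return [naive_stem(t) for t in tokens if t not in STOP_WORDS]


def rank_sentences(text):
    """Final stage (simplified): score each sentence by the summed frequency of its stems."""
    sentences = split_sentences(text)
    stems = [preprocess(s) for s in sentences]
    freq = Counter(stem for sent in stems for stem in sent)
    scored = [(sum(freq[s] for s in sent), sentence)
              for sent, sentence in zip(stems, sentences)]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


if __name__ == "__main__":
    sample = "Stemming reduces words. The stemmer reduced the words in documents. Cats sleep."
    for score, sentence in rank_sentences(sample):
        print(score, sentence)
```

Because the stemmer maps "reduces" and "reduced" to the same stem, sentences that share inflected forms of a term reinforce each other's scores, which is the reason stemming precedes ranking in this kind of pipeline.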
This work is licensed under a Creative Commons Attribution 3.0 License.