Comparative Study on Extraction of Keywords using Salton Buckley and Clustering of Correlation

M. Saravanan, R. Sridhar

Abstract


Keywords are index terms that carry the most important information in a document. Keyword extraction is a form of text document processing in which a short list of keywords is extracted from the documents, which makes it possible to reach information sources quickly. Because the keywords in a document provide important information about its content, keyword extraction can also help people quickly find hot spots on the web. In this paper, keywords are extracted from document collections to improve the effectiveness of Information Retrieval. We develop a keyword extraction method using correlation and the Salton Buckley method. In our comparison, the correlation-based method identifies documents containing keywords better than the Salton Buckley method. Experimental results report the keyword extraction performance of the two methods.
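
The abstract does not give the weighting formula it uses, so the following is only a minimal sketch of term weighting in the general style associated with Salton and Buckley: term frequency multiplied by inverse document frequency, followed by cosine normalization. The function name `tfidf_keywords`, the whitespace tokenizer, the `top_k` parameter, and the sample documents are all assumptions made for illustration and are not taken from the paper.

```python
import math
from collections import Counter


def tfidf_keywords(docs, top_k=3):
    """Return the top_k (term, weight) pairs per document.

    Assumed weighting: tf * ln(N / df), cosine-normalized per document.
    This is an illustrative choice, not the paper's stated formula.
    """
    # Naive tokenizer (assumption): lowercase and split on whitespace.
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))

    results = []
    for tokens in tokenized:
        tf = Counter(tokens)
        weights = {t: tf[t] * math.log(n_docs / df[t]) for t in tf}
        # Cosine normalization; fall back to 1.0 if every weight is zero.
        norm = math.sqrt(sum(w * w for w in weights.values())) or 1.0
        # Sort by descending weight, breaking ties alphabetically.
        ranked = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))
        results.append([(t, w / norm) for t, w in ranked[:top_k]])
    return results


if __name__ == "__main__":
    sample = ["apple banana apple", "banana cherry", "banana apple date"]
    for doc, keywords in zip(sample, tfidf_keywords(sample, top_k=2)):
        print(doc, "->", keywords)
```

Under this scheme a term that occurs in every document receives weight zero, so only terms that distinguish a document from the rest of the collection rank as keyword candidates. A correlation-based method such as the one the abstract compares against would instead group terms by how their occurrences co-vary across documents; that step is not sketched here because the abstract does not describe it.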


Keywords


Classification; Clustering; Correlation; Salton Buckley

Creative Commons License
This work is licensed under a Creative Commons Attribution 3.0 License.