
Classification Analysis on Corpus Data

Dr. S. Charles, S. Antony Joseph Raj, Y. Sunil Raj

Abstract


Knowledge can be discovered from various sources of information. Text mining is the extraction of useful information from unstructured data by identifying and exploring interesting patterns. Text mining techniques may rely on simple bag-of-words text representations based on the vector space model. This study analyzes the classification accuracy of three different classifiers on three different corpora. The classifiers are compared on two movie review datasets and a Twitter dataset. The algorithms selected for this study are Naïve Bayes, J48, and Decision Table, well-known classifiers that have been applied to both highly structured and unstructured datasets.
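The abstract names two ingredients: a bag-of-words vector space representation and a Naïve Bayes classifier evaluated by accuracy. The Python below is a minimal illustrative sketch of those ideas only, not the authors' experimental setup. The tokenizer, the function and class names (`tokenize`, `bag_of_words`, `MultinomialNaiveBayes`, `accuracy`), and the four toy reviews are all hypothetical. It builds term-frequency vectors over a shared vocabulary, trains a multinomial Naïve Bayes model with add-one (Laplace) smoothing, and reports accuracy on a tiny held-out set.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Hypothetical minimal tokenizer: lowercase and split on whitespace."""
    return text.lower().split()


def bag_of_words(docs):
    """Map each document to a term-frequency vector over a shared, sorted vocabulary."""
    vocab = sorted({tok for doc in docs for tok in tokenize(doc)})
    index = {term: i for i, term in enumerate(vocab)}
    vectors = []
    for doc in docs:
        vec = [0] * len(vocab)
        for tok in tokenize(doc):
            vec[index[tok]] += 1
        vectors.append(vec)
    return vocab, vectors


class MultinomialNaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.vocab = {tok for doc in docs for tok in tokenize(doc)}
        self.log_prior = {}
        self.log_likelihood = defaultdict(dict)
        for c in self.classes:
            class_docs = [d for d, y in zip(docs, labels) if y == c]
            self.log_prior[c] = math.log(len(class_docs) / len(docs))
            # Per-class term counts: the sum of the class's bag-of-words vectors.
            counts = Counter(tok for d in class_docs for tok in tokenize(d))
            denom = sum(counts.values()) + len(self.vocab)
            for tok in self.vocab:
                self.log_likelihood[c][tok] = math.log((counts[tok] + 1) / denom)
        return self

    def predict(self, doc):
        scores = {}
        for c in self.classes:
            score = self.log_prior[c]
            for tok in tokenize(doc):
                if tok in self.vocab:  # words never seen in training are ignored
                    score += self.log_likelihood[c][tok]
            scores[c] = score
        return max(scores, key=scores.get)


def accuracy(model, docs, labels):
    """Fraction of documents whose predicted label matches the true label."""
    correct = sum(model.predict(d) == y for d, y in zip(docs, labels))
    return correct / len(labels)


# Hypothetical toy movie-review data, for illustration only.
train_docs = [
    "a great and moving film",
    "great acting great story",
    "boring and predictable plot",
    "a dull boring film",
]
train_labels = ["pos", "pos", "neg", "neg"]
test_docs = ["great story", "boring plot"]
test_labels = ["pos", "neg"]

model = MultinomialNaiveBayes().fit(train_docs, train_labels)
print(accuracy(model, test_docs, test_labels))  # prints 1.0
```

Scoring in log space avoids floating-point underflow when many word probabilities are multiplied. J48 (Weka's implementation of the C4.5 decision tree) and Decision Table are not reproduced here; a comparison like the one described would normally use library implementations and a proper train/test split or cross-validation rather than this hand-written toy.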


Keywords


Text Mining, Corpus, Naïve Bayes, J48, Decision Table.

This work is licensed under a Creative Commons Attribution 3.0 License.