Classification Analysis on Corpus Data

Dr. S. Charles, S. Antony Joseph Raj, Y. Sunil Raj


Knowledge can be discovered from various sources of information. Text mining is the extraction of useful information from unstructured data by identifying and exploring interesting patterns. Text mining techniques may rely on simple bag-of-words text representations based on the vector space model. This study analyzes the accuracy of three classifiers on three different corpora. The algorithms are compared on three datasets: two movie review datasets and a Twitter dataset. The algorithms selected for this study are Naïve Bayes, J48 and Decision Table, which are well-known classifiers applied to both highly structured and unstructured datasets.


Text Mining, Corpus, Naïve Bayes, J48, Decision Table.

