
An Improved Expectation Maximization based Semi-Supervised Text Classification using Naïve Bayes and Support Vector Machine

Purvi Rekh, Amit Thakkar, Amit Ganatra


With the development of the Internet and the emergence of a large number of text resources, automatic text classification has become a research hotspot. As the number of training documents increases, the accuracy of text classification increases. Traditional (supervised) classifiers use only labeled data for training. Labeled instances are often difficult, expensive, or time-consuming to obtain, whereas unlabeled data may be relatively easy to collect. Semi-supervised learning makes use of both labeled and unlabeled data. Several researchers have proposed algorithms for text classification using semi-supervised learning, but improving its accuracy remains a challenge. In the iterative process of standard Expectation Maximization (EM) based semi-supervised learning, some unlabeled samples are misclassified by the current classifier because the initial labeled samples are too few. To overcome this limitation, this paper proposes an EM-based semi-supervised learning algorithm that uses Naïve Bayes and a Support Vector Machine to improve the accuracy of text classification.
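For orientation, the standard EM loop that the abstract takes as its starting point can be sketched as follows. This is a minimal illustration of the baseline EM with Naïve Bayes only, not the algorithm proposed in the paper: the SVM component is not described here and is therefore left out. All function names, the toy documents, and the choice of Laplace smoothing and 10 iterations are assumptions made for the example.

```python
import math
from collections import Counter

def train_nb(docs, weights, classes, vocab):
    """M-step: fit a multinomial Naive Bayes model from (soft) class weights.

    docs    -- list of token lists
    weights -- list of dicts mapping class -> P(class | doc)
    """
    # Laplace-smoothed class priors
    prior = {c: (1.0 + sum(w[c] for w in weights)) / (len(classes) + len(docs))
             for c in classes}
    cond = {}
    for c in classes:
        counts = Counter()
        for tokens, w in zip(docs, weights):
            for t in tokens:
                counts[t] += w[c]  # fractional counts from soft labels
        total = sum(counts.values()) + len(vocab)
        # Laplace-smoothed P(word | class)
        cond[c] = {t: (counts[t] + 1.0) / total for t in vocab}
    return prior, cond

def posterior(tokens, prior, cond, classes):
    """E-step for one document: P(class | doc) under the current model."""
    log_p = {c: math.log(prior[c])
                + sum(math.log(cond[c][t]) for t in tokens if t in cond[c])
             for c in classes}
    m = max(log_p.values())  # subtract the max for numerical stability
    exp_p = {c: math.exp(lp - m) for c, lp in log_p.items()}
    z = sum(exp_p.values())
    return {c: p / z for c, p in exp_p.items()}

def em_naive_bayes(labeled, unlabeled, classes, iterations=10):
    """Baseline EM: start from labeled data, then refine with unlabeled data."""
    lab_docs = [tokens for tokens, _ in labeled]
    vocab = {t for tokens in lab_docs + unlabeled for t in tokens}
    # Labeled documents keep fixed (hard) labels throughout
    hard = [{c: 1.0 if c == y else 0.0 for c in classes} for _, y in labeled]
    prior, cond = train_nb(lab_docs, hard, classes, vocab)
    for _ in range(iterations):
        # E-step: soft-label every unlabeled document
        soft = [posterior(tokens, prior, cond, classes) for tokens in unlabeled]
        # M-step: retrain on labeled plus soft-labeled documents
        prior, cond = train_nb(lab_docs + unlabeled, hard + soft, classes, vocab)
    return prior, cond

# Hypothetical toy data: two labeled documents and four unlabeled ones
classes = ["sport", "politics"]
labeled = [(["goal", "match", "team"], "sport"),
           (["election", "vote", "party"], "politics")]
unlabeled = [["team", "player", "goal"], ["vote", "senate", "party"],
             ["player", "match", "coach"], ["senate", "election", "law"]]

prior, cond = em_naive_bayes(labeled, unlabeled, classes)
# "player" and "coach" never occur in the labeled documents
result = posterior(["player", "coach"], prior, cond, classes)
print(result)
```

In this toy setup the words "player" and "coach" appear only in unlabeled documents, so any preference the final model shows for one class comes from the EM iterations. The same loop also shows where the limitation named in the abstract arises: a wrong soft label in an early E-step feeds directly into the next M-step.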


Expectation Maximization (EM), Naïve Bayes (NB), Support Vector Machine (SVM), Semi-Supervised Learning (SSL).





