
A Spam Detection Study of Tweets in Indian Healthcare

Sramana Mukherjee, Arijit Sarkar, Saptarsi Goswami, Amit Kumar Das

Abstract


Twitter, one of the most rapidly growing social networks, has been infiltrated by large amounts of spam. Twitter has many potential applications across diverse areas; however, the signal-to-noise ratio is very low because of spam, which is a major obstacle to meaningful analysis and action. Spam detection is a well-studied problem for email; for tweets, however, it is relatively less researched. In this paper we have set up a focused study consisting of nearly 5000 tweets related to Indian healthcare. An extensive study has been conducted in which six classifiers have been evaluated and compared for spam detection. A simple term-frequency-based feature selection technique is shown to reduce the model building time significantly. An ensemble method based on the top five classifiers improves the accuracy as well as the stability of the results.
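The two techniques named in the abstract can be illustrated with a minimal sketch. It rests on assumptions that the abstract does not confirm: that feature selection keeps terms whose total corpus frequency meets a cutoff, and that the ensemble combines classifier outputs by simple majority vote. The function names, the `min_count` parameter, and the tokenizer are hypothetical and chosen only for illustration.

```python
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace; a deliberately simple tokenizer."""
    return text.lower().split()


def select_terms_by_frequency(documents, min_count=2):
    """Keep terms whose total corpus frequency is at least min_count.

    Assumption: the cutoff rule and its value are illustrative only.
    """
    counts = Counter(token for doc in documents for token in tokenize(doc))
    return sorted(term for term, n in counts.items() if n >= min_count)


def to_feature_vector(text, vocabulary):
    """Term-frequency vector of `text` restricted to the selected terms."""
    counts = Counter(tokenize(text))
    return [counts[term] for term in vocabulary]


def majority_vote(predictions):
    """Combine labels from several classifiers; the most common label wins.

    Assumption: majority voting stands in for the unspecified ensemble rule.
    """
    return Counter(predictions).most_common(1)[0][0]
```

Dropping rare terms shrinks the feature vectors every classifier has to process, which is one plausible reason such a filter would shorten model building time. Using an odd number of voters, such as five, avoids ties when there are only two labels.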


Keywords


Spam Detection, Twitter, Healthcare, Ensemble Learning.


