
Malicious Comment Classification using Bidirectional LSTM and Convolutional Neural Networks

K. V. Sarath Chandra, R. Sri Harsha, Srivathsa L Rao, C. Shreyas Gowda, Dr. K. S. S. Kavitha

Abstract


Social media has become an important part of daily life. Sites are flooded with posts and opinions, and communication over social media continues to rise. Although this has mostly been a boon, it also carries serious dangers: highly toxic online text can lead to personal attacks, online harassment and bullying. Anonymity lets people post whatever they want from behind a keyboard, and too few means exist to tackle the problem. Recently, Convolutional Neural Networks and Recurrent Neural Networks have been applied to text classification. This work uses these techniques to detect foul and malicious comments in the Kaggle data set, with the aim of classifying each comment into 6 labels of toxicity. It also implements a fully functional front-end environment, built using React JS and MongoDB, which classifies user-entered text into the same toxicity labels.
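
As a rough illustration of what classifying a comment into 6 labels can look like at the output stage, the sketch below assumes a multi-label setup in which each label receives its own independent sigmoid score. The label names are those of the Kaggle Jigsaw Toxic Comment Classification Challenge data; the function names, the 0.5 threshold and the example scores are hypothetical and are not taken from this work.

```python
import math

# Label names from the Kaggle Jigsaw Toxic Comment Classification Challenge data.
LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]


def sigmoid(x: float) -> float:
    """Map a raw score (logit) to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def decode_labels(logits, threshold=0.5):
    """Return every label whose independent sigmoid probability reaches the threshold.

    `logits` stands in for the six raw outputs of a trained network's final layer
    (hypothetical here); `threshold` is an assumed cut-off, not a value from the paper.
    """
    if len(logits) != len(LABELS):
        raise ValueError(f"expected {len(LABELS)} scores, got {len(logits)}")
    return [name for name, z in zip(LABELS, logits) if sigmoid(z) >= threshold]


# Made-up logits for illustration only; a real system would obtain them from the model.
example_logits = [2.1, -3.0, 0.4, -4.2, 1.3, -2.5]
print(decode_labels(example_logits))
```

Because each label is thresholded separately, a single comment may receive several labels at once or none at all, which is the usual reading of the Kaggle task.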


Keywords


Convolutional Neural Network, Toxic Text Classification, Bidirectional LSTM, Pre-trained Word Embeddings, GloVe, Text Classification, Natural Language Processing.





Creative Commons License
This work is licensed under a Creative Commons Attribution 3.0 License.