
Supervised Technique for Arabic Sentiment Analysis Using Different Preprocessing Methods and Features

Haidy H. Mustafa, Khalid T. Wassif

Abstract


The spread of social media websites has led to a massive amount of data being produced every minute. This data represents people’s opinions, attitudes, and feedback about different topics, political decisions, and products. Processing and analyzing such data in order to understand people’s thoughts, feedback, and needs is called sentiment analysis (opinion mining). Sentiment analysis has become a hot research area because of the rapid growth of social media websites. Its main task is classifying text, documents, or words by polarity into positive or negative opinions. The traditional sentiment analysis techniques are supervised, unsupervised, and semi-supervised. These traditional techniques still do not achieve high-quality results on the Arabic language. The complexity of Arabic and its many derivational forms make sentiment analysis a difficult task. This paper applies different supervised techniques (machine learning algorithms) with different preprocessing methods in order to investigate the importance of the preprocessing methods.
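
As a rough illustration of the kind of preprocessing and feature extraction such a pipeline might involve, the following Python sketch normalizes Arabic text, tokenizes it, removes stop words, and builds unigram and bigram counts. The normalization rules, the function names (`normalize`, `tokenize`, `features`), and the choice of n-gram features are assumptions made for illustration; the abstract does not specify which preprocessing steps or features the paper uses.

```python
import re
from collections import Counter

# Assumed normalization conventions (common in Arabic NLP, not taken from the paper):
# strip diacritics (U+064B..U+0652) and tatweel (U+0640).
_DIACRITICS = re.compile("[\u064B-\u0652\u0640]")
# Alef with madda / hamza above / hamza below are folded to bare alef.
_ALEF_VARIANTS = re.compile("[\u0622\u0623\u0625]")
# Runs of Arabic letters (U+0621..U+064A) count as tokens.
_ARABIC_WORD = re.compile("[\u0621-\u064A]+")


def normalize(text):
    """Fold common orthographic variants so equivalent spellings match."""
    text = _DIACRITICS.sub("", text)
    text = _ALEF_VARIANTS.sub("\u0627", text)
    text = text.replace("\u0649", "\u064A")  # alef maqsura -> yeh
    text = text.replace("\u0629", "\u0647")  # teh marbuta -> heh
    return text


def tokenize(text):
    """Normalize, then split into Arabic-letter tokens (drops digits, Latin, punctuation)."""
    return _ARABIC_WORD.findall(normalize(text))


def features(text, stopwords=frozenset()):
    """Return unigram and bigram counts after stop-word removal.

    `stopwords` is a hypothetical, caller-supplied set of already-normalized words.
    """
    tokens = [t for t in tokenize(text) if t not in stopwords]
    bigrams = [" ".join(pair) for pair in zip(tokens, tokens[1:])]
    return Counter(tokens + bigrams)
```

The resulting counts could be fed to any supervised classifier as a bag-of-words representation. Stemming, which the keywords mention, is left out here because it needs a dedicated Arabic stemmer.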


Keywords


Sentiment Analysis, Preprocessing, Stemming, Arabic Sentiment Analysis, NLP, Features.





This work is licensed under a Creative Commons Attribution 3.0 License.