Sentiment Analysis using Support Vector Machine and Random Forest

Talha Ahmed Khan; Rehan Sadiq; Zeeshan Shahid; Muhammad Mansoor Alam; Mazliham Bin Mohd Su'ud

doi:10.33093/jiwe.2024.3.1.5

Published: Feb 14, 2024

Keywords:

Sentiment Analysis, Machine Learning, Opinion Mining, Natural Language Processing, Preprocessing Techniques, Feature Extraction

Talha Ahmed Khan

Multimedia University, Malaysia

https://orcid.org/0000-0001-6687-0920

Rehan Sadiq

Bahria University Karachi Campus, Pakistan

Zeeshan Shahid

Nazeer Hussain University, Pakistan

Muhammad Mansoor Alam

Riphah International University, Pakistan

Mazliham Bin Mohd Su'ud

Multimedia University, Malaysia

Abstract

Sentiment analysis, commonly known as opinion mining, is a vital field in natural language processing (NLP) that aims to determine the sentiment or emotion expressed in a given text. This paper presents an exhaustive survey of sentiment analysis, focusing on the application of machine learning techniques. A comprehensive parametric literature review was conducted on sentiment analysis using Support Vector Machine (SVM) and Random Forest. The paper also covers preprocessing techniques, feature extraction, model training, evaluation, and the challenges encountered in sentiment analysis. The findings contribute to a deeper understanding of sentiment analysis and provide insight into the effectiveness of machine learning approaches in this domain. Two machine learning algorithms, Random Forest and SVM, were evaluated on their accuracy in a classification task. Random Forest achieved an accuracy of 0.78564, while SVM slightly outperformed it with an accuracy of 0.80394. Both algorithms achieved respectable accuracy on the task. These results suggest that SVM may be the more suitable choice when accuracy is the primary concern. However, the configuration requirements and the characteristics of the problem at hand should also be considered when choosing between the two algorithms.
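The preprocessing, feature-extraction, and evaluation steps named in the abstract can be sketched with the Python standard library alone. This is a minimal illustration under stated assumptions, not the authors' implementation: the stop-word list, the TF-IDF variant (term frequency times the natural log of N over document frequency), and all function names are hypothetical, and classifier training (SVM, Random Forest) is deliberately left out. Only the two accuracy figures come from the abstract.

```python
import math
import re
from collections import Counter

# Illustrative stop-word list (an assumption, not taken from the paper).
STOP_WORDS = {"a", "an", "and", "is", "it", "of", "the", "this", "to", "was"}


def preprocess(text):
    """Lowercase the text, keep alphabetic tokens, and drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [tok for tok in tokens if tok not in STOP_WORDS]


def tfidf(docs):
    """Return one {term: weight} dict per document, weight = tf * ln(N / df)."""
    tokenized = [preprocess(doc) for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid division by zero for empty documents
        vectors.append({term: (count / total) * math.log(n_docs / doc_freq[term])
                        for term, count in counts.items()})
    return vectors


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Accuracies reported in the abstract.
REPORTED = {"Random Forest": 0.78564, "SVM": 0.80394}
gap = REPORTED["SVM"] - REPORTED["Random Forest"]  # about 0.0183

if __name__ == "__main__":
    print(preprocess("This movie was GREAT!"))    # ['movie', 'great']
    print(accuracy([1, 0, 1, 1], [1, 0, 0, 1]))   # 0.75
    print(f"SVM - Random Forest = {gap:.4f}")     # SVM - Random Forest = 0.0183
```

On the reported figures, the gap between the two classifiers is about 0.0183, or roughly 1.83 percentage points of accuracy, which is consistent with the abstract's description of SVM as only slightly ahead.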

How to Cite

Khan, T. A., Sadiq, R., Shahid, Z., Alam, M. M., & Mohd Su’ud, M. B. (2024). Sentiment Analysis using Support Vector Machine and Random Forest. Journal of Informatics and Web Engineering, 3(1), 67–75. https://doi.org/10.33093/jiwe.2024.3.1.5

Issue

Vol. 3 No. 1 (2024): February 2024

Section

Regular issue

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

All articles published in JIWE are licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0) License. Readers are allowed to

Share — copy and redistribute the material in any medium or format under the following conditions:
Attribution — You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use;
NonCommercial — You may not use the material for commercial purposes;
NoDerivatives — If you remix, transform, or build upon the material, you may not distribute the modified material.
