Performance of Sentiment Classification on Tweets of Clothing Brands


Muhammad Shafiq Jalani
Hu Ng
Timothy Tzen Vun Yap
Vik Tor Goh

Abstract

Social media platforms such as Facebook, Instagram, LinkedIn, and Twitter ease the sharing of ideas, thoughts, videos, photos, and information through the building of virtual networks and communities. This has allowed companies and products to reach a wider audience for marketing and advertising, and to gauge feedback from the public. This research investigates clothing brand mentions on Twitter to perform sentiment analysis on users’ thoughts about three clothing brands, namely Asos, Uniqlo, and Topshop. The data is collected using the Python library Tweepy to access the Twitter streaming API. Following that, data pre-processing steps such as tokenization, filtering, stemming, and case normalization are performed to remove outliers. Then, the TextBlob library is applied to label the tweets into three classes (Positive, Negative, and Neutral) based on their polarity. Word embeddings are also created using Word2Vec with TF-IDF. The word embeddings are fed into five classification models, namely Support Vector Machine (SVM), Naïve Bayes (NB), Random Forest (RF), Logistic Regression (LR), and Multilayer Perceptron (MLP), and their accuracy is compared. The models were trained and tested on a curated tweet dataset comprising 24,000 records covering the three clothing brands (Asos, Uniqlo, Topshop), using 50-50 and 70-30 train-test splits. Hyperparameter tuning was performed with GridSearchCV to find the best parameters for each classification model. Performance was evaluated with accuracy, precision, recall, and F1-score. With the 50-50 train-test split, LR achieved the highest accuracy, scoring 82%, 87%, and 87% on Asos, Uniqlo, and Topshop respectively. With the 70-30 train-test split, LR again achieved the highest accuracy, scoring 85%, 90%, and 90% for the three clothing brands respectively.
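The labelling and evaluation steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that a polarity score in the range [-1, 1] is already available for each tweet (TextBlob reports polarity in that range), that the sign of the polarity decides the class (a common convention that the abstract does not state), and that precision, recall, and F1-score are macro-averaged over the three classes (also an assumption). The function names `label_from_polarity` and `evaluate` are hypothetical and used only for illustration.

```python
# Illustrative sketch only: sign-based labelling and macro-averaged metrics
# are assumptions, not details confirmed by the article.

LABELS = ("Positive", "Negative", "Neutral")


def label_from_polarity(polarity: float) -> str:
    """Map a polarity score in [-1, 1] to one of three sentiment classes.

    Assumed rule: > 0 is Positive, < 0 is Negative, exactly 0 is Neutral.
    """
    if polarity > 0:
        return "Positive"
    if polarity < 0:
        return "Negative"
    return "Neutral"


def evaluate(y_true, y_pred, labels=LABELS):
    """Return accuracy and macro-averaged precision, recall, and F1-score."""
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)

    precisions, recalls, f1s = [], [], []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        # A class with no predictions (or no true samples) scores 0 here.
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)

    n = len(labels)
    return {
        "accuracy": accuracy,
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


if __name__ == "__main__":
    # Made-up polarity scores standing in for sentiment-tool output.
    polarities = [0.5, -0.2, 0.0, 0.8]
    y_true = [label_from_polarity(p) for p in polarities]
    # Made-up classifier predictions for the same four tweets.
    y_pred = ["Positive", "Negative", "Positive", "Positive"]
    print(y_true)
    print(evaluate(y_true, y_pred))
```

In practice, the labels produced this way would serve as training targets for the classifiers, and the same four metrics would be computed on the held-out portion of each train-test split.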


How to Cite
Jalani, M. S., Ng, H., Yap, T. T. V., & Goh, V. T. (2022). Performance of Sentiment Classification on Tweets of Clothing Brands. Journal of Informatics and Web Engineering, 1(1), 16–22. https://doi.org/10.33093/jiwe.2022.1.1.2
Section
Regular issue

