Performance of Sentiment Classification on Tweets of Clothing Brands

Muhammad Shafiq Jalani; Hu Ng; Timothy Tzen Vun Yap; Vik Tor Goh

doi:10.33093/jiwe.2022.1.1.2

Published: Mar 16, 2022

Keywords:

Sentiment analysis, Skin Type Classification, machine learning, clothing brand

Muhammad Shafiq Jalani

Multimedia University, Malaysia

Hu Ng

Multimedia University, Malaysia

https://orcid.org/0000-0002-9895-9978

Timothy Tzen Vun Yap

Multimedia University, Malaysia

Vik Tor Goh

Multimedia University, Malaysia

Abstract

Social media platforms such as Facebook, Instagram, LinkedIn, and Twitter ease the sharing of ideas, thoughts, videos, photos, and information through the building of virtual networks and communities. This has allowed companies and products to reach a wider audience for marketing and advertising, and to gauge feedback from the public. This research investigates clothing brand mentions on Twitter and performs sentiment analysis on users’ thoughts about three clothing brands: Asos, Uniqlo, and Topshop. The data is collected with the Python library Tweepy, which accesses the Twitter streaming API. Data pre-processing, comprising tokenization, filtering, stemming, and case normalization, is then performed to remove outliers. Next, TextBlob is applied to label the tweets into three classes (Positive, Negative, and Neutral) based on their polarity. Word embeddings are created using Word2Vec with TF-IDF and fed into five classification models, namely Support Vector Machine (SVM), Naïve Bayes (NB), Random Forest (RF), Logistic Regression (LR), and Multilayer Perceptron (MLP), whose accuracy is then compared. The models were trained and tested on a curated dataset of 24,000 tweets covering the three brands, using 50-50 and 70-30 train-test splits. Hyperparameter tuning was carried out with GridSearchCV to find the best parameters for each classification model. Performance was evaluated with accuracy, precision, recall, and F1-score. With the 50-50 split, LR achieved the highest accuracy, scoring 82%, 87%, and 87% on Asos, Uniqlo, and Topshop respectively. With the 70-30 split, LR again achieved the highest accuracy, scoring 85%, 90%, and 90% on the three brands respectively.
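The labelling and evaluation steps described in the abstract can be illustrated with a minimal, standard-library sketch. It is not the authors' code. The function names `label_from_polarity` and `evaluate` are hypothetical, the zero threshold on the polarity score is an assumed convention, and macro averaging across the three classes is also an assumption, since the abstract does not state how precision, recall, and F1-score were averaged.

```python
from collections import Counter

# Class names taken from the abstract.
LABELS = ("Positive", "Negative", "Neutral")


def label_from_polarity(polarity: float) -> str:
    """Map a polarity score in [-1, 1] to a class.

    Assumption: scores above zero are Positive, below zero Negative,
    and exactly zero Neutral. The paper may use different cut-offs.
    """
    if polarity > 0:
        return "Positive"
    if polarity < 0:
        return "Negative"
    return "Neutral"


def evaluate(y_true, y_pred, labels=LABELS):
    """Return accuracy and macro-averaged precision, recall and F1."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    # Per-class true positives, false positives and false negatives.
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1
            fn[truth] += 1

    precisions, recalls, f1s = [], [], []
    for c in labels:
        # A class with no predictions (or no true samples) scores 0.0.
        prec = tp[c] / (tp[c] + fp[c]) if (tp[c] + fp[c]) else 0.0
        rec = tp[c] / (tp[c] + fn[c]) if (tp[c] + fn[c]) else 0.0
        f1 = 2 * prec * rec / (prec + rec) if (prec + rec) else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)

    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


if __name__ == "__main__":
    # Toy polarity scores, invented purely for illustration.
    reference = [label_from_polarity(p) for p in (0.5, -0.2, 0.0, 0.8)]
    predicted = ["Positive", "Negative", "Positive", "Positive"]
    print(evaluate(reference, predicted))
```

In practice the polarity scores would come from TextBlob and the predictions from a trained classifier such as LR; the toy values here only exercise the arithmetic.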

How to Cite

Jalani, M. S., Ng, H., Yap, T. T. V., & Goh, V. T. (2022). Performance of Sentiment Classification on Tweets of Clothing Brands. Journal of Informatics and Web Engineering, 1(1), 16–22. https://doi.org/10.33093/jiwe.2022.1.1.2

Issue

Vol. 1 No. 1 (2022): March 2022

Section

Regular issue

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

All articles published in JIWE are licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0) License. Readers are allowed to

Share — copy and redistribute the material in any medium or format under the following conditions:
Attribution — You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use;
NonCommercial — You may not use the material for commercial purposes;
NoDerivatives — If you remix, transform, or build upon the material, you may not distribute the modified material.
