Analysis of Social Media Trends for Political Election Predictions
Abstract
Traditional opinion polls are losing their ability to forecast election results because of sampling bias and slow update cycles. Social media offers abundant real-time data, but it has considerable drawbacks of its own, such as noise and demographic bias. This paper presents a new end-to-end forecasting pipeline, evaluated on a corpus of 1.75 million tweets related to the 2020 U.S. election. Our approach uses dual-path sentiment extraction (VADER and RoBERTa) for better accuracy, together with a new state-level feature engineering scheme to correct data bias. This scheme transforms raw scores into 14 relative indicators, including sentiment differentials and tweet volume ratios, which adjust for regional activity imbalances. A tuned Gradient Boosting Tree (GBT) classifier trained on these features achieved an average state-level prediction accuracy of 70.6 percent (ROC-AUC 0.69). Importantly, the aggregated forecast exactly matched the 306-232 Electoral College outcome. Feature analysis showed that our relative indicators, especially the tweet volume ratio, were the strongest of the engineered predictors. The objective of this paper is to present an alternative to traditional polling that is both powerful and easy to interpret, with bias reduced through these relative features. The framework demonstrates a scalable, real-time approach to political forecasting that captures electoral dynamics where conventional methods fail.
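As a rough illustration of the relative indicators and the Electoral College aggregation described in the abstract, the following Python sketch computes two per-state features and tallies electoral votes from predicted winners. It is not the authors' implementation. The function names, the input layout, and the exact definitions of `sentiment_diff` and `volume_ratio` are assumptions made for illustration, and the tweets are toy values rather than data from the paper.

```python
from collections import defaultdict


def state_relative_features(tweets):
    """Compute two illustrative relative indicators per state.

    `tweets` is an iterable of (state, candidate, sentiment) tuples, where
    candidate is "biden" or "trump" and sentiment is a score in [-1, 1].
    The feature definitions are assumptions, not the paper's formulas.
    """
    # Per state and candidate: [tweet count, sum of sentiment scores].
    totals = defaultdict(lambda: {"biden": [0, 0.0], "trump": [0, 0.0]})
    for state, candidate, score in tweets:
        bucket = totals[state][candidate]
        bucket[0] += 1
        bucket[1] += score

    features = {}
    for state, per_candidate in totals.items():
        n_b, sum_b = per_candidate["biden"]
        n_t, sum_t = per_candidate["trump"]
        mean_b = sum_b / n_b if n_b else 0.0
        mean_t = sum_t / n_t if n_t else 0.0
        features[state] = {
            # Difference in mean sentiment between the two candidates.
            "sentiment_diff": mean_b - mean_t,
            # Share of the state's tweets that mention the first candidate;
            # a ratio is insensitive to how active the state is overall.
            "volume_ratio": n_b / (n_b + n_t),
        }
    return features


def electoral_tally(predicted_winner, electoral_votes):
    """Sum electoral votes per candidate from per-state predictions."""
    tally = defaultdict(int)
    for state, winner in predicted_winner.items():
        tally[winner] += electoral_votes[state]
    return dict(tally)


# Toy input: four hypothetical tweets from two states.
toy_tweets = [
    ("PA", "biden", 0.5),
    ("PA", "biden", 0.1),
    ("PA", "trump", -0.2),
    ("TX", "trump", 0.4),
]
toy_features = state_relative_features(toy_tweets)
toy_tally = electoral_tally({"PA": "biden", "TX": "trump"}, {"PA": 20, "TX": 38})
```

In a full pipeline, feature rows like these would be fed to a classifier such as a gradient boosting model, and its per-state predictions would be passed to the tally step. Because the features are ratios and differences rather than raw counts, a state with many tweets and a state with few are placed on the same scale.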
Article Details

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
All articles published in JIWE are licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0) License. Readers are allowed to
- Share — copy and redistribute the material in any medium or format under the following conditions:
- Attribution — You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use;
- NonCommercial — You may not use the material for commercial purposes;
- NoDerivatives — If you remix, transform, or build upon the material, you may not distribute the modified material.