Developing A Predictive Model for Football Players’ Market Value Using Machine Learning
Abstract
Football is the world’s most popular sport, and evaluating the market value of players is crucial for clubs and managers making informed decisions about transfers, contracts, and financial planning. This study develops a predictive model that estimates the market value of football players using machine learning (ML) algorithms and real-world performance statistics from the top five European leagues (English Premier League, Italian Serie A, Spanish La Liga, German Bundesliga, and French Ligue 1) for the 2017/18 to 2019/20 seasons. Based on a review of past research, several ML models are developed, including Random Forest, LightGBM, XGBoost, and Gradient Boosting Decision Tree (GBDT). Data preprocessing steps, including data cleaning, feature selection, feature encoding, data splitting, and standardization, are applied to ensure data quality and consistency. The hyperparameters of each model are tuned using RandomizedSearchCV with cross-validation. The models are evaluated using the regression metrics mean absolute error (MAE), root mean squared error (RMSE), and coefficient of determination (R²) to determine the most accurate model. The best-performing model is then used to analyse the correlation between the features and market value, offering insights into the key features that significantly impact market value for each position.
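The workflow summarised in the abstract (splitting, standardization, randomized hyperparameter search with cross-validation, and evaluation with MAE, RMSE, and R²) can be illustrated with a minimal scikit-learn sketch. This is not the study's implementation: the data are synthetic, the feature meanings and search ranges are hypothetical, and only Random Forest is shown, whereas the study also considers LightGBM, XGBoost, and GBDT.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import RandomizedSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for per-player statistics (assumption: NOT the study's dataset).
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
y = 3.0 * X[:, 0] + 1.5 * X[:, 1] + rng.normal(scale=0.5, size=300)

# Split before scaling so the scaler only ever sees training data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# The pipeline refits the scaler inside each cross-validation fold.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("model", RandomForestRegressor(random_state=0)),
])

# Hypothetical search space; the abstract does not state the ranges used.
param_distributions = {
    "model__n_estimators": [50, 100, 200],
    "model__max_depth": [None, 5, 10],
    "model__min_samples_leaf": [1, 2, 4],
}

search = RandomizedSearchCV(
    pipeline,
    param_distributions=param_distributions,
    n_iter=5,                          # number of sampled configurations
    cv=3,                              # 3-fold cross-validation
    scoring="neg_mean_absolute_error",
    random_state=0,
)
search.fit(X_train, y_train)

# Evaluate the refitted best configuration on the held-out test set.
y_pred = search.predict(X_test)
mae = mean_absolute_error(y_test, y_pred)
rmse = float(np.sqrt(mean_squared_error(y_test, y_pred)))
r2 = r2_score(y_test, y_pred)

print(f"best params: {search.best_params_}")
print(f"MAE={mae:.3f}  RMSE={rmse:.3f}  R2={r2:.3f}")
```

Wrapping the scaler and the regressor in one pipeline is a design choice made for this sketch: it keeps the standardization statistics from leaking across cross-validation folds. Tree-based models do not strictly require standardization, so the scaling step is included here only to mirror the preprocessing steps listed in the abstract.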
Article Details

All articles published in JIWE are licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0) License. Readers are allowed to share the material (copy and redistribute it in any medium or format) under the following conditions:
- Attribution: You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
- NonCommercial: You may not use the material for commercial purposes.
- NoDerivatives: If you remix, transform, or build upon the material, you may not distribute the modified material.