Loan Default Prediction Using Machine Learning Algorithms


Zhi Zheng Kang
Sin Yin Teh
Samuel Yong Guang Tan
Wei Chien Ng

Abstract

Financial institutions constantly face the risk of borrower default, which can result in significant financial losses. Developing an appropriate predictive model for loan default is therefore essential to reduce this risk and minimise losses. The objective of this study is to identify the most suitable machine learning model for predicting loan default by comparing four models: Random Forest, Decision Tree, Extreme Gradient Boosting (XGBoost), and Light Gradient Boosting Machine (LightGBM). The study also examines the key features influencing loan default prediction. The dataset, sourced from Kaggle, consists of 148,670 rows with 34 features. Because class imbalance is common in loan default data, the Synthetic Minority Over-sampling Technique (SMOTE) is applied during model training to enhance predictive performance. Model performance is evaluated using five assessment metrics: accuracy, precision, recall, F1-score, and the area under the receiver operating characteristic curve (ROC AUC). The results indicate that LightGBM performs best among the four models, with the highest accuracy (0.9764), precision (0.9747), and recall (0.9503). Feature importance analysis using permutation importance identifies interest, credit type, interest rate spread, and upfront charges as the four most significant features for predicting loan default. These findings provide useful information to support financial institutions in risk assessment and decision-making to mitigate potential losses.
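The abstract names three technical ingredients: SMOTE oversampling during training, five evaluation metrics, and permutation importance. The following is a minimal, dependency-free Python sketch of each, offered only to illustrate what these techniques compute; it is not the authors' implementation, and the helper names (`smote`, `classification_metrics`, `roc_auc`, `permutation_importance`) and toy data are hypothetical. In practice one would typically use imbalanced-learn's `SMOTE`, scikit-learn's `precision_score`, `recall_score`, `f1_score` and `roc_auc_score`, and `sklearn.inspection.permutation_importance`. The sketch assumes SMOTE is applied to the training split only, so that test metrics reflect the original class balance.

```python
import math
import random

# Hypothetical helpers: an illustrative sketch, not the authors' implementation.


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority samples, SMOTE-style.

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority-class neighbours.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        # k nearest minority neighbours of `base` by Euclidean distance
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        nn = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, nn)])
    return synthetic


def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels (1 = default)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / len(pairs), "precision": precision,
            "recall": recall, "f1": f1}


def roc_auc(y_true, scores):
    """ROC AUC as P(random positive scores above random negative); ties count 1/2.

    O(n_pos * n_neg), so only suitable for small illustrative inputs.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def permutation_importance(predict, X, y, metric, n_repeats=5, seed=0):
    """Mean drop in `metric` when one feature column is randomly shuffled."""
    rng = random.Random(seed)
    baseline = metric(y, predict(X))
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - metric(y, predict(X_perm)))
        importances.append(sum(drops) / n_repeats)
    return importances


if __name__ == "__main__":
    y_true = [1, 0, 1, 1, 0, 0]
    y_pred = [1, 0, 0, 1, 0, 1]
    scores = [0.9, 0.2, 0.4, 0.8, 0.3, 0.6]
    print(classification_metrics(y_true, y_pred))  # all four metrics equal 2/3 here
    print(roc_auc(y_true, scores))  # 8/9, about 0.889
    print(smote([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]], n_new=3, k=2))
```

Permutation importance is model-agnostic: it only needs a fitted model's predictions, which is why it can rank features for any of the four compared classifiers. A feature the model ignores receives an importance of exactly zero, because shuffling it leaves every prediction unchanged.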

Article Details

How to Cite
Kang, Z. Z., Teh, S. Y., Tan, S. Y. G., & Ng, W. C. (2025). Loan Default Prediction Using Machine Learning Algorithms. Journal of Informatics and Web Engineering, 4(3), 232–244. https://doi.org/10.33093/jiwe.2025.4.3.14
Section: Regular issue
