Predicting Diabetes Mellitus with Machine Learning Techniques

Manuscript Received: 6 October 2023, Accepted: 20 December 2023, Published: 15 March 2024, ORCiD: 0000-0002-3128-585X


Tong Hau Lee
Ng Hu
Harannesh Arul Ananthan


This study addresses the challenge of accurately identifying diabetes mellitus in individuals. Using publicly accessible online and real-world diagnostic data, we apply five machine learning models, namely Support Vector Machine (SVM), Random Forest (RF), Naïve Bayes, eXtreme Gradient Boosting (XGBoost), and Deep Neural Network, to the PIMA Indian Diabetes and NHANES 1999-2016 datasets. Data pre-processing covered the handling of null values, outliers, and class imbalance, followed by data normalization. Our results show that the RF model achieves 79% accuracy for binary classification on the PIMA Indian Diabetes dataset, using a 60:40 train-test split with Boruta-selected features. On the NHANES 1999-2016 dataset, the XGBoost model performs best, achieving 92% accuracy for binary classification and 91% for multiclass classification.
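As an illustration of the pre-processing and evaluation steps named in the abstract, the following is a minimal standard-library sketch, not the authors' implementation. The abstract does not say how null values were filled, how features were normalized, or whether the split was stratified, so median imputation, min-max scaling, and a stratified 60:40 split are assumptions here. The helper names and the toy values are hypothetical.

```python
import random
from statistics import median

def stratified_split(labels, train_frac=0.6, seed=0):
    """Return (train_idx, test_idx), keeping about train_frac of each class in train."""
    rng = random.Random(seed)
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    train_idx, test_idx = [], []
    for idx in by_class.values():
        rng.shuffle(idx)
        cut = round(len(idx) * train_frac)
        train_idx += idx[:cut]
        test_idx += idx[cut:]
    return train_idx, test_idx

def fit_medians(rows):
    """Per-column median of the observed (non-None) values."""
    return [median(v for v in col if v is not None) for col in zip(*rows)]

def impute(rows, medians):
    """Replace each None with the median of its column."""
    return [[m if v is None else v for v, m in zip(row, medians)] for row in rows]

def fit_min_max(rows):
    """Per-column (min, max) bounds."""
    return [(min(col), max(col)) for col in zip(*rows)]

def scale(rows, bounds):
    """Min-max scale each column using the given bounds."""
    return [[(v - lo) / (hi - lo) if hi > lo else 0.0
             for v, (lo, hi) in zip(row, bounds)] for row in rows]

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Toy data invented for illustration: two features per row, None marks a null.
X = [[150, 34.0], [90, 26.0], [180, None], [88, 28.0], [140, 43.0],
     [115, 25.0], [160, 31.0], [None, 35.0], [195, 30.0], [120, 27.0]]
y = [1, 0, 1, 0, 1, 0, 1, 0, 1, 0]

train_idx, test_idx = stratified_split(y, train_frac=0.6, seed=42)
X_train_raw = [X[i] for i in train_idx]
X_test_raw = [X[i] for i in test_idx]

# Fit imputation and scaling statistics on the training rows only,
# then apply the same statistics to the test rows.
medians = fit_medians(X_train_raw)
X_train = impute(X_train_raw, medians)
bounds = fit_min_max(X_train)
X_train = scale(X_train, bounds)
X_test = scale(impute(X_test_raw, medians), bounds)
```

A classifier such as RF or XGBoost would then be fitted on `X_train` and scored on `X_test` with `accuracy`. Fitting the medians and bounds on the training rows only is a design choice made in this sketch so that no test-set statistics influence the reported accuracy; the abstract does not state how the authors handled this.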
