Ensemble-SMOTE: Mitigating Class Imbalance in Graduate on Time Detection


Theng-Jia Law
Choo-Yee Ting
Hu Ng
Hui-Ngo Goh
Albert Quek

Abstract

In education, detecting whether students will graduate on time is difficult because of the high complexity of the data. Researchers have applied various Machine Learning approaches to identifying on-time graduation, but the task remains challenging because of class imbalance in the dataset. This study aims to (i) compare various class imbalance treatment methods across different sampling ratios, (ii) propose an ensemble class imbalance treatment method to mitigate the problem of class imbalance, and (iii) develop and evaluate predictive models that identify the likelihood of students graduating on time during their university studies. The dataset was collected from 4,007 graduates of a university in 2021 and 2022 and contains 41 variables. After feature selection, various class imbalance treatment methods were compared at sampling ratios ranging from 50% to 90%. In addition, Ensemble-SMOTE is proposed, which aggregates the datasets generated by variants of the Synthetic Minority Oversampling Technique (SMOTE) to mitigate class imbalance effectively. The datasets generated by the class imbalance treatment methods were used as input to the predictive models for detecting on-time graduation. The predictive models were evaluated on accuracy, precision, recall, F0.5-score, F1-score, F2-score, Area Under the Curve, and Area Under the Precision-Recall Curve. Based on the findings, Logistic Regression with Ensemble-SMOTE outperformed the other predictive models and class imbalance treatment methods, achieving the highest average accuracy (87.24%), recall (92.50%), F1-score (91.30%), and F2-score (92.02%) from the 6th to the 10th trimester. To assess the effectiveness of the class imbalance treatment methods, the Shapiro-Wilk test was first applied to check normality, and the Friedman test was then performed to determine whether the models differ significantly. Ensemble-SMOTE ranked as the top performer, achieving the lowest average rank across the performance metrics.
Future research could examine more sophisticated approaches to mitigating class imbalance when the dataset is highly imbalanced.
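The abstract does not say which SMOTE variants Ensemble-SMOTE combines or how their outputs are aggregated, so the following is only an illustrative sketch. It implements plain SMOTE and Borderline-SMOTE-1 in pure Python and joins them with a hypothetical rule (`ensemble_smote`): pool the synthetic samples from both variants, then keep just enough of them to reach the target sampling ratio. It assumes that a sampling ratio r means the minority class is grown to r times the majority-class size, which is how imbalanced-learn interprets a float `sampling_strategy`. All function names, the choice of variants, and the aggregation rule are assumptions, not the authors' method.

```python
import random
from math import dist


def nearest(i, points, k):
    """Indices of the k points closest to points[i], excluding i itself (brute force)."""
    others = [j for j in range(len(points)) if j != i]
    others.sort(key=lambda j: dist(points[i], points[j]))
    return others[:k]


def interpolate(a, b, rng):
    """Core SMOTE step: a random point on the line segment from a to b."""
    gap = rng.random()
    return tuple(x + gap * (y - x) for x, y in zip(a, b))


def smote(minority, n_new, k=5, rng=None):
    """Plain SMOTE: interpolate random minority seeds with one of their k minority neighbours."""
    rng = rng or random.Random(0)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        j = rng.choice(nearest(i, minority, k))
        synthetic.append(interpolate(minority[i], minority[j], rng))
    return synthetic


def borderline_smote(minority, majority, n_new, k=5, m=5, rng=None):
    """Borderline-SMOTE-1: seed only from minority points whose m nearest
    neighbours (over both classes) are at least half, but not all, majority."""
    rng = rng or random.Random(0)
    points = minority + majority  # indices >= len(minority) are majority points
    danger = [
        i for i in range(len(minority))
        if m / 2 <= sum(j >= len(minority) for j in nearest(i, points, m)) < m
    ]
    if not danger:  # assumption: fall back to plain SMOTE when no border points exist
        return smote(minority, n_new, k, rng)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.choice(danger)
        j = rng.choice(nearest(i, minority, k))
        synthetic.append(interpolate(minority[i], minority[j], rng))
    return synthetic


def ensemble_smote(minority, majority, ratio, rng=None):
    """HYPOTHETICAL aggregation: pool both variants' synthetic samples and keep a
    random subset so that len(minority) == round(ratio * len(majority))."""
    rng = rng or random.Random(0)
    needed = max(0, round(ratio * len(majority)) - len(minority))
    pool = smote(minority, needed, rng=rng) + borderline_smote(
        minority, majority, needed, rng=rng)
    return minority + rng.sample(pool, needed), majority


# Toy data: 100 majority points around (0, 0), 20 minority points around (2, 2).
rng = random.Random(42)
majority = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(100)]
minority = [(rng.gauss(2, 1), rng.gauss(2, 1)) for _ in range(20)]
for ratio in (0.5, 0.7, 0.9):  # points within the 50% to 90% range in the abstract
    balanced_min, balanced_maj = ensemble_smote(minority, majority, ratio)
    print(ratio, len(balanced_min), len(balanced_maj))  # minority: 50, 70, 90; majority: 100
```

The brute-force `nearest` is only suitable for toy data; imbalanced-learn's `SMOTE` and `BorderlineSMOTE` classes implement the same two variants with efficient neighbour search. In either case, oversampling should be fitted on the training split only, so that synthetic points never leak into the evaluation data.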
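The abstract reports F0.5, F1 and F2 alongside precision and recall. All three belong to the F-beta family, which combines precision P and recall R as F_beta = (1 + beta²)·P·R / (beta²·P + R): F2 weights recall more heavily than precision, and F0.5 does the reverse. The sketch below computes the three scores from confusion-matrix counts; the counts are invented for illustration and are not results from the study. In practice, scikit-learn offers the same metric as `sklearn.metrics.fbeta_score(y_true, y_pred, beta=...)`.

```python
def fbeta(precision, recall, beta):
    """F-beta score: (1 + beta^2) * P * R / (beta^2 * P + R).
    beta < 1 weights precision more heavily, beta > 1 weights recall more heavily."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def f_scores(tp, fp, fn):
    """F0.5, F1 and F2 from confusion-matrix counts of the positive class."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {f"F{beta}": fbeta(precision, recall, beta) for beta in (0.5, 1, 2)}


# Made-up counts: 90 true positives, 10 false positives, 5 false negatives.
# Precision = 0.90 and recall ~ 0.947, giving F0.5 ~ 0.909, F1 ~ 0.923, F2 = 0.9375.
print(f_scores(90, 10, 5))
```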
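The abstract compares the class imbalance treatment methods by average rank and uses the Friedman test to check whether their differences are significant. A minimal pure-Python sketch of that procedure: within each block (for example, one performance metric or one trimester; how the study defines its blocks is an assumption here), the k methods are ranked with 1 as best and tied methods share the average rank. The average ranks R_j over N blocks then give the Friedman statistic χ²_F = 12N / (k(k+1)) · (Σ_j R_j² − k(k+1)²/4), which is compared against a chi-square distribution with k − 1 degrees of freedom. The scores and method names below are invented, and this version omits the tie correction that `scipy.stats.friedmanchisquare` applies. The preceding normality check is available in SciPy as `scipy.stats.shapiro`.

```python
def rank_row(scores, higher_is_better=True):
    """Rank methods within one block (1 = best); tied methods share their average rank."""
    order = sorted(range(len(scores)), key=lambda j: scores[j], reverse=higher_is_better)
    ranks = [0.0] * len(scores)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[start]]:
            end += 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = (start + end) / 2 + 1  # mean of the tied 1-based positions
        start = end + 1
    return ranks


def friedman(table, higher_is_better=True):
    """table[i][j] is the score of method j on block i.
    Returns each method's average rank and the Friedman chi-square statistic
    (without tie correction)."""
    n, k = len(table), len(table[0])
    ranked = [rank_row(row, higher_is_better) for row in table]
    avg_ranks = [sum(r[j] for r in ranked) / n for j in range(k)]
    chi2 = 12 * n / (k * (k + 1)) * (sum(r * r for r in avg_ranks) - k * (k + 1) ** 2 / 4)
    return avg_ranks, chi2


# Invented scores: 4 blocks (rows) x 3 hypothetical methods (columns).
methods = ["method_A", "method_B", "method_C"]
table = [
    [0.85, 0.86, 0.88],
    [0.80, 0.83, 0.84],
    [0.90, 0.89, 0.92],
    [0.78, 0.80, 0.81],
]
avg_ranks, chi2 = friedman(table)
for name, r in sorted(zip(methods, avg_ranks), key=lambda t: t[1]):
    print(f"{name}: average rank {r:.2f}")  # lowest average rank = top performer
print(f"Friedman chi-square = {chi2:.2f} with {len(methods) - 1} degrees of freedom")
```

Here method_C has the lowest average rank (1.00) and the statistic is 6.5. With k = 3 methods (two degrees of freedom) the chi-square tail has the closed form p = exp(−χ²/2), so p ≈ 0.039, although with only four blocks the chi-square approximation is rough.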

Article Details

How to Cite
Law, T.-J., Ting, C.-Y., Ng, H., Goh, H.-N., & Quek, A. (2024). Ensemble-SMOTE: Mitigating Class Imbalance in Graduate on Time Detection. Journal of Informatics and Web Engineering, 3(2), 229–250. https://doi.org/10.33093/jiwe.2024.3.2.17
Section: Regular issue