Enhancing Imbalanced Data Augmentation: A Comparative Study of GANified-SMOTE and Latent Factor Integration
Abstract
Imbalanced datasets are a serious problem in machine learning (ML): minority-class samples are usually sparse yet carry significant meaning. A skewed class distribution can bias a model toward the majority class, yielding deceptively high accuracy while minority cases go undetected. This bias is most perilous in critical applications, where missing minority cases can be highly damaging. The Synthetic Minority Oversampling Technique (SMOTE) is one of the most widely used remedies: it balances the class distribution by interpolating between existing minority samples. However, the interpolated samples often lie too close to one another, which can lead to overfitting and limit the model's generalization. Recent advances in generative modeling, especially Generative Adversarial Networks (GANs), offer a more effective way to handle class imbalance. A GAN pits a generator against a discriminator to produce synthetic data that closely resembles real data. A hybrid technique called GANified-SMOTE combines the interpolation of SMOTE with the generative power of GANs to produce more diverse and realistic minority-class samples, improving model robustness and mitigating the limitations of traditional oversampling. This paper incorporates latent factors into the GANified-SMOTE framework. Latent variables capture hidden structure and relations in the data, yielding synthetic samples that better match the true distribution and improving classification accuracy. By incorporating latent factors, this research aims to build a stronger oversampling method for imbalanced classification tasks.
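As an illustration of the interpolation step the abstract describes, the following is a minimal sketch of SMOTE-style oversampling on toy data (our own NumPy implementation for illustration only; the function and variable names are not from the paper, and this omits the GAN refinement stage):

```python
import numpy as np

def smote_sample(minority, n_new, k=5, seed=None):
    """Generate n_new synthetic points by interpolating between a random
    minority sample and one of its k nearest minority neighbors."""
    rng = np.random.default_rng(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(len(minority))
        x = minority[i]
        # Distances from x to every other minority point (exclude x itself).
        d = np.linalg.norm(minority - x, axis=1)
        d[i] = np.inf
        neighbors = np.argsort(d)[:k]
        x_nn = minority[rng.choice(neighbors)]
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(x + gap * (x_nn - x))
    return np.array(synthetic)

# Toy minority class: four points at the corners of the unit square.
minority = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
new = smote_sample(minority, n_new=6, k=2, seed=0)
```

Because every synthetic point lies on a segment between two existing minority samples, the new points stay inside the convex hull of the minority class; this is exactly the clustering of near-duplicate samples that the GAN stage of GANified-SMOTE is intended to diversify.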
Article Details

This work, like all articles published in JIWE, is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0) License. Readers are allowed to
- Share — copy and redistribute the material in any medium or format under the following conditions:
- Attribution — You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use;
- NonCommercial — You may not use the material for commercial purposes;
- NoDerivatives — If you remix, transform, or build upon the material, you may not distribute the modified material.