Synthetic Data Generation for Healthcare and Wellness: Methods, Applications, and Future Directions

Sellappan Palaniappan
Kasthuri Subaramaniam
Oras Baker
Bui Ngoc Dung
Sumit Dhariwal

Abstract

Artificial intelligence in healthcare depends on high-quality datasets, yet strict privacy regulations, institutional silos, ethical concerns, and heterogeneous data continue to limit access to real-world clinical data. To address these limitations, this paper focuses on synthetic data generation as a privacy-preserving and scalable alternative to traditional data collection for healthcare and wellness studies. In contrast to recent studies that mainly target sophisticated deep learning architectures, it offers a broad and clear framework of synthesis methods based on statistical modelling, rule-based generation, and domain-specific clinical logic. The paper reviews the major synthetic data generation approaches, including statistical distributions, machine learning techniques, Generative Adversarial Networks (GANs), Variational Autoencoders (VAEs), diffusion models, and hybrid methods. It also demonstrates how physiological constraints, temporal structure, prevalence modelling, and clinically meaningful associations can be incorporated into synthetic datasets across a range of healthcare fields, including clinical risk prediction, wearable analytics, mental health text generation, genomics, epidemiological modelling, and medical imaging. To support reproducibility and accessibility for both research and practice, practical Python-based examples and domain-aware probabilistic models are provided. The paper further discusses approaches for evaluating statistical accuracy, downstream utility, and privacy, and highlights the trade-offs among realism, technical efficiency, and disclosure risk. Future research directions are also presented, including digital twins, multimodal patient simulation, long-term disease progression modelling, and differentially private generative systems. This work serves as both a theoretical basis and a practical guide for researchers, clinicians, and educators seeking to create safe, transparent, and trustworthy synthetic healthcare data for the advancement of healthcare artificial intelligence.
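
As an illustration of the kind of rule-based, statistically driven generation the abstract describes, the following minimal Python sketch draws patient attributes from simple distributions, clips them to plausible physiological ranges, links blood pressure to age and BMI, and assigns a diagnosis through a logistic risk model. It is not taken from the paper: the function name generate_patients, the variable names, and every numeric parameter (means, spreads, ranges, coefficients, target prevalence) are assumptions chosen only for demonstration.

```python
import math
import random


def generate_patients(n, seed=0, target_prevalence=0.10):
    """Return n synthetic patient records as dicts.

    All numeric parameters below are illustrative assumptions,
    not clinically validated values.
    """
    rng = random.Random(seed)  # seeded generator for reproducibility
    # Intercept chosen so that a patient with age 50 and BMI 27 has a
    # risk equal to target_prevalence; the overall prevalence in the
    # generated cohort will only be roughly near this value.
    intercept = math.log(target_prevalence / (1 - target_prevalence))
    records = []
    for _ in range(n):
        # Statistical modelling: sample from normal distributions, then
        # apply physiological constraints by clipping to plausible ranges.
        age = min(max(rng.gauss(50, 15), 18), 90)
        bmi = min(max(rng.gauss(27, 5), 15), 50)
        # Clinically motivated association (assumed): systolic blood
        # pressure increases with age and BMI, plus random noise.
        sbp = 100 + 0.4 * (age - 18) + 0.8 * (bmi - 22) + rng.gauss(0, 10)
        sbp = min(max(sbp, 80), 220)
        # Prevalence modelling: logistic risk driven by age and BMI.
        logit = intercept + 0.04 * (age - 50) + 0.08 * (bmi - 27)
        risk = 1 / (1 + math.exp(-logit))
        diabetes = 1 if rng.random() < risk else 0
        records.append({
            "age": round(age, 1),
            "bmi": round(bmi, 1),
            "systolic_bp": round(sbp, 1),
            "diabetes": diabetes,
        })
    return records


if __name__ == "__main__":
    cohort = generate_patients(1000, seed=42)
    prevalence = sum(r["diabetes"] for r in cohort) / len(cohort)
    print(cohort[0])
    print(f"Observed prevalence: {prevalence:.3f}")
```

Because no real patient records enter the process, a generator of this kind carries little disclosure risk, but its realism is limited to the associations that are explicitly encoded; this is one concrete instance of the realism and privacy trade-off noted in the abstract.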

Article Details

How to Cite
Palaniappan, S., Subaramaniam, K., Baker, O., Ngoc Dung, B., & Dhariwal, S. (2026). Synthetic Data Generation for Healthcare and Wellness: Methods, Applications, and Future Directions. Journal of Informatics and Web Engineering, 5(2), 343–360. https://doi.org/10.33093/jiwe.2026.5.2.21
Section
(Thematic) AI in Health and Wellness
