Social Engineering Threat Analysis Using Large-Scale Synthetic Data

Sellappan Palaniappan; Rajasvaran Logeswaran; Shapla Khanam; Pulasthi Gunawardhana

doi:10.33093/jiwe.2025.4.1.6

Published: Feb 14, 2025

Keywords:

Social Engineering, Threats, Phishing Attacks, Machine Learning, Synthetic Data

Sellappan Palaniappan

HELP University, Malaysia

https://orcid.org/0009-0009-1168-2864

Rajasvaran Logeswaran

HELP University, Malaysia

Shapla Khanam

HELP University, Malaysia

Pulasthi Gunawardhana

University of Sri Jayewardenepura, Sri Lanka

Abstract

We frequently hear news about compromised systems, virus attacks, spam emails, stolen bank account numbers, and financial losses. Safeguarding digital assets against these and other cyber-attacks is extremely important in today's digitally connected world, and many organizations spend substantial amounts of money to do so. One type of cyber threat that is rampant today is the social engineering attack, which exploits human psychology. These attacks typically persuade, trick, or threaten naïve, unsuspecting individuals into divulging sensitive information to the attackers. Because they target people rather than systems, traditional approaches have not been effective in preventing them. In this paper, we propose a machine learning model to detect these threats. The model is trained on a synthetic dataset of 10,000 samples that simulates various real-world social engineering threats, including phishing, spear phishing, whaling, vishing, smishing, baiting, and pretexting. Our analysis of attack types, patterns, and characteristics revealed interesting insights. The model achieved an accuracy of 0.8984 and an F1 score of 0.9253, demonstrating its effectiveness in detecting social engineering attacks. Synthetic data overcomes the lack of available real-world data caused by privacy concerns, and this work demonstrates that it is safe, scalable, ethics-friendly, and effective.
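The abstract does not say how the synthetic samples were generated or which classifier was used. As a rough illustration of the overall pipeline only, the Python sketch below assumes a hypothetical template-based generator (`generate_dataset`, with made-up templates and an assumed 70% attack ratio) and substitutes a simple multinomial naive Bayes classifier (`train_nb`, `predict_nb`) for the authors' model. The `accuracy` and `f1_score` helpers compute the two reported metrics, treating the attack class as positive.

```python
import math
import random
from collections import Counter

# Hypothetical templates: the paper's actual generator is not described in the abstract.
ATTACK_TEMPLATES = {
    "phishing": "urgent verify your account password at this link",
    "spear_phishing": "hi {name} please review the attached invoice and confirm your login",
    "whaling": "ceo request wire transfer immediately keep this confidential",
    "vishing": "call this number now bank security team needs your pin",
    "smishing": "sms alert your parcel is held click link to pay fee",
    "baiting": "free gift card claim prize download now",
    "pretexting": "it support here we need your credentials to fix your mailbox",
}
BENIGN_TEMPLATES = [
    "team meeting moved to {day} afternoon",
    "lunch menu for {day} is attached",
    "reminder project report due on {day}",
    "thanks {name} for the notes from yesterday",
]
NAMES = ["alice", "bob", "chen", "devi"]
DAYS = ["monday", "tuesday", "wednesday", "thursday", "friday"]


def generate_dataset(n, attack_ratio=0.7, seed=0):
    """Return n (text, attack_type, label) rows; label 1 = attack, 0 = benign.

    attack_ratio is an assumed class balance, not a figure from the paper.
    """
    rng = random.Random(seed)
    kinds = list(ATTACK_TEMPLATES)
    rows = []
    for _ in range(n):
        # str.format ignores unused keyword arguments, so one fill dict suits every template.
        fill = {"name": rng.choice(NAMES), "day": rng.choice(DAYS)}
        if rng.random() < attack_ratio:
            kind = rng.choice(kinds)
            rows.append((ATTACK_TEMPLATES[kind].format(**fill), kind, 1))
        else:
            rows.append((rng.choice(BENIGN_TEMPLATES).format(**fill), "benign", 0))
    return rows


def train_nb(texts, labels):
    """Fit a multinomial naive Bayes model (a stand-in, not the authors' model)."""
    word_counts = {0: Counter(), 1: Counter()}
    class_counts = Counter(labels)
    for text, y in zip(texts, labels):
        word_counts[y].update(text.split())
    vocab = set(word_counts[0]) | set(word_counts[1])
    return word_counts, class_counts, vocab


def predict_nb(model, text):
    """Return the class with the highest Laplace-smoothed log posterior."""
    word_counts, class_counts, vocab = model
    total = sum(class_counts.values())
    best_label, best_score = 0, -math.inf
    for y in (0, 1):
        denom = sum(word_counts[y].values()) + len(vocab)
        score = math.log((class_counts[y] + 1) / (total + 2))  # smoothed class prior
        for word in text.split():
            score += math.log((word_counts[y][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = y, score
    return best_label


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive (attack) class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    data = generate_dataset(10_000, seed=42)  # same sample count as the abstract
    split = int(0.8 * len(data))              # assumed 80/20 train/test split
    train, test = data[:split], data[split:]
    model = train_nb([t for t, _, _ in train], [y for _, _, y in train])
    y_true = [y for _, _, y in test]
    y_pred = [predict_nb(model, t) for t, _, _ in test]
    print(f"accuracy={accuracy(y_true, y_pred):.4f} f1={f1_score(y_true, y_pred):.4f}")
```

Because every message comes from a small fixed set of templates, this toy task is likely to be close to perfectly separable, so the scores it prints say nothing about the reported 0.8984 accuracy and 0.9253 F1. A realistic generator would need far more lexical variation and some label noise.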

How to Cite

Palaniappan, S., Logeswaran, R., Khanam, S., & Gunawardhana, P. (2025). Social Engineering Threat Analysis Using Large-Scale Synthetic Data. Journal of Informatics and Web Engineering, 4(1), 70–80. https://doi.org/10.33093/jiwe.2025.4.1.6

Issue

Vol. 4 No. 1 (2025): February 2025

Section

Regular issue

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
