Social Engineering Threat Analysis Using Large-Scale Synthetic Data

Sellappan Palaniappan
Rajasvaran Logeswaran
Shapla Khanam
Pulasthi Gunawardhana

Abstract

We frequently hear news about compromised systems, virus attacks, spam emails, stolen bank account numbers, and financial losses. Safeguarding digital assets against these and other cyber-attacks is extremely important in today's digitally connected world, and many organizations spend substantial amounts of money to do so. One type of cyber threat that is rampant these days is the social engineering attack, which exploits human psychology. These attacks typically persuade, convince, trick, or threaten naïve and innocent individuals into divulging sensitive information to the attackers. Because these attacks exploit human psychology, traditional approaches have not been effective in preventing them. In this paper, we propose a machine learning model to detect these types of threats. The model is trained on a large synthetic dataset of 10,000 samples that simulates various real-world social engineering threats, such as phishing, spear phishing, whaling, vishing, smishing, baiting, and pretexting. Our analysis of attack types, patterns, and characteristics revealed interesting insights. The model achieved an accuracy of 0.8984 and an F1 score of 0.9253, demonstrating its effectiveness in detecting social engineering attacks. The use of synthetic data overcomes the limited availability of real-world data caused by privacy issues, and is demonstrated in this work to be safe, scalable, ethics-friendly, and effective.
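
To make the reported metrics concrete, the following is a minimal sketch of how accuracy and an F1 score can be computed on a synthetic, labelled dataset of 10,000 samples. It is not the authors' model or data generator: the `urgency` feature, the 50/50 class balance, the threshold rule in `predict`, and all function names are assumptions introduced purely for illustration, and the numbers it prints are unrelated to the 0.8984 and 0.9253 reported above.

```python
import random

# Attack categories named in the abstract.
ATTACK_TYPES = ["phishing", "spear phishing", "whaling", "vishing",
                "smishing", "baiting", "pretexting"]


def make_synthetic_samples(n, seed=0):
    """Generate n hypothetical labelled records (1 = attack, 0 = benign)."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        is_attack = rng.random() < 0.5  # assumed class balance
        mean = 0.7 if is_attack else 0.3  # assumed feature distribution
        samples.append({
            "attack_type": rng.choice(ATTACK_TYPES) if is_attack else "benign",
            "urgency": rng.gauss(mean, 0.2),  # hypothetical feature
            "label": int(is_attack),
        })
    return samples


def predict(sample, threshold=0.5):
    """Stand-in classifier: flag a sample when its urgency reaches the threshold."""
    return int(sample["urgency"] >= threshold)


def accuracy_and_f1(y_true, y_pred):
    """Compute accuracy and F1 for binary labels, treating 1 as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return accuracy, f1


if __name__ == "__main__":
    data = make_synthetic_samples(10_000)
    y_true = [s["label"] for s in data]
    y_pred = [predict(s) for s in data]
    acc, f1 = accuracy_and_f1(y_true, y_pred)
    print(f"accuracy={acc:.4f} f1={f1:.4f}")
```

F1 is the harmonic mean of precision and recall, so it can differ from accuracy (as it does in the reported results) when the classes are imbalanced or when false positives and false negatives occur at different rates.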

Article Details

How to Cite
Palaniappan, S., Logeswaran, R., Khanam, S., & Gunawardhana, P. (2025). Social Engineering Threat Analysis Using Large-Scale Synthetic Data. Journal of Informatics and Web Engineering, 4(1), 70–80. https://doi.org/10.33093/jiwe.2025.4.1.6
Section
Regular issue
