AI-Powered Threat Hunting for Email Phishing Attack Detection Using Natural Language Processing (NLP)

Mohd Azriy Akmalhazim Mohd Nazariee; S Prabha Kumaresan; Alaa Haddad; Mohamed Uvaze Ahamed Ayoobkhan

doi:10.33093/jiwe.2026.5.2.10


Published: 14 June 2026

Keywords:

Phishing, Natural Language Processing, Machine Learning, Artificial Intelligence, Application

Mohd Azriy Akmalhazim Mohd Nazariee

Multimedia University, Malaysia

S Prabha Kumaresan

Multimedia University, Malaysia

https://orcid.org/0000-0002-0969-7428

Alaa Haddad

Multimedia University, Malaysia

Mohamed Uvaze Ahamed Ayoobkhan

American University of Technology, Uzbekistan

https://orcid.org/0000-0001-9120-4516

Abstract

Phishing attacks remain a significant cybersecurity threat, aiming to steal sensitive information by exploiting human vulnerabilities. Traditional phishing email detection methods often struggle to keep pace with attackers' evolving strategies, which results in high false-positive rates and limited contextual understanding of email content. To address these challenges, this research proposes an AI-powered threat-hunting model that integrates Natural Language Processing (NLP) techniques to detect phishing emails written in English, delivered through the PhishGuard AI application. PhishGuard AI is a web-based application designed to be accessible to users both with and without technical expertise. The model leverages Word2Vec with TF-IDF weighting for feature extraction and uses an XGBoost classifier. The model's computational efficiency and effectiveness were evaluated through a comprehensive testing process using various metrics. Its robustness and generalisability were rigorously tested on two distinct datasets: CEAS_08.csv for in-distribution training and SpamAssasin.csv for out-of-distribution evaluation. The primary value of this model lies in its proactive threat-hunting capability, which distinguishes it from reactive systems that rely on known threat examples. The findings of this study aim to advance the domain of phishing email detection and to contribute to more robust cybersecurity solutions that help safeguard individuals and organisations in our country.
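The feature-extraction step named in the abstract, Word2Vec vectors with TF-IDF weighting, is commonly realised as a TF-IDF-weighted average of the word vectors in each email. The sketch below illustrates that general idea in plain Python and is not the authors' implementation. Several assumptions are made: the three-dimensional `EMBEDDINGS` table is a hypothetical stand-in for a trained Word2Vec model, the regex tokenizer is a simplification, and the smoothed IDF formula, ln((1 + N) / (1 + df)) + 1, mirrors scikit-learn's default rather than anything confirmed by the paper. The resulting fixed-length vectors are the kind of input an XGBoost classifier would be trained on; that training step is not shown.

```python
import math
import re
from collections import Counter

# Hypothetical toy embeddings standing in for a trained Word2Vec model.
# Dimension 0 loosely tracks "credential request" words, dimension 1 "routine work" words.
EMBEDDINGS = {
    "verify": [0.9, 0.1, 0.0],
    "account": [0.8, 0.2, 0.1],
    "password": [0.9, 0.0, 0.2],
    "meeting": [0.1, 0.9, 0.0],
    "agenda": [0.0, 0.8, 0.1],
}
DIM = 3


def tokenize(text):
    """Lowercase and keep alphanumeric runs (a simplifying assumption)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def smoothed_idf(corpus_tokens):
    """Per-term IDF: ln((1 + N) / (1 + df)) + 1, assumed here as the weighting scheme."""
    n_docs = len(corpus_tokens)
    df = Counter()
    for tokens in corpus_tokens:
        df.update(set(tokens))  # count each term once per document
    return {term: math.log((1 + n_docs) / (1 + count)) + 1 for term, count in df.items()}


def weighted_doc_vector(tokens, idf, embeddings, dim):
    """TF-IDF-weighted average of word vectors; zero vector if no token is embeddable."""
    counts = Counter(tokens)
    total = len(tokens)
    acc = [0.0] * dim
    weight_sum = 0.0
    for term, count in counts.items():
        if term not in embeddings or term not in idf:
            continue  # skip out-of-vocabulary terms
        weight = (count / total) * idf[term]  # term frequency times IDF
        for i, value in enumerate(embeddings[term]):
            acc[i] += weight * value
        weight_sum += weight
    if weight_sum == 0.0:
        return acc
    return [value / weight_sum for value in acc]


# Illustrative corpus (invented examples, not drawn from CEAS_08.csv or SpamAssasin.csv).
emails = [
    "Verify your account password now",
    "Meeting agenda for Monday",
    "Please verify the meeting agenda",
]
corpus_tokens = [tokenize(e) for e in emails]
idf = smoothed_idf(corpus_tokens)
vectors = [weighted_doc_vector(t, idf, EMBEDDINGS, DIM) for t in corpus_tokens]

for email, vec in zip(emails, vectors):
    print(email, [round(x, 3) for x in vec])
```

Averaging with TF-IDF weights lets rarer, more distinctive words (such as "password" above) pull the email vector further than common ones, while still producing one vector per email regardless of its length.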

How to Cite

Mohd Nazariee, M. A. A., Kumaresan, S. P., Haddad, A., & Ahamed Ayoobkhan, M. U. (2026). AI-Powered Threat Hunting for Email Phishing Attack Detection Using Natural Language Processing (NLP). Journal of Informatics and Web Engineering, 5(2), 160–183. https://doi.org/10.33093/jiwe.2026.5.2.10

Issue

Vol. 5 No. 2 (2026): June 2026

Section

Regular issue

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

All articles published in JIWE are licensed under the CC BY-NC-ND 4.0 License. Readers are allowed to:

Share — copy and redistribute the material in any medium or format under the following conditions:
Attribution — You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use;
NonCommercial — You may not use the material for commercial purposes;
NoDerivatives — If you remix, transform, or build upon the material, you may not distribute the modified material.

