AI-Powered Threat Hunting for Email Phishing Attack Detection Using Natural Language Processing (NLP)

Mohd Azriy Akmalhazim Mohd Nazariee
S Prabha Kumaresan
Alaa Haddad
Mohamed Uvaze Ahamed Ayoobkhan

Abstract

Phishing attacks remain a significant cybersecurity threat, aiming to steal sensitive information by exploiting human vulnerability. Traditional phishing email detection often struggles to keep up with the latest attack strategies developed by attackers, which results in high false positive rates and limited contextual understanding of email contents. To address these challenges, this research proposes an AI-powered threat-hunting model that integrates Natural Language Processing (NLP) techniques for detecting phishing emails in English, delivered through the PhishGuard AI application. The application is a web-based software solution designed to be accessible to users with and without technical expertise. The model uses Word2Vec with TF-IDF weighting for feature extraction and an XGBoost classifier for classification. A comprehensive testing process using various metrics evaluates the computational efficiency and effectiveness of the model. The model's robustness and generalisability were rigorously tested using two distinct datasets: CEAS_08.csv for in-distribution training and SpamAssasin.csv for out-of-distribution evaluation. The primary value of this model lies in its proactive threat-hunting capability, which distinguishes it from reactive systems that rely on known threat examples. The findings of the study aim to contribute to the domain of phishing email detection and to the development of a more robust cybersecurity solution that can help safeguard both individuals and organisations in our country.
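
The feature-extraction idea named in the abstract, Word2Vec vectors combined with TF-IDF weighting, can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a tiny hypothetical embedding table (TOY_EMBEDDINGS) in place of a trained Word2Vec model, a commonly used smoothed IDF formula, and whitespace tokenization. The function names are illustrative. The classification step is omitted: in a full pipeline, the resulting document vectors would be passed to a classifier such as XGBoost.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization stands in for real preprocessing.
    return text.lower().split()


def idf_weights(docs):
    """Smoothed inverse document frequency for every token seen in docs."""
    n = len(docs)
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(tokenize(doc)))
    return {tok: math.log((1 + n) / (1 + df)) + 1.0 for tok, df in doc_freq.items()}


def weighted_doc_vector(text, embeddings, idf, dim):
    """TF-IDF-weighted average of the word vectors of the tokens in text.

    Tokens without an embedding or an IDF weight are skipped. If no token
    qualifies, a zero vector is returned.
    """
    term_freq = Counter(tokenize(text))
    total = [0.0] * dim
    weight_sum = 0.0
    for tok, count in term_freq.items():
        if tok not in embeddings or tok not in idf:
            continue
        weight = count * idf[tok]
        for i in range(dim):
            total[i] += weight * embeddings[tok][i]
        weight_sum += weight
    if weight_sum == 0.0:
        return total
    return [x / weight_sum for x in total]


# Hypothetical 2-dimensional embeddings; a trained Word2Vec model would supply these.
TOY_EMBEDDINGS = {
    "verify": [1.0, 0.0],
    "account": [0.8, 0.2],
    "meeting": [0.0, 1.0],
    "your": [0.5, 0.5],
}

# Hypothetical three-document corpus used only to compute IDF weights.
corpus = ["verify your account", "your meeting notes", "your account"]

idf = idf_weights(corpus)
vec = weighted_doc_vector("verify your account", TOY_EMBEDDINGS, idf, dim=2)
print(vec)
```

In this toy corpus, "your" appears in every document and therefore receives the lowest IDF weight, so the rarer tokens "verify" and "account" dominate the document vector.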

Article Details

How to Cite
Mohd Nazariee, M. A. A., Kumaresan, S. P., Haddad, A., & Ahamed Ayoobkhan, M. U. (2026). AI-Powered Threat Hunting for Email Phishing Attack Detection Using Natural Language Processing (NLP). Journal of Informatics and Web Engineering, 5(2), 160–183. https://doi.org/10.33093/jiwe.2026.5.2.10
Section
Regular issue
