Optimising Phishing Detection: A Comparative Analysis of Machine Learning Methods with Feature Selection
Abstract
Phishing is a type of cyberattack that tricks people into sharing sensitive data. Because current security technologies often fail to stop it, researchers have recently paid increasing attention to machine learning methods for phishing detection. This study investigates how effective machine learning techniques are for phishing detection when they are combined with feature reduction. Specifically, Random Forest (RF) and Artificial Neural Network (ANN) models are each integrated with two techniques: Principal Component Analysis (PCA), which projects the original features onto a smaller set of principal components, and Recursive Feature Elimination (RFE), which repeatedly discards the least important features. The goal was to identify the combination that achieves the highest classification accuracy. The models were evaluated on a dataset of 4,898 phishing sites and 6,157 legitimate sites, with the phishing data sourced from Kaggle.com. In our experiments, RF with PCA achieved 95.83% accuracy and ANN with PCA reached 95.07% accuracy. Incorporating PCA and RFE not only improved the models' predictive performance but also improved computational efficiency and helped reduce overfitting. The experimental results also show that the proposed ANN with PCA method outperforms the state-of-the-art methods it was compared against. This research therefore highlights the potential of combining feature selection techniques with machine learning algorithms to build robust phishing detection solutions, which can contribute to a safer internet environment.
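The four model and feature-reduction combinations described in the abstract can be sketched with scikit-learn pipelines. This is a minimal illustration under stated assumptions, not the authors' implementation: the synthetic data from `make_classification` stands in for the Kaggle dataset (30 features is an assumption; only the class ratio comes from the abstract), and the helper names (`make_data`, `build_pipelines`, `evaluate`), the hidden-layer sizes, the 95% explained-variance threshold for PCA and the 15 features kept by RFE are all hypothetical choices. Because an MLP exposes neither `coef_` nor `feature_importances_`, the sketch assumes RFE ranks features with a Random Forest even when the final classifier is the ANN.

```python
# Illustrative sketch only, not the authors' code. The data, hyperparameters
# and pipeline layout are assumptions chosen for a quick, reproducible demo.
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Class balance reported in the abstract: 6,157 legitimate vs 4,898 phishing.
LEGIT_FRACTION = 6157 / (6157 + 4898)


def make_data(n_samples=2000, seed=0):
    """Synthetic stand-in for the Kaggle phishing dataset (30 features assumed)."""
    X, y = make_classification(
        n_samples=n_samples,
        n_features=30,
        n_informative=12,
        weights=[LEGIT_FRACTION],  # class 0 = legitimate, class 1 = phishing
        random_state=seed,
    )
    return train_test_split(X, y, test_size=0.2, stratify=y, random_state=seed)


def random_forest(seed):
    return RandomForestClassifier(n_estimators=100, random_state=seed)


def neural_net(seed):
    # A small multilayer perceptron standing in for the paper's ANN (assumed sizes).
    return MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=500, random_state=seed)


def pca():
    # Keep as many principal components as needed to explain 95% of the variance.
    return PCA(n_components=0.95, svd_solver="full")


def rfe(seed):
    # RFE needs coef_ or feature_importances_, which an MLP lacks,
    # so a Random Forest ranks the features and 15 are kept (assumption).
    return RFE(random_forest(seed), n_features_to_select=15, step=3)


def build_pipelines(seed=0):
    """One pipeline per classifier / feature-reduction combination."""
    return {
        "RF + PCA": Pipeline([("scale", StandardScaler()), ("pca", pca()),
                              ("clf", random_forest(seed))]),
        "RF + RFE": Pipeline([("rfe", rfe(seed)), ("clf", random_forest(seed))]),
        "ANN + PCA": Pipeline([("scale", StandardScaler()), ("pca", pca()),
                               ("clf", neural_net(seed))]),
        "ANN + RFE": Pipeline([("rfe", rfe(seed)), ("scale", StandardScaler()),
                               ("clf", neural_net(seed))]),
    }


def evaluate(n_samples=2000, seed=0):
    """Fit each pipeline on the training split and return test-set accuracy."""
    X_train, X_test, y_train, y_test = make_data(n_samples, seed)
    results = {}
    for name, pipeline in build_pipelines(seed).items():
        pipeline.fit(X_train, y_train)
        results[name] = accuracy_score(y_test, pipeline.predict(X_test))
    return results


if __name__ == "__main__":
    for name, accuracy in evaluate().items():
        print(f"{name}: {accuracy:.4f}")
```

Running the script prints one test accuracy per pipeline. Because the data are synthetic, the numbers will not reproduce the 95.83% and 95.07% reported above. Features are standardised before PCA because PCA is driven by variance and is therefore sensitive to feature scale; Random Forests do not need scaling, so the RF + RFE pipeline omits it.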
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0) License, as are all articles published in JIWE. Readers are allowed to:
- Share — copy and redistribute the material in any medium or format under the following conditions:
- Attribution — You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use;
- NonCommercial — You may not use the material for commercial purposes;
- NoDerivatives — If you remix, transform, or build upon the material, you may not distribute the modified material.