Optimising Phishing Detection: A Comparative Analysis of Machine Learning Methods with Feature Selection


Mohamad Asraf Daniel
Siew-Chin Chong
Lee-Ying Chong
Kuok-Kwee Wee

Abstract

Phishing is a cyberattack that tricks people into sharing sensitive data. Because current security technologies are not effective enough against it, researchers have recently paid much attention to machine learning methods for phishing detection. This study investigates the effectiveness of machine learning techniques combined with feature selection techniques for phishing detection. Specifically, Random Forest (RF) and Artificial Neural Network (ANN) models are integrated with two feature selection techniques, Principal Component Analysis (PCA) and Recursive Feature Elimination (RFE). The goal is to identify the combination that classifies sites with the highest accuracy. The experiments were evaluated using a dataset of 4,898 phishing sites and 6,157 legitimate sites, with the phishing data sourced from Kaggle.com. The experiments demonstrate that the RF model with PCA achieved 95.83% accuracy, while the ANN model with PCA reached 95.07% accuracy. Incorporating PCA and RFE not only improved the models' predictive performance but also improved computational efficiency, and it can reduce overfitting. The experimental results also demonstrate that the proposed ANN with PCA method outperforms the state-of-the-art methods. Consequently, this research highlights the potential of combining advanced feature selection techniques with machine learning algorithms to develop robust solutions for phishing detection, which in turn contributes to a safer internet environment.
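The four model and feature-selection combinations named in the abstract can be sketched with scikit-learn as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the data is a synthetic stand-in generated with make_classification, and the feature count (30), the number of retained components or features (15), the train/test split, and all hyperparameters are hypothetical choices that the abstract does not specify. Accuracies from this sketch are therefore not comparable to the reported figures.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the phishing dataset (assumption: 30 numeric features).
X, y = make_classification(
    n_samples=1000, n_features=30, n_informative=10, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)


def make_rf():
    # Random Forest classifier; the tree count is an illustrative choice.
    return RandomForestClassifier(n_estimators=100, random_state=0)


def make_ann():
    # A small multilayer perceptron standing in for the ANN.
    return MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0)


def make_rfe():
    # RFE ranks features with a helper estimator that exposes importances;
    # here it drops 5 features per round until 15 remain.
    ranker = RandomForestClassifier(n_estimators=50, random_state=0)
    return RFE(ranker, n_features_to_select=15, step=5)


# One pipeline per combination; scaling precedes PCA and the ANN.
models = {
    "RF + PCA": make_pipeline(StandardScaler(), PCA(n_components=15), make_rf()),
    "RF + RFE": make_pipeline(make_rfe(), make_rf()),
    "ANN + PCA": make_pipeline(StandardScaler(), PCA(n_components=15), make_ann()),
    "ANN + RFE": make_pipeline(StandardScaler(), make_rfe(), make_ann()),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)  # accuracy on the held-out split
    print(f"{name}: {scores[name]:.4f}")
```

Wrapping each combination in a pipeline keeps the scaler, PCA, and RFE fitted on the training split only, so the held-out accuracy is not inflated by information from the test data.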

Article Details

How to Cite
Mohamad Asraf Daniel, Chong, S.-C., Chong, L.-Y., & Wee, K.-K. (2025). Optimising Phishing Detection: A Comparative Analysis of Machine Learning Methods with Feature Selection. Journal of Informatics and Web Engineering, 4(1), 200–212. https://doi.org/10.33093/jiwe.2025.4.1.15
Section
Regular issue
