Ensemble Learning-Powered URL Phishing Detection: A Performance Driven Approach

Shougfta Mushtaq
Tabassum Javed
Mazliham Mohd Su’ud

Abstract

With the rapid growth in Internet usage, criminals have found new ways to carry out cyber-attacks. The most common and widespread of these attacks is URL phishing. The proposed system focuses on improving phishing website detection using feature selection and ensemble learning. The model uses two datasets, DS-30 and DS-50, with 30 and 50 features, respectively. Ensemble learning with a voting classifier is then applied to train the model, achieving higher accuracy. The combination of hybrid ensemble feature selection (HEFS) with a random forest classifier achieved 94.6% accuracy while using only 20.8% of the base feature set. The voting classifier in the proposed model achieves accuracies of 96% and 98% on the DS-30 and DS-50 datasets, respectively. The hybrid model combines different factors to distinguish phishing websites from legitimate websites.
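
As a rough illustration of the voting idea described in the abstract, the following minimal Python sketch applies hard (majority) voting to a single URL. It is not the authors' implementation: the feature names, the thresholds, and the three rule-based "classifiers" are hypothetical stand-ins for trained models, and the actual DS-30 and DS-50 feature sets are not listed here.

```python
from collections import Counter
from urllib.parse import urlparse


def extract_features(url: str) -> dict:
    """Compute a few hypothetical lexical features from a URL."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    return {
        "url_length": len(url),
        "num_dots": host.count("."),
        "has_at": "@" in url,
        "uses_https": parsed.scheme == "https",
    }


# Toy base classifiers (assumed for illustration only).
# Each returns 1 for "phishing" and 0 for "legitimate".
def long_url_rule(f: dict) -> int:
    return int(f["url_length"] > 75)


def many_subdomains_rule(f: dict) -> int:
    return int(f["num_dots"] > 3)


def at_or_plain_http_rule(f: dict) -> int:
    return int(f["has_at"] or not f["uses_https"])


BASE_CLASSIFIERS = [long_url_rule, many_subdomains_rule, at_or_plain_http_rule]


def hard_vote(url: str) -> int:
    """Return the label predicted by the majority of base classifiers."""
    features = extract_features(url)
    votes = [clf(features) for clf in BASE_CLASSIFIERS]
    # With three binary voters there is always a strict majority.
    return Counter(votes).most_common(1)[0][0]
```

Calling `hard_vote(url)` returns 1 when at least two of the three rules flag the URL. In a real system the rules would be replaced by trained models fitted on the selected features, and soft voting over predicted probabilities is a common alternative to the majority rule shown here.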

Article Details

How to Cite
Mushtaq, S., Javed, T., & Mohd Su’ud, M. (2024). Ensemble Learning-Powered URL Phishing Detection: A Performance Driven Approach. Journal of Informatics and Web Engineering, 3(2), 134–145. https://doi.org/10.33093/jiwe.2024.3.2.10
Section
Regular issue
