Author: Anietie Ekong
Date of Publication: 31st May 2023
Abstract: Phishing has remained a major concern for security specialists over the years. Phishing attacks aim to trick people into giving out sensitive or confidential information using social engineering methods. So far, machine learning (ML) algorithms such as Artificial Neural Networks (ANN), Decision Trees (DT), Support Vector Machines (SVM), and Logistic Regression (LR) have offered the most effective means of classifying scam Uniform Resource Locators (URLs). The main focus of this work is to classify URLs or sites into two classes: legitimate (0) or phishing (1). A total of 8,391 legitimate URLs and 7,727 phishing URLs were sourced from PhishTank. After data preprocessing, the dataset was split into a training set and a testing set in the ratio 8:2. The LR and SVM algorithms were fitted to the training set. The performance of the SVM and LR algorithms was tested using the test set, and their outputs were used as input to a stacking model in order to improve classification accuracy. The model was trained and tested using the Python programming language, the Jupyter Notebook IDE, and external Python libraries. Classification accuracies of 90% and 95% were recorded for LR and SVM, respectively. The hybrid (stacked) model achieved an accuracy of 97%. Based on these metrics, the stacked model can be used to detect scam URLs with high accuracy.
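The pipeline described in the abstract (an 8:2 split, LR and SVM base classifiers, and a stacking model built on their outputs) can be sketched roughly as follows. This is a minimal illustration and not the author's implementation. It assumes scikit-learn as the external library, uses a synthetic feature matrix from `make_classification` as a stand-in for the preprocessed PhishTank data, and assumes a logistic regression meta-learner, feature scaling, an RBF kernel, and fixed random seeds, none of which are stated in the abstract. Note also that scikit-learn's `StackingClassifier` trains the meta-learner on cross-validated predictions from the training set, which may differ from the procedure used in the paper.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the preprocessed URL features (assumption):
# label 0 = legitimate, label 1 = phishing.
X, y = make_classification(
    n_samples=1000, n_features=16, n_informative=8, random_state=42
)

# 8:2 train/test split, as described in the abstract.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Base learners: LR and SVM. Scaling and the RBF kernel are assumptions.
base_models = [
    ("lr", make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))),
    ("svm", make_pipeline(StandardScaler(), SVC(kernel="rbf", random_state=42))),
]

# Stacking model: the base learners' outputs feed a meta-learner
# (logistic regression here, chosen for illustration only).
stack = StackingClassifier(
    estimators=base_models, final_estimator=LogisticRegression(), cv=5
)

# Fit each model on the training set and score it on the held-out test set.
scores = {}
for name, model in base_models + [("stack", stack)]:
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))
    print(f"{name}: accuracy = {scores[name]:.3f}")
```

On synthetic data the printed accuracies will not match the 90%, 95%, and 97% reported above; the sketch only shows how the three models relate to one another.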
Reference :