Classifying Cybersecurity Threats in URLs Using Decision Tree and Naive Bayes Algorithms: A Data Mining Approach for Phishing, Defacement, and Benign Threat Detection

Authors

  • Deshinta Arrova Dewi
  • Tri Basuki Kurniawan, Faculty of Science and Technology, Universitas Bina Darma, Indonesia

Keywords:

URL Classification, Phishing Detection, Decision Tree, Naive Bayes, Cybersecurity Threat Detection

Abstract

This research applies data mining techniques to classify URLs into multiple cybersecurity threat categories, including phishing, defacement, and benign URLs. Accurate URL classification is crucial in the current digital landscape, where cyber threats are increasing in both frequency and complexity. The study employs two popular machine learning algorithms, Decision Tree and Multinomial Naive Bayes, to classify URLs based on their textual content. The URLs were transformed using Term Frequency-Inverse Document Frequency (TF-IDF) vectorization, allowing the models to learn distinctive patterns within the URL strings that signify different threat types. The dataset comprises 24,800 labeled URLs, representing a realistic mix of common and rare cyber threat categories. Both models demonstrated strong classification performance, with the Decision Tree achieving an accuracy of 94.01% and Naive Bayes reaching 92.36%. While both classifiers performed well on the dominant categories, such as phishing and benign URLs, they struggled to detect less frequent classes accurately because of class imbalance. The Decision Tree model handled this imbalance slightly better and provided interpretability through feature importance analysis, which highlights the URL tokens that most influence classification decisions. Naive Bayes, although efficient and effective for the majority classes, exhibited lower recall for minority classes. The results indicate that machine learning models can effectively support automated threat detection systems by classifying URLs with high accuracy, thereby enhancing cybersecurity defenses. Future work may explore advanced modeling techniques, such as ensemble methods or deep learning, alongside improved feature engineering and data augmentation to address class imbalance and improve detection of rare threats. Additionally, incorporating multi-source data could further strengthen threat classification. Overall, this research contributes insights into URL-based cyber threat classification using accessible and interpretable machine learning approaches, supporting the development of proactive and scalable cybersecurity solutions.
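
To make the TF-IDF and Multinomial Naive Bayes pipeline described in the abstract more concrete, the following is a minimal from-scratch sketch in Python, using only the standard library. It is not the authors' implementation, which the abstract does not specify. The tokenization rule, the smoothed IDF formula, the Laplace smoothing value, all function and class names, and the six training URLs are assumptions chosen for illustration; the Decision Tree model is not covered here.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(url):
    """Split a URL into lowercase alphanumeric tokens (assumed tokenization)."""
    return [t for t in re.split(r"[^a-z0-9]+", url.lower()) if t]


def fit_idf(docs):
    """Smoothed inverse document frequency for every token in the corpus."""
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(doc))
    return {tok: math.log((1 + n) / (1 + c)) + 1.0 for tok, c in df.items()}


def tfidf(doc, idf):
    """TF-IDF weights for one tokenized URL; unseen tokens are dropped."""
    tf = Counter(t for t in doc if t in idf)
    return {t: c * idf[t] for t, c in tf.items()}


class TinyMultinomialNB:
    """Multinomial Naive Bayes over TF-IDF weights with Laplace smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, vectors, labels):
        vocab = sorted({t for v in vectors for t in v})
        class_counts = Counter(labels)
        self.log_prior = {
            c: math.log(k / len(labels)) for c, k in class_counts.items()
        }
        # Accumulate the total TF-IDF weight of each token per class.
        weight = defaultdict(Counter)
        for vec, label in zip(vectors, labels):
            for tok, w in vec.items():
                weight[label][tok] += w
        self.log_like = {}
        for c in class_counts:
            denom = sum(weight[c].values()) + self.alpha * len(vocab)
            self.log_like[c] = {
                t: math.log((weight[c][t] + self.alpha) / denom) for t in vocab
            }
        return self

    def predict(self, vector):
        scores = {}
        for c, prior in self.log_prior.items():
            scores[c] = prior + sum(
                w * self.log_like[c][t]
                for t, w in vector.items()
                if t in self.log_like[c]
            )
        return max(scores, key=scores.get)


# Hypothetical toy data; the study itself uses 24,800 labeled URLs.
train = [
    ("http://example.com/docs/index.html", "benign"),
    ("https://example.org/about/team", "benign"),
    ("http://secure-login.example.net/verify/account", "phishing"),
    ("http://example.net/login/update/account/verify", "phishing"),
    ("http://example.com/index.php?option=com_content&view=article", "defacement"),
    ("http://example.org/index.php?option=com_user&view=hacked", "defacement"),
]

docs = [tokenize(url) for url, _ in train]
idf = fit_idf(docs)
model = TinyMultinomialNB().fit(
    [tfidf(d, idf) for d in docs], [label for _, label in train]
)


def classify(url):
    """Vectorize a URL with the fitted IDF table and predict its class."""
    return model.predict(tfidf(tokenize(url), idf))


if __name__ == "__main__":
    for url in (
        "http://example.net/account/verify/login",
        "http://example.org/index.php?option=com_content&view=hacked",
    ):
        print(url, "->", classify(url))
```

On a dataset of realistic size, a library implementation would be the usual choice; scikit-learn, for example, provides `TfidfVectorizer`, `MultinomialNB`, and `DecisionTreeClassifier`. With imbalanced classes such as those the abstract describes, per-class recall is more informative than overall accuracy when evaluating a sketch like this.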

Published

2025-06-03

How to Cite

Dewi, D. A., & Kurniawan, T. B. (2025). Classifying Cybersecurity Threats in URLs Using Decision Tree and Naive Bayes Algorithms: A Data Mining Approach for Phishing, Defacement, and Benign Threat Detection. Journal of Cyber Law, 1(2), 175–189. Retrieved from https://jcl.mbicore.com/index.php/JCL/article/view/10