Journal of Cyber Law

Predicting Fraud Cases in E-Commerce Transactions Using Random Forest Regression: A Data Mining Approach for Enhancing Cybersecurity and Transaction Integrity

Yusuf Durachman, Abdul Wahab Bin Abdul Rahman — Tue, 03 Jun 2025 00:00:00 +0700

Fraudulent activities in e-commerce pose significant risks to businesses and consumers alike, resulting in financial losses and eroding trust in online transactions. This study aims to address this issue by developing a predictive model for fraud cases using Random Forest Regression, a robust machine learning technique known for handling nonlinear relationships and high-dimensional data. The dataset comprises daily transaction metrics such as fraud cases, transaction errors per million, transparency rating, security incidents, cyber attacks, audit compliance scores, transaction speeds, and customer trust indices, collected over multiple years. The methodology involves extensive data preprocessing, including temporal feature extraction from date information, and exploratory data analysis to identify key relationships among features. Correlation analysis revealed that transaction errors per million and security incidents are highly correlated with fraud cases, serving as important predictors. The dataset was split into training and testing sets, with the Random Forest model trained on 80% of the data and evaluated on the remaining 20%. Results indicate that the Random Forest model predicts fraud cases with high accuracy, achieving an R-squared score of 0.9832 and low error metrics (MAE of 21.07 and RMSE of 26.26). Feature importance analysis identified transaction errors per million as the most influential variable, confirming its critical role in fraud detection. Despite these promising results, limitations such as potential data imbalance and model interpretability challenges remain and warrant further research. This research contributes to the growing body of knowledge applying machine learning to cybersecurity and fraud detection, demonstrating practical applicability for improving e-commerce transaction security. 
The findings also have implications for cyberlaw, suggesting that advanced predictive tools can enhance regulatory enforcement and help develop more secure online commerce environments. Future work will explore incorporating additional features and alternative algorithms to further improve model robustness and transparency.
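The preprocessing steps named in this abstract (temporal feature extraction from dates and an 80/20 train-test split) can be illustrated with a minimal standard-library sketch. The function names `temporal_features` and `train_test_split`, the chosen calendar features, the random shuffle, and the ten toy dates are all assumptions for illustration; the abstract does not say which features were derived or how the split was drawn.

```python
from datetime import date
import random


def temporal_features(d: date) -> dict:
    """Derive simple calendar features from a date (hypothetical feature set)."""
    return {
        "year": d.year,
        "month": d.month,
        "day": d.day,
        "day_of_week": d.weekday(),           # Monday = 0, Sunday = 6
        "is_weekend": int(d.weekday() >= 5),  # 1 for Saturday or Sunday
    }


def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of rows and hold out test_fraction of them for evaluation."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Toy input: ten consecutive days, not the study's data.
rows = [temporal_features(date(2024, 1, day)) for day in range(1, 11)]
train, test = train_test_split(rows)
print(len(train), len(test))  # 8 2
```

For daily time-series data, a chronological split (training on earlier dates, testing on later ones) is a common alternative to a random shuffle; which of the two the study used is not stated.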

Detecting Threatening Content in Social Media: A Data Mining Approach Using Random Forest for Classification of Tweets in Cyberlaw Context

Husni Teja Sukmana, Lee Kyung Oh — Tue, 03 Jun 2025 00:00:00 +0700

The rapid growth of social media platforms has increased the prevalence of threatening and harmful content, raising significant challenges for online safety and legal enforcement. This study explores the application of data mining techniques, specifically the Random Forest algorithm, to detect threatening tweets based on numerical metadata features such as user follower count, retweet and favorite counts, hashtag usage, mentions, and emoticon presence. Using a dataset of 1,000 tweets with balanced classes of threatening and non-threatening posts, the research implements a structured workflow that includes exploratory data analysis, preprocessing, model training, and evaluation. The Random Forest classifier performed close to chance level for a balanced two-class dataset, with an accuracy of approximately 50.5%, precision and recall near 51%, and an F1-score of 51.2%. Feature importance analysis indicated that user engagement metrics, particularly user followers, favorite count, and retweet count, were the most influential features in the model. The results highlight limitations due to the absence of direct textual analysis and the inherent challenges of predicting threats solely from metadata. This research contributes to the Cyberlaw domain by demonstrating how machine learning can aid legal frameworks in automating the detection of online threats, potentially improving efficiency in monitoring social media for harmful content. However, the study emphasizes the necessity for combining metadata-driven models with natural language processing and human oversight to ensure balanced, accurate, and legally sound interventions. Future work should focus on expanding datasets, integrating textual features, and exploring advanced algorithms to enhance detection accuracy. 
Overall, this study provides foundational evidence for the role of data mining in supporting Cyberlaw enforcement, underscoring the importance of technological innovation in addressing the complex issues of online harassment and threats in the digital age.
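To make the reported classification metrics concrete, the sketch below computes accuracy, precision, recall, and F1 from predicted and true binary labels using only the standard library. The helper name `binary_metrics` and the eight toy labels are hypothetical and are not taken from the study. The toy predictions are right exactly half the time on a balanced sample, which is the kind of result that figures near 50% describe.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for labels coded as 0/1 (1 = threatening)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy balanced sample: four threatening (1) and four non-threatening (0) tweets.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 1, 0, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

On balanced classes, a classifier that always predicts one class also reaches 50% accuracy, so this value is a natural reference point when reading the reported scores.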

Predicting Cyber Attack Types Using XGBoost: A Data Mining Approach to Enhance Legal Frameworks for Cybersecurity

I Gede Agus Krisna Warmayana, Yuichiro Yamashita, Nobuto Oka — Tue, 03 Jun 2025 00:00:00 +0700

Cybersecurity threats continue to evolve rapidly, posing significant risks to organizations and challenging existing legal frameworks. This study explores the application of machine learning, specifically the XGBoost algorithm, to predict types of cyber attacks using a comprehensive dataset of cybersecurity incidents. The dataset includes organizational attributes, attack characteristics, and mitigation responses, which were preprocessed through feature scaling and encoding to support model training. Initial exploratory data analysis revealed class imbalances and variability in feature distributions, highlighting the complexity of the prediction task. The XGBoost model was trained and evaluated on an 80:20 train-test split, achieving an overall accuracy of 22.5% in multi-class classification of five common cyber attack types: Phishing, SQL Injection, DDoS, Ransomware, and Zero-Day Exploit. While the model’s predictive performance was modest, only slightly above the 20% expected from uniform random guessing across five classes, feature importance analysis identified critical predictors such as geographical location, mitigation steps, and compliance standards, providing valuable interpretability. These findings underscore the potential for machine learning to support cybersecurity law enforcement by offering data-driven insights into attack patterns and organizational vulnerabilities. The ability to classify attack types can assist legal authorities and policymakers in developing targeted regulatory measures and prioritizing enforcement actions. Furthermore, the transparent nature of XGBoost’s feature contributions facilitates accountability in legal contexts where automated decision-making tools are increasingly employed. However, limitations such as data imbalance and missing values affected model accuracy, suggesting the need for enhanced data collection and advanced modeling techniques in future research. Expanding datasets, incorporating real-time threat intelligence, and leveraging ensemble or hybrid algorithms may improve prediction capabilities. 
This study contributes to the growing intersection of data mining and cyber law by demonstrating how machine learning models can enhance legal frameworks and cybersecurity strategies. The integration of predictive analytics into cyber law enforcement holds promise for strengthening defenses against increasingly sophisticated cyber threats.
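A multi-class accuracy figure is easier to interpret next to simple baselines. The sketch below computes two of them from a list of labels: uniform random guessing (one over the number of classes) and always predicting the most frequent class. The helper name `baseline_accuracies` and the class counts are hypothetical; the abstract reports class imbalance but not the actual distribution.

```python
from collections import Counter


def baseline_accuracies(labels):
    """Return (uniform-guess accuracy, majority-class accuracy) for a label list."""
    counts = Counter(labels)
    uniform = 1 / len(counts)                      # guess each class equally often
    majority = max(counts.values()) / len(labels)  # always predict the top class
    return uniform, majority


# Hypothetical imbalanced label distribution over the five attack types.
labels = (["Phishing"] * 30 + ["SQL Injection"] * 20 + ["DDoS"] * 20
          + ["Ransomware"] * 15 + ["Zero-Day Exploit"] * 15)

uniform, majority = baseline_accuracies(labels)
print(f"uniform guess: {uniform:.1%}, majority class: {majority:.1%}")
```

With five classes the uniform baseline is 20% regardless of the data, while the majority-class baseline depends on how skewed the labels are. Under the hypothetical counts above it would be 30%.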

Financial Loss Estimation in Cybersecurity Incidents: A Data Mining Approach Using Decision Tree and Linear Regression Models

Ika Maulita, B Herawan Hayadi — Tue, 03 Jun 2025 00:00:00 +0700

This study explores the application of data mining techniques to predict financial losses resulting from cybersecurity incidents. Using a dataset of 3,000 reported cyberattacks from 2015 to 2024, the research analyzes both numerical and categorical factors, including the number of affected users, incident resolution time, attack type, vulnerability exploited, and defense mechanisms employed. Through exploratory data analysis and preprocessing, the study prepares the data for modeling using Linear Regression, Decision Tree, and Random Forest regressors. Among these, Random Forest offers reliable feature importance insights, revealing that the number of affected users, resolution time, and specific attack characteristics are the most influential predictors of financial loss. Model evaluation shows that the Linear Regression and Random Forest models achieve comparable results, with mean absolute errors around 24.7 million dollars and R-squared values close to zero, indicating that the models explain almost none of the variance in financial loss, which reflects the complexity of cyber incidents. Decision Tree regression underperforms, likely due to overfitting. Visualizations comparing predicted and actual losses support these findings, highlighting areas for improvement in handling extreme loss values. The results underscore the multifaceted nature of cybersecurity risk, where both quantitative impacts and qualitative attack attributes must be considered. This research has practical implications for cybersecurity risk management and policy formulation. By identifying key drivers of financial loss, organizations can prioritize mitigation efforts on the most damaging attack types and vulnerabilities. The study also emphasizes the importance of rapid incident response to minimize financial damage. 
For policymakers, the findings provide data-driven evidence to guide the development of more effective cybersecurity regulations and compliance standards. Future work should extend this analysis by incorporating additional data sources and advanced machine learning techniques to enhance prediction accuracy and support proactive defense strategies. Overall, this study contributes to bridging the gap between cybersecurity data analysis and practical financial risk reduction.
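The two regression metrics quoted in this abstract can be written out in a few lines. The sketch below defines mean absolute error and R-squared in plain Python and applies them to a predictor that always outputs the mean of the observed losses, which yields an R-squared of exactly zero by definition. The function names and the five toy loss values (in millions of dollars) are hypothetical and are not drawn from the study's dataset.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between observed and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """1 - SS_res / SS_tot: share of variance explained by the predictions."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Toy losses in millions of dollars, with one extreme value.
losses = [10, 20, 30, 40, 100]
mean_loss = sum(losses) / len(losses)

# Baseline: predict the mean loss for every incident.
baseline_pred = [mean_loss] * len(losses)

print(mean_absolute_error(losses, baseline_pred))  # 24.0
print(r_squared(losses, baseline_pred))            # 0.0
```

An R-squared near zero therefore means a model does about as well as this constant baseline, and a negative value means it does worse.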

Classifying Cybersecurity Threats in URLs Using Decision Tree and Naive Bayes Algorithms: A Data Mining Approach for Phishing, Defacement, and Benign Threat Detection

Deshinta Arrova Dewi, Tri Basuki Kurniawan — Tue, 03 Jun 2025 00:00:00 +0700

This research focuses on the application of data mining techniques to classify URLs into multiple cybersecurity threat categories, including phishing, defacement, and benign URLs. Accurate classification of URLs is crucial in the current digital landscape, where cyber threats are increasing in both frequency and complexity. This study employs two popular machine learning algorithms, Decision Tree and Multinomial Naive Bayes, to analyze and classify URL data based on their textual content. The URLs were transformed using Term Frequency-Inverse Document Frequency (TF-IDF) vectorization, allowing the models to learn distinctive patterns within the URL strings that signify different threat types. The dataset used comprises 24,800 labeled URLs, representing a realistic mix of common and rare cyber threat categories. Both models demonstrated strong classification performance, with the Decision Tree achieving an accuracy of 94.01% and Naive Bayes reaching 92.36%. While both classifiers performed well on the dominant categories such as phishing and benign URLs, challenges remained in accurately detecting less frequent classes due to class imbalance. The Decision Tree model showed a slightly better ability to handle these imbalances and provided interpretability through feature importance analysis, highlighting key URL tokens influencing classification decisions. Naive Bayes, although efficient and effective for the majority classes, exhibited lower recall for minority classes. The results indicate that machine learning models can effectively support automated threat detection systems by classifying URLs with high accuracy, thereby enhancing cybersecurity defenses. Future work may explore advanced modeling techniques, such as ensemble methods or deep learning, alongside improved feature engineering and data augmentation to address class imbalance and improve detection of rare threats. 
Additionally, incorporating multi-source data could further strengthen threat classification. Overall, this research contributes valuable insights into URL-based cyber threat classification using accessible and interpretable machine learning approaches, supporting the development of proactive and scalable cybersecurity solutions.
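As an illustration of the TF-IDF step, the sketch below turns URL strings into weighted token dictionaries using only the standard library. The tokenization rule (splitting on non-alphanumeric characters), the unsmoothed weighting tf × ln(N / df), the function names, and the three example URLs are all assumptions; the abstract does not specify the tokenizer or the TF-IDF variant. Library implementations such as scikit-learn's `TfidfVectorizer` apply a smoothed IDF and L2 normalization by default, so their values would differ from these.

```python
import math
import re
from collections import Counter


def tokenize(url):
    """Split a URL into lowercase alphanumeric tokens."""
    return [tok for tok in re.split(r"[^a-z0-9]+", url.lower()) if tok]


def tfidf(urls):
    """Return one {token: weight} dict per URL using tf * ln(N / df)."""
    docs = [Counter(tokenize(u)) for u in urls]
    n_docs = len(docs)
    # Document frequency: in how many URLs each token appears at least once.
    df = Counter(tok for doc in docs for tok in doc)
    vectors = []
    for doc in docs:
        total = sum(doc.values())
        vectors.append({tok: (count / total) * math.log(n_docs / df[tok])
                        for tok, count in doc.items()})
    return vectors


# Hypothetical URLs for illustration only.
urls = [
    "http://example.com/login/verify-account",
    "http://example.com/index.html",
    "http://example.org/news/index.html",
]
vectors = tfidf(urls)
for url, vec in zip(urls, vectors):
    top = max(vec, key=vec.get)
    print(url, "->", top, round(vec[top], 3))
```

Tokens that occur in every URL, such as `http` here, receive a weight of zero, while tokens confined to a single URL, such as `login` or `verify`, receive the highest weights. This is the property that lets a downstream classifier focus on distinctive URL fragments.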