Enhancing Botnet Detection with Machine Learning and Explainable AI: A Step Towards Trustworthy AI Security
Author(s): Vishva Patel1, Hitasvi Shukla2, Aashka Raval3
Affiliation: 1,2,3Department of Computer Engineering, Pandit Deendayal Energy University, India
Page No: 12-26
Volume issue & Publishing Year: Volume 2 Issue 4, April-2025
Journal: International Journal of Advanced Engineering Application (IJAEA)
ISSN NO: 3048-6807
DOI: https://doi.org/10.5281/zenodo.17658863
Abstract:
The rapid proliferation of botnets, networks of compromised machines remotely controlled by malicious actors, has played a pivotal role in the rise of cyber-attacks such as Distributed Denial-of-Service (DDoS) attacks, credential theft, data exfiltration, command-and-control (C2) activity, and automated exploitation of vulnerabilities. Legacy botnet detection methods, founded on signature matching and deep packet inspection (DPI), are becoming increasingly ineffective because of the prevalence of encryption mechanisms such as TLS 1.3, DNS-over-HTTPS (DoH), and encrypted VPN tunneling. These mechanisms conceal packet payloads, making traditional payload-based network monitoring unsuitable for botnet detection. In response, machine learning (ML)-based botnet detection has gained prominence. Existing ML-based approaches, however, suffer from two inherent weaknesses: (1) lack of granularity in detection, because most models perform binary classification and do not distinguish between botnet attack variants, and (2) lack of interpretability, where high-performing AI models behave as black boxes, which limits trust in security automation, contributes to high false-positive rates, and makes threat analysis difficult for security practitioners.
To overcome these challenges, this study proposes an AI-based, multi-class botnet detection system for encrypted network traffic that incorporates Explainable AI (XAI) techniques to improve model explainability and decision transparency. Two datasets, CICIDS-2017 and CTU-NCC, were used, and a systematic preprocessing step was applied to improve data quality, feature representation, and model performance. Preprocessing included removal of duplicate records, imputation of missing and infinite values, transformation of categorical features, and removal of highly correlated and zero-variance features to minimise model bias. Dimensionality reduction was performed using Principal Component Analysis (PCA), reducing the CICIDS-2017 features from 70 to 34 and the CTU-NCC features from 17 to 4 to improve computational efficiency. Additionally, to deal with skewed class distributions, the Synthetic Minority Over-Sampling Technique (SMOTE) was employed to synthesise minority-class samples and provide a balanced representation of botnet attack types.
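The idea behind SMOTE can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `smote_sketch`, the two-dimensional feature tuples, and the parameter values are hypothetical and chosen only for illustration. Each synthetic sample is placed on the line segment between a minority-class sample and one of its nearest minority-class neighbours.

```python
import math
import random


def smote_sketch(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority-class neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of `base` inside the minority class (excluding itself)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Hypothetical 2-D flow features for an under-represented attack class.
minority_flows = [(1.0, 2.0), (1.5, 2.5), (2.0, 2.0), (1.2, 1.8)]
new_flows = smote_sketch(minority_flows, n_new=6)
balanced_minority = minority_flows + new_flows
```

Because every synthetic point is an interpolation between two real minority samples, it stays within the range spanned by the existing minority data. In practice a library implementation would normally be used instead of a hand-written one.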
For CICIDS-2017, we used three machine learning algorithms: Random Forest (RF) with cross-validation (0.98 accuracy, 100K samples per class), eXtreme Gradient Boosting (XGB) with Bayesian optimisation (0.997 accuracy, 180K samples per class), and our newly introduced Hybrid K-Nearest Neighbours (KNN) + Random Forest (RF) model, resulting in state-of-the-art accuracy of 0.99 (180K samples per class). The CTU-NCC dataset was divided across three network sensors and processed separately. Random Forest (RF), Decision Tree (DT), and KNN models were trained independently for each sensor, and ensemble learning methods such as stacking and voting were then applied to combine the results from the sensors and enhance performance. The resulting accuracies were as follows: Random Forest (Stacking: 99.38%, Voting: 99.35%), Decision Tree (Stacking: 99.68%, Voting: 91.65%), and KNN (Stacking: 97.53%, Voting: 97.11%). Explainable AI (XAI) techniques, namely SHAP (SHapley Additive exPlanations) and LIME (Local Interpretable Model-agnostic Explanations), were integrated with the eXtreme Gradient Boosting and Hybrid KNN+Random Forest models to explain model decisions and increase analyst confidence in the system's predictions.
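As a rough illustration of how voting can combine per-sensor results, the sketch below applies hard (majority) voting to the labels predicted by three sensor-specific models. It assumes that each model emits one label for the same flow, which the abstract does not specify; the function name `hard_vote` and the class labels are hypothetical.

```python
from collections import Counter


def hard_vote(predictions_per_sensor):
    """Return the majority label for each flow across all sensors."""
    combined = []
    # zip(*...) groups the labels that every sensor assigned to the same flow
    for labels in zip(*predictions_per_sensor):
        label, _count = Counter(labels).most_common(1)[0]
        combined.append(label)
    return combined


# Hypothetical predictions from three sensor-specific models for four flows.
sensor_1 = ["benign", "ddos", "spam", "ddos"]
sensor_2 = ["benign", "ddos", "benign", "portscan"]
sensor_3 = ["ddos", "ddos", "spam", "portscan"]

final_labels = hard_vote([sensor_1, sensor_2, sensor_3])
print(final_labels)  # ['benign', 'ddos', 'spam', 'portscan']
```

Stacking differs in that a second-level model is trained on the base models' outputs rather than counting votes, which may help explain why the two methods can yield different accuracies for the same base learner.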
Our key contribution is the Hybrid KNN+Random Forest system, which achieves 0.99 accuracy while providing explainability. We demonstrate an accurate, scalable, and deployable AI-based solution for detecting botnet attacks. Our experiments show that multi-class classification substantially improves discrimination between botnet attack types and that Explainable AI (XAI) improves the transparency of model decisions, making the approach a strong, practical solution for real-world botnet detection in encrypted networks.
Keywords: Botnet Detection, Encrypted Networks, Ensemble Models, Explainable AI