Domain Generation Algorithms (DGAs) are typically used by recent botnets to communicate with their command-and-control servers, which makes these botnets harder to detect than older ones that relied on static IP addresses. Recent studies have therefore experimented with a variety of methods, including Deep Learning, to detect algorithmically generated domains. This paper presents a Deep Learning approach based on autoencoders as a semi-supervised method that requires only legitimate domains for training. Semi-supervised methods have an advantage over supervised methods in that they require no labelled DGA data. The proposed autoencoder structure is based on a Neural Network (NN) that processes the frequency of 2-grams in domain names. The method has been compared with supervised machine learning methods and cross-validated on a second, unseen dataset to evaluate how well the results generalize. The results show an F-score of 73% on DGA detection, outperforming an NN based on letter frequencies and a Random Forest approach based on 𝑛-grams, which scored 71% and 65% respectively.
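To make the 2-gram input concrete, the following is a minimal sketch of how a domain name could be turned into a fixed-length vector of 2-gram frequencies. It is an illustration only, not the paper's implementation: the alphabet (lowercase letters, digits, hyphen), the choice to use only the first label of the domain, the normalization by total 2-gram count, and the names `ALPHABET`, `BIGRAM_INDEX`, and `bigram_frequencies` are all assumptions.

```python
from collections import Counter
from itertools import product
from typing import List
import string

# Assumed alphabet: characters allowed in a hostname label.
ALPHABET = string.ascii_lowercase + string.digits + "-"
# Every possible 2-gram over the alphabet, mapped to a fixed vector position.
BIGRAMS = ["".join(pair) for pair in product(ALPHABET, repeat=2)]
BIGRAM_INDEX = {bigram: i for i, bigram in enumerate(BIGRAMS)}


def bigram_frequencies(domain: str) -> List[float]:
    """Return the relative frequency of each 2-gram in the first label.

    Assumption: the input is a registered domain such as "example.com",
    so the first label is the part of interest.
    """
    label = domain.lower().split(".")[0]
    # Count overlapping 2-grams, ignoring any with characters outside ALPHABET.
    counts = Counter(
        label[i:i + 2]
        for i in range(len(label) - 1)
        if label[i:i + 2] in BIGRAM_INDEX
    )
    total = sum(counts.values())
    vector = [0.0] * len(BIGRAMS)
    if total == 0:
        return vector  # label too short to contain any 2-gram
    for bigram, count in counts.items():
        vector[BIGRAM_INDEX[bigram]] = count / total
    return vector


if __name__ == "__main__":
    vec = bigram_frequencies("google.com")
    print(len(vec), vec[BIGRAM_INDEX["go"]])
```

In an autoencoder setting, vectors like this one, computed from legitimate domains only, would serve as both input and reconstruction target. A common way to use such a model, assumed here rather than taken from the abstract, is to flag domains whose reconstruction error exceeds a chosen threshold.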
|Title of host publication||AI-CyberSec 2021|
|Publisher||CEUR Workshop Proceedings|
|Publication status||Accepted/In press - 25 Nov 2021|
|Event||AI-Cybersec Workshop 2021: Workshop on Artificial Intelligence and Cyber Security - Virtual, 14 Dec 2021|