Semi-supervised detection of Algorithmically Generated Domains using Neural Network-based Autoencoders

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

99 Downloads (Pure)

Abstract

Domain Generation Algorithms (DGA) are typically used by recent botnets to communicate with their command-and-control server, thus exacerbating the complexity of detecting them compared to older botnets using static IP addresses. As such, recent studies have been experimenting with different approaches to detect algorithmically generated domains using a variety of methods, including Deep Learning. This paper presents a Deep Learning approach based on autoencoders as a semi-supervised method requiring only legitimate domains for training. Semi-supervised methods have an advantage over supervised methods in that they require no labelled DGA data. The proposed autoencoder structure is based on a Neural Network (NN) processing the frequency of 2-grams in domain names. The method has been compared with supervised machine learning methods and cross-validated on a second unseen dataset to evaluate the generalization of results. Results confirmed an F-score of 73% on DGA detection outperforming a NN based on letter frequencies and a Random Forest approach based on 𝑛-grams scoring 71% and 65% respectively.
Original languageEnglish
Title of host publicationAI-CyberSec 2021 - Workshop on Artificial Intelligence and Cyber Security
PublisherCEUR Workshop Proceedings
Pages1-9
Number of pages9
Publication statusPublished online - 14 Dec 2021
EventAI-Cybersec Workshop 2021: Workshop on Artificial Intelligence and Cyber Security - Virtual
Duration: 14 Dec 202114 Dec 2021
https://sites.google.com/view/ai-cybersec-2021/programme?authuser=0

Publication series

NameCEUR Workshop Proceedings
ISSN (Electronic)1613-0073

Workshop

WorkshopAI-Cybersec Workshop 2021
Abbreviated titleAI-CyberSec
Period14/12/2114/12/21
Internet address

Fingerprint

Dive into the research topics of 'Semi-supervised detection of Algorithmically Generated Domains using Neural Network-based Autoencoders'. Together they form a unique fingerprint.

Cite this