Three-stage data generation algorithm for multiclass network intrusion detection with highly imbalanced dataset

Kwok Tai Chui, Brij Bhooshan Gupta, Priyanka Chaurasia, Varsha Arya, Ammar Almomani, Wadee Alhalabi

Research output: Contribution to journal › Article › peer-review



The Internet plays a crucial role in our daily routines, and ensuring cybersecurity for Internet users provides a safe online environment. Automatic network intrusion detection (NID) using machine learning algorithms has recently received increased attention. Because the datasets are highly imbalanced across different types of attacks, an NID model is prone to bias towards the classes with more training samples. The challenge in generating additional training data for minority classes is that too little data is generated. This study addresses the challenge and extends data generation ability by proposing a three-stage data generation algorithm that uses the synthetic minority over-sampling technique, a generative adversarial network (GAN), and a variational autoencoder. A convolutional neural network extracts representative features from the data, which are fed into a support vector machine with a customised kernel function. An ablation study evaluated the effectiveness of the three-stage data generation, feature extraction, and customised kernel. This was followed by a performance comparison between the proposed model and existing studies. The findings revealed that the proposed NID model achieved an accuracy of 91.9%–96.2% on four benchmark datasets. In addition, it outperformed existing methods, such as GAN-based deep neural networks, a conditional Wasserstein GAN-based stacked autoencoder, a synthetic minority over-sampling technique-based random forest, and a variational autoencoder-based deep neural network, by 1.51%–28.4%.
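
To make the oversampling idea concrete, the following is a minimal sketch of the interpolation step behind the synthetic minority over-sampling technique, which the abstract names as one of the three data generators. It is not the authors' implementation: the function name `smote_oversample`, its parameters, and the toy data are hypothetical, and the GAN and variational autoencoder stages are not shown. Each synthetic sample is placed at a random point on the line segment between a minority sample and one of its nearest minority-class neighbours.

```python
import math
import random


def smote_oversample(minority, n_new, k=3, seed=0):
    """Return n_new synthetic samples, each interpolated between a randomly
    chosen minority sample and one of its k nearest minority neighbours.

    Hypothetical helper for illustration only; not taken from the paper.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbours)]
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Assumed toy data: four 2-D feature vectors of a rare attack class.
rare_class = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_samples = smote_oversample(rare_class, n_new=6)
```

Because every synthetic point lies between two existing minority samples, this step alone cannot produce samples outside the region the minority class already occupies, which is one plausible reason to combine it with generative models as the abstract describes.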
Original language: English
Pages (from-to): 202-210
Number of pages: 9
Journal: International Journal of Intelligent Networks
Early online date: 5 Aug 2023
Publication status: Published online - 5 Aug 2023


  • Convolutional neural network
  • Data generation
  • Generative adversarial network
  • Kernel function
  • Multiclass classification
  • Network intrusion detection
  • Support vector machine
  • Synthetic minority over-sampling technique

