Abstract
Machine learning-based Network Intrusion Detection Systems (NIDS) are essential for identifying cyber threats in large-scale network environments, but they are highly sensitive to the severe class imbalance typical of real-world network traffic. In such datasets, benign samples vastly outnumber malicious ones, resulting in biased models that struggle to detect rare but high-impact attacks. Generative approaches such as Conditional Tabular GANs (CTGANs) have emerged as effective tools for addressing this imbalance through synthetic data augmentation. However, existing CTGAN frameworks have shortcomings that limit their ability to capture relationships between classes and to learn efficiently from complex minority patterns. This thesis introduces Adaptive CTGAN, a novel generative framework that enhances both the conditioning mechanism and the training process of conventional CTGANs. The model integrates a learnable class embedding layer that encodes semantic relationships among attack categories, and a dynamic conditional sampling strategy that adaptively adjusts the generator’s focus based on learning difficulty. Together, these enhancements enable the model to generate synthetic samples of higher fidelity and greater diversity, particularly for the extreme minority classes. Using the CIC-IDS-2017 benchmark dataset, Adaptive CTGAN is evaluated against the standard CTGAN under the Train-on-Synthetic, Test-on-Real (TSTR) paradigm. Experimental results demonstrate notable improvements in data quality and minority-class detection, reflected in higher F1-scores achieved by downstream Random Forest classifiers. Beyond detection performance, training downstream models on synthetic rather than real records also supports privacy preservation of sensitive network data while maintaining model effectiveness.
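As a rough illustration of the dynamic conditional sampling idea described above, the sketch below turns per-class difficulty scores into the probabilities used to pick the generator's conditioning class. The abstract does not state the actual rule, so everything here is an assumption made for illustration: the function names `difficulty_weighted_probs` and `sample_conditions`, the softmax weighting, the `temperature` and `floor` parameters, and the placeholder loss values are all hypothetical.

```python
import math
import random


def difficulty_weighted_probs(class_losses, temperature=1.0, floor=0.05):
    """Map per-class losses to conditional sampling probabilities.

    Assumed rule (not taken from the thesis): a softmax over the losses,
    mixed with a small uniform share so that no class is ever starved.
    """
    classes = list(class_losses)
    # Subtract the largest loss before exponentiating for numerical stability.
    peak = max(class_losses.values())
    scores = [math.exp((class_losses[c] - peak) / temperature) for c in classes]
    total = sum(scores)
    uniform = 1.0 / len(classes)
    return {
        c: (1.0 - floor) * (s / total) + floor * uniform
        for c, s in zip(classes, scores)
    }


def sample_conditions(probs, batch_size, rng=None):
    """Draw the class labels that condition the generator for one batch."""
    rng = rng or random.Random()
    classes = list(probs)
    weights = [probs[c] for c in classes]
    return rng.choices(classes, weights=weights, k=batch_size)


# Placeholder difficulty scores, not measured values.
example_losses = {"BENIGN": 0.2, "DoS Hulk": 0.6, "Heartbleed": 1.8}
example_probs = difficulty_weighted_probs(example_losses)
example_batch = sample_conditions(example_probs, batch_size=8, rng=random.Random(0))
```

Under this assumed rule, classes with a higher loss are drawn more often, so the generator sees harder classes more frequently, while the uniform floor keeps easier classes from disappearing from training batches.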