Adaptive CTGAN: enhancing synthetic data generation for imbalanced cybersecurity datasets : a thesis in Data Science

Devcharan Krishna Naik

doi:10.62791/20544

Thesis

Master of Science (MS), University of Massachusetts Dartmouth

2026

Abstract

Machine learning-based Network Intrusion Detection Systems (NIDS) are essential for identifying cyber threats in large-scale network environments, yet they are highly sensitive to the severe class imbalance typical of real-world network traffic. In such datasets, benign samples vastly outnumber malicious ones, resulting in biased models that struggle to detect rare but high-impact attacks. Generative approaches such as Conditional Tabular GANs (CTGANs) have emerged as effective tools for addressing this imbalance through synthetic data augmentation. However, existing CTGAN frameworks exhibit shortcomings that limit their ability to capture relationships between classes and to learn efficiently from complex minority patterns. This thesis introduces Adaptive CTGAN, a novel generative framework that enhances both the conditioning mechanism and the training process of conventional CTGANs. The model integrates a learnable class embedding layer to encode semantic relationships among attack categories, and a dynamic conditional sampling strategy that adaptively adjusts the generator’s focus based on learning difficulty. Together, these enhancements enable the model to generate synthetic samples of higher fidelity and greater diversity, particularly for the extreme minority classes. Using the CIC-IDS-2017 benchmark dataset, Adaptive CTGAN is evaluated against the standard CTGAN under the Train-on-Synthetic, Test-on-Real (TSTR) paradigm. Experimental results demonstrate notable improvements in data quality and minority-class detection, reflected in higher F1-scores achieved by downstream Random Forest classifiers. Beyond performance, the proposed method also supports privacy preservation of sensitive network data while maintaining model effectiveness.
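The TSTR paradigm named in the abstract can be illustrated with a minimal sketch: a classifier is fitted only on synthetic rows and then scored per class on real rows, so the minority-class F1 reflects how useful the synthetic data is. Everything below is an assumption made for illustration and is not taken from the thesis: the function names (`fit_centroids`, `predict`, `per_class_f1`, `tstr`) are hypothetical, the toy feature vectors are invented, and a nearest-centroid classifier stands in for the Random Forest used in the thesis so that the example needs no third-party libraries.

```python
from collections import defaultdict


def fit_centroids(rows, labels):
    """Fit a nearest-centroid classifier.

    Stand-in only: the thesis uses Random Forest classifiers; this simple
    model keeps the sketch dependency-free.
    """
    sums, counts = {}, defaultdict(int)
    for x, y in zip(rows, labels):
        if y not in sums:
            sums[y] = [0.0] * len(x)
        sums[y] = [s + v for s, v in zip(sums[y], x)]
        counts[y] += 1
    # Mean feature vector per class
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


def predict(centroids, rows):
    """Assign each row to the class with the closest centroid."""
    def dist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))
    return [min(centroids, key=lambda y: dist(centroids[y], x)) for x in rows]


def per_class_f1(y_true, y_pred):
    """F1 = 2*TP / (2*TP + FP + FN), computed separately for each class."""
    scores = {}
    for c in sorted(set(y_true)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores[c] = 2 * tp / denom if denom else 0.0
    return scores


def tstr(synth_X, synth_y, real_X, real_y):
    """Train on synthetic data, test on real data, report per-class F1."""
    model = fit_centroids(synth_X, synth_y)
    return per_class_f1(real_y, predict(model, real_X))


# Invented toy data: two features, two classes.
synth_X = [[0.0, 0.1], [0.2, 0.0], [1.0, 0.9], [0.9, 1.1]]
synth_y = ["benign", "benign", "attack", "attack"]
real_X = [[0.1, 0.1], [0.0, 0.3], [0.4, 0.4], [1.0, 1.0]]
real_y = ["benign", "benign", "attack", "attack"]

scores = tstr(synth_X, synth_y, real_X, real_y)
```

In this toy run one real attack row lies closer to the benign centroid, so the attack class scores lower than the benign class. Comparing such per-class scores between two generators is the kind of comparison the abstract describes, although the thesis's actual features, classifier settings, and results are not reproduced here.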

Files and links (1)

pdf

Krishna Naik D. COE MS Thesis 2026 (1.69 MB)

Embargoed access; embargo ends 06/17/2026. License: CC BY-NC-ND 4.0

Details

Title: Adaptive CTGAN
Creators: Devcharan Krishna Naik
ORCID: 0009-0004-3171-6201
Contributors: Ashokkumar Ratilal Patel (Advisor) - University of Massachusetts Dartmouth, Department of Computer and Information Science
Yuchou Chang (Committee Member) - University of Massachusetts Dartmouth, Department of Computer and Information Science
Long Jiao (Committee Member) - University of Massachusetts Dartmouth, Department of Computer and Information Science
Number of pages: ix, 44 pages
Illustrations: illustrations (some color)
Table of contents: List of figures -- List of tables -- Abbreviations -- Chapter 1. Introduction -- The core problem: class imbalance in cybersecurity data -- State-of-the-art solutions and research gap -- Chapter 2. Related work -- Generative adversarial networks -- Conditional generative adversarial network (cGAN) -- Conditional tabular generative adversarial network (CTGAN) -- Motivation for improvement -- Chapter 3. Methodology -- The baseline model: standard CTGAN -- The proposed model: the adaptive CTGAN (A-CTGAN) -- Chapter 4. Experiments -- Experimental setup -- Dataset and preprocessing -- Model implementation and training -- Synthetic data generation -- Downstream classifier training and evaluation -- Statistical fidelity benchmarks -- Chapter 5. Results and discussion -- Downstream classifier performance (TSTR) -- Statistical and qualitative analysis -- Chapter 6. Conclusion -- References.
References: Includes bibliographical references (page 42).
Awarding Institution: University of Massachusetts Dartmouth
Degree Awarded: Master of Science (MS)
Degree in: Data Science
Academic Unit: Department of Computer and Information Science
Language: English
Resource Type: Thesis
DOI: https://doi.org/10.62791/20544
Record Identifier: 9914528160101301