Abstract
As cyber threats grow in complexity, understanding how adversaries operate and predicting their next moves has become essential for proactive defense. This thesis presents a hybrid modeling system that combines logic-driven simulation with machine learning–based sequence prediction to anticipate the progression of cyberattack chains. Using structured data from the MITRE ATT&CK framework, we parse techniques, tactics, threat groups, campaigns, and their interrelationships to generate realistic, directed attack chains through both randomized simulation and campaign-informed permutations. To simulate how attackers might move through systems, we train a first-order Markov chain model that captures transition probabilities between techniques and supports probabilistic multistep path generation. For comparison, a Long Short-Term Memory (LSTM) neural network learns to predict the next technique from the full sequence history, capturing longer-range temporal and semantic patterns that a first-order model cannot represent. We also introduce a geometric mean–based scoring method to evaluate the risk and coherence of each predicted path, categorizing each path as low, medium, or high risk. To reduce noise and improve interpretability, we apply a Chain Contraction algorithm that compresses redundant or semantically similar steps, producing cleaner and more meaningful representations of attacker behavior. An interactive interface allows users to select a starting technique from a dropdown menu and explore simulated attack paths in real time, complete with tabulated probabilities and visual graph representations. We evaluate both models by comparing their predictions to real-world campaigns such as Operation Ghost (C0023) and SolarWinds (C0024), both attributed to the APT29 threat group (G0016), using STIX-based datasets from the online ATT&CK repositories. This work demonstrates how structured threat intelligence and generative modeling can work together to support red teaming, threat hunting, and campaign attribution.
It provides hands-on tools for analysts to simulate, visualize, and compare attacker behavior in a dynamic, data-driven environment.
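
The Markov chain and scoring steps described above can be illustrated with a minimal Python sketch. It is not the thesis implementation: the function names, the toy chains, the risk thresholds, and the reading of the score as the geometric mean of the transition probabilities along a path (with a higher score mapped to a higher risk level) are all assumptions made for illustration. The technique IDs are real ATT&CK identifiers, but the sequences are invented and do not come from any campaign.

```python
import math
import random
from collections import Counter, defaultdict


def fit_transitions(chains):
    """Estimate first-order transition probabilities from technique sequences."""
    counts = defaultdict(Counter)
    for chain in chains:
        for current, nxt in zip(chain, chain[1:]):
            counts[current][nxt] += 1
    return {
        current: {nxt: n / sum(followers.values()) for nxt, n in followers.items()}
        for current, followers in counts.items()
    }


def sample_path(transitions, start, max_steps, rng):
    """Generate a path by repeatedly sampling the next technique."""
    path = [start]
    while len(path) < max_steps and path[-1] in transitions:
        followers = transitions[path[-1]]
        nxt = rng.choices(list(followers), weights=list(followers.values()))[0]
        path.append(nxt)
    return path


def geometric_mean_score(transitions, path):
    """Geometric mean of the transition probabilities along a path (assumed scoring)."""
    probs = [transitions.get(a, {}).get(b, 0.0) for a, b in zip(path, path[1:])]
    if not probs or min(probs) == 0.0:
        return 0.0  # an unseen transition makes the whole path score zero
    return math.exp(sum(math.log(p) for p in probs) / len(probs))


def risk_level(score, medium=0.3, high=0.6):
    """Map a score to a risk label; the thresholds are hypothetical."""
    if score >= high:
        return "high"
    if score >= medium:
        return "medium"
    return "low"


# Toy chains for illustration only (not taken from ATT&CK campaign data).
chains = [
    ["T1566", "T1059", "T1003", "T1021"],
    ["T1566", "T1059", "T1547"],
    ["T1566", "T1059", "T1003", "T1041"],
]
transitions = fit_transitions(chains)

candidate = ["T1566", "T1059", "T1003", "T1021"]
score = geometric_mean_score(transitions, candidate)  # (1.0 * 2/3 * 1/2) ** (1/3), about 0.693
print(candidate, round(score, 3), risk_level(score))

# Sampled paths vary with the random seed.
print(sample_path(transitions, "T1566", max_steps=4, rng=random.Random(0)))
```

The geometric mean is used here rather than the raw product so that longer paths are not penalized simply for having more steps; whether the thesis normalizes in exactly this way is not stated in the abstract.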