Attack chain contraction and prediction using Markov model and LSTM on MITRE ATT&CK data: a thesis in Data Science

Ruksar Rafique Lukade

doi:10.62791/20489

Thesis

Open access

Master of Science (MS), University of Massachusetts Dartmouth

2025

Abstract

As cyber threats grow in complexity, understanding how adversaries operate and predicting their next moves have become essential for proactive defense. This thesis presents a hybrid modeling system that combines logic-driven simulation with machine-learning-based sequence prediction to anticipate the progression of cyberattack chains. Using structured data from the MITRE ATT&CK framework, we parse techniques, tactics, threat groups, campaigns, and their interrelationships to generate realistic, directed attack chains through both randomized simulation and campaign-informed permutations. To simulate how attackers might move through systems, we train a first-order Markov chain model that captures transition probabilities between techniques and supports probabilistic multistep path generation. By comparison, a Long Short-Term Memory (LSTM) neural network learns to predict context-aware next-step techniques from the full sequence history, capturing deeper temporal and semantic patterns. We also introduce a geometric-mean-based scoring method to evaluate the risk and coherence of each predicted path, categorizing each path as low, medium, or high risk. To reduce noise and improve interpretability, we apply a Chain Contraction algorithm that compresses redundant or semantically similar steps, producing cleaner and more meaningful representations of attacker behavior. An interactive interface allows users to select a starting technique from a dropdown menu and explore simulated attack paths in real time, complete with tabulated probabilities and visual graph representations. We evaluate both models by comparing their predictions to real-world campaigns such as Operation Ghost (C0023) and SolarWinds (C0024), both attributed to the APT29 threat group (G0016), using STIX-based datasets from the online ATT&CK repositories. This work demonstrates how structured threat intelligence and generative modeling can work together to support red teaming, threat hunting, and campaign attribution. It provides hands-on tools for analysts to simulate, visualize, and compare attacker behavior in a dynamic, data-driven environment.
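
The first-order Markov step and the geometric-mean path score described in the abstract can be illustrated with a minimal sketch. This is not the thesis implementation: the function names `fit_transitions` and `geometric_mean_score` are hypothetical, the toy chains are made up for illustration (only the ATT&CK technique IDs are real), and treating an unseen transition as probability zero is an assumption.

```python
from collections import Counter, defaultdict
import math


def fit_transitions(chains):
    """Estimate first-order transition probabilities P(next | current)
    by counting adjacent technique pairs in the training chains."""
    counts = defaultdict(Counter)
    for chain in chains:
        for cur, nxt in zip(chain, chain[1:]):
            counts[cur][nxt] += 1
    return {
        cur: {nxt: n / sum(c.values()) for nxt, n in c.items()}
        for cur, c in counts.items()
    }


def geometric_mean_score(path, probs):
    """Geometric mean of the transition probabilities along a path.
    Assumption: a transition never seen in training scores the path 0."""
    steps = [probs.get(a, {}).get(b, 0.0) for a, b in zip(path, path[1:])]
    if not steps or min(steps) == 0.0:
        return 0.0
    # Average in log space to avoid underflow on long paths.
    return math.exp(sum(math.log(p) for p in steps) / len(steps))


# Toy chains (illustrative only, not taken from the thesis dataset).
# T1087 = Account Discovery, T1078 = Valid Accounts,
# T1021 = Remote Services, T1003 = OS Credential Dumping.
chains = [
    ["T1087", "T1078", "T1021"],
    ["T1087", "T1078", "T1003"],
    ["T1087", "T1003", "T1021"],
    ["T1087", "T1078", "T1021"],
]
probs = fit_transitions(chains)
score = geometric_mean_score(["T1087", "T1078", "T1021"], probs)
```

Unlike a raw product of probabilities, the geometric mean normalizes for path length, so long and short paths can be compared on the same scale. The thresholds that separate low, medium, and high risk are not stated in this record, so the sketch stops at the score.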

Files and links (1)

Lukade R.R. COE MS Thesis 2025 (pdf, 940.29 kB)

CC BY-NC-ND V4.0, Open Access

Details

Title: Attack chain contraction and prediction using Markov model and LSTM on MITRE ATT&CK data
Creators: Ruksar Rafique Lukade
ORCID: 0009-0002-0714-4974
Contributors: Gokhan Kul (Advisor) - University of Massachusetts Dartmouth, Department of Computer and Information Science; Ashokkumar Ratilal Patel (Committee Member) - University of Massachusetts Dartmouth, Department of Computer and Information Science; Long Jiao (Committee Member) - University of Massachusetts Dartmouth, Department of Computer and Information Science
Number of pages: viii, 50 pages
Illustrations: color illustrations
Table of contents: List of Figures -- Chapter 1. Introduction -- Problem statement -- Method overview and distinction from existing solutions -- Findings -- Main impact -- Chapter 2. Related work -- What is known: forecasting and understanding adversarial behavior -- What is unknown: gaps in forecasting and understanding -- Chapter 3. Methodology -- Overall methodology -- Baseline method -- Proposed method in detail -- Chapter 4. Dataset -- Dataset overview -- Rationale for dataset selection -- Structure and content of the dataset -- Parsing and preprocessing -- Dataset utilization in the framework -- Quantitative summary -- Ensuring reproducibility and future compatibility -- Chapter 5. Results -- Centered Markov tree simulation: account discovery -- LSTM-predicted attack tree from account discovery -- Distribution of LSTM transition probabilities -- Markov model transition probability distribution -- Per-technique MSE comparison between Markov and LSTM models -- Vocabulary overlap between LSTM and Markov models -- Chapter 6. Discussions -- Bridging the gap -- Limitations -- Chapter 7. Conclusion -- Summary of goals and achievements -- Implications of results -- Concluding remarks -- Reference.
References: Includes bibliographical references (pages 48-50).
Awarding Institution: University of Massachusetts Dartmouth
Degree Awarded: Master of Science (MS)
Degree in: Data Science
Academic Unit: Department of Computer and Information Science
Language: English
Resource Type: Thesis
DOI: https://doi.org/10.62791/20489
Record Identifier: 9914504161701301