Enhancing static code analysis in reverse engineering using retrieval-augmented generation and large language models: a thesis in Computer Engineering
Thesis   Open access

Brendan C. Thibault
Master of Science (MS), University of Massachusetts Dartmouth
2025
DOI: https://doi.org/10.62791/20450

Abstract

This thesis explores the application of large language models (LLMs) to accurate static code analysis in a reverse engineering context. It takes a novel approach, using retrieval-augmented generation (RAG) to enhance the model output with contextual metadata, which improves the accuracy and relevance of the generated documentation. The methodology is implemented within an artificial intelligence (AI) enabled reverse engineering platform that lets users perform static analysis and includes features such as a linear disassembly view, graph-based navigation, and AI-driven code summaries. The results show that the RAG-based approach outperforms previous methods of LLM-assisted reverse engineering, with significant performance and accuracy improvements, and highlight the potential of integrating LLMs into reverse engineering workflows. These results have practical implications for reverse engineering practice. Future work consists of identifying the optimal context to resolve during the RAG retrieval process using an unsupervised machine learning approach, as well as incorporating more powerful reasoning models.
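As a rough illustration of the general RAG idea described in the abstract, the following minimal sketch retrieves the metadata records most similar to a disassembly snippet and prepends them to an LLM prompt. It is not the thesis's implementation: the bag-of-words cosine ranking, the function names (`retrieve`, `build_prompt`), and the sample metadata records are all assumptions made for illustration, and the call to an actual model is omitted.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9_]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, records, k=2):
    """Return up to k records with non-zero similarity to the query, best first."""
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(r))), r) for r in records]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [r for score, r in scored[:k] if score > 0]


def build_prompt(disassembly, records, k=2):
    """Assemble a prompt: retrieved context first, then the code to summarize."""
    context = retrieve(disassembly, records, k)
    lines = ["Context:"]
    lines += ["- " + c for c in context]
    lines += ["", "Code:", disassembly, "", "Summarize what this function does."]
    return "\n".join(lines)


# Hypothetical metadata records, invented for this example.
METADATA = [
    "import strcpy: copies a NUL-terminated string, no bounds check",
    "import socket: creates a network endpoint",
    "string at 0x4020: 'password incorrect'",
]

prompt = build_prompt("call strcpy\nmov eax, 0x4020", METADATA)
print(prompt)
```

In this toy setup only the records that share a token with the snippet (the `strcpy` import and the string at `0x4020`) reach the prompt; a practical system would more likely rank candidates with learned embeddings rather than raw token counts.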
Thibault B.C. COE MS Thesis 2025 (PDF, 4.17 MB)
CC BY-NC-ND V4.0 Open Access
