Abstract
This thesis explores the application of Large Language Models (LLMs) in a reverse engineering context to perform accurate static analysis of code. A novel approach was taken using retrieval-augmented generation (RAG) to enhance the model output with contextual metadata, improving the accuracy and relevance of the generated documentation. This methodology was implemented within an Artificial Intelligence (AI)-enabled reverse engineering platform that allows users to perform static analysis and includes features such as a linear disassembly view, graph-based navigation, and AI-driven code summaries. The results show that a RAG-based approach outperforms previous methods of LLM-assisted reverse engineering. This research demonstrates significant performance and accuracy improvements using an AI-enabled framework and highlights the potential of integrating LLMs into reverse engineering workflows. The results of this work have practical implications for reverse engineering, showing that a RAG-enhanced LLM approach can significantly assist the reverse engineering process. Future directions for this work include identifying the optimal context to resolve during the RAG retrieval process using an unsupervised machine learning approach, as well as incorporating more powerful reasoning models.