A hybrid decision tree with rule-based and deep learning nodes for automated medical coding: a thesis in Computer Science
Thesis   Open access

Spoorthi Subramanya Bhat
Master of Science (MS), University of Massachusetts Dartmouth
2025
DOI: https://doi.org/10.62791/20522

Abstract

With the growing digitization of healthcare data, automating the medical coding process has become increasingly important. Recent advances in machine learning and natural language processing (NLP) have led to promising approaches for automated medical coding from clinical notes and discharge summaries. Among these, deep learning excels at extracting complex patterns from unstructured text. However, it often requires large annotated datasets and significant computational resources, and it lacks interpretability, all of which are key concerns in clinical settings. In this thesis, we adopt a hierarchical classification structure that mirrors the tree-like organization of the ICD coding system. To offer a scalable and efficient solution, we propose a hybrid decision tree (HDT) framework for automated ICD coding that combines the efficiency of rule-based methods with the predictive power of deep learning models. Rather than relying on a single paradigm, the HDT approach determines, at each decision node, whether a lightweight rule-based classifier is sufficient or whether a more complex deep learning model is needed. For simpler nodes, where distinguishing features such as specific symptoms or keywords are easily identifiable, we classify medical codes using rule-based methods that apply statistical feature scoring based on term frequency and class-specific relevance. For more complex nodes, where textual overlap between conditions makes rule-based classification unreliable, we employ deep learning models, specifically Long Short-Term Memory (LSTM) networks, to capture subtle semantic patterns in clinical text. We evaluate our approach on clinical notes and discharge summaries from the MIMIC-IV dataset. The results demonstrate that HDT offers a favorable trade-off, maintaining high prediction accuracy while significantly reducing inference time and resource consumption. Furthermore, its modular design facilitates system scalability and adaptation to updates in the ICD coding system, making it well suited for real-world deployment.
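
The per-node choice described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation: the class `RuleNode`, the function `choose_node_type`, the relevance formula (the share of a term's occurrences that fall in a class), the held-out accuracy threshold `min_accuracy`, and the toy notes and labels are all assumptions made for illustration. The deep learning branch is only named, not implemented.

```python
from collections import Counter, defaultdict
from typing import Dict, List


def tokenize(text: str) -> List[str]:
    """Lowercase whitespace tokenizer that keeps alphabetic tokens only."""
    return [t for t in text.lower().split() if t.isalpha()]


class RuleNode:
    """Hypothetical rule-based node: term frequency times class relevance."""

    def __init__(self) -> None:
        # weights[label][term] = relevance of term for that child class
        self.weights: Dict[str, Dict[str, float]] = {}

    def fit(self, notes: List[str], labels: List[str]) -> "RuleNode":
        per_class: Dict[str, Counter] = defaultdict(Counter)
        for note, label in zip(notes, labels):
            per_class[label].update(tokenize(note))
        total: Counter = Counter()
        for counts in per_class.values():
            total.update(counts)
        for label, counts in per_class.items():
            # Assumed relevance: fraction of a term's occurrences in this class.
            self.weights[label] = {t: n / total[t] for t, n in counts.items()}
        return self

    def scores(self, note: str) -> Dict[str, float]:
        tf = Counter(tokenize(note))
        return {
            label: sum(n * w.get(term, 0.0) for term, n in tf.items())
            for label, w in self.weights.items()
        }

    def predict(self, note: str) -> str:
        s = self.scores(note)
        return max(s, key=s.get)


def choose_node_type(rule: RuleNode, val_notes: List[str],
                     val_labels: List[str], min_accuracy: float = 0.9) -> str:
    """Return "rule" if the rule node is accurate enough on held-out notes,
    otherwise "deep" (meaning: train a heavier model, e.g. an LSTM, here).
    The accuracy criterion is an assumption, not taken from the thesis."""
    hits = sum(rule.predict(n) == y for n, y in zip(val_notes, val_labels))
    return "rule" if hits / len(val_labels) >= min_accuracy else "deep"


# Toy node with distinctive keywords (labels "I" and "S" are illustrative).
easy = RuleNode().fit(
    ["chest pain and shortness of breath", "chest pain radiating to arm",
     "fracture of left femur after fall", "wrist fracture after fall"],
    ["I", "I", "S", "S"],
)
easy_choice = choose_node_type(
    easy, ["patient reports chest pain", "fall with hip fracture"], ["I", "S"])

# Toy node whose classes share identical vocabulary, so rules cannot separate them.
hard = RuleNode().fit(["cough and fever", "cough and fever"], ["A", "B"])
hard_choice = choose_node_type(
    hard, ["cough and fever", "cough and fever"], ["A", "B"])

print(easy_choice, hard_choice)
```

In this sketch the first node has clearly separating keywords and stays rule-based, while the second node has complete textual overlap between classes and is flagged for a deep model. A full system would repeat this choice at every node of the ICD hierarchy.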
CC BY-NC-ND V4.0 Open Access
