Abstract
With the growing digitization of healthcare data, automating the medical coding process has become increasingly important. Recent advances in machine learning and natural language processing (NLP) have led to promising approaches for automated medical coding from clinical notes and discharge summaries. Among these, deep learning excels at extracting complex patterns from unstructured text. However, it often requires large annotated datasets and significant computational resources, and it lacks interpretability, all of which are key concerns in clinical settings. To offer a scalable and efficient solution, this thesis proposes a hybrid decision tree (HDT) framework for automated ICD coding, which combines the efficiency of rule-based methods with the predictive power of deep learning models. The framework adopts a hierarchical classification structure that mirrors the tree-like organization of the ICD coding system. Rather than relying on a single paradigm, the HDT approach determines, at each decision node, whether a lightweight rule-based classifier is sufficient or whether a more complex deep learning model is needed. For simpler nodes, where distinguishing features such as specific symptoms or keywords are easily identifiable, we classify medical codes using rule-based methods that apply statistical feature scoring based on term frequency and class-specific relevance. For more complex nodes, where textual overlap between conditions makes rule-based classification unreliable, we employ deep learning models, particularly Long Short-Term Memory (LSTM) networks, to capture subtle semantic patterns in clinical text. We evaluate our approach on clinical notes and discharge summaries from the MIMIC-IV dataset. The results demonstrate that HDT offers a favorable trade-off, maintaining high prediction accuracy while significantly reducing inference time and resource consumption.
Furthermore, its modular design facilitates system scalability and adaptation to updates in the ICD coding system, making it well-suited for real-world deployment.
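
The per-node routing described above can be illustrated with a minimal Python sketch. It is not the thesis implementation: the names `Node`, `fit_term_scores`, `choose_classifier`, and `predict_code`, the smoothed log-ratio used as the term score, and the validation-accuracy threshold of 0.9 are all assumptions made for illustration, and the deep model is represented by an arbitrary callable standing in for an LSTM.

```python
from collections import Counter
from dataclasses import dataclass, field
from math import log
from typing import Callable, Dict, List, Optional, Tuple

Doc = List[str]  # a tokenized clinical note
Scores = Dict[str, Dict[str, float]]


def fit_term_scores(samples: List[Tuple[Doc, str]]) -> Scores:
    """Score each term per class by the smoothed log-ratio of its
    in-class frequency to its overall frequency (an assumed scoring rule)."""
    per_class: Dict[str, Counter] = {}
    overall: Counter = Counter()
    for tokens, label in samples:
        per_class.setdefault(label, Counter()).update(tokens)
        overall.update(tokens)
    total, vocab = sum(overall.values()), len(overall)
    scores: Scores = {}
    for label, counts in per_class.items():
        n = sum(counts.values())
        scores[label] = {
            t: log(((counts[t] + 1) / (n + vocab)) / ((overall[t] + 1) / (total + vocab)))
            for t in overall
        }
    return scores


def rule_predict(scores: Scores, tokens: Doc) -> str:
    """Pick the child class whose summed term scores are highest."""
    return max(scores, key=lambda label: sum(scores[label].get(t, 0.0) for t in tokens))


@dataclass
class Node:
    code: str                                        # ICD chapter, category, or code
    children: Dict[str, "Node"] = field(default_factory=dict)
    classify: Optional[Callable[[Doc], str]] = None  # returns the chosen child's key


def choose_classifier(
    train: List[Tuple[Doc, str]],
    val: List[Tuple[Doc, str]],
    deep_model: Callable[[Doc], str],
    threshold: float = 0.9,  # hypothetical cut-off, not taken from the thesis
) -> Callable[[Doc], str]:
    """Keep the rule-based classifier if it is accurate enough on held-out
    data for this node; otherwise fall back to the deep model."""
    scores = fit_term_scores(train)
    correct = sum(rule_predict(scores, doc) == label for doc, label in val)
    if val and correct / len(val) >= threshold:
        return lambda tokens: rule_predict(scores, tokens)
    return deep_model


def predict_code(root: Node, tokens: Doc) -> str:
    """Walk from the root to a leaf, using each node's own classifier."""
    node = root
    while node.children and node.classify is not None:
        node = node.children[node.classify(tokens)]
    return node.code
```

In this sketch the routing decision is made once per node at training time, so inference only invokes the expensive model at nodes where the rule-based scorer fell below the threshold.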