Rapid insight data engine: an open-source Python framework for automated analysis of tabular data : a thesis in data science
Thesis   Open access

Sudhanshu Mukherjee
Master of Science (MS), University of Massachusetts Dartmouth
2025
DOI:
https://doi.org/10.62791/20436

Abstract

The exponential growth in data generation has made data science and machine learning essential components of modern analytics. This thesis introduces the Rapid Insights Data Engine (RIDE), an open-source Python framework and command-line interface designed specifically for tabular data analysis. RIDE integrates fundamental machine learning methodologies and standardized workflows to streamline data investigation and interpretation. Unlike existing solutions, RIDE requires minimal programming knowledge, making it accessible to users from diverse backgrounds, including business analysts, researchers, and junior data scientists. While most emerging tools backed by large language models focus on natural-language code generation, RIDE differentiates itself by providing true no-code capabilities to data enthusiasts, accelerating their development cycle through integrated backend systems that handle universal data processing. RIDE controls backend code processing, allowing users to focus on analysis and on improving their understanding of the data. It leverages large language models specifically for result interpretation rather than code generation, ensuring both security and accessibility in the analytical process. RIDE makes it easier to prepare reports on a dataset, as users can generate results and interpretations with just a few clicks and without writing a single line of code. To validate RIDE's effectiveness, we tested it across different datasets, evaluating its performance in data preprocessing, feature scaling, transformation, and AutoML for regression, classification, and clustering. Performance testing on small to medium-sized datasets demonstrated promising results in processing time, memory utilization, and CPU efficiency.
Mukherjee S. CAS MS Thesis 2025 (PDF, 2.30 MB)
CC BY-NC-ND V4.0 Open Access
