Dataflow notebooks: enhancing computational notebooks : a thesis in Computer Science
Thesis   Open access

Jay S. Patel
Master of Science (MS), University of Massachusetts Dartmouth
2017
DOI:
https://doi.org/10.62791/19913

Abstract

Application software -- Development. Open source software.
Jupyter Notebook is an interactive computational environment widely adopted by scientists from many domains. It allows users to combine code, explanatory text, and textual and visual execution results in a single, editable document. The code is divided into blocks called cells that can be individually executed as well as edited. The code written in one cell may depend on code in other cells, and those dependencies are usually codified by global variables. However, these dependencies are not easily identified or discovered, and the order in which cells were executed is not easily captured. This leads to a problem with reproducing the results of a previous analysis. Dataflows, on the other hand, explicitly encode dependencies between computational modules, leading to a well-defined order of execution. We introduce dataflow notebooks to blend the benefits of each paradigm. This solution introduces a persistent and unique identifier for each cell and encourages users to explicitly reference outputs of other cells via these identifiers. As with dataflows, we can ensure that results are never out-of-date by recursively updating dependencies as needed. The solution also enables new operations in the notebook, allowing users to see dependencies of a computation as well as selectively update downstream cells when upstream dependencies are modified. This framework improves reproducibility while maintaining the core features of notebooks.
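The mechanism the abstract describes (persistent cell identifiers, explicit references to other cells' outputs, and recursive updating of dependencies) can be illustrated with a toy model. The sketch below is not the thesis's implementation and does not use any Jupyter API; the class `DataflowNotebook`, its methods, and the cell identifiers such as `a1f3` are all hypothetical names chosen for illustration.

```python
from typing import Any, Callable, Dict, List, Set, Tuple


class DataflowNotebook:
    """Toy model (hypothetical): cells keyed by a persistent ID with explicit upstream references."""

    def __init__(self) -> None:
        # cell_id -> (function computing the cell's output, IDs of upstream cells)
        self.cells: Dict[str, Tuple[Callable[..., Any], Tuple[str, ...]]] = {}
        self.outputs: Dict[str, Any] = {}
        self.stale: Set[str] = set()
        self.log: List[str] = []  # order in which cells actually executed

    def define(self, cell_id: str, func: Callable[..., Any], deps: Tuple[str, ...] = ()) -> None:
        """Create or edit a cell; the cell and everything downstream of it become stale."""
        self.cells[cell_id] = (func, tuple(deps))
        seen: Set[str] = set()
        pending = [cell_id]
        while pending:
            current = pending.pop()
            if current in seen:
                continue
            seen.add(current)
            self.stale.add(current)
            pending.extend(self.downstream(current))

    def downstream(self, cell_id: str) -> List[str]:
        """IDs of cells that directly reference cell_id's output."""
        return [cid for cid, (_, deps) in self.cells.items() if cell_id in deps]

    def run(self, cell_id: str, _active: Tuple[str, ...] = ()) -> Any:
        """Return an up-to-date output, re-running stale upstream cells first."""
        if cell_id in _active:
            raise ValueError(f"cyclic dependency through {cell_id!r}")
        func, deps = self.cells[cell_id]
        inputs = [self.run(dep, _active + (cell_id,)) for dep in deps]
        if cell_id in self.stale:
            self.outputs[cell_id] = func(*inputs)
            self.stale.discard(cell_id)
            self.log.append(cell_id)
        return self.outputs[cell_id]


nb = DataflowNotebook()
nb.define("a1f3", lambda: [1, 2, 3])
nb.define("b7c2", lambda data: sum(data), deps=("a1f3",))
nb.define("c9e4", lambda total: total * 2, deps=("b7c2",))

result = nb.run("c9e4")              # executes a1f3, b7c2, c9e4 in dependency order
nb.define("a1f3", lambda: [10, 20])  # editing an upstream cell marks all three stale
updated = nb.run("c9e4")             # upstream cells are re-run before c9e4
```

Because each cell names its inputs by identifier rather than reading global variables, the execution order follows from the dependency graph instead of from the order in which a user happened to run cells, which is the reproducibility property the abstract emphasizes.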
PDF: Patel, J.S. COE Thesis 2017 (1.16 MB)
CC BY-NC-ND V4.0 Open Access
