Dataflow notebooks: enhancing computational notebooks : a thesis in Computer Science

Jay S. Patel

doi:10.62791/19913

Thesis

Open access

Master of Science (MS), University of Massachusetts Dartmouth

2017

DOI:

https://doi.org/10.62791/19913

Abstract

Application software -- Development.

Open source software.

Jupyter Notebook is an interactive computational environment widely adopted by scientists from many domains. It allows users to combine code, explanatory text, and textual and visual execution results in a single, editable document. The code is divided into blocks called cells that can be individually executed as well as edited. The code written in one cell may depend on code in other cells, and those dependencies are usually codified by global variables. However, these dependencies are not easily identified or discovered, and the order in which cells were executed is not easily captured. This makes it difficult to reproduce the results of a previous analysis. Dataflows, on the other hand, explicitly encode dependencies between computational modules, leading to a well-defined order of execution. We introduce dataflow notebooks to blend the benefits of each paradigm. This solution introduces a persistent and unique identifier for each cell and encourages users to explicitly reference outputs of other cells via these identifiers. As with dataflows, we can ensure that results are never out-of-date by recursively updating dependencies as needed. The solution also enables new operations in the notebook, allowing users to see dependencies of a computation as well as selectively update downstream cells when upstream dependencies are modified. This framework improves reproducibility while maintaining the core features of notebooks.
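
The idea summarized in the abstract can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the class names `Cell` and `DataflowNotebook`, the methods `add_cell`, `run`, and `downstream`, and the cell identifiers `a1`, `b2`, and `c3` are all hypothetical, chosen only to show how explicit references by identifier allow stale upstream cells to be re-executed recursively and downstream cells to be listed.

```python
from typing import Callable, Dict, List, Optional, Set


class Cell:
    """A cell with a persistent identifier and explicit upstream references."""

    def __init__(self, cell_id: str, func: Callable[..., object], deps: List[str]):
        self.cell_id = cell_id
        self.func = func      # stands in for the cell's code
        self.deps = deps      # identifiers of the cells whose outputs it uses
        self.output: object = None
        self.stale = True     # True until the cell has run on current inputs


class DataflowNotebook:
    """Hypothetical container illustrating backward and forward dependencies."""

    def __init__(self) -> None:
        self.cells: Dict[str, Cell] = {}

    def add_cell(self, cell_id: str, func: Callable[..., object], deps=()) -> None:
        # Adding or redefining a cell invalidates everything that depends on it.
        self.cells[cell_id] = Cell(cell_id, func, list(deps))
        for other_id in self.downstream(cell_id):
            self.cells[other_id].stale = True

    def run(self, cell_id: str, _active: Optional[Set[str]] = None) -> object:
        # Backward dependency: bring stale upstream cells up to date first.
        active = set() if _active is None else _active
        if cell_id in active:
            raise ValueError(f"cyclic dependency through cell {cell_id}")
        cell = self.cells[cell_id]
        if not cell.stale:
            return cell.output
        active.add(cell_id)
        args = [self.run(dep, active) for dep in cell.deps]
        active.discard(cell_id)
        cell.output = cell.func(*args)
        cell.stale = False
        return cell.output

    def downstream(self, cell_id: str) -> Set[str]:
        # Forward dependency: every cell that directly or indirectly uses cell_id.
        found: Set[str] = set()
        frontier = [cell_id]
        while frontier:
            current = frontier.pop()
            for other in self.cells.values():
                if current in other.deps and other.cell_id not in found:
                    found.add(other.cell_id)
                    frontier.append(other.cell_id)
        return found


nb = DataflowNotebook()
nb.add_cell("a1", lambda: 2)
nb.add_cell("b2", lambda a: a * 10, deps=["a1"])
nb.add_cell("c3", lambda b: b + 1, deps=["b2"])

result = nb.run("c3")          # runs a1, then b2, then c3: 21
nb.add_cell("a1", lambda: 5)   # redefining a1 marks b2 and c3 as stale
updated = nb.run("c3")         # upstream cells are re-run as needed: 51
```

In this sketch, requesting `c3` never returns a value computed from an outdated `a1`, and `nb.downstream("a1")` lists the cells a user might choose to update after editing `a1`.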

Files and links (1)

pdf

Patel, J.S. COE Thesis 2017 (1.16 MB)

CC BY-NC-ND V4.0, Open Access

Details

Title: Dataflow notebooks
Creators: Jay S. Patel
ORCID: 0000-0002-1474-3476
Contributors: David Koop (Advisor) - University of Massachusetts Dartmouth
Haiping Xu (Committee Member) - University of Massachusetts Dartmouth, Department of Computer and Information Science
Ming Shao (Committee Member) - University of Massachusetts Dartmouth, Department of Computer and Information Science
Number of pages: ix, 53 pages
Illustrations: illustrations
Table of contents: Chapter 1 : Introduction -- Chapter 2 : Background. History ; Related work ; Current behavior -- Chapter 3 : Problems with Jupyter Notebook -- Chapter 4 : Solutions. Implementation of persistent and unique identifier ; Improving Reproducibility -- Chapter 5 : Dependency execution. Backward dependency ; Forward dependency ; Implementation ; Dependency graph -- Chapter 6 : Discussions and conclusion. Trade-offs ; Conclusion -- References.
References: Includes bibliographical references (pages 51-53).
Awarding Institution: University of Massachusetts Dartmouth
Degree Awarded: Master of Science (MS)
Degree in: Computer Science
Academic Unit: Department of Computer and Information Science
Language: English
Resource Type: Thesis
DOI: https://doi.org/10.62791/19913
Record Identifier: 9914424911001301