Abstract
Jupyter Notebook is an interactive computational environment widely adopted by scientists from many domains. It allows users to combine code, explanatory text, and textual and visual execution results in a single, editable document. The code is divided into blocks called cells that can be individually edited and executed. The code written in one cell may depend on code in other cells, and those dependencies are usually expressed through global variables. However, these dependencies are not easily identified or discovered, and the order in which cells were executed is not easily captured. This makes it difficult to reproduce the results of a previous analysis. Dataflows, on the other hand, explicitly encode dependencies between computational modules, leading to a well-defined order of execution. We introduce dataflow notebooks to blend the benefits of each paradigm. This solution introduces a persistent and unique identifier for each cell and encourages users to explicitly reference outputs of other cells via these identifiers. As with dataflows, we can ensure that results are never out-of-date by recursively updating dependencies as needed. The solution also enables new operations in the notebook, allowing users to see the dependencies of a computation as well as selectively update downstream cells when upstream dependencies are modified. This framework improves reproducibility while maintaining the core features of notebooks.
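The idea of identifier-based references with recursive updating can be illustrated with a minimal Python sketch. It is not the notebook implementation itself: the names `Cell`, `DataflowNotebook`, `define`, `run`, and `downstream`, as well as the identifiers `a1f3`, `b7c9`, and `c2d4`, are hypothetical and chosen only for illustration. The sketch assumes each cell is a function whose arguments are the outputs of the cells it explicitly references.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Set, Tuple


@dataclass
class Cell:
    """A cell with a persistent identifier and explicit upstream references."""
    cell_id: str                      # persistent, unique identifier (hypothetical format)
    func: Callable[..., Any]          # the cell's code, modeled as a function
    deps: List[str] = field(default_factory=list)  # identifiers of referenced cells
    output: Any = None
    stale: bool = True                # True until the cell has been (re)executed


class DataflowNotebook:
    """Illustrative model only; not the actual dataflow notebook API."""

    def __init__(self) -> None:
        self.cells: Dict[str, Cell] = {}

    def define(self, cell_id: str, func: Callable[..., Any], deps=()) -> None:
        """Create or edit a cell; an edit marks all downstream cells stale."""
        self.cells[cell_id] = Cell(cell_id, func, list(deps))
        for other_id in self.downstream(cell_id):
            self.cells[other_id].stale = True

    def downstream(self, cell_id: str) -> Set[str]:
        """Return every cell that depends, directly or transitively, on cell_id."""
        found: Set[str] = set()
        frontier = [cell_id]
        while frontier:
            current = frontier.pop()
            for other in self.cells.values():
                if current in other.deps and other.cell_id not in found:
                    found.add(other.cell_id)
                    frontier.append(other.cell_id)
        return found

    def run(self, cell_id: str, _active: Tuple[str, ...] = ()) -> Any:
        """Execute a cell, first recursively updating any stale dependencies."""
        if cell_id in _active:
            raise ValueError(f"cyclic dependency involving cell {cell_id}")
        cell = self.cells[cell_id]
        if not cell.stale:
            return cell.output        # result is up to date; reuse it
        inputs = [self.run(dep, _active + (cell_id,)) for dep in cell.deps]
        cell.output = cell.func(*inputs)
        cell.stale = False
        return cell.output


nb = DataflowNotebook()
nb.define("a1f3", lambda: 2)
nb.define("b7c9", lambda x: x * 10, deps=["a1f3"])
nb.define("c2d4", lambda y: y + 1, deps=["b7c9"])
result = nb.run("c2d4")               # runs a1f3, then b7c9, then c2d4

nb.define("a1f3", lambda: 5)          # editing an upstream cell
affected = nb.downstream("a1f3")      # cells that are now out of date
updated = nb.run("c2d4")              # stale upstream cells are re-run first
```

Because every reference names a cell identifier rather than a global variable, the execution order in this sketch follows from the dependency structure and does not depend on the order in which the user happened to run cells.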