NBSafety: Fine-Grained Lineage Tracking for Safer Jupyter Notebooks

Stephen Macke

Audience level:
Novice

Brief Summary

Jupyter notebook state is difficult to reason about, inspiring colorful talks such as Joel Grus' "I don't like Notebooks", presented at JupyterCon 2018. To address this problem, we developed a custom kernel, https://github.com/nbsafety-project/nbsafety, that highlights cells that a) are likely unsafe to execute and b) are likely useful to re-execute, all without requiring changes to user behavior.

Outline

Jupyter notebooks have revolutionized the analytical workflows of scientists and engineers. By keeping intermediate program state in memory and segmenting units of execution into so-called “cells”, notebooks allow users to execute their workflows interactively and enjoy particularly tight feedback. However, as cells are added, removed, reordered, and rerun, this hidden intermediate state accumulates in a way that is not necessarily correlated with the code visible in the notebook’s cells, making execution behavior difficult to reason about. To address some of these shortcomings, we present NBSafety, a custom Jupyter kernel and frontend that uses a combination of runtime tracing and static analysis to automatically manage lineage metadata associated with cell execution and global state. NBSafety highlights cells that, if executed, would likely give counter-intuitive results due to stale references. At the same time, NBSafety also highlights cells that would resolve staleness issues.
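To make the stale-reference idea concrete, here is a minimal sketch of how staleness could be tracked. It is not NBSafety's implementation: it uses only static analysis via Python's standard `ast` module, with no runtime tracing, and the names `ToyLineage`, `run`, `is_stale`, and `classify` are hypothetical. It assumes each cell is plain Python with simple variable assignments.

```python
import ast


def names(code, ctx):
    """Return the variable names that `code` uses in the given AST context."""
    tree = ast.parse(code)
    return {
        node.id
        for node in ast.walk(tree)
        if isinstance(node, ast.Name) and isinstance(node.ctx, ctx)
    }


class ToyLineage:
    """Hypothetical toy tracker; not NBSafety's real data model."""

    def __init__(self):
        self.clock = 0        # counts cell executions
        self.defined_at = {}  # symbol -> clock value of its last assignment
        self.parents = {}     # symbol -> symbols read by the assigning cell

    def run(self, code):
        """Record the lineage effects of a cell (the code itself is not executed)."""
        self.clock += 1
        reads = names(code, ast.Load)
        for sym in names(code, ast.Store):
            self.defined_at[sym] = self.clock
            self.parents[sym] = reads - {sym}

    def is_stale(self, sym, seen=None):
        """A symbol is stale if an ancestor was reassigned after it was computed."""
        seen = set() if seen is None else seen
        if sym in seen or sym not in self.defined_at:
            return False
        seen.add(sym)  # guards against cycles such as `a, b = b, a`
        return any(
            self.defined_at.get(p, 0) > self.defined_at[sym] or self.is_stale(p, seen)
            for p in self.parents[sym]
        )

    def classify(self, code):
        """Label a cell as 'unsafe', 'refresh', or 'ok' before it is executed."""
        if any(self.is_stale(s) for s in names(code, ast.Load)):
            return "unsafe"   # would read an out-of-date value
        if any(self.is_stale(s) for s in names(code, ast.Store)):
            return "refresh"  # would recompute an out-of-date value
        return "ok"


nb = ToyLineage()
nb.run("x = 1")      # cell 1
nb.run("y = x + 1")  # cell 2
nb.run("x = 10")     # cell 1 edited and rerun, so y is now out of date

unsafe = nb.classify("print(y)")    # reads stale y
refresh = nb.classify("y = x + 1")  # would recompute stale y
nb.run("y = x + 1")                 # rerunning cell 2 resolves the staleness
resolved = nb.classify("print(y)")
```

In this toy session, `print(y)` is flagged as unsafe after `x` is reassigned, the cell `y = x + 1` is suggested for re-execution, and `print(y)` becomes safe again once that cell is rerun. The sketch ignores attribute and subscript mutations, function bodies, and augmented assignments, all of which a real tool would need to handle.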

Thesis and key takeaways: Our thesis is that NBSafety reduces the cognitive overhead of managing Jupyter notebooks' hidden global state, and does so in a way that does not require users to give up the notebooks they already know and love. By analyzing a collection of more than 2000 real Jupyter sessions, we show that a) the cells highlighted by NBSafety as unsafe are, in fact, avoided by users, and b) the cells highlighted by NBSafety as re-execution suggestions are, in fact, often picked for re-execution.

Background knowledge: Anybody who has used notebooks before will be able to follow our presentation and understand the benefits provided by our software, though some background in static program analysis may be helpful for understanding the more technical aspects.

Goals: We hope that audience members who visit our poster will come away convinced that NBSafety helps reduce some of the error-proneness in Jupyter notebooks identified by Joel Grus at JupyterCon 2018.

Our FOSS code is available at https://github.com/nbsafety-project/nbsafety, and we are excited to share it with the community, as we believe it represents an important step in making Jupyter notebooks easier to work with than they already are. Based on our personal experience, we think that NBSafety is "The Jupyter kernel you didn't know you needed".

Finally, here is a video where we presented an earlier version of NBSafety at the spring 2020 UC Berkeley RISELab retreat: https://www.dropbox.com/s/3okag0hvo160nli/nbsafety-stephen-macke.mp4?dl=0