
Tuesday, Oct. 13, 2020, 4:45–5:15 p.m., in Jupyter Community: Tools

ProvBook: Capturing and Visualizing Provenance in Jupyter Notebooks for Reproducibility

Sheeba Samuel

Audience level:
Intermediate

Brief Summary

We introduce ProvBook, an extension of Jupyter Notebooks that captures, stores, describes, visualizes, queries, and compares the provenance of Jupyter Notebooks to support reproducible research. It captures both prospective and retrospective provenance, including the execution history of a notebook. It helps users compare results across different executions and across different authors of a notebook.

Outline

Jupyter Notebooks have gained widespread adoption in scientific research and education. They are used at various stages of the research data lifecycle, including data curation, exploration, processing, analysis, and publication of results. One reason for their widespread adoption is that they support and enable computational reproducibility. However, recent research (https://doi.org/10.1145/3173574.3173606, https://doi.org/10.1109/MSR.2019.00077) has highlighted the need for provenance support in Jupyter Notebooks. The Oxford Dictionary defines provenance as "the source or origin of an object; its history". In Jupyter Notebooks, cells can be re-run in any order and their outputs are overwritten each time, which can lead to the loss of previous results and, with them, of provenance information.
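The loss of previous results described above can be illustrated with a minimal Python sketch, assuming a simplified in-memory model of a notebook cell. The `Cell`, `Execution`, and `compare` names and the `execute` callable (a stand-in for sending code to a kernel) are hypothetical and are not ProvBook's API; the sketch only shows the general idea of appending each run to an execution history instead of discarding the previous output.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Callable, List


@dataclass
class Execution:
    """One recorded run of a cell: the code, its timing, and its outputs."""
    source: str
    start: datetime
    end: datetime
    outputs: List[Any]


@dataclass
class Cell:
    source: str
    outputs: List[Any] = field(default_factory=list)
    # Every run is appended here instead of replacing the previous one.
    history: List[Execution] = field(default_factory=list)

    def run(self, execute: Callable[[str], List[Any]]) -> List[Any]:
        """Run the cell via `execute` (a stand-in for a kernel call) and
        record the run, so earlier results stay available."""
        start = datetime.now(timezone.utc)
        outputs = execute(self.source)
        end = datetime.now(timezone.utc)
        self.outputs = outputs  # a plain notebook keeps only this
        self.history.append(Execution(self.source, start, end, outputs))
        return outputs


def compare(a: Execution, b: Execution) -> dict:
    """Summarize how two recorded executions of a cell differ."""
    return {
        "source_changed": a.source != b.source,
        "outputs_changed": a.outputs != b.outputs,
        "duration_a_s": (a.end - a.start).total_seconds(),
        "duration_b_s": (b.end - b.start).total_seconds(),
    }


if __name__ == "__main__":
    counter = iter(range(1, 100))
    cell = Cell("next(counter)")
    cell.run(lambda src: [next(counter)])
    cell.run(lambda src: [next(counter)])
    print(cell.outputs)                        # [2]: only the latest result
    print([e.outputs for e in cell.history])   # [[1], [2]]: both runs kept
    print(compare(*cell.history)["outputs_changed"])  # True
```

Keeping `outputs` and `history` separate mirrors the notebook's usual behavior of showing only the latest result, while earlier runs remain available for visualization and comparison.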
In this talk, we give an overview of ProvBook, which provides a range of features to support provenance in Jupyter Notebooks. The tool aims to help scientists repeat and reproduce experiments, and to help teachers and students in their daily educational activities by letting them visualize and compare results. ProvBook captures and stores the provenance of each execution of a notebook. It can also describe this provenance information in the Resource Description Framework (RDF) [https://www.w3.org/TR/rdf11-concepts/] and lets users download it. Users can visualize the execution history of each cell inside the notebook and compare the results of different executions, including comparing their own results with those of the notebook's original author. All of these features are available through a user-friendly interface within Jupyter Notebook, and because ProvBook is developed as a Jupyter Notebook extension, it is easy to install.
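As an illustration of describing execution provenance in RDF, the sketch below serializes one recorded cell execution to Turtle using the W3C PROV-O vocabulary. The choice of PROV-O terms, the placeholder `ex:` namespace, the `execution_to_turtle` function, and the identifier scheme (`cell1_run1` and so on) are assumptions for illustration; ProvBook's actual vocabulary and output format may differ. Only the Python standard library is used, so the Turtle text is assembled by hand.

```python
from datetime import datetime, timezone

# Vocabulary: W3C PROV-O. The ex: namespace is a hypothetical placeholder.
PREFIXES = (
    "@prefix prov: <http://www.w3.org/ns/prov#> .\n"
    "@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .\n"
    "@prefix ex:   <http://example.org/notebook/> .\n"
)


def literal(text: str) -> str:
    """Quote a string as a Turtle literal, escaping the characters that
    Turtle does not allow unescaped in a double-quoted string."""
    escaped = (text.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def execution_to_turtle(cell_id: str, run: int, source: str, output: str,
                        start: datetime, end: datetime) -> str:
    """Describe one execution of a cell as PROV-O triples in Turtle:
    the run is a prov:Activity that used the cell's source code and
    generated its output, each modeled as a prov:Entity."""
    act = f"ex:{cell_id}_run{run}"
    code = f"ex:{cell_id}_source{run}"
    out = f"ex:{cell_id}_output{run}"
    return PREFIXES + "\n".join([
        f"{code} a prov:Entity ; prov:value {literal(source)} .",
        f"{act} a prov:Activity ;",
        f"    prov:used {code} ;",
        f"    prov:startedAtTime {literal(start.isoformat())}^^xsd:dateTime ;",
        f"    prov:endedAtTime {literal(end.isoformat())}^^xsd:dateTime .",
        f"{out} a prov:Entity ; prov:value {literal(output)} ;",
        f"    prov:wasGeneratedBy {act} .",
    ]) + "\n"


if __name__ == "__main__":
    t0 = datetime(2020, 10, 13, 16, 45, tzinfo=timezone.utc)
    t1 = datetime(2020, 10, 13, 16, 45, 2, tzinfo=timezone.utc)
    print(execution_to_turtle("cell1", 1, 'print("hi")', "hi\n", t0, t1))
```

A fuller implementation would more likely use an RDF library such as rdflib, which handles escaping and multiple serialization formats; building the string by hand here only keeps the example dependency-free.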