JupyterCon 2023

Computational reproducibility of Jupyter notebooks from biomedical publications
05-11, 15:00–15:30 (Europe/Paris), Louis Armand 1

In this talk, we present a study of the computational reproducibility of Jupyter notebooks extracted from GitHub repositories associated with publications indexed in the biomedical literature repository PubMed Central. We will walk through the key steps of the pipeline we used to assess the reproducibility of these notebooks. The study is based on metadata extracted from 1419 publications in PubMed Central, published in 373 journals. From the 1117 GitHub repositories associated with these publications, a total of 9625 Jupyter notebooks were downloaded for further reproducibility analysis. The code for the pipeline is adapted from Felipe et al., 2019. We will discuss the results of the study, including variables such as the programming languages, notebook structure, naming practices, modules, and dependencies that we found in these notebooks. We will then zoom in on common problems and practices, highlight trends, and discuss potential improvements to Jupyter-related workflows associated with biomedical publications. This talk is aimed at researchers who use Jupyter notebooks to publish their results in public repositories, and it is intended to help them adopt best practices when documenting their research. The slides are available via the DOI 10.5281/zenodo.7854503.
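
To make the kind of per-notebook variables mentioned above more concrete, here is a minimal sketch of how a few of them (language, cell counts, imported modules, execution order) could be read from a notebook file. It is not the pipeline used in the study; the function name `summarize_notebook`, the import heuristic, and the example notebook are assumptions made purely for illustration, and only the standard nbformat 4 JSON layout is relied upon.

```python
import json
import re
from collections import Counter

# Simple heuristic (an assumption, not the study's method): capture the
# top-level module of "import x" / "from x import y" lines. For a line such
# as "import os, sys" only the first module is recorded.
IMPORT_RE = re.compile(r"^\s*(?:import|from)\s+([A-Za-z_]\w*)")


def summarize_notebook(nb: dict) -> dict:
    """Collect a few descriptive variables from a parsed nbformat-4 notebook."""
    cells = nb.get("cells", [])
    metadata = nb.get("metadata", {})

    # The kernel language is usually recorded in the notebook metadata.
    language = (
        metadata.get("kernelspec", {}).get("language")
        or metadata.get("language_info", {}).get("name")
        or "unknown"
    )

    cell_types = Counter(cell.get("cell_type", "unknown") for cell in cells)

    modules = set()
    execution_counts = []
    for cell in cells:
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        if isinstance(source, list):  # nbformat allows a list of lines
            source = "".join(source)
        for line in source.splitlines():
            match = IMPORT_RE.match(line)
            if match:
                modules.add(match.group(1))
        count = cell.get("execution_count")
        if count is not None:  # unexecuted cells have no count
            execution_counts.append(count)

    # Cells run top to bottom show strictly increasing execution counts.
    in_order = all(a < b for a, b in zip(execution_counts, execution_counts[1:]))

    return {
        "language": language,
        "n_code_cells": cell_types["code"],
        "n_markdown_cells": cell_types["markdown"],
        "modules": sorted(modules),
        "executed_in_order": in_order,
    }


# Hypothetical in-memory notebook; a real file would be loaded with json.load().
example = {
    "nbformat": 4,
    "metadata": {"kernelspec": {"language": "python", "name": "python3"}},
    "cells": [
        {"cell_type": "markdown", "source": ["# Analysis\n"]},
        {
            "cell_type": "code",
            "execution_count": 1,
            "source": ["import numpy as np\n", "from scipy import stats\n"],
        },
        {"cell_type": "code", "execution_count": 3, "source": "x = np.arange(3)\n"},
        {"cell_type": "code", "execution_count": 2, "source": "print(x)\n"},
    ],
}

summary = summarize_notebook(example)
print(json.dumps(summary, indent=2))
```

In this example the last two code cells were executed out of order, which is one of the simple signals that a notebook may not reproduce when run from top to bottom.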

A biophysicist interested in integrating open research and education workflows with the web.

ORCID: https://orcid.org/0000-0001-9488-1870

Scholia: https://scholia.toolforge.org/author/Q20895785

The work presented here is the result of an ongoing collaboration with Sheeba Samuel.