JupyterCon 2023

Thijs van der Plas

DPhil student at University of Oxford.



Reproducible figures for scientific publication
Thijs van der Plas

Reproducible Figures (RFs) are (multi-panel) figures that are produced completely and only by code, and can therefore be reproduced by anyone. By definition, RFs are not compiled or edited in illustration software, whose editing steps cannot be traced in the final product. RFs therefore have the potential to accelerate scientific progress while putting the scientific principles of rigor and reproducibility at the forefront. Firstly, because an RF is uniquely defined by the code that created it, all underlying data analysis and statistical methods are traceable, which aids reproducibility and makes it easier for others to extend the analysis to different data or parameters. Furthermore, avoiding repeated figure-making saves time across versions, projects and people, creating a resource for the community. These advantages are recognised by the broader scientific community [Samota & Davey, 2021], and scientific journals such as eLife have recently called for a move towards reproducible figures (e.g., through the use of executable publications) [Maciocci et al., 2019].

Even though modern programming languages and packages are capable of creating RFs, as demonstrated by a growing body of RF-featuring publications*, many scientists (who are not software developers) are unfamiliar with these tools, inexperienced with them or hesitant to use them, while guidelines remain scarce [Lasser, 2020]. In fact, although many advanced visualisation packages exist [Bokeh, 2018; Waskom, 2021], what often holds scientists back from creating fully reproducible figures is the difficulty of fine-tuning basic figure elements. These tasks, such as adding panel labels or aligning elements across panels, are often simple in illustration software, but non-trivial in code.

We have created a Python package, based on matplotlib [Hunter, 2007], of functions that perform these basic tasks, which are essential for creating RFs of publication quality. Our package includes a complete walk-through tutorial of all functions, as well as demonstrations of common customisation operations (that are too user- or project-specific to warrant new functions). These include aligning axis limits, eliminating unnecessary and duplicate items, aligning panel labels, and customising labels and legends.
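To give a flavour of the kind of low-level operation meant here, the following is a minimal sketch using only standard matplotlib calls. It is not the package's actual API: the function names (add_panel_labels, align_y_limits, remove_duplicate_ylabels, make_figure), the label positions and the synthetic data are all hypothetical and chosen for illustration only.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the script also runs headless
import matplotlib.pyplot as plt
import numpy as np


def add_panel_labels(axes, x=-0.15, y=1.05):
    """Label panels 'A', 'B', ... at the same axes-relative position (hypothetical helper)."""
    labels = []
    for i, ax in enumerate(axes):
        # transform=ax.transAxes places the text relative to each panel,
        # so the labels line up across panels regardless of the data range.
        text = ax.text(x, y, chr(ord("A") + i), transform=ax.transAxes,
                       fontweight="bold", va="bottom", ha="right")
        labels.append(text)
    return labels


def align_y_limits(axes):
    """Give all panels the same y-limits, spanning the widest range found (hypothetical helper)."""
    lows, highs = zip(*(ax.get_ylim() for ax in axes))
    shared = (min(lows), max(highs))
    for ax in axes:
        ax.set_ylim(shared)
    return shared


def remove_duplicate_ylabels(axes):
    """Keep the y-axis label and tick labels on the first panel only (hypothetical helper)."""
    for ax in axes[1:]:
        ax.set_ylabel("")
        ax.tick_params(labelleft=False)


def make_figure():
    """Build a two-panel figure from synthetic data with a fixed random seed."""
    rng = np.random.default_rng(0)
    x = np.linspace(0, 1, 50)
    fig, axes = plt.subplots(1, 2, figsize=(6, 2.5))
    axes[0].plot(x, x + 0.05 * rng.standard_normal(x.size))
    axes[1].plot(x, 2 * x + 0.05 * rng.standard_normal(x.size))
    for ax in axes:
        ax.set_xlabel("Time (s)")
        ax.set_ylabel("Signal (a.u.)")
        # Remove the top and right spines, which carry no information here.
        ax.spines["top"].set_visible(False)
        ax.spines["right"].set_visible(False)
    align_y_limits(axes)
    remove_duplicate_ylabels(axes)
    labels = add_panel_labels(axes)
    return fig, axes, labels
```

Calling make_figure() and then fig.savefig("figure.pdf") on the returned figure would write the result to disk. Because the data, seed and layout are all fixed in code, rerunning the script should yield the same figure without any manual editing step.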

Hence, our contribution is not a new set of advanced visualisations, but rather short, low-level functions and tutorials that perform actions that are easy in illustration software but unintuitive to many entry-level programmers. Time is currently wasted on manually compiling figures, whose lack of reproducibility slows down scientific progress. This package will enable more scientists who are not software developers by training to easily create RFs for their research.

*: For a collection see https://github.com/jupyter/jupyter/wiki#reproducible-academic-publications, as well as publications by the authors:


Bokeh Development Team (2018). Bokeh: Python library for interactive visualization. http://www.bokeh.pydata.org

Hunter, J. D. (2007). Matplotlib: A 2D Graphics Environment. Computing in Science & Engineering, 9(3), 90-95.

Lasser, J. (2020). Creating an executable paper is a journey through Open Science. Communications Physics, 3(1), 1-5. https://www.nature.com/articles/s42005-020-00403-4

Maciocci, G., Aufreiter, M., & Bentley, N. (2019). Introducing eLife’s first computationally reproducible article. eLife Labs, https://elifesciences.org/labs/ad58f08d/introducing-elife-s-first-computationally-reproducible-article

Samota, E. K., & Davey, R. P. (2021). Knowledge and Attitudes Among Life Scientists Toward Reproducibility Within Journal Articles: A Research Survey. Frontiers in Research Metrics and Analytics, 35. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8276979/

Waskom, M. L. (2021). Seaborn: statistical data visualization. Journal of Open Source Software, 6(60), 3021.

Jupyter for Research and Scientific Discoveries
Poster Placeholder