
Monday, Oct. 12, 2020, 5:00–5:30 p.m., in the Jupyter in Scientific Research track

The Care and Feeding of JupyterHub for Climate Solution Models

Denton Gentry


Brief Summary

Imagine a climate solution model, originally constructed as a spreadsheet, which grew so much in success and in scale that the sheer number of files, and the toil of keeping them all working, became a significant burden. We will discuss the process of re-implementing this model in Python, and the attributes that make JupyterHub well suited to the task.


This talk covers the two-year adventure of a Python re-implementation of ~100 climate solution models originally created in Microsoft Excel. The models show how anthropogenic climate change can be substantially reversed by reducing and sequestering a trillion tons of CO2-equivalent greenhouse gas emissions through a number of solutions. The Excel versions of these models were used in the publication of the Project Drawdown book in 2017 and The Drawdown Review in 2020.

Though Excel was a fine tool for the time, the needs of the project have grown. Excel worked well when there was one file, and still when there were five; somewhat less well at 30, and then 50; and now, at 100 files, the sheer amount of toil in manually updating all of them is daunting. Researchers strongly avoid changes that would require them to open the other 99 files.

Nonetheless, the model methodology in the Excel files is well structured, with separate sheets for the major aspects of the model. The effort to reimplement the model in Python began in September 2018 at https://github.com/ProjectDrawdown/solutions/ and was able to keep the overall structure. Roughly, each sheet in the original spreadsheet has been reimplemented as a Python class serving the same purpose. The Python implementation also brings modern software infrastructure: extensive tests, distributed version control, good web support, metrics about the codebase, etc.
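As a rough illustration of the sheet-to-class mapping, consider a minimal sketch. The class and attribute names here are hypothetical, not the project's actual API: the point is that per-row formulas become methods, and summary cells become aggregations.

```python
class EmissionsSheet:
    """Hypothetical stand-in for one sheet of the original workbook."""

    def __init__(self, adoption, emissions_factor):
        self.adoption = adoption                  # units adopted, one entry per year
        self.emissions_factor = emissions_factor  # tons CO2-eq avoided per unit

    def annual_reduction(self):
        # Equivalent of a column of per-row formulas in the sheet.
        return [units * self.emissions_factor for units in self.adoption]

    def total_reduction(self):
        # Equivalent of a SUM() cell at the bottom of that column.
        return sum(self.annual_reduction())
```

Unlike a grid of cell formulas, a class like this can be unit-tested, reviewed in a pull request, and reused across all ~100 solutions.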

An outline of the talk:

  1. Model and methodology

    • Excel sheet -> Python class
    • code generation to extract data and settings from Excel files
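The code-generation step can be sketched as follows. In practice the values would be read from the workbook with an Excel library such as openpyxl; here the extracted cells are faked as a dict, and the function names are illustrative rather than the project's actual ones:

```python
def generate_settings_module(sheet_values):
    """Render extracted cell values as Python source, so the model
    no longer needs Excel at runtime."""
    lines = ["# Auto-generated from the original Excel workbook.", "SETTINGS = {"]
    for name, value in sorted(sheet_values.items()):
        lines.append(f"    {name!r}: {value!r},")
    lines.append("}")
    return "\n".join(lines)
```

The generated module can then be committed to the repository, reviewed, and imported like any other Python code.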
  2. Why Jupyter and JupyterHub?

    • scale up the number of researchers able to contribute
    • JupyterHub account creation: git pull the codebase, each researcher has their own workspace
    • Vega for visualizations, both using Altair Charts and generating Vega directly
    • ipyvolume for visualizations
    • use Voila for the main interface, with JupyterLab also available
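"Generating Vega directly" means building the chart specification as a plain data structure. A minimal sketch, assuming a simple year/value series (the field names are illustrative):

```python
def vega_lite_line_spec(years, values, title):
    """Build a minimal Vega-Lite line-chart spec by hand, as a plain dict."""
    return {
        "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
        "title": title,
        "data": {"values": [{"year": y, "value": v} for y, v in zip(years, values)]},
        "mark": "line",
        "encoding": {
            "x": {"field": "year", "type": "ordinal"},
            "y": {"field": "value", "type": "quantitative"},
        },
    }
```

Altair produces essentially the same kind of spec from a higher-level API; generating it directly gives full control when a chart falls outside what the wrapper expresses easily.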
  3. Workflow

    • allow the climate solution model work to scale to a larger number of researchers
    • with Excel, merging multiple researchers' work:
    • a. Ask, earnestly and with great conviction, exactly which spreadsheet cells each researcher changed
    • b. Reflect on the life choices that led to this point
    • c. Muddle through
    • with Python, merging multiple researchers' work:
    • a. git pull request
    • Really, only one person can work on an Excel solution at a time
    • This is a core reason for using JupyterHub: the user and account lifecycle matches our needs
  4. Operations

    • scaling: from The Littlest JupyterHub to Kubernetes
    • Voila vs JupyterLab, selectable by user
    • monitoring the service
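One way to make Voila vs. JupyterLab user-selectable on Kubernetes is KubeSpawner's profile list, which presents a choice at spawn time. A hedged config sketch (the profile names and notebook path are placeholders, not necessarily what this deployment uses):

```python
# jupyterhub_config.py -- illustrative fragment, not the project's actual config
c.KubeSpawner.profile_list = [
    {
        "display_name": "Voila dashboard (default)",
        "default": True,
        "kubespawner_override": {"default_url": "/voila/render/index.ipynb"},
    },
    {
        "display_name": "Full JupyterLab environment",
        "kubespawner_override": {"default_url": "/lab"},
    },
]
```

Researchers get the full JupyterLab environment when they need it, while casual users land directly in the Voila dashboard.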
  5. Testing

    • system tests:
    • a. Selenium test of Jupyter
    • b. automate Excel for tests
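One plausible shape for these system tests is a parity check: expected values captured from the original workbook (via Excel automation) are compared against the Python reimplementation within a tolerance. A minimal sketch, with the expected values inlined here for illustration:

```python
import math

def assert_matches_excel(python_values, excel_values, rel_tol=1e-9):
    """Fail if the Python model diverges from the Excel-derived expectations."""
    assert len(python_values) == len(excel_values)
    for py, xl in zip(python_values, excel_values):
        assert math.isclose(py, xl, rel_tol=rel_tol), (py, xl)
```

Small relative tolerances absorb floating-point differences between Excel's calculation engine and Python without hiding real regressions.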