JupyterCon 2023

No Magic Added - Deploying Multiple JupyterHubs to Multiple Clouds from one Repository
05-10, 15:00–15:30 (Europe/Paris), Gaston Berger

The International Interactive Computing Collaboration (2i2c) manages the configuration and deployment of multiple Kubernetes clusters and JupyterHubs for a range of research and education purposes, spanning not only domains, but the globe. For the sake of optimising our engineering team’s operations, we manage these deployments from a single, open infrastructure repository. This presents a challenging problem since we need to centralise information about a number of independent cloud vendors, and independent JupyterHubs whose user communities are not necessarily related. Given that each hub has an independent user base, this centralisation must not come at the cost of a community being unable to extricate their JupyterHub configuration from 2i2c’s infrastructure and deploy it elsewhere, as detailed by our Right to Replicate.

In this talk, we will discuss a recent overhaul of 2i2c’s tooling that facilitates the centralisation of information and optimal operation of the engineering team, whilst protecting a community’s Right to Replicate their infrastructure. Critical to protecting the Right to Replicate is a configuration schema for both clusters and JupyterHubs, which defines where the configuration files should live in the repository and how their contents should be structured. Each JupyterHub we deploy is defined by its own set of configuration files, which makes it simple to extricate from the repository and to deploy independently with a basic command. There is no added magic in the rest of 2i2c’s specific tooling that would prevent this.

Further tooling to optimise the deployment and management of these JupyterHubs for 2i2c’s engineering team includes:

  • A Python “deployer” module that knows how to read the configuration for a given JupyterHub on a given cluster and can perform an upgrade action
  • A function within the deployer module that can extrapolate which JupyterHubs on which cluster require an upgrade from a list of changed files in the repository (e.g. from a Pull Request)
  • A GitHub Actions workflow that can deploy to multiple clusters in parallel, deploy production JupyterHubs in parallel, implement Canary deployments using staging JupyterHubs, and intelligently prevent a Canary deployment failure affecting the deployments on an unrelated cluster
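
As a rough illustration of the second item above, the following sketch maps a list of changed files to the hubs that would need an upgrade. It is not 2i2c's actual deployer code. The directory layout (`config/clusters/<cluster>/cluster.yaml` and `config/clusters/<cluster>/<hub>.values.yaml`), the function name `hubs_requiring_upgrade`, and the `"*"` marker for "every hub on this cluster" are all assumptions made for the example.

```python
from collections import defaultdict
from pathlib import PurePosixPath

# Hypothetical layout, assumed for illustration only:
#   config/clusters/<cluster>/cluster.yaml       -> cluster-wide configuration
#   config/clusters/<cluster>/<hub>.values.yaml  -> configuration for one hub
CONFIG_ROOT = ("config", "clusters")
HUB_SUFFIX = ".values.yaml"
ALL_HUBS = "*"  # marker meaning "every hub on this cluster"


def hubs_requiring_upgrade(changed_files):
    """Map each affected cluster name to the set of hubs that need an upgrade."""
    affected = defaultdict(set)
    for name in changed_files:
        parts = PurePosixPath(name).parts
        # Ignore anything that is not directly inside a cluster's config folder.
        if len(parts) != 4 or parts[:2] != CONFIG_ROOT:
            continue
        cluster, filename = parts[2], parts[3]
        if filename == "cluster.yaml":
            # A cluster-wide change touches every hub on that cluster.
            affected[cluster].add(ALL_HUBS)
        elif filename.endswith(HUB_SUFFIX):
            affected[cluster].add(filename[: -len(HUB_SUFFIX)])
    # A cluster-wide change supersedes any per-hub entries for that cluster.
    return {
        cluster: ({ALL_HUBS} if ALL_HUBS in hubs else hubs)
        for cluster, hubs in affected.items()
    }


if __name__ == "__main__":
    changed = [
        "config/clusters/alpha/staging.values.yaml",
        "config/clusters/beta/cluster.yaml",
        "config/clusters/beta/prod.values.yaml",
        "README.md",
    ]
    print(hubs_requiring_upgrade(changed))
```

Under these assumptions, a change to a single hub's file triggers an upgrade of only that hub, a change to a cluster-wide file triggers an upgrade of every hub on that cluster, and unrelated files such as `README.md` trigger nothing.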

Details of these efforts were first published in the “Tech update: Multiple JupyterHubs, multiple clusters, one repository” blog post.

Sarah Gibson is an Open Source Infrastructure Engineer at 2i2c, an open source contributor and advocate. She has more than two years of experience as a Research Engineer at a national institute for data science and artificial intelligence, and holds a core contributor role in the open source projects Binder, JupyterHub, and the Turing Way. Sarah is passionate about working with domain experts to leverage cloud computing in order to accelerate cutting-edge, data-intensive research and to disseminate the results in an open, reproducible and reusable manner. Sarah holds a Fellowship with the Software Sustainability Institute and advocates for best software practices in research. She is a member of the mybinder.org operating team and maintains infrastructure supporting a global community in sharing reproducible computational environments. She has also mentored projects through two cohorts of the Open Life Science programme, sharing her lived experience of participating in and leading open science projects.

Co-founder at 2i2c.org. Ex Wikimedia, ex GNOME. On a motorcycle or watching Star Trek or texting someone when not on a computer. Death to accidental complexity.


Georgiana Dolocan is an Open Source Infrastructure Engineer at 2i2c and a JupyterHub team member.
Georgiana cares about building inclusive communities and open work practices. She served in the JupyterHub Contributor in Residence role, after getting involved with the community through an Outreachy internship. She has now switched roles and mentored an Outreachy intern, using her own experience in this position to grow the community.
You can follow Georgiana's work on GitHub at @GeorgianaElena.

Father, Software Developer, Quant, (formerly) Biochemist, and some other things ;-) Currently living between Córdoba and Buenos Aires, Argentina. I have made some contributions to popular Open Source projects such as Jupyter, Nikola, and Bokeh. I have also started several projects, of which RISE (a “live” slideshow for the Jupyter notebook) is the most popular. You can easily find videos of some of my talks and tutorials at multiple national and international conferences. How can I help?