The cloud is the limit: scaling limits of JupyterHub

Min Ragan-Kelley

Brief Summary

JupyterHub began as a strictly scoped project for "small-scale" Jupyter deployments. Times change. With increasing pressure to scale JupyterHub to ever larger deployments, the current architecture is reaching its limits. See an overview of the factors currently limiting JupyterHub's scale and what would need to change to make JupyterHub truly scalable.


In 2015, we created JupyterHub with a primary use case in mind: a shared workstation in a research group with tens of users. From day one, we explicitly declared large, highly available, scalable deployments out of scope. And then we started having users :) Now in 2020, Zero to JupyterHub with Kubernetes is one of the main mechanisms folks use to install JupyterHub, and deployments routinely have several hundred concurrently active users. But there are still relatively modest limits to what a single JupyterHub deployment can handle, and there are regular inquiries about even larger-scale and "high availability" deployments.

However, the current architecture makes some choices in the interest of simplicity that are at odds with modern design practices for scalable applications. We will present an overview of those components of JupyterHub and the work needed to bring JupyterHub to the next level of scale and stability.

After this poster, you will have an understanding of why JupyterHub works the way it does, where the limits are for scalability, and what needs to be done to make it highly available and scalable.