JupyterCon 2023

05:00
05:00
90min
Wednesday, 1st day
Louis Armand 2
08:00
08:00
60min
Badges
Gaston Berger
09:00
09:00
15min
Intro remarks

Intro remarks

MISC
Gaston Berger
09:15
09:15
45min
GitHub Keynote
Craig Peters, Cory Gwin

In this keynote, you will learn how GitHub expands the reach of the amazing Jupyter technologies.

Keynotes
Gaston Berger
10:00
10:00
30min
Break
Gaston Berger
10:00
30min
Break
Louis Armand 1
10:00
30min
Break
Louis Armand 2
10:30
10:30
30min
Creating interactive Jupyter websites with JupyterLite
Jeremy Tuloup

Jupyter notebooks are a popular tool for data science and scientific computing, allowing users to mix code, text, and multimedia in a single document. However, sharing Jupyter notebooks can be challenging, as they require installing a specific software environment to be viewed and executed.

JupyterLite is a Jupyter distribution that runs entirely in the web browser without any server components. A significant benefit of this approach is the ease of deployment. With JupyterLite, the only requirement to provide a live computing environment is a collection of static assets. In this talk, we will show how you can create such a static website and deploy it to your users.

We will cover the basics of JupyterLite, including how to use its command-line interface to generate and customize the appearance and behavior of your Jupyter website. This will be a guided walkthrough with step-by-step instructions for adding content, extensions and configuration.

By the end of this talk, you will have a good understanding of how JupyterLite works and how you can use it to create interactive websites.

Outline:

  • Introduction to Jupyter and JupyterLite
  • Examples of JupyterLite used for interactive documentation and educational content (NumPy, Try Jupyter, SymPy)
  • Step-by-step demo for creating a Jupyter website
    • Quickstart with the demo repository
    • Adding content: notebooks, files and static assets
    • Adding extensions to the user interface
    • Adding packages to the Python runtime
    • Customization and custom settings
  • Deploy JupyterLite as a static website on GitHub Pages, Vercel or your own server
  • Conclusion and next steps for learning more about the Jupyter ecosystem

The talk will be based on resources already publicly available:

  • try JupyterLite in your browser: https://jupyterlite.github.io/demo/
  • the JupyterLite documentation: https://jupyterlite.readthedocs.io/en/latest/quickstart/deploy.html
  • the JupyterLite repositories: https://github.com/jupyterlite
Community: Tools and Practices
Gaston Berger
10:30
150min
Dask and Distributed Computing
Matthew Rocklin, Hendrik Makait

Learn how to run the Python data science ecosystem in parallel with Dask in this hands-on tutorial.

Dask is an open source library for parallel computing in Python with deep integration with common Python libraries like numpy, pandas, xgboost, pytorch, xarray, and of course Jupyter. In this hands-on tutorial we will launch clusters of distributed machines, and use those clusters to process and analyze data on the cloud.
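As a taste of what the hands-on portion looks like, here is a minimal, hedged sketch of the Dask pattern the tutorial builds on (a local cluster and a made-up dataset stand in for the cloud clusters used in the session):

from dask.distributed import Client
import dask.dataframe as dd

# A local cluster is enough to follow along; the tutorial swaps this for a
# cluster of distributed machines on the cloud.
client = Client()

df = dd.read_csv("data-*.csv")               # hypothetical partitioned dataset
result = df.groupby("key")["value"].mean()   # builds a lazy task graph
print(result.compute())                      # executes the graph on the cluster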

Students should be mildly familiar with Python and Pandas syntax, and be interested in the challenges of large scale computation.

Tutorial
Room 3 (Tutorial)
10:30
55min
Let the right one in: Custom Authentication in JupyterHub
Steen Manniche

JupyterHub provides a customizable authentication system for managing user access, but its default authentication methods may not always be suitable for all organizations. This talk will discuss the capabilities of JupyterHub for custom authentication, and will provide examples and best practices for customizing the user environment based on identity providers (IdPs) and the attributes hosted with them. The talk will dive into managing custom authentication with JupyterHub and using various IdPs to integrate JupyterHub into larger organizations.

If you'd like to follow along during the talk, bring a laptop with git, docker and docker-compose installed and clone the repository from https://gitlab.com/adamatics/let-the-right-one-in
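For orientation before the talk, the central extension point is JupyterHub's Authenticator class. Below is a minimal, hedged sketch of a custom authenticator placed in a jupyterhub_config.py (purely illustrative; it is not the code presented in the talk):

from jupyterhub.auth import Authenticator

class DemoAuthenticator(Authenticator):
    # Illustrative only: accept any user who presents a shared secret.
    async def authenticate(self, handler, data):
        # `data` holds the submitted login form: {"username": ..., "password": ...}
        if data.get("password") == "not-a-real-secret":
            return data["username"]   # returning a username accepts the login
        return None                   # returning None rejects it

# `c` is the config object available inside jupyterhub_config.py
c.JupyterHub.authenticator_class = DemoAuthenticator

Real deployments would point authenticator_class at an LDAP or OAuth2 authenticator instead, which is the territory the talk covers.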

Highlights of the talk:

  • A brief introduction to the JupyterHub authentication module
  • Integrating AD/LDAP and OAuth2 with JupyterHub
  • Benefits of SSO authentication in JupyterHub, including improved security and user experience
  • Best practices for customizing the spawning of the user environment based on the authentication process
Enterprise Jupyter Infrastructure
Room 1
10:30
30min
Taipy or how to build stunning Python Applications from your Jupyter Notebooks
Vincent Gosselin, Marine Gosselin

This talk presents Taipy, a new low-code Python package that allows you to create complete Data (Science) applications, including graphical visualization and the management of data, algorithms, models, and pipelines.
It is composed of two main independent components:
  • Taipy Core
  • Taipy GUI

Taipy GUI enables any Python developer to build highly interactive graphics in no time.

Similarly, Taipy Core provides a natural way to develop and create your pipelines with Caching, Versioning, Scenario management, etc.

Even though Taipy Core and Taipy GUI can be used independently, they can be combined to develop powerful apps from your Jupyter Notebooks very quickly.

Sponsored
Louis Armand 1
10:30
30min
nbQA - run any standard Python code quality tool on a Jupyter Notebook
Marco Gorelli

There is a tonne of Python code quality tools that can help catch bugs and fix stylistic issues. Unfortunately, the vast majority don't work out-of-the-box on Jupyter Notebooks. nbQA is a tool that addresses this issue by allowing any standard Python code quality tool to be run on a Jupyter Notebook.

In this talk, you will learn:
- how to use nbQA;
- how nbQA works, and what its limitations are;
- how you can use nbQA to run your own custom tools.

Community: Tools and Practices
Louis Armand 2
11:00
11:00
30min
Best practices for building docker images for use with a JupyterHub
Yuvi Panda

Building & maintaining the docker image used whenever a user logs into a JupyterHub is one of the most time-consuming, hard, yet rewarding parts of running a JupyterHub. Done right, it can wow users - "YOU CAN DO THAT?!". But if not properly designed, something as simple as adding a new Python package can turn into a multi-week ordeal that breaks everything and makes the maintainer of the image hate computers. A lot of the industry's docker image advice needs to be modified when building docker images for use with JupyterHub, as they are meant to have arbitrary code executed in them. General docker advice often does not cover our use cases at all (who else is putting Fortran into docker?).

This talk summarizes lessons learnt in building a wide variety of images for JupyterHubs over the years. It will cover:

  1. Building the simplest possible image that can work with a JupyterHub
  2. Best practices for installing Python & most Python-related packages inside the image, and why (hint: use mamba)
  3. When to base off a community built docker image (such as rocker, pangeo or jupyter-stacks), and the various tradeoffs associated with it.
  4. Basic maintenance that must be performed on an image periodically so a simple package install doesn't turn into a 3-week nightmare
  5. Suggestions for automatic building & testing with CI / CD and mybinder.org
  6. Best practices for including R in your image
  7. Best practices for running non-Jupyter frontends (such as RStudio, virtual linux desktop, code-server, etc) in your JupyterHub

Attendees will walk away with much better knowledge of how to maintain a docker image for use with JupyterHub in a way that keeps both the users and the maintainers of the image happy.

Community: Tools and Practices
Louis Armand 2
11:00
30min
MyST Markdown: Using notebooks in scientific publishing workflows
Rowan Cockett

We introduce mystjs (http://js.myst-tools.org/), a set of open-source, community-driven tools designed for scientific communication, including a powerful authoring framework that supports blogs, online books, scientific papers, reports and journal articles.

The MyST (Markedly Structured Text) project has grown out of the ExecutableBooks team, which has been working on MyST Markdown and JupyterBook as new ways to publish notebooks. Originally based on Sphinx and RST, over the past year the ExecutableBooks team has been working on a MyST Specification to coordinate development of the markup language and extensions across multiple languages & parsers (e.g. implementations in Python & JavaScript). The new mystjs libraries run directly in the browser, opening up new workflows for components to be used in web-based editors, directly in Jupyter and in JupyterLite. The libraries work with current MyST Markdown documents/projects and can export to LaTeX/PDF, Microsoft Word and JATS, as well as multiple website templates using a modern React-based renderer. There are currently over 400 scientific journals that are supported through templates, with new LaTeX templates that can be added easily using a Jinja-based templating package called jtex.

In our presentation we will give an overview of the MyST ecosystem, how to use MyST tools in conjunction with existing Jupyter Notebooks, markdown documents, and JupyterBooks to create professional PDFs and interactive websites, books, blogs and scientific articles. We give special attention to the additions around structured data, standards in publishing (e.g. efforts in representing Notebooks as JATS XML), rich frontmatter and bringing cross-references and persistent IDs to life with interactive hover-tooltips (ORCID, RoR, RRIDs, DOIs, intersphinx, wikipedia, JATS, GitHub code, and more!). This rich metadata and structured content can be used directly to improve science communication both through self-publishing books, blogs, and lab websites — as well as journals that incorporate Jupyter Notebooks.

Our talk is aimed at attendees who are looking to communicate their Jupyter Notebooks, markdown documents or write scientific articles/papers. Our talk will assume basic familiarity with command-line tools and markdown. Throughout the presentation we will practically demonstrate the myst command-line interface to work with Jupyter Notebooks & markdown articles to create PDFs & Word documents, add citations & cross-references, and deploy modern websites with multiple themes that preserve the structured content.

Community: Tools and Practices
Gaston Berger
11:00
30min
Rapidly prototyping and deploying powerful data applications in Jupyter using Panel and Lumen
Philipp Rudiger

The Jupyter ecosystem provides a powerful platform for iteratively developing and deploying data applications and dashboards. In this talk we will discover how to leverage Panel and Lumen to develop data applications by iterating in a notebook, previewing them, and finally deploying them to a JupyterHub or to an external cloud provider - all without leaving the JupyterLab UI. Along the way we will explore best practices for structuring these applications and making them performant. We will also explore the integration of these tools with the rest of the Jupyter ecosystem, e.g. by leveraging Jupyter Widgets, and deploy them in applications running outside Jupyter on the Panel server. In doing so we will discover the power of Jupyter not just for exploratory workflows but for sharing complex and rich data applications with the world.

Developing & Iterating

In the first section we will go over the development of a rich and powerful data application in a notebook. We will demonstrate how we can quickly build and preview individual components by displaying them inline and previewing the entire application using a JupyterLab extension.

Once we have built the application, we will go over recommendations for structuring it to make it easier to maintain, achieve a good look and feel, and get the best performance out of the application.
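To make the workflow concrete, here is a minimal, hedged sketch of a Panel component developed inline in a notebook (the data and names are illustrative, not taken from the talk):

import pandas as pd
import panel as pn

pn.extension()  # load Panel's notebook extension so components render inline

df = pd.DataFrame({"a": range(10), "b": range(10, 20)})  # illustrative data

column = pn.widgets.Select(name="column", options=list(df.columns))

def summary(col):
    # Any displayable object works here: a plot, a table, a Markdown pane, ...
    return df[col].describe().to_frame()

# Displaying this layout in a cell previews the app inline; marking it
# .servable() lets "panel serve notebook.ipynb" run it as a standalone app.
pn.Column(column, pn.bind(summary, column)).servable()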

Jupyter Widgets integration

Next we will go over the integrations of the Panel and Lumen stack with the Jupyter Widgets ecosystem. This integration lets users keep working with the tools they love, and makes it possible to take Jupyter Widgets out of Jupyter and ship them in a standalone application.

Deployment

Lastly we will demonstrate how we can easily deploy Panel applications in a JupyterHub environment using existing extensions like CDSDashboards and how we can begin to extend this with Jupyter extensions that provide UIs to deploy to a cloud provider.

Enterprise Jupyter Infrastructure
Louis Armand 1
11:30
11:30
30min
Autoreload in Production at Meta
Omer Dunay

An underappreciated aspect of Jupyter and IPython experiences in general is their ability to autoreload Python modules during running sessions via the autoreload extension. At Meta, we began leveraging this functionality to power interactive test sessions that allow software engineers to quickly iterate on their projects without waiting for slow restart times.
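For readers who have not used it, the stock IPython extension the talk builds on is enabled with two magics in a running session:

%load_ext autoreload
%autoreload 2   # re-import modified modules automatically before each cell runs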

However, the base autoreload algorithm suffers from a number of reliability issues and can easily crash, thereby necessitating a costly restart. In this presentation, we’ll describe references reload, which is our new and improved autoreload algorithm with a number of benefits over the basic autoreload functionality. We’ll show how we use references reload in production to save developers hours of time, and we’ll close with a concrete use case for development on top of Bento server, which is Meta’s internal version of the Jupyter notebook server.

Enterprise Jupyter Infrastructure
Louis Armand 1
11:30
30min
Interactive data exploration in a notebook with hvPlot
Maxime Liquet

hvPlot is a Python package that is part of the HoloViz suite of tools. Its original design was based on reproducing and extending the familiar and effective Pandas .plot API, giving regular data practitioners easy access to powerful, but somewhat difficult to use, features offered by the HoloViz tools. This includes handling large data with Datashader, geographic data with GeoViews and many features of HoloViews such as its support of different plotting backends (Bokeh, Matplotlib, Plotly).

hvPlot is nowadays no longer limited to its .plot API and provides two other, newer functionalities dedicated to making data exploration easier. With the .interactive API, Panel/IPyWidgets widgets can be injected into a processing pipeline (e.g. a pipeline of Pandas or Xarray methods) to interactively control its parameters; when a widget value changes, .interactive takes care of re-evaluating the pipeline and updating its output. The latest addition to hvPlot is the Explorer, a graphical interface that offers a low-code experience for data exploration and with which it is simple to create customized plots, selecting the plot options directly from widgets.

This talk will focus on describing .interactive and the Explorer in more detail, and will then show how they can be used together with the .plot API to set up an approachable and interactive workflow for data exploration in a notebook.
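As a flavour of the .interactive API, here is a small, hedged sketch (the data is made up; the pattern of passing a widget where a method argument would normally go is the core idea):

import hvplot.pandas  # registers the .hvplot and .interactive accessors
import pandas as pd
import panel as pn

df = pd.DataFrame({"value": range(100)})  # stand-in for a real dataset

# A widget is passed where a plain argument would go; when the slider moves,
# .interactive re-evaluates the pipeline and updates the plot.
window = pn.widgets.IntSlider(name="window", start=1, end=25, value=5)
dfi = df.interactive()
dfi["value"].rolling(window=window).mean().hvplot(title="Rolling mean")

# The Explorer offers a similar exploration through a point-and-click UI,
# e.g. hvplot.explorer(df) in recent hvPlot versions.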

Data Science
Louis Armand 2
11:30
30min
Navigating the Jupyter Landscape
Jeremy Tuloup, Johan Mabille

The Jupyter ecosystem is vast and complex, with many different projects and libraries that work together to support interactive computing and data science. In this talk, we will navigate and explore the Jupyter ecosystem, highlighting the key projects and libraries that make up the ecosystem and discussing how they relate to each other.

We will start by introducing the core Jupyter projects, including the Jupyter Notebook and JupyterLab, and explaining how they provide a platform for interactive computing and data visualization. We will then discuss some of the key sub-projects within the Jupyter ecosystem, such as JupyterHub for enabling multi-user access to notebooks and nbconvert for converting notebooks to other formats, and we will explain how they fit into the overall landscape of Jupyter.

Next, we will delve into the underlying projects and libraries that make Jupyter and its related projects possible, such as the Jupyter server, the core API projects and the traitlets library. We will discuss the different protocols used for communication between the applications and the kernels, and show how they make Jupyter language-agnostic. We will also cover some of the key technologies used by Jupyter and its related projects, such as the Tornado web framework or the ZeroMQ messaging library, and we will explain how these technologies fit into the Jupyter landscape.
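As a small taste of these lower layers, here is a hedged sketch of talking to a kernel directly through jupyter_client, the library that implements the client side of the kernel messaging protocol (illustrative, not the talk's material):

from jupyter_client.manager import start_new_kernel

# Start a Python kernel and get a blocking client connected over ZeroMQ.
km, kc = start_new_kernel(kernel_name="python3")

msg_id = kc.execute("1 + 1")          # send an execute_request on the shell channel
reply = kc.get_shell_msg(timeout=10)  # the matching execute_reply message
print(reply["content"]["status"])     # "ok" if the code ran without error

kc.stop_channels()
km.shutdown_kernel()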

Throughout the talk, we will provide examples of how these tools and technologies can be used in practice and discuss the latest developments and future directions of the Jupyter ecosystem. By the end of the talk, attendees will have a better understanding of the Jupyter ecosystem and how its various projects and libraries fit together to enable interactive computing and data science.

Outline:

  • Introduction to the Jupyter ecosystem
  • Overview of the core Jupyter projects (Jupyter Notebook, JupyterLab)
  • Overview of key sub-projects within the Jupyter ecosystem (JupyterHub, nbconvert)
  • Explanation of the underlying projects and libraries used by Jupyter (Jupyter server, jupyter_client, traitlets)
  • Introduction to the Jupyter protocol and the widget protocol
  • Overview of key technologies used by Jupyter and its related projects (ZeroMQ, Tornado framework)
  • Discussion of the latest developments and future directions of the Jupyter ecosystem
  • Conclusion and next steps for learning more about the Jupyter ecosystem
Community: Tools and Practices
Gaston Berger
12:00
12:00
30min
How to grow the JupyterHub community and improve its practices by mentoring Outreachy interns
Sarah Gibson

In 2021, JupyterHub was awarded a CZI EOSS grant to improve community practices around inclusion within the project, and that work began in earnest in 2022. An important part of this work involves developing pathways into the community that cater for i) contributors that are diverse and bring a new perspective that is not already represented in our community; and ii) contributors beyond the “burnt out PhD” archetype that is prevalent throughout the landscape of open source scientific software.

One strategy we employed from the start of the grant-writing process was to secure funding for four rounds of Outreachy, with two interns per round, over the grant duration of two years. Outreachy is a mission-aligned organisation dedicated to placing interns from backgrounds that are underrepresented in tech, into open source projects. The mentorship these interns receive is the bedrock on which sustainable entry-level pathways into the community can be built. Since Outreachy supports more than only coding projects, we can also provide other pathways into the community that do not rely on being a “coder” or “software developer”.

This kind of “Mountain of Engagement” work is important to any community-led project, whether within the Jupyter ecosystem or beyond, and as such we have been capturing lessons learned in a guide as we go. This will ensure that the process of participating in Outreachy as a community is a little more repeatable with each round, and provide clear pathways for other community members to become involved in the processes after the term of the grant. We also hope that by sharing our experiences, this resource becomes usable by other Jupyter subprojects, or elsewhere, to begin their own internship initiatives.

  • Repository: https://github.com/jupyterhub/outreachy
  • Website: https://jupyterhub-outreachy.readthedocs.io

By the time JupyterCon 2023 arrives, JupyterHub will have completed the first Outreachy round funded by the CZI grant. We have already learned, and will continue to learn, a great deal around the processes required for running these internships, which we have captured in the above guide. During this talk, we will discuss some strategies the JupyterHub team implemented during this initial round, such as:

  • Establishing partnerships with other mentoring organisations, such as Open Life Science, to deliver support through mentor training and cohort calls for interns
  • Developing processes during the Outreachy contribution period to manage and evaluate applications
Community: Tools and Practices
Gaston Berger
12:00
30min
IPyflow: Supercharging Jupyter with Dataflow-Awareness
Stephen Macke

In this talk, I'll introduce IPyflow, which is my stab at addressing the issue of statefulness in Jupyter notebooks. IPyflow is a Jupyter kernel designed from the ground up to annotate each symbol appearing in program text with metadata pertaining to how it relates to every other symbol.

This dependency graph is then leveraged to provide several interesting features, including:
- execution suggestions, for keeping the notebook state in sync with the code;
- dynamic backward slicing, to show the minimal code needed to reconstruct each symbol; and
- dynamic forward slicing, used to enable reactive execution.

Finally, I will also discuss IPyflow's reactive syntax extensions, which allow for opt-in reactivity at the granularity of individual symbols, as well as its interoperability with other libraries and frameworks such as ipywidgets and stickyland. I'll end the talk by showing how IPyflow can be used with the aforementioned projects to build fully reactive and interactive dashboards on top of JupyterLab.

Data Science
Louis Armand 2
12:00
30min
Tétras Lab: an open source platform powering notebooks as Web applications
David Rouquet, David Beniamine

Tétras Lab is an open source platform built around Jupyter Lab. Its goal is to provide an easy-to-deploy infrastructure that allows fruitful interactions between stakeholders, data scientists and developers while working on decision support systems.

The platform is composed of a Docker stack with the following containers:

  • a Django application to manage users and permissions,
  • a Jupyter Lab instance to provide an integrated IDE,
  • a Voilà container to allow live kernels when accessing dashboards,
  • worker containers for housekeeping tasks, data management, etc.
  • as many database containers, of as many flavors, as needed.

The results of notebooks can be rendered in a polished environment for the stakeholders in the following ways:

  • precomputed notebooks with nbconvert (see the sketch after this list),
  • live notebooks with Voilà that can use Dash or Panel technologies,
  • and we plan to add support for pure Dash application deployment for use cases that need better scalability.
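To ground the nbconvert modality mentioned above, here is a minimal, hedged sketch of pre-rendering a notebook to static HTML (file names are hypothetical):

import nbformat
from nbconvert import HTMLExporter

nb = nbformat.read("dashboard.ipynb", as_version=4)  # an already-executed notebook
body, resources = HTMLExporter().from_notebook_node(nb)

with open("dashboard.html", "w") as f:
    f.write(body)  # static HTML that can be served to stakeholders without a kernel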

The platform is used in production for several clients and in different contexts such as:

  • monthly updated dashboards for marketing evaluation based on open data and sales forecasts,
  • an hourly updated Web application using statistical models to trigger warnings about landslide and rockfall risks around mountain roads.

Although it was first developed for our internal needs, we believe that the platform can be of interest to organisations that need an easy-to-deploy environment that can deliver production-ready solutions for data science and business intelligence projects.

The code and resources are publicly available under the GNU AGPLv3 licence: https://gitlab.com/tetras-lab

A public demo illustrating the capabilities of the Tétras Lab platform will be deployed for demonstration and testing during the conference. Meanwhile, a demo of the landslide alert application is available: https://sigale.pinea.sage-ingenierie.com/dashboard/public/yrnhmikkticoepshxnmaibnrsslifkxljgbpzltdudlsneeazeanmwnbrtlcjizq.

Enterprise Jupyter Infrastructure
Louis Armand 1
12:30
12:30
90min
Lunch
Gaston Berger
12:30
90min
Lunch
Room 1
12:30
60min
Lunch
Room 2 (Tutorial)
12:30
30min
How JupyterLab 4 is strategic to Two Sigma (and you)
Diego Torres Quintanilla

Two Sigma's financial scientists spend many of their most productive hours testing their hypotheses about the market in JupyterLab. As with any research environment, responsiveness and easy collaboration are key to keeping our users focused on their hypotheses, not the tools they use.

In this talk I will discuss Two Sigma's partnership with QuantStack's team to deliver direct-to-upstream contributions that improve JupyterLab's performance and further its Real-Time Collaboration (RTC) initiative, all soon to be seen in JupyterLab 4.

Beyond just a teaser of new features to get excited for, this talk is a testament to a fruitful and successful relationship between two private entities, Two Sigma and QuantStack, which not only furthers each partner's own internal goals but also improves the quality of life of all Jupyter users.

Key takeaways

  • Get excited for JupyterLab 4, which will bring large performance improvements throughout the whole application.
  • JupyterLab 4 will pave the way for a Real-Time Collaboration mode in JupyterLab, similar to Google Docs. Come join the effort!
  • For small and medium-size organizations, an external partnership with a team like QuantStack can help amplify their impact in the open-source community.
  • Making sure your organization runs on the latest version of Jupyter is key to shipping your contributions directly to your users.
  • JupyterLab's extensibility is a great way to deliver value to an organization's internal users. But it can also undermine your organization's ability to stay current with open-source.

Call to action

Two Sigma, QuantStack, and many other members of the Jupyter community are actively contributing to Jupyter's general performance and making Real-Time Collaboration a reality. Use the lessons you learn in this talk to get your organization involved, or get involved yourself!

Audience

This talk is aimed at 1. Jupyter users looking to hear about new and exciting features and 2. users of Jupyter at private entities who are interested in ways to get their organization involved in open-source development. You do not need any specific knowledge of how Jupyter works to get value from this talk. If you're interested in contributing to Jupyter, the talk will also give you an idea of features that are being actively developed by the project's contributors and hopefully get you excited about contributing!

Sponsored
Louis Armand 1
12:30
30min
MLOps made easy and reproducible with Jupyter and containers
Sune Askjaer, Subramaniam, Richard Nemeth

While training machine learning (ML) models is the subject of countless MOOCs and web tutorials, productionizing and operating ML models is usually left to the big commercial players or expert users. By leveraging the Jupyter environment and ecosystem, we describe a method to democratize the productionization of ML models while making the process transparent for the casual user.

ML models have become an essential tool for organizations across almost all industries, providing valuable insights and predictions based on data. However, deploying and maintaining ML models can be challenging due to the complex and often dynamic nature of the compute environments they require.

This talk will discuss the benefits of using containerization to manage the compute environments of ML models in JupyterLab and MLFlow, and how it can help organizations make their ML operations more democratic, efficient, and reliable over the long term.

Containerization offers a powerful solution for managing the compute environments of ML models and, combined with tools such as Cookiecutter and MLflow, makes it easier for organizations to deploy and maintain their ML operations over time. By adopting containerization during model development, training, and deployment, and by integrating open-source tools and services, organizations can better manage and trust the ML models used in their business.
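As one concrete piece of that tooling, a minimal, hedged sketch of MLflow tracking inside a containerized training job might look like this (the experiment name and values are made up):

import mlflow

mlflow.set_experiment("demo-experiment")  # hypothetical experiment name

# Record what the containerized training run did, so the model stays auditable.
with mlflow.start_run():
    mlflow.log_param("n_estimators", 100)
    mlflow.log_metric("rmse", 0.42)
    mlflow.log_artifact("requirements.txt")  # e.g. pin the environment baked into the image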

Data Science
Louis Armand 2
13:00
13:00
60min
Lunch
Louis Armand 1
13:00
60min
Lunch
Louis Armand 2
13:00
60min
Lunch
Room 3 (Tutorial)
13:30
13:30
150min
Security Tutorial/Discussion
Rick Wagner

Are you interested in Jupyter security around deployments? In this tutorial we will discuss best practices around Jupyter security. We will set up a JupyterHub starting with insecure defaults, try to man-in-the-middle our own deployment, and progressively secure it.

While tutorial attendance is included in the conference pass, we ask you to register for this tutorial at https://www.jupytercon.com/tickets, as the seats available are limited.

Tutorial
Room 2 (Tutorial)
14:00
14:00
30min
Beyond Papermill: A New Notebook Executor For Running Notebooks in Production
Eduardo Blancas

Papermill has become a widely used tool for executing Jupyter notebooks. Teams use papermill for many production use cases, such as scheduled report generation, model re-training, etc. However, since papermill relies on spinning up a second process with the IPython kernel to execute code, it has several drawbacks when used in production.

This talk will introduce an alternative notebook executor that powers Ploomber, a popular open-source orchestration framework. This new executor runs notebooks in a single process, allowing us to provide capabilities for production workloads, such as interactive debugging with pdb and notebook profiling (CPU and memory usage). The executor is integrated into the Ploomber project and can also be used from the command line, like papermill.

Some experience with Jupyter (notebook or lab) and the terminal is required. Experience with papermill is optional.
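For reference, the conventional papermill pattern the talk starts from looks roughly like this (paths and parameters are illustrative):

import papermill as pm

# papermill launches a fresh kernel in a second process, injects the passed
# parameters as a new cell, executes the notebook top to bottom, and writes
# the executed copy (with outputs) to the output path.
pm.execute_notebook(
    "train.ipynb",           # hypothetical input notebook
    "train_output.ipynb",    # executed copy with outputs embedded
    parameters={"n_epochs": 10, "learning_rate": 1e-3},
)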

Outline:
- [0-2 min] Introduction to papermill
- [2-6 min] Papermill's drawbacks
- [6-10 min] Running notebooks in a single process
- [10-16 min] Debugging notebook execution with pdb
- [16-22 min] Profiling notebooks
- [22-26 min] Orchestrating notebook pipelines in production with Ploomber
- [26-28 min] Summary and conclusions
- [28-30 min] Q&A

Community: Tools and Practices
Louis Armand 2
14:00
30min
Boost productivity with generative AI and scalable development using Jupyter Notebooks from anywhere
Giuseppe Angelo Porcelli

AI/ML can make you a disruptive innovator in any industry. As an active sponsor of and contributor to Project Jupyter, our goal at AWS is to help Jupyter become the best possible notebook platform for data science and machine learning. Join this session to learn about the latest innovations you can use anywhere your Jupyter Notebook runs to accelerate your development productivity. You will also learn how to increase ML development productivity by 10x with collaborative and fully managed notebooks on Amazon SageMaker, which provides the most comprehensive set of tools and fully managed infrastructure to help you build, train, and deploy ML models at scale.

Sponsored
Louis Armand 1
14:00
30min
Capytale: a case of large-scale use of jupyter notebooks in education
Nicolas Poulain

The Capytale project, developed by the "Académie de Paris" (the Paris school district) with the help of the "Académie d'Orléans-Tours", is an online education platform for teaching the basics of programming to high schoolers.

Capytale provides several web-based interfaces well suited for the various use cases of the high-school curriculum, one of which is the Jupyter notebook.

Now adopted by most school districts in the country, the notebook service provided by Capytale serves over 80,000 users per week.

The deployment model of Capytale, built upon Pyodide and the Jupyter notebook user interface, proved remarkably reliable and scalable, allowing tens of thousands of concurrent user sessions with small hosting requirements.

In this presentation, we detail the technical constraints and pedagogical motivations that led to this specific design and discuss ongoing and future improvements to the platform.

Enterprise Jupyter Infrastructure
Gaston Berger
14:00
55min
Visualizing live data pipelines in JupyterLab with ipydagred3
Tim Paine

Dataflow graphs have become indispensable tools for data science, from ETL batch processes to live streaming data pipelines. A variety of tools exist for constructing and scheduling graphs, but few generic tools exist for visualizing them, and even fewer let you analyze and interact with them from inside a notebook.

In this talk, we will discuss ipydagred3, a Jupyter Widgets wrapper around the dagre graph layout engine and the popular charting framework D3.js. We will use this framework to visualize a variety of static graphs, then interact with these graphs by pushing mutations from both Python and JavaScript. We will then build a real-world example using a popular streaming dataflow framework, and show how ipydagred3 can integrate to provide a performant, intuitive interface to the underlying graph engine.

Audience: Jupyter novice; familiarity with any graph engine recommended, e.g. Apache Airflow, Dask, networkx, etc.

Community: Tools and Practices
Room 1
14:30
14:30
30min
Building on Jupyter at Databricks
Jason Grout, Florian Wetschoreck

The Databricks Notebook, used by thousands of organizations worldwide, recently adopted Jupyter standards and software to power a number of features. We now execute Python code using ipykernel, support ipywidgets (including custom widgets), and have improved compatibility with the Jupyter notebook format. In this talk, we will discuss lessons we learned customizing and integrating Jupyter in our enterprise product, which has some different assumptions from the full Jupyter stack. For example, Databricks sandboxes custom Jupyter widget code with iframes for security, which complicates kernel communication. We encode Databricks-specific visualizations in exported Jupyter notebook files in a way that is compatible with other Jupyter tooling. Also, in Databricks, document state lives on the server, which changes how Jupyter kernel messages are processed.

We also offer some observations about how to help Jupyter be more flexible in enterprise contexts.

This talk is for intermediate to advanced developers/administrators wanting to customize or build on Jupyter standards or software to deploy in an enterprise context.

Enterprise Jupyter Infrastructure
Gaston Berger
14:30
30min
Thebe - add Jupyter-based interactive computing to modern websites
Steve Purves

thebe lets you easily add interactive visualisations, reproducible figures and interactive code editors to any publication on the web -- backed by a kernel from a Jupyter server or an in-browser WASM kernel with thebe-lite.

It’s a compact and versatile tool that is easy to add to any static HTML page where, by default, code blocks are turned into editors and made ready for execution just by adding a couple of additional script tags. So in contrast to nbconvert or nbviewer, thebe enables the interleaving of Jupyter input cells and output areas with other content (e.g. from myst markdown files) in flexible ways while maintaining the ability to run the underlying notebooks or individual code cells. This enables rich reading experiences that no longer have to be linear in flow or look anything like a notebook.

Over the last year, thebe has seen major upgrades making it more flexible and usable in different web contexts. Whilst keeping the original “drop it on the page” mode of operation for static websites, it has been refactored to be less opinionated, allow for more control over the UI and provide a simplified API for server, session and notebook management, as well as integrating with jupyterlite for enabling WASM based kernels. These changes also make it much easier to use thebe with modern javascript frameworks like React, Vue and Next.js.

The project provides:

  • thebe - the original drop-in library with improved event management and configuration.
  • thebe-core library - an out-of-core refactor of server, session/kernel and notebook management code in a typescript API, giving developers full control of UI interaction.
  • thebe-lite library - an extension library that enables in-browser computation via jupyterlite & pyodide.
  • All packages are now written in TypeScript and utilise the latest services from JupyterLab and ipywidgets 8.

In this presentation, we’ll introduce thebe, its new sub-packages and their capabilities, explaining where each is best applied through example use cases. We’ll then walk through how to get started using thebe to get computational content from Jupyter notebooks into an interactive website with a custom layout. We’ll also demonstrate how easy it is to enable in-browser computation via jupyterlite/pyodide, discussing current capabilities and future direction.

Our presentation is aimed at attendees who are looking to incorporate Jupyter Notebooks with other materials in new and novel ways - to create compelling scientific communication materials, whether in the form of books, blogs or articles, for education or research. The talk will cover how and where thebe can be applied effectively, as well as going into some development basics. Some knowledge of basic web development with HTML, JavaScript or TypeScript will be beneficial for the walk-through element of the talk as we'll cover some code and configuration, but we aim for the talk to be accessible to anyone interested in putting interactive scientific communication on the web, whether they develop themselves or not.

Community: Tools and Practices
Louis Armand 2
14:30
30min
e2xgrader: An Add-on for Improved Grading and Teaching with Jupyter Notebooks at Scale
Tim Metzler, Mohammad Wasil, Paul Plöger

We are using Jupyter Notebooks with nbgrader for teaching several university courses in the area of STEM. Courses range from small size to large size (up to 300 participants per semester). We cover topic areas from mathematics and statistics to AI and robotics.

This includes assignments during the semester as well as exams. Nbgrader provides basic tools for creating, grading, and exchanging assignments. We developed e2xgrader, an nbgrader add-on, to address the needs of three types of users: (i) students, (ii) graders, and (iii) teachers.

Students, especially in undergraduate STEM courses, come with varying levels of knowledge of programming and of how to use Jupyter Notebooks. The freedom provided by Jupyter Notebooks has led to several problems in grading and teaching. Using JupyterHub eliminates issues with setting up local environments and ensures fairness by providing a standardized computing environment. However, students may still accidentally delete or alter nbgrader cells or notebooks. In interpreted languages, students may import libraries in a later cell and then use them in an earlier cell, causing further problems.

To address these issues, we have implemented measures to restrict the notebook and provide clear instructions for students. We provide a visual indicator in the form of a cell toolbar to help students know where to write solutions. During exams, we replace the standard view with a simple toolbar that includes a save and submit button and links to provided resources. Students can view their submission in HTML, and are given a restricted Python kernel with preloaded libraries and disabled terminal commands and magics. This kernel is fully configurable.

To support the transfer of paper grading processes to electronic systems, we added the ability for graders to annotate student submissions by drawing on them and display them in feedback.

We also implemented a new grading view that allows graders to view submissions horizontally (e.g. looking at one exercise for all students).

After partially grading a notebook, graders may realize that the automatic test cases need improvement. To address this, we added the option to autograde only specific cells. After updating a test case, the grader can run the autograding process again for the changed cells, and only the relevant grades will be updated. Graders can also export grades in CSV format for use in other university systems. This provides greater control and ease of use in the grading process.

Teachers need tools that make it easier to create assignments. To address this, we created a tool that organizes simple exercises into pools by topic. Teachers can choose from a variety of preset question types and create their own, and can use templates to quickly assemble assignments from the pools. We also added version control to the pools and exercises to make the process more transparent. This provides greater efficiency and flexibility in assignment creation.

Teachers need specific cell types for different courses. For example, in a software engineering class, they may want students to create diagrams, while in a mathematics class they may want to give students the ability to solve exercises on paper and upload an image of the solution. To support these needs, we added multiple and single choice cells, a fully integrated diagram editor cell based on diagrams.net, and a cell for uploading files and taking pictures from a webcam.

Finally, teachers wanted greater standardization and ease of use in authoring test cases. For this, we provide the assignmenttest package which can assign partial grades and provides an easy way to author test cases for variables, functions and classes in Python.

Summing up, the Jupyter ecosystem provides the flexibility needed to enhance the effectiveness of learning, teaching and grading for students, staff and professors, with e2xgrader.

Jupyter in Education
Louis Armand 1
15:00
15:00
30min
CodePod: Scalable Computational Notebook on a Hierarchical Whiteboard with Scoped Runtime
Forrest Bao

Jupyter is great. But as a project gets larger, it becomes messier and harder to manage, due to the single global namespace and the 1D vertical organization of code cells. CodePod expands notebook-based interactive programming for large, complex projects by allowing code cells to be placed anywhere on a 2D whiteboard and to be clustered into nested scopes where code cells under the same scope share a namespace. A scope can be defined simply by drawing a bounding box, while associating or dissociating a member into or from a scope is as easy as a drag-and-drop. In this way, code hierarchies are visualized and maintained in computational notebooks without using traditional file-based module systems. CodePod provides the same cell-based computational notebook experience but with scopes, modularization, and a 2D, nested layout of cells. Now projects can be much larger because users spend less effort on keeping code organized.

CodePod is an open source project (available at https://codepod.io) under the MIT license. Standing on the shoulders of Jupyter kernels, CodePod's scoped runtime and nested 2D whiteboard are completely built from scratch. The scoped runtime is language-agnostic and supports functions, classes, and variables.

By enabling code modularization and hierarchy without files, CodePod will revolutionize software engineering. In one interface, a programmer can navigate through the complex hierarchies or dependencies of code by zooming in and out swiftly at any time. Between the big picture and one particular function is just a mouse scroll. CodePod brings the beloved REPL experience to the entire hierarchy of code by making code evaluation aware of modules. Developers can interact with any part of the entire software in one single runtime. Via scoped code cell management, collaborative development can be bumped to the next level: developers in a team can collaborate on different modules without interfering with each other. No more repo-based branching but scoped branching for simpler and more fine-grained synchronization and merging between branches.

In our talk, we will discuss the technical architecture of CodePod and provide a quick demo of CodePod. We hope to use this opportunity to introduce this Jupyter-related open source project to the community and learn from the community at JupyterCon.

Community: Tools and Practices
Louis Armand 2
15:00
30min
Flexible course management and validation system using JupyterHub with additional services using Flask
Marc Buffat, Thomas DUPRIEZ, Sarah Di Loreto Pollet

Using multiple tools from the open-source ecosystem such as JupyterHub, nbgrader, Python, Flask, and Pandoc, we designed a flexible platform for courses using Jupyter notebooks and Python to support the students' learning process at Lyon 1 University.

Our platform provides a virtual JupyterHub server for each year from undergraduate to graduate. Each virtual server hosts all the classes of that year. This lets us optimise and customise the servers depending on the use case:

  1. numerous students (>1000) for the first undergraduate year, with low memory and CPU needs (discovery of python and notebook in science)
  2. a few students (~30) for the last graduate year, with large memory and CPU needs (modelling, simulation, machine learning in mechanical engineering)
  3. all the courses remain active throughout the semester, and are accessible 24 hours a day, 7 days a week by students using only a web browser.

Our platform augments the classical Jupyter experience by offering features helping teachers run and manage courses, such as:
1. simple sharing of documents (notebooks, data, Python libraries...) back and forth between teachers and students
2. multiple courses on the same JupyterHub server with the ability to set which students see which courses. Also useful to divide a course into groups for practical work
3. automatic grading of notebooks, python code, ...
4. simple evaluation of students work (notebooks, reports in markdown or LaTeX...) directly in the web browser without download or file transfers
5. plagiarism test on the students work
6. link with TOMUSS, the educational monitoring platform of the Lyon 1 University, to input student lists and output student grades

Our platform is made of three main components: JupyterHub, nbgrader, and a custom web application written in Flask running as JupyterHub services. To implement it, we have followed the Unix KISS philosophy, using simple components such as bash scripts and Python programs. The documentation, written in French using Jupyter Book, is available here: https://perso.univ-lyon1.fr/marc.buffat/2022/BOOK_VALIDATION.

This platform has grown as a prototype in the Mechanical Engineering department of Lyon 1 University. Thanks to the AMI INCLUDE project, it is being refined, redeployed on new servers and will eventually be available to other educational institutions around Lyon.

During the talk, a live demonstration will illustrate how students and teachers use the platform. Finally, we will share feedback from teachers that have run courses on the platform.

Jupyter in Education
Louis Armand 1
15:00
30min
No Magic Added - Deploying Multiple JupyterHubs to Multiple Clouds from one Repository
Sarah Gibson, Damián Avila, Yuvi Panda, Georgiana Dolocan

The International Interactive Computing Collaboration (2i2c) manages the configuration and deployment of multiple Kubernetes clusters and JupyterHubs for a range of research and education purposes, spanning not only domains, but the globe. For the sake of optimising our engineering team’s operations, we manage these deployments from a single, open infrastructure repository. This presents a challenging problem since we need to centralise information about a number of independent cloud vendors, and independent JupyterHubs whose user communities are not necessarily related. Given that each hub has an independent user base, this centralisation must not come at the cost of a community being unable to extricate their JupyterHub configuration from 2i2c’s infrastructure and deploy it elsewhere, as detailed by our Right to Replicate.

In this talk, we will discuss a recent overhaul of 2i2c’s tooling that facilitates the centralisation of information and optimal operation of the engineering team, whilst protecting a community’s Right to Replicate their infrastructure. Critical to protecting the Right to Replicate is a configuration schema for both clusters and JupyterHubs, where these files should live in the repository, and how the contents should be structured. Each individual JupyterHub we deploy is defined by its own individual set of configuration files which enables simple extrication from the repository, and they can be deployed independently with a basic command. There is no added magic in the rest of 2i2c’s specific tooling that would prevent this.

Further tooling to optimise the deployment and management of these JupyterHubs for 2i2c’s engineering team includes:

  • A Python “deployer” module that knows how to read the configuration for a given JupyterHub on a given cluster and can perform an upgrade action
  • A function within the deployer module that can extrapolate which JupyterHubs on which cluster require an upgrade from a list of changed files in the repository (e.g. from a Pull Request)
  • A GitHub Actions workflow that can deploy to multiple clusters in parallel, deploy production JupyterHubs in parallel, implement Canary deployments using staging JupyterHubs, and intelligently prevent a Canary deployment failure affecting the deployments on an unrelated cluster

Details of these efforts were first published in the “Tech update: Multiple JupyterHubs, multiple clusters, one repository” blog post.

Enterprise Jupyter Infrastructure
Gaston Berger
15:00
55min
Why Won't My Favorite Notebook Extension Work in JupyterLab?
Daniel Goldfarb

Classic Jupyter Notebook extensions do not work in JupyterLab. This can be an impediment, for some people, to using JupyterLab.

The main focus of this talk is how to write extensions for JupyterLab. We walk through a couple of practical examples, illustrating how to create and install a JupyterLab extension. We also examine some details of the labextension framework.

We then discuss why classic notebook extensions don't work in JupyterLab, and propose strategies for porting your favorite nbextension to JupyterLab.

Repository with slides: https://github.com/DanielGoldfarb/jlxd

Community: Tools and Practices
Room 1
15:30
15:30
30min
Building GitHub Code Review Experience for Jupyter Notebooks
Amit Rathi

Key Takeaways

  • What is notebook code review & why should Jupyter users care
  • How to build rich diffs & commenting for notebooks
  • How to integrate notebook diff & commenting with version control platforms like GitHub & Bitbucket

Summary

For the past 4+ years, I’ve built a notebook code review experience (ReviewNB) for GitHub as a solo bootstrapped developer. Thousands of organizations now use the service, including Apple, Airbnb, Lyft, Deloitte, Affirm, AWS, Meta Reality Labs, and NASA JPL.

This talk focuses on behind-the-scenes technical details such as:

  • Challenges of building rich notebook diffs on top of GitHub / Bitbucket (a minimal sketch follows this list)
    • handling JSON diffs
    • handling images, plots & other rich outputs
  • Challenges of building discussion / commenting functionality for notebooks
    • where & how to store notebook comments
    • how to handle comments when underlying notebook changes
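To make the diffing challenge concrete, here is a minimal, hedged sketch of a cell-level diff built with nbformat (file names are hypothetical; a real tool must also align added, removed and reordered cells and render rich outputs):

import difflib
import nbformat

old = nbformat.read("notebook_old.ipynb", as_version=4)  # hypothetical paths
new = nbformat.read("notebook_new.ipynb", as_version=4)

# Notebooks are JSON documents; a readable review diff works cell by cell on
# the source text rather than on raw JSON full of outputs, metadata and counts.
for i, (a, b) in enumerate(zip(old.cells, new.cells)):
    if a.source != b.source:
        lines = difflib.unified_diff(
            a.source.splitlines(), b.source.splitlines(),
            fromfile=f"cell {i} (old)", tofile=f"cell {i} (new)", lineterm="",
        )
        print("\n".join(lines))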
Community: Tools and Practices
Louis Armand 2
15:30
30min
Federated collaborative workflows for Jupyter
Diogo Castro, Marcin Sieprawski

Cloud Storage for Synchronization and Sharing (CS3) platforms, like ownCloud or Nextcloud, have been widely deployed in the research and educational space, mostly by e-infrastructure providers, NRENs (National Research & Education Networks) and major research institutions. These services, usually part of the daily workflows of hundreds of thousands of users (including researchers, students, scientists and engineers), remain largely disconnected and are developed and deployed in isolation from each other.
The same can be said about Jupyter deployments, the de-facto standard for data analysis in the scientific and research communities: each institution has its own configuration, deployment strategy and, more importantly, its own customized way of giving users access to their (siloed) data and code.

The EU-funded project CS3MESH4EOSC was started to address these major technical, but also societal, challenges. Science Mesh, its main asset, was conceived to provide an interoperable platform that easily integrates and extends sync & share services, applications (like Jupyter) and software components within the full CS3 community. Such a federated service mesh provides a frictionless collaboration platform for hundreds of thousands of users, offering easy access to data across institutional and geographical boundaries.

This presentation will focus on the development of cs3api4lab, a plugin created by the project to connect Jupyter to the Science Mesh. It brings features like easy-to-configure access to CS3 services' backends, and sharing of and parallel access to notebooks right from within the JupyterLab interface. We will also discuss its applicability outside of the Mesh and, finally, the project's vision for collaborative scientific analysis.

Enterprise Jupyter Infrastructure
Gaston Berger
15:30
30min
How to convince French HSS researchers to use Jupyter Notebooks? Autopsy of a missed attempt
Emilien Schultz, Antoine Blanchard, LE BECHEC, Mathieu Morey

The use of scientific programming in Python is still developing in the French Human and Social Sciences (HSS). While computational approaches are gaining visibility, adoption by the HSS community remains quite low. Supporting good practices across disciplines and providing training for students and young researchers will require infrastructure, as well as examples of treatments specific to the fields concerned. Although the tools for scientific programming in Python are largely mature and used in numerous scientific communities, they have yet to be widely adopted in HSS.

To promote the diffusion of interactive computational practices, we developed five proof-of-concept notebooks that aim not only to demonstrate the possible uses of Jupyter Notebooks and Python machine learning tools, but also to foster their adoption by the HSS community. This project was initiated by the large digital research infrastructure in HSS, Huma-Num, in the context of testing a JupyterHub deployment. However, despite our best efforts, these notebooks were wobbly and not very useful. We can say that we fell short of our expectations.

In this communication, we propose to dissect this failure as an attempt to gain a better understanding of the current uses of notebooks by French researchers. We would like to emphasize the need to better explore research practices. Drawing on Science and Technology Studies, we suggest the hypothesis that many notebooks are primarily "intermediate objects" that allow for the coordination of the research process. If this is the case, reproducibility is not the primary goal, nor is diffusion. For this reason, the adoption of Jupyter Notebooks by HSS researchers would primarily require efforts on general computational literacy to foster their integration into the research process.

Jupyter in Education
Louis Armand 1
16:00
16:00
30min
Break
Gaston Berger
16:00
30min
Break
Room 1
16:30
16:30
30min
State of the Union: Jupyter Community
Ana Ruvalcaba, Afshin Darian, Jason Grout, Fernando Pérez

Come learn how the Jupyter community and leadership are organized today. We'll talk about new strategic initiatives impacting the global Jupyter community.

Speakers: Jupyter Executive Council

AFSHIN DARIAN is a technical director at QuantStack. He is a member of the Jupyter Notebook, JupyterLab, and Jupyter Server councils. Darian is a co-author of JupyterLab and currently works on several layers of the Jupyter stack.

ANA RUVALCABA is Director of Project Jupyter’s presence at California Polytechnic State University. She holds a Bachelor of Science degree in Business Administration with a minor in Ethnic Studies. Ana’s areas of expertise include program/project management, people management, operations, budgeting, and global events. Over the years she has collaborated with a wide variety of stakeholders in open source, tech, and university environments to deliver a unique set of contributions.

BRIAN GRANGER is co-creator of Project Jupyter and Altair, a statistical visualization library for Python. He is also an advocate for open-science, open-data and open-source software.

FERNANDO PEREZ is faculty at UC Berkeley Statistics and a scientist at LBNL. Trained in physics, he builds tools for humans to use computers as companions in thinking and collaboration, mostly in the scientific Python ecosystem. Today, he focuses on open, reproducible science at scale to tackle problems like the climate crisis that bridge physical modeling, data analysis and societal concerns. He co-founded Project Jupyter, 2i2c.org and NumFOCUS.

JASON GROUT is a staff software engineer at Databricks working on interactive computational interfaces. In Jupyter, Jason helped build JupyterLab and ipywidgets, and has contributed to many other parts of the project.

STEVEN SILVESTER is an Engineering Lead at MongoDB Inc. He is a veteran of the United States Air Force and has been an active contributor to Jupyter since 2015. He helped build the original version of JupyterLab, and has since focused on improving the maintainability of Jupyter software.

Organiser Choice.
Gaston Berger
17:00
17:00
10min
-
Gaston Berger
17:10
17:10
30min
AutoML as it should have always been
Greg Michaelson

When AutoML was popularised during the 2010s, there was a great hope that the citizen data scientist would take over machine learning and that business analysts everywhere would soon be building thousands of advanced AI-based solutions, ushering in the age of AI in business. Not only did that not happen, but even the name “AutoML” has become sullied along with the myth of the citizen data scientist. In this talk, Greg will discuss the launch of a brand new open source project that promises to deliver AutoML as it should have been: open, flexible, code-based, and targeted at the only people generating value from machine learning — data science experts.

Sponsored
Gaston Berger
17:40
17:40
15min
Wrap up

We'll wrap up for the day/week and give you information about the evening / next day

MISC
Gaston Berger
17:55
17:55
45min
Lightning talks

If you did not get a chance to present, or had an idea during JupyterCon, here is your chance to give a 4-minute presentation about it.

You may register during the day for the lightning talks.
At the entrance level of the conference you will find a number of index cards, a box, and pens.

  • Clearly write the title of your proposal and your name.
  • Put it in the box.

The proposal / talk does not need to be polished; it does not need to be an existing project, nor even your own project.
It does not have to be about programming. It can be about waffles, or it can be about shoes. You are allowed to not have slides. It is recommended to make puns.

You have 4 (FOUR) minutes max.

At the end of the day we'll select talks at random.

Please sit near the front if you have submitted a talk.

import random

proposals = {...}  # the index cards collected in the box
while still_time():
    on_stage = waiting_area               # the speaker waiting on the side goes on stage
    on_stage.plug_in()
    proposal = random.choice(proposals)   # draw the next card from the box
    proposal.author.walk_to(waiting_area)
    waiting_area = proposal
    on_stage.present(timeout="4 minutes")
    on_stage.unplug_and_walk_away()
MISC
Gaston Berger
18:40
18:40
80min
Gaston Berger
06:05
06:05
60min
Second day, Thursday
Louis Armand 2
08:00
08:00
60min
Badges
Gaston Berger
09:00
09:00
15min
intro remarks

Intro remarks

MISC
Gaston Berger
09:15
09:15
45min
Alyssa Goodman Keynote
Alyssa Goodman

Alyssa Goodman will present our second day keynote.

Keynotes
Gaston Berger
10:00
10:00
30min
Break
Gaston Berger
10:00
30min
Break
Louis Armand 1
10:00
30min
Break
Louis Armand 2
10:30
10:30
150min
Getting Started With Python
Marianne Corvellec, Maria Teleńczuk

While tutorial attendance is included in the conference pass, we ask you to register for this tutorial on https://www.jupytercon.com/tickets as the seats available are limited.

--

This tutorial is an introduction to Python programming using JupyterLab. It is aimed at students with little or no programming experience, and is intended as a follow-along tutorial. It draws inspiration from the late Boston Python Workshop and uses materials from Software Carpentry, which are available under the CC-BY license.

If you register for the tutorial, it is very important that you install the required software ahead of the tutorial. Please refer to the setup instructions below and follow them step by step.

Setup instructions

  1. Open https://docs.conda.io/en/latest/miniconda.html in your web browser.
  2. Click on the latest installer link depending on your OS: Miniconda3 Windows 64-bit for Windows, Miniconda3 macOS Intel x86 64-bit pkg for macOS, Miniconda3 Linux 64-bit for Linux.
  3. Install Python 3 by running the Miniconda installer (double click on the downloaded file) using all of the defaults for installation. On macOS and Linux, make sure to check Add Miniconda to my PATH environment variable.
  4. On Windows: open the Anaconda Prompt from the Start menu. On macOS and Linux: open the Terminal app. Run the following lines (i.e., copy-paste them in the window that just opened up and press Enter):
conda config --add channels conda-forge
conda install ipywidgets=7.6.5 jupyterlab=3.5.3 matplotlib=3.7.0 pandas=1.5.3 voila=0.3.6

You have just installed the packages required to follow the tutorial.
5. Type jupyter lab in the Anaconda Prompt / Terminal. After JupyterLab has launched, click the “Python 3” button under “Notebook” in the launcher window, or use the “File” menu, to open a new Python 3 notebook.
6. To test your setup, run the following code in a cell of the notebook:

import pandas as pd
table = pd.DataFrame(
    {'Time': [0, 1, 2, 3],
     'Emma': [0, 10, 20, 30]}
)  
table.plot();

You should see a plot display right below the code cell.

Tutorial
Room 3 (Tutorial)
10:30
55min
Ipytone: Interactive Audio in Jupyter
Benoît Bovy

Jupyter already has a rich ecosystem of widgets that together allow using it as a powerful platform for interactive data visualization. However, to my knowledge no generic widget library has yet been proposed for exploring data through sound. In parallel, there exist a few programming environments for the interactive creation of sound and music (e.g., Sonic-Pi, FoxDot), but those are isolated applications focused on code and hardly reusable in general-purpose environments.

Ipytone is a widget library that fills this gap by providing many audio components (i.e., oscillators, filters, effects, synthesizers, samplers, etc.) to Python and Jupyter. Those components are part of the Tone.js library, which is built on top of the Web Audio API for creating interactive music and sounds in the browser. Ipytone exposes each component as a widget, making the library very flexible and tightly integrated with the rest of the Jupyter widget ecosystem. Ipytone aims at turning Jupyter into a versatile DAW (Digital Audio Workstation) and at democratizing “data sonification”, a fascinating although still largely unexplored area!

This talk will introduce the audience to Ipytone through various examples and demos, hopefully with live sound! These will range from basic usage (e.g., creating a simple synthesizer and playing it) to more advanced usage (e.g., reproducing NASA’s sonification of a Hubble deep space image in a notebook using Ipytone and other Python/widget libraries).
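
To give a flavour of the API, the snippet below is a minimal sketch of playing a single note with Ipytone; the class and method names are assumed from the Tone.js naming convention described above, so please refer to the linked repositories for authoritative examples.

import ipytone

synth = ipytone.Synth(volume=-5)         # a simple synthesizer widget
synth.to_destination()                   # route its output to the speakers
synth.trigger_attack_release("C4", 0.5)  # play middle C for half a second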

Useful links:

  • Ipytone repository: https://github.com/benbovy/ipytone
  • Ipytone examples repository: https://github.com/benbovy/ipytone-examples
  • NASA Hubble data sonification: https://www.youtube.com/watch?v=H-Ci_YwfH04

Audience: intermediate (basic knowledge of ipywidgets)

Community: Tools and Practices
Room 1
10:30
30min
Jupyter AI: Bringing Generative AI to Jupyter
David Qiu, Piyush Jain

Generative artificial intelligence (AI) models are trained to generate new, previously unseen data (text, images, etc.). The generated data is both similar to the training data and a response to a user provided natural language prompt that describes a task or question. Recent generative AI models such as Amazon CodeWhisperer, Codex, Stable Diffusion, and ChatGPT have demonstrated solid results in performing a wide range of tasks involving natural language (content generation, summarization, question answering), images (generation, explanation, inpainting, outpainting, transformation), data (synthetic data generation, transformation, querying, etc.), and code (autocompletion, explanation, debugging, code reviews, etc.). In this talk, we describe open source work in Jupyter to enable end users to perform a wide range of development tasks using generative AI models in JupyterLab and the Jupyter Notebook.

Since their creation, Jupyter and IPython have always had an architecture and user interface for code generation through autocompletion. Indeed, Jupyter users have been well trained to hit “tab” as they are coding to get contextual autocompletions for functions, variables, methods, properties, classes, etc. This architecture has already been used to integrate AI-provided autocompletions into JupyterLab and the Jupyter Notebook (Kite, Tabnine). As generative AI models expand to perform other tasks, a generalization of the Jupyter autocompletion architecture is needed to enable users to perform potentially any task in Jupyter’s applications through a generative AI. In this talk we will describe an architecture and user experience for generative AI tasks in Jupyter.

The Jupyter generative AI architecture is founded on an extensible Jupyter Server API for registering generative AI models and the tasks they perform (represented as task prompts) with Jupyter. This enables third parties to quickly integrate their generative AI models into Jupyter and for end users to enable the models and tasks using the JupyterLab extension manager UI or a simple pip/conda install. Once models have been enabled, users working in JupyterLab or the Jupyter Notebook can 1) select anything (text, notebook cell, image, file, etc), 2) type a prompt to perform a task with the selection, and 3) insert the AI generated response in a chosen location (replace, insert below, new file, etc.). The task system is integrated with Jupyter’s MIME type output system, which enables GAI tasks to work with inputs and outputs of any MIME type. From the end users perspective, this enables them to enable and use models to perform tasks on text, code, images, data, audio, video, etc. This architecture also has APIs in Jupyter Server and JupyterLab that enable third parties to extend and integrate with the models and UI.

In the talk we will also give concrete examples of the AI-driven tasks that can now be performed in JupyterLab. Examples include code refactoring, debugging, code transformation (e.g., Python to C/C++), code explanation (“generate and insert a markdown cell that describes this code”), synthetic data generation, data transformation, API documentation generation, technical content generation, and more. The focus of these examples will be less on the capabilities of the underlying AI models, and more on the seamless integration of the models into Jupyter to perform various tasks. Our assumption is that generative AI models will continue to improve and that Jupyter users will want a simple user experience that works with any such model.
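
As a hint of what this can look like in practice, the released jupyter-ai package exposes generative models through IPython magics; the sketch below assumes that package is installed and a model provider is configured, and the `chatgpt` alias is only an example (details may differ from the architecture described in this talk).

# In one notebook cell (assumes `pip install jupyter-ai` and a configured provider):
%load_ext jupyter_ai_magics

# In a following cell; any registered model id or alias can replace `chatgpt`:
%%ai chatgpt
Explain what the code in the previous cell does and suggest a refactoring.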

Another potential application of this architecture is as a general purpose “playground” for exploring the capabilities of generative AI models. While not the primary goal of this work, the architecture effectively turns JupyterLab into a general-purpose generative AI playground. This enables users to quickly integrate models, and then use the familiar interface of Jupyter, with its support for editing files/notebooks, version control, real-time collaboration, etc., to play with generative AI. We discuss the pros and cons of this user experience and pose questions for future research and development.

This talk will be useful and of interest to 1) researchers who are building generative AI models to perform different tasks, 2) JupyterLab extension authors who would like to leverage generative AI, and 3) end users of Jupyter who would like to explore generative AI models and use them to perform tasks in their work.

Others
Louis Armand 2
10:30
30min
Jupyter notebooks for education: Computational thinking in practice
Cécile Hardebolle, Pierre-Olivier Vallès, Patrick Jermann

Introduction

In 2018, the Ecole polytechnique fédérale de Lausanne (EPFL, Switzerland), embarked on a major curriculum reform to introduce computational thinking across its science and engineering programs.
Thanks to their multi-representational nature combining interactive code ("computational") and rich text ("thinking"), Jupyter notebooks have been identified as a key enabling technology for this project.
The Center for Digital Education of EPFL has therefore created, in September 2019, a centralized service to support teachers in the development and use of Jupyter notebooks for their classes and Massive Open Online Courses (MOOCs).
This service consists of a technical infrastructure in the form of a centralized JupyterLab platform, a technical support component and pedagogical support component.
In this talk, we will present how we have developed this service and share our experience both in terms of challenges encountered and solutions implemented.
We hope to provide useful insights to teachers and institutions interested in deploying Jupyter notebooks for education at scale.

Centralized JupyterLab platform

The centralized JupyterLab platform has proved to play a pivotal role in the transformation of our teachers’ practices.
By making it easy for anyone to use and produce notebooks directly in their browser without any installation, it has drastically simplified class logistics and has attracted more and more users over the years.
Since 2019, more than 9600 unique users have used our centralized JupyterLab platform, and 3500 users are active all year round.
During the autumn semester 2022-2023, we have had an average of 350 daily users on the platform.
Dealing with user surges at the beginning of classes and with increasing numbers of concurrent users, and managing shared resources while allowing non-trivial types of use (e.g. machine learning) are typical examples of challenges we have faced.

Technical support

Beyond answering requests from teachers, an important role of our technical support has been to adapt the service to the pedagogical needs associated with teaching computational thinking in a science and engineering context.

Because the practice of computational thinking takes various forms in various fields, our technical support has evaluated and installed a range of different types of software stacks and libraries.
For instance, we have installed specific kernels for simulations in computational chemistry or for polyglot notebooks in image processing (the Script of Scripts kernel). Our computational neuroscience MOOCs use a separate instance of JupyterLab with a specific software stack and an LTI (Learning Tools Interoperability) front-end to accommodate users who are external to our institution.

Using notebooks for educational purposes also involves operations that are specific to education such as managing assignments.
Most institutions use learning management systems such as Moodle for these tasks, but these don’t necessarily interface with JupyterLab beyond file upload/download. We have developed a Moodle plugin that interfaces with a JupyterHub installation, making the whole notebook assignment workflow easier, from assignment creation to the transmission of notebook-based feedback to students.

Pedagogical support

Because the range of possible pedagogical patterns for Jupyter notebooks is wide (see “Teaching and Learning with Jupyter”), a first challenge for us has been to help teachers choose ways to use notebooks that are likely to help students develop the targeted knowledge and skills.
Using results from research in different domains of learning sciences, we have experimented and developed best practices for teachers to optimize the design of their notebooks in five scenarios: virtual demonstrations, interactive textbooks, exercise worksheets, labs and projects, and graded assignments.
This work has resulted in the development of several training workshops for teachers as well as a complete website including teacher interviews: https://go.epfl.ch/notebooks.
Of course, developing notebooks for a course requires a significant investment in time and resources for teachers. We accompany them with both funding and advice for developing their projects.
Finally, to foster networking and exchanges, we invite teachers to share how they use notebooks in their classes or MOOCs in bi-annual community events.
We will present examples of notebooks from the 27 git repositories we have documented so far.

Jupyter in Education
Louis Armand 1
10:30
30min
Sending Rovers to Mars with Jupyter
Thomas Boyer Chammard

Jupyter has long been proven to be a tool of choice in the data science and scientific computing
community. At NASA’s Jet Propulsion Laboratory, we have built upon those foundations to
answer engineering challenges and prove that with innovative approaches, the use cases that
can benefit from the Jupyter ecosystem are endless. With modular extensions and novel
methods, we are able to perform system-level and physics-based modeling, trade studies,
automated pipelines of engineering reports, hardware testing procedures, and more.

In this talk, we share our experience of using Jupyter as an integration platform and multi-domain
engineering tool, and why we believe that with the correct approach, Jupyter can be extended
to do just about anything.

Community: Tools and Practices
Gaston Berger
11:00
11:00
30min
Jupyter Notebooks + Quarto for customizable and reproducible documents, websites and books
J.J. Allaire

To share our results and communicate effectively in data science, we need to weave together narrative text and code to produce elegantly formatted, interactive output. Not only does it need to look great, but it needs to be reproducible, accessible, easily editable, diffable, version controlled and output in a variety of formats, such as PDF, HTML and MS Word for convenience and often compliance. Jupyter has already made so much of this possible. By combining Jupyter with the open-source publishing platform, Quarto, built on Pandoc, you can easily create the output format and the styling that you need for any situation. With Quarto, you can author documents as plain text markdown or Jupyter notebooks with scientific markdown, including equations (LaTeX support!), citations, cross references, figure panels, callouts, advanced layouts, and more. You can also engage readers by adding interactive data exploration to your documents using Jupyter Widgets, htmlwidgets for R, Observable JS, and Shiny.

In this talk, we’ll discuss authoring these dynamic, computational documents with Quarto and Python that bring code, output, and prose together, leveraging integrations with both Jupyter and the Quarto VS Code extension. Whether you’re new to Jupyter or have thousands of notebooks already, we’ll walk you through using a single source document to target multiple formats - transforming a simple document into a presentation, a scientific manuscript, a website, a blog, and a book in a variety of formats including HTML, PDF and MS Word. We’ll also show how you can change themes and styling, and publish these artifacts directly from the command line to the web, so they’re immediately available online.

Community: Tools and Practices
Gaston Berger
11:00
30min
Pluto.jl – reactive and reproducible notebooks for Julia
Fons van der Plas

Pluto.jl is a new, open source notebook programming environment for Julia, written in Julia and JavaScript. Our mission is to make Julia more accessible and fun! 🎈

In this talk, we would like to introduce Pluto.jl to the JupyterCon audience, and we will talk specifically about our approach to reproducibility and reactivity. While Pluto.jl is not directly connected to the Jupyter ecosystem, we think that our position (Julia-only, beginners-first) has led to new discoveries and solutions that are exciting to discuss!

Reproducibility 1 – Package Management

We see package management as one of the major hurdles for beginner programmers. It can be intimidating to set up an environment to start programming, but it is especially difficult to set it up in a reproducible way. We want to flip this paradigm: a simple, reproducible environment should be the default, and more advanced users can set up an environment themselves. As a whole, 'scientific computing' has an awful onboarding process, and we scare away so many creative and wonderful people before they are able to contribute. Let's fix that!

One of our goals is to make notebooks reproducible by default. Each notebook file (or HTML export) contains the Manifest.toml file that can be used to exactly recreate the package environment. When you open a Pluto notebook file, the embedded package information is used to automatically recreate the package environment that was used to write it.

A second big feature is automatic package management: instead of a terminal interface, packages are automatically installed and removed as they are used in code. We show package GUI inline in code, and we relay installation progress to the user visually. As a user, it feels like you can simply import any package you want (we even autocomplete all registered package names!), and Pluto takes care of installation and reproducibility.

Reproducibility 2 – Reactivity

Pluto notebooks are reactive, which means that – just like a spreadsheet – your notebook forms a computational graph, and cells re-run automatically when one of their dependencies changes. We also have a "managed scope": we delete variables from scope when the definition disappears.

Reactivity makes Pluto fun and interactive, but it also avoids effects from old code lingering around until a restart. Reactivity and managed scope mean that the notebook is always in its correct state, the same state you would get if you would restart the notebook. At any instant, the notebook state is completely defined by the code you see.

Reproducibility 3 – Binder

Pluto can be installed as a JupyterLab extension, which means that we also run on Binder, the free cloud compute service. We went one step further, and integrated the Binder startup directly into our notebook UI. Inspired by the Thebe project, this allows users to launch a Binder session directly from the website where they are reading a notebook!

Every HTML export file has the original code, and an embedded project environment. But as part of our Binder integration, each HTML file also contains a reference to a version-pinned Binder image, meaning that the exports from Pluto can be re-run in exactly the same environment years into the future.

Lessons and discussion

Pluto is deeply integrated with Julia's package manager, metaprogramming and runtime, something that we were able to freely explore by limiting ourselves to a single language. Originally inspired by Jupyter, which supports more languages and use cases each year, our experiment is to see what happens when we really narrow down our scope: we focus on one language and one audience (Julia newcomers and educators).

We hope to offer an interesting new take on existing topics in the Jupyter ecosystem, and we really look forward to hear what you think!

Others
Louis Armand 2
11:00
30min
Tools for Interactive Education Experiences in Jupyter Notebooks and Jupyter Books
John M. Shea

In this talk, I will introduce some simple-to-use libraries for providing interactive educational experiences in Jupyter notebooks and Jupyter Books. These tools are designed to support low-stakes testing and spaced repetition, which are two techniques that pedagogical research has shown support learning and retention. JupyterQuiz allows the simple creation of interactive self-assessment quizzes with custom feedback for answers that can help students understand why they are making a mistake. The typical use pattern is for question sets to be created and stored in a separate JSON file, but they can also be stored in an obfuscated format within a Jupyter notebook. Questions are delivered in Javascript to provide for interactivity, but the user does not need to know anything about Javascript. JupyterQuiz supports spaced repetition by allowing questions to be randomly drawn from a pool of questions, so that notebooks or book sections can randomly retest material from previous notebooks or sections. JupyterCards provides interactive, animated flash cards for review of terminology and concepts in Jupyter notebooks and Jupyter books. Cards can be created in either JSON or Markdown. Both libraries are free, open source, and easily installable via pip.
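
The sketch below shows roughly what defining and displaying a single JupyterQuiz question looks like; the exact JSON field names follow the JupyterQuiz documentation and should be treated as an assumption here.

from jupyterquiz import display_quiz

questions = [{
    "question": "Which keyword defines a function in Python?",
    "type": "multiple_choice",
    "answers": [
        {"answer": "def", "correct": True, "feedback": "Correct!"},
        {"answer": "lambda", "correct": False,
         "feedback": "lambda creates an anonymous function, not a named one."},
    ],
}]

display_quiz(questions)  # question sets can also be loaded from a separate JSON file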

Jupyter in Education
Louis Armand 1
11:30
11:30
30min
And Voilà!
Martin RENOU, Trung Le

Voilà can turn Jupyter notebooks into interactive dashboards without any modifications. This means that the millions of notebooks shared on GitHub and other online venues are all potential interactive dashboards. Unlike classical HTML-converted notebooks, each user connected to a Voilà dashboard gets a live Jupyter kernel that can execute callbacks, making the dashboard interactive using e.g. Jupyter widgets.
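
As a minimal sketch of this model, a notebook cell containing only standard ipywidgets code, like the one below, becomes a live dashboard when the notebook is served with the voila command; no Voilà-specific code is required.

import ipywidgets as widgets
from IPython.display import display

slider = widgets.IntSlider(value=3, min=1, max=10, description="n")
output = widgets.Output()

def on_change(change):
    # This callback runs in the live kernel behind the dashboard, not in the browser.
    with output:
        output.clear_output()
        print(f"{change['new']} squared is {change['new'] ** 2}")

slider.observe(on_change, names="value")
display(slider, output)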

In this talk we will present the latest updates on the Voilà open-source project, including the recent move to using components from the JupyterLab project for the rendering, enabling support for rich mime type renderers. We will also demo the ongoing work on using in-browser Python kernels from JupyterLite.

Community: Tools and Practices
Gaston Berger
11:30
30min
OSSCAR: leveraging interactive Jupyter notebooks to enhance teaching in the scientific domain
Taylor Baird

Are you an educator working in one of the scientific disciplines, looking to take your lessons to the next level? In the OSSCAR
project (Open Software Services for Classrooms and Research, https://www.osscar.org/), we have created an online collaborative hub where instructors can find a curated set of software tools for creating interactive lessons that leverage all the benefits of the Jupyter ecosystem to enrich classroom teaching. At the heart of this initiative, Jupyter notebooks play the role of self-contained lessons which, through their conversion to web-based applications, are practicable for all prospective students independent of their local setup (what libraries and packages they have installed on their own computer). In this talk, I will first guide you through a prototypical example of one of these notebook-based lessons and show how it may be used to enhance traditional teaching methods. Whilst doing so, emphasis shall be put on the technological and design choices that were made to optimize the pedagogical effectiveness of the lessons. Following on from this, I shall demonstrate the various steps that go into constructing such a notebook and how one goes about deploying it as a cloud-based web application. Here, we shall see an example of how an instructor, who has a given concept that they would like to present, can quickly go from a basic lesson plan to a functioning notebook in a few steps. Additionally, I shall briefly show how a teacher is able to benefit from the flexibility of the Jupyter environment to implement customized features (such as bespoke widgets and extensions) in order to get the full potential from their lessons. Finally, I shall outline the advantages engendered by this collaborative project: how lessons, tools, and knowledge can be shared and re-used amongst the broader community - aspects which are discussed in more detail in the paper [1]. In this spirit, we hope that this talk fosters discussion amongst interested Jupyter users regarding how they may get involved in the OSSCAR initiative, use ideas derived from our approach to notebook-based teaching, and possibly contribute suggestions for building upon our current approach.

[1] D. Du, T. Baird, S. Bonella and G. Pizzi, OSSCAR, an open platform for collaborative development of computational tools for education in science, Computer Physics Communications, 282, 108546 (2023)

Jupyter in Education
Louis Armand 1
11:30
30min
Simplify DevOps with Executable Notebooks
Amit Rathi, Vinay Kakade

Summary

Today, Jupyter Notebooks are mostly confined to science, research & education. But notebooks can provide organizations with a powerful general-purpose “executable documentation” platform. A solid use case for this is DevOps & more specifically, IT incident response.

Technology teams usually have an on-call rotation with static wiki-style documentation to guide the on-call engineer. Jupyter Notebooks can replace static documentation with executable notebooks. E.g. “fetch service logs” and “rollback last deployment” can simply mean executing a code cell that’s available alongside the markdown instructions.

What are the benefits of executable vs. static documentation for DevOps -

  • Quick: e.g., “check DB latency” becomes a one-click code cell execution that plots a latency graph, vs. going to a third-party UI in the middle of an incident.
  • Precise: e.g., “promote read replica to master” can involve a series of steps & the possibility of human error; codifying the steps in advance removes ambiguity & results in precise action.

“Executable documentation” is a simple yet powerful concept that can extend to other use cases such as - API documentation, developer onboarding, data visualization & reporting, scheduling routine tasks & so on. Think of it as executable GoogleDocs powered by Jupyter!

In this talk, we’d like to,
- Introduce the concept of Jupyter powered “executable documentation” platform, particularly for DevOps and Incident Response,
- Show a demo of how it’d work - (https://www.youtube.com/watch?v=vvLXSAHCGF8)
- Talk about important challenges, and propose a way forward to make this a mainstream application of Jupyter notebooks.

Others
Louis Armand 2
12:00
12:00
30min
Five guiding principles to make Jupyter Notebooks educational and reusable
Julia Wagemann, Simone Mantovani, Sabrina H. Szeto

Jupyter notebooks are a popular choice for training and teaching data-intensive science, such as training users of large volumes of Earth Observation data. Computational notebooks, including Jupyter, are particularly valued for facilitating reproducibility and collaboration. However, quantitative analyses and empirical research have identified unique challenges when it comes to using them. Critics claim that notebooks foster bad coding practices due to the possibility of out-of-order execution of cells, that only a small percentage of notebooks hosted on GitHub are in fact reproducible, and that annotations are not evenly distributed and do not reach the objective of well-described computational narratives.

These findings are the motivation to develop and share best practices to make Jupyter notebooks more educational and reusable. During the development of a Jupyter-based training course on Earth Observation data (Learning Tool for Python (LTPy)), we defined and applied five guiding principles from different fields (mainly scientific computing and Jupyter notebook research) to make these notebooks more educational and reusable.

The Jupyter notebooks developed (i) follow the literate programming paradigm with a text-to-code ratio of 3, (ii) use instructional design elements to improve navigation and user experience, (iii) modularize functions to follow best practices for scientific computing, (iv) leverage the wider Jupyter ecosystem to make content accessible, and (v) aim to be reproducible.

In this talk, we will share with you five guiding principles to make Jupyter notebooks educational and reusable and for each principle we share a practical example of how it can be applied and implemented. The guiding principles have also been published in the Journal of Remote Sensing.

Jupyter in Education
Louis Armand 1
12:00
30min
Real Time Collaboration in Jupyter
Carlos Herrero, Trung Le, David Brochart

In recent years, real-time collaboration has become a must-have for any editor, a feature that users expect as core functionality in the tools they use daily. Sharing and collaborating on the same document with your colleagues or teachers increases productivity by improving the teamwork experience.

The adoption of real-time collaboration in JupyterLab has been a challenge for many developers over the years. From the very beginning, RTC was on JupyterLab's roadmap. Still, it was only in v3.x that it became a reality, and only in v4.0 that it shows its real power. We want to describe the feature in detail to give extension developers the knowledge to leverage it in their plugins.

This talk will go through the RTC implementation and describe the role of the packages used. The various entry points to use and extend real-time collaboration on documents will be highlighted. It will also show the corner cases and the restrictions it enforces.

Finally, we will offer a glimpse into how real-time collaboration works in JupyterCAD and JupyterLite. JupyterCAD is a JupyterLab extension using a non-default document type for 3D geometry modeling that supports the FreeCAD format. JupyterLite is a lightweight serverless version of JupyterLab, which changes the paradigm of collaboration. It adds a new challenge by removing the central authority that's the server, requiring the use of peer-to-peer communication to synchronize clients. The document is leveraging the real-time collaboration API to allow collaborative editing.

Community: Tools and Practices
Gaston Berger
12:00
30min
The past, present and future of the Jupyter Notebook
Rosio Reyes, Jeremy Tuloup, Eric Charles, Eric Gentry

Jupyter Notebook 7 is being developed as a replacement for users who may have been previously using Notebook 6 and want more of the features being created for JupyterLab, like real-time collaboration, debuggers, theming, and internationalization, among other benefits. To ensure that those users are equipped with some essential knowledge that will help them smoothly transition to using Notebook 7, this talk will go over some of the key details of working with the new Jupyter Notebook. We will explain how users can run multiple frontends like Notebook 7, JupyterLab and NbClassic (the long term supported version of the Notebook 6 code base) that will ease the transition of users not ready to switch to Notebook 7 as well as give users the freedom to choose between the Notebook 7 and Lab interface based on project needs. Through this talk we will also aim to provide Notebook 6 extension developers with information about the resources available to aid the transition of their extensions to both Notebook 7 and JupyterLab. Notebook users will leave this talk having a better understanding of what next steps they may want to take to get started with Notebook 7.

Others
Louis Armand 2
12:25
12:25
60min
Lunch
Poster Placeholder
12:30
12:30
90min
Lunch
Gaston Berger
12:30
90min
Lunch
Room 1
12:30
30min
De-Regid the Widget: Making Jupyter a Haven for Startups
Nate Rush

Becoming a Widget Author

Over the past 2.5 years, I’ve been building a spreadsheet extension for JupyterLab called Mito. This extension needs to share state between the frontend and the backend, and also have the backend/frontend communicate constantly. As such, we naturally built Mito as a Jupyter Widget.

Being a widget comes with many benefits, including easy-to-extend templates, automatic shared state, pre-setup comm channels, and a handling of kernel restarts and page refreshes. For the first 2 years of development, widgets were perfect.

Limitations of Widgets

But recently, we've run into some limitations of the widgets framework: namely, opinionated state management practices, complexity when integrating with other UI libraries, additional Python dependencies, and install rates that leave a bit to be desired.

As a result of these complexities, we decided to move from being a widget to being a rich output: a javascript frontend that manually establishes comms with the Python kernel, and shares state with it.

I'll share details on how to make this transition away from widgets, and also talk about what it gives you as a widget author: reduced dependencies, potentially easier installation, more manual (but customizable) state management, and more!
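
For readers curious what "manually establishing comms" involves, the sketch below shows the generic kernel-side half of such a setup using the standard Jupyter comm machinery; the target name and message shape are hypothetical and this is not Mito's actual code.

def handle_open(comm, open_msg):
    # Called when the JavaScript rich output opens a comm with this target name.
    @comm.on_msg
    def handle_msg(msg):
        data = msg["content"]["data"]
        # ...update Python-side state here, then push the new state back...
        comm.send({"state": data})

get_ipython().kernel.comm_manager.register_target("my_rich_output", handle_open)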

Towards a Unified Rich Output

Jupyter contributors are currently exploring (https://github.com/jupyterlab/richoutput-js) a more minimal rich output interface. I’ll close by reflecting on the benefits of this more minimal interface, and why it might be the right way forward for the entire notebook ecosystem :)

Others
Louis Armand 2
12:30
30min
Use Spark from anywhere: A Spark client in Python powered by Spark Connect
Martin Grund

Over the past decade, developers, researchers, and the community have successfully built tens of thousands of data applications using Spark. Since then, use cases and requirements of data applications have evolved: Today, every application, from web services that run in application servers, interactive environments such as notebooks and IDEs, to phones and edge devices such as smart home devices, want to leverage the power of data.

However, Spark's driver architecture is monolithic, running client applications on top of a scheduler, optimizer and analyzer. This architecture makes it hard to address these new requirements: there is no built-in capability to remotely connect to a Spark cluster from languages other than SQL.

Spark Connect introduces a decoupled client-server architecture for Apache Spark that allows remote connectivity to Spark clusters using the DataFrame API and unresolved logical plans as the protocol. The separation between client and server allows Spark and its open ecosystem to be leveraged from everywhere. It can be embedded in modern data applications, in IDEs, Notebooks and programming languages.

This talk highlights how simple it is to connect to Spark using Spark Connect from any data applications or IDEs. We will do a deep dive into the architecture of Spark Connect and give an outlook of how the community can participate in the extension of Spark Connect for new programming languages and frameworks - to bring the power of Spark everywhere.
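
For context, connecting from a notebook with the PySpark client looks roughly like the sketch below (Spark 3.4+; the host, port and column names are placeholders).

from pyspark.sql import SparkSession

# The "sc://" URL points at a remote Spark Connect endpoint instead of a local driver.
spark = SparkSession.builder.remote("sc://spark-cluster.example.com:15002").getOrCreate()

df = spark.range(1000).withColumnRenamed("id", "value")
df.filter(df.value % 2 == 0).count()  # operations are shipped to the server as unresolved logical plans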

Sponsored
Louis Armand 1
13:00
13:00
60min
Lunch
Louis Armand 1
13:00
60min
Lunch
Louis Armand 2
13:00
60min
Lunch
Room 2 (Tutorial)
13:00
60min
Lunch
Room 3 (Tutorial)
13:40
13:40
10min
e2xhub: Simplifying Course Setup and Management in Education with JupyterHub at Scale
Mohammad Wasil, Tim Metzler

Reminder - The date and time of this session are placeholders due to a limitation of the conference software.
All posters will be presented during the Poster Session from 6:30 to 8:00 on Thursday the 11th.

https://cfp.jupytercon.com/2023/talk/AKPRE8/

--

Jupyter Notebook has been widely used in academia for research, teaching and examination. JupyterHub enables multi-user Jupyter Notebook environments, making it an ideal tool for educators. However, configuring nbgrader, a popular tool for managing assignments, for multi-course and multi-grader systems can be challenging and often requires a sysadmin to provide the necessary environment. We propose e2xhub as a solution to simplify this process, allowing educators to easily set up courses and environments using the Jupyter ecosystem without having to deal with the complexities of configuration.

We achieve this by providing a user-friendly JupyterHub configuration to allow graders to easily create courses and specify their requirements. We use YAML, a well-known declarative configuration language, to allow graders to set up courses, environments and resource allocation. Our JupyterHub is deployed on a Kubernetes cluster in order to provide isolation for users and improve scalability and maintainability. Isolation is important for our system to ensure security during examinations.

We use Zero to JupyterHub with Kubernetes (Z2JH) to deploy JupyterHub on our Kubernetes cluster. Our software extends the capabilities of Z2JH, allowing us to deploy a more customizable JupyterHub. The main objectives include providing the ability to create or load courses for individual users, creating personalized profiles for each course e.g. personalized image and resource allocation (CPU, RAM, and GPU), and enabling multi-course and multi-grader support. All of these modifications can be easily made using YAML, without the need for a sysadmin to update the configuration. In addition, we also maintain continuous integration and deployment to ensure that any updates to the upstream environment created by graders can be easily delivered to our system.

We have developed e2xhub for use at our university. Since our initial JupyterHub deployment in 2018, we have used it for 27 courses and 41 examinations, involving a total of 3569 students working on assignments and 1878 students taking exams. The flexibility to transfer our system to different cloud providers makes it easy to scale and support more courses and students. Overall, the configurability and maintainability of our system not only make it easier for sysadmins to manage, but also allow graders to create and manage courses more easily. This could potentially encourage wider adoption of the Jupyter ecosystem in education.

Jupyter in Education
Poster Placeholder
13:50
13:50
15min
A Case of Developing a Jupyter Extension: its challenges and our solutions
Hooncheol Shin

Audience level: Intermediate
- Project Jupyter users, looking to build Jupyter Extensions
- Developers hoping to deep dive into Project Jupyter

TLDR;

In this session, we will introduce two challenges we faced with Project Jupyter while developing our extension, MakinaRocks Link, and explain how we solved these challenges and how we implemented our ideas:

  • Handling how to display cell outputs
  • Handling how to display error messages

We hope that sharing our lessons learnt from our deep dive into Project Jupyter can help other software developers of similar Project Jupyter-related extensions.

Background & Introduction

MakinaRocks Link

MakinaRocks Link is a JupyterLab extension that allows you to create pipelines on Jupyter by converting cells into components and setting parent-child relationships between components. We have built this extension to allow users access to all features originally available in the Jupyter environment. They say a picture is worth a thousand words - click below to find out more about how to create pipelines and run these pipelines.

  • create-pipeline
  • run-pipeline

2 challenges we wanted to solve

  • 1 Displaying only the outputs of a child cell, when multiple parent cells are executed internally
    • When we run a certain component in a pipeline, we execute all dependent components as well. Then, what would be the ideal output when that child component is run?
    • Before we solved the issue - The outputs of all parent components are also displayed
      • unresolved-case-handling-outputs-gif
  • 2 Displaying error messages like Jupyter
    • We hoped to replicate the usability of JupyterLab as much as possible. So, when an error occurs, we wished to replicate the JupyterLab error message and not expose our extension source code operating in the background, which would act as unnecessary noise for the user.
    • Before we solved the issue - When an error occurs, the extension source code is also displayed, making it difficult for users to find the error messages about their own source code.
      • unresolved-case-handling-error-message-gif

#1. Displaying only the outputs of a child cell, when multiple parent cells are executed internally

  • Our goal was to display only the output from the “executed component” and store the other outputs from other components internally
    • resolved-case-handling-outputs-gif
  • Description of the original logic for displaying the output of a child cell with multiple parent cells on JupyterLab
  • Development steps to display only the output from the “executed child component”
    • Inspired by the capture_output class from the IPython.utils module (see the sketch after this list)
    • Learnt that stdout & stderr are handled by the OutStream object, and that displays (images etc.) are handled by the DisplayPublisher object.
    • Our solution - patched the OutStream and DisplayPublisher objects and modified the write and publish methods respectively to implement our goal.
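
For reference, the snippet below is a minimal sketch of the standard IPython capture_output utility mentioned above (illustrative only, not the extension's own code).

from IPython.utils.capture import capture_output

with capture_output() as captured:
    print("this goes to the captured buffer, not the notebook")

print(captured.stdout)  # inspect the captured text...
captured.show()         # ...or replay every captured output on demand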

#2. Displaying error messages like Jupyter

  • Our goal was to provide the identical Jupyter experience that users are familiar with (we wanted to minimize time lost by having to learn something new). Hence, displaying error messages in the following way allows users to experience the same Jupyter error message environment.
    • resolved-case-handling-error-message-gif
  • Description of the original logic of displaying error messages on JupyterLab
  • Development steps for displaying the error message in the ‘right’ way
    • Found the showtraceback method, which is used in the ipykernel.zmqshell module when the ZMQInteractiveShell object displays error messages.
    • Eureka! Found the _render_traceback_() hook checked within the showtraceback method, which allows customization.
    • Our solution - wrote an algorithm that implements _render_traceback_() to display only the actual error message with zero noise (see the sketch after this list).
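
The snippet below is a minimal sketch of the _render_traceback_() hook itself (illustrative only, not the extension's actual algorithm): when an exception defines this method, IPython uses its return value instead of formatting the full traceback.

class PipelineCellError(Exception):
    def __init__(self, user_traceback_lines):
        super().__init__("pipeline cell failed")
        self._lines = user_traceback_lines

    def _render_traceback_(self):
        # Return only the lines about the user's own code, hiding extension internals.
        return self._lines

raise PipelineCellError(["ValueError: invalid input in component 'preprocess'"])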

Closing Remarks

We have shared our case of deep diving into Project Jupyter, and how we solved the challenges of showing the ‘right’ cell outputs and the ‘right’ error messages. We hope that Project Jupyter users can enhance their understanding of Jupyter and be inspired to solve more challenges with our case story.

Community: Tools and Practices
Poster Placeholder
14:00
14:00
30min
Distributed Data Science for Humans with Dask
Matthew Rocklin

Distributed computing is great! Unfortunately, distributed computing is also hard and often heavyweight. This friction gets in the way of the human+computer joint data exploration process that we value so dearly in the Jupyter ecosystem.

Dask is a popular library for parallel and distributed computing in Python that was co-developed alongside Jupyter with human interaction and interactivity in mind. In this talk we'll discuss Dask in the context of interactive data science, highlighting the ways in which Dask and Jupyter leverage each other to achieve a powerful and scalable user experience that fits easily into your hand. In particular we'll highlight rich notebook outputs, JupyterLab dashboard extensions, and JupyterHub deployment integrations, and how leveraging the extensibility of Jupyter can result in a first-class open source experience.
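
As a small illustration of the interactive workflow discussed in the talk, a typical notebook session looks roughly like the sketch below (the file path and column names are placeholders).

from dask.distributed import Client
import dask.dataframe as dd

client = Client()  # in JupyterLab, the client's rich repr links to the live dashboard
df = dd.read_parquet("data/*.parquet")            # lazy, partitioned dataframe
result = df.groupby("category")["value"].mean()   # builds a task graph, nothing runs yet
result.compute()                                  # executes on the cluster and returns a pandas result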

Data Science
Gaston Berger
14:00
150min
Getting Started With Python
Marianne Corvellec, Maria Teleńczuk

While tutorial attendance is included in the conference pass, we ask you to register for this tutorial on https://www.jupytercon.com/tickets as the seats available are limited.

--

This tutorial is an introduction to Python programming using JupyterLab. It is aimed at students with little or no programming experience, and is intended as a follow-along tutorial. It draws inspiration from the late Boston Python Workshop and uses materials from Software Carpentry, which are available under the CC-BY license.

If you register for the tutorial, it is very important that you install the required software ahead of the tutorial. Please refer to the setup instructions below and follow them step by step.

Setup instructions

  1. Open https://docs.conda.io/en/latest/miniconda.html in your web browser.
  2. Click on the latest installer link depending on your OS: Miniconda3 Windows 64-bit for Windows, Miniconda3 macOS Intel x86 64-bit pkg for macOS, Miniconda3 Linux 64-bit for Linux.
  3. Install Python 3 by running the Miniconda installer (double click on the downloaded file) using all of the defaults for installation. On macOS and Linux, make sure to check Add Miniconda to my PATH environment variable.
  4. On Windows: open the Anaconda Prompt from the Start menu. On macOS and Linux: open the Terminal app. Run the following lines (i.e., copy-paste them in the window that just opened up and press Enter):
conda config --add channels conda-forge
conda install ipywidgets=7.6.5 jupyterlab=3.5.3 matplotlib=3.7.0 pandas=1.5.3 voila=0.3.6

You have just installed the packages required to follow the tutorial.
5. Type jupyter lab in the Anaconda Prompt / Terminal. After JupyterLab has launched, click the “Python 3” button under “Notebook” in the launcher window, or use the “File” menu, to open a new Python 3 notebook.
6. To test your setup, run the following code in a cell of the notebook:

import pandas as pd
table = pd.DataFrame(
    {'Time': [0, 1, 2, 3],
     'Emma': [0, 10, 20, 30]}
)  
table.plot();

You should see a plot display right below the code cell.

Tutorial
Room 3 (Tutorial)
14:00
30min
ITS_LIVE: Jupyter and cloud native formats to map climate change.
Luis Lopez

ITS_LIVE is a NASA project that produces low latency, global glacier flow and elevation change datasets. The size and complexity of this data makes its distribution and use a challenge. To address these problems, ITS_LIVE was built for modern cloud-optimized data formats and includes easy-to-use Jupyter notebooks for data access and visualization.

This presentation will show how ITS_LIVE uses the Pangeo stack to generate Zarr data cubes that make fast access possible without the need for back-end services. We will also delve into our data access strategy and how we leveraged and enhanced the Jupyter ecosystem by implementing native map projections and services in ipyleaflet to visualize big geospatial data in a matter of seconds.
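
For readers unfamiliar with the stack, a typical cloud-native access pattern looks like the sketch below; the store URL, variable and dimension names are placeholders, not the actual ITS_LIVE endpoints.

import xarray as xr

# fsspec handles the remote store; only the chunks actually needed are fetched.
cube = xr.open_zarr("https://example.com/datacube.zarr", consolidated=True)
velocity = cube["v"].sel(x=slice(0, 10_000), y=slice(0, 10_000))
velocity.mean(dim="time").plot()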

Jupyter for Research and Scientific Discoveries
Louis Armand 1
14:00
30min
Understanding and Visualizing Dependencies between Notebook Cells
Colin Brown

Understanding and Visualizing Dependencies between Notebook Cells

In Jupyter IPython notebooks, variable declarations are global, so a variable defined in one cell can be referenced, mutated, or redefined in any other cell. Each reference or mutation creates a cell dependency where one cell should be executed before the other. Because these dependencies can span the entire notebook, understanding which cells need to be re-executed after a change can be challenging. “Run all” strategies rely on top-down ordering that can be broken by modifying or inserting cells and may also cause needless execution of cells that do not require updates. A reactive kernel can automatically update the necessary cells when cell dependencies are captured and non-circular (e.g., in dataflow notebooks). However, there may still be cases (e.g., long-running computations) where that solution is problematic. In this talk, we present two interactive visualization techniques, the minimap and the cell dependency graph, that let users examine and navigate cell dependencies, and we show how this helps them understand, organize, and execute notebook cells.

Minimap

One approach is to create a map of cells in the notebook linked according to their dependencies; a “minimap” reduces each cell to a point labeled by a truncated line of code and mirrors the top-down cell ordering in the notebook. Edges connect these points to show dependencies between cells, but most edges are hidden until a cell is selected. When a cell is selected, cells that are upstream or downstream of it shift left or right, respectively, to show dependencies, and immediate dependencies are connected with lines. These interactions allow users to easily understand the impact of changing cells and resolve misunderstandings about dependencies. For example, when given a notebook with a cell that loads in data, we can quickly know all the cells that depend on this data in our notebook and which cells must be re-run as a result of our changes. Observable, a JavaScript notebook environment, introduced this style of minimap and inspired our approach. In an Observable notebook, however, there is at most one output per cell. In a Jupyter IPython environment, a single cell may generate many outputs, making the visualization more challenging. Having a minimap allows for faster understanding and navigation based on this information without the screen clutter of a traditional graph and can easily be viewed while casually creating a notebook. This additional knowledge and understanding can prevent tricky situations leading to notebook irreproducibility, as users now know which cells directly impact others and can make better decisions about when to re-execute.

Cell Dependency Graph

However, users may be more familiar with standard node-link graph diagrams, where cells are nodes with variables nested within them, and dependencies are links. This structure allows the topology of the notebook to be more easily understood from the graph as links are not hidden like in the minimap. This topological information helps prevent users from the misunderstanding that cells they have placed out of order in the notebook may have to be re-run before reaching their previous results. This practice allows for exporting partial notebooks that may be shared with colleagues. Microsoft’s Gather provides one way to export portions of notebooks to share with others. However, we provide a way to view and understand what is being exported by selecting and highlighting inside the graph. Using graphs in the notebook encourages interaction in new ways from structural points of view and allows users more control over the notebook via graphs. The linking of notebook and graph interactivity promotes better practices in notebooks by preventing unwanted behavior and provides a step forward in better understanding.

Community: Tools and Practices
Louis Armand 2
14:00
150min
Write, Document, Test and Distribute Python Packages With Jupyter & Quarto
Wasim Lorgat, Hamel Husain, J.J. Allaire

nbdev is an exciting literate and exploratory programming framework that provides developers with 10x productivity in Python. With nbdev, you write your tests, documentation and software in one context: a Jupyter Notebook. nbdev leverages Quarto to render documentation sites, giving you additional power to customize your documentation.
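
For readers new to nbdev, a cell in an nbdev notebook typically looks like the sketch below (nbdev v2 directive syntax; treat the details as an assumption and see the tutorial website for specifics).

#| default_exp core
# The directive above, placed in its own cell, tells nbdev which module this notebook exports to.

#| export
def greet(name: str) -> str:
    "Return a greeting (exported to the package and rendered into the docs by nbdev)."
    return f"Hello, {name}!"

# Tests live alongside the code as ordinary notebook cells:
assert greet("Jupyter") == "Hello, Jupyter!"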

In this tutorial, we will walk you through how to use nbdev and provide an overview of some of the underlying technologies such as Quarto and execnb.

Please see the tutorial website for more information and a detailed outline.

Tutorial
Room 2 (Tutorial)
14:05
14:05
15min
Reproducible figures for scientific publication
Thijs van der Plas

Reminder - The date and time of this session are placeholders due to a limitation of the conference software.
All posters will be presented during the Poster Session from 6:30 to 8:00 on Thursday the 11th.

https://cfp.jupytercon.com/2023/talk/AKPRE8/

--

Reproducible Figures (RFs) are (multi-panel) figures that are produced completely and only by code, and can therefore be reproduced (by anyone). RFs are by definition not compiled or edited by illustration software, whose editing steps cannot be traced in the final product. RFs therefore have the potential to accelerate scientific progress, while putting the scientific principles of rigor and reproducibility at the forefront. Firstly, because a RF is uniquely defined by the code that created it, all underlying data analysis and statistical methods are traceable, aiding its reproducibility as well as facilitating (others) to extend the analysis to different data or parameters. Furthermore, avoiding repetition of figure-making saves time across different versions, projects and people, hence creating a resource for the community. These advantages are recognised by the broader scientific community [Samota & Davey, 2021], and scientific journals such as eLife have recently called for a move towards reproducible figures (e.g., through the use of executable publications) [Maciocci et al., 2019].

Even though modern programming languages and packages possess the capability to create RFs - demonstrated by a growing body of RF-featuring publications* - many scientists (who are not software developers) are unfamiliar, inexperienced or hesitant to use them, while guidelines remain scarce [Lasser, 2020]. In fact, although many advanced visualisation packages exist [Bokeh, 2018; Waskom, 2021], what is often holding back scientists from creating fully reproducible figures is the ability to easily fine-tune basic figure elements. These tasks, such as adding panel labels or aligning elements across panels, are often simple in illustration software, but non-trivial in code.

We’ve created a Python package of functions that perform these basic tasks, essential for creating RFs at the level of a scientific publication, based on matplotlib [Hunter, 2007]. Our package includes a complete walk-through tutorial of all functions, as well as demonstrations of common customisation operations (that are too user- or project-specific to warrant new functions). These include: aligning axis limits, eliminating unnecessary and duplicate items, aligning panel labels, customising labels and legends, etc.

Hence, our contribution is not a new set of advanced visualisations, but rather short, low-level functions and tutorials that perform actions that are easy in illustration software but unintuitive to many entry-level programmers. Time is wasted on creating manually compiled figures that slow down scientific progress by their lack of reproducibility. This package will enable more scientists, who are not software developers by training, to easily create RFs for their research purposes.
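
As a minimal illustration of the kind of "basic task" the package targets, the sketch below adds panel labels with plain matplotlib (not the authors' own functions).

import matplotlib.pyplot as plt

fig, axes = plt.subplots(1, 2, figsize=(8, 3))
for ax, label in zip(axes, ["a", "b"]):
    ax.plot([0, 1, 2, 3], [0, 1, 4, 9])
    ax.text(-0.1, 1.05, label, transform=ax.transAxes, fontweight="bold")
fig.savefig("figure_1.pdf")  # re-running the cell regenerates the figure exactly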


*: For a collection see https://github.com/jupyter/jupyter/wiki#reproducible-academic-publications, as well as publications by the authors:
https://doi.org/10.1101/2021.11.09.467900
https://doi.org/10.1101/2021.12.28.474343

References:

Bokeh Development Team (2018). Bokeh: Python library for interactive visualization. URL: http://www.bokeh.pydata.org.

J. D. Hunter (2007), "Matplotlib: A 2D Graphics Environment", Computing in Science & Engineering, vol. 9, no. 3, pp. 90-95.

Lasser, J. (2020). Creating an executable paper is a journey through Open Science. Communications Physics, 3(1), 1-5. https://www.nature.com/articles/s42005-020-00403-4

Maciocci, G., Aufreiter, M., & Bentley, N. (2019). Introducing eLife’s first computationally reproducible article. eLife Labs, https://elifesciences.org/labs/ad58f08d/introducing-elife-s-first-computationally-reproducible-article

Samota, E. K., & Davey, R. P. (2021). Knowledge and Attitudes Among Life Scientists Toward Reproducibility Within Journal Articles: A Research Survey. Frontiers in Research Metrics and Analytics, 35. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8276979/

Waskom, M. L. (2021). Seaborn: statistical data visualization. Journal of Open Source Software, 6(60), 3021.

Jupyter for Research and Scientific Discoveries
Poster Placeholder
14:20
14:20
15min
Jupyter for education with Noteable
James Stix

Reminder - The date and time of this session are placeholders due to limitations of the conference software.
All posters will be presented during the Poster Session from 6:30 to 8:00 on Thursday the 11th.

https://cfp.jupytercon.com/2023/talk/AKPRE8/

--

In this online poster we will share the main benefits and the story behind Noteable, a widely used subscription service that packages Jupyter notebooks into subject-specific packages and integrates into existing learning systems.
Noteable has been successfully delivered as a service across the University of Edinburgh, and is being provided to a number of leading universities across the UK and internationally. Noteable is being used across academic areas and subjects to provide access to Jupyter notebooks with nbgrader, multiple instructors, and collaboration in JupyterLab. Noteable makes use of a number of open source libraries, extensions and packages to provide the service.
The poster brochure shares examples of using the service across education, the provision of Noteable free of cost to Scottish school teachers, and an overview of the innovations and steps ahead for Noteable.

Jupyter in Education
Poster Placeholder
14:30
14:30
30min
Elyra an AI development workspace based on Jupyter Notebooks
Luciano Resende

Do you love Jupyter Notebooks, but are getting tired of the Wild Wild West of external tools and DevOps tasks that consume a lot of your time away from model development? In this talk, we will introduce you to Elyra, an open-source AI development workspace that enables data scientists, Machine Learning Engineers, and AI developers to be more productive. It provides support for low code/no code AI pipelines integrated with Apache Airflow and Kubeflow runtimes, integrated Python, Scala, and R editors, collaboration support via git integration, code reusability via code snippets, etc., all without having to leave your notebook workspace. After a quick introduction to Elyra's capabilities, we will dive into a live demo, showcasing how to build and execute a data pipeline in a few minutes.

Data Science
Gaston Berger
14:30
30min
Visual Network Analysis from the comfort of your Jupyter notebook
Guillaume Plique

Jupyter notebooks have quickly become a staple of exploratory data analysis for Python-savvy social science researchers. But, to this day, it remains hard to bridge computational practices, such as building graphs from social network data using Python code, and visual network analysis, typically done using desktop applications such as Gephi or Pajek. The intention of this talk is therefore to present, through a series of social sciences-related use cases, a novel Jupyter widget named "ipysigma", whose goal is to enable notebook users to visually and interactively explore networks. ipysigma, developed using the graphology JavaScript library and the sigma.js WebGL renderer, makes it very simple to tweak any of the graph's visual variables, such as node size, edge color, etc., so that one may understand the graph better. It seamlessly supports both networkx and igraph graph instances and is also able, using another satellite library named "pelote", to convert pandas dataframes to relevant graphs. We will also demonstrate how ipysigma is able to render synchronized & interactive "small multiples" of the same network so that one can easily compare different features, such as community partitions and ad hoc categories. Finally, we should have time to discuss the design issues we faced, the path that led us, at SciencesPo médialab, to build this new tool, and the reasons why we finally chose to make a Jupyter widget instead of a dedicated web app.
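A minimal usage sketch (assuming a recent ipysigma release with the widget extension installed; keyword names follow the ipysigma documentation and should be checked against the installed version):

import networkx as nx
from ipysigma import Sigma

g = nx.karate_club_graph()
# visual variables (size, color, ...) are mapped from node data or metrics
Sigma(g, node_size=g.degree, node_color="club")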

Jupyter for Research and Scientific Discoveries
Louis Armand 1
14:30
30min
Xeus kernels in the browser
Johan Mabille, Thorsten Beier

Xeus, a native implementation of the Jupyter protocol, facilitates the authoring of new kernels, especially for languages for which the interpreter has a C or a C++ API. Kernel authors can focus on the language-specific parts of their work and don’t have to deal with the protocol. The number of kernels based on xeus that have flourished in recent years has proven it to be a reliable component of the Jupyter ecosystem.

In this talk, we will cover the latest evolutions of the xeus stack, and how the flexible architecture of xeus made it easy to develop kernels that run entirely in the browser.

We will first give an overview of the xeus ecosystem and the different kernels based on it. We will then dive into the detail of xeus and its architecture, and how users can author new kernels with the library.

In the next section, we will discuss the specificities of WASM kernels, and demonstrate how the changes in xeus made it easy to generate kernels that run entirely in the browser.

We will conclude with a roadmap for future developments.

Outline:

  • Overview of the xeus-based kernels
  • Details of the xeus architecture
  • Specificity of WASM kernels and how we adapted xeus
  • WASM kernel generation
Community: Tools and Practices
Louis Armand 2
14:40
14:40
10min
Synchronizing the data science workflow with data management at scale
David Stirling

Reminder - The date and time of this session are placeholders due to limitations of the conference software.
All posters will be presented during the Poster Session from 6:30 to 8:00 on Thursday the 11th.

https://cfp.jupytercon.com/2023/talk/AKPRE8/

--

As high-throughput imaging technology has advanced, researchers have been able to acquire increasingly large and multi-dimensional image datasets, which yield even more complex and high-volume derived analytical results. This poses a challenge for data management, image analysis, and data science workflows. The open source platform OMERO, and its commercial counterpart OMERO Plus, offer enterprise-level data management for bioimaging data and associated metadata, including rich image analysis results.

While OMERO Plus offers various data mining interfaces within its browser-based clients, the data science workflow is inherently flexible and bespoke, while still relying on domain-standard tools. Therefore, we have built an integrated data science environment in OMERO Plus via a Jupyter extension, such that the approved data science libraries are already installed and usable alongside the open OMERO API to retrieve and analyse pixels, metadata, and tabular data from the OMERO data management platform and other custom sources of data. Further, standard Jupyter notebooks and data dashboards provide templates for those just getting started with their imaging and data science. Finally, these integrations use approved compute resources and rely on institutionally-defined security profiles for optimised operational control.
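For illustration, a minimal sketch of retrieving pixels through the open omero-py API from such a notebook (host, credentials, and image id are placeholders):

from omero.gateway import BlitzGateway

conn = BlitzGateway("username", "password", host="omero.example.org", port=4064, secure=True)
conn.connect()
image = conn.getObject("Image", 12345)                # image metadata object
plane = image.getPrimaryPixels().getPlane(0, 0, 0)    # first Z/C/T plane as a numpy array
print(image.getName(), plane.shape)
conn.close()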

We will demonstrate the scalable integration of our data management platform with interactive data analysis and visualisation tools for an enterprise environment.

Enterprise Jupyter Infrastructure
Poster Placeholder
14:50
14:50
10min
Notebooks to support the growth of the sea-based renewable energy sector
Simon Chabot, Nicolas RAILLARD

Reminder - The date and time of this session are placeholders due to limitations of the conference software.
All posters will be presented during the Poster Session from 6:30 to 8:00 on Thursday the 11th.

https://cfp.jupytercon.com/2023/talk/AKPRE8/

--

The French sea research institute, Ifremer, and its project partners publicly launched the Resourcecode free software toolbox in March 2022. Resourcecode is the result of a research project that aims to support investment and growth in the wave and tidal energy sector by providing marine data and associated models and software tools, through an innovative online platform.

Logilab was tasked by Ifremer to design and implement a part of that
online platform:

  • a web application at resourcecode.ifremer.fr, developed in JavaScript by software developers, allows users to use a map and various filters to search for and select the data they want to process;

  • the processing of this data can happen online and interactively by using pre-defined tools, these tools being Jupyter notebooks maintained by scientists in a GitLab forge hosted by Ifremer;

  • the processing of this data can also happen offline and is made easier by the resourcecode Python library that can download the selected data and turn it into a Pandas DataFrame.

This setup tries to get the best of both worlds by asking software developers to develop and deliver the application and offering scientists the ability to keep improving the tools and algorithms that
process the data. It also implements good software development practices, by tracking Jupyter notebook code in GitLab, providing updates of the online platform thanks to GitLab's continuous integration, and generating the documentation of the tools.

In this talk, Logilab will present the technical side of the project to show how the different parts were put together in the service of reproducibility and open science.

Jupyter for Research and Scientific Discoveries
Poster Placeholder
15:00
15:00
30min
Computational reproducibility of Jupyter notebooks from biomedical publications
Daniel Mietchen

In this talk, we present a study that analyzed the computational reproducibility of Jupyter notebooks extracted from GitHub repositories associated with publications indexed in the biomedical literature repository PubMed Central. We will present the key steps of the pipeline we used for assessing the reproducibility of Jupyter notebooks. The study is based on the metadata extracted from 1419 publications from PubMed Central published in 373 journals. From the 1117 GitHub repositories associated with these publications, a total of 9625 Jupyter notebooks were downloaded for further reproducibility analysis. The code for the pipeline is adapted from Felipe et al., 2019. We will discuss the results of the study, including variables such as programming languages, notebook structure, naming practices, modules, dependencies, etc. that we found in these notebooks. We will then zoom in on common problems and practices, highlight trends, and discuss potential improvements to Jupyter-related workflows associated with biomedical publications. This talk is aimed at researchers who use Jupyter notebooks to publish their results in public repositories, and aims to help them adopt best practices when documenting their research. The slides are available via the DOI 10.5281/zenodo.7854503.

Jupyter for Research and Scientific Discoveries
Louis Armand 1
15:00
30min
Driving down the Memray lane - Profiling your data science work
Cheuk Ting Ho

When handling a large amount of data, memory profiling the data science workflow becomes more important. It gives you insight into which process consumes lots of memory. In this talk, we will introduce Memray, a Python memory profiling tool, and its new Jupyter plugin.
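For context, a hedged sketch of how the Jupyter integration is typically used (magic names as documented for recent Memray releases; check the installed version):

# cell 1: load the extension
%load_ext memray

# cell 2: profile the cell and render a flame graph inline
%%memray_flamegraph
import numpy as np
frames = [np.zeros((1_000, 1_000)) for _ in range(20)]   # allocations to inspect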

Data Science
Gaston Berger
15:00
30min
Maximizing the Impact of User Feedback: Effective Practices for Community Management
Meag Doherty

User feedback is crucial to any community, as it helps shape the community’s direction and growth. However, managing and processing this feedback can be challenging, especially for large and active communities. This talk will discuss practices for community management teams to effectively handle user feedback and turn it into valuable insights.

We will cover the following topics:

  • Strategies for gathering and prioritizing user feedback, including surveys, polls, and short interviews.
  • Techniques for analyzing and synthesizing user feedback, including data visualization and text analysis tools.
  • Methods for communicating and following up on user feedback, including the use of transparent and consistent processes for decision-making.
  • Best practices for engaging with users and fostering a sense of community ownership.

By the end of this session, attendees will have a better understanding of how to effectively manage user feedback and use it to drive the growth and success of their projects.

Audience:

This conference proposal is suitable for community managers, moderators, and other professionals responsible for online communities. It is also relevant for anyone interested in understanding how to effectively process and use user feedback to shape the direction of their communities.

Objectives:

  • To provide attendees with strategies for gathering and prioritizing user feedback.
  • To discuss techniques for analyzing and synthesizing user feedback.
  • To share community practices for communicating and acting on user feedback.
  • To explore ways to engage with users and foster a sense of community ownership and responsibility.
Community: Tools and Practices
Louis Armand 2
15:00
10min
Share Jupyter Notebook as a web app with Mercury framework
Piotr Płoński, Aleksandra Płońska

Reminder - The date and time of this session are placeholders due to limitations of the conference software.
All posters will be presented during the Poster Session from 6:30 to 8:00 on Thursday the 11th.

https://cfp.jupytercon.com/2023/talk/AKPRE8/

--

Mercury is an open-source framework that converts a Jupyter Notebook into an interactive web application. Users can tweak widget values and execute the notebook. The final result can be downloaded as a PDF or HTML. Notebook access can be restricted with authentication. You can easily share notebooks with non-technical users, exposing notebooks as web apps, dashboards, scheduled reports, and interactive slides. RunMercury.com
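A hedged sketch of the notebook-side API (widget and class names follow Mercury's documentation around version 2 and may differ between releases):

import mercury as mr

app = mr.App(title="Sales report", description="Notebook served as a web app")
rows = mr.Slider(value=10, min=1, max=50, label="Number of rows to show")

# downstream cells read rows.value; Mercury re-executes the notebook when it changes
print(f"Showing {rows.value} rows")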

Community: Tools and Practices
Poster Placeholder
15:10
15:10
10min
Good Practices, Missed Opportunities and the use of Jupyter notebooks for Inquiry-based Learning
Jose Montero

Reminder - The date and time of this session are placeholders due to limitations of the conference software.
All posters will be presented during the Poster Session from 6:30 to 8:00 on Thursday the 11th.

https://cfp.jupytercon.com/2023/talk/AKPRE8/

--

Jupyter notebooks allow the presentation of programming code, text, figures and other multimedia content in an interactive way which, in principle, makes them an ideal tool for education. Not surprisingly, the number of publications dealing with Jupyter notebooks in teaching has increased rapidly in recent years. Inspired by the open software philosophy, most of these notebooks are intended as open educational resources. However, few of these notebooks take into consideration basic teaching and learning principles, a problem that potentially results in poorly designed content and/or little reuse. This talk is a wake-up call on the need to implement well-established educational principles in Jupyter notebooks in order to create content of superior educational value. We will address this subject with special emphasis on the development of Jupyter notebooks in an inquiry-based learning (IBL) context. In the talk, we will address the different steps of creating an IBL activity using Jupyter: (i) formulating the initial question, (ii) the resources for solving this question (the Jupyter notebook itself), (iii) guidance of the students through the notebook and (iv) assessment.

Jupyter in Education
Poster Placeholder
15:20
15:20
10min
Introducing Jupyter Scheduler: Native support in Jupyter for running and scheduling Jupyter notebooks as jobs
Jason Weill, David Qiu, Andrii Ieroshenko, Piyush Jain, Brian Granger

Reminder - The date and time of this session are placeholders due to limitations of the conference software.
All posters will be presented during the Poster Session from 6:30 to 8:00 on Thursday the 11th.

https://cfp.jupytercon.com/2023/talk/AKPRE8/

--

Many Jupyter users want to run their notebooks in batch processing and production environments. Some popular and well-regarded open source projects and products now let users run and schedule notebooks as jobs. Until now, though, Jupyter itself has not offered its own notebook job capability. This talk will introduce you to Jupyter Scheduler, a new open source JupyterLab and Jupyter Server extension that lets all Jupyter users run and schedule notebook jobs via a simple yet powerful UI and API.

Jupyter Scheduler provides several benefits. First, notebook jobs can let users run more notebooks than their system could normally handle, without the overhead of setting up additional infrastructure. Second, notebook jobs improve the user experience of managing and parametrizing multiple long-running notebooks. Third, notebook jobs, unlike other types of production options, can include rich output (charts, data, widgets, etc.) and narrative text (markdown cells) in a periodic fashion, suitable for reporting. Fourth, in many cases, notebook jobs are a better way of moving to production than the traditional process of converting notebooks to Python scripts, containers, and CI/CD infrastructure.

Jupyter Scheduler, which is now part of the official Jupyter Server Subproject, provides an extensible notebook job architecture and user experience for notebook jobs. Not only does this capability enable users to run and schedule notebook jobs out of the box with Jupyter, it also enables third parties to plug different backends into the system to run notebooks as jobs on different platforms. In this talk, we will cover Jupyter Scheduler for end users who would like to run notebook jobs and for developers who would like to implement a new backend.

For end users, we will show how to install the extension, and use it to create, update, and manage both “run now” and scheduled jobs. We will demonstrate how to create and manage jobs with Jupyter notebooks, and walk through the different types of jobs supported by the scheduler. While this is not a full tutorial, you will be able to follow along and create your own jobs by installing the Scheduler extension prior to the talk.

For developers, we will also describe the architecture of this extension and how it allows you to customize both the user interface and the backend. Jupyter users are on a wide range of platforms: cloud providers, Kubernetes clusters, supercomputers, etc. Jupyter Scheduler addresses users’ needs by offering public extension APIs for running jobs on different platforms and for customizing the user interface for job creation. For example, Amazon Sagemaker Studio and Amazon SageMaker Studio Lab have created extensions for Jupyter Scheduler that let users run, schedule, and store notebook jobs on AWS through SageMaker.

Lastly, we will go over the future roadmap for this project, and talk about some of the interesting features expected in future releases. By the end of this session, you will have a good understanding of how to install, use and customize the Jupyter Scheduler extension in your notebook workflows. We are also very interested in learning about your unique use cases, so that we can adapt this extension to serve a wider Jupyter community.

Enterprise Jupyter Infrastructure
Poster Placeholder
15:30
15:30
30min
Jupyter as a Brain-Computer-Interface
Matthew Elliott

We developed a Jupyter interface to conduct machine learning experiments on living human brain tissue. Our group, the Braingeneers, is a collaboration between neuroscientists, computational scientists, and engineers from the University of California campuses at Santa Cruz, San Francisco and Santa Barbara, and Washington University in St. Louis, working to incorporate AI algorithms in the design of neuroscience experiments. We found Jupyter to be an indispensable tool that allows us to accomplish two key goals:
1. Rapidly design automated experiments on neurobiology
2. Share results with collaborators and students

Over two years, our group has created a customized open-source Jupyter environment which we call WetAI, because its intention is to run AI algorithms with wet biology brain organoids. Brain organoids are sesame seed-sized brain-like tissues that are grown from stem cells in the lab, such that they can be induced to form neural circuits. We have used WetAI to control automated experiments that are conducted on human brain organoids. Code written in Jupyter implements an experimental protocol or learning algorithm that defines how a computer communicates with neurons. The WetAI portal provides experimenters with access to the brain-computer-interface and the other biotechnological devices that maintain tissue in optimal condition. Researchers can view a livestream of the tissue underneath a microscope and send drugs to the culture medium using Jupyter widgets. Our neuroscience experiments take place over multiple days, and this entire process is automated through Jupyter without a biologist needing to be present.

We also used WetAI to teach programming and to provide data analytic tools to the general scientific community. WetAI was instrumental in teaching underrepresented high school students scientific coding in physically distant locations. YouTube tutorials were embedded into notebooks, and the Jupyter interface was customized to be tablet and smartphone friendly. We designed WetAI to also share datasets and analytic tools with the general scientific community. The WetAI Docker image was customized to work with the Broad Institute's notebook-based sharing platform, Terra. This made it possible for other laboratories to analyze datasets from our experiments using WetAI.

We expect WetAI will increasingly be used by other labs to conduct remote neuroscience experiments. We are also expanding the number of high schools and colleges in our student outreach program. We hope WetAI inspires other groups to use Jupyter to unify research, education, and collaborations.

Jupyter for Research and Scientific Discoveries
Louis Armand 1
15:30
30min
Machine learning with dirty tables: encoding, joining and deduplicating
Jovan Stojanovic

Data scientists and analysts working with Jupyter are too often forced to deal with dirty data (with typos, abbreviations, duplicates, missing values...) that comes from various sources.

Let us step into the shoes of a data scientist and, with a Jupyter notebook, try to perform a classification or regression task on data coming from a collection of raw tables.

In this tutorial, we will demonstrate how dirty_cat, an open source Python package developed in our team, can help with table preparation for machine learning tasks and improve results of prediction tasks in the presence of dirty data.

Some of the common problems we will be tackling are:
- joining groups of tables on inexact matches;
- de-duplicating values;
- encoding dirty categories with interpretable results.

And all of this on dirty categorical columns that will be transformed into numerical arrays ready for machine learning.
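As a minimal sketch of this idea (class names follow the dirty_cat documentation at the time of writing; check the installed version):

import pandas as pd
from dirty_cat import TableVectorizer

df = pd.DataFrame({
    "position": ["Police Aide", "police aide", "Master Police Officer"],
    "year_first_hired": [2012, 2013, 1998],
})
X = TableVectorizer().fit_transform(df)   # dirty categories become numerical features
print(X.shape)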

Examples of individual features can be seen here:
- https://dirty-cat.github.io/stable/
- https://github.com/dirty-cat/dirty_cat/tree/main/examples (link to Jupyter notebooks)

Data Science
Gaston Berger
15:30
10min
Reproducible workflows with Jupyter: case study in materials simulation research
Hans Fangohr

Reminder - The date and time of this session are placeholders due to limitations of the conference software.
All posters will be presented during the Poster Session from 6:30 to 8:00 on Thursday the 11th.

https://cfp.jupytercon.com/2023/talk/AKPRE8/

--

Target audience

Scientists, engineers and researchers more widely. Our example project is based on Python, but the key points are independent of the programming language used. It is not necessary to know anything about materials simulation.

Summary

We have developed a Python library [1] that acts as an interface to two existing research simulation tools (OOMMF and mumax3, which are based on Tcl/C++ and Go/CUDA, respectively) and which have comparable functionality. Through the Python interface, simulations can be conducted from Jupyter notebooks. Each simulation can use either tool through the uniform Python interface.

The whole project, bringing together the Python interface with additional analysis tools, is known as Ubermag [2], and is available as open source [3]. The software allows users to simulate magnetic materials at the micrometre scale, but many of the concepts should be transferable to other domains.

The contribution will describe the design ideas behind the Python library and describe typical use cases of the computational simulation studies which are orchestrated through Jupyter notebooks. Data analysis is also conducted from the Jupyter notebook. For many problems, an iterative and explorative cycle is possible and effective.

We discuss how the setup can help scientists to move to more reproducible workflows and publications [4]. This includes the one-study one-document nature of the approach to capture the steps that have been carried out.

For more reproducible publications, we propose to publish a set of Jupyter notebooks with each publication, where the notebooks are used to compute central figures and statements of the paper [4].

A challenge is the preservation and creation of software environments in which the notebooks can be executed (potentially using software outside the notebook, which might be called from the notebook). We mention Binder [5] as a possible option here.

We discuss other benefits of the notebook-based approach to computational science, including no-install creation of software environments via Binder, easy documentation of software using notebooks as Sphinx input, executable tutorials (with Binder), and automatic testing of documentation and reproducibility using nbval [6].

References

[1] Marijan Beg, Ryan A. Pepper, Hans Fangohr, "User interfaces for computational science: a domain specific language for OOMMF embedded in Python", AIP Advances 7, 056025 (2017) https://doi.org/10.1063/1.4977225

[2] Marijan Beg, Martin Lang, Hans Fangohr, "Ubermag: Towards more effective micromagnetic workflows", IEEE Transactions on Magnetics 58, 7300205 (2021) https://doi.org/10.1109/TMAG.2021.3078896

[3] Ubermag home page https://ubermag.github.io and sources (2022) https://github.com/ubermag

[4] Marijan Beg, Juliette Belin, Thomas Kluyver, Alexander Konovalov, Min Ragan-Kelley, Nicolas Thiery, Hans Fangohr, "Using Jupyter for reproducible scientific workflows", Computing in Science & Engineering 23, 36-46 (2021) https://doi.org/10.1109/MCSE.2021.3052101

[5] Project Jupyter, M. Bussonnier, J. Forde, J. Freeman, B. Granger, T. Head, C. Holdgraf, K. Kelley, G. Nalvarte, A. Osheroff, M. Pacer, Y. Panda, F. Perez, B. Ragan Kelley, and C. Willing, “Binder 2.0 - Reproducible, interactive, shareable environments for science at scale”, Proceedings of the 17th Python in Science Conference, pp. 113 – 120 (2018) https://jupyter.org/binder

[6] Hans Fangohr, Vidar Fauske, Thomas Kluyver, Maximilian Albert, Oliver Laslett, David Cortes-Ortuno, Marijan Beg, Min Ragan-Kelley. "Testing with Jupyter notebooks: NoteBook VALidation (nbval) plug-in for pytest" https://arxiv.org/abs/2001.04808 (2020) https://github.com/computationalmodelling/nbval

Jupyter for Research and Scientific Discoveries
Poster Placeholder
15:30
30min
The Spyder debugger: An interactive debugger based on Jupyter technologies
Carlos Cordoba

One of the main features of scientific programming is its exploratory nature: starting from some input data, the goal is to analyze it in order to understand what it can tell us about the phenomena that generated it. However, the means to do this are often unclear, and the results unforeseen. That is why this type of programming requires tools for rapid, interactive prototyping that allow users to seamlessly switch solutions to tackle the problem at hand. Unfortunately, it is not possible to follow the same approach while debugging, because traditional debuggers are mostly focused on letting users explore the call stack and state of variables, and have limited capabilities to run code.

Spyder, a community developed, open source IDE written in and for Python, aims to bridge the gap in that area. It blends the debugger with the interpreter to allow data exploration at any point during code execution, not just at the end. For that, Spyder's debugger attempts to offer the same functionality as a full IPython interpreter, so that its users can debug their code in the same way they are used to doing the rest of their scientific programming.

In this talk, I will cover the debugger features available in Spyder 5, as well as those planned for future versions. Specifically, I will present a live demo to showcase the following features (described in depth in this blog post and the Spyder documentation):

  • How to start the debugger and set breakpoints.
  • How to better understand the code at a certain frame by writing code snippets in the debugger itself, facilitated by syntax highlighting, code completion, multi-line editing and command history.
  • How to move up and down in the call stack to explore other frames in the same way.
  • How to use Spyder's Variable Explorer to browse the contents of objects.
  • How to generate Matplotlib plots while in the debugger.
  • How to use IPython magics while debugging to profile code using %timeit, explore the filesystem with %cd and %ls, and open files with %edit.

Thanks to these improvements, Spyder transforms debugging from a task that feels foreign to scientific programming into one that feels almost second nature. By adding breakpoint calls or setting breakpoints in the debugger, users can easily prototype different solutions for their problems at any point during code execution, not just at the end. This also increases programming speed by allowing users to constantly check the correctness of their code during development.
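As a generic illustration of the idea (plain Python, not Spyder-specific), dropping into a debugger mid-computation to inspect state and prototype:

def analyse(data):
    cleaned = [x for x in data if x is not None]
    breakpoint()   # execution pauses here: inspect `cleaned`, try snippets, make plots
    return sum(cleaned) / len(cleaned)

analyse([1.0, None, 2.5, 4.0])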

At the end of this talk, I hope attendees will learn that they don't need to resort to print() or other workarounds to debug their code. Instead, they can rely on the robust debugging methods used by professional developers, supported by the interactivity and workflow they're familiar with in IPython/Jupyter.

Community: Tools and Practices
Louis Armand 2
15:40
15:40
10min
Astronomy Notebooks for All: A Project to Develop More Accessible Jupyter Notebooks
Patrick Smyth, Jenn kotler

Reminder - The date and time of this session are placeholders due to limitations of the conference software.
All posters will be presented during the Poster Session from 6:30 to 8:00 on Thursday the 11th.

https://cfp.jupytercon.com/2023/talk/AKPRE8/

--

This poster summarizes the "Astronomy Notebooks For All" project undertaken in 2022-2023 by Space Telescope Science Institute. We are researching how Jupyter Notebooks can be made more accessible through paid usability testing sessions with disabled developers, scientists, and students. Our focus is on the compatibility of Jupyter Notebooks with assistive technologies such as screen readers, magnification, and braille readers. Additionally, we are developing potential solutions for the problems we find that may be incorporated upstream into the wider Jupyter project.

This poster will examine:

  • Accessibility issues with Jupyter Notebooks and how these problems frequently keep disabled people out of careers in data and bench sciences.
  • Our user-centered method of tackling this difficult problem by cycling between paid usability feedback sessions with impacted users and implementation of accessibility enhancements. We repeat this until we create a benchmark notebook built with a more accessible structure to contribute back to the community.
  • Stories of how Jupyter accessibility impacts community members, including case studies of people with disabilities deterred from pursuing careers in STEM by inaccessible tools.
  • Specific problems we have uncovered so far and solutions we have tried for those problems. Solutions will focus on both individual authors and the wider codebase.
  • Our team from a variety of organizations (Space Telescope Science Institute, Iota School, Quansight) with a diverse array of skills used in the project.
  • How members of the Jupyter community can contribute to further work towards accessibility and inclusion in this area.
Community: Tools and Practices
Poster Placeholder
16:00
16:00
30min
Break
Gaston Berger
16:00
10min
Per-cell Remote Execution on Jupyter
Hwiyeon Cho, Hooncheol Shin, Hagsoo Kim

Audience level: Novice
- Everyone who knows Project Jupyter

Introduction

JupyterLab is an IDE that is loved by many in the fields of data science and machine learning. Jupyter provides an outstanding, interactive feature that allows REPL-based execution and review of cell-level code, and facilitates data exploration and machine learning experiments. It is used by many, from students to experts who apply Jupyter in their work.
Data science and machine learning code generally requires large amounts of computing. Running this code on personal laptops or in local environments may take excessive amounts of time or fail due to a memory shortage. These issues can be resolved by installing JupyterLab on a high-powered workstation and accessing it via port forwarding, or by deploying it on a Kubernetes cluster. However, using a remote workstation's JupyterHub or JupyterLab can lead to issues with shared resources: if the IPython kernel connected to the Jupyter notebook is not terminated, resources such as memory and the GPU are not released, which means that other users of the workstation cannot use those resources when they need to.
We thought of new ways to execute code remotely in JupyterLab while avoiding these issues, and implemented a remote execution feature that allows code to run in remote environments at the user's request. Link allows each pipeline component (i.e. each Jupyter cell) to run either locally or in a designated remote environment. Moreover, the resources used for the execution are released automatically, leading to more efficient shared resource management. In the next section, we will explain the design of Link's remote execution feature.

Remote execution with Link

A Link pipeline consists of one or more components, and each component corresponds to one Jupyter cell. Each component has properties, which contain information about the local or remote execution environment. Depending on this execution information, components can be executed in independent environments. Link executes code in a specific environment according to the user's request and then releases the resources used. As a result, users can efficiently use and manage the shared resources of a workstation.

Figure 1: Design of per-cell remote execution

Per-cell remote execution is designed around a message queue, data storage, and a remote worker, as shown in Figure 1. The local Link instance and remote Link workers communicate with each other through the message queue and data storage. The message queue manages running tasks, and the storage holds data such as code and objects. Remote execution of each component operates through the following process (a generic sketch of this pattern follows the list below).

  1. Serialize and transfer the selected cell's code and the parent cells' data to the remote worker via the message queue and data storage.
  2. The remote worker receives the task from the message queue, deserializes the code and data from the data storage, and executes the code.
  3. The execution results and the output data are serialized and transferred back to the local environment via the message queue and the data storage.
  4. The local environment receives the results from the message queue and imports the output data from the data storage.
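A generic, simplified sketch of this serialize / execute remotely / return pattern (illustrative only; this is not Link's actual implementation and the names are made up):

import pickle

def remote_worker(task_bytes):
    """Pretend remote side: deserialize code and inputs, execute, serialize outputs."""
    task = pickle.loads(task_bytes)
    namespace = dict(task["inputs"])            # data coming from parent cells
    exec(task["code"], namespace)               # run the selected cell's code
    return pickle.dumps({k: namespace[k] for k in task["outputs"]})

# local side: package the cell and its dependencies, send, then import the results
task = {"code": "z = x + y", "inputs": {"x": 2, "y": 3}, "outputs": ["z"]}
print(pickle.loads(remote_worker(pickle.dumps(task))))   # {'z': 5}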

Figure 2: Add a remote worker

Figure 3: Select components to execute remotely

Link can connect to a remote worker using the message queue and data storage access information. Users can register the worker with an easy-to-understand alias. After successfully connecting to the remote worker, users can select certain components to run on this worker, and the selected components will be executed remotely. This information persists even when the computer is turned off and on again, even after several days.

Summary

JupyterLab is an IDE loved by many developers, ranging from junior students to experts in the fields of data science and machine learning. Data science and machine learning code requires large amounts of computing, and executing this code in individual local environments may take a lot of time or fail due to a lack of memory. These issues can be overcome by installing JupyterLab on a high-powered workstation and utilizing that environment. However, using a remote JupyterLab may lead to shared resources not being released correctly, causing problems when those resources are shared among different users. In order to avoid these problems, we have implemented a remote execution feature that runs only parts of the code in the remote environment, as requested by the user. Link allows users to designate and run individual components (i.e. cells) in either local or remote environments. Link enhances efficiency even further by automatically releasing the shared resources once the code has finished executing.

Enterprise Jupyter Infrastructure
Poster Placeholder
16:10
16:10
10min
Bridging the gap between climate data and policy makers: The CLIMAAX project example
Milana Vuckovic

Reminder - The date and time of this session are placeholders due to limitations of the conference software.
All posters will be presented during the Poster Session from 6:30 to 8:00 on Thursday the 11th.

https://cfp.jupytercon.com/2023/talk/AKPRE8/

--

Citizens in every inhabited place on the planet are increasingly experiencing dramatic consequences of a changing climate. Within Europe too, the adaptation gap between the multi-hazard climate risk and the risk management capability is growing. The EU project CLIMAAX (CLIMAte risk and vulnerability Assessment framework and toolboX) addresses this by providing financial, analytical and practical support to the climate risk assessment community, enabling improvements to regional climate and emergency risk management plans. The CLIMAAX Operational Toolbox will consist of existing and improved tools for data access, manipulation, processing, modelling and dissemination. The four main elements of the toolbox will be:
1) A wiki to serve as user guide with full description of the tools involved;
2) Jupyter notebook templates and examples of the workflows of case studies;
3) Access points to the models, data needed and tools for data manipulation and visualisation for the Climate Risk Assessment;
4) Access to computational and storage resources.
The Jupyter ecosystem will be at the heart of the CLIMAAX toolbox, with JupyterLab enabled for users and Jupyter Book used for the wiki, documentation and templates.
The project will start in January 2023, and in this poster we will share the first design and implementation of the toolbox.

Jupyter for Research and Scientific Discoveries
Poster Placeholder
16:20
16:20
15min
Space Telescope Science Institute Notebook ecosystem
Lee Quick

Reminder - The date and time of this session are placeholders due to limitations of the conference software.
All posters will be presented during the Poster Session from 6:30 to 8:00 on Thursday the 11th.

https://cfp.jupytercon.com/2023/talk/AKPRE8/

--

Space Telescope Science Institute has embarked on a Jupyter Notebook Project that includes JupyterHub, JupyterLab, CI, a repository, policies, processes and best practices. The goal of the project is to support our astronomical science community by providing curated notebooks that help users understand the data, our tools or science cases. We will also be accepting community-authored notebooks that will be supported in this ecosystem. This poster will illustrate the technology and workflows of this project.

Jupyter for Research and Scientific Discoveries
Poster Placeholder
16:30
16:30
30min
Two decades of IPython and Jupyter - strategy, community and technical thoughts for the road ahead.
Fernando Pérez

This is an important year for Jupyter, as we have revamped our entire governance model, looking to give us a solid foundation for another 10 or 20 years of growth (counting from the start of IPython back in 2001). We discussed that transition in a separate talk - here I would like to reflect from a personal perspective on a few aspects of the project.

As part of this transition, I stepped down as "project BDFL". This was a decision driven by many reasons; among others, I think the context of open source communities today is radically different from 20 years ago, and our view with Jupyter is centered around a multi-stakeholder community, where no single actor should concentrate too much authority. From day one, IPython, then Jupyter, were collaborative efforts. Many people put in their ideas, code, feedback and energy to create something we can all collectively own, adapt for our needs, and extend in new directions. While it's necessary to build a sense of vision and direction for any project to be successful in the long run, a "dictator" is entirely the wrong metaphor to base that work on. Python itself has moved on from that model, and we hope more projects will continue to find better ways to harness the energy of their entire community.

The space where Jupyter exists today is challenging: we have had phenomenal success on many fronts. From users of Jupyter itself to those who build on our foundations, standards and protocols in new directions, the impact of the project goes far beyond our original hopes and expectations. But, as well stated in the classic Innovator's Dilemma, this type of large-scale success is itself a potential trap. While not a commercial entity, Jupyter still needs to provide value to all of its stakeholders, with the added challenge that its own strategic directions are not decided by any single actor.

This conference is an opportunity to learn from each other on both what has worked so far, and how the next decade will be very different. If we are to continue providing value, these are some of the challenges I hope we'll tackle together:

  • Jupyter beyond the "Global North": Jupyter should be a tool for democratizing access to science and education, but in a way that empowers all to participate, not only to access and consume. This means we need to continue growing our community of developers, contributors and leaders in countries beyond those most traditionally represented (which for Jupyter, is highly concentrated in the US and Europe). Important efforts have made progress on this front already, but we should make it a priority to build culturally-relevant, contextually appropriate spaces for local communities to develop new tools and use cases, as well as contributing back into the main project.
  • AI: as of this writing, 2022 is shaping up as a watershed year in AI, with advances like ChatGPT causing ripples in the industry and even general media. These tools open a new front on the idea of "interactive conversations with humans and the computer" that Jupyter has already explored. They also tend to be developed and controlled by a few large and powerful corporations. Jupyter should explore its own version of how to ensure these tools can empower everyone, while innovating with use cases that capitalize on our architecture.
  • What are the layers of the "core of Jupyter" that its stakeholders should agree upon, and continue building as 100% open-source, vendor-independent technology, while so many of these stakeholders have their own business goals? These goals often depend on Jupyter-based products, so there is an inherent tension between what goes into the "Jupyter commons" and what is the value added by a specific vendor. Rather than wishing this tension away, we should recognize it and work productively within these constraints.

To explore these and related questions, we should reflect on identifying the scope of Jupyter - Jupyter tools cover many different aspects of computing, yet there's a common theme. I'll present my own take on this, informed by my interactions with many of you for 20+ years, in the hope that it can lead to useful conversations for our next period.

I hope this talk will be a useful point of reference for some of the ideas that the project has built so far, but focused on thinking of the future. While I will discuss them from my limited, personal perspective, my goal is to motivate thinking at the strategic, community and technical levels.

Organiser Choice.
Gaston Berger
16:35
16:35
15min
Sharing notebooks onto the European Open Science Cloud
Manon Marchand

Reminder - The date and time of this session are placeholders due to limitations of the conference software.
All posters will be presented during the Poster Session from 6:30 to 8:00 on Thursday the 11th.

https://cfp.jupytercon.com/2023/talk/AKPRE8/

--

Open science best practices are yet to be clearly defined, but one recurring medium appears in all conversations: the computational notebook. However, sharing these is of no use if they cannot be re-executed by other scientists. In this communication, we'll present a set of best practices for sharing a publication-ready notebook, based on a literature review and on compatibility with the virtual science environments provided on the European Open Science Cloud.

Community: Tools and Practices
Poster Placeholder
16:50
16:50
15min
Addressing global sustainability challenges with Jupyter and cloud-based geospatial data platforms
Tyler Erickson

Reminder - The date and time of this session are placeholders due to limitations of the conference software.
All posters will be presented during the Poster Session from 6:30 to 8:00 on Thursday the 11th.

https://cfp.jupytercon.com/2023/talk/AKPRE8/

--

There is widespread recognition that the Earth's climate is being changed due to human activity. The world's inhabitants (including humans, animals, and other organisms) are already experiencing many negative impacts, such as displacement from sea-level rise, ecosystem degradation, and increases in extreme weather events. People need access to better information so they can understand the impacts that are already occurring, predict what changes will occur in the future, mitigate the changes we can prevent from occurring, and prepare to adapt to those changes that we cannot mitigate. We have a global grand challenge of making living on Earth sustainable.

Fortunately, in recent decades there has been a rapid increase in data availability about the state of the Earth. Earth observing satellites have been continuously providing data for over 50 years, constantly improving in frequency and resolution. Earth system models generate weather predictions for the coming weeks and climate projections of what may happen in the next few decades. And an increasing number of data providers are making this data openly accessible.

However, there are numerous barriers that prevent decision makers (whether individuals, businesses or governments) from utilizing these data.

  1. The data volume/velocity is a major challenge for most potential users.
  2. The data need to be processed into actionable information that is appropriate for those making decisions. This critical work of designing processing workflows is being done by scientific researchers and data analysts, who need both scientific understanding and access to technologies that help them develop and implement those workflows.

We need to lower the barriers so that more people can participate in addressing global sustainability challenges.

These barriers are being addressed through the use of data-proximate computing systems that are optimized for geospatial data analysis. These systems move compute close to data storage, and are an improvement over the historical practice (and bottleneck) of downloading large datasets for local analysis. Some examples of these systems include:

  • Pangeo: an open source software stack consisting of Jupyter, Dask, and Kubernetes.
  • Earth Engine: Google's cloud-based geospatial analysis system, including a >50 PB data catalog.

A remaining challenge is how to improve the navigation, exploration, and analysis of data residing in these systems so that hypotheses can be rapidly tested, workflows can be prototyped, and applications can be rapidly deployed to less technical decision makers.

This presentation will cover:

  • An overview of global sustainability challenges and how cloud-based geospatial tools can be used to address them.
  • How Jupyter technologies simplify working with petabyte-scale datasets, including:
    • Operator and function overloading and custom rich display methods to facilitate analysis of complex objects.
    • Jupyter Widgets and their bi-directional interactions, which allow for interactive exploration through space and time. Important examples include ipyleaflet, ipytree, and bqplot.
  • earthengine-jupyter - a Python package for working with Google Earth Engine from within a Jupyter notebook.
  • Examples of analyzing thousands of satellite images and decades of climate data (see the sketch below).
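For example, a minimal sketch with the standard Earth Engine Python client (earthengine-api; assumes prior authentication, and the asset id is an example that should be checked against the current data catalog):

import ee

ee.Initialize()
collection = (ee.ImageCollection("LANDSAT/LC08/C02/T1_L2")
              .filterDate("2022-01-01", "2023-01-01")
              .filterBounds(ee.Geometry.Point(2.35, 48.85)))   # Paris
composite = collection.median()          # reduces hundreds of scenes server-side
print(collection.size().getInfo(), composite.bandNames().getInfo())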
Jupyter for Research and Scientific Discoveries
Poster Placeholder
17:00
17:00
30min
What’s New in JupyterLab 4.0
Frederic Collonval, Martha Cryan

There are some exciting new features and enhancements shipping with JupyterLab 4. JupyterLab is a web-based user interface for scientists and developers for data exploration, analysis, and visualization. JupyterLab provides a Jupyter notebook editor, code editor, code console, terminal, debugger, and more as core extensions. In the talk we will highlight some of the recent new features such as the settings editor, real time collaboration and the reworked extension manager.

Organiser Choice.
Gaston Berger
17:20
17:20
15min
Accelerate your ML Cycle from Model development to deployment using Jupyter (feat. Extension Link)
Jongsun Shinn, Jongseob Jeon

Reminder - The date and time of this session are placeholders due to limitations of the conference software.
All posters will be presented during the Poster Session from 6:30 to 8:00 on Thursday the 11th.

https://cfp.jupytercon.com/2023/talk/AKPRE8/

--

1. Introduction

Jupyter has been empowering many ML projects especially in the initial stage, due to its interactive and versatile nature.
On the other hand, Jupyter is commonly regarded as a less suitable option for collaboration among data scientists or for productizing developed models.
To cover these weaknesses and promote Jupyter as a key development tool throughout projects, LINK, as a Jupyter extension, offers a Pipeline feature.

2. What is LINK?

LINK supports exporting and importing pipelines, such as Kubeflow and Argo Workflows pipelines, and also guarantees reproducibility.
Data scientists can easily create pipelines in YAML format with LINK by simply clicking the export button in the UI, without having to learn the complexities of SDKs such as KFP.
Once a pipeline has been uploaded, it is usually difficult to work with the pipeline code when collaborating with others.
LINK supports importing pipelines in YAML format and reproduces the code and the pipeline, so other data scientists can easily compare the Python code in the notebook cells and apply version control to the project.

3. How could LINK be used in real world projects?

In the real world, data scientists at MakinaRocks use LINK to develop anomaly detection models in the industrial domain, such as for motors and CO2 laser drill equipment.
Going from training to deploying a machine learning model in the real world, we found that building a training pipeline is essential. We build these training pipelines with LINK, which allows easier integration with other MLOps tools like Kubeflow, MLflow, Seldon Core and stream serving. The following are the steps we executed:
1. We made a training pipeline which trains and saves ML models to MLflow from a Jupyter notebook.
2. By simply exporting to a Kubeflow pipeline, we can execute reproducible pipeline runs and track run histories.
3. We deployed our model from MLflow using Seldon Core to serve streams. Deployed models must have the same environment as the training pipeline built from the Jupyter notebook.

Data Science
Poster Placeholder
17:30
17:30
15min
Wrap up

We'll wrap up for the day/week, and give you information about the evening and the next day.

MISC
Gaston Berger
17:35
17:35
15min
The “Share” button for Jupyter Notebooks - A generic service for publishing and sharing notebook files
Shravan Achar, Zach Sailer, Sathish kumar Thangaraj

Reminder - The date and time of this session are placeholders due to limitations of the conference software.
All posters will be presented during the Poster Session from 6:30 to 8:00 on Thursday the 11th.

https://cfp.jupytercon.com/2023/talk/AKPRE8/

--

A key innovation of Jupyter Notebooks is that they are both computational AND narrative documents—i.e. they are meant to be read by an audience. However, “sharing” a notebook is no easy task. JupyterLab (and other Jupyter applications) does not provide a generic service for sharing notebooks between notebook servers.

Introducing the Notebook Sharing service: an open-source, generic, configurable service and set of JupyterLab extensions that make notebook sharing easy and intuitive. Users see a “Share” button in the top right corner of every Jupyter Notebook and a new “Sharing” side panel in JupyterLab to browse shared files. This service offers a generic interface for talking to any storage backend (e.g. DropBox, Amazon S3, Box, local file system, etc.) and works seamlessly with JupyterHub to enable notebook sharing between all hub users.

In this talk, we will give a live demo of this service and highlight the strengths and value it brings to Jupyter community, such as:

  • generic authentication module, extensible collaborator model and optional remote storage backend APIs
  • Jupyter server- and JupyterLab-extensions that provide a complete Notebook sharing experience
  • Bonus user-experience features, like previewing a shared notebook inside JupyterLab, previewing a shared notebook in a web browser via a URL, downloading a copy, and filtering all shared files
  • customization extension points for searching collaborators and storing with different backends

Finally, we would like to invite the audience to collaborate on this project and solicit feedback on how to improve this software. We would also like to invite members to submit implementations for their favorite storage systems.

Prerequisites

  • The talk will be fairly technical. Basic familiarity with JupyterLab concepts such as server extensions, kernels, REST APIs, and front-end technologies like React is expected.
  • Familiarity with JupyterLab’s content API is a plus.
Community: Tools and Practices
Poster Placeholder
17:45
17:45
45min
Lightning talks

If you did not have a chance to present, or had an idea during JupyterCon, here is your chance to give a 4-minute presentation about it.

You may register during the day for the lightning talks.
At the entrance level of the conference you will find a number of index cards, a box, and pens.

  • Write clearly the title of the proposal and your name.
  • Put it in the box.

The proposal / talk does not need to be polished, it does not need to be an existing project, nor be your project.
It does not have to be about programming. It can be about waffle, or it can be about shoes. You are allowed to not have slides. It is recommended to make puns.

You have 4 (FOUR) minutes max.

At the end of the day we'll select talks at random.

Please sit near the front if you have submitted a talk.

import random

proposals = {...}      # index cards collected from the box during the day
waiting_area = None    # the next speaker waits here, already plugged in

while still_time():
    on_stage = waiting_area
    on_stage.plug_in()
    proposal = random.choice(list(proposals))   # draw the next card at random
    proposal.author.walk_to(waiting_area)
    waiting_area = proposal
    on_stage.present(timeout="4 minutes")       # hard limit: FOUR minutes
    on_stage.unplug_and_walk_away()
MISC
Gaston Berger
17:50
17:50
10min
Scicode-widgets: A toolkit to bring computational thinking into the classroom
Alexander Goscinski

We introduce scicode-widgets, a Python package designed for educational purposes that facilitates the creation of interactive exercises for students in interdisciplinary fields of the computational sciences. The implementation of computational experiments often demands extensive coding, hindering students' ability to effectively learn the interplay between coding experiments and analyzing their results. To reduce this workload for students, instructors can provide a codebase in advance and ask students for targeted educational contributions. These contributions have to be embedded into a general workflow that involves coding experiments and analyzing their results. For that task, scicode-widgets provides the tools to connect custom pre- and post-processing with students’ code, with the ability to verify the solution and to pass it to interactive visualizations driven by Jupyter widgets.

Others
Poster Placeholder
18:00
18:00
15min
Plywood Gallery - a new framework to generate python documentation via notebooks
Jan-Hendrik Müller

Reminder - The date and time of this session are placeholders due to limitations of the conference software.
All posters will be presented during the Poster Session from 6:30 pm to 8:00 pm on Thursday the 11th.

https://cfp.jupytercon.com/2023/talk/AKPRE8/

--

Plywood Gallery is a new framework for generating Python documentation via notebooks. It can be used to create cheat-sheet-like overviews of Python packages with image output.
The unique feature: code snippets are represented by images on the generated documentation website. This makes it easy to see how slight changes in the code correspond to slight changes in the output image.

Here are a couple of examples:
1. https://kolibril13.github.io/plywood-gallery-matplotlib-scalebar/
2. https://kolibril13.github.io/plywood-gallery-functions/
3. https://kolibril13.github.io/plywood-gallery-napari/

How it works: a notebook cell and its output are saved as a code-image pair when the Plywood cell magic is called. This pair is then rendered in real time in an HTML page.

During JupyterCon, I'd like to show this open source framework to people and brainstorm with them about how it could be useful in their everyday Python workflows.

The target audience is people who use Python packages with image outputs, such as Matplotlib, napari, scikit-image, and manim.

This poster presentation will also contain an interactive part:
My plan is to have a table with all the website div elements (images, code blocks, cursor and highlighters) printed on cardboard.
Together with interested JupyterCon participants, these elements can be arranged into all sorts of gallery layouts. The aim of this interactive part is to find out what a good example gallery would look like for people from the community.

Community: Tools and Practices
Poster Placeholder
18:15
18:15
15min
A tale of notebook recovery: session reconnects, execution recoveries and more
Parul Gupta

Reminder - The date and time of this session are placeholders due to limitations of the conference software.
All posters will be presented during the Poster Session from 6:30 pm to 8:00 pm on Thursday the 11th.

https://cfp.jupytercon.com/2023/talk/AKPRE8/

--

Notebook disconnects are one of the most frustrating user experiences in the enterprise world of the client-server Jupyter model. If the user refreshes the notebook or loses internet access while executing a long-running notebook on a remote server, data or notebook executions will be lost. Re-running everything again leads to increased resource utilization, wastes users' time and hurts productivity.

In this talk, we’ll describe how we, at Meta, built a reliable solution to reconnect the notebook to an existing session and recover the notebook executions from that session on the server to the client. This workflow has already been deployed successfully in Bento, Meta’s Jupyter distribution. The talk will navigate through the client-server architecture of Jupyter and then dive deeper into the technical solution for session reconnection and recovery of notebook executions.

Through this talk, we want to raise awareness of the importance of recovering notebook executions, explain how we implemented a reliable solution for seamless recovery at Meta, and discuss further explorations to persist notebook sessions, save compute and improve user productivity.

Enterprise Jupyter Infrastructure
Poster Placeholder
18:25
18:25
90min
Poster Session
Louis Armand 1
18:30
18:30
90min
Poster session
Louis Armand 2
18:30
90min
Poster Session
Room 1
18:30
90min
Poster Session
Room 2 (Tutorial)
18:30
90min
Poster Session
Room 3 (Tutorial)
18:30
90min
Poster Session

This is dedicated time for the poster session.

This time is reserved for conference attendees and presenters to discuss the various posters over light refreshments while waiting for the reception.

Poster presenters will be able to hang their posters at the beginning of the conference.

Please ignore each poster's individual time slot in the "Poster Placeholder" room.

Posters
Gaston Berger
19:00
19:00
15min
JupyterApps a platform to develop, deploy and share single-page web applications from jupyter notebooks
Nicolas Chauvat

Reminder - The date and time of this session are placeholders due to limitations of the conference software.
All posters will be presented during the Poster Session from 6:30 pm to 8:00 pm on Thursday the 11th.

https://cfp.jupytercon.com/2023/talk/AKPRE8/

--

JupyterApps is a web platform that leverages JupyterLab, Voilà, JupyterHub, GitLab, Docker and Kubernetes to empower its users to develop, deploy and share single-page web applications.

Similarly to Binder, JupyterApps can clone a code repository (hosted by GitLab, GitHub or Heptapod) containing a Jupyter notebook application, then build a Docker image containing an environment with all the dependencies needed to run the notebooks. The container can then be started from the web interface built on top of JupyterHub. As opposed to Binder, JupyterApps can be given user credentials to work with private repositories.

JupyterApps supports several "types" of applications: running JupyterLab, running a Jupyter notebook with Voilà, running a Streamlit application or running a virtual desktop.

JupyterApps also implements basic collaboration features, including granting access to groups of users and sharing data among project members.

JupyterApps is used at Logilab to provide an exercise platform for professional training (mainly on Python programming). Trainees get to run their own environment to read course manuals, practice with self-correcting exercises within notebooks, run course-specific web tools, etc.

JupyterApps is currently being tested by some clients to enable scientists to design and share single-page applications that chain data computation and data visualisation without having to master web development skills. These applications, built by R&D experts, can then be widely used by production users.

This talk will focus on the main workflow of the application and the interactions between the different parts of the architecture.

Others
Poster Placeholder
19:55
19:55
95min
Reception
Louis Armand 1
20:00
20:00
90min
Reception
Louis Armand 2
20:00
90min
Reception
Room 1
20:00
90min
Reception
Room 2 (Tutorial)
20:00
90min
Reception
Room 3 (Tutorial)
20:00
90min
JupyterCon 2023 Reception
Gayle Ollington

Second night JupyterCon reception.

After the poster session, where light refreshments were served, come enjoy an evening mingling with our speakers and attendees.

Some of our sponsors have also scheduled activities, and we might have a few announcements.

So see you there.

MISC
Gaston Berger
20:15
20:15
15min
Supporting new pedagogical approaches in Education using Jupyter Hubs at Berkeley
Balaji Alwar, shane knapp

Over the last 5 years, UC Berkeley has utilized a campus-wide JupyterHub to advance the learning objectives of courses across the University, including in the Engineering, Data Science, Natural Science, and Social Science disciplines. UC Berkeley hosts a cloud-based JupyterHub for educational purposes, with up to 5,000 users per week and 11,000 users per semester. More than 60 courses from 20+ departments within UC Berkeley teach using JupyterHubs. A core constituency of courses are foundational courses in Data Science and quantitative methods, with a long tail of advanced courses.

Basic use cases for JupyterHubs at UC Berkeley are i) instructors demoing notebooks during their lectures, ii) Graduate Student Instructors running notebooks during their labs, and iii) students working on individual homework and assignments. However, there is no one-size-fits-all solution that could address the diverse instructional needs of the Berkeley community. A multi-level support team listens to the needs of the instructors and engineers our hubs to meet their requirements. This poster will describe several innovative use cases where the JupyterHub server is used to expand beyond the basic use case. In each case, specific engineering was required to adapt the base image or to have specific extensions adapted to the use case.

Database Hub - Database fundamentals in Data Science courses:
We deployed a MongoDB and PostgreSQL server per student for a Data Science course that teaches the fundamentals of Data Engineering. Instructors teach the fundamentals of databases by having students connect their Jupyter notebooks to databases mounted on each student's persistent volume disk. Students perform large computations in their Jupyter notebooks and store the results in the databases.
In addition, the core data science methods class deployed an SQLite database on the Network File System (NFS) server to teach database fundamentals. This deployment did have occasional problems with race conditions causing errors with the NFS server.
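As a minimal sketch of this pattern (the file path and table name below are illustrative, not the course's actual setup), a student notebook can store computed results in an SQLite database on their persistent volume using only the standard library and pandas:

import sqlite3
import pandas as pd

# Hypothetical path on the student's persistent volume.
conn = sqlite3.connect("/home/jovyan/work/results.db")

# Store the result of a (potentially large) computation done in the notebook.
results = pd.DataFrame({"trial": range(5), "score": [0.91, 0.87, 0.93, 0.90, 0.95]})
results.to_sql("experiment_scores", conn, if_exists="replace", index=False)

# Query it back with SQL to practice database fundamentals.
print(pd.read_sql("SELECT AVG(score) AS mean_score FROM experiment_scores", conn))
conn.close()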

Shiny Hub - Teaching quantitative Social Science using Shiny dashboards:
Shiny dashboarding functionality was deployed for a quantitative political science course to explore relationships between different variables. Students click on an nbgitpuller link which launches into the Shiny hub, and they can explore visualizations through the Shiny dashboard.
Shiny via JupyterHub was a fast and seamless way for R users (predominantly Social Science instructors) to build and engage with dashboards.

Biology Hub - Sharing complex genomic datasets with students through shared storage:
Biology instructors utilize common “shared read-write” directories to store large and complex genetics datasets (running into a few GBs). Biology instructors also act as hub admins and have write access to these directories. Students load datasets into their Jupyter notebooks from the shared directories with read access. The genomics workflow also means that the Biology hub requires more compute than the base hub.

Linux Desktop Hub - Exploring simulations in engineering courses using open-source GUI tools:
The efficiency of JupyterHub in authentication, scaling resources, and shared file storage makes a compelling case for delivering a fuller graphical user interface, e.g. a Linux Desktop.
The “Designing Information Devices and Systems” course in the Electrical Engineering and Computer Science department uses PyQt5, a Python-based cross-platform GUI toolkit, for visual demonstrations on the Linux Desktop hub.
We deployed a new Linux Desktop hub to launch a desktop environment and installed the jupyter-qgis application, which is used extensively in a Civil Engineering class to teach geospatial exploration.
We also deployed a Julia-based simulation called the FUND model, which is used to calculate the global cost of carbon, as part of a course lecture using this hub.

RTC Hub - Facilitating collaboration in project-based work through the Real-Time Collaboration (RTC) pilot feature:
We experimentally deployed an RTC hub to support a course about reproducible computing. We enabled Real-Time Collaboration (RTC) for the Spring 2022 semester, where multiple students worked on a single Jupyter notebook file as part of their projects.
This pilot deployment ran into data corruption issues which were isolated to RTC. We were able to provide valuable feedback to the upstream JupyterLab developer team based on the learnings from this deployment.

Workshop Hub / High School Hub:
In an outward-facing deployment, this hub provides a small amount of compute which allows materials to be demoed to non-UC Berkeley users.

Open Hub - Supporting new users with open access to a generic hub:
We foster innovative use with an open hub that anyone at Berkeley can use without management overhead. In many cases this has organically led to new deployments where instructors can try out new materials and understand the teaching workflow.

Jupyter in Education
Poster Placeholder
21:30
21:30
30min
Venue closes at 22:00 (10PM)
Gaston Berger
02:30
02:30
90min
Third day, Friday
Louis Armand 2
08:00
08:00
60min
Badges
Gaston Berger
09:00
09:00
15min
intro remarks

Intro remarks

MISC
Gaston Berger
09:15
09:15
45min
Paul Romer Keynote
Paul Romer

For this closing day we will reflect on Paul Romer's keynote and his expertise on open source.

Keynotes
Gaston Berger
10:00
10:00
30min
Break
Gaston Berger
10:00
30min
Break
Louis Armand 1
10:00
30min
Break
Louis Armand 2
10:30
10:30
30min
Accelerating Discovery for NASA Cryosphere Communities with JupyterHub
Tasha Snow, James Colliander

Brief Summary:
Science is not composed of isolated groups of practitioners, but is rather an interconnected network of communities of practice, with members who fluidly move between them. Infrastructure for scientific research and collaboration should leverage this structure to make science more productive and inclusive. You will learn about the JupyterHub tools, community, and best practices being developed to achieve these goals around a specific NASA science mission and research objectives in a project known as the CryoCloud.

Outline:

For NASA’s upcoming Year of Open Science, NASA and other federal funding agencies have begun to transition their data stores and computing into the cloud. However, substantial barriers exist for individual users to make the transition from their local systems to the cloud to accomplish research goals and fully utilize new cloud capabilities: cloud cost opacity, infrastructure deployment complexity, and a general lack of community awareness and knowledge, among others.

To address these challenges, we have built a persistent JupyterHub in partnership with the International Interactive Computing Collaboration (2i2c), called CryoCloud (cryointhecloud.com), for NASA Cryosphere communities. We provide a persistent hub space across a series of hackathon-style workshop events to help the NASA ICESat-2 Science Team and related Cryosphere researchers build community and transition to a collaborative cloud workspace. CryoCloud models how to build a research community around a specific science mission and research objectives, while enabling learning and creation of the technical knowledge to facilitate NASA’s open-source, interconnected, and science-accelerated vision of the future. We gather user data to understand research needs, imagine tools required to streamline collaboration, and build community best practices. We share examples of how these JupyterHub cloud tools make scientific computing more intuitive, cost- and time-efficient, and open for all.

This presentation is intended for cloud engineers, scientific programmers, data scientists, and all other interested people. Attendees are expected to have no specific knowledge background. Familiarity with geospatial datasets and Earth science is useful but not required.

Jupyter for Research and Scientific Discoveries
Gaston Berger
10:30
30min
IPywidgets: From an experiment in the Notebook to Production-ready data app
Maarten Breddels

IPywidgets is the go-to library for building interactive interfaces in the Jupyter notebook. What may start as an experiment with a few sliders and buttons may evolve into a company-wide data app that needs to be of production-ready quality.

The biggest issue we see for applications is managing the complexity of the code.
In the JavaScript world, React is considered a library that helps conquer that problem. The main reasons for this are its declarative nature and its ability to define encapsulated components that can be composed into larger components and applications.
Reacton is a pure Python implementation of the React library that enables the same style of programming as React, but in Python, using the ipywidgets library.
On top of that, Reacton adds type information, allowing type checkers such as mypy to find bugs before they occur in production.

The next problem is how to run your ipywidget application outside the Jupyter Notebook or JupyterLab environment.
Voilà is the go-to library for running notebooks as standalone web applications and has support for ipywidgets. However, each request requires a new kernel, which means starting a new process. This consumes a lot of resources, and separate processes also make it hard to share memory between different users/requests.
Solara allows you to write ipywidget-based applications using a server that runs “virtual kernels” in a single process, saving resources and leading to faster page loads.
On top of that, Solara will come with many Reacton-based components and support for multiple pages, making it the perfect framework for writing production-ready data apps.
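
To give a concrete sense of the notebook experiment this journey starts from, here is a minimal plain-ipywidgets sketch (the widget and the computation are arbitrary examples, before Reacton's component model is introduced):

import ipywidgets as widgets
from IPython.display import display

slider = widgets.IntSlider(value=3, min=1, max=10, description="n")
output = widgets.Output()

def on_change(change):
    # Recompute and redraw whenever the slider moves.
    with output:
        output.clear_output()
        print(f"{change['new']} squared is {change['new'] ** 2}")

slider.observe(on_change, names="value")
display(slider, output)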

Others
Louis Armand 2
10:30
55min
Ipytone: Interactive Audio in Jupyter
Benoît Bovy

Jupyter already has a rich ecosystem of widgets that together allow using it as a powerful platform for interactive data visualization. However, to my knowledge no generic widget library has been proposed yet for exploring data through sound. In parallel, there exist a few programming environments for the interactive creation of sound and music (e.g., Sonic Pi, FoxDot), but those are isolated applications focused on code and hardly reusable in general-purpose environments.

Ipytone is a widget library that fills this gap by providing many audio components (i.e., oscillators, filters, effects, synthesizers, samplers, etc.) to Python and Jupyter. Those components are part of the Tone.js library, which is built on top of the Web Audio API for creating interactive music and sounds in the browser. Ipytone exposes each component as a widget, making the library very flexible and tightly integrated with the rest of the Jupyter widget ecosystem. Ipytone aims at turning Jupyter into a versatile DAW (Digital Audio Workstation) and at democratizing “data sonification”, a fascinating although still largely unexplored area!

This talk will introduce the audience to Ipytone through various examples and demos, hopefully with live sound! These will range from basic usage (e.g., creating a simple synthesizer and playing it) to more advanced usage (e.g., reproducing NASA’s sonification of a Hubble deep space image in a notebook using Ipytone and other Python/widget libraries).
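
For a flavour of the “basic usage” end of that range, here is a hedged sketch, assuming Ipytone mirrors Tone.js naming for its synthesizer API (the note and duration are arbitrary):

import ipytone

# Create a simple synthesizer and route it to the browser's audio output.
synth = ipytone.Synth().to_destination()

# Play middle C for half a second (Tone.js-style call, assumed to be mirrored by ipytone).
synth.trigger_attack_release("C4", 0.5)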

Useful links:

  • Ipytone repository: https://github.com/benbovy/ipytone
  • Ipytone examples repository: https://github.com/benbovy/ipytone-examples
  • NASA Hubble data sonification: https://www.youtube.com/watch?v=H-Ci_YwfH04

Audience: intermediate (basic knowledge of ipywidgets)

Community: Tools and Practices
Room 1
10:30
30min
Leveraging the Jupyter ecosystem to create and run the "Machine Learning in Python with scikit-learn" massive open online course
Loïc Estève

We, a team of scikit-learn core developers and contributors, created the "Machine Learning in Python with scikit-learn" MOOC (Massive Open Online Course) in 2021 with the goal of making it accessible to an audience without a strong technical background.

Since then, we have run three sessions of the MOOC, with an average of roughly 10,000 registered participants, and have reused the material for scikit-learn courses in a variety of settings, for example Python conference tutorials, remote scikit-learn training and in-person university courses.

In this talk, we will describe how we leveraged tools within the Jupyter ecosystem to develop the course material and teach it, in particular:
- JupyterBook and Jupytext to develop the material in a convenient and collaborative fashion
- JupyterHub to give learners a zero-install live environment during the MOOC session
- Binder as a convenient fallback for tricky installation issues, together with its integration into JupyterBook

We will also share the lessons we learned along the way while developing the material, running the MOOC and teaching it.

We will conclude with some of our ideas to improve the course, for example:
- using JupyterLite in our JupyterBook setup and potentially replacing our JupyterHub in the longer term; towards this goal, we have already started investigating issues we found in the Pyodide scipy and scikit-learn packages
- moving away from the classic notebook to RetroLab
- moving to jupyterlab-myst to better support MyST Markdown inside notebooks and get rid of our custom scripts to generate HTML admonitions

The content of the course is available under a CC-BY license at https://inria.github.io/scikit-learn-mooc and the associated repository is at https://github.com/inria/scikit-learn-mooc. The MOOC is available at https://www.fun-mooc.fr/en/courses/machine-learning-python-scikit-learn/.
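
As a small illustration of the Jupytext part of this workflow (the file names are hypothetical), the same lesson can be kept as a review-friendly MyST Markdown file and converted to a regular notebook when needed:

import jupytext

# Read the MyST Markdown version of a lesson (easy to diff and review in pull requests).
notebook = jupytext.read("python_scripts/linear_models.md")

# Write it back out as an .ipynb notebook for execution on the JupyterHub or Binder.
jupytext.write(notebook, "notebooks/linear_models.ipynb")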

Jupyter in Education
Louis Armand 1
10:30
150min
Predictive survival analysis and competing risk modeling with scikit-learn, scikit-survival, lifelines, Ibis, and DuckDB (Part 1)
Guillaume Lemaitre, Olivier Grisel, Vincent Maladiere

While tutorial attendance is included in the conference pass, we ask you to register for this tutorial at https://www.jupytercon.com/tickets as the seats available are limited.

Tutorial notebooks:

  • https://github.com/soda-inria/survival-analysis-benchmark

According to Wikipedia:

Survival analysis is a branch of statistics for analyzing the expected duration of time until one event occurs, such as deaths in biological organisms and failure in mechanical systems. [...]. Survival analysis attempts to answer certain questions, such as what is the proportion of a population which will survive past a certain time? Of those that survive, at what rate will they die or fail? Can multiple causes of death or failure be taken into account? How do particular circumstances or characteristics increase or decrease the probability of survival?

In this two-part tutorial (morning and afternoon), we will deep dive into a practical case study of predictive maintenance using tools from the scientific Python ecosystem. Here is a tentative agenda:

Part 1 (Morning)
- What is time-censored data and why it is a problem to train time-to-event regression models.
- Single event survival analysis with Kaplan-Meier using scikit-survival.
- Competing risks modeling with Nelson–Aalen, Aalen-Johansen using lifelines.
- Evaluation of the calibration of survival analysis estimators using the integrated brier score (IBS) metric.
- Predictive survival analysis modeling with Cox Proportional Hazards, Survival Forests using scikit-survival, GradientBoostedIBS implemented from scratch with scikit-learn.
- Estimation of the cause-specific cumulative incidence function (CIF) using our GradientBoostedIBS model.

Part 2 (Afternoon)
- How to use a trained GradientBoostedIBS model to estimate the median survival time and the probability of survival at a fixed time horizon.
- Measuring the statistical association between input features and survival probabilities using partial dependence plot and permutation feature importance.
- Presentation of the results of a benchmark of various survival analysis estimators on the KKBox dataset.
- Extracting implicit failure data from operation logs using sessionization with Ibis and DuckDB.
- Hands-on wrap-up exercise.

It is not recommended to attend Part 2 without having attended Part 1.

Target audience: good familiarity with machine learning concepts, with prior experience using scikit-learn (you know what cross-validation means and how to fit a Random Forest on a Pandas dataframe).
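
To give a flavour of the single-event Kaplan-Meier step from Part 1, here is a minimal sketch using scikit-survival on synthetic data (the durations and event indicators below are made up for illustration):

import numpy as np
import matplotlib.pyplot as plt
from sksurv.nonparametric import kaplan_meier_estimator

# Synthetic time-to-event data: durations in days, and whether the event
# (e.g. a machine failure) was observed or the observation was censored.
durations = np.array([5.0, 8.0, 12.0, 20.0, 22.0, 30.0, 35.0, 40.0])
event_observed = np.array([True, False, True, True, False, True, False, True])

# Estimate the survival function S(t) = P(T > t) under right censoring.
time, survival_prob = kaplan_meier_estimator(event_observed, durations)

plt.step(time, survival_prob, where="post")
plt.xlabel("time (days)")
plt.ylabel("estimated probability of survival")
plt.show()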

Tutorial
Room 3 (Tutorial)
11:00
11:00
30min
Jupyter Server—the workhorse that drives nearly all Jupyter web applications
Zach Sailer, David Brochart

Jupyter Server is the core web server that powers most Jupyter web applications—it is widely used in research, education, and enterprise. Beyond the core web server, the Jupyter Server Project offers a powerful collection of plugins and extensions that enable deployers to build a Jupyter web application that best suits any workflow.

In this talk, we will lean heavily on live demos to showcase some of the core functionality, features, and strengths of Jupyter Server. To get the audience familiar with Jupyter Server, we will

  1. demonstrate how to install, launch, and configure a server,
  2. customize the server by overriding one of Jupyter's core services, and
  3. expand Jupyter Server's capabilities by authoring a server extension.

We will then highlight some of the new features offered by Jupyter Server 2.0, including the

  1. identity and authorization APIs for secure access to a shared server,
  2. server-side features that enable real-time collaboration in Jupyter documents,
  3. event system that powers many features in JupyterLab and beyond.

Finally, we will share some perspective on the future direction of Jupyter Server and invite the audience to get involved with the project. We will share some unique ways that the Jupyter Server Team is encouraging and empowering new contributors through its weekly "Contributing Hour".

While Jupyter Server is a fairly technical component of the Jupyter stack, we believe this talk offers helpful information for audiences of all levels. In particular, if you are deploying Jupyter Servers for your research lab or company, or you are interested in becoming a Jupyter Server contributor, you will not want to miss this!
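
As a hedged sketch of the third demo item (authoring a server extension), a minimal extension module might look roughly like this; the module name, route, and payload are placeholders rather than the actual demo code:

# my_extension.py -- hypothetical module name
from jupyter_server.base.handlers import APIHandler
from jupyter_server.utils import url_path_join
from tornado import web

class HelloHandler(APIHandler):
    """Respond to GET <base_url>/my-extension/hello with a small JSON payload."""

    @web.authenticated
    def get(self):
        self.finish({"message": "Hello from a server extension"})

def _jupyter_server_extension_points():
    # Tell Jupyter Server which module provides the extension.
    return [{"module": "my_extension"}]

def _load_jupyter_server_extension(server_app):
    # Called at startup: register the handler under the server's base URL.
    web_app = server_app.web_app
    route = url_path_join(web_app.settings["base_url"], "my-extension", "hello")
    web_app.add_handlers(".*$", [(route, HelloHandler)])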

Others
Louis Armand 2
11:00
30min
Otter-Grader: A Lightweight Solution for Creating and Grading Jupyter Notebook Assignments
Suraj Rampure, Christopher Pyles, Justin Eldridge, Lisa Yan

Otter-Grader is a lightweight open-source command-line tool for developing and grading Jupyter Notebook assignments at scale. It enables instructors to produce an assignment and its autograder from just a single notebook.

Otter was developed by Christopher Pyles while he was working with Data Science Undergraduate Studies at UC Berkeley. Since its pilot in 2020, Otter has been adopted by instructors at a wide variety of institutions, from a university in Japan to a high school in North Carolina, and has been deployed in courses with enrollments ranging from 15 to 1500+.

Attendees will find our talk particularly useful if they’ve created notebooks for educational purposes, and/or if they’ve worked with grading infrastructure such as nbgrader or Gradescope.

Part 1: Authoring Assignments

We’ll start by demonstrating how to author assignment notebooks in Python using Otter.

One of the reasons Otter is so convenient is that an entire assignment and autograder can be developed in just a single “source” notebook. That notebook consists of exposition, solution code that students need to produce, inline autograder tests, and other metadata. After creating a source notebook, a single use of the otter assign command-line tool produces a student-facing version of the notebook. In this notebook, students only see the skeleton code their instructor wants them to start with (rather than the solution), and instead of seeing the nitty-gritty details of all autograder tests, they only see calls to the function grader.check, which displays the test cases that their code for a given question failed.
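
As a rough sketch of what this looks like in practice (file and question names are hypothetical), the instructor runs otter assign on the source notebook, and the generated student notebook then calls grader.check:

# On the instructor's machine, in a terminal:
#   otter assign hw01-source.ipynb dist/

# Inside the generated student notebook, an early cell initializes the grader...
import otter
grader = otter.Notebook()

# ...students fill in their solution in place of the skeleton code...
def total(values):
    return sum(values)

# ...and run the visible checks for that question.
grader.check("q1")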

Part 2: Releasing and Collecting Assignments

In addition to creating a student-facing assignment notebook, otter assign also generates a portable autograder.zip file that instructors can run to compute grades. This autograder can be run anywhere that pip install otter-grader can be run – most commonly, this is in a Docker container on a personal computer or on Gradescope, a popular LMS.

We will demonstrate a common workflow in use at multiple institutions, which involves:

  • Hosting student-facing notebooks on GitHub.
  • Providing students with nbgitpuller links that open the relevant assignment on an institution-hosted JupyterHub server.
  • Configuring Gradescope to automatically run all autograder tests and provide feedback upon submission (or, alternatively, autograding submissions locally in a Docker container).

All of this will be illustrated from the perspective of both an instructor and a student.

Part 3: Adaptations, Shortcomings, and Future Plans

A common data science workflow is to use notebooks for exploration, but to write permanent code using an IDE. In one of our courses, we promote this workflow by distributing assignments as notebooks containing the question prompts, while requiring students to submit their work in .py files. To support this use case, we wrote a wrapper around Otter which takes a notebook containing all problem descriptions, solutions, and test code, and generates a student-facing .py file containing skeleton function definitions. Our wrapper allows us to generate our multi-format assignments using a single source document, thereby significantly reducing the likelihood of errors. We will start the third part of the talk by discussing the motivation behind this type of assignment, how we used nbconvert to support the adaptation, and how much easier this adaptation of Otter makes it to create and edit these assignments than the prior pre-Otter solution.

Then, more broadly, we will discuss shortcomings of Otter that have been identified by other instructors. Some shortcomings are pedagogical:

  • One may argue that Otter’s presentation of test cases encourages students to “guess and check” their work.
  • Additionally, depending on the domain, it can be difficult to craft autograder tests when students’ implementations vary significantly. (For instance, in cases where random sampling is involved.)

Other shortcomings are infrastructural:
  • The current Otter metadata syntax isn't supported in third-party platforms like Google Colab.
  • Otter does not support question randomization in any way, e.g. it can't create “versions” of assignments with questions in different orders, which can limit its usefulness for exams.

To conclude, we will summarize recent updates made to Otter and discuss planned future directions, including how we plan to address some of the aforementioned shortcomings.

Jupyter in Education
Louis Armand 1
11:00
30min
Using Jupyter-notebooks to document and support climate and meteorology data
Edward Comyn-Platt, James Varndell

The study of climate and meteorology is underpinned by an ability to use large datasets; however, handling these datasets has typically been the role of specialists in the field. Jupyter notebooks provide a novel approach to introducing large, complex datasets to a non-specialist audience, where the boundary between documentation and use cases becomes fluid.

The Copernicus Climate Change Service (C3S) Climate Data Store (CDS) is an entry point to a broad spectrum of data related to climate and meteorology, from output produced by weather and climate models to observation data from satellites and weather stations. Even when strict data standards are followed, each dataset has its own peculiarities and pitfalls which require some degree of documentation support. Jupyter notebooks provide the perfect complement to the user guides that typically accompany the datasets available in the CDS. The notebooks provide examples of how to access, download and explore the data, and offer a platform where the producers of the data can demonstrate the qualities of the data and applied use cases.

The C3S is embracing this concept and building JupyterBook-based training material to support users of the datasets available in the CDS. Such a library will encourage greater use of climate and meteorology data in sectoral fields (e.g. health, tourism and transport) and could be used as an educational resource in universities and schools. This library will feature directly in the JupyterHub-based online development environment that will be integrated into the modernised version of the CDS.

This session will present the training material we have produced and the framework we have in place so that data producers are able to easily contribute to our growing library of content. We will demonstrate the importance of this material in giving data users the information required to make use of the data, and show how this fits into the plans for modernising our online development environment. This session is aimed at people who may be interested in using the training material we have produced to gain experience handling climate, weather and observation data, and at people who may want to build a similar, community-developed training material resource.

The CDS is implemented by the European Centre for Medium-Range Weather Forecasts (ECMWF) on behalf of the European Commission.
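
As an illustration of the "access and download" step that such notebooks typically start with, here is a hedged sketch using the cdsapi client (the dataset name and request keys follow the common ERA5 example; valid CDS credentials in ~/.cdsapirc are assumed):

import cdsapi

client = cdsapi.Client()

# Download one field of ERA5 reanalysis data as NetCDF (illustrative request).
client.retrieve(
    "reanalysis-era5-single-levels",
    {
        "product_type": "reanalysis",
        "variable": "2m_temperature",
        "year": "2022",
        "month": "01",
        "day": "01",
        "time": "12:00",
        "format": "netcdf",
    },
    "era5_t2m_20220101.nc",
)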

Jupyter for Research and Scientific Discoveries
Gaston Berger
11:30
11:30
30min
10 Years of Teaching with Jupyter: Reflections from Industry & Academia
Dhavide Aruliah

Teaching & learning with Jupyter notebooks ought to be straightforward, right? Unfortunately, no, not exactly; my experience—both in developing content & in live instruction—suggests otherwise. I'm drawing on a decade of teaching efforts for undergraduate/postgraduate university courses, for Software Carpentry bootcamps, and for corporate or government training engagements.

I've personally wrestled with numerous nuts-and-bolts technological obstacles to promoting computational thinking in learners with Jupyter:

  • deploying robust, "99% invisible" computational environments so learners can start quickly (i.e., without painful software installation or configuration);
  • accessing data smoothly (especially when balancing privacy/security concerns with teaching goals); and
  • managing notebook versions for collaborating instructors.

I'll describe a few technological (and non-technological) ways to address the above. These include Nebari (an open-source JupyterHub distribution platform), RISE (a Reveal.js Jupyter slideshow extension), Anaconda Project (for supporting reproducible, shareable data science projects), and EtherPad (for live interactive group collaboration in a text-editor interface).

While tooling is critical for designing & maintaining content, I'll steadfastly reinforce my central priority: getting tools out of the way. Ideally, learners should focus on building the knowledge and skills they need to solve their own problems. By design, Jupyter notebooks enable immediate learner feedback. That is, learners can modify code interactively to improve their internal mental models of what's going on and do so autonomously—this is ultimately what I want when teaching.

Finally, I'll discuss pedagogical considerations unrelated to Jupyter itself that play a large role in designing for the goals and culture appropriate to the audience (e.g., academic vs. corporate vs. government). These logistical choices invariably get entangled with planning & execution: synchronous (live, instructor-led) delivery vs. asynchronous (recorded), remote instruction vs. in-person, spaced vs. contiguous delivery, full-length college/university courses (i.e., measured in semesters or quarters) vs. condensed workshops/bootcamps (i.e., measured in hours or days), and so on.

Audience

I’m aiming this presentation at folks from both industry & academia who are primarily interested in teaching & learning. Ideally, they are intrigued by the pedagogical opportunities that Jupyter affords. This intended audience ought to have some experience teaching a class or workshop in front of free-range learners. Prior exposure to Jupyter (ideally in instructional contexts but, if not, in their own projects) is useful but not mandatory. This audience includes practitioners from disciplines outside computer science and engineering (e.g., Finance, BioTechnology, Geology, etc.). I also hope that experienced developers—ones who frequently struggle to convey programming concepts to novices—might appreciate some of the insights I'll share.

Jupyter in Education
Louis Armand 1
11:30
55min
How to Bring Spreadsheet Users to Jupyter
Jake Diamond-Reivich

For the last 3 years, my team and I have been working with some of the largest companies in the world to help transition spreadsheet users to Python. We consistently see data science teams launching JupyterHubs for their business users, but struggling to get adoption from users who rely on spreadsheets.

I will talk about best practices in how to onboard spreadsheet users to Jupyter. This will include how to distinguish the mental model of a spreadsheet and a Python notebook as well as how to select the best spreadsheet workflows to transition to Python.

Then I will go into the reasons that data science leaders want more business users on their JupyterHubs. These primarily fall into three categories:

  1. Jupyter allows business users to work with much larger datasets than they can in spreadsheets
  2. Jupyter allows business users to process data much faster than they can in a spreadsheet
  3. Jupyter provides traceability and auditability for your analysis that does not exist in spreadsheets

Next I will talk about all the great open source tools available in Jupyter that can be used to help someone make this transition. These will include:

  1. Openpyxl/xlwings – Python libraries to read and/or write Excel files with Python
  2. Mito (I am one of the creators) – a spreadsheet interface for Python, inside Jupyter. Each edit in Mito generates the equivalent Python.
  3. Lux – automatically create visualizations from your DataFrames, without needing to write Python
  4. Streamlit – easily create data apps on top of your notebook

These packages cover the main categories of Python use that data science leaders want to see from their business users: ingesting data, transforming data, visualizing data, and creating apps on top of data.
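
As a small sketch of the ingestion category (the file, sheet, and column names are placeholders), pandas reads Excel workbooks through openpyxl and can write results back out for spreadsheet users:

import pandas as pd

# Read a worksheet from an Excel workbook (openpyxl handles .xlsx under the hood).
sales = pd.read_excel("quarterly_sales.xlsx", sheet_name="Q1", engine="openpyxl")

# A typical first transformation that would be a pivot table in a spreadsheet.
summary = sales.groupby("region", as_index=False)["revenue"].sum()

# Write the result back out so spreadsheet users can keep working with it.
summary.to_excel("q1_revenue_by_region.xlsx", index=False)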

Community: Tools and Practices
Room 1
11:30
30min
Jupyter for Copernicus - improve the use of Earth Observation data
Federico Fierli, Julia Wagemann

Copernicus, the European Union's Earth Observation programme, provides an unprecedented amount of data and added-value information on every aspect of the terrestrial environment and climate, including data from new-generation satellites, reanalyses, historical datasets and forecasts. This is enabling new science, services, operational applications and businesses in the fields of meteorology, climate, air quality, oceanography and, at large, environmental monitoring.
The Jupyter ecosystem is becoming central to supporting users in discovering, using and analysing this data.

The European Organisation for the Exploitation of Meteorological Satellites (EUMETSAT) is an implementer of the Copernicus programme, and this talk will show how Jupyter is used to provide know-how and access to Copernicus Earth Observation data in the following areas:

Training and awareness - Jupyter-based platforms are used both in events and for self-paced learning across a wide range of domains, including atmosphere, climate change, and the marine environment. For example, in 2022 more than 50 international events were held using a Jupyter environment, with more than 5000 users.
Data readiness - Jupyter notebooks are used to showcase applications and grant seamless access to data, providing notebooks and access to hubs through advanced and open platforms.
Among these are the European Weather Cloud and the WEkEO Copernicus Data and Information Access Service (DIAS), a collaboration among key international organisations and agencies (ECMWF, EUMETSAT, EEA and Mercator Ocean International).

Communications and outreach - Jupyter is used to engage with a large, scientifically literate public. This includes competitions of Jupyter notebooks applied to innovative data analysis and access, and practical sessions using Jupyter notebooks in heavily attended massive open online courses (MOOCs).

Jupyter for Research and Scientific Discoveries
Gaston Berger
11:30
30min
Post-exploitation in Jupyter Environments
Joe Lucas

The same great functionality that makes it such a powerful tool for developers and researchers makes Jupyter a valuable target for hackers. The confluence of data, code and network access, often as a primary tool for developing machine learning applications, gives hackers opportunities to impact application development and move laterally around the network. Based on research and operations conducted by the NVIDIA AI Red Team, this presentation will cover common reconnaissance and post-exploitation activities that attackers may try to execute on hosts using Jupyter or running Jupyter infrastructure. These lessons will inform user and administrator awareness to shape defensive practices. Attendees will gain a better understanding of what attackers may try to do after gaining access and how to resist, monitor, and counter that activity.

Others
Louis Armand 2
12:00
12:00
30min
Restructuring and Improving JupyterHub Documentation Using the Diátaxis Framework: Experience and Lessons
Allan Wasega, Sarah Gibson

JupyterHub has a range of documentation that covers both developer and user audiences in order to help them deploy, maintain, and use their own instance of a JupyterHub. The success of an open source software project in (i) being adopted by users and (ii) receiving meaningful contributions relies heavily on the quality, navigability and accessibility of its documentation, so that users and developers have all the information they need to achieve what they want to do.

A framework for organising technical documentation, called Diátaxis, has emerged. It takes a systematic approach to understanding user requirements of documentation throughout the lifecycle of interaction with a product, and posits that different user needs require different approaches to creating the documentation, as well as a layout for navigating these different “modes” of documentation.

Between December 2022 and March 2023, the JupyterHub project is participating in the December 2022 round of Outreachy internships with the aim of improving its documentation. The project focuses on refactoring the documentation for the JupyterHub package. As the intern in charge of this process, my work began with a review of the present documentation, categorising it according to the Diátaxis framework, and then restructuring the documentation files in the repository. Once the documentation is transformed into this framework, it will be easier to identify missing and unclear documentation (the pieces that were difficult to categorise). Subsequently, the JupyterHub team can curate resources to fill the gaps and improve documentation that is not specific enough.

This undertaking is not without its challenges, joys, and lessons, which can be extrapolated to other open-source documentation projects. The proposed talk will focus on highlighting these areas from the point of view of the JupyterHub Outreachy intern as well as their lead mentor. Specifically, it will cover three main points within the allocated talk time:

  • What is the importance of well-written and well-structured documentation to an open-source project, and to JupyterHub specifically?
  • What is the Diátaxis framework and why did JupyterHub choose to use it to restructure its documentation?
  • What lessons can other open-source projects learn from JupyterHub's experience to make clear and well-structured documentation?

The talk targets anyone who authors or contributes to open-source software documentation, including technical writers and team leads. The audience can be working with any programming language but should have intermediate knowledge of technical writing practices, including what it entails and some of the tools used.

Others
Louis Armand 2
12:00
30min
Using Jupyter ecosystem for more accessible open weather forecast data
Milana Vuckovic

The European Centre for Medium-Range Weather Forecasts (ECMWF) is an independent intergovernmental organisation which produces and disseminates numerical weather and environmental predictions to national meteorological services and other users, including commercial customers. ECMWF has recently adopted an open data policy which is being implemented in phases from 2020 to 2026. The first phase included opening hundreds of web forecast charts and making archived data available under a Creative Commons (CC BY 4.0) open licence in 2020, followed by the production of an open subset of the real-time medium-range forecast in early 2022. The next steps in 2023 include releasing the "Atmospheric Composition Support" dataset and seasonal forecast parameters, currently available with 4- and 6-day delays through the Copernicus CAMS and C3S programmes, without any delay.
This phased move towards free and open data represents a significant step towards more reproducible open science. However, this cannot be achieved by only opening up the real-time data. To realise the full potential of open data, it needs to be easily accessible and come with the appropriate supporting information to allow users to derive information and integrate the data into their own research work or application workflows.
To facilitate this, additional development work is being done. This work includes the design of an API to easily download the forecast data, and the development of open-source Python libraries to process and visualise it. To help users understand how to retrieve and process ECMWF data using these libraries, a set of Jupyter notebooks is being created, each of them reproducing one open weather forecast chart, from downloading the data to the visualisation.
This talk will focus on Jupyter notebooks development, from the idea to realisation, through challenges and attempts for automation.

Jupyter for Research and Scientific Discoveries
Gaston Berger
12:00
30min
Visual Programming in JupyterLab with Blockly
Denisa Checiu, Carlos Herrero

Block-based programming offers the unique opportunity of teaching basic yet fundamental programming concepts without the challenges brought on by the specific syntax of text-based programming languages. Wishing to provide a smooth ramp of complexity for learners, we designed a JupyterLab extension for Blockly, so that Jupyter can now be used throughout a student's learning journey, without the hassle of having to switch to a completely new environment at any point along the way.

In this talk, we will provide a more detailed look at the JupyterLab-Blockly extension, starting from the benefits of using it and our motivation when creating it, going through a well-documented journey of the UI, all towards a live demonstration with the kind of algorithms you can build using our standard blocks.

A relevant aspect of the extension is also how easily it can be built upon. As such, we will dive deeper into how JupyterLab-Blockly can be used as a base for other extensions, providing the perfect tools for simple robotics applications. We will go through each step a coder needs to take in order to make their own extension, and the possibilities we offer for creating your own blocks, toolboxes and even programming-language generators.

Finally, we will offer a glimpse into our very own cool robotics applications which were built on top of the JupyterLab-Blockly extension:
- jupyterlab-niryo, a plugin to control the Niryo One robot,
- jupyterlab-lego-boost, a plugin to communicate with the LEGO® Boost robot.

As a preview, you can read more in our blogpost: https://blog.jupyter.org/visual-programming-in-jupyterlab-with-blockly-7731ec3e113c

Jupyter in Education
Louis Armand 1
12:30
12:30
90min
Lunch
Gaston Berger
12:30
90min
Lunch
Room 1
12:30
30min
Eyes-off data science: a transparently opaque framework for data processing
Jack Fitzsimons

Did you ever notice that the most important and impactful data sources are inherently sensitive in nature, from healthcare to finance? Even seemingly banal information, like your commuting patterns, purchasing habits or insurance reimbursements, can tell a nuanced story of who you are and how you can be influenced. This causes a headache for the honest data scientist who is interested in big-picture analytics, not spy-stats.

Fortunately, there has been a huge movement over the past 10-15 years to create a new type of tech stack founded on the concept of privacy-enhancing technologies (PETs), which act as brokers of trust between data sources and data scientists.

In this talk we will give an overview of how we have been building (and are today launching) a new Python framework, which integrates directly with Jupyter via a backend extension, to make using PETs trivial in the day-to-day work of a data scientist. We call it Antigranular.

We expect this project to be a long term endeavour and if you are interested in privacy, confidentiality, security and, of course, data science we would encourage you to get involved.

Sponsored
Louis Armand 2
12:30
30min
Taking notebooks to production using Hugging Face
Merve Noyan

Taking scikit-learn models to production or hosting them openly has its own challenges, like reproducibility, safety and more. We are trying to tackle these problems with our new open-source library, skops. skops provides easy APIs to host your models, automatically create widgets for inference, and create model cards that document the model for versioning. In this talk, I will walk you through how you can use skops and the Hugging Face Hub for versioning models.
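
A minimal sketch of the saving and model-card part of that workflow, assuming the skops.io and skops.card modules (the file names are placeholders, and pushing to the Hugging Face Hub is omitted):

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
import skops.io as sio
from skops.card import Card

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000).fit(X, y)

# Persist the model in the skops format (a safer alternative to pickle).
sio.dump(model, "model.skops")

# Draft a model card documenting the model; it can be edited and saved as Markdown.
card = Card(model)
card.save("README.md")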

Sponsored
Louis Armand 1
13:00
13:00
60min
Lunch
Louis Armand 1
13:00
60min
Lunch
Louis Armand 2
13:00
60min
Lunch
Room 3 (Tutorial)
14:00
14:00
30min
From Jupyter to MLOps: Jupyter as a key integrator for MLOps
Sangwoo Shim

The MakinaRocks Link plugin is a powerful tool that empowers data scientists to create, manage, and execute complex directed acyclic graphs (DAGs) within the JupyterLab environment. With its caching mechanism, support for parallelism, and remote execution capabilities, Link provides an efficient and user-friendly solution for building and deploying data pipelines. Additionally, Link complies with JupyterLab semantics, making it an ideal development platform for MLOps tools such as MakinaRocks Runway. By using Link to develop and test DAGs, data scientists can ensure seamless integration with their operational MLOps environment.

MakinaRocks Runway provides a comprehensive solution for managing the model training and serving pipelines. It simplifies the process of retraining models with new datasets and parameters, allowing users to accomplish this with a single click. Runway tracks training parameters, pipeline source codes, and the environment used to run training for each registered model. Users can also easily construct and update HTTP APIs and real-time inferences for their models. Updating ML models within running inference services with retrained models is a straightforward process.

In this presentation, we will explore the key features of MakinaRocks Link and Runway and demonstrate how they can help data scientists streamline their workflows and improve their productivity when constructing MLOps loops. We will showcase real-world examples of DAGs built using Link and highlight the benefits of using it in conjunction with MLOps tools such as Runway. Attendees will learn how they can work together to take their data science projects from development to the operational environment.

Sponsored
Louis Armand 2
14:00
30min
Notebook Archaeology: What does an .ipynb file (not) tell us?
David Koop

Jupyter notebooks store code, results, and explanations, making them important artifacts for understanding how insights were achieved. However, these notebooks often do not record the full history of how ideas developed or analyses evolved. In many cases, approaches have been refined over time, and cells are repeated, reordered, or removed. This talk will examine both what we can learn from the details stored in .ipynb files and associated artifacts like IPython session histories, and techniques to better record the evolution of notebooks in the future.

Inferring Past Events

A notebook represents the current state of one's work, but it also maintains information that helps us learn about what happened in the past. Specifically, the execution counts (those bracketed numbers in the left margin of the notebook) not only provide information about the order cells were executed but also can tell us when a result can no longer be seen. IPython session histories store every block of executed code but are not unambiguously connected to the notebook cells and outputs. However, notebooks and histories together provide information about patterns in notebook development including how often authors edit cells or revisit a notebook at a later date. We can infer a probable history based on these patterns, and improve our prediction when the two artifacts are connected.
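
As a small sketch of the first kind of evidence (the file name is a placeholder), the bracketed execution counts can be read programmatically from an .ipynb file with nbformat:

import nbformat

nb = nbformat.read("analysis.ipynb", as_version=4)

# Execution counts record the order in which cells were last run; gaps and
# out-of-order numbers hint at re-execution, deletion, or later editing.
counts = [cell.execution_count for cell in nb.cells if cell.cell_type == "code"]
print(counts)

out_of_order = any(
    a is not None and b is not None and a > b for a, b in zip(counts, counts[1:])
)
print("cells were executed out of document order:", out_of_order)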

Improving Future Records

While algorithms to figure out what happened in the past are useful in understanding existing notebooks, we can also simply record all of the steps in a notebook's evolution---a version history. There are a number of solutions in this space, ranging from alternate notebook formats that mesh better with version control systems to tools that automatically store each version of a cell and its outputs. We will discuss the pros and cons of these different approaches in terms of what is recorded and how it is made available. While having the full history can be useful during development, there may be cases where sharing that history is not desired. At the same time, we will discuss opportunities to move beyond documenting history to using that information to improve work in a notebook. For example, knowing how a user has made changes in the past could allow suggestions for updates in the future.

Community: Tools and Practices
Louis Armand 1
14:00
120min
Predictive survival analysis and competing risk modeling with scikit-learn, scikit-survival, lifelines, Ibis, and DuckDB (Part 2)
Olivier Grisel, Vincent Maladiere, Guillaume Lemaitre

This is Part 2 of a two-part tutorial. It is not recommended to attend Part 2 without having attended Part 1.

Tutorial notebooks:

  • https://github.com/soda-inria/survival-analysis-benchmark

Here is the agenda for the full session:

According to Wikipedia:

Survival analysis is a branch of statistics for analyzing the expected duration of time until one event occurs, such as deaths in biological organisms and failure in mechanical systems. [...]. Survival analysis attempts to answer certain questions, such as what is the proportion of a population which will survive past a certain time? Of those that survive, at what rate will they die or fail? Can multiple causes of death or failure be taken into account? How do particular circumstances or characteristics increase or decrease the probability of survival?

In this two-part tutorial (morning and afternoon), we will deep dive into a practical case study of predictive maintenance using tools from the scientific Python ecosystem. Here is a tentative agenda:

Part 1 (Morning)
- What time-censored data is and why it is a problem when training time-to-event regression models.
- Single event survival analysis with Kaplan-Meier using scikit-survival.
- Competing risks modeling with Nelson–Aalen, Aalen-Johansen using lifelines.
- Evaluation of the calibration of survival analysis estimators using the integrated brier score (IBS) metric.
- Predictive survival analysis modeling with Cox Proportional Hazards, Survival Forests using scikit-survival, GradientBoostedIBS implemented from scratch with scikit-learn.
- Estimation of the cause-specific cumulative incidence function (CIF) using our GradientBoostedIBS model.

Part 2 (Afternoon)
- How to use a trained GradientBoostedIBS model to estimate the median survival time and the probability of survival at a fixed time horizon.
- Measuring the statistical association between input features and survival probabilities using partial dependence plot and permutation feature importance.
- Presentation of the results of a benchmark of various survival analysis estimators on the KKBox dataset.
- Extracting implicit failure data from operation logs using sessionization with Ibis and DuckDB.
- Hands-on wrap-up exercise.

Target audience: good familiarity with machine learning concepts, with prior experience using scikit-learn (you know what cross-validation means and how to fit a Random Forest on a Pandas dataframe).
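
For attendees who want a flavor of the estimators discussed in Part 1, here is a minimal, self-contained Kaplan-Meier sketch on synthetic right-censored data using lifelines; the actual tutorial notebooks (linked above) work on real predictive-maintenance data and also cover scikit-survival and competing risks.

import numpy as np
from lifelines import KaplanMeierFitter

# Synthetic right-censored data: failures plus an independent censoring process.
rng = np.random.default_rng(0)
failure_times = rng.exponential(scale=10.0, size=200)
censoring_times = rng.exponential(scale=15.0, size=200)
observed = failure_times <= censoring_times      # True when the failure was actually seen
durations = np.minimum(failure_times, censoring_times)

# Fit the Kaplan-Meier estimator and inspect the survival curve.
kmf = KaplanMeierFitter()
kmf.fit(durations, event_observed=observed)
print(kmf.median_survival_time_)
print(kmf.survival_function_.head())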

Tutorial
Room 3 (Tutorial)
14:00
30min
The UX of computational thinking
Gabriela Vives

In this presentation we dive into the unique challenges of the user experience of Project Jupyter.

The UX of computational notebooks

Computational notebooks have changed the way we think about interactive computing. By blending together narration and code, they are widely regarded as a concrete implementation of the concept of literate programming [1]. Very popular in educational contexts, notebooks are a unique tool in promoting computational thinking, the ability to present problems and solutions in the form of algorithmic steps.

We will present some of the challenges of the UX of computational notebooks, which is distinct from that of both document-processing tools and development environments.

JupyterLab, an open application framework

While many websites focus on guiding users towards a predefined goal (such as a purchase on an e-commerce site), JupyterLab provides a space for open-ended exploration and creation, which raises unique user experience challenges that have been sparsely explored.

Photo editing software, development environments, and CAD tools all fall into this category and have similar requirements, such as allowing for a complex tiled layout or displaying a lot of personalised information on the screen.

We will show how the foundations of JupyterLab can be reused to build other tools of that category, and present two examples with the JupyterCAD and Glue-Jupyter projects.

Improving the UX of Jupyter

We are launching a new initiative focused on the UX of JupyterLab. How can we uncover issues and improve the user experience through UX research without breaking what Jupyter got right and what enabled its global adoption?

We will present the first steps of our initiative, ranging from the triaging of issues to the set up of user tests.

[1] Knuth, Donald E. (1992). Literate Programming. California: Stanford University Center for the Study of Language and Information. ISBN 978-0-937073-80-3.

Sponsored
Gaston Berger
14:30
14:30
30min
Community-first open source: An action plan!
Pavithra Eswaramoorthy, Tania Allard

The open source software ecosystem has multiple mature projects with thriving communities. These communities come in many different flavors, but there is always a common thread of promoting kindness in communication, improving the contributor and user experience, and working to make the project more inclusive, accessible, and sustainable.

We, the presenters, recently worked to transition a company-backed open source project, Nebari (https://nebari.dev/), to be more community-oriented in its development and governance. We focused on creating a community-first foundation that builds on years of learning from other leading communities, including Jupyter, NumPy, Gatsby JS, and more. In this talk, we want to share our journey and the things we learned along the way.

We aim to provide a step-by-step guide for open source projects looking to adopt more community-driven practices. We will discuss everything from repository management and contributor and maintainer pathways to documentation and governance principles. This talk will be most helpful for projects in their formative stages and projects transitioning from company-backed models; however, we feel everyone can learn something new to implement in their communities.

Community: Tools and Practices
Louis Armand 1
14:30
30min
Inclusive and accessible scientific computing in the Jupyter Ecosystem
Stephannie Jimenez Gacha

Have you ever heard about accessibility?

3, 2, 1 ….

Maybe you have heard the term accessibility before in a particular context, but have you thought about how accessibility fits within open source development? In this talk, we will dive into the efforts to make JupyterLab and Jupyter interfaces accessible to a broader audience of users, including those requiring assistive technologies. We will discuss ongoing efforts around a robust accessibility testing framework for projects in the Jupyter ecosystem, addressing critical accessibility issues in JupyterLab, and developing documentation focused on best practices for accessibility compliance and disability inclusion. And, more importantly, how you can participate in making open source more accessible to all, whatever your role is: user, contributor, maintainer, advocate or any other.

At the end of the talk, the attendees will be able to:
1. Have a basic understanding of accessibility and how it ties to open source software such as JupyterLab.
2. Learn about current efforts to make JupyterLab more accessible.
3. Get resources for learning more about accessibility and to engage with the accessibility efforts in the Jupyter ecosystem.

Community: Tools and Practices
Louis Armand 2
14:30
30min
What's new and exciting in JupyterHub
Min Ragan-Kelley, Erik Sundell

Learn about what's new and coming soon from members of the JupyterHub team. First, get a bit of context about what problems JupyterHub aims to solve and how it fits into the Jupyter ecosystem. We'll present some highlights of recent developments and exciting new plans for the JupyterHub subprojects. From JupyterHub to BinderHub, repo2docker, and Zero to JupyterHub on Kubernetes, we have lots of cool new things to show you, including improved collaboration support via JupyterLab's real-time collaboration and read-only access, support for newer Pythons on Binder, and useful Grafana dashboards for monitoring your deployments. Finally, we'll let you know where you can contribute to JupyterHub to help solve the problems you face.

Organiser Choice.
Gaston Berger
15:00
15:00
30min
Environmental Data Science Book: A Computational Notebook Community for Open Environmental Science
Alejandro Coca-Castro, Anne Fouilloux

Audience

Anyone interested in Reproducible and Reusable Research outputs with FAIR executable notebooks.

Abstract

Environmental Data Science Book (or EDS Book) is a pan-European community-driven resource hosted on GitHub and powered by Jupyter Book. The resource leverages executable notebooks, regional cloud resources and technical implementations of the FAIR principles to support the publication of datasets, innovative research and open-source tools in environmental science. EDS Book provides practical guidelines and templates that maximise open infrastructure services to translate research outputs into curated, interactive, shareable and reproducible executable notebooks which benefit from a collaborative and transparent reviewing process. Each notebook and its dependencies (input/output data, documentation, computational environments, etc.) are bundled into a Research Object (RO) and deposited to RoHub (a RO management platform) that provides the technical basis for implementing FAIR (Findable, Accessible, Interoperable and Reusable) executable notebooks.

To date, the community has successfully published multiple Python-based notebooks covering a wide range of topics in environmental data science. The notebooks consume open-source Python libraries, e.g. the Pangeo stack (intake, iris, xarray) and HoloViz (hvplot, panel), for fetching, processing and interactively visualizing environmental data.

In future work, we expect to increase contributions showcasing scalable and interoperable open-source developments in other programming languages, e.g. Julia and R, and to engage with computational notebook communities and research networks interested in improving scientific software practices in environmental science.

What is the EDS book?

  • A book: https://edsbook.org
  • An open source project: https://github.com/alan-turing-institute/environmental-ds-book
  • A community: EDS Book is also an open-source collaborative project that involves and supports members with diverse skills and backgrounds, to ensure that data science is accessible and useful for everyone interested in environmental sciences.

Impact and outreach over last 12 months

  • 10 executable notebooks (see the gallery in https://edsbook.org/notebooks/gallery.html)
  • 24 contributors in the GitHub Host Repository
  • 240 Twitter followers | 23 Mastodon followers
  • Highlighted in the Supporting Pangeo: the community-driven platform for Big Data geoscience project page, https://www.turing.ac.uk/research/research-projects/supporting-pangeo-community-driven-platform-big-data-geoscience
  • Highlighted in a FOSS4G 2022 workshop aiming to teach the basics of Pangeo, an open-source stack suited for big geoscience data https://pangeo-data.github.io/foss4g-2022/afterword/envds-book.html

Outreach

Workshops/Hackathons
- Climate Informatics 2023 Reproducibility Challenge. Co-hosted by EDS book, Climate Informatics and Cambridge University Press & Assessment with support from Cambridge University, The Alan Turing Institute and Simula Research Laboratory, https://eds-book.github.io/reproducibility-challenge-2023/

Presentations
- European Geophysical Union 2023 (EGU23), https://meetingorganizer.copernicus.org/EGU23/EGU23-13768.html
- Pangeo Community Showcase, https://www.youtube.com/watch?v=9lhbU0vbhw0
- European Geophysical Union 2022 (EGU22), https://meetingorganizer.copernicus.org/EGU22/EGU22-3739.html
- UK Conference on Environmental Data Science, https://wp.lancs.ac.uk/ceds/abstracts/abstracts-6th-july-22/#castro
- The Turing Way Fireside chat, https://www.youtube.com/watch?v=EeeRZZ3-Stc
- AGU22, Open Science Practices and Success Stories Across the Earth, Space and Environmental Sciences session, https://agu.confex.com/agu/fm22/meetingapp.cgi/Paper/1072564

Community: Tools and Practices
Louis Armand 1
15:00
30min
Notebooks for All: Accessibility & Jupyter Notebooks
Patrick Smyth, Jenn Kotler

Jupyter Notebooks are a standard tool in data science and scientific research and are widely used to teach coding. Unfortunately, this important resource is currently difficult or impossible to use with assistive technologies such as screen readers. This shortcoming disproportionately burdens disabled people and in many cases blocks them from entering careers in STEM.

In 2022 and 2023, Space Telescope Science Institute, the center that performs scientific and research operations for the Hubble and James Webb space telescopes, has undertaken a project, Astronomy Notebooks for All, to research this problem and explore potential solutions through paid, user-centered feedback sessions with developers, scientists, and students with disabilities. In this talk, Jenn Kotler (Space Telescope Science Institute) and Patrick Smyth (Iota School) will discuss the results of this research and their implications for accessibility in Jupyter Notebooks. We will take a realistic look at how Jupyter Notebooks can fall short for people with disabilities by sharing individual stories of blind Jupyter users—people who have found success in STEM, but also those who were deterred by accessibility issues. Finally, we will consider accessibility work already done or under way to make Jupyter Notebooks more accessible, and strategize about ways the Jupyter community can come together to address these issues.

This talk is for a general audience, particularly for those who care about making our community more inclusive. It will be of particular interest to people who author Notebooks and want tips on how to make their work more accessible.

This work is funded by Space Telescope Science Institute. It is made possible by the efforts of the full Astronomy Notebooks for All team, including Dr. Erik J. Tollerud (STScI project scientist), Isabela Presedo-Floyd (UX/UI and Accessibility Designer at Quansight), and Dr. Tony Fast (scientist and open source advocate).

Project repository

Community: Tools and Practices
Gaston Berger
15:00
30min
Reusable JupyterHub Pytest Plugin
Sheila Kahwai, Georgiana Dolocan

Audience:

This talk targets developers who are interested in testing Python packages that use JupyterHub, or modular implementations in general. It is recommended for the audience to have prior experience with the Python language.

Summary:

JupyterHub is a modular and extensible project, with components, like the proxy, authenticator and spawner, that can be easily replaced with alternate implementations. Testing the functionality of these components against JupyterHub is important, and it requires various hub setups that can sometimes become complicated.

Each of these hub components, and the hub itself, defines its own testing infrastructure, building everything from the ground up using the pytest framework. Some of this complex work is either repetitive across JupyterHub sub-projects or under-specified for some of them. This sparked a need to abstract these common parts into a separate testing framework.

During the 3-month Outreachy internship round of 2022, I was tasked with creating a reusable JupyterHub Pytest Plugin. The goal was to provide importable testing utilities to make it easier for contributors to write tests for the various hub components.

This talk will cover details about:
* How we identified the reusable hub functionalities across the jupyterhub repository and its components
* The integration of the Pytest plugin into various JupyterHub sub-projects
* How the community can use this pytest plugin to test their own implementations of the JupyterHub components
* The impact of this project on the maintainability and continuity of the JupyterHub project
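
As a rough sketch of what "importable testing utilities" can look like in practice (the fixture and configuration names below are hypothetical, not the plugin's actual API), a pytest plugin can expose shared fixtures that each JupyterHub sub-project reuses and extends in its own test suite:

import pytest

# In the plugin (distributed as a package and registered through the
# standard "pytest11" entry point so fixtures are available on install).
@pytest.fixture
def hub_config():
    # Hypothetical baseline configuration shared across sub-projects.
    return {"authenticator_class": "dummy", "spawner_class": "simple"}

# In a sub-project's test suite: reuse the shared fixture and override
# only the component under test.
def test_custom_spawner_keeps_default_authenticator(hub_config):
    hub_config["spawner_class"] = "my-custom-spawner"
    assert hub_config["authenticator_class"] == "dummy"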

Community: Tools and Practices
Louis Armand 2
15:30
15:30
30min
Accelerating the Open Source Silicon Ecosystem with Jupyter Notebooks
Johan Euphrosine

In this interactive session we showcase our recent work to leverage Jupyter Notebooks and Conda packages to publish and share interactive design experiments and tutorials using open source silicon toolchains.

Notebooks published at https://github.com/chipsalliance/silicon-notebooks demonstrate how to run fully open source silicon flows from design to GDS using publicly-hosted notebooks, without having to install any tool locally.

Additionally, we show how those notebooks can be scaled on a public cloud provider to explore the parameter space of various silicon designs:
- We deploy an open source Terraform solution, https://github.com/GoogleCloudPlatform/rad-lab, to provision Jupyter notebooks with all the necessary tools pre-installed to model our experiments with design and flow parameters.
- Between each batch of experiments we report estimated performance metrics to a black-box and hyperparameter optimization service (which also has an open source implementation, https://github.com/google/vizier), allowing it to suggest new parameters for future batches.
- We observe that the experiments quickly converge toward the best metrics for the given designs.
- Each job results in a standalone notebook, allowing us to share, aggregate, and reproduce every experiment.

Others
Louis Armand 1
15:30
30min
The easiest way to collaborate on Jupyter
Yongjin Shin, Hooncheol Shin

Audience:
- 'Intermediate' level of programming
- Jupyter users
- Data scientists

Introduction

Development is not only for individuals, but also for enterprises and organizations, where efficient collaboration is of great importance. Most developers share and manage source code with GitHub. Most files are managed efficiently on GitHub, but Jupyter files - ipynb - are stored as text, making it difficult to identify diffs and resolve conflicts between versions. ipynb files consist of cells containing code, which requires manual amendments to the file text in case of merge conflicts. While Jupyter supports the Jupyter-git extension, which allows either deleting conflicting files or selecting one file over the other, it does not directly solve the conflicts within conflicting files or let users view the diffs easily. Also, there are many cases where users collaborate by sharing ipynb files. However, when opening other users' files, it is often difficult to understand the flow of their code and the order in which the cells should be executed. Using comments or markdown syntax can alleviate these problems, but sharing detailed changes by text is not the most efficient method.

Link Git

Link is a JupyterLab extension that allows users to create pipelines on Jupyter by connecting different cells. The user can connect cells in their desired order into a DAG structure to run the code. Link also offers a Link-git extension, which provides git features on Jupyter for ipynb files with pipelines.

  • Git diff check: Users can visualize the commit history of an ipynb file. The feature shows all changes made at the code level for each commit, and also how the structure of the pipelines changed. When collaborating, users are able to review the history of previous works before moving on to the next stage, and also decide from which commit they wish to begin.

  • Merge conflict management: Link-git contains a merge driver, which resolves all merge conflicts at cell levels - for both the code and pipeline structure - when a conflict occurs between different users working from their respective local environments. Using this feature, a team of developers can create an overall code framework in the form of a pipeline to facilitate the merge process of pipelines after writing codes at cell levels on their respective local environments.
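
To make the idea of cell-level merging concrete, here is a toy, heavily simplified sketch (not Link-git's actual merge driver, which also merges the pipeline structure) of a three-way merge over notebook cells using nbformat, keeping whichever side changed a cell relative to the common base:

import nbformat

def merge_by_cell_id(base_path, ours_path, theirs_path, out_path):
    # Read the three versions involved in a git three-way merge.
    base = nbformat.read(base_path, as_version=4)
    ours = nbformat.read(ours_path, as_version=4)
    theirs = nbformat.read(theirs_path, as_version=4)

    base_source = {cell.get("id"): cell.source for cell in base.cells}
    theirs_by_id = {cell.get("id"): cell for cell in theirs.cells}

    merged = []
    for cell in ours.cells:
        other = theirs_by_id.get(cell.get("id"))
        if other is not None and other.source != cell.source:
            # If our side left the cell untouched, take the other side's edit.
            if cell.source == base_source.get(cell.get("id")):
                cell = other
        merged.append(cell)

    ours.cells = merged
    nbformat.write(ours, out_path)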

Sharing pipelines and cache

Link provides features to facilitate code sharing in file formats. Conventionally, Jupyter users often share code as ipynb files to modify existing code or to re-use it. However, on Jupyter, code cells are listed linearly and are often not well organized, leading to difficulties in reproducing results or making changes and additions. Link provides the following file export and import features to resolve these complications.

  • Pipeline export, import: Link users can export entire pipelines or component cells as json files. When another Link user imports these json files, they can re-open the pipeline or component cell on Jupyter. As the code includes the DAG structure, users can re-open the ordered pipelines together with the code cells, allowing them to more efficiently understand the flow of the code without any additional textual explanations. Also, users can share only the relevant code cells that are required for execution.

  • Cache export, import: Users can store the cache of each component after executing a whole pipeline, and export the cache as an archive file (.tar.gz). When another user imports this file, they can use the pipeline results without having to repeat the pipeline execution. This saves time when users need to repeat certain jobs on their own pipelines, or when different users want to reproduce the same code.

Finishing up

In short, Link provides a useful framework for easier collaboration with its Link-git feature and the pipeline/cache export and import features.

Community: Tools and Practices
Gaston Berger
15:30
30min
WAAAT! Accessibility Testing JupyterLab
Gabriel Fouasnon

WAAAT = Web App Automated Accessibility Testing

https://hackmd.io/@gabalafou/waaat

Have you ever wondered how to make a complex web application accessible to disabled users? There are many pieces to this puzzle, which include things like using semantic HTML, consulting disabled users, and testing your app before releasing it. In this talk, we will look at testing, specifically how you can use existing tech to increase the portion of testing that can be automated, while keeping in mind that you will always at some point need to test with disabled users.

When it comes to accessibility testing, one of the challenges—and opportunities—in working with JupyterLab is that well-known existing solutions are geared towards low-interaction web pages rather than highly interactive web apps. For over a year, the better part of my work has been collaborating with designers and developers in open source to develop an accessibility testing solution capable of dealing with the complexity of JupyterLab.

In this talk, I am going to do a deep dive into this testing solution, as well as the challenges faced and lessons learned. I will show-and-tell a testing system combined with a documentation system, inspired in part by the work of the W3C Accessibility Conformance Testing (ACT) group. Most importantly, I will demonstrate how two tools, Playwright and GitHub Actions, can be combined to create a powerful CI/CD system for automated accessibility testing. It's not magic. You still have to write the tests yourself. But it's a powerful combination that you can learn by coming to this talk. You can then adopt and adapt this system to many other complex web apps when you need to get serious about making them accessible to disabled users.
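
As a small illustration of the building blocks involved (a sketch only, not the project's actual test suite; the URL is a placeholder for a locally running JupyterLab), Playwright can drive the browser, inject the axe-core engine from a CDN, and report any violations it finds:

from playwright.sync_api import sync_playwright

AXE_CDN = "https://cdn.jsdelivr.net/npm/axe-core@4/axe.min.js"

with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()
    page.goto("http://localhost:8888/lab")   # placeholder JupyterLab URL
    page.add_script_tag(url=AXE_CDN)         # inject the axe-core engine
    results = page.evaluate("async () => await axe.run()")
    for violation in results["violations"]:
        print(violation["id"], violation["impact"], len(violation["nodes"]))
    browser.close()

In practice, regression tests for a highly interactive app would also script specific interactions (opening a notebook, using menus) before running the checks, and would run in CI, for example via GitHub Actions.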

Intended audience

Anyone curious about automated UI testing and accessibility! Some familiarity with web accessibility will be useful—for example, knowing what WCAG and Pa11y are; if an attendee doesn't already know these terms, attending any other accessibility talk before this one should give all of the accessibility background needed for this talk. Knowledge of Playwright and GitHub Actions is not necessary, but a little bit of JavaScript web programming experience, as well as a nominal understanding of CI/CD, will make following some parts of the talk easier.

Talk outline

  1. Introduction (5 minutes)
    - Introduction to accessibility testing in the context of JupyterLab
    - Tools of the trade for accessibility testing
    - Intro to testing and reporting: automated versus manual, auditing versus regression prevention, etc.
  2. Challenges faced (5 minutes)
    - The limitations of axe-core, pa11y and the like for JupyterLab
    - Lessons learned so you do not have to
  3. A proposed framework for automated accessibility testing (5 minutes)
    - Use axe-core, but also use Playwright to write more powerful and specific tests
    - Focus on regression testing
    - Mapping tests to WCAG guidelines
  4. Call to action (5 minutes)
    - How to make fixes and keep them fixed!
    - How to use the system to accompany an accessibility fix with a regression test
  5. Questions and answers (10 minutes)

Takeaways

At the end of this talk, attendees will:

  • Know how to write accessibility regression tests for JupyterLab, where to add them, and how to get help if needed.
  • Have an example of an automated accessibility testing system that they can pattern off of to bring an accessibility testing practice to other web apps.
  • Have a better understanding of the current state of the art for automated UI testing in the browser and how to apply it to automated accessibility testing.
Community: Tools and Practices
Louis Armand 2
16:00
16:00
30min
Break
Gaston Berger
16:00
30min
Break
Louis Armand 1
16:00
30min
Break
Louis Armand 2
16:30
16:30
45min
Lightning talks

If you did not have a chance to present, or had an idea during JupyterCon, here is your chance to give a 4-minute presentation about it.

You may register during the day for the lightning talks.
At the entrance level of the conference you will find a number of index cards, a box, and pens.

  • Write clearly the title of the proposal and your name.
  • Put it in the box.

The proposal / talk does not need to be polished, does not need to be an existing project, and does not even need to be your project.
It does not have to be about programming. It can be about waffles, or it can be about shoes. You are allowed to not have slides. It is recommended to make puns.

You have 4 (FOUR) minutes max.

At the end of the day we'll select talks at random.

Please sit near the front if you have submitted a talk.

proposals = {...}
while still_time():
    on_stage = waiting_area
    on_stage.plug_in()
    proposal = random.choice(list(proposals))
    proposal.author.walk_to(waiting_area)
    waiting_area = proposal
    on_stage.present(timeout="4 minutes")
    on_stage.unplug_and_walk_away()
MISC
Gaston Berger
17:15
17:15
15min
Wrap up

We'll wrap up for the day/week, and give you information about the evening / next day.

MISC
Gaston Berger
21:05
21:05
90min
See you at the sprints and in 2024!
Louis Armand 2