Tuesday Oct. 13, 2020, 4:45 p.m.–Oct. 13, 2020, 5:15 p.m. in Jupyter Community: Practices

Best practices for managing Jupyter-based data science projects using Conda (+Pip)

David R. Pugh

Audience level:
Intermediate

Brief Summary

This talk covers "best practices" for managing Jupyter-based data science projects using Conda (+Pip). The talk contrasts a "system-wide" Jupyter install where Conda (+Pip) are used to manage a Jupyter installation that is shared across all projects with "project-specific" Jupyter installs where Conda (+Pip) are used to manage Jupyter separately for each project.

Outline

This talk will cover "best practices" for managing Jupyter-based data science projects using Conda (+Pip). The first half of the talk will discuss the merits of a "system-wide" Jupyter install, where Conda (+Pip) are used to manage a single Jupyter installation that is shared across all projects. A system-wide install has three benefits: a common set of JupyterLab extensions is available to all projects, which simplifies UI/UX; JupyterLab rarely needs to be rebuilt; and new projects can be prototyped quickly because Jupyter (+dependencies) does not need to be installed again. Particular focus will be given to creating project-specific Conda environments with custom kernels, which allow users to launch Jupyter Notebooks and Python consoles for each project from within a common JupyterLab. This part of the talk will also cover the %conda and %pip magic commands and their role in development and prototyping environments in Jupyter Notebooks.
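
As a rough illustration of the custom-kernel setup described above, the sketch below creates a Conda environment for one project and registers it as a kernel for a shared JupyterLab. It is not taken from the talk: the function names, the `run` helper, and the `DRY_RUN` variable are hypothetical, and it assumes `conda` is on the PATH. Setting `DRY_RUN=1` prints the commands without running them.

```shell
# Hypothetical helper: print each command, and skip running it when DRY_RUN=1.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-0}" != "1" ]; then
        "$@"
    fi
}

# Create a Conda environment for one project and register it as a Jupyter
# kernel so that it shows up in the launcher of the shared JupyterLab.
register_project_kernel() {
    local env_name="$1"
    local display_name="${2:-Python ($env_name)}"

    # ipykernel has to be installed inside the project environment.
    run conda create --yes --name "$env_name" python ipykernel

    # Write a kernel spec for this environment into the user's Jupyter data directory.
    run conda run --name "$env_name" python -m ipykernel install \
        --user --name "$env_name" --display-name "$display_name"
}
```

Calling `register_project_kernel my-project` (a placeholder name) should make a "Python (my-project)" kernel selectable in the shared JupyterLab. In a notebook running on that kernel, `%conda install` and `%pip install` are meant to install into the kernel's own environment rather than the one JupyterLab was launched from.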

The second half of the talk will contrast the system-wide Jupyter install with a "project-specific" Jupyter install, where Conda (+Pip) are used to manage a separate Jupyter installation for each project. This approach has three benefits: UI/UX is more flexible because JupyterLab extensions can be customized for each project; different versions of JupyterLab can be installed on the same machine, which allows experimentation with bleeding-edge features; and a project-specific Jupyter install managed with Conda (+Pip) automatically makes a data science project "binder-ready". Examples of project-specific JupyterLab installations will be given, including installations for GPU-accelerated data science projects that use extensions such as jupyterlab-nvdashboard. The talk will wrap up with a walkthrough of a template Bash script for creating Conda environments with custom JupyterLab installations, which draws on my experience contributing to and working with repo2docker.
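
A project-specific install along these lines might look like the sketch below. It is not the template script from the talk: the function names, the `env` subdirectory, and the package list are assumptions chosen for illustration. The first function writes a minimal `environment.yml` that includes JupyterLab itself; the second builds the environment inside the project directory when `conda` is available. An `environment.yml` at the repository root is also one of the files repo2docker looks for, which is the sense in which such a project is "binder-ready".

```shell
# Write a minimal environment.yml that installs JupyterLab alongside the
# project's own dependencies (the package list here is illustrative only).
write_environment_file() {
    local project_dir="$1"
    mkdir -p "$project_dir"
    cat > "$project_dir/environment.yml" <<'EOF'
channels:
  - conda-forge

dependencies:
  - python
  - jupyterlab
  - pip
EOF
}

# Build the environment inside the project directory, so that each project
# carries its own JupyterLab installation.
create_project_env() {
    local project_dir="$1"
    write_environment_file "$project_dir"
    if command -v conda >/dev/null 2>&1; then
        conda env create --prefix "$project_dir/env" \
            --file "$project_dir/environment.yml"
    else
        echo "conda not found; wrote $project_dir/environment.yml only" >&2
        return 1
    fi
}
```

After `create_project_env .` succeeds, running `conda activate ./env` followed by `jupyter lab` should start that project's own JupyterLab. Project-specific extensions would be added to the dependency list in the same file.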

This talk assumes basic familiarity with Jupyter Notebooks, JupyterLab, Conda, Pip, and the Bash shell. The material will be contributed to the Introduction to Conda for (Data) Scientists tutorial materials that are being developed with The Carpentries Incubator. The talk could also include a substantial hands-on component that would make for a great tutorial. I submitted a proposal for a talk instead because I was concerned that the material would not fill a full 3-hour tutorial slot without bringing in additional material on the basics of Conda (+Pip), which is likely out of scope for JupyterCon.

Additional materials for context

I have written a number of blog posts that cover the basics of using Conda (+Pip) to manage data science project environments, including environments with various custom JupyterLab installations.