05-10, 14:00–14:30 (Europe/Paris), Louis Armand 2
Papermill has become a widely used tool for executing Jupyter notebooks. Teams rely on papermill for many production use cases, such as scheduled report generation and model re-training. However, because papermill executes code in a separate IPython kernel process, it has several drawbacks when used in production.
This talk introduces an alternative notebook executor that powers Ploomber, a popular open-source orchestration framework. The new executor runs notebooks in a single process, which enables capabilities useful for production workloads, such as interactive debugging with pdb and notebook profiling (CPU and memory usage). The executor is integrated into the Ploomber project and can also be used from the command line, like papermill.
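To illustrate why running everything in one process helps, here is a minimal sketch. It is not Ploomber's actual executor. The sketch reads a notebook's JSON, runs each code cell with `exec()` in one shared namespace inside the current interpreter, and records per-cell wall time and peak memory using the standard library's `time` and `tracemalloc`. The function `run_in_process`, its `debug` flag, and the inline notebook are hypothetical names chosen for illustration.

```python
import json
import pdb
import sys
import time
import tracemalloc

# A tiny notebook in the .ipynb (nbformat 4) JSON layout, built inline for illustration.
NOTEBOOK_JSON = json.dumps({
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Report"]},
        {"cell_type": "code", "metadata": {}, "execution_count": None,
         "outputs": [], "source": ["x = list(range(100_000))\n"]},
        {"cell_type": "code", "metadata": {}, "execution_count": None,
         "outputs": [], "source": ["total = sum(x)\n"]},
    ],
})


def run_in_process(nb_json, debug=False):
    """Hypothetical single-process executor: run code cells in this interpreter.

    Returns the shared namespace and a list of (seconds, peak_bytes) per code cell.
    tracemalloc only sees allocations made through Python's allocator.
    """
    nb = json.loads(nb_json)
    namespace = {"__name__": "__main__"}  # one namespace shared by all cells
    stats = []
    code_cells = [c for c in nb["cells"] if c["cell_type"] == "code"]
    for index, cell in enumerate(code_cells):
        # "source" may be a single string or a list of lines; join handles both.
        source = "".join(cell["source"])
        tracemalloc.start()
        start = time.perf_counter()
        try:
            exec(compile(source, f"<cell {index}>", "exec"), namespace)
        except Exception:
            if debug:
                # Same process, so the failing cell's frames are available to pdb.
                pdb.post_mortem(sys.exc_info()[2])
            raise
        finally:
            elapsed = time.perf_counter() - start
            _, peak = tracemalloc.get_traced_memory()
            tracemalloc.stop()
        stats.append((elapsed, peak))
    return namespace, stats


if __name__ == "__main__":
    ns, per_cell = run_in_process(NOTEBOOK_JSON)
    for i, (seconds, peak) in enumerate(per_cell):
        print(f"cell {i}: {seconds:.4f}s, peak {peak} bytes")
```

The main design trade-off is isolation. With a separate kernel, a crash or memory leak in a cell cannot take down the caller. In-process execution gives up that isolation in exchange for direct access to the namespace, the debugger, and profilers.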
Some experience with Jupyter (notebook or lab) and the terminal is required. Experience with papermill is optional.
Outline (times in minutes):
- [0 - 2] Introduction to papermill
- [2 - 6] Papermill's drawbacks
- [6 - 10] Running notebooks in a single process
- [10 - 16] Debugging notebook execution with pdb
- [16 - 22] Profiling notebooks
- [22 - 26] Orchestrating notebook pipelines in production with Ploomber
- [26 - 28] Summary and conclusions
- [28 - 30] Q&A
Eduardo Blancas is the Co-Founder and CEO of Ploomber, a Y Combinator-backed company developing tools to bridge the gap between interactive data work and production. Before that, he was a Data Scientist at Fidelity Investments, where he deployed the first customer-facing machine learning model for asset management. Eduardo holds an M.S. in Data Science from Columbia University and a B.S. in Mechatronics Engineering from Tecnológico de Monterrey.