JupyterCon 2023

Jongseob Jeon


Sessions

05-11
17:20
15min
Accelerate your ML Cycle from Model development to deployment using Jupyter (feat. Extension Link)
Jongsun Shinn, Jongseob Jeon

1. Introduction

Jupyter has powered many ML projects, especially in their early stages, thanks to its interactive and versatile nature.
On the other hand, Jupyter is commonly regarded as less suitable for collaboration among data scientists or for productizing developed models.
To address these weaknesses and promote Jupyter as a key development tool throughout a project, LINK, a Jupyter extension, offers a Pipeline feature.

2. What is LINK?

LINK supports exporting and importing pipelines, such as Kubeflow and Argo Workflows pipelines, and also guarantees reproducibility.
With LINK, data scientists can create pipelines in YAML format simply by clicking the export button in the UI, without having to learn the complexities of SDKs such as KFP.
Once a pipeline has been uploaded, however, its code becomes difficult to manage when collaborating with others.
LINK addresses this by importing pipelines in YAML format and reproducing both the code and the pipeline, so other data scientists can easily compare the Python code in the notebook cells and apply version control to the project.
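
As a rough illustration of the comparison step, the sketch below diffs the code cells of two notebooks using only the Python standard library. It assumes the standard Jupyter notebook JSON layout (`cells`, `cell_type`, `source`). The function names and the two sample notebooks are hypothetical and are not part of LINK.

```python
import difflib


def code_cells(notebook: dict) -> list[str]:
    """Return the source of every code cell, one string per cell."""
    cells = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        # The notebook format allows `source` to be a string or a list of lines.
        cells.append("".join(source) if isinstance(source, list) else source)
    return cells


def diff_notebooks(old: dict, new: dict) -> list[str]:
    """Unified diff over the concatenated code cells of two notebooks."""
    old_lines = "\n".join(code_cells(old)).splitlines()
    new_lines = "\n".join(code_cells(new)).splitlines()
    return list(
        difflib.unified_diff(old_lines, new_lines, "old", "new", lineterm="")
    )


# Hypothetical notebooks: the original one and one reproduced from a pipeline.
original = {
    "cells": [{"cell_type": "code", "source": ["lr = 0.1\n", "epochs = 10"]}]
}
reproduced = {
    "cells": [{"cell_type": "code", "source": ["lr = 0.01\n", "epochs = 10"]}]
}

for line in diff_notebooks(original, reproduced):
    print(line)
```

In practice the two dictionaries would come from `json.load` on the `.ipynb` files, and the resulting diff could be reviewed alongside ordinary version-control history.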

3. How could LINK be used in real world projects?

In the real world, data scientists at MakinaRocks use LINK to develop anomaly detection models in the industrial domain, for example for motors and CO2 laser drill equipment.
Taking a machine learning model from training to deployment in the real world, we found that building a training pipeline is essential. We build these training pipelines with LINK, which allows easier integration with other MLOps tools such as Kubeflow Workflow, MLflow, Seldon Core and Stream Serving. We followed these steps:
1. We built a training pipeline in a Jupyter notebook that trains ML models and saves them to MLflow.
2. By simply exporting it to a Kubeflow pipeline, we can execute reproducible pipeline runs and track run histories.
3. We deployed our model from MLflow using Seldon Core to serve streams. Deployed models must run in the same environment as the training pipeline built from the Jupyter notebook.
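
The environment requirement in step 3 can be checked mechanically. The following is a minimal sketch, assuming each environment is described by a mapping from package name to version. The helper names and the package versions in the example are hypothetical, and this is not how LINK, MLflow or Seldon Core perform the check.

```python
from importlib import metadata


def snapshot_environment() -> dict[str, str]:
    """Map each installed distribution name to its version."""
    snapshot = {}
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name:  # skip distributions with missing metadata
            snapshot[name.lower()] = dist.version
    return snapshot


def environment_mismatches(
    training: dict[str, str], serving: dict[str, str]
) -> dict[str, tuple]:
    """Return packages whose versions differ or that exist on one side only."""
    mismatches = {}
    for name in sorted(set(training) | set(serving)):
        train_version = training.get(name)
        serve_version = serving.get(name)
        if train_version != serve_version:
            mismatches[name] = (train_version, serve_version)
    return mismatches


# Hypothetical snapshots; in practice each would come from running
# snapshot_environment() inside the training and serving images.
training_env = {"numpy": "1.24.3", "scikit-learn": "1.2.2"}
serving_env = {"numpy": "1.24.3", "scikit-learn": "1.3.0"}

for package, (trained, served) in environment_mismatches(
    training_env, serving_env
).items():
    print(f"{package}: trained with {trained}, serving with {served}")
```

An empty result means the two snapshots agree. Any entry points to a package that should be pinned before the model is served.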

Data Science
Poster Placeholder