
Operationalizing Machine Learning (MLOps) with Jupyter Notebooks

Sukanya Mandal

Audience level:
Experienced

Brief Summary

Doing machine learning is one thing. Successfully deploying a model - operationalizing and orchestrating each phase of the machine learning life cycle for reproducibility - is a challenge that requires considerable software engineering skill. This talk will address the issues of ML deployment and demonstrate MLOps by orchestrating Jupyter Notebooks with Kubeflow on AWS.

Outline

The availability of computing resources has made applied machine learning (ML) a reality for many industrial use cases, yet major challenges confront those trying to deploy ML applications. When machine learning workflows or pipelines have to be productionized, issues seem to appear out of the blue and cost engineers sleepless nights:
- software management (heterogeneity of frameworks, each with its own deployment methodology)
- composability (systems built from multiple components, each using a different language or toolkit, rather than a monolithic architecture)
- portability
- performance optimization
- collaborative development
- scalability
- infrastructure management

This presentation will address these issues, highlighting a containerized approach to machine learning deployment using Jupyter Notebooks and Kubeflow. It will also cover a cloud-based approach to deployment using Amazon Web Services (AWS) capabilities.
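To give a flavour of what the demo will walk through, here is a minimal sketch (not the actual demo code) of how individual notebooks could be wrapped as containerized steps of a Kubeflow pipeline using the kfp v1 SDK and papermill. The container images and notebook paths are placeholders.

# A minimal sketch using the Kubeflow Pipelines v1 SDK (kfp).
# Image names and notebook paths are hypothetical; in practice each image
# would bundle a notebook plus its dependencies.
import kfp
from kfp import dsl


def notebook_step(name: str, notebook: str, image: str) -> dsl.ContainerOp:
    """Run one Jupyter notebook inside its own container via papermill."""
    return dsl.ContainerOp(
        name=name,
        image=image,  # placeholder image, e.g. built from a project Dockerfile
        command=["papermill", notebook, f"/tmp/{name}-out.ipynb"],
    )


@dsl.pipeline(name="notebook-mlops", description="Notebook-driven ML workflow")
def notebook_pipeline():
    prep = notebook_step("prepare-data", "notebooks/prepare.ipynb", "example/prep:latest")
    train = notebook_step("train-model", "notebooks/train.ipynb", "example/train:latest")
    train.after(prep)  # enforce ordering: train only after data preparation


if __name__ == "__main__":
    # Compile to a workflow spec that Kubeflow Pipelines can execute.
    kfp.compiler.Compiler().compile(notebook_pipeline, "notebook_pipeline.yaml")

Each notebook runs in its own container, which is exactly what makes the composability and portability benefits discussed next possible.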

A containerized approach helps engineers address composability, portability of machine learning models (tackling the heterogeneity issue highlighted above), and scalability of models as data grows.

In addition, a cloud-based infrastructure approach eases the burden on engineers by providing a fully managed environment that would otherwise have to be maintained on-premises. What data scientists and machine learning engineers care about is how the model works, how the data is put to use, and whether accurate results are achieved - focus areas far removed from the hustle and bustle of on-premises infrastructure management. For a startup, going on-premises also drives up the budget, since a separate team would be needed to manage that infrastructure.

The key takeaways from this talk are an understanding of why MLOps is significant and how you can design and build your own containerized MLOps pipeline with Jupyter Notebooks using Kubeflow on AWS (with a demo).
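As a hedged sketch of the AWS side, the compiled pipeline could be submitted to a Kubeflow Pipelines endpoint such as one exposed by a Kubeflow deployment on Amazon EKS; the endpoint URL below is a placeholder, not a real address.

# Submit the compiled pipeline to a Kubeflow Pipelines endpoint (placeholder host).
import kfp

client = kfp.Client(host="http://<your-kubeflow-endpoint>/pipeline")  # placeholder URL
run = client.create_run_from_pipeline_package(
    pipeline_file="notebook_pipeline.yaml",  # the spec compiled earlier
    arguments={},
    run_name="notebook-mlops-demo",
)
print(run.run_id)  # track the run in the Kubeflow Pipelines UI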

Prerequisite knowledge:
1. Jupyter Notebook
2. Machine learning lifecycle
3. How containers work (not mandatory but good to have)

Some background references to go by:
1. https://www.kdnuggets.com/2018/04/operational-machine-learning-successful-mlops.html
2. https://towardsdatascience.com/ml-ops-machine-learning-as-an-engineering-discipline-b86ca4874a3f
3. https://medium.com/@caiomsouza/mlops-machine-learning-and-operations-and-ai-at-scale-ffcac7e50f62
4. https://www.kubeflow.org/docs/notebooks/