Using Jupyter to build and train models is only part of the process of creating a data workflow. Managing environments, handling artifacts, and allocating system resources are just a few of the other concerns in workflow creation. Elyra's pipeline extension abstracts common patterns in workflow development behind a friendly interface while integrating with workflow orchestrators like Kubeflow Pipelines and Apache Airflow.
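To make the abstraction concrete, here is a minimal, hypothetical sketch of the plumbing such orchestrators handle for every node: each step consumes an upstream artifact file and emits a new one, and the steps are then assembled into a linear pipeline. This is purely illustrative (the function names and file layout are invented, not Elyra's actual implementation):

```python
import json
import tempfile
from pathlib import Path

# Hypothetical example: each "step" mimics a notebook node that reads an
# input artifact, transforms it, and writes an output artifact -- plumbing
# that a runtime like Kubeflow Pipelines manages per node.

def ingest(workdir: Path) -> Path:
    """Produce a raw dataset artifact."""
    out = workdir / "raw.json"
    out.write_text(json.dumps([1, 2, 3, 4]))
    return out

def transform(raw_path: Path, workdir: Path) -> Path:
    """Consume the upstream artifact and emit a derived one."""
    data = json.loads(raw_path.read_text())
    out = workdir / "squared.json"
    out.write_text(json.dumps([x * x for x in data]))
    return out

def run_pipeline() -> list:
    """Assemble the steps into a linear pipeline and return the result."""
    with tempfile.TemporaryDirectory() as tmp:
        workdir = Path(tmp)
        raw = ingest(workdir)
        squared = transform(raw, workdir)
        return json.loads(squared.read_text())

print(run_pipeline())  # [1, 4, 9, 16]
```

Writing and maintaining this hand-off logic for every node is exactly the kind of repetitive work a visual pipeline editor can take off the user's plate.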
This presentation will detail how Elyra creates notebook-based data pipelines with JupyterLab, Papermill and Kubeflow Pipelines, all without having to leave your web browser.

Pipeline construction typically involves an infrastructure team tasked with deploying the pipeline and keeping it operational. These tasks vary in granularity and include environment setup (dependencies, learning frameworks, container images), artifact handling (datasets, file ingestion, intermediate files and archiving), and the assembly of these parts into a pipeline. As the number of variations in a pipeline increases, so does the amount of work and time needed to set it up. Elyra aims to alleviate this problem by surfacing concepts and patterns common in pipeline construction into a familiar interface and 'self-serve' model for data scientists and engineers. We will demonstrate how Elyra can be used to rapidly prototype data workflows without the need to know or write any pipeline code, while still taking advantage of popular pipeline runtimes. We will look at how Elyra integrates with Kubeflow and Airflow, share our experiences (good and bad) while developing this extension, and outline our roadmap for the future.

Attendees should have a basic working knowledge of JupyterLab and basic knowledge of Kubernetes.

Elyra - https://github.com/elyra-ai/elyra
nteract Papermill - https://github.com/nteract/papermill
Kubeflow Pipelines - https://github.com/kubeflow/pipelines