What happens when you run out of processing power on your laptop? You could scale up - get more powerful hardware - or scale out - add more machines. Whichever you choose, there are great tools for scaling within the Jupyter ecosystem. This talk presents Dask and RAPIDS for parallel and GPU computing, and shows how to launch and manage clusters entirely within JupyterLab.
What happens when you run out of processing power on your laptop? You could scale up - get more powerful hardware - or scale out - add more machines. Whichever you choose, there are great tools for scaling within the Python and Jupyter ecosystems. Dask is a parallel computing framework that scales from your laptop to a cluster of thousands of machines. RAPIDS is a GPU computing framework that moves traditional CPU workloads onto the GPU. Together, Dask and RAPIDS let you scale both up and out! This talk will help you navigate this exciting new world and show how easy it is to get your workloads running faster in Jupyter.
Outline
The state of single-node CPU workloads: why do we need clusters and GPU computing?
Intro to Dask (Python-native cluster computing)
Intro to RAPIDS (Python-native GPU computing)
Code examples:
Launch Dask cluster and monitor in JupyterLab with dask-labextension
Large-scale data processing across the cluster
Fast ML model training with RAPIDS
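To preview the cluster-launching step of the outline, a minimal sketch using `dask.distributed` (assuming `dask[distributed]` is installed; in JupyterLab, dask-labextension can do this from a GUI and connect to the cluster's dashboard):

```python
from dask.distributed import Client, LocalCluster

# Start a local scheduler plus workers. processes=False keeps workers
# in-process (threads), which is the simplest setup for a quick demo;
# dashboard_address=":0" picks a free port for the monitoring dashboard.
cluster = LocalCluster(n_workers=2, threads_per_worker=1,
                       processes=False, dashboard_address=":0")
client = Client(cluster)

# cluster.dashboard_link is the URL dask-labextension uses to embed
# live task-stream and progress panes inside JupyterLab.
print(cluster.dashboard_link)

# Submit work to the cluster and gather the results.
futures = client.map(lambda x: x ** 2, range(8))
total = sum(client.gather(futures))
print(total)  # sum of squares 0..7 = 140

client.close()
cluster.close()
```

Swapping `LocalCluster` for a deployment-specific cluster class (e.g. for Kubernetes or an HPC job queue) is how the same notebook code scales out to many machines.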
Prerequisites: a working knowledge of data science with Python (pandas, NumPy, scikit-learn, etc.). No cluster computing experience is necessary - that is exactly what this talk will teach you!