05-10, 10:30–13:00 (Europe/Paris), Room 3 (Tutorial)
Learn how to run the Python data science ecosystem in parallel with Dask in this hands-on tutorial.
Dask is an open source library for parallel computing in Python that integrates deeply with common Python libraries such as NumPy, pandas, XGBoost, PyTorch, Xarray, and of course Jupyter. In this hands-on tutorial we will launch clusters of distributed machines and use those clusters to process and analyze data in the cloud.
Students should have basic familiarity with Python and pandas syntax and be interested in the challenges of large-scale computation.
Matthew is an open source software developer in the numeric Python ecosystem. He maintains several PyData libraries, but today focuses mostly on Dask, a library for scalable computing. Matthew worked for Anaconda Inc for several years, then built out the Dask team at NVIDIA for RAPIDS, and most recently founded Coiled to improve Python's scalability with Dask for large organizations.
Hendrik Makait is a data and software engineer building systems at the intersection of large-scale data management and machine learning. Currently, he works as an Open Source Engineer at Coiled improving Dask and its distributed execution engine.