Hendrik Makait is a data and software engineer building systems at the intersection of large-scale data management and machine learning. He currently works as an Open Source Engineer at Coiled, where he improves Dask and its distributed execution engine.
Learn how to run the Python data science ecosystem in parallel with Dask in this hands-on tutorial.
Dask is an open-source library for parallel computing in Python that integrates deeply with common Python libraries such as NumPy, pandas, XGBoost, PyTorch, Xarray, and of course Jupyter. In this hands-on tutorial, we will launch clusters of distributed machines and use them to process and analyze data in the cloud.
Students should have some familiarity with Python and pandas syntax and be interested in the challenges of large-scale computation.