Fundamentals of High-Performance Data Science with RAPIDS

Zahra Ronaghi, Sr. Data Scientist, NVIDIA

Audience level:
Intermediate

Brief Summary

The open-source RAPIDS project lets data scientists GPU-accelerate end-to-end data science workflows. This session shows how to accelerate data science applications on GPUs using RAPIDS cuDF (GPU-enabled, pandas-like dataframes) and cuML (GPU-accelerated machine learning).

Outline

We will discuss how to prepare datasets and train models using pandas and scikit-learn on the CPU, and using cuDF and cuML on the GPU. The session covers GPU-accelerated dataframe manipulation with cuDF and preparing datasets for machine learning.
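As a rough illustration of how closely cuDF mirrors the pandas API, the sketch below runs the same filter-and-aggregate step with whichever library is available. The toy columns (`county`, `age`, `population`) and the helper names `load_dataframe_lib` and `prepare` are hypothetical and are not taken from the session materials.

```python
import importlib


def load_dataframe_lib():
    """Return cudf if it imports cleanly, else pandas, else None."""
    for name in ("cudf", "pandas"):
        try:
            return importlib.import_module(name)
        except Exception:
            # Library missing, or (for cudf) no usable GPU: try the next one.
            continue
    return None


def prepare(xdf):
    """Filter rows and aggregate; the calls are the same in pandas and cuDF."""
    # Hypothetical toy data standing in for a population table.
    df = xdf.DataFrame(
        {
            "county": ["A", "A", "B", "B"],
            "age": [30, 50, 20, 40],
            "population": [100, 200, 300, 400],
        }
    )
    adults = df[df["age"] >= 30]
    return adults.groupby("county")["population"].sum()


xdf = load_dataframe_lib()
totals = prepare(xdf) if xdf is not None else None
if totals is not None:
    print(xdf.__name__)
    print(totals)
```

Because `prepare` only touches calls that both libraries share, moving this step from CPU to GPU amounts to changing which module is passed in.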

In the GPU-accelerated data manipulation section, students will transform UK population data, hospital data, and road network data in preparation for a variety of machine learning algorithms.

In the GPU-accelerated machine learning section, students will use a variety of machine learning algorithms, such as K-means and XGBoost, to analyze ideal supply locations, clusters of infected people, and probabilities of infection.
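A minimal sketch of the K-means step, assuming synthetic 2-D points in place of the session's real location data: it uses cuML's `KMeans` when that import succeeds and falls back to scikit-learn's otherwise. The two-blob dataset and the variable names are illustrative assumptions only.

```python
try:
    from cuml.cluster import KMeans  # GPU implementation from RAPIDS
    backend = "cuml"
except Exception:
    try:
        from sklearn.cluster import KMeans  # CPU implementation
        backend = "sklearn"
    except Exception:
        KMeans, backend = None, None

labels = None
if KMeans is not None:
    import numpy as np

    rng = np.random.default_rng(0)
    # Two well-separated synthetic blobs (hypothetical stand-in data).
    blob_a = rng.normal(loc=(0.0, 0.0), scale=0.1, size=(50, 2))
    blob_b = rng.normal(loc=(5.0, 5.0), scale=0.1, size=(50, 2))
    points = np.vstack([blob_a, blob_b]).astype(np.float32)

    # Both libraries accept n_clusters and random_state and expose
    # fit(), labels_ and cluster_centers_.
    model = KMeans(n_clusters=2, random_state=0)
    model.fit(points)
    labels = np.asarray(model.labels_)
    print(backend, model.cluster_centers_)
```

The estimator interface is the same in both branches, so only the import line differs between the CPU and GPU versions.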

Students will first build the full workflow with CPU-only libraries, then rerun the same workflow after modifying it to use the GPU computing libraries from the open-source RAPIDS project, illustrating the performance gains from running on GPUs.
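One simple way to compare the two versions of a workflow is to time each with the same harness. The sketch below uses only the Python standard library; `cpu_workflow` and `gpu_workflow` are hypothetical placeholders (plain Python loops of different sizes), not real pandas or cuDF code, so the printed ratio says nothing about actual GPU performance.

```python
import time


def best_time(fn, repeats=3):
    """Return the fastest wall-clock time in seconds over `repeats` runs."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return min(timings)


def cpu_workflow():
    # Placeholder for the pandas / scikit-learn version of the workflow.
    return sum(i * i for i in range(200_000))


def gpu_workflow():
    # Placeholder for the cuDF / cuML version; here it is just less CPU work.
    return sum(i * i for i in range(20_000))


cpu_seconds = best_time(cpu_workflow)
gpu_seconds = best_time(gpu_workflow)
speedup = cpu_seconds / gpu_seconds
print(f"CPU {cpu_seconds:.4f}s, GPU stand-in {gpu_seconds:.4f}s, ratio {speedup:.1f}x")
```

Taking the best of several runs reduces the influence of one-time start-up costs on the comparison.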