Thursday, Oct. 15, 2020, 4 p.m.–4:30 p.m. in Data Science Applications

Apache Spark 3.0: Big Data Analytics with GPUs and Jupyter Notebooks

Brad Miro, Tahir Fayyaz

Audience level:
Intermediate

Brief Summary

Apache Spark is an open source engine for analyzing large amounts of data. Apache Spark 3.0 brings major improvements to its SQL processing capabilities, optimizations for pandas, and first-class support for GPUs. Spark can be used for exploratory data analysis, data processing, or machine learning. In this talk, you’ll learn how to do all three from a Jupyter notebook.

Outline

Intro to Apache Spark: history, what it is and how it works, ecosystem, data source interoperability, runtime environments.

What’s new in 3.0: GPU compatibility, efficiency improvements of the Spark engine.

Spark + ML: utilizing GPUs, options for available tools.

Spark + Jupyter: discussion of different kernels, options for available tools.

Demo: Spark with Jupyter notebooks.
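As a rough illustration of the GPU support mentioned in the outline, the sketch below collects the resource-scheduling settings that Spark 3.0 introduced for GPUs into a plain dictionary. It is not taken from the talk. The helper name `gpu_resource_conf`, the example values, and the discovery script path are assumptions made for illustration; only the three `spark.*` configuration keys come from Spark itself.

```python
def gpu_resource_conf(gpus_per_executor: int, gpus_per_task: float,
                      discovery_script: str) -> dict:
    """Return Spark 3.0 GPU scheduling settings as a dict of strings.

    Hypothetical helper: the function and its parameters are illustrative,
    the configuration keys are the ones Spark 3.0 defines.
    """
    if gpus_per_executor < 1:
        raise ValueError("gpus_per_executor must be at least 1")
    if gpus_per_task <= 0:
        raise ValueError("gpus_per_task must be positive")
    return {
        # Number of GPUs each executor requests from the cluster manager.
        "spark.executor.resource.gpu.amount": str(gpus_per_executor),
        # Share of a GPU each task needs; 0.25 lets four tasks share one GPU.
        "spark.task.resource.gpu.amount": str(gpus_per_task),
        # Script the executor runs to report which GPU addresses it can see.
        "spark.executor.resource.gpu.discoveryScript": discovery_script,
    }


# Example values only; the script path is a placeholder, not a real location.
conf = gpu_resource_conf(1, 0.25, "/path/to/getGpusResources.sh")

# In a notebook with pyspark installed, the settings could be applied like this
# (left commented out so the sketch runs without a Spark installation):
# from pyspark.sql import SparkSession
# builder = SparkSession.builder.appName("gpu-demo")
# for key, value in conf.items():
#     builder = builder.config(key, value)
# spark = builder.getOrCreate()
```

Keeping the settings in a dictionary makes them easy to inspect in a notebook cell before a session is created. Whether GPUs are actually scheduled still depends on the cluster manager and on the libraries used in the job.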