Apache Spark is an open-source engine for analyzing large volumes of data. Apache Spark 3.0 brings major improvements to its SQL processing capabilities, optimizations for pandas, and first-class support for GPUs. Spark can be used for exploratory data analysis, data processing, or machine learning. In this talk, you’ll learn how to do all of this from a Jupyter Notebook.
Intro to Apache Spark: History, What it is + How it works, Ecosystem, Data source interoperability, Runtime environments.
What’s New in 3.0: GPU compatibility, Efficiency improvements of the Spark engine.
Spark + ML: Utilize GPUs, options for available tools.
Spark + Jupyter: Discussion of different kernels, options for available tools.
Demo: Spark with Jupyter notebooks.