
Monday, Oct. 12, 2020, 5:15 p.m.–5:45 p.m. in Data Science Applications

Leap of faith: Transitioning from Excel to Jupyter-based applications

Itay Dafna

Audience level:

Brief Summary

We will go on a journey to explore how the evolution of data analysis and visualization has enticed data analysts from all walks of life to migrate from Excel-based analyses to rich, interactive, and deployable Jupyter-based solutions.


Excel has long been the go-to tool for data analysis. Data analysts across industries use it both to conduct their analyses and as an application-building platform. However, machine learning algorithms and interactive data visualization, both of which have recently risen in popularity, fall beyond the scope of Excel's core functionality. As a result, analysts who are looking to improve their game are finding Jupyter Notebooks to be the perfect medium to bridge this gap.

During the talk, I will draw on my recent experience in the financial services industry, where I helped analysts transition from Excel-based applications to Jupyter-powered ones. Using a demo application, implemented in both Excel and Jupyter, we will compare the two implementations and identify the main factors that entice users to take the leap and upgrade from Excel to Jupyter. Specifically, we will look at how the ease of processing arrays of data in Python addresses a big pain point in Excel, where such operations require verbose and often inefficient VBA code. We will also look at how seemingly simple data visualization operations, such as programmatically changing the scale type of an axis, are difficult to do in Excel but intuitive and straightforward in Jupyter. Finally, we will discuss the “delight” factor that Jupyter-based applications provide and Excel lacks, including smooth transitions, extensibility, and theming.
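As a rough illustration of the two comparisons above, here is a minimal sketch that assumes NumPy and Matplotlib; the talk itself does not name the libraries used in its demo, and the price figures are made up. It computes period-over-period returns for a whole array in one expression, then switches a plot's y axis to a logarithmic scale with a single call.

```python
import numpy as np
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch also runs outside a notebook
import matplotlib.pyplot as plt

# Hypothetical closing prices (illustrative values, not from the talk).
prices = np.array([100.0, 110.0, 121.0, 133.1])

# One vectorized expression processes the whole array; no cell-by-cell loop is needed.
daily_returns = prices[1:] / prices[:-1] - 1.0

fig, ax = plt.subplots()
ax.plot(prices)

# Programmatically change the scale type of the y axis from linear to logarithmic.
ax.set_yscale("log")
```

In a notebook, the same `set_yscale` call could be wired to a widget so that users toggle the scale interactively, though that wiring is not shown here.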

We will then turn to the “pain points” that users experience when transitioning from Excel to Jupyter. Specifically, we will look at how losing the comfort of having your data visible, accessible, and mutable on a spreadsheet requires a paradigm shift in the way users think about data analysis. In Jupyter-based environments, the work is split into three distinct stages:

  1. Input: Loading/generating data
  2. Processing: munging/cleaning up the data and applying any models and/or analysis on it
  3. Output: visualization, saving to a file or another data storage medium

In Excel, all three of these stages are combined into a single operation.
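The three stages listed above can be sketched as separate functions. This is only an illustration, assuming pandas; the inline CSV, the column names, and the function names `load`, `process`, and `export` are hypothetical and not taken from the talk's demo application.

```python
import io

import pandas as pd

# Hypothetical raw data standing in for a file or database extract.
RAW_CSV = """date,price
2020-10-08,100.0
2020-10-09,
2020-10-12,121.0
"""


def load(source):
    """Input: read the raw data into a DataFrame."""
    return pd.read_csv(source, parse_dates=["date"])


def process(df):
    """Processing: drop incomplete rows and add a derived column."""
    clean = df.dropna(subset=["price"]).reset_index(drop=True)
    return clean.assign(pct_change=clean["price"].pct_change())


def export(df):
    """Output: serialize the result (here to a CSV string rather than a file)."""
    return df.to_csv(index=False)


raw = load(io.StringIO(RAW_CSV))
result = process(raw)
report = export(result)
```

Unlike a spreadsheet, the intermediate data is not on screen by default: each stage returns a new object (`raw`, `result`, `report`) that has to be inspected explicitly, which is the shift in thinking described above.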

We will also touch on the difficulties of deploying Jupyter-based applications compared with Excel ones, and look at how community tools like Binder and Voilà alleviate these challenges. Basic familiarity with Jupyter Notebooks and Python programming is recommended.