Astronomy science platforms are changing the way researchers work. Enabled by Jupyter, they allow geographically distributed users to access petabytes of data and perform complex analyses with pre-installed software libraries. Using NSF’s Astro Data Lab as an example, I will highlight how Jupyter is used in cutting-edge astronomy research, and share our wishlist for Jupyter development.
In this talk, I will present example Jupyter applications within astrophysical science platforms to illustrate how Jupyter Notebooks and JupyterLab can be embedded in a workflow, how they influence the way researchers work, and how they can be used to train students and professional astronomers who may not be experienced in data science. The main goal is to share experiences and to facilitate follow-up discussion about possible future Jupyter developments that would improve current functionality. No astronomy background is needed, as I will focus on the general concept of a science platform: a suite of online tools and services that includes access to datasets, visualization and analysis software, compute resources, and storage capabilities.
In astronomy, the need for science platforms is driven not only by ever-increasing data volumes but also by the complexity of the datasets, which require highly specialized and diverse software libraries to be co-located with the data. There is therefore a marked advantage in being able to connect Jupyter Notebooks or JupyterLab to large datasets to perform analysis and data visualization efficiently. I will showcase two astronomy projects as ongoing, successful example applications. The first, the Astro Data Lab at NSF’s NOIRLab (National Optical-Infrared Astronomy Research Laboratory), is an online astronomy science platform serving large public astronomical datasets, including databases with tables ranging from 500,000 to 65 billion rows. Since opening its doors in 2017, the Data Lab has gained over 1,300 registered users from different countries, ranging from students to faculty, researchers, and educators. For the second, the talk will demonstrate how the Dark Energy Spectroscopic Instrument (DESI) team employs Jupyter as the primary way to enable its geographically distributed collaboration (450 researchers from more than 70 institutions) to access petabytes of data with pre-installed software libraries. The DESI team also uses Jupyter as a quick way to perform integration testing of software releases.
Both the Data Lab team and the DESI team take advantage of the user-friendly format of Jupyter Notebooks to create examples and tutorials that train new users and illustrate key functionalities. Both maintain their tutorials and examples on GitHub for version control and contributions (see, for example, the DESI tutorials). In the case of the Data Lab, each user account comes with the latest collection of example Notebooks. These range from novice-level “Getting Started” notebooks to more technical “How To” notebooks and complete scientific use cases. The collection also includes Educational notebooks (high-school activities and upcoming La Serena School for Data Science notebooks) and Contributed notebooks from the astronomy community. The full collection is automatically updated from our notebooks-latest GitHub repository.
To embed a Jupyter Notebook server in the science platform, we needed to enable users to query databases directly within a Notebook (via a QueryClient) and to save their output to virtual storage associated with their Data Lab accounts (via a StoreClient). I will highlight how these pieces connect to make Jupyter an integral part of a science platform, and present a list of what works well as-is and which features we would like to see developed or improved. More broadly, several other astronomy data centers are either currently using, or planning to use, science platforms with Jupyter as a core part of their workflow. Wide usage of Jupyter within astronomical software platforms would make workflows much easier to transport from one platform to another, and would continue to shape the ways in which researchers conduct their science.
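The query-then-store pattern described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the Data Lab API: the classes `SketchQueryClient` and `SketchStoreClient`, the `objects` table, and the storage path are all hypothetical, with an in-memory SQLite database standing in for the platform's databases and a dictionary standing in for per-user virtual storage.

```python
import csv
import io
import sqlite3


class SketchQueryClient:
    """Hypothetical stand-in for a platform query client (not the Data Lab API)."""

    def __init__(self, connection):
        self._conn = connection

    def query(self, sql):
        """Run a SQL query and return the result as CSV text, header row first."""
        cursor = self._conn.execute(sql)
        buffer = io.StringIO()
        writer = csv.writer(buffer, lineterminator="\n")
        writer.writerow([column[0] for column in cursor.description])
        writer.writerows(cursor.fetchall())
        return buffer.getvalue()


class SketchStoreClient:
    """Hypothetical stand-in for per-user virtual storage, kept in memory."""

    def __init__(self):
        self._files = {}

    def put(self, path, text):
        """Save text under the given storage path."""
        self._files[path] = text

    def get(self, path):
        """Return the text previously saved under the given storage path."""
        return self._files[path]

    def ls(self):
        """List all saved storage paths in sorted order."""
        return sorted(self._files)


def build_demo_database():
    """Create a tiny in-memory catalog so the sketch is self-contained."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE objects (id INTEGER, ra REAL, dec REAL, mag REAL)")
    conn.executemany(
        "INSERT INTO objects VALUES (?, ?, ?, ?)",
        [(1, 10.5, -30.25, 18.5), (2, 11.0, -29.75, 21.0), (3, 12.25, -31.5, 19.5)],
    )
    return conn


# Inside a notebook cell: query the database, then save the output to storage.
qc = SketchQueryClient(build_demo_database())
sc = SketchStoreClient()

bright_csv = qc.query("SELECT id, ra, dec, mag FROM objects WHERE mag < 20 ORDER BY id")
sc.put("results/bright_objects.csv", bright_csv)
print(sc.ls())
```

The point of the design is that the query and the saved result both live next to the data: the notebook only issues requests and holds small results, which is what makes the pattern workable for very large tables.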