Jupyter meets the Earth uses research in the geosciences to drive developments in Jupyter, aiming to: (1) facilitate discovery and use of diverse sources of data; (2) empower researchers to utilize scalable computing resources; (3) enable researchers to create custom interactive applications; and (4) better communicate results to consumers of research: scientists, policy makers, students, and the general public.
The Jupyter meets the Earth project uses domain questions in the geosciences as drivers of the software development process, following our long-standing practice in Jupyter of maintaining a close dialogue between the use cases of research and education and the design of our technology. Through this process, we aim to build tools that close the gap between interactive, scientist- and question-driven exploratory computation and the analysis of rich, heterogeneous data at scale. The full project description and a link to the proposal are available on the Jupyter Blog.
In this federally funded project (supported by the EarthCube program at the National Science Foundation), we also aim to establish this kind of science-cyberinfrastructure collaboration as a productive pattern for developing open source scientific tools, one that is more widely recognized by funding agencies and academic research institutions. Community-driven open source scientific projects have traditionally seen limited federal funding, which has mostly gone to projects that prioritize either the scientific questions alone or the infrastructure design and development alone. We believe that a dialogue between scientists and software developers, in which both parties find productive spaces to contribute, will ultimately lead to more effective and impactful tools and to more robust, reproducible scientific research.
The aims of this presentation are to: (1) introduce the Jupyter meets the Earth project, its domain use cases, and its aspirations for technical developments to the community; (2) demonstrate the impact and value of tools developed by the Jupyter community in geoscience research; (3) articulate the value of this science/technology collaboration in funded scientific projects; and (4) solicit feedback and ideas from the community, and highlight avenues for participation and opportunities for partnership.
Presentation outline:
What is Pangeo? And how has it demonstrated the value of investing in community driven open-source software to advance research in the geosciences?
Domain applications & their connections to interactive computing with Jupyter
Hydrology: Recent releases of large-sample hydrological datasets have provided unique opportunities for comparative studies, process understanding, and model development. By capitalizing on data from intensively monitored individual watersheds, which are publicly available but not organized into a common form, we intend to synthesize, organize, and disseminate a comprehensive dataset to the wider hydrological community. Here, we employ the Jupyter machinery (JupyterLab, Widgets, and Voila) to interactively transform the unorganized raw data into an accessible, quality-controlled, and gap-filled hydrological dataset. The machinery focuses on reproducibility and ease of use, to facilitate contributions from different groups that hold raw data for individual watersheds.
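As a rough illustration of what "quality controlled and gap filled" can mean for a single streamflow record, the sketch below masks physically implausible values and then fills short interior gaps by linear interpolation. The function names, the non-negativity rule, and the `max_gap` threshold are assumptions made for this example; they are not the project's actual pipeline.

```python
from typing import List, Optional

Series = List[Optional[float]]


def quality_control(values: Series, lower: float = 0.0) -> Series:
    """Mask values below `lower` as missing (hypothetical rule: flow cannot be negative)."""
    return [v if v is not None and v >= lower else None for v in values]


def fill_gaps(values: Series, max_gap: int = 2) -> Series:
    """Linearly interpolate interior runs of missing values up to `max_gap` long."""
    filled = list(values)
    n = len(filled)
    i = 0
    while i < n:
        if filled[i] is not None:
            i += 1
            continue
        start = i
        while i < n and filled[i] is None:
            i += 1
        end = i  # index of the first valid value after the gap, or n
        gap_len = end - start
        # Only fill gaps that are short and bounded by valid values on both sides.
        if start > 0 and end < n and gap_len <= max_gap:
            left, right = filled[start - 1], filled[end]
            step = (right - left) / (gap_len + 1)
            for k in range(gap_len):
                filled[start + k] = left + step * (k + 1)
    return filled


# Illustrative values only, not real observations.
raw = [2.0, -1.0, 4.0, None, None, None, 8.0]
clean = fill_gaps(quality_control(raw), max_gap=2)
```

In this example the negative reading is masked and then interpolated from its neighbours, while the three-step gap exceeds `max_gap` and is left missing. Leaving long gaps unfilled is a deliberate choice here, so that a reviewer can decide how to handle them.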
Cryosphere Science: We study ice on Earth, with a focus on Greenland and Antarctica, using the recently launched ICESat-2 satellite, which carries a LIDAR instrument that measures the return time of laser light aimed at the Earth. We aim to produce improved analysis, visualization, and statistical modeling tools that combine ICESat-2 data with other remote sensing products. This type of multimodal data analysis in the cloud is a challenging but increasingly common research pattern in multiple disciplines. This project stems from a collaboration among the team behind icepyx (a tool used to retrieve ICESat-2 data), statisticians, Jupyter developers, and glaciologists.
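The principle behind a return-time measurement can be sketched in a few lines: the one-way range is the speed of light multiplied by half the round-trip time. The example below is idealized. It assumes a beam pointing straight down and ignores atmospheric delay and geolocation, all of which real ICESat-2 processing has to handle; the function names and the altitude value are hypothetical.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0  # exact by SI definition


def range_from_round_trip(round_trip_s: float) -> float:
    """One-way distance in metres for a pulse that travels out and back."""
    return SPEED_OF_LIGHT_M_PER_S * round_trip_s / 2.0


def surface_height(sensor_altitude_m: float, round_trip_s: float) -> float:
    """Surface height in the same reference frame as the sensor altitude.

    Assumes a nadir-pointing beam and no atmospheric delay (idealization).
    """
    return sensor_altitude_m - range_from_round_trip(round_trip_s)


# Hypothetical numbers: a sensor at 500 km observing a surface 1 km high.
altitude_m = 500_000.0
round_trip_s = 2.0 * 499_000.0 / SPEED_OF_LIGHT_M_PER_S
height_m = surface_height(altitude_m, round_trip_s)
```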
Climate Science: Earth System Models (ESMs) simulate the coupled dynamics of Earth’s atmosphere, oceans, land, and ice. The Coupled Model Intercomparison Project Phase 6 (CMIP6) is expected to produce more than 10 petabytes of data from more than 30 different ESMs developed by modeling groups around the world. The expectation is that climate scientists will be able to directly compare results from one ESM with those from another. We use open source scientific Python tools such as Xarray, Dask, and Jupyter in conjunction with highly scalable cloud computing platforms to analyze and visualize the massive datasets produced by these models.
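Datasets of this size cannot be loaded into memory at once, so analyses are expressed as operations on chunks whose partial results are then combined. The standard-library sketch below shows that pattern for a mean, and uses it to compare two model outputs. It is only an illustration of the idea that tools such as Dask automate and parallelize, not a description of how Dask is implemented; the helper names and the temperature values are hypothetical.

```python
from typing import Iterator, List, Tuple


def chunks(values: List[float], size: int) -> Iterator[List[float]]:
    """Yield consecutive slices of at most `size` elements."""
    for start in range(0, len(values), size):
        yield values[start:start + size]


def partial_sum(chunk: List[float]) -> Tuple[float, int]:
    """Per-chunk work: a sum and a count, small enough to combine later."""
    return sum(chunk), len(chunk)


def chunked_mean(values: List[float], size: int) -> float:
    """Mean computed by combining per-chunk partial results."""
    if not values:
        raise ValueError("cannot take the mean of an empty series")
    total, count = 0.0, 0
    for chunk_total, chunk_count in map(partial_sum, chunks(values, size)):
        total += chunk_total
        count += chunk_count
    return total / count


# Hypothetical global-mean temperatures (in kelvin) from two models.
model_a = [287.0, 288.0, 289.0, 290.0]
model_b = [286.0, 288.0, 290.0, 292.0]
difference = chunked_mean(model_b, size=2) - chunked_mean(model_a, size=3)
```

Because each chunk is reduced independently, the per-chunk step could run on separate workers; only the small (sum, count) pairs need to be gathered in one place.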
Geophysics: To build models of the Earth's subsurface, we use numerical simulations and optimization to estimate physical properties from geophysical data. We use Dask to run large computations on HPC facilities and widgets to interactively design the computation and explore our results, and we are working on extensions to stream and visualize intermediate results during a large computation.
Jupyter technologies currently serving geoscience researchers, and future development goals under this project, including:
Using Jupyter Book to enable researchers to render Markdown and notebooks into a webpage for public outreach, and optionally to let readers contribute.
Pangeo at High-Performance Computing and Data Storage Centers. These developments are being led primarily by the team at NCAR and include: (a) Intake-ESM for cataloging and improving the accessibility of large Earth System Model datasets; (b) improvements to dataset collection search and discovery through the JupyterLab interface; and (c) dask-jobqueue for launching large Dask clusters via HPC-oriented job schedulers such as SLURM and PBS.
How to get involved
We are an interdisciplinary team of geoscientists and Jupyter developers and we welcome participation from all backgrounds! We do not assume a background in the geosciences or deep technical experience with Jupyter development.
This work was supported by the NSF EarthCube Program under awards 1928406 and 1928374.