JupyterCon 2023

Federated collaborative workflows for Jupyter
05-10, 15:30–16:00 (Europe/Paris), Gaston Berger

Cloud Storage for Synchronization and Sharing (CS3) platforms, like ownCloud or Nextcloud, have been widely deployed in the research and educational space, mostly by e-infrastructure providers, NRENs (National Research & Education Networks) and major research institutions. These services, used daily by hundreds of thousands of users (including researchers, students, scientists and engineers), remain largely disconnected, and are developed and deployed in isolation from each other.
The same can be said about Jupyter deployments, the de-facto standard for data analysis in the scientific and research communities: each institution has its own configuration, deployment strategy and, more importantly, its own customized way of giving users access to their (siloed) data and code.

The EU-funded project CS3MESH4EOSC was started to address these major technical, but also societal, challenges. Science Mesh, its main asset, was conceived to provide an interoperable platform that easily integrates and extends sync & share services, applications (like Jupyter) and software components within the full CS3 community. Such a federated service mesh provides a frictionless collaboration platform for hundreds of thousands of users, offering easy access to data across institutional and geographical boundaries.

This presentation will focus on the development of cs3api4lab, a plugin created by the project to connect Jupyter to the Science Mesh. It brings features like easy-to-configure access to CS3 service backends, sharing, and parallel access to notebooks right from within the JupyterLab interface. We will also discuss its applicability outside of the Mesh and, finally, the project's vision for collaborative scientific analysis.
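As a rough illustration of what "easy-to-configure access to CS3 service backends" can look like, a Jupyter server extension of this kind is typically pointed at an institutional CS3/Reva gateway through a configuration file. The option names below are assumptions for illustration only, not cs3api4lab's documented interface:

```python
# jupyter_server_config.py -- hypothetical sketch; the CS3Config class
# and all option names are assumed, not the plugin's documented API.

# Endpoint of the institution's CS3 (Reva) gateway.
c.CS3Config.reva_host = "gateway.example-institution.org:443"

# Credentials for authenticating against the gateway.
c.CS3Config.client_id = "einstein"
c.CS3Config.client_secret_env = "CS3_CLIENT_SECRET"  # read secret from env

# How long an issued access token remains valid, in seconds.
c.CS3Config.token_validity = 3600
```

With a configuration of this shape in place, the plugin's file browser and sharing features would operate against the institutional storage backend rather than the local filesystem.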

See also: Presentation (8.3 MB)

Diogo Castro is a full-stack software engineer, currently working in the storage group of the CERN IT department. He has been contributing to CERN's Jupyter-based Service for Web-based ANalysis (SWAN) since he joined CERN in 2017 and, more recently, has contributed to CERNBox, CERN's sync and share service, and AFS, a distributed filesystem used by CERN researchers.

Experienced System Architect and R&D Project Manager with 20+ years of experience in enterprise software design and development. Founder and leader of the Big Data Lab at Software Mind. He is now involved in the CS3MESH4EOSC project (https://cs3mesh4eosc.eu/), leading tasks on the reference cloud interoperability platform and distributed data science environments.

He developed Big Data solutions before they became mainstream. From 2005 to 2008 he was involved in developing technology for the first web-scale Semantic Web startup, garlik.com; his team started using Hadoop in February 2006, among the first companies in the world to do so.

He has participated in and led many commercial projects involving Big Data and high-volume, high-velocity solutions across various sectors. He was a Work Package leader and provided Big Data architecture in a number of EU-funded research projects.

Connect: https://www.linkedin.com/in/marcinsieprawski/