Balaji Alwar works with Berkeley Research, Teaching, and Learning (RTL) and Computing, Data Science, and Society (CDSS) to design and scale the Berkeley DataHub, a service that provides interactive computing environments to educators and students across campus using open source tools in the Jupyter ecosystem and beyond. He has a bachelor's degree in computer science and a master's degree in education technology from Harvard. Previously, he was a product lead for a research project focused on upskilling at the Harvard Kennedy School. He is passionate about using technology for public goods that offer immersive and equitable learning experiences.
Over the last five years, UC Berkeley has used a campus-wide JupyterHub to advance the learning objectives of courses across the University, including the Engineering, Data Science, Natural Science, and Social Science disciplines. UC Berkeley hosts a cloud-based JupyterHub for educational purposes, with up to 5,000 users per week and 11,000 users per semester. More than 60 courses from more than 20 departments within UC Berkeley teach using JupyterHubs. The core constituency is foundational courses in Data Science and quantitative methods, followed by a long tail of advanced courses.
The basic use cases for JupyterHubs at UC Berkeley are i) instructors demoing notebooks during their lectures, ii) Graduate Student Instructors running notebooks during their labs, and iii) students working on individual homework and assignments. However, no one-size-fits-all solution can meet the diverse instructional needs of the Berkeley community. A multi-level support team listens to the needs of instructors and engineers our hubs to meet their requirements. This poster describes several innovative use cases in which JupyterHub is extended beyond the basic use cases. In each case, specific engineering was required to adapt the base image or to adapt specific extensions to the use case.
Database Hub - Database fundamentals in Data Science Courses:
We deployed a MongoDB server and a PostgreSQL server per student for a Data Science course that teaches the fundamentals of Data Engineering. Instructors teach the fundamentals of databases by connecting their Jupyter notebooks to databases stored on each student's persistent volume. Students perform large computations in their Jupyter notebooks and store the resulting values in the databases.
In addition, the core data science methods class deployed SQLite databases on the Network File System (NFS) server to teach database fundamentals. This deployment occasionally ran into race conditions that caused errors on the NFS server.
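The notebook-to-database pattern described above can be sketched as follows. This is a minimal illustration, not the course's actual setup: it uses Python's built-in sqlite3 module, and the helper `store_results`, the table name `results`, and the file name `course.db` are all hypothetical. The database file is placed in a local temporary directory because SQLite's documentation warns that file locking is unreliable on many NFS implementations; whether that explains the errors seen in this deployment is an assumption, not something the deployment notes confirm.

```python
import sqlite3
import tempfile
from pathlib import Path


def store_results(db_path, results):
    """Insert (name, value) pairs into a table and return the stored row count."""
    # timeout: wait up to 30 seconds if another connection holds a lock
    conn = sqlite3.connect(str(db_path), timeout=30)
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "CREATE TABLE IF NOT EXISTS results (name TEXT PRIMARY KEY, value REAL)"
            )
            conn.executemany(
                "INSERT OR REPLACE INTO results (name, value) VALUES (?, ?)", results
            )
        return conn.execute("SELECT COUNT(*) FROM results").fetchone()[0]
    finally:
        conn.close()


# Hypothetical usage: values computed in a notebook cell, stored on local disk.
with tempfile.TemporaryDirectory() as tmp:
    db_file = Path(tmp) / "course.db"
    values = [("mean", 1.5), ("variance", 0.25)]
    print(store_results(db_file, values))
```

The same function works against any path, so the only change between a local-disk and an NFS-backed deployment in this sketch is the value of `db_path`.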
Shiny Hub - Teaching quantitative Social Science using Shiny Dashboard:
Shiny dashboarding functionality was deployed for a quantitative political science course to explore relationships between different variables. Students click an nbgitpuller link, which launches the Shiny hub, where they can explore visualizations through the Shiny dashboard.
Serving Shiny through JupyterHub gave R users (predominantly Social Science instructors) a fast and seamless way to build and engage with dashboards.
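A link of the kind students click can be assembled as in the sketch below. The hub address, repository URL, and branch are hypothetical placeholders, and the `shiny/...` value passed as `urlpath` is an assumption about how the Shiny app is routed on the hub. The `repo`, `branch`, and `urlpath` query parameters and the `/hub/user-redirect/git-pull` path follow nbgitpuller's documented link format.

```python
from urllib.parse import urlencode


def make_nbgitpuller_link(hub_url, repo_url, branch, urlpath):
    """Build a link that pulls a Git repository and then opens `urlpath`."""
    query = urlencode({"repo": repo_url, "branch": branch, "urlpath": urlpath})
    return f"{hub_url.rstrip('/')}/hub/user-redirect/git-pull?{query}"


# Hypothetical hub and repository, for illustration only.
link = make_nbgitpuller_link(
    hub_url="https://hub.example.edu",
    repo_url="https://github.com/example-org/example-shiny-app",
    branch="main",
    urlpath="shiny/example-shiny-app/",
)
print(link)
```

Generating links in code rather than by hand keeps the URL encoding consistent when a course distributes many dashboards.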
Biology Hub - Share complex genomic datasets with students through shared storage:
Biology instructors use common “shared read-write” directories to store large and complex genetics datasets (a few GB in size). The instructors also act as hub admins and have write access to these directories. Students, who have read access only, load datasets from the shared directories into their Jupyter notebooks. The genomics workflow also requires more compute than the base hub provides.
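The access rule described above (admins write, students read) can be expressed as a small sketch. How the Biology hub actually enforces it is not stated here, so everything below is an assumption: the function name, the volume name `shared-datasets`, and the mount path are hypothetical, and the returned dictionary merely mirrors the shape of a Kubernetes volume mount (`name`, `mountPath`, `readOnly`).

```python
def shared_volume_mount(username, admins, mount_path="/home/jovyan/shared"):
    """Return a Kubernetes-style volumeMount dict for one user's server.

    Admins get a writable mount; every other user gets a read-only mount.
    """
    return {
        "name": "shared-datasets",  # hypothetical volume name
        "mountPath": mount_path,    # hypothetical path inside the user's server
        "readOnly": username not in admins,
    }


# Hypothetical usage: one instructor-admin and one student.
admins = {"instructor1"}
print(shared_volume_mount("instructor1", admins))
print(shared_volume_mount("student42", admins))
```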
Linux Desktop Hub - Explore simulations in engineering courses by using open-source GUI tools:
JupyterHub's efficient handling of authentication, resource scaling, and shared file storage makes it a compelling platform for delivering a fuller graphical user interface, such as a Linux desktop.
The “Designing Information Devices and Systems” course in the Electrical Engineering and Computer Science department uses PyQt5, a Python binding for the cross-platform Qt GUI toolkit, for visual demonstrations on the Linux Desktop hub.
We deployed a new Linux Desktop hub that launches a desktop environment and installed the Jupyter-qgis application, which is used extensively in a Civil Engineering class to teach geospatial exploration.
We also deployed the FUND model, a Julia-based simulation, on this hub; it is used to calculate the Global Cost of Carbon as part of a course lecture.
RTC Hub - Facilitate collaboration in project-based work through the Real-Time Collaboration (RTC) pilot feature:
We experimentally deployed an RTC hub to support a course on reproducible computing. Real-Time Collaboration (RTC) was enabled for the Spring 2022 semester, during which multiple students worked together on a single Jupyter notebook file as part of their projects.
This pilot deployment ran into data corruption issues, which were traced to RTC. Based on the lessons from this deployment, we were able to provide valuable feedback to the upstream JupyterLab developer team.
Workshop Hub / High School Hub
This outward-facing hub provides a small amount of compute for demonstrating materials to users outside UC Berkeley.
Open Hub - Support new users with open access to a generic hub:
The open hub fosters innovative use: anyone at Berkeley can use it without management overhead. In many cases this has led organically to new deployments in which instructors can try out new materials and understand the teaching workflow.