Desktop GIS software in the cloud with JupyterHub: A QGreenland workshop success story

Matt Fisher
Published in Jupyter Blog
Aug 2, 2023

👋 We are Trey Stafford and Matt Fisher, co-authors of the QGreenland data package’s source code. This year, we had the pleasure of running the QGreenland Researcher Workshop, a hands-on workshop on geospatial data and open science. It was important for attendees to participate in a hands-on way while minimizing the negative impacts of installing software, requiring expensive personal computers, and troubleshooting unique computer configurations. For these reasons, we felt a JupyterHub would be a good fit for our workshop, provided it could accommodate our need to run QGIS, a desktop application.

In this blog post, we will introduce QGreenland, describe our experience using JupyterHub in the cloud for our workshop’s computing environment, and discuss challenges we overcame to enable our attendees to use QGIS in a cloud graphical desktop environment. Finally, we will highlight some workshop outcomes and discuss opportunities for enhancement based on new developments in the Jupyter ecosystem.

In our workshop, 25–30 international learners (from countries including Germany, India, France, Canada, Poland, and the United States) used QGIS in a JupyterHub’s browser-based Linux desktop environment to collaboratively test, explore, visualize, and process Earth science data. They all worked simultaneously, with the same user experience they expect from QGIS on their personal computers! Better yet, getting started was as simple as logging in. Our workshop was a success story not just in education, but also in open source and collaborative development, and we want to share what we learned.

A map depicting Greenland is displayed within QGIS software running in a cloud-based desktop environment. A visualization of September 2012 Arctic sea ice concentration is overlaid.
Note the browser tabs at the top of this screenshot; this is a full desktop-based GIS environment running QGreenland in the cloud!

The JupyterHub used by the QGreenland 2023 Researcher Workshop was generously provided by the NASA CryoCloud team, whose mission is to help researchers transition to cloud-based collaboration.

About QGreenland

QGreenland is an open-source Greenland-focused geospatial data package for QGIS, a community-owned graphical Geographic Information System (GIS) platform. Researchers and members of the public leverage QGreenland’s ready-to-use interdisciplinary datasets to do field planning, teach about glaciers, and much more.

QGreenland’s MIT-licensed source code uses community-maintained open software like GDAL and PyQGIS to automate data normalization and populate the QGIS project with important information like data provenance and the order of layers in the QGIS Layers Panel. Check out our documentation to learn more! QGreenland also has a YouTube channel with tutorials produced by CIRES Education and Outreach.

A map depicting Greenland is displayed within QGIS software running in a cloud-based desktop environment. Visualizations of bathymetric depth, Greenland ice sheet thickness, and Arctic sea routes are overlaid.
Once QGIS is installed, opening QGreenland is as easy as double-clicking the included “.qgs” file. Here, a representative view of QGreenland v3 alpha in QGIS is displayed with newly updated layers: Arctic sea routes (National Geospatial-Intelligence Agency), bathymetric depth (General Bathymetric Chart of the Oceans (GEBCO)), and ice thickness (IceBridge BedMachine Greenland v5).

Based on user research, QGreenland has enabled:

  • the public to more easily access data gathered by researchers visiting Greenland: “In Greenland, people are often asking, ‘how can we find the data the foreign scientists bring back from Greenland?’ Now we can directly utilize much of it.”
  • researchers to plan field work: “Being able to use QGreenland at our field station was critical to our research process!”
  • educators to develop interactive lessons about Greenland and climate change: “using QGreenland for presentations because it is presentation quality already.”

QGreenland’s 2023 researcher workshop

One of the QGreenland team’s most important forms of direct user interaction and support is facilitating workshops. Most recently, we hosted a 3-day (total of 9 hours) virtual workshop for researchers focused on working with geospatial data in an open science framework. All of the materials covered in the workshop were built using open-source tools and are MIT-licensed and published on GitHub.

A “personal computer” in the cloud

We decided early on that we wanted to use JupyterHub to solve the diverse problems that come with “bring your own device” workshops. We experimented with administering our own JupyterHub on Kubernetes, but the setup overhead was too high for our short workshop. CryoCloud’s JupyterHub enabled us to avoid this overhead and focus on serving our participants. Because the software that comprises CryoCloud is open-source and developed in collaboration with the communities CryoCloud serves, we could directly contribute to curating a computing environment ideal for our participants.

JupyterHub is known for providing access to Jupyter Notebooks via JupyterLab, but it turns out it can also be used to host pretty much any interactive web-based application! The jupyter-server-proxy project enables this, and there are additional packages that make running specific applications easier. For example, jupyter-rsession-proxy makes it easy to run RStudio inside JupyterHub, and jupyter-vscode-proxy allows running code-server (a fully open-source, self-hosted version of Visual Studio Code) inside a JupyterHub. Pertinent to our use case is jupyter-remote-desktop-proxy, which lets you run a complete Linux desktop environment inside your JupyterHub! This was critical for our workshop, as it allowed us to use QGIS, which is purely desktop software and not adapted for the web, from inside a web browser. Workshop participants did not need to install anything, so they could focus on the content of our workshop rather than the logistics of setting up and debugging tools on their varied machines.
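To make the proxying idea more concrete, here is a minimal sketch of what a jupyter-server-proxy server entry can look like. It is not the configuration used for our workshop: the entry name `static-files`, the helper `proxied_app`, and the proxied command (Python’s built-in HTTP server) are illustrative assumptions, chosen only because they need no extra software.

```python
# Illustrative sketch only: not the configuration used for the workshop.


def proxied_app(command, title, timeout=30):
    """Build one jupyter-server-proxy server entry (hypothetical helper).

    command: argv list; jupyter-server-proxy replaces the literal "{port}"
             with a free port it picks when the app is launched.
    title:   label shown on the JupyterLab launcher tile.
    timeout: seconds to wait for the app to start answering HTTP requests.
    """
    return {
        "command": list(command),
        "timeout": timeout,
        "launcher_entry": {"title": title},
    }


# Hypothetical entry name; the command serves the current directory over HTTP.
servers = {
    "static-files": proxied_app(
        ["python3", "-m", "http.server", "{port}"],
        title="Static file server",
    ),
}

# In a Jupyter Server config file (e.g. jupyter_server_config.py), where the
# config object `c` is provided by Jupyter, this would be registered with:
#     c.ServerProxy.servers = servers
```

With an entry like this registered, the application is started on demand and served under a path named after the entry (here, `/static-files/`) relative to the user’s server URL. Packages such as jupyter-rsession-proxy ship an equivalent entry for you, so installing the package is usually all that is needed.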

The CryoCloud JupyterHub enabled each of our workshop participants to provision their own compute environment (JupyterLab + Linux Desktop) with all of our workshop’s dependencies pre-installed. It also put everyone on an equal footing: someone accessing the workshop on a 10-year-old laptop would get the same computing resources as someone on a brand new MacBook Pro.

Challenges scaling QGreenland

The CryoCloud JupyterHub already had jupyter-remote-desktop-proxy and QGIS installed, so we could validate this approach to our workshop quickly. However, to use QGreenland at this scale, we needed to solve a couple of usability problems. The first issue was a user experience problem: the operating system did not have appropriate file type associations for QGIS, so files like the QGreenland project file would not open in QGIS when double-clicked in the desktop file browser. We quickly discovered a solution and integrated it with a simple pull request to the Docker image we were using.
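We don’t reproduce the actual pull request here, but as a rough sketch of the kind of fix involved: on freedesktop-style Linux desktops, a per-user default application can be declared in a `mimeapps.list` file. The MIME type `application/x-qgis-project` and the desktop entry name `org.qgis.qgis.desktop` below are assumptions about what a given QGIS installation registers, and the helper `set_default_app` is hypothetical; check your own system before relying on them.

```python
import configparser
from pathlib import Path


def set_default_app(mimeapps_path, mime_type, desktop_file):
    """Record `desktop_file` as the default handler for `mime_type`.

    Existing entries in the file are preserved; only the one key is updated.
    """
    mimeapps_path = Path(mimeapps_path)
    parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
    parser.optionxform = str  # keep keys exactly as written (no lowercasing)
    parser.read(mimeapps_path)  # a missing file is silently skipped

    section = "Default Applications"
    if not parser.has_section(section):
        parser.add_section(section)
    parser.set(section, mime_type, desktop_file)

    mimeapps_path.parent.mkdir(parents=True, exist_ok=True)
    with mimeapps_path.open("w") as f:
        # Desktop tools expect "key=value" with no spaces around "=".
        parser.write(f, space_around_delimiters=False)


# Example call (assumed names, typical per-user location):
#     set_default_app(
#         Path.home() / ".config" / "mimeapps.list",
#         "application/x-qgis-project",
#         "org.qgis.qgis.desktop",
#     )
```

This only helps if the desktop already recognizes “.qgs” files as that MIME type; in a shared Docker image, a system-wide equivalent would usually be preferable to a per-user file.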

The second problem was a performance problem: QGIS would take several minutes to open QGreenland from the hub’s shared storage drive. After some investigation, it turned out this was because we were loading multiple gigabytes of data from an NFS share! While a long-term solution might involve getting QGIS to load data directly from cloud object storage (like S3), we instead decided to go a different route: provisioning each user with a small, fast, temporary Elastic Block Store disk. At the start of the workshop, we provided all users with a small script that would copy the dataset from NFS to this faster disk once. This drastically reduced load times, from about 5 minutes to under 3 seconds! You can follow our debugging process on this issue, and find the JupyterHub config used to provision these disks here.
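The copy step itself is simple. The following is a minimal sketch of a copy-once script in Python, not the script we actually distributed; the function name `stage_dataset`, the marker file name, and the example paths are all hypothetical.

```python
import shutil
from pathlib import Path


def stage_dataset(source, destination):
    """Copy `source` to `destination` once; return True if a copy happened.

    A marker file written after a successful copy makes repeat runs cheap
    and lets an interrupted copy be detected and redone.
    """
    source, destination = Path(source), Path(destination)
    marker = destination / ".copy-complete"  # hypothetical marker name
    if marker.exists():
        return False  # already staged on the fast disk

    # A destination without the marker is a partial copy; start over.
    if destination.exists():
        shutil.rmtree(destination)

    shutil.copytree(source, destination)  # also creates parent directories
    marker.touch()
    return True


# Example call (hypothetical paths; `destination` should be a subdirectory
# of the fast disk's mount point, not the mount point itself):
#     stage_dataset("/home/jovyan/shared/QGreenland", "/tmp/fast/QGreenland")
```

After staging, users open the project file from the fast disk instead of the NFS share, so QGIS reads every layer from local block storage.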

By overcoming these challenges, we created a smooth, intuitive, and performant computing experience for all of our participants, most of whom had never been exposed to this sort of collaborative computing environment.

Outcomes

The workshop participants engaged in small group work to complete various exercises, group discussions, and data scenarios. Each group produced Jupyter Notebooks and GitHub Discussions posts as deliverables. We created an outcomes webpage to summarize our participants’ accomplishments. One highlight was participants’ insightful commentary on FAIR & CARE principles.

Based on these outcomes, we consider our workshop a success. While we put in a significant amount of time creating our materials, CryoCloud’s cloud costs and our time investment in preparing computing resources were relatively small. For approximately 25 people, our cloud costs break down to roughly $1/person/day!

Conclusion

The CryoCloud JupyterHub met our workshop needs and provided a delightful experience for administrators and participants alike, and we are excited for what’s next. JupyterLab 4 and jupyter_collaboration v1.0.0, a real-time collaboration extension, were just announced, and the CryoCloud team is currently working to integrate these new releases into their hub. Real-time collaboration will enable exciting cloud use cases, like small groups working together on the same notebook without a screen share, or organizers providing technical support in a live notebook. We anticipate running this workshop again. We are excited to use JupyterHub again and look forward to experimenting with these new features!

Acknowledgements

Reviewers

In alphabetical order, thanks to Twila Moon, Yuvi Panda, Tasha Snow, and Alyse Thurber for their time contributing to this post!

CryoCloud

Snow, Tasha, Millstein, Joanna, Scheick, Jessica, Sauthoff, Wilson, Leong, Wei Ji, Colliander, James, Pérez, Fernando, Munroe, James, Felikson, Denis, Sutterley, Tyler, & Siegfried, Matthew. (2023). CryoCloud JupyterBook (2023.01.26). Zenodo. https://doi.org/10.5281/zenodo.7576602

2i2c

2i2c is a non-profit organization that runs open-source infrastructure for collaborative computing, and maintains the CryoCloud JupyterHub used in this workshop. You can see the complete configuration of this JupyterHub in this public repository.
