How we made Jupyter Notebooks collaborative with Yjs

Published in

Jupyter Blog

9 min read · Jun 4, 2021

Collaborative editing — à la Google Docs — is a feature that you still rarely find in applications. One of the few good things that came out of this pandemic is that more people seem to care about making their applications fit for remote collaboration. Of course, they always cared about real-time collaboration. It’s just a very hard feature to add to your application. It took the Jupyter project several years to land this feature. Finally, they ended up with a solution that is based on the Yjs Framework which I authored. This article gives an overview of all the work that was put into Jupyter Notebooks and finally describes how we want to make even more components collaborative.

Jupyter Notebooks started in 2011 (back then as IPython Notebook) as an effort to make data science reproducible and more accessible by visualizing data dynamically in a notebook. Basically, it allows you to write markdown and code directly in a document. The code can be executed interactively from the notebook, and the result is shown immediately below the code. This is great for data scientists who want to share their research with others. But it’s also a great tool for learning programming because it doesn’t require setting up a programming environment.

Right from the beginning, collaborative editing was on the agenda for Jupyter Notebooks. In 2012, core Jupyter contributor and creator/lead of JupyterHub, @minrk, wrote in the GitHub issue tracker:

[..] This will finally make decent live collaboration feasible, which is our single most-requested and highest-priority new feature. (source)

Back then, everyone tried to replicate Google Docs’ collaborative editing functionality. Google Docs was released in 2006 and was the first web application that supported collaborative editing on rich text. In many ways, Google Docs was ahead of its time. It would take years for others to reproduce this functionality. Even today, collaborative editing is far from ubiquitous, even though the underlying technology has been available since the ’80s.

No wonder that the first collaborative Jupyter Notebook implementation, Colaboratory (or Colab), was created by Google engineers. They rewrote the UI for Jupyter Notebooks and gave it a collaborative notebook model via Google’s Realtime API, which was deprecated in 2017. This history underscores how challenging it is to build real-time collaboration into applications: when the Google Realtime API was deprecated, Colab lost its real-time collaboration capabilities, a gap that continues to this day.

In 2013, William Stein launched CoCalc, a Jupyter notebook service with collaborative editing support right from the beginning. Like Colaboratory, CoCalc wrote a new UI for Jupyter Notebooks while reusing other parts of the Jupyter architecture. They made different choices and implemented a custom solution for conflict resolution. I highly recommend watching the talk below, in which William Stein shares his experience with the state-of-the-art solutions for shared editing back then.

Exploration of different solutions for collaborative editing by CoCalc.

Still, the open-source JupyterLab project didn’t include collaborative editing. What followed was a series of discussions about the best shared-editing solution to integrate into JupyterLab to make collaborative editing available to the users of the open-source project. In 2017, Brian Granger, Chris Colbert, and Ian Rose shared their work that integrated the Google Realtime API into the existing JupyterLab project, and showed an awesome demo of what collaborative editing could become.

During my research, I found several PRs by Ian Rose that separated the view (how the Jupyter editor is rendered) from the model (how the data is represented). I haven’t talked to him, but I assume what he found is that it is helpful to have observable data structures that can be synced using some framework for conflict resolution. In this case, he just happened to use the Google Realtime API for synchronization. His work included the abstract factory IModelDB for creating observable data structures that are used to this day. In theory, one just needs to implement the IModelDB interface with observable data structures that synchronize automatically through some real-time API.
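
To make the idea of observable data structures behind a factory more concrete, here is a minimal Python sketch. It is not the IModelDB interface (which lives in JupyterLab's TypeScript codebase); the names ObservableList and InMemoryModelFactory are hypothetical and only illustrate how a view can react to model changes without knowing where they came from.

```python
from typing import Any, Callable, Dict, List

Change = Dict[str, Any]


class ObservableList:
    """A list that notifies registered listeners after every change."""

    def __init__(self) -> None:
        self._items: List[Any] = []
        self._listeners: List[Callable[[Change], None]] = []

    def observe(self, listener: Callable[[Change], None]) -> None:
        self._listeners.append(listener)

    def insert(self, index: int, value: Any) -> None:
        self._items.insert(index, value)
        self._emit({"type": "insert", "index": index, "value": value})

    def remove(self, index: int) -> None:
        value = self._items.pop(index)
        self._emit({"type": "remove", "index": index, "value": value})

    def to_list(self) -> List[Any]:
        return list(self._items)

    def _emit(self, change: Change) -> None:
        for listener in list(self._listeners):
            listener(change)


class InMemoryModelFactory:
    """Hypothetical stand-in for an abstract factory of observable types.

    A synchronizing implementation could return lists that also forward
    every change to remote peers; the view code would stay the same.
    """

    def create_list(self) -> ObservableList:
        return ObservableList()


# The "view" only listens to changes on the model.
rendered: List[str] = []
cells = InMemoryModelFactory().create_list()
cells.observe(lambda change: rendered.append(f"{change['type']}@{change['index']}"))

cells.insert(0, "print('hello')")
cells.insert(1, "# Title")
cells.remove(0)
```

Because the view depends only on the factory and the change events, swapping the in-memory factory for one that synchronizes over the network would not require touching the rendering code.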

But the provided solution was still based on a proprietary API that requires you to hand over your data to Google services. So the Jupyter community was looking into implementing their own solution for conflict resolution. Different people started to explore Conflict-free Replicated Data Types (“CRDTs”) for automatic conflict resolution on their observable data structures. This technology has become very popular in recent years as a solution to synchronize data that can be manipulated by many peers at the same time. If you are interested in the topic, I recommend reading some introductory material on https://crdt.tech/.
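
As a rough illustration of what "automatic conflict resolution" means, the following Python sketch implements one of the simplest CRDTs, a state-based last-writer-wins map. It is deliberately much simpler than the sequence CRDTs used by Lumino or Yjs and is not their algorithm; the class name LWWMap and the replica names are made up for this example.

```python
from typing import Any, Dict, Optional, Tuple

Stamp = Tuple[int, str]  # (logical clock, replica id); the id breaks ties


class LWWMap:
    """State-based last-writer-wins map, a very simple CRDT."""

    def __init__(self, replica_id: str) -> None:
        self.replica_id = replica_id
        self.clock = 0
        self.entries: Dict[str, Tuple[Stamp, Any]] = {}

    def set(self, key: str, value: Any) -> None:
        # Every local write gets a stamp that is unique across replicas.
        self.clock += 1
        self.entries[key] = ((self.clock, self.replica_id), value)

    def get(self, key: str) -> Optional[Any]:
        entry = self.entries.get(key)
        return entry[1] if entry is not None else None

    def merge(self, other: "LWWMap") -> None:
        # Keep the entry with the larger stamp for each key. The rule is
        # commutative, associative, and idempotent, so replicas converge
        # regardless of the order in which they exchange state.
        for key, (stamp, value) in other.entries.items():
            if key not in self.entries or self.entries[key][0] < stamp:
                self.entries[key] = (stamp, value)
        self.clock = max(self.clock, other.clock)

    def value(self) -> Dict[str, Any]:
        return {key: value for key, (_, value) in self.entries.items()}


# Two peers edit the same key concurrently, then exchange their state.
alice = LWWMap("alice")
bob = LWWMap("bob")
alice.set("title", "Draft")
bob.set("title", "Analysis")
bob.set("kernel", "python3")

alice.merge(bob)
bob.merge(alice)
```

Both replicas end up with the same content without any central authority deciding who wins; the deterministic tie-break on the replica id does that job.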

Lumino (formerly PhosphorJS) is the JS toolkit that underlies the JupyterLab IDE. It provides a rich set of widgets, layouts, events, and data structures, as well as the plugin system at the foundation of the JupyterLab extension system. Specifically for our use case, it provides observable data structures that are used as a model for all Jupyter packages. In 2017, Chris Colbert started the ambitious endeavor to build high-performance CRDT data structures that can be used as an observable data model. In theory, we could have used that to make any application that is based on Jupyter data structures collaborative. Although the Lumino CRDT is little known, to this day it remains the second-fastest CRDT implementation that works on the web.

jupyterlab/lumino

Lumino is a library for building interactive web applications - jupyterlab/lumino

github.com

In 2019, Vidar Tonaas Fauske, Ian Rose, and Saul Shanabrook started work to integrate the Lumino CRDT into JupyterLab. Their work lived for a time in JupyterLab#6871 and was later moved to a separate repository, JupyterLab/rtc.

This is basically where I come in. While the Lumino CRDT is pretty awesome, in 2020 Brian Granger created a Lumino CRDT performance benchmark that revealed critical performance and algorithm issues (such as the so-called interleaving anomaly, where text inserted concurrently at the same position by different users ends up mixed together). In the process, Brian discovered my CRDT implementation Yjs and the two of us began to discuss CRDT implementations and Yjs in particular.

To add collaborative editing functionality rivaling Google Docs, we need a bunch of features aside from automatic conflict resolution. For example, we expect that we never revert changes from other users when we hit the undo button. So we need a selective undo-manager that somehow ignores changes from remote users. Something like this is really hard to implement correctly on top of a CRDT.
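
The core idea of a selective undo manager can be sketched in a few lines of Python. This is a toy model under strong assumptions (append-only items with unique ids, inserts only), not how Yjs implements undo; the Document class and the origin parameter are hypothetical names chosen for the illustration.

```python
from typing import List, Tuple


class Document:
    """Toy shared document whose undo only reverts local changes."""

    def __init__(self) -> None:
        self.items: List[Tuple[str, str]] = []  # (unique item id, text)
        self._undo_stack: List[str] = []  # ids of items inserted locally

    def insert(self, item_id: str, text: str, origin: str = "local") -> None:
        self.items.append((item_id, text))
        # Only changes made by this user are tracked for undo;
        # changes arriving from remote peers are applied but ignored here.
        if origin == "local":
            self._undo_stack.append(item_id)

    def undo(self) -> bool:
        """Revert the most recent local insert; return False if none is left."""
        if not self._undo_stack:
            return False
        target = self._undo_stack.pop()
        self.items = [item for item in self.items if item[0] != target]
        return True

    def text(self) -> List[str]:
        return [text for _, text in self.items]


doc = Document()
doc.insert("a1", "import numpy")
doc.insert("b1", "import pandas", origin="remote")  # another user's edit
doc.insert("a2", "x = 1")

doc.undo()  # reverts "x = 1"
doc.undo()  # reverts "import numpy" and skips the remote insert
nothing_left = doc.undo()  # no local changes remain
```

A real implementation also has to deal with deletions, with redo, and with local changes that overlap remote ones, which is where most of the difficulty lies.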

Brian eventually asked me to work with QuantStack to bring collaborative editing to JupyterLab. Yjs has ready-to-use solutions for most problems related to building collaborative applications and is the only CRDT implementation that beats the Lumino CRDT in performance.

yjs/yjs

A CRDT framework with a powerful abstraction of shared data Yjs is a CRDT implementation that exposes its internal data…

github.com

When I heard that JupyterLab had been pushing for collaborative editing for eight years, I was determined to produce results as fast as possible. I appreciate all the work that came before me because the codebase was already clearly separating the view from the model. My work was just to exchange the existing observable data structures with Yjs’ shared types (which is a fairly similar concept). After one month of work, I was able to produce the first prototype.

[WIP] Collaborative editing using Yjs by dmonad · Pull Request #9785 · jupyterlab/jupyterlab

This PR implements collaborative editing in JupyterLab using the Yjs shared editing framework. Yjs is an open-source…

github.com

But there was a problem. Another group, led by Eric Charles, also acquired funding to work on collaborative editing and they chose another approach. While I simply replaced the existing observable data structures, they were trying to reuse the existing data structures. I wanted to make full use of Yjs’ features and didn’t want to build extra abstraction layers just to be able to switch to another CRDT implementation. For some time, it seemed we could not reconcile our approaches.

After many discussions with Eric, we finally came up with a compromise that I’m now really excited about. Yjs and ModelDB only provide raw data structures to build collaborative applications. Our plan was to build a notebook model with an easy-to-use API to manipulate, observe, and synchronize changes on the notebook. This would make it possible for other applications to keep compatibility with JupyterLab without forking the whole JupyterLab repository. My hope is that other notebook-related products like CoCalc or VSCode will eventually use this collaborative model to provide cross-compatibility with other Jupyter services. Everything collaborative, of course.

Since February Eric Charles, Carlos Herrero, Jeremy Tuloup, and I have been working on designing and integrating this collaborative model into JupyterLab. Others can use the @jupyterlab/shared-models package from the npm registry to build their own interfaces for Jupyter Notebooks using the same shared editing technology. Our changes have finally been merged into JupyterLab and are already available in the alpha releases of JupyterLab v3.1.0. Simply start JupyterLab with the --collaborative flag to enable collaborative editing.

Making the separation between model and shared data structure has been quite a revelation for me. Yjs’ shared types are very powerful and allow you to make any kind of application collaborative. But shared models that define an application-specific API make it easier for developers to manipulate the data without understanding how the data is represented in the CRDT. This is particularly relevant because CRDT implementations are almost always schemaless (Cambria being the exception). A well-maintained model could ensure that the model is compatible with previous versions. In the future, I want to define more shared models for things that are not trivial to represent in Yjs like calendars, contacts, drawings, and graphs.
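
The difference between a raw shared data structure and a shared model can be sketched as follows. This is a simplified Python illustration, not the API of the @jupyterlab/shared-models package: the SharedNotebook class, its methods, and the event names are hypothetical, and a plain dict stands in for a schemaless shared type.

```python
from typing import Any, Callable, Dict, List

CellEvent = Dict[str, Any]


class SharedNotebook:
    """Hypothetical application-specific model over a raw shared structure."""

    def __init__(self, raw: Dict[str, Any]) -> None:
        # `raw` stands in for a schemaless shared type. Callers never
        # touch it directly, so its layout can evolve behind this API.
        self._raw = raw
        self._raw.setdefault("cells", [])
        self._observers: List[Callable[[CellEvent], None]] = []

    def observe(self, callback: Callable[[CellEvent], None]) -> None:
        self._observers.append(callback)

    def add_cell(self, cell_type: str, source: str = "") -> int:
        # The model enforces a schema that the raw structure cannot.
        if cell_type not in ("code", "markdown"):
            raise ValueError(f"unsupported cell type: {cell_type}")
        self._raw["cells"].append({"cell_type": cell_type, "source": source})
        index = len(self._raw["cells"]) - 1
        self._notify({"event": "cell-added", "index": index})
        return index

    def set_source(self, index: int, source: str) -> None:
        self._raw["cells"][index]["source"] = source
        self._notify({"event": "source-changed", "index": index})

    def sources(self) -> List[str]:
        return [cell["source"] for cell in self._raw["cells"]]

    def _notify(self, event: CellEvent) -> None:
        for callback in list(self._observers):
            callback(event)


raw_state: Dict[str, Any] = {}
notebook = SharedNotebook(raw_state)
events: List[str] = []
notebook.observe(lambda event: events.append(event["event"]))

notebook.add_cell("markdown", "# Results")
code_index = notebook.add_cell("code")
notebook.set_source(code_index, "print(42)")
```

Application code only sees add_cell, set_source, and observe. How cells are laid out in the underlying structure stays an implementation detail of the model, which is what makes it possible to keep compatibility across versions.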

Next steps

I’m currently working with Bartosz Sypytkowski on Yrs, a Rust implementation of Yjs. The Rust implementation will be the baseline for all other ports of the Yjs CRDT to other languages. Thanks to Pierre-Olivier Simonard and PyO3, we already have the template to create language bindings from the Yrs CRDT to a Python package, “y-py”. In the coming months, we will implement a Python CRDT that is fully compatible with the web-based CRDT that is used in JupyterLab. This will allow backends and frontends to efficiently exchange data that can be manipulated simultaneously.

yjs/y-crdt

Yjs ports to other programming languages (WIP). Yrs "wires" is a Rust port of the Yjs framework. The Ywasm project…

github.com

Currently, we still send HTTP PUT requests to save a document to the file system via the Jupyter Server. Once we have the Yjs CRDT working in Python, we will create a Python implementation of the shared notebook model, which will allow the Jupyter Server to directly access the collaborative state and synchronize it with the file system. Moreover, this will finally resolve a long-standing issue by allowing the kernel to write its output directly to the notebook without first connecting to a client. Why is that important? It will enable you to start a notebook run, close the browser, and let your computations continue overnight. The kernel will compute the output in the background and write it to the shared model, which is then saved to the file system by the Jupyter Server.
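
The planned flow (kernel writes to the shared model, server persists it, no browser involved) could look roughly like the Python sketch below. Everything here is an assumption made for illustration: SharedNotebookState, attach_file_saver, and fake_kernel_execute are hypothetical names, the fake kernel does not execute anything, and the saved JSON is not the real notebook file format.

```python
import json
import os
import tempfile
from typing import Any, Callable, Dict, List


class SharedNotebookState:
    """Toy stand-in for a server-side shared notebook model."""

    def __init__(self) -> None:
        self.cells: List[Dict[str, Any]] = []
        self._observers: List[Callable[[], None]] = []

    def observe(self, callback: Callable[[], None]) -> None:
        self._observers.append(callback)

    def add_code_cell(self, source: str) -> int:
        self.cells.append({"source": source, "outputs": []})
        self._changed()
        return len(self.cells) - 1

    def append_output(self, index: int, text: str) -> None:
        self.cells[index]["outputs"].append(text)
        self._changed()

    def _changed(self) -> None:
        for callback in list(self._observers):
            callback()


def attach_file_saver(state: SharedNotebookState, path: str) -> None:
    """Server side: write the state to disk whenever it changes."""

    def save() -> None:
        with open(path, "w", encoding="utf-8") as handle:
            json.dump({"cells": state.cells}, handle)

    state.observe(save)


def fake_kernel_execute(state: SharedNotebookState, index: int) -> None:
    """Kernel side: write a placeholder result straight into the model."""
    source = state.cells[index]["source"]
    state.append_output(index, f"executed: {source}")


state = SharedNotebookState()
path = os.path.join(tempfile.mkdtemp(), "notebook.json")
attach_file_saver(state, path)

cell_index = state.add_code_cell("1 + 1")
fake_kernel_execute(state, cell_index)  # no browser client is involved

with open(path, encoding="utf-8") as handle:
    saved = json.load(handle)
```

The point of the sketch is the direction of data flow: the output reaches the file system through the shared model alone, so a client can disconnect and later reconnect to the current state.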

We want to provide more than a collaborative text editing experience. Yjs provides the data structures to make any kind of application collaborative. Users of Yjs use it to build collaborative drawing and diagramming solutions. Relm.us, for example, uses the collaborative data structures that Yjs provides for modeling a 3D world that thousands of users can visit to work together. We are actively exploring how we can make Jupyter widgets collaborative using the same technology.

Carlos Herrero is working on a drawio widget for JupyterLab that automatically synchronizes using a custom shared model that he developed. We want to produce documentation on how you can create custom collaborative widgets for JupyterLab. Jupyter widgets will be able to leverage the rich ecosystem around the Yjs CRDT. Do you want, for example, to add a WYSIWYG rich-text editor widget? Sure, just add the awesome TipTap editor to your widget, as it already uses Yjs as the default shared editing technology.

Jeremy Tuloup is working on a WASM-powered JupyterLab distribution that I’m particularly excited about. JupyterLite runs entirely in the browser using only static assets. The code cells are executed using a WebAssembly-based Python runtime. This is already awesome mad science. Now he has integrated our collaborative editing approach, using WebRTC to synchronize peers without the need to set up a central server for conflict resolution. This gives you an offline-ready, collaborative editing experience without setting up any server. Pretty cool!

I hope this got you excited about the future of Jupyter. For sure, I am.

Lastly, I hope we can appreciate the amazing open-source work that so many people put into this project. I’m looking forward to a future where people take collaborative editing for granted. It has been a rough road and we all needed to learn our lessons to arrive at a solution that works.

Funding acknowledgement: My work on this effort at QuantStack has been funded by Schmidt Futures and the Alfred P. Sloan Foundation through grants to Cal Poly San Luis Obispo.

Written by Kevin Jahns
