eLife Sprint: Integrating Stencila and Binder

Daniel Nüst
Published in Jupyter Blog
Dec 10, 2018

This article reports on a project, integrating Stencila and Binder, which started at the eLife Innovation Sprint 2018 (#eLifeSprint). A longer version has been cross-posted on multiple blogs (eLife Labs, Stencila, o2r).

eLife, an open science journal published by the non-profit organisation eLife Sciences Publications from the UK, hosted the first eLife Innovation Sprint 2018 as part of their Innovation Initiative in Cambridge, UK: “[..] a two-day gathering of 62 researchers, designers, developers, technologists, science communicators and more, with the goal of developing prototypes of innovations that bring cutting-edge technology to open research communication.” One of the 13 projects at the excellently organised event was an integration of Binder and Stencila.

This article reports on the project’s results and changes made to Binder-related tools. Today, Binder has first-class Stencila support. Read the full story in the eLife Labs blog post, or try opening Stencila documents from any online code repository on mybinder.org with a single click:

Click the Binder badge to open a Stencila document on mybinder.org
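A badge like the one above is just a link to a launch URL. As a minimal sketch, such a URL for a GitHub repository could be assembled as follows. The helper name `binder_launch_url` is hypothetical, and the `urlpath` value `/stencila/` used in the usage note is an assumption about where the Stencila interface is served, not something stated in this article; check the example repositories for the exact badge link.

```python
from urllib.parse import quote, urlencode


def binder_launch_url(org, repo, ref="master", urlpath=None):
    """Build a mybinder.org launch URL for a GitHub repository.

    `urlpath` optionally names the path to open once the session has
    started; which value opens Stencila is an assumption of this sketch.
    """
    url = "https://mybinder.org/v2/gh/{}/{}/{}".format(
        quote(org), quote(repo), quote(ref, safe="")
    )
    if urlpath:
        # urlencode percent-encodes slashes, as a query string requires.
        url += "?" + urlencode({"urlpath": urlpath})
    return url
```

Calling `binder_launch_url("binder-examples", "stencila-py", urlpath="/stencila/")` would produce a link that builds the repository and, if the assumed path is right, opens the Stencila interface directly.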

This project is a collaboration between Min from the Simula Research Laboratory, Norway, a core developer of Binder and related tools, Daniel from the o2r project at the Institute for Geoinformatics, Germany, and Nokome, initiator and developer of Stencila. The final changes were made with the help of Tim. Thanks! The project was also part of the Mozilla Global Sprint 2018, see mozilla/global-sprint#317.

The building blocks and a challenge

Stencila Desktop is an office suite for reproducible research documents. It allows scientists to use languages like R and Python within familiar and intuitive word processor and spreadsheet user interfaces, lowering the barriers to reproducible research for those with little or no software development skills. Binder (a part of Project Jupyter) makes it simple to generate reproducible computing environments from code repositories (e.g. on GitHub or GitLab, see binder examples), and mybinder.org is the most prominent example. Binder uses repo2docker for generating Dockerfiles (human- and machine-readable recipes for setting up a computational environment, used by the popular Docker container software) and building Docker images from software projects. While containers have become a commodity for developers, researchers still struggle to grasp and control the complexity of computational environments. This is where the two building blocks join: running Stencila as part of a Binder helps researchers to communicate their work openly, to collaborate effectively with other scientists, and to ensure high quality and transparency of their workflow and findings. Min and Daniel formulated their goal in the sprint project form: “[..] to connect them so that users can edit reproducible documents (DAR files) as part of a Binder project”.

Connecting Stencila and Jupyter: nbstencilaproxy

Stencila has “execution contexts”, an equivalent to Jupyter’s “kernels”. The contexts use code dependency analysis and return execution results as data values to enable a reactive, functional execution model. To open Stencila documents on Binder, these execution contexts must be installed and configured in the environment created by repo2docker. This is achieved with a new software project initiated at the sprint: nbstencilaproxy, a Jupyter notebook server extension and proxy for Stencila.
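To illustrate what a reactive execution model based on code dependency analysis means, here is a minimal sketch in Python. It is not Stencila's implementation: the names `analyse` and `ReactiveDocument` are hypothetical, and the analysis is deliberately naive (it only looks at variable names read and assigned at the top level of a cell).

```python
import ast


def analyse(source):
    """Return (inputs, outputs): names a cell reads and names it assigns."""
    reads, writes = set(), set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Store):
                writes.add(node.id)
            else:
                reads.add(node.id)
    return reads - writes, writes


class ReactiveDocument:
    """Toy document: changing a cell re-runs every cell that depends on it."""

    def __init__(self):
        self.cells = {}   # cell id -> source code
        self.values = {}  # namespace shared by all cells

    def set_cell(self, cell_id, source, max_runs=100):
        """Store a cell, run it, then run dependents; return the run order."""
        self.cells[cell_id] = source
        queue, executed = [cell_id], []
        while queue:
            if len(executed) >= max_runs:
                # Guard against circular dependencies between cells.
                raise RuntimeError("too many re-executions, possible cycle")
            current = queue.pop(0)
            exec(self.cells[current], {}, self.values)
            executed.append(current)
            _, outputs = analyse(self.cells[current])
            for other, other_source in self.cells.items():
                inputs, _ = analyse(other_source)
                # A cell depends on `current` if it reads a name written there.
                if other != current and inputs & outputs and other not in queue:
                    queue.append(other)
        return executed
```

With cells `a` (`x = 2`) and `b` (`y = x * 10`), changing `a` to `x = 5` re-runs `b` automatically, so `y` is updated without the user touching `b`. A real execution context would additionally handle functions, imports, and errors, and would return the results as data values to the user interface.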

The project consists of a Python module with the Jupyter notebook server and “non-server” extensions of the same name, and a bundled JavaScript module (also of the same name). The Python module allows proper versioned installation, dependency management, and installation from an established software repository. It takes care of the plumbing between the user interface and the services in the background, so that the Binder is viewable over one port in the browser, while the different background components run on their own ports. The “non-server” extension adds a “Stencila session” menu entry and conveniently lives in the same directory structure as the server extension. The JavaScript module manages the required JavaScript dependencies and provides a well-defined structure for the code files. It serves the Dar document and provides access to the Stencila host.
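The “plumbing” idea, one public port in front of several background services, can be sketched as a simple path-based routing table. This is an illustration only, not the code of nbstencilaproxy: the path prefixes, the port numbers, and the function name `resolve_backend` are all hypothetical.

```python
# Hypothetical routing table: public path prefix -> local port of a
# background service. Neither the prefixes nor the ports are taken
# from nbstencilaproxy.
ROUTES = {
    "/stencila/": 4000,       # assumed: user interface and Dar document server
    "/stencila-host/": 2000,  # assumed: host giving access to execution contexts
}


def resolve_backend(path, routes=ROUTES):
    """Map a public request path to the URL of a local service, or None."""
    # Try the longest prefix first so that more specific routes win.
    for prefix in sorted(routes, key=len, reverse=True):
        if path.startswith(prefix):
            rest = path[len(prefix):]
            return "http://127.0.0.1:{}/{}".format(routes[prefix], rest)
    return None
```

A proxy built on this idea forwards each browser request to the resolved URL and relays the response, so the user only ever talks to the single port exposed by the notebook server. Requests that match no prefix (here returning `None`) are left to the notebook server itself.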

repo2docker was extended with automatic detection of Dar documents, including their languages and execution contexts. As with most Binder repositories, no configuration is needed for the most common use cases: users can open a Dar document on Binder and trust the environment to provide all required software. Daniel created a few example repositories to provide a starting point for users. The Binder team generously welcomed the changes to mybinder.org and the examples to the binder examples organisation on GitHub:

https://github.com/binder-examples/stencila-py contains Python code cells, using both the Jupyter and plain Python execution contexts:

Click the Binder badge to open a Stencila document with Python code on mybinder.org

https://github.com/binder-examples/stencila-r contains R code cells and two plots:

Click the Binder badge to open a Stencila document with R code on mybinder.org
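The automatic detection of Dar documents mentioned above can be pictured with a short sketch. It is not repo2docker's actual code and rests on several assumptions: that a Dar document is a folder containing a `manifest.xml` whose root element is `dar`, that the manifest lists its documents in `document` elements with a `path` attribute, and that code cells carry a `language` attribute. The function name `find_dar_folders` is hypothetical.

```python
import os
import xml.etree.ElementTree as ET


def find_dar_folders(repo_dir):
    """Return {folder: sorted list of cell languages} for Dar-like folders.

    Assumptions of this sketch: a folder is a Dar document if it holds a
    manifest.xml with root element <dar>; languages are read from any
    `language` attribute in the documents the manifest points to.
    """
    found = {}
    for root, _dirs, files in os.walk(repo_dir):
        if "manifest.xml" not in files:
            continue
        try:
            manifest = ET.parse(os.path.join(root, "manifest.xml")).getroot()
        except ET.ParseError:
            continue  # not valid XML, so not a Dar manifest
        if manifest.tag != "dar":
            continue
        languages = set()
        for doc in manifest.iter("document"):
            doc_path = os.path.join(root, doc.get("path", ""))
            if not os.path.isfile(doc_path):
                continue
            try:
                elements = ET.parse(doc_path).iter()
            except ET.ParseError:
                continue
            for element in elements:
                if "language" in element.attrib:
                    languages.add(element.attrib["language"])
        found[root] = sorted(languages)
    return found
```

From a result such as a folder mapped to `["py", "r"]`, a build tool could decide which execution contexts (for example for Python and R) to install into the image, which is what lets users skip manual configuration.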

One of the cool features of Stencila is its reactive cells, as demonstrated in a tweet following the feature release:

Binder + Stencila is a demonstration of the power that the Open Source and Open Science community can foster. Many people are working together on the organisational and technological challenges of science today towards full research transparency and reproducibility, even if we use computers to an unprecedented level. Many small contributions on “side projects” such as these can make a difference, and connecting these two great projects hopefully helps to solve some problem in science down the road.

Join the public Stencila and Binder chats to stay in touch or get help. We look forward to seeing scientists use nbstencilaproxy for communicating their work, and to the new challenges that come with it.


Daniel is an RSE and postdoc at the Chair of Geoinformatics, TU Dresden. He develops tools & infrastructures for open + reproducible geoscientific research.