Simplifying and speeding up Binder builds with BuildKit
The Binder Project allows users to build reproducible, sharable environments for interactive computing. To accomplish this, Binder uses a tool called repo2docker to generate an executable Docker image using the Reproducible Execution Environment Specification.

The first time a repository is launched on Binder, repo2docker must build the reproducible environment for it. This process can take a long time because of all the dependencies that need to be installed and turned into the image. As a result, Binder launches can feel slow and clunky, which is a poor UX for workflows that are designed around quick interactive sessions.
repo2docker was built several years ago, and followed patterns that were commonplace at the time. In the past few years, however, the Docker community has made significant advances in optimizing the image building process. One such improvement is BuildKit, a much more sophisticated replacement for Docker’s historical build system. Until now, repo2docker hasn’t benefited from these improvements because it was still using the original Docker build system.
So, we’ve decided to spend a few cycles modernizing repo2docker’s image building logic by using the more modern BuildKit API (via `docker buildx build`). This allows for optimizations like build parallelization and better build caching, and it adds support for some Dockerfile features that Binder didn't support earlier (particularly `COPY --chown`). It also lays a foundation for significantly simplifying the repo2docker build infrastructure and making fuller use of BuildKit's parallelization functionality. For example, we'd like to adopt BuildKit's Kubernetes driver, which distributes builds much more efficiently and in parallel.
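To make the change concrete, here is a minimal Python sketch of how a tool might assemble a `docker buildx build` invocation. It is not repo2docker's actual implementation: the helper name `buildx_build_command`, the image tag, and the build argument shown are hypothetical, chosen only for illustration. The flags used (`--tag`, `--build-arg`, `--cache-from`, `--load`) are standard `docker buildx build` options.

```python
import shlex


def buildx_build_command(context_dir, tag, build_args=None, cache_from=None, load=True):
    """Assemble an argv list for `docker buildx build`.

    Hypothetical helper for illustration only; it builds the command
    but does not run it, so no Docker installation is needed.
    """
    cmd = ["docker", "buildx", "build", "--tag", tag]

    # Pass each build argument as KEY=VALUE (sorted for stable output).
    for key, value in sorted((build_args or {}).items()):
        cmd += ["--build-arg", f"{key}={value}"]

    # Point BuildKit at existing images or caches it may reuse layers from.
    for ref in cache_from or []:
        cmd += ["--cache-from", ref]

    # --load exports the build result into the local Docker image store.
    if load:
        cmd.append("--load")

    # The build context directory comes last.
    cmd.append(context_dir)
    return cmd


if __name__ == "__main__":
    # Illustrative values only; the tag and build argument are assumptions.
    cmd = buildx_build_command(
        ".",
        "example/binder-image:latest",
        build_args={"EXAMPLE_ARG": "value"},
    )
    print(shlex.join(cmd))
```

The resulting list could be handed to `subprocess.run(cmd, check=True)` on a machine with Docker and the buildx plugin installed. Keeping command construction separate from execution makes the logic easy to test without a Docker daemon.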
Authors of Binder repositories won’t need to take any action¹, and they’ll simply notice that mybinder.org (and any other community-run BinderHub instance) will be a bit snappier at building images.
If you’d like to learn more about the changes that enabled this, check out this mybinder.org pull request, which has links to the repo2docker pull requests that added this functionality. We’re excited to keep improving Binder, and are hopeful that this makes the experience of using mybinder.org and community Binders a little bit better.
Authors and acknowledgements
Yuvi Panda is a co-founder and the Technical Lead at 2i2c. He is passionate about building participatory open infrastructure for scientific & educational use cases. He is a Project Jupyter team member primarily focused on infrastructure related projects (JupyterHub, Binder, etc). He also wrote most of the code for this `docker buildx` transition, and shepherded it through to deployment on mybinder.org as well. He is ex-Wikimedia and ex-GNOME. Let’s eliminate accidental complexities wherever we find them.
Chris Holdgraf is a co-founder and the Executive Director of 2i2c. He is on the Executive Council of Project Jupyter, and co-leads the JupyterHub and Binder team as well as the Jupyter Book team. He was previously a post-doctoral researcher in the Department of Statistics at UC Berkeley, and a Community Architect with the Division of Data Science at Berkeley. He’s interested in using open infrastructure to support interactive computing workflows in research and education.
Many thanks to @minrk, @manics, and @consideRatio for their help reviewing and shaping this work.
Footnotes
¹ Unless they were relying on undocumented implementation details of the old builder, in particular the presence of a `/.dockerenv` file to detect whether code is running in repo2docker. scikit-learn/scikit-learn#30835 has an example.