
The Big Split™
IPython has grown a great deal over the years. As of 3.0, IPython includes:
- an interactive shell
- a REPL protocol
- a notebook document format
- a notebook document conversion tool
- a web-based notebook authoring tool
- tools for building interactive UI (widgets)
- interactive parallel Python based on the above REPL protocol
While all of these are part of the same story of tools for the lifecycle of a computational idea, they are increasingly becoming distinct projects that happen to live in a single repo. One significant part of the development is that pieces like the notebook and protocol are not even specific to Python, so it doesn’t make sense anymore that they reside in a project called Interactive Python. This is the impetus for Project Jupyter, announced at SciPy 2014, which is the new home of language-agnostic projects that began as part of IPython, such as the notebook.
If anyone has been confused by what Jupyter is[1], it’s the exact same code that lived in IPython, developed by the same people, just in a new home under a new name.
As IPython has matured, the interfaces between these different components have stabilized:
- the notebook format
- the REPL protocol
There has been growing tension in the project as we work to stabilize some APIs, such as the notebook format and message protocol, while we also start work on highly unstable experimental projects, such as the interactive widgets. The tension comes from the differing maturity of these components, and the different release cycles appropriate to each. It simply no longer makes sense to release all of these pieces at the same time.
We think the maturity of these APIs has reached the point that we can reasonably separate development of the different components, and rely on these interfaces to communicate between the projects. That should allow the projects to have their own release cycles, with more frequent bugfix releases, and grow their own developer and user communities.
This means that IPython 3.0 will be the last monolithic major release containing all of these projects. IPython 4.0 will primarily consist of splitting IPython into these subprojects.
Many of these pieces (REPL protocol, notebook-related tools) are language-agnostic, and not appropriately called IPython anymore.
What it will mean
Once this is done and we’ve made the 4.0 releases of everything, each project will have its own package. To get the notebook, you would:
pip install notebook
jupyter notebook
And nbconvert:
pip install nbconvert
jupyter nbconvert
etc.
This is not true yet, and these commands won’t work until we have made stable releases of these packages.
What it means now: moar repoz
Splitting IPython means creating new repos for each of the subprojects, which we did last week. We created eleven new repos (!), which you can find on the ipython and jupyter GitHub organizations.
Since we haven’t made a release of any of these packages, it’s a pain to get up and running with all of these, since you need to clone each one from master. The only way to automatically resolve dependencies to GitHub repos is with the deprecated, discouraged, and disabled-by-default dependency_links. This means you need to explicitly clone and install each of the dependencies in order, which you can do with a requirements.txt:
curl -O https://gist.githubusercontent.com/minrk/2b74fb7465a3702e2ce4/raw/requirements.txt
pip install -r requirements.txt
We’ve put a requirements.txt in each repo with the git URLs of the unreleased dependencies, so for any given repo, you should be able to get working on master with:
git clone https://github.com/jupyter/notebook
cd notebook
pip install -r requirements.txt -e .
As we start releasing the dependencies, we’ll trim down these requirements files as they become unnecessary.
Updated Imports
For the most part, each of the new packages was an IPython subpackage, relocated to a new top-level package. This means your import should only need to replace the IPython.subpackage part with its new jupyter name and you should be set. We've also added shims to IPython, so that imports that worked in 3.0 ought to keep working even after the next release, with a warning pointing to the new location.
If you want to live in The Future™, here are most of the updated imports, after the move:
- IPython.utils.traitlets ⇒ traitlets
- IPython.config ⇒ traitlets.config
- IPython.html ⇒ notebook
- IPython.nbconvert ⇒ nbconvert
- IPython.nbformat ⇒ nbformat
- IPython.parallel ⇒ ipyparallel
- IPython.qt ⇒ qtconsole
- IPython.terminal.console ⇒ jupyter_console
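While the shims are in place, code that has to run against both the old and new layouts can try the new location first and fall back to the old one. Here is a minimal sketch of that idea. The helper name import_first is hypothetical (it is not part of IPython or Jupyter), and the stdlib json module stands in for a real package so the example runs anywhere:

```python
import importlib


def import_first(*names):
    """Return the first module in `names` that imports successfully.

    Hypothetical helper for illustration; not an IPython or Jupyter API.
    """
    for name in names:
        try:
            return importlib.import_module(name)
        except ImportError:
            # Covers ModuleNotFoundError too; try the next candidate.
            continue
    raise ImportError("none of these modules could be imported: " + ", ".join(names))


# Intended use, newest location first:
#     traitlets = import_first("traitlets", "IPython.utils.traitlets")
# Demonstration with a made-up name followed by a stdlib module:
mod = import_first("no_such_package_hopefully", "json")
print(mod.__name__)
```

Listing the new name first means the shim warning is only triggered on installs where the new package is missing.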
IPython.kernel has split in a slightly more complex way. For the most part, it’s split into these two packages:
- IPython.kernel ⇒ jupyter_client, ipykernel
The client code is now in jupyter_client, so if you were importing kernel managers, clients, or specs, they would come from jupyter_client. This includes IPython.kernel.zmq.session, which has moved to jupyter_client.session. IPython's kernel-side code is in ipykernel.
The kernel.zmq subpackage is also removed, so anything in IPython.kernel.zmq will be top-level in ipykernel or jupyter_client, for example:
- IPython.kernel.zmq.session ⇒ jupyter_client.session
- IPython.kernel.zmq.kernelapp ⇒ ipykernel.kernelapp
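The renames listed so far can be collected into a small lookup table. The sketch below is only an illustration: the function new_name is hypothetical, the table contains just the mappings spelled out in this post, and it assumes submodule names are unchanged under the new package, which holds only "for the most part". IPython.kernel itself is left out because it splits across two packages.

```python
# Old prefix -> new prefix, copied from the lists in this post.
RENAMES = {
    "IPython.utils.traitlets": "traitlets",
    "IPython.config": "traitlets.config",
    "IPython.html": "notebook",
    "IPython.nbconvert": "nbconvert",
    "IPython.nbformat": "nbformat",
    "IPython.parallel": "ipyparallel",
    "IPython.qt": "qtconsole",
    "IPython.terminal.console": "jupyter_console",
    "IPython.kernel.zmq.session": "jupyter_client.session",
    "IPython.kernel.zmq.kernelapp": "ipykernel.kernelapp",
}


def new_name(old):
    """Translate an old dotted module name, or return None if no rule applies.

    Hypothetical helper; assumes submodule names survive the move unchanged.
    """
    # Try the longest prefixes first so the most specific rule wins.
    for prefix in sorted(RENAMES, key=len, reverse=True):
        if old == prefix or old.startswith(prefix + "."):
            return RENAMES[prefix] + old[len(prefix):]
    return None


print(new_name("IPython.kernel.zmq.session"))
print(new_name("IPython.nbconvert.exporters"))
```

Names with no entry, such as anything that stays in IPython proper, come back as None, so ambiguous cases get looked at by hand.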
How we did the split
We did most of the split in two stages. The first stage was to split out the subpackages, still within the main ipython/ipython repo (referred to as IPython Prime to avoid ambiguity). We did these mostly one at a time. For example, we moved IPython/html to notebook. We also added shims, so that anything that tried to import from IPython.html would see a warning, but still work.
This allowed us to keep IPython master working (for pip install -e development installs, but not regular installs), and the tests running on Travis, to verify that we weren't breaking the universe.
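To give a feel for what such a shim does, here is a minimal sketch of a module stand-in that warns and forwards attribute access to the new location. It is not the actual shim code in IPython: the names ShimModule and ShimWarning as written here, and the old_json example, are assumptions made for illustration, with the stdlib json module playing the role of the relocated package.

```python
import importlib
import sys
import types
import warnings


class ShimWarning(Warning):
    """Warning category for imports from a relocated module (illustrative)."""


class ShimModule(types.ModuleType):
    """Module stand-in that forwards attribute access to a relocated module."""

    def __init__(self, old_name, new_name):
        super().__init__(old_name)
        self._new_name = new_name

    def __getattr__(self, attr):
        # Only reached when normal lookup fails, i.e. for forwarded names.
        if attr.startswith("_"):
            raise AttributeError(attr)
        warnings.warn(
            "%s has moved to %s" % (self.__name__, self._new_name),
            ShimWarning,
            stacklevel=2,
        )
        target = importlib.import_module(self._new_name)
        return getattr(target, attr)


# Register the shim under the old name; here "old_json" pretends to be
# a module that has been relocated to "json".
sys.modules["old_json"] = ShimModule("old_json", "json")

import old_json  # found in sys.modules, so no file named old_json is needed

print(old_json.dumps([1, 2]))
```

Because the warning fires on every forwarded attribute access, running a test suite with warnings enabled shows which code still goes through the old names.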
Phase II
Once all of the new packages were separated within the IPython Prime repo, the time to perform the actual split had arrived. IPython’s git repo isn’t huge, but it is over 50MB. Since simply copying the repo and removing unused files would preserve the history of all of IPython in every repo, that would mean that each repo would also start at over 50MB. We didn’t want to increase the size of a clone of IPython/Jupyter past 500MB, but we also didn’t want to lose all of the history on the migrated files. To accomplish that, we used a combination of git filter-branch and bfg to prune the history of the relocated files. We put together a few scripts that run filter-branch to exclude the history of any files not in a specified whitelist, following history across renames as best we can, using git log --follow. Since we are not running filter-branch on the original IPython repo, we felt comfortable being relatively aggressive with the cleaning, as all the true history is preserved in IPython Prime.
IPython’s build, release, and docs machinery is old and has grown many gnarly bits over time, much of which is irrelevant to the new packages, so these were not included in most of the new packages.
We split the packages in dependency order, so each new package only depended on other packages already split. After making each new repo, we would get it back to working order by adding basic setup.py, readme, license files, and .travis.yml. We would then update the contents of the new package to stop using the shim imports, which would be measured by the shim warnings. Once the tests were passing in the new repo, the package would be removed from the IPython repo. This process was repeated until the IPython repo contained only the shims.
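Splitting in dependency order amounts to a topological sort of the package graph. The sketch below shows the idea with the stdlib graphlib module (Python 3.9 or later). The DEPENDS_ON table is a simplified, illustrative graph, not the projects' actual requirements, and split_order is a hypothetical helper name:

```python
from graphlib import TopologicalSorter

# Package -> packages it depends on. Illustrative only; the real
# dependency lists of these projects may differ.
DEPENDS_ON = {
    "traitlets": set(),
    "jupyter_client": {"traitlets"},
    "ipykernel": {"jupyter_client", "traitlets"},
    "nbformat": {"traitlets"},
    "nbconvert": {"nbformat", "traitlets"},
    "notebook": {"nbconvert", "nbformat", "jupyter_client", "ipykernel"},
}


def split_order(depends_on):
    """Return package names so that every package follows its dependencies."""
    return list(TopologicalSorter(depends_on).static_order())


print(split_order(DEPENDS_ON))
```

Any order it prints puts a package after everything it depends on, which is the property that lets each new repo be tested against packages that have already been split out.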
Don’t look at me, I’m genutils
Part of what made the split possible to do in such a short time was acknowledging that there were a few common utilities that we use everywhere that we couldn’t decouple in a reasonable amount of time. During Phase I, we found a few imports of utilities that made sense while everything was in one repo, but were really unnecessary. These were handled by moving utilities into new packages, or duplicating small amounts of utility code to avoid the dependency.
The rest that we couldn’t split has been dumped into an ipython_genutils package, which should not be used by anyone for anything. This is a package that shouldn’t exist, has zero public APIs, and should be monotonically decreasing in functionality. The plan is to never add code to this repo, and to slowly move the functionality to downstream repos, or new standalone packages. Hopefully we can accomplish this relatively soon, and the genutils package can go away.
Still to do
There is actually one split left to do: we are going to split the html widgets out of notebook into ipywidgets as a standalone package, but we need to figure out some detangling first, specifically what the installation process should look like for projects with kernel-side and client-side components. In the long term, we want to split ipython_widgets (Python-side) and jupyter_widgets (client-side js), but the messaging interface between them is still an active research project, and they cannot feasibly be decoupled at this point.
We still have some work to do migrating issues, and getting docs put together on the new repos. We have some scripts for the issue migration, and some PyCon sprinters have been helping get the docs back on their feet (thanks!). We are also going to need a lot of new docs on installation and usage. We also need to write some basic configuration migration for the new Jupyter locations of files, no longer in the .ipython directory.
Conclusion
We’re going to do our best to make this a smooth transition. One way you can help is to report issues on the new repos. If you have a problem in the notebook, please report it on the notebook repo instead of on IPython Prime. The same goes for the other repos.
[1] I saw “Jupyter is like IPython, but language agnostic” immediately after the announcement, which is a great illustration of why the project needs to not have Python in the name anymore, since it was already language agnostic at the time.