IPython Parallel in 2021
This post describes work funded by Bodo, Inc.
IPython Parallel’s a bit of an odd duck in the parallel computing space. In a world with dask, ray, bodo, pyspark, and other parallel computing tools, what is IPython Parallel’s role in 2021?
Context
IPython Parallel began (in 2006!) as a natural extension of the Jupyter messaging protocol (though it predates the Jupyter name by a few years): when you have a protocol for REPL-style remote code execution, what can you do with multiple remote execution environments? This question led to the development of two basic models for parallel execution in IPython Parallel, both sketched in the code below:
- “multiplexed execution” — where you send code explicitly to different workers, and
- “load-balanced execution” — where you tell IPython Parallel what to run, and a scheduler takes care of assigning each task to an available worker.
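As a minimal sketch of both models, assuming a cluster is already running (e.g. started with ipcluster start -n 4):

import os
import ipyparallel as ipp

rc = ipp.Client()  # connect to the running cluster

# multiplexed execution: a DirectView addresses chosen engines explicitly
dview = rc[:]                         # all engines
print(dview.apply_sync(os.getpid))    # one pid per engine

# load-balanced execution: the scheduler assigns each task to an idle engine
lview = rc.load_balanced_view()
print(lview.map_sync(lambda x: x * x, range(10)))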
What IPython Parallel doesn’t do
To IPython, your tasks are black boxes (either Python functions or blocks of Python code as text) and you are in complete control of where and when your tasks run, as well as any dependencies or side effects they may have.
IPython Parallel specifically doesn’t and won’t do several things that other tools might:
- infer dependencies or relationships between tasks (execution graph)
- manage data distribution or locality
- construct parallel algorithms or represent ‘natively’ parallel distributed data structures, such as distributed arrays or data frames
Everything in IPython Parallel is explicit and implemented at the client level, meaning that while it doesn’t do these things for you, you can. It also means that many of the overheads associated with parallel computing (communication, data movement) are particularly costly in IPython Parallel.
IPython Parallel doesn’t hide anything from you, for better and worse.
What IPython Parallel does
IPython Parallel makes explicit parallel computations interactive.
IPython Parallel is not just parallel Python, it’s parallel IPython. That means you have all the power of IPython’s inspection, interactivity, magics, and debugging available on all of your distributed workers. This makes it especially well-suited to prototyping and experimentation.
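For example, in a notebook (a sketch, assuming a running cluster; magic registration may vary slightly by version):

import ipyparallel as ipp

rc = ipp.Client()
dv = rc[:]
dv.activate()   # register the %px magics for this view

%px import numpy as np     # run a statement on every engine
%px np.linspace(0, 1, 5)   # per-engine results display in the notebook

The %%px cell magic similarly runs a whole cell remotely, and accepts a --targets argument to address a subset of engines.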
IPython Parallel also presents standard APIs such as Python Executors, compatible with many other implementations, to make it easy to migrate to and from IPython Parallel, enabling developers to write code that uses a single multicore laptop or a thousand cores on an HPC cluster or cloud.
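A sketch of the Executor API: rc.executor() returns a concurrent.futures-style Executor, so swapping in a local pool requires no other changes.

from concurrent.futures import ProcessPoolExecutor

import ipyparallel as ipp

def square(x):
    return x * x

rc = ipp.Client()
ex = rc.executor()            # Executor backed by the cluster
# ex = ProcessPoolExecutor()  # ...or a single multicore machine, same code
print(list(ex.map(square, range(8))))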
IPython Parallel in 2021

For a lot of today’s workloads, my default recommendation is: use dask or bodo or another modern tool. IPython Parallel even makes this easier: if you already happen to have an IPython Parallel cluster, you can tell it to “become dask,” and off you go:
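# rc is a connected ipyparallel.Client; its engines become dask workers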
dask_client = rc.become_dask(ncores=1)
However, where IPython Parallel can shine is in making traditional SPMD (e.g., MPI) workloads interactive, especially for prototyping and debugging.
If you have an MPI simulation and you wish you could pause it in the middle, poke around, make plots, and interact with just one node or all of them, with all the interactive tools available to you in Jupyter, then IPython Parallel may be the tool for you.
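A sketch of what that looks like with the Cluster API, assuming mpi4py and an MPI implementation are installed:

import ipyparallel as ipp

# launch engines with mpiexec, one MPI rank per engine
cluster = ipp.Cluster(engines="mpi", n=4)
rc = cluster.start_and_connect_sync()

dv = rc[:]
dv.execute("from mpi4py import MPI; rank = MPI.COMM_WORLD.rank", block=True)
print(dv["rank"])   # e.g. [0, 1, 2, 3]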
Recent developments
Because the main problem I see IPython Parallel solving well is adding interactivity to direct parallel execution, recent development has focused on improving that story, based on feedback from users, who are often in traditional HPC environments like SLURM or PBS, or using MPI in the cloud.
Much of the feedback on challenges over the years has been around:
- Scaling to larger numbers of engines (1,000–10,000)
- Better feedback and recovery when things go wrong
- Security requirements
Thanks to Bodo, I’ve been able to spend a lot of time this year addressing several of those issues, now available as IPython Parallel 8.0.
Last year, Tom-Olav Bøyum developed a broadcast scheduler as part of his Master’s thesis, which vastly improves the efficiency of sending the same task to all engines (the main thing we do in IPP+MPI). This work is now mostly done and landed in IPython Parallel 7. We’ve also addressed longstanding issues with registering large numbers of engines (5,000–10,000) at once, a common source of failure on HPC clusters.
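This work shipped as the BroadcastView, which is used just like a DirectView (a sketch, assuming a connected client):

import ipyparallel as ipp

rc = ipp.Client()

# one message fans out through a tree of schedulers,
# rather than one message per engine from the client
bview = rc.broadcast_view()
print(bview.apply_sync(lambda: "hello"))   # runs on every engine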
One of the most frustrating / challenging things when working with MPI or other parallel computations is dealing with hangs and restarting tasks. Through the new Cluster API, IPython Parallel now has one of its most requested features: the ability to send signals to one or all engines, and forcefully restart the cluster if it’s stuck.
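A sketch with the Cluster API (treat the method names as assumptions based on my reading of the docs):

import signal

import ipyparallel as ipp

cluster = ipp.Cluster(n=4)
rc = cluster.start_and_connect_sync()

# interrupt every engine, e.g. to break out of a hung collective
cluster.signal_engines_sync(signal.SIGINT)

# or tear the engines down and relaunch them if they are truly stuck
cluster.restart_engines_sync()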
We’ve also improved interactive feedback with progress bars, live streaming output, a new JupyterLab plugin based on dask-labextension, and support for the Jupyter widget protocol, so you can instantiate widgets on engines and interact with them directly from a notebook.
Security questions have also been raised, because IPython’s (and Jupyter’s) use of ZeroMQ assumes a ‘trusted network’, typically localhost or BSD sockets for notebooks. Since IPython Parallel usually runs across a network (whereas notebooks typically use only standard HTTPS connections over a network), security is more of a concern, and the level of security in the Jupyter protocol alone may not be sufficient. In the past, we’ve resorted to tunneling TCP over SSH for more secure network traffic, while still implicitly trusting localhost.
IPython Parallel 7.1 enables CurveZMQ for full authentication, encryption, and forward-secrecy at the transport level, solving a longstanding security shortcoming of IPython Parallel.
IPython Parallel in the future
Scaling is the biggest challenge for IPython Parallel, due to its communication model. However, that scaling is in the number of IPython engines, which is not the same as the number of cores. You can benefit from this today if your tasks are already multi-threaded, e.g., through OpenMP threads in numpy, or your own Python threads or multiprocessing breakdown of tasks.
As individual nodes get bigger and bigger, better support for multi-level parallelism, where each “engine” represents a multicore node rather than a single core, would push the point at which IPython Parallel communication becomes the bottleneck back by two orders of magnitude on large machines, because one “engine” could represent 128 cores or more. Good support for 1,000 such 128-core nodes would give IPython Parallel some pretty comfortable headroom for our target use cases, scale-wise.
Also related to scaling, bringing the BroadcastView to maturity should allow us to completely replace the DirectView scheduler, since with some further development it should be able to match or beat the DirectView in almost every scenario. Some features are still lacking, especially around error handling, but those can certainly be addressed in time.
Finally, I’ll invite you to get involved. Especially if you are interested in prototyping parallel code, or making traditional MPI-style code interactive, check out IPython Parallel 8 and let us know how it goes, or contribute your use case as an example.
pip install --upgrade ipyparallel