Jupyter Blog

A C++ API for Vega-Lite

In this post, we present the first public release of XVega, a C++ library for producing Vega-Lite charts.

Data science workflows differ from traditional software development in that engineers make use of available tools to explore and reason about a problem. In such exploratory work, engineers load data, crunch numbers, produce simple visualizations, and iterate. Progress happens in quick increments, which is only possible when tooling does not get in the way.

This kind of interactive computing is generally associated with the Python or R programming languages. However, with the advent of the Cling C++ interpreter from CERN, and the subsequent development of the xeus-cling Jupyter kernel, new possibilities have opened up in this space.

The Jupyter stack, which started in the scientific Python community, has evolved into a language-agnostic framework that can now be leveraged by C++ developers. It bridges the gap between the countless scientific computing libraries and tools available in C++ and the Jupyter ecosystem.

The scientific C++ stack now includes numerous projects, such as xtensor and xframe. However, there is little support for visualization, especially for interactive plots. matplotlib-cpp and matplotplusplus do exist, with plotting APIs that resemble the original matplotlib library, but they inherit that library's drawbacks, such as the imperative API and the confusion between its dual object-oriented and state-based interfaces.

Given these shortcomings, and since JupyterLab already supports Vega and Vega-Lite charts (through the application/vnd.vegalite.v3+json MIME type), one can leverage this support to bridge the gap rather than reinvent the wheel. Apart from standalone use, such a system could also be integrated into other projects such as xeus-SQLite.

The main idea is to programmatically fill in a JSON document that conforms to the Vega-Lite specification and respects the grammar of graphics. This is analogous to what Altair does for Python. XVega exposes different APIs, each responsible for filling in certain parts of the JSON.
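To make the idea concrete, here is a minimal sketch of what "filling in a JSON" amounts to, written with only the standard library. It does not use XVega: the `channel` struct and the `quote` and `make_spec` helpers are hypothetical names invented for this illustration, and the `cars.json` URL in the usage note is a placeholder. Only the emitted keys (`data`, `mark`, `encoding`, `field`, `type`) come from the Vega-Lite specification.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not part of XVega): wrap a string in JSON quotes.
// It only escapes quotes and backslashes, which is enough for this sketch.
std::string quote(const std::string& s)
{
    std::string out = "\"";
    for (char c : s)
    {
        if (c == '"' || c == '\\') out += '\\';
        out += c;
    }
    out += '"';
    return out;
}

// One encoding channel: which data field it reads and how to interpret it.
struct channel
{
    std::string name;   // e.g. "x", "y", "color"
    std::string field;  // column name in the data
    std::string type;   // "quantitative", "nominal", "ordinal" or "temporal"
};

// Assemble a minimal Vega-Lite specification: Data, Mark and Encodings.
std::string make_spec(const std::string& data_url,
                      const std::string& mark,
                      const std::vector<channel>& channels)
{
    std::string spec = "{";
    spec += "\"data\": {\"url\": " + quote(data_url) + "}, ";
    spec += "\"mark\": " + quote(mark) + ", ";
    spec += "\"encoding\": {";
    for (std::size_t i = 0; i < channels.size(); ++i)
    {
        const channel& c = channels[i];
        if (i > 0) spec += ", ";
        spec += quote(c.name) + ": {\"field\": " + quote(c.field)
              + ", \"type\": " + quote(c.type) + "}";
    }
    spec += "}}";
    return spec;
}
```

Calling `make_spec("cars.json", "point", {{"x", "Horsepower", "quantitative"}, {"y", "Miles_per_Gallon", "quantitative"}})` yields a specification string with one entry per channel. A real library would build on a JSON type rather than string concatenation; the point here is only that every API call contributes one well-defined piece of the final document.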

The fundamentals remain the same in XVega: the three essential elements of a Chart are Data, Marks, and Encodings. Importing the library takes just two statements:

#include "xvega/xvega.hpp"
using namespace xv;

The experience is similar to what Altair offers. Hence, the central piece of the library is the Chart() object, which knows how to emit the JSON dictionary representing the data and visualization encodings.

For those unfamiliar with the Vega ecosystem, a quick recap of the above terms is given below:

  • Marks — What graphic should represent the data?
  • Encodings — Mapping between Data and Visual Elements of the Chart (such as x-axis, etc.).
  • Encoding Types: Quantitative (real-valued), Nominal (unordered categorical), Ordinal (ordered categorical), Temporal (time-series).

Basic usage of XVega showcasing the essential elements: Data, Marks and Encodings.

The core strength of using such a system is the separation of specification and execution. The declarative API makes it easy to specify what should be done rather than focus on incidental details of the how. It means that rather than having a special hist() function for plotting a histogram, passing bin=True does the job.

Simply stating bin=True bins the x-axis giving us the Histogram directly — without using a dedicated function.

We can, of course, customize the binning parameters with a Bin() object instead. While we are at it, let’s add a colour encoding as well to get a sense of the third dimension.

More control can be achieved using a custom Bin() object — used to set the binning parameters.

Another advantage of Vega-Lite is that transformations can be expressed within the specification rather than applied to the data beforehand (e.g., one can do linear regression as part of this declarative API).

Usage of layering and transformations in XVega.

Lastly, using Interactions and Selections is straightforward: it’s as simple as defining what to use and adding it to the Chart() object.

Zooming and Panning along with Tooltips using Interval Selection in XVega

Developing such a system for C++ comes with its own challenges. To provide a seamless experience like Altair's, several things need to be taken care of:

  • Multiple types for a single entity: the Vega-Lite specification allows a property to accept values of different types (for example, both a boolean and an integer may be valid for a particular property). Variants and visitors in C++ allow us to achieve this.
  • Out-of-order keyword arguments: method chaining is the classical approach to emulating out-of-order keyword arguments in C++, and it is what XVega uses.
  • Optional fields: many values in the Vega-Lite specification are optional, and they are represented with optionally contained values in C++ (i.e. using std::optional).
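The three techniques above can be sketched together in a few lines of standard C++17. This is an illustration under stated assumptions, not XVega's implementation: `bin_params`, `x_channel`, `bin_to_json` and `to_json` are hypothetical names made up for this example. The `bin` property is a good test case because Vega-Lite accepts either a boolean or a parameter object for it, and it may also be left out entirely.

```cpp
#include <optional>
#include <string>
#include <utility>
#include <variant>

// Hypothetical stand-ins for illustration; these are NOT XVega's real classes.
struct bin_params
{
    std::optional<int> maxbins;  // optional field: left out of the JSON when unset

    bin_params& set_maxbins(int n) { maxbins = n; return *this; }
};

struct x_channel
{
    std::string field_name;
    std::string type_name;
    // "bin" may be a plain boolean or a parameter object, or be absent altogether.
    std::optional<std::variant<bool, bin_params>> bin_value;

    // Each setter returns *this, so calls can be chained in any order.
    x_channel& field(std::string f) { field_name = std::move(f); return *this; }
    x_channel& type(std::string t) { type_name = std::move(t); return *this; }
    x_channel& bin(std::variant<bool, bin_params> b) { bin_value = std::move(b); return *this; }
};

// Visitor with one overload per alternative held by the variant.
struct bin_to_json
{
    std::string operator()(bool b) const { return b ? "true" : "false"; }
    std::string operator()(const bin_params& p) const
    {
        std::string out = "{";
        if (p.maxbins) out += "\"maxbins\": " + std::to_string(*p.maxbins);
        out += "}";
        return out;
    }
};

// Serialize the channel, emitting "bin" only when it has been set.
std::string to_json(const x_channel& x)
{
    std::string out = "{\"field\": \"" + x.field_name
                    + "\", \"type\": \"" + x.type_name + "\"";
    if (x.bin_value)
        out += ", \"bin\": " + std::visit(bin_to_json{}, *x.bin_value);
    out += "}";
    return out;
}
```

With these definitions, `x_channel().bin(true).type("quantitative").field("price")` and `x_channel().field("price").type("quantitative").bin(bin_params().set_maxbins(20))` are both valid, and the setters can appear in any order. The visitor keeps the serialization logic for each alternative in one place, so supporting another accepted type for a property means adding one overload rather than touching every call site.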

Installation

You can install XVega with conda or mamba:

mamba install -c conda-forge xvega

or

conda install -c conda-forge xvega

What is coming?

XVega is still at an early stage and under active development. We are currently working on integrating it with xeus-sqlite and other SQL Jupyter kernels to enable visualization of SQL query results. We are also working on improving the compilation time of XVega with Cling.

Acknowledgements

This work on XVega was funded by QuantStack. Thanks to Sylvain Corlay and Johan Mabille for their continuous support.

It is an exciting time for the interactive C++ ecosystem, as so much innovation is happening in the Cling and Jupyter projects. There is a lot more to come. If you are interested in helping us build that future, come talk to us on Gitter and GitHub.

About the Author

My name is Madhur Tandon, and I currently work with QuantStack as a Scientific Software Engineer. Before joining QuantStack, I worked with Mozilla, Deepnote, INCF (International Neuroinformatics Coordinating Facility), TCS Research and Elucidata. I have also been a speaker at JupyterCon 2020 and PyData Delhi 2017 and 2018. I graduated from IIIT-Delhi this year with a Bachelor’s degree in Computer Science with Honors. Besides core Data Science and Machine Learning, I am interested in tools that enable and enhance data scientists’ workflow and experience.
