Genomic data visualization in Jupyter

Jean-David Harrouet
Published in Jupyter Blog, Feb 8, 2021


If there is one thing that recent events tell us, it is that genomics is a large source of data, and that its manipulation and understanding allow for the quick development of new drugs and… vaccines.

We decided to build on the Jupyter ecosystem and extend its capabilities in this space with a widget for visualizing genomic data.

To do so, we chose to leverage the igv.js JavaScript library, developed by the Integrative Genomics Viewer (IGV) team and used by the web app of the same name. All of this is made possible by ipywidgets, which links the JavaScript genome browser object to our Jupyter notebook.

Hence, we are proud to announce the release of ipyigv, a Jupyter widget to render genomics data, based on igv.js!

Installing ipyigv

ipyigv is available on PyPI and conda-forge. It can be installed with either pip or mamba/conda:

pip install ipyigv

mamba install -c conda-forge ipyigv

You can find additional installation instructions on the project's GitHub page.

Rendering genomic data

igv.js consumes genomic data in two parts:

  • the genome itself, as described in the igv.js documentation;
  • features of the genome, represented as tracks, which are displayed alongside or on top of the genome.

ipyigv follows the same logic and provides a helper for using the available public genomes. The data is then displayed in an IgvBrowser widget, a wrapper around the igv.js browser.
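To make the two parts concrete, here is a minimal sketch of how a genome and its tracks can be combined into a single igv.js-style browser configuration, written as a plain Python dict. The field names (reference, fastaURL, indexURL, tracks, type, format, locus) follow the igv.js browser options as we understand them; the helper browser_config and every URL are hypothetical placeholders, and this is not a claim about how ipyigv represents its data internally.

```python
import json


def browser_config(reference, tracks, locus=None):
    """Combine the genome (part 1) and its tracks (part 2) into one igv.js-style options dict."""
    config = {"reference": reference, "tracks": list(tracks)}
    if locus is not None:
        config["locus"] = locus  # optional initial position
    return config


# Hypothetical reference genome; the URLs are placeholders, not real data sources.
reference = {
    "id": "hg38",
    "name": "Human (GRCh38/hg38)",
    "fastaURL": "https://example.org/genomes/hg38.fa",
    "indexURL": "https://example.org/genomes/hg38.fa.fai",
}

# Hypothetical annotation track drawn on top of the genome.
tracks = [
    {
        "name": "Genes",
        "type": "annotation",
        "format": "bed",
        "url": "https://example.org/tracks/genes.bed",
    },
]

config = browser_config(reference, tracks, locus="chr1")
print(json.dumps(config, indent=2))
```

Keeping the genome and the track list as separate pieces means tracks can be added or removed without touching the reference genome.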

To make things easier, a few public genomes are made directly available through a helper Bunch (a dictionary whose entries can also be accessed as attributes), PUBLIC_GENOMES.

Here is what it looks like:

Figure: Creating a genome browser with data from a public genome
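The PUBLIC_GENOMES helper is described as a Bunch, i.e. a dictionary whose keys can also be read as attributes. The sketch below shows that general pattern. The Bunch class, the GENOMES registry, its entries, and the URLs are all hypothetical illustrations; ipyigv's actual PUBLIC_GENOMES contents and types may differ.

```python
class Bunch(dict):
    """A dict whose keys can also be read and written as attributes."""

    def __getattr__(self, name):
        try:
            return self[name]
        except KeyError:
            # Raise AttributeError so hasattr() and getattr(..., default) behave normally.
            raise AttributeError(name) from None

    def __setattr__(self, name, value):
        self[name] = value


# Hypothetical registry of genome descriptions; the URLs are placeholders.
GENOMES = Bunch(
    hg38=Bunch(id="hg38", name="Human (GRCh38/hg38)",
               fastaURL="https://example.org/genomes/hg38.fa"),
    mm10=Bunch(id="mm10", name="Mouse (GRCm38/mm10)",
               fastaURL="https://example.org/genomes/mm10.fa"),
)

genome = GENOMES.hg38            # attribute access...
assert genome is GENOMES["hg38"]  # ...returns the same object as key access
print(genome.name)               # Human (GRCh38/hg38)
print(sorted(GENOMES))           # ['hg38', 'mm10']
```

Attribute access keeps notebook code short and lets Jupyter's tab completion suggest genome names, while the object remains an ordinary dict.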

Now that we have created a genome browser, we can add tracks to it. The igv.js documentation lists about ten kinds of tracks. Describing all of them is beyond the scope of this article, but note that:

  • a base Track class defines all the common properties, and each track type corresponds to a subclass of Track;
  • to make things easier, some class introspection identifies the type of Track from the extension of the data file. This allows instantiating a track with the Track constructor alone, without knowing the name of the subclass actually being instantiated (AnnotationTrack in the example below). The type is inferred from the file type, or from an explicit type property.
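The introspection described above can be sketched as follows: calling the generic Track constructor inspects the properties and returns an instance of the matching subclass. This is a minimal illustration under our own assumptions, not ipyigv's actual code. The attributes track_type and extensions, the helper _infer_subclass, the subclasses WigTrack and AlignmentTrack, and their extension lists are hypothetical; only the names Track and AnnotationTrack come from the text.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


class Track:
    """Base class: properties shared by every track type (hypothetical sketch)."""

    track_type = None  # value matched against an explicit `type` property
    extensions = ()    # data-file extensions handled by the subclass

    def __new__(cls, **props):
        # Only the generic constructor triggers inference; subclasses build themselves.
        target = Track._infer_subclass(props) if cls is Track else cls
        return object.__new__(target)

    def __init__(self, *, name="", url="", **props):
        self.name = name
        self.url = url
        self.props = props  # remaining properties, e.g. an explicit `type`

    @staticmethod
    def _infer_subclass(props):
        wanted = props.get("type")
        if wanted is not None:
            # An explicit `type` property takes precedence over the file extension.
            for sub in Track.__subclasses__():
                if sub.track_type == wanted:
                    return sub
            raise ValueError(f"unknown track type: {wanted!r}")
        path = urlparse(props.get("url", "")).path
        suffixes = [s.lower() for s in PurePosixPath(path).suffixes]
        if suffixes and suffixes[-1] == ".gz":
            suffixes.pop()  # genes.bed.gz is treated like genes.bed
        ext = suffixes[-1] if suffixes else ""
        for sub in Track.__subclasses__():
            if ext in sub.extensions:
                return sub
        raise ValueError(f"cannot infer a track type from {props.get('url')!r}")


class AnnotationTrack(Track):
    track_type = "annotation"
    extensions = (".bed", ".gff", ".gff3", ".gtf")


class WigTrack(Track):
    track_type = "wig"
    extensions = (".wig", ".bigwig", ".bw", ".bedgraph")


class AlignmentTrack(Track):
    track_type = "alignment"
    extensions = (".bam", ".cram")


genes = Track(name="Genes", url="https://example.org/tracks/genes.bed.gz")
print(type(genes).__name__)  # AnnotationTrack
coverage = Track(name="Coverage", url="https://example.org/cov.data", type="wig")
print(type(coverage).__name__)  # WigTrack
```

Because Python calls `__init__` on whatever instance `__new__` returns (as long as it is a Track), every subclass shares the same constructor, and adding a new track type only requires declaring a subclass with its own track_type and extensions.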

Now, let's add a Track to our browser, then remove it:

Figure: Adding and removing a track to/from our browser
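Conceptually, adding and removing tracks amounts to maintaining an ordered list of track descriptions on the browser. The plain-Python sketch below shows only that bookkeeping. BrowserModel, TrackConfig, and their methods are hypothetical stand-ins rather than ipyigv classes, and the sketch leaves out the widget and igv.js front-end side entirely.

```python
from dataclasses import dataclass, field


@dataclass
class TrackConfig:
    """Hypothetical, minimal description of a track."""
    name: str
    url: str
    type: str = "annotation"


@dataclass
class BrowserModel:
    """Hypothetical browser state: a genome id plus an ordered list of tracks."""
    genome: str
    tracks: list = field(default_factory=list)

    def add_track(self, track):
        # Tracks are kept in insertion order, i.e. display order.
        self.tracks.append(track)

    def remove_track(self, track):
        """Remove `track`, matched by identity; raise ValueError if it is not displayed."""
        for i, existing in enumerate(self.tracks):
            if existing is track:
                del self.tracks[i]
                return
        raise ValueError(f"track {track.name!r} is not in the browser")


browser = BrowserModel(genome="hg38")
genes = TrackConfig(name="Genes", url="https://example.org/tracks/genes.bed")
browser.add_track(genes)
print([t.name for t in browser.tracks])  # ['Genes']
browser.remove_track(genes)
print(browser.tracks)  # []
```

Matching by identity rather than equality ensures that removing one track never removes a different track that merely has the same settings.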

In this first version, two other functions come in handy:

  • browser.search('chr3:1-190,100,300') moves the browser to the requested position in the genome;
  • browser.dump_json() produces a JSON representation of the browser configuration. This is handy if you want to instantiate another browser with the same configuration without recreating it manually. Use browser.out to display the JSON content.
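Two ideas sit behind these functions: a locus string such as 'chr3:1-190,100,300' encodes a chromosome plus a start and end coordinate (commas being digit separators), and a JSON snapshot of a configuration can be reloaded to rebuild an identical one. The sketch below illustrates both in plain Python. parse_locus and dump_config are hypothetical helpers, not ipyigv functions; igv.js search also accepts other inputs (such as gene names) that this parser deliberately rejects, and ipyigv's actual dump_json output format is not reproduced here.

```python
import json
import re

# chromosome, then start-end; commas are allowed as thousands separators
LOCUS_RE = re.compile(r"^(?P<chrom>[^:\s]+):(?P<start>[\d,]+)-(?P<end>[\d,]+)$")


def parse_locus(locus):
    """Split a 'chrom:start-end' string into (chrom, start, end)."""
    match = LOCUS_RE.match(locus.strip())
    if match is None:
        raise ValueError(f"not a chrom:start-end locus: {locus!r}")
    start = int(match["start"].replace(",", ""))
    end = int(match["end"].replace(",", ""))
    if start > end:
        raise ValueError(f"start {start} is after end {end}")
    return match["chrom"], start, end


def dump_config(config):
    """Serialize a browser configuration so another browser can be built from it."""
    return json.dumps(config, indent=2, sort_keys=True)


print(parse_locus("chr3:1-190,100,300"))  # ('chr3', 1, 190100300)

# Hypothetical configuration; the URL is a placeholder.
config = {
    "genome": "hg38",
    "locus": "chr3:1-190,100,300",
    "tracks": [{"name": "Genes", "url": "https://example.org/tracks/genes.bed"}],
}
restored = json.loads(dump_config(config))
assert restored == config  # the snapshot round-trips without loss
```

Sorting the keys makes the dump deterministic, which keeps saved configurations easy to compare or track under version control.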

We hope you'll have fun manipulating genome data in Jupyter, and sharing visual knowledge thanks to ipyigv!

Acknowledgments

The development of ipyigv by Jean-David Harrouet at QuantStack was funded as part of the PLASMA project, led by Claire Vandiedonck, Pierre Poulain, and Sandrine Caburet, associate professors at Université de Paris.

Sponsors of the PLASMA initiative include:

About the author

Jean-David Harrouet is an innovator helping companies with their digital transformation.

He believes that the right mix of coding and business acumen is a way to make life better for many people.
