IBM Brings Jupyter and Spark to the Mainframe

For the past few years, Project Jupyter has been collaborating with IBM on a number of initiatives. Much of this work has happened in the Jupyter Incubation Program, where IBM has been working on extensions for Jupyter that add dashboarding, improved notebook content management, a kernel gateway, and declarative interactive widgets.

IBM has also released the Data Science Experience (DSX), a cloud data science platform built on the Jupyter Notebook. “Jupyter notebooks are the primary user interface for our portfolio of machine learning offerings,” says Jean-Francois Puget, of IBM Analytics.

In parallel with that work, IBM has been investing significant resources to bring the power of Jupyter and its many kernels to traditional mainframe systems based on z/OS. Early in 2016, IBM and Rocket Software released the z/OS Platform for Apache Spark. This platform enables optimized abstraction and real-time analysis of structured and unstructured enterprise data using the full power of Apache Spark. The release also included a complement of open source tools (http://zos-spark.github.io/), including a Spark/Scala kernel for Jupyter called Apache Toree.

Last month, IBM made several announcements around Project Jupyter as part of a broader open data science platform strategy on z/OS.

First, Continuum Analytics has joined the z/OS partner ecosystem to collaborate with IBM and Rocket Software to bring Anaconda to z/OS. This will include a full-blown Jupyter experience on z/OS for Python and R. With this release, the Jupyter Notebook gives z/OS users a unified way to access the many powerful analytics and machine learning tools from Apache Spark and Anaconda, including scikit-learn, pandas, and dask.

Second, IBM announced IBM Machine Learning, which combines its Data Science Experience interface with Spark ML for z/OS to provide an end-to-end machine learning experience for z/OS.

“Using notebooks, data scientists can ingest data, explore it, visualize it, and build models using either Spark ML or packages available from Anaconda. Leveraging Jupyter notebooks provides all the flexibility data scientists need. IBM Machine Learning will leverage Anaconda on z/OS,” explains Puget.

"Every second of every day, worldwide, there are 7K tweets, 60K Google searches, and 1.2M business transactions (orders, payments, claims, etc.) on z/OS. We seek to improve the economics, performance and time to value of data science as it relates to big data within large corporations and organizations," says Dan Gisolfi, of IBM Emerging Technologies.

He continues, "Data science on big data requires compute to be co-located with the data. This implies that when the data is stored on the mainframe we need to enable data scientists to use technologies like Anaconda and Jupyter natively on z/OS. Our approach is to establish an open data science platform on the mainframe so that clients can leverage the best of open source technology and portable skills at the source of origin of business data. By using Jupyter natively on z/OS, our customers can reduce or eliminate many of the costly ETL workloads they run on a daily basis to move data elsewhere for downstream analytics."

Project Jupyter is pleased to see IBM embracing Jupyter as an open standard for multi-language data science and analytics. While many users are familiar with the Jupyter Notebook, this work by IBM illustrates the power of the underlying Jupyter architecture, which provides a fully open set of specifications and standards for interactive computing across any programming language. The Jupyter architecture includes the Jupyter Notebook Document Format and the Jupyter Message Specification. If you are interested in learning more about the Jupyter architecture, we encourage you to attend the first annual JupyterCon in NYC, August 22-25, 2017.
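To make the two specifications mentioned above a little more concrete, here is a minimal sketch in Python of what they describe: a notebook document is plain JSON, and a message between a frontend and a kernel is a small dictionary with a header and content. The helper names `make_notebook` and `make_execute_request`, the `demo` username, and the session value are hypothetical and chosen only for illustration. The sketch uses only the standard library and is not tied to z/OS or to any IBM product. It also leaves out the wire-level details (ZeroMQ framing and message signing) that the full Message Specification covers.

```python
import json
import uuid
from datetime import datetime, timezone


def make_notebook(source: str, kernel_name: str = "python3") -> dict:
    """Build a minimal notebook document with one code cell (nbformat 4 layout)."""
    return {
        "nbformat": 4,
        "nbformat_minor": 2,
        "metadata": {
            # The kernelspec tells a frontend which kernel should run the cells.
            "kernelspec": {"name": kernel_name, "display_name": kernel_name}
        },
        "cells": [
            {
                "cell_type": "code",
                "execution_count": None,  # not yet executed
                "metadata": {},
                "outputs": [],
                "source": source,
            }
        ],
    }


def make_execute_request(code: str, session: str) -> dict:
    """Build the dictionary form of an execute_request message."""
    return {
        "header": {
            "msg_id": uuid.uuid4().hex,
            "session": session,
            "username": "demo",  # placeholder value for this sketch
            "date": datetime.now(timezone.utc).isoformat(),
            "msg_type": "execute_request",
            "version": "5.0",
        },
        "parent_header": {},  # empty because this message starts a new exchange
        "metadata": {},
        "content": {
            "code": code,
            "silent": False,
            "store_history": True,
            "user_expressions": {},
            "allow_stdin": False,
            "stop_on_error": True,
        },
    }


notebook = make_notebook("print('hello from any kernel')")
request = make_execute_request("1 + 1", session=uuid.uuid4().hex)

# Both structures are plain JSON, which is what makes them language-neutral.
print(json.dumps(notebook, indent=1))
print(json.dumps(request["content"], indent=1))
```

Because both structures are ordinary JSON, a kernel written in any language, whether Python, R, or Scala on Spark, can take part as long as it reads and writes these shapes.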

We would also like to acknowledge and thank IBM for being a Platinum Sponsor of the NumFOCUS Foundation, which is Project Jupyter’s parent non-profit organization. NumFOCUS provides Jupyter with fiscal sponsorship, legal structure and organizational support. NumFOCUS, along with Jupyter’s formal governance model, provides corporations with a variety of ways to engage with our open-source community and software development process.