Setting up a “Production Ready” TLJH deployment

By yuvipanda
Published in Jupyter Blog · 4 min read · Jun 4, 2021

The Littlest JupyterHub is an extremely capable hub distribution that I’d recommend for situations where you expect, on average, under 100 active users.

The Littlest JupyterHub is a distribution of JupyterHub for single VM instances, best-used with 1–100 users.

Why not Kubernetes?

The primary reason to use Zero to JupyterHub on Kubernetes (k8s) over TLJH for a smaller number of users is to reduce costs, since Kubernetes can spin down nodes when they are not in use. However, you’ll always have at least one node running (for the hub / proxy pods), and the extra complexity that comes with Kubernetes, particularly needing to build your own Docker images, may not be worth it. TLJH works perfectly well for these cases!

What is ‘production’?

A JupyterHub that you can run securely without lots of intervention from the person who created it is what I’ll call a production-ready JupyterHub. It’s a pretty arbitrary standard. In this blog post, I’ll lay out what I want in the TLJH hubs I run before I let users on them.

Authentication

Use a real Authenticator, not the default FirstUseAuthenticator. The default lets each user pick their own password the first time they log in, so anyone who gets to an unclaimed username first can take it over. It should really not be used in production. If you don’t know what to use, I’d suggest the Google or GitHub authenticators.
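Here is a rough sketch of what switching to the GitHub authenticator looks like. TLJH is configured through the tljh-config command, and this script only prints the commands so you can read them before running anything. The client ID, client secret and hub domain are placeholders (you get the first two by registering an OAuth app on GitHub), and the function name is made up for this post. Check the TLJH authentication docs for the current option names before relying on it.

```python
import shlex


def github_auth_commands(client_id, client_secret, hub_domain):
    """Return the tljh-config commands that switch the hub to GitHub OAuth."""
    prefix = ["sudo", "tljh-config"]
    auth = "auth.GitHubOAuthenticator"
    callback_url = f"https://{hub_domain}/hub/oauth_callback"
    return [
        prefix + ["set", f"{auth}.client_id", client_id],
        prefix + ["set", f"{auth}.client_secret", client_secret],
        prefix + ["set", f"{auth}.oauth_callback_url", callback_url],
        prefix + ["set", "auth.type", "oauthenticator.github.GitHubOAuthenticator"],
        # Nothing takes effect until the hub is reloaded.
        prefix + ["reload"],
    ]


if __name__ == "__main__":
    # Placeholder values: substitute the ones from your own GitHub OAuth app.
    for cmd in github_auth_commands("<client-id>", "<client-secret>", "hub.example.org"):
        print(shlex.join(cmd))
```

You will also need to decide which GitHub accounts are allowed in, so read the authenticator's docs on allowed users before you reload.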

Enable HTTPS

HTTPS is an absolute security requirement now, and TLJH makes it quite easy to turn on. You do need a domain name for this to work, which can be a source of friction. Totally worth it, though.
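For a sense of how little is involved, here is a sketch of the Let's Encrypt route, again as a script that only prints the tljh-config commands. The email address and domain are placeholders and the function name is hypothetical. The domain must already point at your VM's public IP, or certificate issuance will fail.

```python
import shlex


def letsencrypt_commands(email, domains):
    """Return the tljh-config commands that enable HTTPS via Let's Encrypt."""
    prefix = ["sudo", "tljh-config"]
    cmds = [
        prefix + ["set", "https.enabled", "true"],
        prefix + ["set", "https.letsencrypt.email", email],
    ]
    # One add-item call per domain the hub should answer on.
    for domain in domains:
        cmds.append(prefix + ["add-item", "https.letsencrypt.domains", domain])
    # Only the proxy needs reloading to pick up HTTPS settings.
    cmds.append(prefix + ["reload", "proxy"])
    return cmds


if __name__ == "__main__":
    # Placeholder email and domain: use your own.
    for cmd in letsencrypt_commands("you@example.org", ["hub.example.org"]):
        print(shlex.join(cmd))
```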

Resource Limits

On a shared machine, a single user can easily write code that accidentally crashes the whole system. By default, TLJH doesn’t enforce any per-user memory limits, but it is very easy to configure them. Tuning these to match your needs will help prevent a single student from accidentally taking down your whole hub. I’d highly recommend checking how much memory your typical notebook uses, and setting the per-user limit above that.
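As a sketch of that advice: take the memory your typical notebook uses, add some headroom, and set the result as the per-user limit. The headroom factor of 1.5 is a number I made up for illustration, not a TLJH setting or recommendation, and the function name is hypothetical. The script only prints the commands.

```python
import math
import shlex


def memory_limit_commands(typical_notebook_mb, headroom=1.5):
    """Return tljh-config commands that set a per-user memory limit.

    headroom is an illustrative safety factor (an assumption of this
    sketch); pick one that matches what your users actually do.
    """
    limit_mb = math.ceil(typical_notebook_mb * headroom)
    prefix = ["sudo", "tljh-config"]
    return [
        prefix + ["set", "limits.memory", f"{limit_mb}M"],
        prefix + ["reload"],
    ]


if __name__ == "__main__":
    # Example: notebooks that typically peak around 1 GiB.
    for cmd in memory_limit_commands(1024):
        print(shlex.join(cmd))
```

As far as I know, user servers that are already running only pick up a new limit after they are restarted.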

Sizing your VM correctly

If you choose a VM that’s too big, you’ll end up spending a lot of cash on unused resources. If it’s too small, your users will not have the resources they need to do their work. TLJH provides some helpful docs for estimating your VM size, and you can always resize your VM afterwards if you get it wrong.
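The memory estimate boils down to simple arithmetic: the most users you expect at the same time, multiplied by the most memory each one may use, plus a little for the hub itself. Here is a minimal sketch. The 128 MB overhead is what I recall the TLJH docs suggesting, so treat it as an assumption and check the docs. The function name is made up.

```python
def recommended_memory_mb(max_concurrent_users, max_memory_per_user_mb, hub_overhead_mb=128):
    """Estimate the RAM (in MB) a TLJH VM needs.

    hub_overhead_mb is an assumed allowance for the hub and proxy
    processes; adjust it if the TLJH docs say otherwise.
    """
    return max_concurrent_users * max_memory_per_user_mb + hub_overhead_mb


if __name__ == "__main__":
    # Example: 40 people online at once, each capped at 1 GiB.
    needed = recommended_memory_mb(40, 1024)
    print(f"{needed} MB, or about {needed / 1024:.1f} GiB")
```

Note that this uses concurrent users, not total users. A class of 100 where only 40 are ever online together needs far less RAM than 100 times the per-user limit.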

Disk backups

TLJH keeps everything on the VM’s disk: your user environment, users’ home directories, current hub configuration, etc. It is very important that you back this up so you can recover from disasters. Automated disk snapshots from your cloud provider are an easy way to do this. Most major cloud providers (Google Cloud, DigitalOcean, AWS, etc.) offer disk snapshots. Some let you automate them as well; Google and AWS certainly do, and I’m not sure about other cloud providers. This isn’t the best way to do backups, and there are approximately 1 billion other ways to do so. However, it is an absolute minimum, and it might just be enough.

If you want to be fancier, I’d suggest using a separate disk / volume for your user home directories, possibly on ZFS, and snapshotting it much more aggressively. Talk to your nearest Google search bar for your options.
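For the bare-minimum option, a whole-disk snapshot, here is a sketch of a one-off snapshot on Google Cloud with a timestamped name. The disk name and zone are placeholders, the function name is hypothetical, and the script only prints the gcloud command. For anything recurring, your provider's built-in snapshot scheduling is probably a better fit than a script like this.

```python
import shlex
from datetime import datetime, timezone


def snapshot_command(disk, zone, now=None):
    """Return a gcloud command that snapshots `disk` under a timestamped name."""
    now = now or datetime.now(timezone.utc)
    # Lowercase letters, digits and hyphens only, so the name is a valid resource name.
    name = f"{disk}-{now:%Y%m%d-%H%M%S}"
    return [
        "gcloud", "compute", "disks", "snapshot", disk,
        f"--zone={zone}",
        f"--snapshot-names={name}",
    ]


if __name__ == "__main__":
    # Placeholder disk name and zone: use the ones for your own VM.
    print(shlex.join(snapshot_command("tljh-disk", "us-central1-a")))
```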

Pin your public IP

Some cloud providers change your VM’s public IP address if you stop and start it. This can be pretty bad: you’ll have to change your domain’s DNS entry, and re-acquire HTTPS. A hassle! You can tell your cloud provider to hang on to your IP even if your VM goes down / changes. And you should! DigitalOcean doesn’t require this, but Google Cloud does. AWS does too; there, the address you reserve is called an Elastic IP.

Base environment setup + snapshot

TLJH has a shared conda environment that is used by all users. Everyone can read from it, but only admin users can write to it (via sudo). This is one of TLJH’s core design trade-offs: admins can install packages the way they are used to, without requiring a separate image-build step. But it also means an admin can mess it up, and conda environments can sometimes be fickle! So it’s not a bad idea to spend some time in the beginning setting everything up (Python packages, JupyterLab extensions, etc.). Then make a disk snapshot, so you can revert to it if things go bad. This is where having a separate disk for your user home directories comes in handy, so you can reset your hub environment without losing your user home directories.

SSH admin access

The TLJH documentation tries hard to make sure SSH isn’t required for setup and most common usage. However, if your TLJH breaks in certain ways, you can no longer access the machine, since all access is via TLJH! For this reason, I recommend making sure someone who is an admin has SSH access to the VM. Most cloud providers offer a way to set the root SSH key on creation. If not, you can follow one of the many guides on the internet for setting it up.

You can also just put your SSH public key in $HOME/.ssh/authorized_keys, and ssh in as jupyter-<username>@<hub-ip>. This works for any / all users!
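If you would rather not edit that file by hand, here is a small sketch that appends a public key to authorized_keys and sets the strict permissions sshd usually insists on. The function name is made up. Run it from a terminal inside your hub session so that $HOME is your jupyter-<username> home directory.

```python
import sys
from pathlib import Path


def add_authorized_key(public_key, home=None):
    """Add public_key to <home>/.ssh/authorized_keys unless it is already there."""
    ssh_dir = Path(home or Path.home()) / ".ssh"
    ssh_dir.mkdir(mode=0o700, exist_ok=True)
    ssh_dir.chmod(0o700)  # sshd typically ignores keys in group/world-writable dirs

    keys_file = ssh_dir / "authorized_keys"
    existing = keys_file.read_text().splitlines() if keys_file.exists() else []
    key = public_key.strip()
    if key not in existing:
        existing.append(key)
    # Rewrite the file so every key ends up on its own line.
    keys_file.write_text("\n".join(existing) + "\n")
    keys_file.chmod(0o600)
    return keys_file


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage: python add_key.py "ssh-ed25519 AAAA... you@laptop"
    print(add_authorized_key(sys.argv[1]))
```

Pass the public key (the contents of your .pub file), never the private one.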

Others?

I’m sure this isn’t the end. I probably need something about firewalls, monitoring and automated system package upgrades. But hey, great start!
