A new direction for Pijul's hosting service

Wednesday, May 24, 2023
By Pierre-Étienne Meunier

I gave a talk at GOTO Aarhus yesterday, where I announced a change of direction for the Nest, the hosting service we’ve been running for a few years already. Its new incarnation will be open source and serverless.

Some context

The first version of the Nest, using old versions of the formats and algorithms, started operating in 2016. The goal at the time was to make it easy to test Pijul at scale, in particular the apply and unrecord algorithms, as well as the protocol. This goal has been achieved, as we have seen fewer and fewer failures over time.

Other secondary goals included:

When the new repository format came out with the alpha release in November 2020, I obviously had to rewrite large parts of the Nest to account for it. Then, the OVH Strasbourg fire in March 2021 changed priorities a little bit and prompted the development of a better replication/backup strategy. This was completed during the two-week outage following the fire.

I described earlier how this architecture works. It has been working fine for about two years, except for one thing: database replication. Repository replication hasn’t caused a single problem, but the database replication cannot be made stable. This is partly due to the fact that we’re using the backup servers as local caches to speed some transactions up, and possibly also due to the underprovisioning of the machines on which these things run.

Concretely, for the three machines currently used by the Nest (one in France, one in Canada, one in Singapore), this results in switchovers as soon as the leader machine is under too heavy a load to send its heartbeats on time to the others. As a side effect, the Postgres server on the leader fails to send its “WAL” files, which can cause occasional crashes and data loss. That is not acceptable if this service were to be used industrially.
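To make the failure mode concrete, here is a minimal sketch of timeout-based leader detection. It is not the Nest's actual replication code: the function name, the timestamps and the timeout value are all hypothetical, and heartbeat times are assumed to be sorted and no later than the end of the observation window.

```typescript
// Hypothetical sketch, not the Nest's real failover logic.
// A follower considers the leader dead once more than `timeoutMs` elapses
// without a heartbeat. Returns the first deadline that was missed, or null
// if every heartbeat arrived in time.
function firstMissedDeadline(
  heartbeatTimesMs: number[], // times at which heartbeats arrived (sorted)
  timeoutMs: number,          // maximum tolerated silence
  endMs: number,              // end of the observation window
): number | null {
  let last = 0; // observation starts at t = 0
  for (const t of [...heartbeatTimesMs, endMs]) {
    if (t - last > timeoutMs) {
      // The follower gave up at this point, even if a heartbeat shows up later.
      return last + timeoutMs;
    }
    last = t;
  }
  return null;
}

// A leader with regular heartbeats never triggers a switchover.
const healthy = firstMissedDeadline([1000, 2000, 3000], 1500, 4000);

// An overloaded leader sends its third heartbeat late (at 4200 instead of 3000):
// it is still alive, but the follower's deadline has already passed.
const overloaded = firstMissedDeadline([1000, 2000, 4200], 1500, 5000);
```

In the second case the leader is merely slow, yet the follower has already decided to take over, which is the kind of spurious switchover described above.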

While provisioning bigger machines sounds like an obvious fix, it doesn’t feel right: not enough computing power should result in delays, lost connections and downtime, not in data loss. Not to mention the Rube Goldberg machine that would result from using an orchestrator to manage our tricky replication setup.

What’s next?

So, we don’t just want to “fix” the Nest, we also want to move on, since its goal has been fulfilled. Our next target is to build a service that:

Today I’m pleased to announce a project ticking all these boxes. Indeed, the new Nest is a collection of TypeScript programs running on Cloudflare Workers (Cloudflare’s FaaS solution), plus some WASM code to fake interactions with actual Pijul repositories. The choice of Cloudflare is somewhat arbitrary, and we would like to make our code generic enough to be run on other platforms.

Abusing FaaS key-value stores

One major challenge for FaaS scripts without any access to an actual hard drive is to fake being a full-blown Pijul repository. Fortunately, Pijul is built on top of Sanakirja, a highly generic storage layer (in addition to being faster than other key-value stores). Sanakirja has a number of advantages for this job:

Pushing changes to a repo

There is still very little documentation on this new setup, but now that we have a working prototype we will expand the manual to cover it.

Meanwhile, here’s some basic help: the domain meant to interact with the Pijul CLI tool is dot.pijul.org. You can authenticate with your signing key by installing the latest Pijul beta (1.0.0-beta.5) and then adding something like the following to .pijul/config in your repository:

# Allows you to just use `pijul push`
default_remote = "nest"

[[remotes]]
name = "nest"

# The address of my repository, in this case pmeunier/nest (adjust that line to your own repos).
http = "https://dot.pijul.org/pmeunier/nest"

# This line uses your patch signing keys to authenticate with the Nest.
headers.Authorization = { shell = "pijul client https://nest.pijul.org/auth" }
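As an illustration of what the last line is meant to do, here is a rough sketch of how a client could turn such a `headers` table into actual HTTP headers. It assumes that a `{ shell = "..." }` value means "run this command and use its output as the header value", and that trailing whitespace is trimmed. The types, the function name and the injected command runner are hypothetical and are not taken from Pijul's code.

```typescript
// Hypothetical model of a remote's `headers` table: each value is either a
// literal string or a shell command whose output becomes the header value.
type HeaderSpec = string | { shell: string };

// The command runner is injected so that this sketch does not depend on any
// particular runtime API for spawning processes.
type ShellRunner = (command: string) => string;

function resolveHeaders(
  specs: Record<string, HeaderSpec>,
  runShell: ShellRunner,
): Record<string, string> {
  const resolved: Record<string, string> = {};
  for (const [headerName, spec] of Object.entries(specs)) {
    resolved[headerName] =
      typeof spec === "string"
        ? spec // literal header value, used as-is
        : runShell(spec.shell).trim(); // assumption: command output, trimmed
  }
  return resolved;
}

// Usage with a fake runner standing in for a real shell; the token is made up.
const exampleHeaders = resolveHeaders(
  { Authorization: { shell: "pijul client https://nest.pijul.org/auth" } },
  (_command) => "example-token\n",
);
```

With the fake runner above, `exampleHeaders.Authorization` is the made-up token with its trailing newline removed. In the real setup the value comes from the command given in the config.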

Release schedule and self-hosting

By its serverless design, this project is split into a number of packages, most of them responsible for an entire feature. All of these services will ultimately be released under the AGPL-3.0 license. I’ll start releasing these one by one, starting with the UI today.

Also, we’re now offering pro accounts (5€/month), allowing Nest users to create private repositories without a storage limit (storage is billed independently above 100 MB, at 0.01€/GB·day).
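For a rough idea of what this means in practice, here is a small estimate. It rests on several assumptions that are not confirmed pricing rules: the units are megabytes and gigabytes, 1 GB is 1000 MB, and only the storage in excess of the first 100 MB is billed. The constant and function names are purely illustrative.

```typescript
// Illustrative estimate only; the billing rules encoded here are assumptions.
const PRO_MONTHLY_EUR = 5;        // pro account fee per month
const FREE_STORAGE_GB = 0.1;      // 100 MB, assuming 1 GB = 1000 MB
const RATE_EUR_PER_GB_DAY = 0.01; // storage rate above the free threshold

// Estimated cost for one month of `days` days with a constant `storedGb` of data.
function estimateMonthlyCostEur(storedGb: number, days: number): number {
  // Assumption: only the part above the free threshold is billed.
  const billableGb = Math.max(0, storedGb - FREE_STORAGE_GB);
  return PRO_MONTHLY_EUR + billableGb * RATE_EUR_PER_GB_DAY * days;
}

// Example: 2.1 GB stored for a 30-day month.
const exampleCost = estimateMonthlyCostEur(2.1, 30);
```

Under these assumptions, 2.1 GB stored for 30 days leaves 2 GB billable, which adds 2 × 0.01 × 30 = 0.60€ to the 5€ fee, for a total of about 5.60€.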

Bugs and contributions

This is an entirely new design, in particular using Cloudflare Workers in new and different ways (building large data structures on top of their platform). Obviously, there will be bugs. Please be patient while we fix them.

We also welcome contributions: feel free to join us on Zulip and Discourse.