Pijul 0.4, Improvements and breaking changes

Sunday, April 2, 2017

Today, we released a new version of Pijul. Still alpha, but lots of updates. Also, still GPL2+ (as in version 0.3, but unlike before).

This version comes with breaking changes, not only in the repository format (that wouldn’t really be breaking), but also in the patch format. Since Pijul is versioned with itself, we’ve been forced to figure out what to do for our own repositories.

In the first two weeks of a public, usable Pijul, we’ve received tons of comments, most of them positive, encouraging and forgiving. We’re really grateful to everyone for that. We’re also using the nest a lot, and monitoring it quite closely to detect potential performance improvements.

In Pijul 0.4, we tried to implement all breaking changes implied by the feedback we’ve received, and our own use of Pijul.

Fast patch load (breaking everything)

Often in Pijul (and very often on the Nest), we want to load just the header of a patch, i.e. its name, authors, timestamp and dependencies. Since Pijul 0.3 was mostly designed when serde wasn’t available on Rust stable, we were still using the rustc-serialize crate, which did the job well, but was not nearly as flexible as serde. So I tried serde instead.

Switching took about ten minutes, and all tests passed immediately, which tells a lot about how easy it is to manage a medium-size project in Rust.

But, as we were using CBOR to store our patches, the change to serde broke some things in the patch format, for some reason we’ve not investigated. One possible cause is that CBOR has many different ways to encode the same things, and they are not really in line with the Rust memory layout.

Now, since the format was already broken, and we don’t ever want to change the patch format again, I decided to also switch to bincode. That format is simpler than anything I had seen before, and works well for our use case. Moreover, it is so minimalist that if it were to change in the future, we could easily maintain our own backwards-compatible fork.

Therefore, the guarantee we’d like to have for Pijul 0.4 is that the whole history can always be recovered by cloning the repository locally. This is unfortunately impossible between 0.3 and 0.4 repositories, because of the change in the patch format.

Monorepos

Pijul’s focus on patches (as opposed to snapshots) could allow it to handle giant repositories with a very large number of files, like in the case of “monorepos”. In particular, even though the commands are not yet implemented, the theory and algorithms allow for partial checkouts. “Partial” here means cloning just a subfolder of a repository, and still producing patches that can interact with the rest of the repository.

Unfortunately, the datastructures in the 0.3 repository format were not quite ready for that. In particular, the mechanism for cloning locally or via HTTP involves the .pijul/changes.… files (similar to the index in Git), which had to be downloaded and loaded each time.

The new format changes the complexity of:

  • applying/unrecording a patch: writing the index is in O(1).
  • pulling: linear in the number of new patches.
  • pushing: linear in the size of the index, but with a much smaller constant than before.

And the format is also much smaller to download, since it is only edited using memory maps, and no large part of the file ever “shifts”.

Other changes

Pijul 0.3 still had a number of important bugs when we released. Many have been fixed:

  • adding a huge number of files/applying huge patches didn’t work.
  • the diff algorithm was always quadratic, it is now linear when a file has not changed.
  • some paths were mixed up.
  • the file names were not printed when recording, and the output of “pijul record” was confusing.
  • nothing worked on Windows, and now Pijul compiles fine on Windows 10 / VS2017. I’m not committing to building binaries for every single patch now, but I’ll try at least for minor versions.
  • RSA keys were not accepted, now they are (it is not well tested on the Nest, though).

Important missing features

The main missing feature in Pijul 0.4 is the correct and easy handling of SSH keys. This limitation actually comes from Thrussh.

We’d like to make it easy for all users (on all platforms) to use SSH keys with Pijul, but we’re not there yet. I’ve added RSA support to Thrussh since Pijul 0.3, but loading keys is still painful.

If you have any idea how to do that, in particular on Windows, we’d like to hear from you! Use the issues pages on the nest, for example, or the Rust reddit, or tweet @pijul_org.

About the Nest

The Nest is our hosting facility for Pijul repositories. In the future, we want to make it easier than ever to anyone to collaborate on any project, in a planned (”let’s create a branch”) or unplanned (”oops! this really was a new feature”) way. The hope is that Pijul and the Nest can be infinitely forgiving, allowing people to make mistakes, and recover from anything with one simple command.

Learning web, and learning emails

Before writing the Nest, I was a complete beginner at web applications. In some way, Rust is, too!

The Nest is based on a number of cool crates:

  • Everything is asynchronous with Tokio.
  • TLS is handled by Rustls, itself based on *ring*.
  • SSH is handled by Thrussh.
  • Our HTTP stack is home-grown and ugly, but fully asynchronous and really fast, with only two allocations per connection.
  • Our PostgreSQL client is home-grown too, but could be open-sourced if anyone wants to maintain it for more than our own needs. Feel free to contact me!

I’d like to thank the Rust community, and in particular the authors of Rust and Tokio, for making it so easy to build stuff this fast so easily. One thing I had not envisioned though, is that no matter how fearless our concurrency is, no matter how subtle our patch theory is or how efficient our algorithms can be, it is 2017, and sending email, in particular to gmail users, is still incredibly hard.

The current solution in the Nest uses Amazon’s SES service. Yet another protocol to learn, still doing the same old thing: sending emails…

Bad news for Nest users

The recent changes in the patch format are really bad, as they mean that Nest users have to reset the (luckily small) history of all their repositories to be able to interact with the Nest again. For all the repositories we maintain, we have reset their history by deleting the .pijul. All repositories on the Nest have been emptied, which does not contradict our (quite harsh) terms of service.

As the user (pijul_org) with the most repositories on the Nest, we know how it feels, and we are sorry.

This is hopefully the only time we will ever do this, as the patch and repository formats were designed before most commands, and have been carefully redesigned in the past few days to take our current vision for Pijul into account.