Updates on nest.pijul.com

Sunday, July 2, 2017

As nest.pijul.com is slowly stabilising, I’d like to blog about a few lessons learnt during its development so far.

Problem statement

Nest.pijul.com runs servers for three different protocols: SSH, HTTP and HTTPS. I’m currently the only author, and I don’t have much time to dedicate to it. Also, I’m pretty bad at system administration/devops, and I don’t particularly enjoy it.

I was also interested in trying out Tokio when I started, because it is fun, fast and reliable. It also solves a number of security issues easily: for instance, timeouts don’t each require their own thread, which makes the server harder to attack by resource exhaustion.

My current stack

There weren’t many things ready in Rust when I started. This is my current stack today, including the pieces I had to write myself:

  • I wrote and use Thrussh for the SSH server. It is a fast and complete implementation of SSH2. Its main feature, when it comes to security, is that the standard pieces of an SSH server (allowing users to open sessions and run arbitrary commands) are not implemented, which, for a sysadmin noob like me, sounds much better than “secured-by-the-greatest-configuration-file-ever”.

  • Back when I started, Hyper was still synchronous, hence I decided to write my own implementation of HTTP. Its main differences from other crates I had tested (such as Iron) are its asynchronous nature and its string-based interface, which makes it easy to extend (and optimally efficient in space and time). I might switch to Hyper soon, especially now that 0.11 (the first asynchronous version) has been released, mostly because this kind of thing needs to be tested more thoroughly than by just one application.

  • I chose to support only Rustls for the TLS layer. This might not be totally rational, but I’d like to see that crate grow. Also, after my experience writing Thrussh, I wanted to use it and help (I’ve even contributed minor changes).

  • There was no Tokio interface to PostgreSQL either, and the excellent Postgres crate was not obvious to port to Tokio. Fortunately, the Postgres protocol is quite small, and I wrote an interface, called Pleingres.

That first version was not very ergonomic, but I complemented it with a few macros, including a compiler plugin, to make it easier to build SQL queries while still writing plain SQL.

That plugin replaces field names from the struct with numbered parameters in the query. For instance, the following will be rewritten as "SELECT id, is_active FROM users WHERE login = $1", and an instance of pleingres::Request will be generated.

#[derive(Debug)]
#[sql("SELECT id, is_active FROM users WHERE login = $user")]
pub struct UserReq {
    pub user: String,
    pub is_active: bool,
    pub id: Option<Uuid>
}
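As a rough sketch of the rewriting the plugin performs (my reconstruction for illustration, not the plugin’s actual code), each `$field` placeholder can be replaced by a numbered Postgres parameter, numbering names in order of first appearance:

```rust
/// Sketch of the query rewriting: replace each `$name` placeholder with
/// a numbered Postgres parameter (`$1`, `$2`, …), reusing the number
/// when a name repeats, and return the collected names.
fn number_placeholders(query: &str) -> (String, Vec<String>) {
    let mut out = String::new();
    let mut params: Vec<String> = Vec::new();
    let mut chars = query.chars().peekable();
    while let Some(c) = chars.next() {
        if c != '$' {
            out.push(c);
            continue;
        }
        // Collect the identifier following `$`.
        let mut name = String::new();
        while let Some(&n) = chars.peek() {
            if n.is_alphanumeric() || n == '_' {
                name.push(n);
                chars.next();
            } else {
                break;
            }
        }
        // Reuse the parameter number if this name was already seen.
        let idx = match params.iter().position(|p| *p == name) {
            Some(i) => i + 1,
            None => {
                params.push(name);
                params.len()
            }
        };
        out.push_str(&format!("${}", idx));
    }
    (out, params)
}
```

The collected names are what lets the generated pleingres::Request bind each struct field to the right parameter position.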

Note that this philosophy is the opposite of ORMs like Diesel, which are great, but seem a little scary when it comes to scalability: are all my queries writable in Diesel? Will I need to rewrite all my code if not?

One of the reasons for my skepticism is that some of my more complex queries for nest.pijul.com use on the order of five JOINs, and some need three intermediate tables built using WITH. Also, I usually test my queries in the postgres client to decide what I want.

  • Any webservice has to interact in one way or another with external services. As an example, it is really hard (and sometimes impossible, depending on your IP) to convince Gmail to accept your emails if you try to send them yourself. Some hosting providers don’t really care about the sender reputation of their IPs. I tried (NixOS makes it really easy to install a secure SMTP server), failed, and after a while decided to use Amazon SES. I wrote an interface for it, which can send multipart (HTML) emails.

  • Another way to interact with external services is to use OAuth. Some of your users might actually want you to send a little notification to Google, Github, Facebook, Twitter… whenever you’re trying to access their account. It would be hard to provide a reusable interface, since each of these services has to be handled by your application via callback URLs, but I tried to at least add types, which make their docs easier to read: Google OAuth and Twitter OAuth.

One issue with using the Github OAuth API from an asynchronous server is that some browsers (including Firefox and Chromium) sometimes send the same request twice, aborting one of the two seemingly at random. The Github flow hands out tokens and requires more than one round trip, so if both requests sent by the browser call the Github API, they can invalidate each other’s token.
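One simple mitigation (a hypothetical sketch, not necessarily what the Nest does) is to track which OAuth exchanges are already in flight, keyed for instance on the `state` parameter, and reject the duplicate request instead of calling the Github API a second time:

```rust
use std::collections::HashSet;
use std::sync::Mutex;

/// Guard against duplicate browser requests: remember which OAuth
/// `state` values are currently being exchanged, and refuse to start
/// a second exchange for the same value.
struct InFlight {
    states: Mutex<HashSet<String>>,
}

impl InFlight {
    fn new() -> Self {
        InFlight { states: Mutex::new(HashSet::new()) }
    }

    /// Returns true only the first time `state` is seen; a duplicate
    /// request gets false and should be answered without touching the
    /// OAuth provider.
    fn try_begin(&self, state: &str) -> bool {
        self.states.lock().unwrap().insert(state.to_string())
    }

    /// Forget `state` once the exchange has finished (or failed).
    fn finish(&self, state: &str) {
        self.states.lock().unwrap().remove(state);
    }
}
```

This way only one of the two racing requests reaches the provider, so the token cannot be invalidated from under it.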

  • My router is hardcoded, using match and strings. Each page is served by a function, and the routing graph is the server’s call graph. The router part itself is actually split between several modules.
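In sketch form (handler names and paths invented for illustration), such a router is just a `match` on the path segments:

```rust
/// Hardcoded router sketch: split the path into segments and match on
/// them directly. Each arm calls a handler function, so the routing
/// graph is literally the call graph.
fn route(method: &str, path: &str) -> String {
    let segments: Vec<&str> = path.split('/').filter(|s| !s.is_empty()).collect();
    match (method, segments.as_slice()) {
        ("GET", []) => index(),
        ("GET", ["about"]) => about(),
        // A two-segment path is treated as user/repository.
        ("GET", [user, repo]) => repository(user, repo),
        _ => not_found(),
    }
}

// Invented handlers, standing in for real page-rendering functions.
fn index() -> String { "index".to_string() }
fn about() -> String { "about page".to_string() }
fn repository(user: &str, repo: &str) -> String {
    format!("repository {}/{}", user, repo)
}
fn not_found() -> String { "404".to_string() }
```

Adding a page means adding a function and one `match` arm; the compiler checks that every route actually has a handler.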

  • As for templates, I initially started with Mustache, using the Mustache crate. I even wrote a macro to force reloading templates on every request in debug mode (but obviously not in release mode). However, as my template directory grew, keeping it organised became challenging. Also, Mustache can panic at runtime on some types, which is scary.

I’ve recently switched to Maud, which takes a while to compile, but checks types, and makes it easier to reuse code (even though I was using Mustache’s includes).
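The reload trick can be sketched like this (my reconstruction, not the actual macro): in debug builds, hit the filesystem on every request; in release builds, use the text compiled into the binary:

```rust
use std::fs;
use std::path::Path;

/// Debug-reload sketch: `compiled` stands for the template text baked
/// in at compile time (e.g. via include_str!). In debug builds the
/// file is re-read on every request, so edits show up without
/// restarting the server.
fn template(path: &Path, compiled: &str) -> String {
    if cfg!(debug_assertions) {
        // Fall back to the compiled text if the file is missing.
        fs::read_to_string(path).unwrap_or_else(|_| compiled.to_string())
    } else {
        compiled.to_string()
    }
}
```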

  • I also use the combination of Pulldown-cmark for Markdown processing, and Ammonia for sanitising the output.

And for system administration?

Securing a system is hard. There are many components, sometimes moving fast, sometimes with breaking changes. Each of them has several configuration files, with different syntaxes. Unless this is your main job, these files tend to break at exactly the moment you’ve completely forgotten their syntax. And configuring and installing a new machine is neither atomic nor reversible.

Fortunately, I use a tool called NixOps to solve all these problems. Nix is a small programming language for describing machine configurations. Every deployment uses a form of copy-on-write, which means that deployments (new packages or configuration changes) are atomic.

There are no containers involved, and it is super easy to test a future deployment locally.
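A NixOps deployment is described by a small Nix expression; a minimal hypothetical sketch (machine name, address and services invented) looks like this:

```nix
{
  network.description = "nest";

  # One machine, deployed over SSH to an invented address.
  webserver =
    { config, pkgs, ... }:
    {
      deployment.targetHost = "203.0.113.1";
      networking.firewall.allowedTCPPorts = [ 22 80 443 ];
      services.postgresql.enable = true;
    };
}
```

Running `nixops deploy` then switches the machine to the new configuration atomically.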

The only problem was that, in order to guarantee reproducibility, the Rust interface had to recompile everything from scratch on every deployment. Even a small mistake (for instance, a new file forgotten in a commit) would cause a full rebuild, in release mode, of all dependencies, which took about half an hour.

This is why I wrote a tool called Nix-Rust to reuse already-built crates, and even share them between projects. Also, I use Nicolas Pierron’s overlay to get different versions of Rust. And I use nightly, initially to get impl Trait, but now also for my SQL syntax extension.