Functional semantics in Sanakirja

Thursday, March 3, 2016

I announced Sanakirja, our new database backend, on reddit and twitter, but so far I had not had the time to blog about it. This post explains some of the ideas behind this new crate.

First of all, the darcs repository is available there:

darcs get http://pijul.org/sanakirja

As is the case in Pijul, we welcome any patch/comment/contribution, sent for instance to our mailing list (we expect this unfortunate situation to end with Pijul 0.3 and its nest).

Functional semantics

In order to implement branches (as described here), we needed the ability in LMDB to “clone” just the root node (root of the B-tree) of a database, without cloning the whole thing. For Pijul, this means that forking a project can finally be in complexity O(log h), where h is the number of lines ever written in history. This complexity is actually the optimal one, since opening a file on a standard file system also has roughly that same complexity.

For other projects using Sanakirja, this means that it is possible to keep several versions of the same database without duplicating the whole contents: the two versions share all their common parts.

What we have

The version on the darcs already implements this: put doesn’t modify anything in any database, instead returning a pointer to the root of the new tree. It doesn’t have any overhead compared to LMDB, although the running time is probably not yet at the level of LMDB. In particular, we still need to figure out if our workaround for pagesize differences is necessary or not: one of the main goals of Sanakirja is to provide a file format portable between platforms.

The following example works with the current version in the darcs.

extern crate sanakirja;

fn main(){
    let env=sanakirja::Env::new("/tmp/test").unwrap();
    let mut bindings=Vec::new();
    let db_name="my cool database";
    {
        let mut txn=env.mut_txn_begin();
        let mut db=txn.open_db(db_name.as_bytes()).unwrap_or(txn.create_db());
        for i in 0..100 {
            let sx=format!("{}",i);
            let sy=format!("{}",(i*i)%17);
            db=txn.put(db,sx.as_bytes(),sy.as_bytes());
            bindings.push((sx,sy));
        }
        let mut root_db=txn.root_db();
        root_db = txn.put_db(root_db, db_name.as_bytes(), db);
        txn.set_global_root(root_db);
        txn.commit().unwrap();
    }
    let txn=env.txn_begin();
    let db=txn.open_db(db_name.as_bytes()).unwrap();
    for (ref a,ref b) in bindings {
        let bb=txn.get(db,a.as_bytes(),None).unwrap();
	println!("{:?} {:?} {:?}",a,b,bb.as_slice() == b.as_bytes());
    }
}

What’s missing

The three main things that are missing before we can start using Sanakirja in Pijul are:

Deleting/updating bindings.

This is not very hard to implement. Actually, copy-on-write allows to not worry too much about the memory layout inside B-nodes: we can just allocate in the last part of the node, and compact the node when rewriting. The only thing needed is a counter of how much space is available on each B-node (which we already have).

Reference counting / freeing

The format already has reference counting, but it doesn’t free its memory yet, mostly because we’ve not implemented commands like delete or update, which would need freeing things.

The price of large values

We also need to profile our code, and update the format to get optimal performance.

Right now, for very large values, we need to concatenate mmap’ed parts of the file together. Since pages can only be mmap’ed at addresses multiples of the page size, we could not do this for instance on SPARC, which has a different page size compared to other architectures. This has the unfortunate consequence that all nodes need to be twice the page size.

One possible workaround would be that get, on values larger than half a page, returns an iterator on bytes instead of a byte slice. Also, in this way, very large values could be read much more efficiently.