Sanakirja

Starting from our next release (0.3), Sanakirja will be the database backend of Pijul. We were using Symas’ excellent LMDB library, but were forced to change, mostly because we needed to clone databases, storing only the differences between versions.
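To give an idea of what "cloning while storing only the differences" can look like, here is a minimal in-memory sketch. It is not Sanakirja's code or file format: it assumes a plain binary search tree with reference-counted nodes rather than an on-disk B tree, and the names `Tree`, `insert`, `get` and `demo` are hypothetical. Cloning copies a single pointer, and a later insertion copies only the nodes on the path from the root to the modified key.

```rust
use std::rc::Rc;

// Hypothetical persistent (immutable) binary search tree.
// Nodes are never modified in place, so versions can share subtrees.
enum Tree {
    Leaf,
    // left subtree, key, value, right subtree
    Node(Rc<Tree>, u64, u64, Rc<Tree>),
}

// Returns a new version of the tree containing `key -> value`.
// Only the nodes on the path to `key` are copied; the rest is shared.
fn insert(t: &Rc<Tree>, key: u64, value: u64) -> Rc<Tree> {
    match &**t {
        Tree::Leaf => Rc::new(Tree::Node(
            Rc::new(Tree::Leaf),
            key,
            value,
            Rc::new(Tree::Leaf),
        )),
        Tree::Node(l, k, v, r) => {
            if key < *k {
                Rc::new(Tree::Node(insert(l, key, value), *k, *v, Rc::clone(r)))
            } else if key > *k {
                Rc::new(Tree::Node(Rc::clone(l), *k, *v, insert(r, key, value)))
            } else {
                // Same key: replace the value, share both subtrees.
                Rc::new(Tree::Node(Rc::clone(l), *k, value, Rc::clone(r)))
            }
        }
    }
}

// Looks up `key` in one version of the tree.
fn get(t: &Rc<Tree>, key: u64) -> Option<u64> {
    let mut cur = t;
    loop {
        match &**cur {
            Tree::Leaf => return None,
            Tree::Node(l, k, v, r) => {
                if key < *k {
                    cur = l
                } else if key > *k {
                    cur = r
                } else {
                    return Some(*v);
                }
            }
        }
    }
}

// Builds a version, "clones" it, modifies the clone, and reports
// (value in old version, value in new version, left subtree shared?).
fn demo() -> (Option<u64>, Option<u64>, bool) {
    let empty = Rc::new(Tree::Leaf);
    let v1 = insert(&insert(&insert(&empty, 2, 20), 1, 10), 3, 30);
    // Cloning is a pointer copy, whatever the size of the tree.
    let fork = Rc::clone(&v1);
    let v2 = insert(&fork, 3, 31);
    // The subtree holding key 1 was not touched, so both versions share it.
    let shared = match (&*v1, &*v2) {
        (Tree::Node(l1, ..), Tree::Node(l2, ..)) => Rc::ptr_eq(l1, l2),
        _ => false,
    };
    (get(&v1, 3), get(&v2, 3), shared)
}
```

In this sketch the old version keeps its binding for key 3 while the clone sees the new one, and the untouched left subtree exists only once in memory.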

The documentation is here.

A preview version (nothing is stable yet, not even the file format) is available from our darcs repository:

darcs get http://pijul.org/sanakirja

I repeat: this is not ready for production.

Performance

Here is a quick benchmark of LevelDB, LMDB and Sanakirja performing different operations on 64-bit Linux, in a tmpfs.

The code for these benchmarks can be found here:

darcs get https://pijul.org/sanakirja_benchmarks

The following graph shows, for each n, the time spent in “put” functions after n bindings have been inserted:

Then, the following graph shows, for each n, the time spent in “get” functions after n bindings have been retrieved:
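As a rough illustration of what these two measurements mean, the sketch below accumulates the time spent inside "put" and "get" calls and records the running total every `step` operations. It is not the benchmark code from the repository above: it assumes the standard library's `BTreeMap` as a stand-in for the databases, and `key`, `time_puts` and `time_gets` are hypothetical names.

```rust
use std::collections::BTreeMap;
use std::time::{Duration, Instant};

// Hypothetical key generator: multiplying by an odd constant is a
// bijection on u64, so distinct indices give distinct, scattered keys.
fn key(i: u64) -> u64 {
    i.wrapping_mul(0x9E37_79B9_7F4A_7C15)
}

// Inserts `n` bindings and returns (number of bindings so far, total time
// spent in `insert`), sampled every `step` insertions.
fn time_puts(db: &mut BTreeMap<u64, u64>, n: u64, step: u64) -> Vec<(u64, Duration)> {
    let mut total = Duration::ZERO;
    let mut samples = Vec::new();
    for i in 1..=n {
        let k = key(i);
        let start = Instant::now();
        db.insert(k, i);
        total += start.elapsed(); // only the call itself is timed
        if i % step == 0 {
            samples.push((i, total));
        }
    }
    samples
}

// Retrieves the same `n` bindings and returns the samples together with
// the number of keys found (which also keeps the lookups from being
// optimised away).
fn time_gets(db: &BTreeMap<u64, u64>, n: u64, step: u64) -> (Vec<(u64, Duration)>, u64) {
    let mut total = Duration::ZERO;
    let mut samples = Vec::new();
    let mut hits = 0;
    for i in 1..=n {
        let k = key(i);
        let start = Instant::now();
        let found = db.get(&k).is_some();
        total += start.elapsed();
        if found {
            hits += 1;
        }
        if i % step == 0 {
            samples.push((i, total));
        }
    }
    (samples, hits)
}
```

Each sample is one point of a curve like the ones described above. Timing every call individually adds clock overhead to each measurement, so a sketch like this is only suitable for comparing orders of magnitude.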

Our design is quite close to LMDB’s. The difference in performance might be due to our using B trees instead of their B+ trees (the tree is probably about half as high in this benchmark, even though there might be more nodes in total). More benchmarks would be needed for the more Pijul-realistic case of many values for a single key.

Come talk to us on irc.freenode.net, #pijul, if you know how to improve our design.

Name

By the way, “sanakirja” means “dictionary” in Finnish: it is a dictionary data structure that was designed and written in Finland, as the logo below indicates: