Format of a Pijul repository

When running tree in a Pijul repository with one patch, we get the following result:

.pijul
├── changes.wUAdAoNq
├── hooks
├── id
├── local
│   └── ignore
├── meta.toml
├── patches
│   ├── AKm4JUd3Z5LRLpg1jezgmhLWGyzR27Qv9ATDFSb7QquqvtMwqEoNiiuVP1u2ho7V8qbprRCRPsmGY6bumYjoH8JZ.gz
│   └── AKm4JUd3Z5LRLpg1jezgmhLWGyzR27Qv9ATDFSb7QquqvtMwqEoNiiuVP1u2ho7V8qbprRCRPsmGY6bumYjoH8JZ.sig
├── pristine
│   ├── db
│   └── db.lock
└── version

4 directories, 9 files

There are a number of files:

Pristine

This directory contains two files, one called db and another one called db.lock. File db is a Sanakirja database, which is essentially a collection of B trees stored in a file. Sanakirja databases have the following properties:

  • They are transactional: committing a transaction is an atomic operation, and transactions can be cancelled entirely.

  • They achieve transactionality using the copy-on-write strategy, which enables readers to read in parallel to one writer. Before a transaction is committed, readers only see the pre-commit version of the database. Moreover, multiple writers still exclude each other.

  • B trees of size n are clonable in time and space O(log n). In practice, the complexity of cloning is often much lower (i.e. independent from the database size), depending on the history of the database.

The .lock file is used to exclude multiple processes to access db concurrently, since there is no inter-process transaction synchronisation in Sanakirja at the moment.

Pijul stores a number of tables, defined in module libpijul::backend:


# #![allow(unused_variables)]
#fn main() {
pub struct Dbs {
    /// A map of the files in the working copy.
    tree: sanakirja::Db<self::file_id::UnsafeFileId, self::inode::Inode>,
    /// The reverse of tree.
    revtree: sanakirja::Db<self::inode::Inode, self::file_id::UnsafeFileId>,
    /// A map from inodes (in tree) to keys in branches.
    inodes: sanakirja::Db<self::inode::Inode, self::file_header::FileHeader>,
    /// The reverse of inodes, minus the header.
    revinodes: sanakirja::Db<self::key::Key<PatchId>, self::inode::Inode>,
    /// Text contents of keys.
    contents: sanakirja::Db<self::key::Key<PatchId>, sanakirja::value::UnsafeValue>,
    /// A map from external patch hashes to internal ids.
    internal: sanakirja::Db<self::hash::UnsafeHash, self::patch_id::PatchId>,
    /// The reverse of internal.
    external: sanakirja::Db<self::patch_id::PatchId, self::hash::UnsafeHash>,
    /// A reverse map of patch dependencies, i.e. (k,v) is in this map
    /// means that v depends on k.
    revdep: sanakirja::Db<self::patch_id::PatchId, self::patch_id::PatchId>,
    /// A map from branch names to graphs.
    branches:
        sanakirja::Db<self::small_string::UnsafeSmallStr, (NodesDb, PatchSet, RevPatchSet, u64)>,
    /// A map of edges to patches that remove them.
    cemetery: sanakirja::Db<(self::key::Key<PatchId>, self::edge::Edge), self::patch_id::PatchId>,
    /// Dependencies
    dep: sanakirja::Db<self::patch_id::PatchId, self::patch_id::PatchId>,
    /// Files touched by patches.
    touched_files: sanakirja::Db<self::key::Key<PatchId>, self::patch_id::PatchId>,
    /// Partial checkouts: branch -> partial
    partials: sanakirja::Db<self::small_string::UnsafeSmallStr, self::key::Key<PatchId>>,
}
#}

Let's look at each of these tables:


# #![allow(unused_variables)]
#fn main() {
    /// A map of the files in the working copy.
    tree: sanakirja::Db<self::file_id::UnsafeFileId, self::inode::Inode>,
    /// The reverse of tree.
    revtree: sanakirja::Db<self::inode::Inode, self::file_id::UnsafeFileId>,
#}

The "tree" and "revtree" tables are used to represent the working directory tree, and assign "inodes", which are just arbitrary integers, to files and directories.


# #![allow(unused_variables)]
#fn main() {
    /// A map from inodes (in tree) to keys in branches.
    inodes: sanakirja::Db<self::inode::Inode, self::file_header::FileHeader>,
    /// The reverse of inodes, minus the header.
    revinodes: sanakirja::Db<self::key::Key<PatchId>, self::inode::Inode>,
    /// Text contents of keys.
#}

The inodes and revinodes table map inodes (i.e. files in the working directory) to vertices in the graph, plus states of the file. The values in the inodes table have type FileHeader, which contains an inode, plus the pre-record "state" of the file, i.e. what the user did with the file recently.


# #![allow(unused_variables)]
#fn main() {
    contents: sanakirja::Db<self::key::Key<PatchId>, sanakirja::value::UnsafeValue>,
#}

The contents table just maps nodes in the graph to their binary contents.


# #![allow(unused_variables)]
#fn main() {
    /// A map from external patch hashes to internal ids.
    internal: sanakirja::Db<self::hash::UnsafeHash, self::patch_id::PatchId>,
    /// The reverse of internal.
    external: sanakirja::Db<self::patch_id::PatchId, self::hash::UnsafeHash>,
#}

The internal and external tables map internal patch ids (64-bit integers) to external patches (SHA512 hashes), in both directions.


# #![allow(unused_variables)]
#fn main() {
    /// A reverse map of patch dependencies, i.e. (k,v) is in this map
    /// means that v depends on k.
    revdep: sanakirja::Db<self::patch_id::PatchId, self::patch_id::PatchId>,
#}

The revdep table contains a reverse mapping of the dependencies. Dependencies of a patch are stored in the patch file itself, but the unrecord command needs to know whether a patch is still depended upon before accepting to remove it. This table provides that information.


# #![allow(unused_variables)]
#fn main() {
    /// A map from branch names to graphs.
    branches:
        sanakirja::Db<self::small_string::UnsafeSmallStr, (NodesDb, PatchSet, RevPatchSet, u64)>,
#}

The branches table is a map from branch names to a triple of:

  • A NodesDB, representing the graph of the branch.
  • A PatchSet, mapping internal patch identifiers to their sequence number (a 64-bit integer) on the branch.
  • A RevPatchSet, mapping sequence numbers to internal patch identifiers. In practice, this table is implementing a randomly-addressable stack.

# #![allow(unused_variables)]
#fn main() {
    /// A map of edges to patches that remove them.
    cemetery: sanakirja::Db<(self::key::Key<PatchId>, self::edge::Edge), self::patch_id::PatchId>,
#}

The cemetery table is used in unrecord, and is used to indicate what to do in the situation where two patches change the label of the same edge, and we unrecord one of them. What should be the label of the resulting edge? The cemetery table stores which patch deletes which edge.

This table duplicates information contained in the patches, but makes things more self-contained and faster to read when needed.


# #![allow(unused_variables)]
#fn main() {
    /// Dependencies
    dep: sanakirja::Db<self::patch_id::PatchId, self::patch_id::PatchId>,
#}

The dep table maps internal patch identifiers to their dependencies. This piece of information is also contained in the patches, but having it here makes the database more self-contained and faster to access.


# #![allow(unused_variables)]
#fn main() {
    /// Files touched by patches.
    touched_files: sanakirja::Db<self::key::Key<PatchId>, self::patch_id::PatchId>,
#}

The touched_files table maps vertices in the graph (vertices representing files) to patch identifiers. It is used to tell which patches to pull if we only want the subset of patches that touched a certain file or directory in the repository.


# #![allow(unused_variables)]
#fn main() {
    /// Partial checkouts: branch -> partial
    partials: sanakirja::Db<self::small_string::UnsafeSmallStr, self::key::Key<PatchId>>,
#}

The partials table maps branch names to the subset of the repository tracked by this branch. When we do a partial clone, we fully apply all the patches related to the part of the repository we want to get. However, these patches might also touch other parts that we don't want to output. This table tells what parts of the repository we want to work on in that case.