Model

On this page, we hope to clarify the main ideas behind Pijul.

Why care about patch theory?

Good question. The answer is, because having a simple mental model of a tool allows to predict what it does. Pijul uses only patches, which are atomic components of team work. In Pijul, a branch is exactly a set of patches.

Many users of other distributed version control systems expect their favorite tool to work in a way that is somewhat different from how these systems really work.

In contrast, the goal of Pijul is to build a tool whose behavior matches intuition, which hopefully will allow people with different skills to work together on the same project, even at fast paces.

Pushouts and distributed algorithms

Pijul is inspired by a result from category theory, giving a hint of how to analyze systems for collaborative edition, and what a good data structure might look like.

In category theory, if two co-authors Alice and Bob edit a file at the same time, yielding versions A and B, respectively, the pushout P of their changes is a version from which anything that can be obtained from A and from B can also be obtained from P.

Such a common version does not always exist (in which case Alice and Bob’s patches conflict). However, the above paper proposes a solution to cope with such situations. In that paper, and in Pijul, the definition of a file is slightly expanded, to included these cases, and “artificially add” all pushouts of all possible patches to the space of files. The resulting structure is a graph of byte chunks (which can be for instance binary blocks, lines, or smaller parts, such as indentation).

The relation between them, described by the graph edges, is the order in which they appear in the files. This graph acts as a “cache” of applied patches, whereas darcs, being conceptually stateless, needs to check for every new patch, whether it can live together with all previously applied patches (of course, darcs as a tool is smarter than this, for instance tags and caches make it more efficient).

Efficient algorithms

Of course, this isn’t enough to build a fast system, that moreover handles general conflicts and allows to reverse patches. In Pijul, we have a strong emphasis on algorithmic complexity. In particular, we wanted none of our algorithms to have a complexity more than O(log h), where h is the size of history.

This is actually the optimal complexity, since the call to fopen required to read a patch (a commit) is also in complexity O(log h) on good file systems.

Contrarily to what we hear sometimes, Pijul is not a rewrite of darcs in Rust. The theory is different, and the algorithms have a different complexity. Then of course, Rust was chosen to gain an extra performance factor, but rewriting darcs in Rust would be a lot of work for really minor improvements (darcs is already really good, and Haskell is a great language).

How does it work in practice?

The full picture, including conflicts and multiple files, is fairly non-trivial. However, let’s consider the case of a patch P addind two lines at different positions to a single file, like changing:

A
B
C

into:

A
X
B
Y
C

In this case, the information recorded by the patch is simply that a two lines are introduced: one containing X, between A and B, and the other one containing Y between B and C. Nothing new so far. However, Pijul repositories also keep track of what patch introduced A, B and C, which allows our patch to record the information that it depends on those patches.

In Pijul, this extra information allows the user to see a repository as just a set of patches. Just as in darcs, there is no explicit /ordering/ of patches, just dependencies, computed automatically.

Benefits

Implicit branching

In Pijul, the state of a repository is exactly the (non-ordered) set of patches that were applied to it since its creation. Therefore, from a conceptual point of view, branching just means creating new patches.

Sometimes (and in some projects), it is hard to tell exactly what precise feature we’re going to work on when we start. Pijul lets you describe your edits after you’ve made them, instead of beforehand.

Of course, it is sometimes useful to keep several related versions of a project separate, and tag the patches created in them differently. Pijul implements forks for that purpose, which are essentially clones of a whole repository sharing much of the space disk and compilation products.

Free cherry-picking

Being based on patches, Pijul offers an extremely intuitive interface to cherry-picking: if Alice makes a patch that fixes an important bug, and then goes on and makes changes more experimental (even in the same “branch”), Bob can pull just the bug-fixing patch and apply it to his repository.

In systems like git or mercurial, cherry-picking is also possible, but once a commit is cherry-picked from a remote branch, that commit’s representation changes, producing artificial conflicts on further cherry-picks from that same branch.

Easy sending of patches

Because Pijul’s primary objects are patches, sending patches by any medium is natural and easy, and works exactly as if these patches were being pushed by Pijul, locally or over ssh. This means that they will be applied to the remote repository exactly where they belong.

Fast annotate/blame

Annotate/Blame are commands, respectively from darcs and git, to identify authors of portions of the files in the repository. They have historically sufferred important incompleteness and/or performance problems. They both need to lookup each line in the history of the repository/of the branch (which can be pretty big).

In contrast, Pijul has this information at hand: the patch introducing every line can be quickly looked up (in time logarithmic in the size of history), and its author(s) can be identified.