Why Pijul?

Pijul for Git users

The main difference between Pijul and Git is that Pijul deals with changes (or patches), whereas Git deals only with snapshots (or versions).

There are several advantages to using patches. First, patches are the intuitive atomic unit of work. As such, they are easier to understand than commits. And actually, Git users often reason in terms of patches, displaying commits as differences between snapshots.

Patches can be merged according to intuitive formal axioms (more on that below). This yields several nice properties, which we explain now.

Patch commutation

In Pijul, for any two patches A and B, either A and B commute (in other words, they can be applied in any order), or A depends on B, or B depends on A.
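As a rough illustration of this property, here is a minimal Python sketch. It is a toy model, not Pijul's implementation: the `Patch` class and the `apply` and `unrecord` functions are hypothetical names, a repository is modelled as a plain set of patches, and dependencies are declared by hand.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Patch:
    # Hypothetical toy patch: a name plus the names of patches it depends on.
    name: str
    depends_on: frozenset = frozenset()


def apply(repo: frozenset, patch: Patch) -> frozenset:
    """Add a patch to the repository, provided its dependencies are present."""
    present = {p.name for p in repo}
    missing = patch.depends_on - present
    if missing:
        raise ValueError(f"{patch.name} is missing dependencies: {sorted(missing)}")
    return repo | {patch}


def unrecord(repo: frozenset, patch: Patch) -> frozenset:
    """Remove a patch, provided no remaining patch depends on it."""
    dependents = sorted(p.name for p in repo if patch.name in p.depends_on)
    if dependents:
        raise ValueError(f"{patch.name} is still needed by: {dependents}")
    return repo - {patch}


a = Patch("A")
b = Patch("B")
c = Patch("C", frozenset({"A"}))  # C depends on A

# A and B are independent, so the order of application does not matter.
ab = apply(apply(frozenset(), a), b)
ba = apply(apply(frozenset(), b), a)

# B can be removed later without touching the identity of A or C.
full = apply(ab, c)
without_b = unrecord(full, b)
```

In this model, `unrecord(full, a)` raises an error because C depends on A, which corresponds to the "A depends on B" case above.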

  • [Use case: early stage of a project] Patch commutation makes Pijul a highly forgiving system, as you can "unapply" (or "unrecord") changes made in the past, without having to change the identity of newer patches (without "rebasing", in Git terms).

    This tends to happen pretty often in the early stages of a project, when most things are still uncertain. With Pijul, exploring new features and new implementations comes at no extra cost in time.

    Git, however, would prompt you to think carefully before sharing with others a branch that contains many new features.

  • [Use case: the project becomes stable] As your project grows, patch commutation saves even more time: imagine a project with two main branches, a stable one containing only the original product, and bugfixes, and an unstable one, where new features are constantly added.

    The team working on the unstable branch is likely to discover old bugs, and fix them in the stable branch too.

In Pijul, maintainers of the stable branch can simply pull only the patches they are interested in. Those patches do not change when imported, which means that pulling new patches in will work just as expected.

In Git, importing just a few commits from another branch is called "cherry-picking", and involves changing the identity (the hash) of those commits. This usually works the first time. However, when done again, as the maintainer of a stable branch probably wants to do, this often causes counter-intuitive conflicts. The reason is that Git has no way to know that these changes come from the same commit, since the commit has changed its identity after being cherry-picked.
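The difference in identity can be sketched with a toy model in Python. Both hashing functions below are hypothetical simplifications: real Git commit hashes also cover the tree, author, and message, and the assumption that a patch identifier covers only its contents and its dependencies is made for illustration.

```python
import hashlib


def commit_like_id(parent_id: str, diff: str) -> str:
    """Toy commit identity: it includes the parent, so it depends on where the change sits."""
    return hashlib.sha256(f"{parent_id}\n{diff}".encode()).hexdigest()[:12]


def patch_like_id(diff: str, dependencies: frozenset) -> str:
    """Toy patch identity: it covers only the change and the patches it depends on."""
    deps = ",".join(sorted(dependencies))
    return hashlib.sha256(f"{deps}\n{diff}".encode()).hexdigest()[:12]


bugfix = "fix off-by-one in parser"

# The same bugfix replayed on top of two different branch tips gets two identities.
on_unstable = commit_like_id("tip-of-unstable", bugfix)
on_stable = commit_like_id("tip-of-stable", bugfix)

# As a patch, the bugfix has one identity, whichever branch it is pulled into.
patch_id = patch_like_id(bugfix, frozenset())

# Pulling the same patch into the stable branch twice is a no-op.
stable = set()
stable.add(patch_id)
stable.add(patch_id)
```

In this sketch `on_unstable` and `on_stable` differ, which is why a second cherry-pick cannot be recognised as the same change, while `stable` ends up holding a single entry.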

Associativity

In Pijul, patch application is an associative operation, meaning that applying some patch A, and then a set of patches (BC) at once, yields the same result as applying (AB) first, and then C.

With branches, the first scenario looks like this: Bob creates A (the orange commit), while Alice creates B (in blue) and C (in green), and Bob finally merges both B and C at once.

The second scenario would look like the following, with Bob creating commit A, and then pulling B. At that moment, Bob has both A and B on his branch, and wants to pull C from Alice.

Note that this is different from patch reordering: here, we apply A, then B, then C, in the same order in both scenarios.

Using math words such as "associative" for such a simple operation may sound like nitpicking, because intuition suggests that it should always be the case. However, Git doesn't guarantee that property, even if A, B, and C do not conflict (see this example for more).

Predictable merges

The theory behind Pijul is relatively simple, and can be roughly seen as coauthors trying to edit a graph of lines collaboratively, by only adding new vertices and edges, or changing the labels of existing edges.

A little more work is required to show that this can be made into a fast system with the properties we want on patches, but that's about it.

The definition of merging two independent patches A and B into repository R is also quite simple: R+{A, B} is the unique smallest repository containing all of R, plus all the changes of A and B.
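A minimal Python sketch of this definition follows. It assumes a heavily simplified model in which a repository is a set of vertices (lines) and edges, and a patch only adds vertices and edges; label changes on edges are left out. The names `Graph` and `merge` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Graph:
    # Used both for repositories and for patches in this toy model.
    vertices: frozenset = frozenset()
    edges: frozenset = frozenset()


def merge(repo: Graph, *patches: Graph) -> Graph:
    """Smallest graph containing the repository and every given patch: a plain union."""
    vertices, edges = repo.vertices, repo.edges
    for patch in patches:
        vertices = vertices | patch.vertices
        edges = edges | patch.edges
    return Graph(vertices, edges)


r = Graph(frozenset({"root"}))
a = Graph(frozenset({"a"}), frozenset({("root", "a")}))
b = Graph(frozenset({"b"}), frozenset({("root", "b")}))
c = Graph(frozenset({"c"}), frozenset({("b", "c")}))

# Associativity: A, then (B and C) at once, versus (A and B), then C.
first_scenario = merge(merge(r, a), b, c)
second_scenario = merge(merge(r, a, b), c)

# Independent patches can also be merged in either order.
a_then_b = merge(r, a, b)
b_then_a = merge(r, b, a)
```

Because the result is a plain union in this model, both scenarios produce the same graph, and a given set of patches has only one possible outcome.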

This makes Pijul extremely predictable (modulo a few possible remaining bugs, as this project is still in beta).

This stands in contrast with Git, which uses heuristics for a number of important operations:

  • In Git, even if one is careful enough to plan branches in advance, the solution to a merge is not guaranteed to be unique. Git just picks one arbitrarily. Occasionally, Git will merge lines in unexpected places, like in this example. This is pretty bad, and can happen on real-world source code.

    On security-sensitive code, this might even turn into serious financial problems, even if all contributors are trustworthy.

    The solution to this is to ignore the testing efforts made on both branches, and restart the testing-and-debugging loop from scratch, with potentially serious implications in terms of time and money.

  • Git has no true notion of files, and uses heuristics to recover files from changes. This is certainly better than its predecessors (such as SVN), but sometimes causes artificial conflicts.

Pijul for Darcs users

Pijul is mostly a formally correct version of Darcs' theory of patches, as well as a new algorithm for merging changes. Its main innovation compared to Darcs is to use a better data structure for its pristine, allowing for:

  • A sane representation of conflicts: Pijul's pristine is stored in a "conflict-tolerant" data structure. Many patches can be applied to it, and the presence or absence of conflicts is only computed afterwards, by looking at the pristine.

  • Fast algorithms: Pijul's pristine can be seen as a "cache" of applied patches, to which new patches can be applied directly, without having to compute anything on the repository's history.

However, Pijul's pristine format was designed to comply with axioms on a specific set of operations only. As a result, some of Darcs' features, such as darcs replace, are not (yet) available.

Better conflicts for everyone

Conflicts are a normal thing in the internal representation of a Pijul repository. Actually, after applying new patches, we even have to compute where the conflicts are.

In particular, patches editing sides of a conflict can be applied without resolving the conflict. This guarantees that no information ever gets lost.

This is different from both Git and Darcs:

  • In Git, conflicts are not really handled after they are output to files. For instance, if one commits just after a conflict occurs, Git will commit the entire conflict (including markers).

  • In Darcs, conflicts can lead to the exponential merge problem, which might cause it to take several hours to merge even a two-line patch.
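The idea that conflicts are computed after patches are applied, rather than blocking their application, can be sketched in Python. This is a toy model under strong assumptions: the graph is a dictionary from each line to the set of lines inserted directly after it, and a conflict is approximated as a line with more than one successor. The function names `add_line` and `conflicts` are hypothetical, and the real data structure is richer than this.

```python
def add_line(graph: dict, parent: str, line: str) -> None:
    """Insert a new line after `parent`. Nothing is ever overwritten or removed."""
    graph.setdefault(parent, set()).add(line)
    graph.setdefault(line, set())


def conflicts(graph: dict) -> list:
    """Lines with several successors: the order of those successors is undetermined."""
    return sorted(v for v, successors in graph.items() if len(successors) > 1)


graph = {"root": set()}

# Alice and Bob each insert a line at the same place: both patches apply.
add_line(graph, "root", "alice's line")
add_line(graph, "root", "bob's line")
after_both = conflicts(graph)

# Alice keeps editing her side of the conflict without resolving it first.
add_line(graph, "alice's line", "alice's follow-up")
after_follow_up = conflicts(graph)
```

In this sketch the conflict at `root` is reported by inspecting the graph after the fact, and all three lines remain stored whether or not the conflict has been resolved.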