What is a version control system?

A version control system is a piece of software that allows different authors to work collaboratively and asynchronously on a file, keeps track of their changes, and alerts them when their edits conflict.

It can also be used by single authors to review and revert their changes.

It is distinct from a parallel editor, in which all authors edit the same file concurrently. In a parallel editor, the authors are forced to share all their edits with others, often restricting their creativity and causing data loss.

Why a new version control system?

There are basically two approaches to version control: snapshot-based (Git, Mercurial, SVN or their ancestors) and patch-based (darcs). Historically, patch-based systems have been very simple to learn and use, but slow, whereas snapshot-based systems can be extremely fast, but are usually hard to use for anything beyond simple operations. As an example, cherry-picking is not intuitive in Git/Mercurial, and merging is sometimes plain wrong.

Pijul combines these approaches: it represents its data in a more general way than snapshot-based systems do, which allows for a patch-based interface.

What’s wrong with existing, widely-used, mature solutions?

Some things are really wrong, like using three-way merge for distributed version control: there are examples (even real-world ones) where Git, Mercurial and SVN just do the wrong thing. See this article for more details.

Less objectively, our experience with patch-based tools makes us believe that this is possibly the simplest way to control versions.

Where does the name come from?

Pijul is the Mexican name of Crotophaga sulcirostris, a bird known to build its nests collaboratively.

How does it compare to others?

It improves on darcs in speed, in support for branches, and in support for security features. Compared to Git/Mercurial, the workflows are quite different:

  • In Pijul, when you work with others, you often type commands such as “send my changes to others”.
  • In Git/Mercurial, one of the most common workflows is called “pull request”, by which you instruct the system to “ask the project to compare their current version with the version I have produced since the last time we did this, then compute the minimal set of changes that would ‘look consistent’ with our changes”, where “consistent” is not what you think.

Did you solve the “exponential merge problem” darcs has?

Yes, we solved the exponential merge problem. There are two minor caveats, though:

  • Pijul does not (yet) have an equivalent of darcs replace. In other words, Pijul works in polynomial time for all the kinds of patches that systems other than darcs know of. We have not yet worked through all the theory behind darcs replace, but it might be added in the future.

  • Although most patches are invertible in Pijul, patches resolving conflicts are not. They can still be unrecorded, but not rolled back (the standard inverses delete lines). It is currently unclear whether this will become possible.

Is it possible to refer to a specific version?

Yes! Although nothing is implemented so far, we currently see two possibilities:

  • The first one is a “git mode”, in which patches are explicitly ordered, by remembering the latest patch applied to a repository, and adding explicit dependencies to all subsequent patches. In this model, a patch hash contains enough information to compute the whole state: indeed, that state is just the transitive closure of the dependencies of a single patch. However, computing a state this way would take O(|P|) time, where P is the set of patches in the given state.

  • The other possibility (suggested by Nicolas Schabanel) is to use a linear hash function, such as a bitwise xor, to compute unique universal identifiers for sets of patches. Such a hash function would be cryptographically weak for the set itself, but generating collisions would require crafting patches with a given hash, which is hard. Then, these identifiers can be easily mapped to sets of patches by Sanakirja (our backend), which supports sharing between different versions of a map.

    This is non-trivial, since it requires exchanging these version identifiers when patches are pulled, in order to make sure all version identifiers are known on all repositories that are in the same state (but we already need to compute this for other reasons).
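The two possibilities above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Pijul's implementation (nothing is implemented so far): the names `patch_id`, `state_id` and `closure` are hypothetical, patches are identified by the SHA-256 of their contents, and dependencies are given as a plain dictionary.

```python
from hashlib import sha256


def patch_id(contents: bytes) -> int:
    """Hypothetical patch identifier: SHA-256 of the contents, as an integer."""
    return int.from_bytes(sha256(contents).digest(), "big")


def state_id(patch_ids) -> int:
    """Second possibility: XOR of the identifiers of a set of patches.

    XOR is commutative and associative, so the result does not depend on
    the order in which the patches were applied. Each patch is assumed to
    appear exactly once (XOR-ing the same identifier twice cancels it).
    """
    acc = 0
    for p in patch_ids:
        acc ^= p
    return acc


def closure(deps, head):
    """First possibility ("git mode"): the state named by a single patch.

    `deps` maps a patch name to the patches it depends on (an assumption
    of this sketch). The state is the transitive closure of the
    dependencies of `head`, so the walk visits every patch in the state.
    """
    seen, stack = set(), [head]
    while stack:
        p = stack.pop()
        if p not in seen:
            seen.add(p)
            stack.extend(deps.get(p, ()))
    return seen


if __name__ == "__main__":
    a, b, c = (patch_id(x) for x in (b"patch A", b"patch B", b"patch C"))
    # Same set of patches, different order: same identifier.
    print(state_id([a, b, c]) == state_id([c, a, b]))
    # Adding one patch to a known state is a single XOR.
    print(state_id([a, b]) ^ c == state_id([a, b, c]))
    print(closure({"p3": ["p2"], "p2": ["p1"], "p1": []}, "p3"))
```

With the XOR identifier, updating a state after applying or unrecording one patch costs a single operation, whereas the closure has to walk all the patches of the state.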

What is the license of Pijul?

Pijul uses new scientific research to make version control distributed again. In research, results are hard to get, but we thought our ideal of a censorship-free, downtime-free code hosting tool was worth it.

This restricts our choice of license to the AGPL 3: anyone wanting to centralize Pijul will have to give back their source code, so that others can run the same distributed service.

This means in particular that Pijul is free software, and can therefore be used freely to develop any kind of project, including commercial ones.

But maybe we’ve missed something, and the AGPL actually prevents some use of Pijul that we’ve not thought of, and that does not aim at centralizing the internet. If this is the case, please discuss your idea with us on the mailing list.

Pijul is trivial for whoever knows category theory.

This is largely false. Category theory has certainly been an inspiration for Pijul, but categories are neither algorithms nor data structures in themselves. In order to get the semantics we wanted, especially the handling of multiple files, rollbacks and unrecords, designing and implementing new algorithms and data structures was at least as useful as learning the theory.

Can Pijul handle large files?

To some extent, yes, and sometimes even faster than others (see our performance page). However, there is no support for binary files right now, although it should arrive very soon. The main issues are the diff algorithm and the handling of conflicts.

Does Pijul use lock files?

This one generally comes from darcs users whose remote server has crashed. We do use lock files, but in a completely transparent way: a Pijul user never needs to interact with them. Of course, concurrent applications of very large patches exclude each other. See our performance page to see whether this is a problem for your particular case.

Pijul 0.2 is not as fast as advertised / as Pijul 0.1

Version 0.2 is not really fast, essentially because it uses an optimal diff algorithm, which is really costly when files get big (equivalent to git-diff --minimal). This will change very soon.
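To give an idea of why an optimal diff becomes costly on big files, here is a textbook longest-common-subsequence computation. It is only an illustration: it is not claimed to be the algorithm used by Pijul 0.2 or by git, and the function names are hypothetical. The point is that the number of table cells filled grows with the product of the two file lengths.

```python
def lcs_length(a, b):
    """Return (LCS length, number of table cells filled) for two line lists.

    Classic dynamic programming, keeping only the previous row of the table.
    """
    prev = [0] * (len(b) + 1)
    cells = 0
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
            cells += 1
        prev = cur
    return prev[-1], cells


def minimal_edit_count(a, b):
    """Smallest number of deleted plus inserted lines turning a into b."""
    common, _ = lcs_length(a, b)
    return (len(a) - common) + (len(b) - common)


if __name__ == "__main__":
    old = ["a", "b", "c", "d"]
    new = ["a", "c", "d", "e"]
    print(lcs_length(old, new))
    print(minimal_edit_count(old, new))
```

In this sketch, two files of 10,000 lines each already require 100 million cell updates, which is why practical diff tools usually rely on heuristics instead of always searching for the smallest possible diff.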

Is Pijul interoperable with other systems?

Not yet, although the darcs team is working on it. Pijul’s patches do not store exactly the same information as in other systems. However, since Pijul generalizes both git/mercurial/bazaar/svn and darcs, it should not be too hard to convert our patches to these tools.

If you’re interested in the task, please contact us: the patch format is not 100% stable and fixed yet.

Why not a git merge algorithm instead of a whole VCS?

Indeed, we could have attempted to fix just that one bug in git merge, where merging a whole branch at once and merging all its commits individually don’t always give the same result.

Git contains the whole history of a project, so that’s clearly possible. However, that would mean losing the patch-based nature of Pijul, which we think is more intuitive than the commit-based one (at the cost of being algorithmically more challenging).

In particular, cherry-picking in Pijul is done in such a way that you can cherry-pick from a branch, and then later merge other patches from that same branch. In git, cherry-picking means reorganizing history in such a way that the commits don’t have the same identity after the cherry-pick.

Moreover, git users sometimes report problems related to committing the inverse of a merge commit. In Pijul, there are no explicit “merge patches”: merging patches from a branch is equivalent to pulling from the branch.
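A deliberately simplified model may help here. Assume, for the sake of illustration only, that a branch is just a set of patch names and that dependencies are given as a dictionary; the functions `with_dependencies`, `cherry_pick` and `merge` below are hypothetical and ignore everything Pijul actually does with patch contents and conflicts. In such a model, a cherry-picked patch keeps its identity, so a later merge from the same branch simply adds the patches that are still missing.

```python
def with_dependencies(patch, deps):
    """Return `patch` together with everything it transitively depends on."""
    needed, stack = set(), [patch]
    while stack:
        p = stack.pop()
        if p not in needed:
            needed.add(p)
            stack.extend(deps.get(p, ()))
    return needed


def cherry_pick(branch, patch, deps):
    """Add one patch (and its dependencies) to a branch; identities unchanged."""
    return set(branch) | with_dependencies(patch, deps)


def merge(branch, other):
    """Merging is pulling: take every patch of `other` not already present."""
    return set(branch) | set(other)


if __name__ == "__main__":
    deps = {"add-feature": ["fix-typo"], "fix-typo": [], "refactor": []}
    main = {"init"}
    feature = {"init", "fix-typo", "add-feature", "refactor"}

    picked = cherry_pick(main, "refactor", deps)
    # Cherry-picking first and merging later gives the same set of patches
    # as merging directly.
    print(merge(picked, feature) == merge(main, feature))
```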

Do files merged by Pijul always have the correct semantics?

No. Semantics depend on the particular language you’re using, and Pijul doesn’t know about them.