Modular monorepos

Friday, January 7, 2022
By Pierre-Étienne Meunier

A recent change introduced a new structure in Pijul repositories, allowing one to merge unrelated repositories while preserving commutation across the different source repos.

Pijul’s main internal data structure is essentially a graph in which some vertices represent file names and others represent byte chunks. Patches add and delete vertices in that graph. Most algorithms start from a specific vertex, called the “root” vertex, which is present in all repositories.

The new change is quite simple to describe: instead of having files and directories immediately below the root vertex, we add one extra level immediately below it. This isn’t really different from an unnamed subdirectory at the root of repositories. The consequences for a number of algorithms are nontrivial, however, since the root stops being unique, and its uniqueness was an assumption relied on in a number of places.
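To make the extra level concrete, here is a minimal sketch of a toy graph with one global root and one unnamed root per source repository. It is not Pijul’s actual data structure: the names `Vertex`, `Graph`, `RepoRoot` and `top_level_names` are invented for illustration, and edges, deletions and patch identifiers are left out entirely.

```rust
use std::collections::BTreeMap;

/// Toy vertex kinds (hypothetical; the real graph is much richer).
#[derive(Debug, Clone, PartialEq)]
enum Vertex {
    /// The vertex present in all repositories.
    GlobalRoot,
    /// The extra, unnamed level: one per source repository.
    RepoRoot,
    /// A file or directory name.
    Name(String),
    /// A chunk of file contents.
    Chunk(Vec<u8>),
}

#[derive(Debug)]
struct Graph {
    vertices: Vec<Vertex>,
    children: BTreeMap<usize, Vec<usize>>,
}

impl Graph {
    /// A new graph contains only the global root, at index 0.
    fn new() -> Self {
        Graph { vertices: vec![Vertex::GlobalRoot], children: BTreeMap::new() }
    }

    /// Adds `v` below `parent` and returns the index of the new vertex.
    fn add(&mut self, parent: usize, v: Vertex) -> usize {
        let id = self.vertices.len();
        self.vertices.push(v);
        self.children.entry(parent).or_default().push(id);
        id
    }

    fn children_of(&self, id: usize) -> &[usize] {
        self.children.get(&id).map(|c| c.as_slice()).unwrap_or(&[])
    }

    /// Top-level names now live one level below *each* repository root,
    /// so the traversal can no longer assume a single starting point.
    fn top_level_names(&self) -> Vec<String> {
        let mut names = Vec::new();
        for &root in self.children_of(0) {
            if self.vertices[root] != Vertex::RepoRoot {
                continue;
            }
            for &child in self.children_of(root) {
                if let Vertex::Name(name) = &self.vertices[child] {
                    names.push(name.clone());
                }
            }
        }
        names
    }
}
```

In this toy model, listing the top-level files means iterating over every repository root, which is the kind of adjustment the real algorithms needed once the root stopped being unique.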

Important note: not all commands related to this feature are implemented. However, implementing it now will allow us to avoid changing the format again after announcing a stable version.

Commutation across repositories

What this change allows is the following: since root directories are no different from regular ones, we can now clone a repository A “into” a subdirectory of another repository B by simply applying all patches from A to B, and then moving all the new roots that came from A into a subdirectory of B.

Then, when a new patch p is created in A, p doesn’t need to know about B’s tree. Knowing about A’s root is enough to make it directly applicable to the files in B, even after moving them to the proper subdirectory.
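The following sketch illustrates why a patch from A stays applicable in B. It assumes a deliberately simplified model in which every name vertex has a stable identifier and a patch only refers to its parent by that identifier; `AddName`, `Repo`, `move_to` and the identifier strings are all hypothetical, and giving the moved root a name is a simplification of this toy model, not a description of Pijul’s implementation.

```rust
use std::collections::HashMap;

type Id = &'static str;

/// Identifier of the vertex shared by all repositories (hypothetical).
const GLOBAL: Id = "global-root";

/// A toy patch: add one name vertex below an existing parent.
struct AddName {
    id: Id,
    parent: Id,
    name: &'static str,
}

/// Maps each vertex to its (parent, name). Roots have an empty name.
#[derive(Default)]
struct Repo {
    nodes: HashMap<Id, (Id, &'static str)>,
}

impl Repo {
    /// Applying a patch only requires that its parent vertex exists.
    fn apply(&mut self, p: &AddName) -> Result<(), String> {
        if p.parent != GLOBAL && !self.nodes.contains_key(p.parent) {
            return Err(format!("unknown parent {}", p.parent));
        }
        self.nodes.insert(p.id, (p.parent, p.name));
        Ok(())
    }

    /// Moves a vertex below a new parent; its identifier is unchanged.
    fn move_to(&mut self, id: Id, new_parent: Id, new_name: &'static str) {
        self.nodes.insert(id, (new_parent, new_name));
    }

    /// Rebuilds the path of a vertex by walking up to the global root.
    fn path(&self, id: Id) -> String {
        let mut parts = Vec::new();
        let mut cur = id;
        while let Some(&(parent, name)) = self.nodes.get(cur) {
            if !name.is_empty() {
                parts.push(name);
            }
            cur = parent;
        }
        parts.reverse();
        parts.join("/")
    }
}

/// Clones A into `libs/a` of B, then applies a later patch from A.
fn clone_into_subdirectory_demo() -> (String, String) {
    let mut b = Repo::default();
    b.apply(&AddName { id: "b-root", parent: GLOBAL, name: "" }).unwrap();
    b.apply(&AddName { id: "b-libs", parent: "b-root", name: "libs" }).unwrap();

    // Apply all of A's patches to B, unchanged.
    let a_history = [
        AddName { id: "a-root", parent: GLOBAL, name: "" },
        AddName { id: "a-main", parent: "a-root", name: "main.rs" },
    ];
    for p in &a_history {
        b.apply(p).unwrap();
    }

    // Move A's root into a subdirectory of B.
    b.move_to("a-root", "b-libs", "a");

    // A later patch recorded in A only knows about A's root.
    let p = AddName { id: "a-util", parent: "a-root", name: "util.rs" };
    b.apply(&p).unwrap();

    (b.path("a-main"), b.path("a-util"))
}
```

Calling `clone_into_subdirectory_demo()` returns the paths `libs/a/main.rs` and `libs/a/util.rs`: the late patch lands in the right subdirectory because it refers to A’s root by identity, not by path.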

One minor inconvenience

This change is fairly transparent from a user’s point of view, except for one thing: if the first patch in a repository creates the root, we don’t want that patch to introduce any other file, since otherwise subsequent file additions would depend on a patch that does more than introduce the root. Indeed, if that patch also added a file a, later patches would depend on that initial patch, which could easily be mistaken for later file additions depending on the addition of a.

For that reason, the first record in a repository creates two patches: one introducing a root vertex, and another containing the changes the user actually recorded.
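The dependency argument can be sketched as follows. This is a toy model under stated assumptions, not Pijul’s recording code: `Patch`, `Recorder` and `record` are hypothetical names, patches are numbered with plain integers, and every recorded file is assumed to be added directly below the root.

```rust
/// A toy patch: an identifier, its dependencies, and the files it adds.
#[derive(Debug, Clone, PartialEq)]
struct Patch {
    id: u32,
    deps: Vec<u32>,
    creates_root: bool,
    files: Vec<String>,
}

/// Hypothetical recorder that remembers which patch created the root.
struct Recorder {
    next_id: u32,
    root_patch: Option<u32>,
}

impl Recorder {
    fn new() -> Self {
        Recorder { next_id: 0, root_patch: None }
    }

    fn fresh(&mut self) -> u32 {
        let id = self.next_id;
        self.next_id += 1;
        id
    }

    /// Records top-level file additions. The very first call emits an
    /// extra patch that creates the root and nothing else.
    fn record(&mut self, files: &[&str]) -> Vec<Patch> {
        let mut out = Vec::new();
        let existing = self.root_patch;
        let root = match existing {
            Some(r) => r,
            None => {
                let id = self.fresh();
                out.push(Patch { id, deps: vec![], creates_root: true, files: vec![] });
                self.root_patch = Some(id);
                id
            }
        };
        let id = self.fresh();
        out.push(Patch {
            id,
            // Top-level additions depend only on the root-creating patch.
            deps: vec![root],
            creates_root: false,
            files: files.iter().map(|f| f.to_string()).collect(),
        });
        out
    }
}
```

In this model, recording a first and then b yields three patches in total, and the patch adding b depends only on the root-creating patch, never on the one adding a.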

Backwards-compatibility

This change is backwards-compatible: repositories without the extra root vertex can still be handled by our algorithms, and the next record will introduce a root and move the files.
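As a rough illustration of that migration step, the sketch below detects an old-style layout, where names hang directly off the shared root, and moves them below a freshly created root. The types `Kind` and `Node` and the functions `needs_migration` and `migrate` are hypothetical, and the convention that index 0 is the shared root is an assumption of this toy model only.

```rust
#[derive(Debug, Clone, PartialEq)]
enum Kind {
    /// The vertex shared by all repositories (index 0 in this model).
    GlobalRoot,
    /// The extra, unnamed level introduced by the new format.
    RepoRoot,
    /// A file or directory name.
    Name(String),
}

#[derive(Debug)]
struct Node {
    kind: Kind,
    parent: Option<usize>,
}

/// An old-style repository has names directly below the global root.
fn needs_migration(tree: &[Node]) -> bool {
    tree.iter()
        .any(|n| n.parent == Some(0) && matches!(n.kind, Kind::Name(_)))
}

/// Introduces a root and moves the top-level names below it.
/// Returns `false` if the repository already uses the new layout.
fn migrate(tree: &mut Vec<Node>) -> bool {
    if !needs_migration(tree.as_slice()) {
        return false;
    }
    let root = tree.len();
    tree.push(Node { kind: Kind::RepoRoot, parent: Some(0) });
    for n in tree.iter_mut() {
        if n.parent == Some(0) && matches!(n.kind, Kind::Name(_)) {
            n.parent = Some(root);
        }
    }
    true
}
```

Only the top-level names are reparented; nested entries keep their parents, and running `migrate` a second time is a no-op.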