On the road to Irmin v2

by Thomas Gazagnaire on May 13th, 2019

Over the past few months, we have been heavily engaged in release engineering for Irmin 2.0, which covers multiple years of work on all of its constituent elements. We started Irmin in late 2013 as a Git-like distributed and branchable storage substrate that would let us escape the perils of POSIX filesystems.

The Irmin libraries provide snapshotting, branching and merging operations over storage, and can communicate via Git both on-disk and remotely. Irmin today consists of many discrete OCaml libraries that compose together to form a set of mergeable data structures that can be used in MirageOS unikernels and in normal OCaml daemons such as Tezos.
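To make "snapshotting, branching and merging" concrete, here is a minimal sketch of the idea in Python. It is a language-agnostic toy model, not Irmin's API: the `Store` class, the `merge3` function and the key names are all hypothetical, and the common ancestor is captured by hand rather than found by walking commit history.

```python
import hashlib
import json


class Store:
    """Toy content-addressed store with named branches (illustrative only)."""

    def __init__(self):
        self.objects = {}   # commit hash -> {"tree": ..., "parents": ...}
        self.branches = {}  # branch name -> commit hash

    def commit(self, branch, tree, parents=()):
        # A commit is an immutable snapshot, addressed by the hash of its content.
        body = {"tree": tree, "parents": list(parents)}
        data = json.dumps(body, sort_keys=True).encode()
        digest = hashlib.sha1(data).hexdigest()
        self.objects[digest] = body
        self.branches[branch] = digest
        return digest

    def tree(self, branch):
        # Return a copy of the key/value snapshot at the head of a branch.
        return dict(self.objects[self.branches[branch]]["tree"])

    def set(self, branch, key, value):
        head = self.branches[branch]
        tree = self.tree(branch)
        tree[key] = value
        return self.commit(branch, tree, [head])

    def fork(self, src, dst):
        # Branching is cheap: the new branch points at the same commit.
        self.branches[dst] = self.branches[src]


def merge3(base, ours, theirs):
    """Three-way merge of flat key/value trees; None stands for 'absent'."""
    merged = {}
    for key in set(base) | set(ours) | set(theirs):
        b, o, t = base.get(key), ours.get(key), theirs.get(key)
        if o == t:
            value = o
        elif o == b:      # only their side changed this key
            value = t
        elif t == b:      # only our side changed this key
            value = o
        else:
            raise ValueError(f"conflict on {key!r}")
        if value is not None:
            merged[key] = value
    return merged


store = Store()
store.commit("main", {"title": "draft"})
base = store.tree("main")                  # common ancestor, kept by hand here
store.fork("main", "feature")
store.set("main", "author", "alice")       # change on one branch
store.set("feature", "title", "final")     # independent change on the other
merged = merge3(base, store.tree("main"), store.tree("feature"))
store.commit("main", merged,
             [store.branches["main"], store.branches["feature"]])
```

Because the two branches touched different keys, the merge succeeds without a conflict; had both changed `title` to different values, `merge3` would raise instead of picking a winner.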

In this blog post, we want to explain some of the ongoing release engineering, and to highlight some areas where we could use help from the community to test out pieces (and hopefully find uses for them in your own infrastructure). The overall effort is tracked in mirage/irmin#658, so feel free to comment there as well.

ocaml-git

Irmin is parameterised over the exact communication mechanisms it uses between nodes, both as an on-disk format and also the remoting protocol. The most important concrete implementation is Git, which has turned into the world’s most popular version control system. In order to seamlessly integrate with Irmin, we embarked on an effort to build a complete re-implementation of Git from scratch in pure OCaml.

You can read details of the git 2.0 release on this blog, but from a release engineering perspective we have steadily been fixing corner cases in this implementation. The development trees of ocaml-git include fixes for https+git, listing remotes, authenticated URIs and more.

These fixes are possible because users tried end-to-end use cases that exposed these corner cases, so we’d really like to see more. For example, our friends at Robur have submitted fixes from their integration of it into their upcoming CalDAV engine. The Mirage canopy blog engine can now also push/pull reliably between nodes from pure MirageOS unikernels, which is a huge step.

If you get a chance to try ocaml-git in your infrastructure, please let us know how you get along as we prepare a release of the git libraries with all these fixes (which will be used in Irmin 2.0).

Wodan

Irmin’s storage layer is also well abstracted, so backends other than a Unix filesystem or Git are supported. Irmin can run in highly diverse and OS-free environments, and so we began engineering the Wodan filesystem as a domain-specific filesystem designed for MirageOS, Irmin and modern flash drives. See the OCaml Workshop 2017 abstract on it for more design rationale.

As part of the Irmin 2.0 release, Wodan is also being prepared for a release, and you can find Irmin 2.0 support in the source. If you’d like a standalone block-device based persistence environment for Irmin, please try this out. This is the preferred backend for using Irmin storage in a unikernel.

Tezos and irmin-pack

Another big user of Irmin is the Tezos blockchain, and we have been optimising the persistent space usage of Irmin as their network grows. Because Tezos doesn’t require full Git format support, we created a hybrid backend that grabs the best bits of Git (e.g. the packfile mechanism) and engineered a domain-specific backend tailored for Tezos usage. Crucially, because of the way Irmin is split into clean libraries and OCaml modules, we only had to modify a small part of the codebase and could also re-use elements of the Git 2.0 engineering effort we described above.
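As a rough illustration of why packing saves space, the sketch below stores many content-addressed objects in a single append-only byte stream with a small index, instead of one file per object. This is only the general idea behind a pack: it is not the irmin-pack format, nor Git's packfile format (which additionally uses delta compression), and the `Pack` class and its method names are hypothetical.

```python
import hashlib
import io


class Pack:
    """Toy append-only pack: one byte stream plus a hash -> (offset, length) index."""

    def __init__(self):
        self.data = io.BytesIO()   # stands in for a single on-disk pack file
        self.index = {}            # content hash -> (offset, length)

    def add(self, payload: bytes) -> str:
        key = hashlib.sha1(payload).hexdigest()
        if key not in self.index:  # identical content is stored only once
            offset = self.data.seek(0, io.SEEK_END)
            self.data.write(payload)
            self.index[key] = (offset, len(payload))
        return key

    def find(self, key: str) -> bytes:
        offset, length = self.index[key]
        self.data.seek(offset)
        return self.data.read(length)


pack = Pack()
k1 = pack.add(b"hello")
k2 = pack.add(b"world")
k3 = pack.add(b"hello")   # duplicate content: nothing new is appended
```

Here `k1` and `k3` are the same key, and the stream holds ten bytes in total, so repeated values cost only an index lookup.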

The irmin-pack backend is currently being reviewed and integrated ahead of Irmin 2.0 to provide a significant improvement in disk usage -- more information to come soon. There is a corresponding Tezos branch using the Irmin 2.0 code that will be integrated downstream in Tezos once we complete the Irmin 2.0 tests.

Irmin-GraphQL and “browser Irmin”

Another new area of huge interest to us is GraphQL, which provides frontends with a rich query language for Irmin-hosted applications. Irmin 2.0 includes a built-in GraphQL server so you can manipulate your Git repo via GraphQL.

If you are interested in (for example) compiling elements of Irmin to JavaScript or wasm for use in frontends, then the Irmin 2.0 release makes it significantly easier to support this architecture. We’ve already seen some exploratory efforts report issues when doing this, and we’ve had it working ourselves with CueKeeper on Irmin 1.0, so we are excited by the potential power of applications built using this model. If you have ideas or questions, please get in touch on the issue tracker with your use case.

This post is just the precursor to the Irmin 2.0 release, so expect to hear more about it in the coming weeks and months. This is primarily a call for help from early adopters interested in helping the project out. All of our code is liberally licensed open source, and so this is a good time to tie together end-to-end use cases and help ensure we don’t make any decisions in Irmin 2.0 that run counter to some product you’d like to build. That’s only possible with your feedback, so either get in touch via the issue tracker, on discuss.ocaml.org via the mirageos tag, or just email us.

A huge thank you to all our commercial customers, end users and open source developers who have contributed their time, expertise and financial support to help us achieve our goal of delivering a modern storage stack in the spirit of Git. We look forward to getting Irmin 2.0 into your hands very soon!