May 13, 2019 by Thomas Gazagnaire
On the road to Irmin v2
Over the past few months, we have been heavily engaged in release
engineering for Irmin 2.0,
which brings together several years of work on all of its constituent
elements. We began Irmin in late 2013 as a
Git-like distributed and branchable storage substrate
that would let us escape the perils of POSIX filesystems.
The Irmin libraries provide snapshotting, branching and merging
operations over storage and can communicate via Git both on-disk and
remotely. Irmin today therefore consists of many discrete OCaml
libraries that compose together to form a set of mergeable data structures
that can be used in MirageOS unikernels and normal OCaml daemons such
In this blog post, we want to explain some of the ongoing release
engineering and highlight areas where we could use
help from the community to test out pieces (and, we hope, to find
uses for them in your own infrastructure). The overall effort is
tracked in mirage/irmin#658, so
feel free to comment there as well.
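
To make the idea of a mergeable data structure more concrete, here is a minimal sketch of a three-way merge in plain OCaml. It does not use Irmin's API: the names `merge_counter` and `merge_default` are hypothetical and chosen for illustration only. The point is that each branch's value is merged against their common ancestor (`old`), which is the general shape of merge that a branchable store relies on.

```ocaml
(* Illustrative only: these functions are not part of Irmin. *)

(* Merge two diverged integer counters given their common ancestor [old]
   by applying both branches' deltas to the ancestor. *)
let merge_counter ~old a b = a + b - old

(* Fallback merge for values without a domain-specific strategy:
   succeed when at most one side changed, otherwise report a conflict. *)
let merge_default ~old a b =
  if a = b then Ok a
  else if a = old then Ok b
  else if b = old then Ok a
  else Error `Conflict

let () =
  (* Ancestor is 10; one branch added 5, the other added 2. *)
  assert (merge_counter ~old:10 15 12 = 17);
  (* Only the second branch changed the value, so it wins. *)
  assert (merge_default ~old:"x" "x" "y" = Ok "y");
  (* Both branches changed it differently: a conflict. *)
  assert (merge_default ~old:"x" "y" "z" = Error `Conflict)
```

A counter can always be merged automatically because its changes commute, while an opaque value can only be merged when one side left it untouched.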
Irmin is parameterised over the exact mechanisms it uses to communicate
between nodes, covering both the on-disk format and the remote
protocol. The most important concrete implementation is Git, which
has become the world’s most popular version control system. To
integrate it seamlessly with Irmin, we embarked on an effort to
build a complete re-implementation of
Git from scratch in pure OCaml.
You can read details of the git 2.0 release
on this blog, but from a release engineering perspective we have steadily
been fixing corner cases in this implementation. The development
ocaml-git trees feature fixes to https+git,
for listing remotes, supporting
authenticated URIs and
These fixes are possible because users tried end-to-end use cases that
exposed these corner cases, so we’d really like to see more. For
example, our friends at Robur have submitted fixes
from their integration of ocaml-git into their upcoming CalDAV engine.
The Mirage Canopy blog engine can now also
push and pull reliably between nodes from pure MirageOS unikernels, which
is a huge step.
If you get a chance to try ocaml-git in your infrastructure, please
let us know how you get along as we prepare a release of the git
libraries with all these fixes (which will be used in Irmin 2.0).
Irmin’s storage layer is also well abstracted, so backends other than
a Unix filesystem or Git are supported. Irmin can run in highly
diverse and OS-free environments, and so we began engineering the
Wodan filesystem as a
domain-specific filesystem designed for MirageOS, Irmin and modern
flash drives. See the OCaml Workshop 2017 abstract on
it for more design
As part of the Irmin 2.0 release, Wodan is also being prepared for a
release, and you can find Irmin 2.0
in the source. If you’d like a standalone block-device based
persistence environment for Irmin, please try this out. This is the
preferred backend for using Irmin storage in a unikernel.
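
One way to picture this backend abstraction is as an OCaml functor: the store logic is written once against a small signature, and any module satisfying that signature can be plugged in. The sketch below is self-contained and deliberately simplified. `BACKEND`, `Mem` and `Make_store` are hypothetical names and are not Irmin's or Wodan's real signatures, and the MD5-based `Digest` module from the standard library stands in for whatever hash a real backend would use.

```ocaml
(* Hypothetical names throughout; this is not Irmin's real interface. *)

(* What a storage backend must provide: a mutable string key/value map. *)
module type BACKEND = sig
  type t
  val create : unit -> t
  val set : t -> string -> string -> unit
  val find : t -> string -> string option
end

(* One possible backend: an in-memory hash table. A filesystem or
   block-device backend would implement the same signature. *)
module Mem : BACKEND = struct
  type t = (string, string) Hashtbl.t
  let create () = Hashtbl.create 16
  let set t k v = Hashtbl.replace t k v
  let find t k = Hashtbl.find_opt t k
end

(* A content-addressed store built on top of any backend: each value is
   stored under the hex digest of its contents. *)
module Make_store (B : BACKEND) = struct
  let create = B.create
  let add t value =
    let key = Digest.to_hex (Digest.string value) in
    B.set t key value;
    key
  let find = B.find
end

module Store = Make_store (Mem)

let () =
  let t = Store.create () in
  let key = Store.add t "hello" in
  assert (Store.find t key = Some "hello");
  assert (Store.find t "missing" = None)
```

Swapping `Mem` for another module of type `BACKEND` changes where the data lives without touching `Make_store`.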
Tezos and irmin-pack
Another big user of Irmin is the Tezos blockchain,
and we have been optimising the persistent space usage of Irmin as their
network grows. Because Tezos doesn’t require full Git format support,
we engineered a hybrid, domain-specific backend that takes the best
bits of Git (e.g. the packfile mechanism) and tailors them
for Tezos usage. Crucially, because of the way Irmin is split into
clean libraries and OCaml modules, we only had to modify a small part
of the codebase and could also re-use elements of the Git 2.0
engineering effort we described above.
The irmin-pack backend is
currently being reviewed and integrated ahead of Irmin 2.0 to provide
a significant improvement in disk usage -- more information to come soon.
There is a corresponding Tezos branch
using the Irmin 2.0 code that will be integrated downstream in Tezos
once we complete the Irmin 2.0 tests.
Irmin-GraphQL and “browser Irmin”
Another new area of huge interest to us is
GraphQL, which gives frontends a rich
query language for Irmin-hosted applications. Irmin 2.0 includes a
built-in GraphQL server so you can manipulate your Git repo via
If you are interested in (for example) compiling elements of Irmin to
makes it significantly easier to support this architecture. We’ve
already seen some exploratory efforts report issues
when doing this, and we’ve had it working ourselves in Irmin 1.0 with CueKeeper,
so we are excited by the potential power of applications built using
this model. If you have ideas or questions, please get in touch on the
issue tracker with your
This post is just the precursor to the Irmin 2.0 release, so expect to
hear more about it in the coming weeks and months. This is primarily
a call for help from early adopters interested in helping the project
out. All of our code is liberally licensed open source, and so this
is a good time to tie together end-to-end use cases and help ensure we
don’t make any decisions in Irmin 2.0 that run counter to some product
you’d like to build. That’s only possible with your feedback, so
either get in touch via the issue tracker, on
discuss.ocaml.org via the
or just email us.
A huge thank you to all our commercial customers, end users and open
source developers who have contributed their time, expertise and
financial support to help us achieve our goal of delivering a modern
storage stack in the spirit of Git. We look forward to getting Irmin
2.0 into your hands very soon!