Irmin: September 2020 update
This post will survey the latest design decisions and performance improvements
made to irmin-pack, the Irmin storage backend used by
Tezos. Tezos is an open-source blockchain technology,
written in OCaml, which uses many libraries from the MirageOS ecosystem. For
more context on the design of irmin-pack
and how it is optimised for the Tezos
use-case, you can check out our previous blog post.
This post showcases the improvements to irmin-pack
since its initial
deployment on Tezos:
- Faster read-only store instances
- Improved automatic flushing
- Staging generic serialisation operations
- More control over Index.merge
- Clearing stores
Faster read-only store instances
The Tezos use-case of Irmin requires both read-only and read-write
store handles, with multiple readers and a single writer all accessing the same
Irmin store concurrently. These store handles are held by different processes
(with disjoint memory spaces) so the instances must use files on disk to
synchronise, ensuring that the readers never miss updates from the writer. The
writer instance automatically flushes its internal buffers to disk at regular
intervals, allowing the readers to regularly pick up replace
calls.
Until recently, each time a reader looked for a value – be it a commit, a node,
or a blob – it first checked if the writer had flushed new contents to disk. This
ensured that the readers always see the latest changes from the writer. However,
if the writer isn't actively modifying the regions being read, the readers make
one unnecessary system call per find. The higher the rate of reads, the more
time is lost to these synchronisation points. This is particularly problematic
in two use-cases:
- Taking snapshots of the store. Tezos supports exporting portable snapshots of the store data. Since this operation only reads historic data in the store (traversing backwards from a given block hash), it's never necessary to synchronise with the writer.
- Bulk writes. It's sometimes necessary for the writer to dump lots of new data to disk at once (for instance, when adding a commit to the history). In these cases, any readers will repeatedly synchronise with the disk even though they don't need to do so until the bulk operation is complete. More on this in the coming months!
To better support these use-cases, we dropped the requirement for readers to
maintain strict consistency with the writer instance. Instead, readers can call
an explicit sync
function only when they need to see the latest concurrent
updates from the writer instance.
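Below is a minimal sketch of the intended usage pattern, assuming a Store module built on the irmin-pack backend and a sync function taking the repository (the exact name and signature may differ between irmin-pack releases):

let find_all t keys =
  (* Pick up the writer's latest flushed changes once... *)
  Store.sync (Store.repo t);
  (* ...then serve every [find] from the already-synchronised state,
     avoiding one extra system call per lookup. *)
  Lwt_list.map_s (Store.find t) keys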
In our benchmarks, there is a clear speed-up for find
operations from readers:
[RO] Find in random order with implicit syncs
Total time: 67.276527
Operations per second: 148640.253086
Mbytes per second: 6.378948
Read amplification in syscalls: 3.919739
Read amplification in bytes: 63.943734
[RO] Find in random order with only one call to sync
Total time: 40.817458
Operations per second: 244993.208543
Mbytes per second: 10.513968
Read amplification in syscalls: 0.919588
Read amplification in bytes: 63.258072
Not only is it faster, we can also see from the Read amplification in syscalls
column that fewer system calls are made. The benchmark consists of reading
10,000,000 entries of 45 bytes each.
Relevant PRs: irmin #1008, index #175, index #198 and index #203.
Better flushing for the read-write instance
Irmin-pack uses an index to speed up find calls: a pack file is used to store
pairs of (key, value) and an index records the address in pack where a key is
stored. A read-write instance has to write both the index and the pack file for
a read-only instance to be able to find a value. Moreover, the order in which
the data is flushed to disk for the two files matters: the address of a
(key, value) pair cannot be written before the pair itself, otherwise the
read-only instance could read an address for a non-existent (key, value) pair.
But both pack and index have internal buffers that accumulate data, in order to
reduce the number of system calls, and both decide independently when to flush
those buffers to disk.
We introduced a flush_callback argument in index, which registers a callback
for whenever the index decides to flush. irmin-pack uses this callback to flush
its pack file first, resolving the issue of the dangling address.
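Here is a sketch of how the callback enforces that ordering; it is not irmin-pack's actual code. Index is assumed to be the result of applying the index library's functor (for example Index_unix.Make (Key) (Value)), flush_pack_file is a hypothetical helper that flushes the pack file's buffer to disk, and the exact placement of the flush_callback argument may differ between index versions:

let create_index ~pack_file =
  Index.v
    ~log_size:10_000_000
    (* The callback runs just before the index flushes its own buffers,
       so the (key, value) pairs reach the pack file before their
       addresses become visible to readers. *)
    ~flush_callback:(fun () -> flush_pack_file pack_file)
    "/path/to/store/index"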
Relevant PRs: index #189, index #216, irmin #1051.
Faster serialisation for Irmin.Type
Irmin uses a library of generic operations: functions that take a runtime representation of a type and derive some operation on that type. These are used in many places to automatically derive encoders and decoders for our types, which are then used to move data to and from disk. For instance:
val decode : 'a t -> string -> 'a
(** [decode t] is the binary decoder of values represented by [t]. *)
(** Read an integer from a binary-encoded file. *)
let int_of_file ~path = open_in_bin path |> input_line |> decode Irmin.Type.int32
The generic decode
takes a representation of the type int32
and uses
this to select the right binary decoder. Unfortunately, we pay the cost of this
runtime specialisation every time we call int_of_file. If we're invoking
the decoder for a particular type very often – such as when serialising store
values – it's more efficient to specialise decode
once:
(** Specialised binary decoder for integers. *)
let decode_int32 = decode Irmin.Type.int32
let int_of_file_fast ~path = open_in_bin path |> input_line |> decode_int32
The question then becomes: how can we change decode
to encourage it to be
used in this more-efficient way? We can add a type wrapper, called Staged, to
prevent the user from passing two arguments to decode at once:
module Staged : sig
  type +'a t
  val stage : 'a -> 'a t
  val unstage : 'a t -> 'a
end
val decode : 'a t -> (string -> 'a) Staged.t
(** [decode t] needs to be explicitly unstaged before being used. *)
By forcing the user to add an explicit Staged.unstage call when using this
function, we're encouraging them to hoist such operations out of their
hot-loops:
(** The slow implementation no longer type-checks: *)
let int_of_file ~path = open_in_bin path |> input_line |> decode Irmin.Type.int32
(* Error: This expression has type (string -> 'a) Staged.t
* but an expression was expected of type string -> 'a *)
(* Instead, we know to pull [Staged.t] values out of hot-loops: *)
let decode_int32 = Staged.unstage (decode Irmin.Type.int32)
let int_of_file_fast ~path = open_in_bin path |> input_line |> decode_int32
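The same pattern pays off when decoding many values in a row, which is the situation when serialising store values. A small usage sketch, where entries is assumed to be a list of binary-encoded strings:

let decode_all entries =
  (* Specialise the decoder once, outside the loop... *)
  let decode_int32 = Staged.unstage (decode Irmin.Type.int32) in
  (* ...then reuse it for every entry. *)
  List.map decode_int32 entries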
We made similar changes to the performance-critical generic functions in
Irmin.Type, and observed significant performance improvements.
We also added benchmarks for serialising various types.
Relevant PRs: irmin #1030 and irmin #1028.
There are other interesting factors at play, such as altering decode
to
increase the efficiency of the specialised decoders; we leave this for a future
blog post.
More control over Index.merge
index regularly performs a maintenance operation, called merge, to ensure fast
look-ups while keeping a small memory footprint. This operation is concurrent
with most of the other functions; it is, however, not concurrent with itself: a
second merge needs to wait for a previous one to finish. When writing big
chunks of data very often, merge operations become blocking. To help measure
and detect a blocking merge, we added calls to the index API to check whether a
merge is ongoing, and to time it.
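A hedged sketch of how such introspection might be used; the exact function names in the index API may differ from the is_merging call shown here, and index stands for an open index instance:

let warn_if_merge_in_progress index =
  (* Detect a long-running merge so that callers can see why writes
     may be blocking. *)
  if Index.is_merging index then
    Logs.warn (fun l -> l "an index merge is in progress; writes may block")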
We mentioned that merge is concurrent with most of the other functions in
index. One notable exception was close, which had to wait for any ongoing merge
to finish before closing the index. Now close interrupts an ongoing merge, but
still leaves the index in a clean state.
Relevant PRs: index #185, irmin #1049 and index #215.
Clearing stores
Another feature we recently added is the possibility to clear the store. It is
implemented by removing the old files on disk and opening fresh ones. However,
in irmin-pack the read-only instance has to detect that a clear occurred. To do
this, we added a generation number to the header of the files used by an
irmin-pack store, which is increased by the clear operation. A generation
change signals to the read-only instance that it needs to close the file and
open it again, to be able to read the latest values.
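A simplified sketch of this detection logic, not irmin-pack's actual implementation; read_generation and reopen are hypothetical helpers passed in to stand for the real file operations:

type ro_instance = { path : string; mutable generation : int64 }

let sync_after_clear ~read_generation ~reopen t =
  let current = read_generation t.path in
  if current <> t.generation then begin
    (* The writer cleared the store: the old files were replaced, so the
       reader must close and reopen them to see the fresh contents. *)
    reopen t;
    t.generation <- current
  end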
As the header of the files on disk changed with the addition of the clear
operation, irmin-pack stores created prior to this change are no longer
supported. We added a function to migrate stores from the previous version
(version 1) to the new version (version 2) of the store format. You can call
this migration function as follows:
let open_store () = Store.Repo.v config in
Lwt.catch open_store (function
  | Irmin_pack.Unsupported_version `V1 ->
      Logs.app (fun l -> l "migrating store to version 2");
      Store.migrate config;
      Logs.app (fun l -> l "migration ended, opening store");
      open_store ()
  | exn -> Lwt.fail exn)
Relevant PRs: index #211, irmin #1047, irmin #1070 and irmin #1071.
Conclusion
We hope you've enjoyed this discussion of our recent work. Stay tuned for our next Tezos / MirageOS development update! Thanks to our commercial customers, users and open-source contributors for making this work possible.