The New Replaying Benchmark in Irmin

by Nicolas Goguey on Oct 4th, 2021

As mentioned in our Tezos Storage / Irmin Summer 2021 Update on the Tezos Agora forum, the Irmin team's goal has been to improve Irmin's performance in order to speed up the Baking Account migration process in Octez, and we managed to make it 10x faster in the first quarter of 2021. Since then, we've been working on a new benchmark program for Irmin that's based on the interactions between Irmin and Octez. This won't just help make Irmin even faster; it will also help speed up the Tezos blockchain process and enable us to monitor Irmin's behavior in Octez.

Octez is the Tezos node implementation that uses Irmin to store the blockchain state, so Irmin is a core component of Octez that's responsible for virtually all the filesystem operations. Whether a node is launched to produce new blocks (aka “bake”) or just to participate in peer-to-peer sharing of existing blocks, it must first update itself by rebuilding blocks individually until it reaches the head of the blockchain. This first phase is called bootstrapping, and once it reaches the blockchain head, we say it has been bootstrapped. Currently, the bootstrapped phase processes 2 blocks per minute, which is the rate at which the Tezos blockchain progresses. The next goal is to increase that rate to 12 blocks per minute.

Irmin stores the content of the Tezos blockchain on disk using the irmin-pack library. There is a one-to-one correspondence between Tezos blocks and Irmin commits. Each time Tezos produces a block, Irmin produces a commit, and then the Tezos block hash is computed using the Irmin commit hash. The Irmin developers are working on improving irmin-pack's performance, which in turn will improve the performance of Octez.

A benchmark program is considered “fair” when it's representative of how the benchmarked code is used in the real world (for example, in its access patterns to Irmin). A standard database benchmark would first insert random data and then remove it. Such a synthetic benchmark would fail to reproduce the bottlenecks that occur when insertions and removals are interleaved. Our solution to “fairness” is radical: replaying. Within a sandboxed environment, we replay a real-world situation.

Basically, our new benchmark program exercises the benchmarked code and records statistics for later analysis. The program is stored in the irmin-bench library and makes use of operation traces (called action traces) recorded while Octez runs with Irmin. Later, the program replays the recorded operations one at a time while simultaneously recording tonnes of statistics (called stat traces). Data analysis of the stat traces may reveal many interesting facts about the behaviour of Irmin, especially if we tweak:

  • the configuration of Irmin (e.g., what’s the impact of doubling the size of a certain cache?)
  • the replay parameters (e.g., does Irmin's performance decay over time? Does irmin-pack perform as well after 24 hours of replay as after 1 minute of replay?)
  • the hardware (e.g., does irmin-pack perform well on a Raspberry Pi?)
  • the code of Irmin (e.g., does this PR have an impact on performance?)

This benchmarking process is similar to the record-replay feature available with TezEdge.
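
To make the record-then-replay idea concrete, here is a minimal Python sketch. It is not the irmin-bench code: a plain dict stands in for Irmin, the three-entry trace is hypothetical, and the `replay` function and the shape of its statistics are names invented for this illustration.

```python
import time
from collections import defaultdict


def replay(actions, store):
    """Replay recorded actions one at a time and time each of them.

    `actions` is a list of (operation, key, value) tuples and `store` is a
    dict standing in for the benchmarked storage (an assumption of this sketch).
    """
    stats = defaultdict(lambda: {"count": 0, "seconds": 0.0})
    for op, key, value in actions:
        start = time.perf_counter()
        if op == "add":
            store[key] = value
        elif op == "find":
            store.get(key)
        elif op == "remove":
            store.pop(key, None)
        else:
            raise ValueError(f"unknown operation: {op}")
        elapsed = time.perf_counter() - start
        # The per-operation totals play the role of a (tiny) stat trace.
        stats[op]["count"] += 1
        stats[op]["seconds"] += elapsed
    return dict(stats)


# Hypothetical action trace; the real one holds about a billion entries.
trace = [("add", "a/b", b"1"), ("find", "a/b", None), ("remove", "a/b", None)]
store = {}
stat_trace = replay(trace, store)
for op, s in stat_trace.items():
    print(op, s["count"], f"{s['seconds']:.6f}s")
```

Because the trace is fixed, running the same `replay` call against a different store configuration, a different machine, or a patched store yields stat traces that can be compared directly, which is the point of the four "tweaks" listed above.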

Recording the Action Trace

By adding logs to Tezos, we can record the Tezos-Irmin interactions and thus capture the Irmin “view” of Tezos. We’ve recorded action traces during the bootstrapping phase of Tezos nodes, which started from Genesis—the name of the very first Tezos block inserted into an empty Irmin store.

The interaction surface between Irmin and Octez is quite simple, so we were able to reduce it to eight (8) elementary operations:

  • checkout, to pull an Irmin tree from disk;
  • find, mem and mem_tree, read-only operations on an Irmin tree;
  • add, remove and copy, write-only operations on an Irmin tree;
  • commit, to push an Irmin tree to disk.

It’s important to remember that Irmin behaves much like Git. It has built-in snapshotting and is compatible with Git itself when using the irmin-git library. In fact, these operations are very similar to Git, too.
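
As a rough illustration of how small this interaction surface is, the eight operations can be modelled as a single enumeration. The Python sketch below is only a toy model: the `Op` type, the `kind` helper, the "disk" label for checkout and commit, and the sample trace are all assumptions made for this example, not part of Irmin or Octez.

```python
from collections import Counter
from enum import Enum


class Op(Enum):
    CHECKOUT = "checkout"
    FIND = "find"
    MEM = "mem"
    MEM_TREE = "mem_tree"
    ADD = "add"
    REMOVE = "remove"
    COPY = "copy"
    COMMIT = "commit"


READ_OPS = frozenset({Op.FIND, Op.MEM, Op.MEM_TREE})
WRITE_OPS = frozenset({Op.ADD, Op.REMOVE, Op.COPY})


def kind(op):
    """Classify an operation as in the list above."""
    if op in READ_OPS:
        return "read"
    if op in WRITE_OPS:
        return "write"
    return "disk"  # checkout pulls a tree from disk, commit pushes it back


def summarize(trace):
    """Count how many operations of each kind a trace contains."""
    return Counter(kind(op) for op in trace)


# Hypothetical trace for one block: checkout, some reads and writes, commit.
toy_trace = [Op.CHECKOUT, Op.MEM, Op.FIND, Op.ADD, Op.COPY, Op.COMMIT]
summary = summarize(toy_trace)
print(summary["read"], summary["write"], summary["disk"])  # prints: 2 2 2
```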

Sequence of Operations

To illustrate further, here's a concrete example of an operation sequence inside an action trace:

[Figure: example sequence of operations inside an action trace]

This shows Octez’s first interaction with Irmin at the very beginning of the blockchain! The first block, Genesis, is quite small (it ends at operation #5), but the second one is massive (it ends at operation #309273). It contains no transactions because it only sets up the entire structure of the tree. It precedes the beginning of Tezos' initial protocol called “Alpha I”.

Benchmark Benefits

Our benchmark results convey the sheer magnitude of the Tezos blockchain and the role that Irmin plays within it. We’ve recorded a trace that covers the blocks from the beginning of the blockchain in June 2018 all the way up to May 2021. It weighs 96GB.

Although it took 34 months for Tezos to reach that state, bootstrapping so far takes only 170 hours, and replaying it takes a mere 37 hours on a section of the blockchain that contains 1,343,486 blocks. On average, this corresponds to 1 block per minute when the blocks were created, 132 per minute when bootstrapping, and 611 per minute during replay.

On this particular section of the blockchain, Octez had 1,089,853,521 interactions with Irmin. On average, this corresponds to 12 per second when the blocks were created, 1782 per second during bootstrapping, and 8258 per second during replay.
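
The averages above are simple ratios, and the short Python sketch below recomputes them from the totals quoted in the text. The helper names are made up for this example. Because the durations are given as whole hours, the recomputed replay figures come out slightly lower than the published ones, presumably because the published averages were derived from unrounded durations.

```python
BLOCKS = 1_343_486
INTERACTIONS = 1_089_853_521
BOOTSTRAP_HOURS = 170
REPLAY_HOURS = 37


def per_minute(count, hours):
    """Average number of events per minute over a duration in hours."""
    return count / (hours * 60)


def per_second(count, hours):
    """Average number of events per second over a duration in hours."""
    return count / (hours * 3600)


print(round(per_minute(BLOCKS, BOOTSTRAP_HOURS)))        # 132 blocks/min
print(round(per_minute(BLOCKS, REPLAY_HOURS)))           # 605 blocks/min
print(round(per_second(INTERACTIONS, BOOTSTRAP_HOURS)))  # 1781 interactions/s
print(round(per_second(INTERACTIONS, REPLAY_HOURS)))     # 8182 interactions/s
print(round(BOOTSTRAP_HOURS / REPLAY_HOURS, 1))          # 4.6x faster than bootstrapping
```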

The chart below demonstrates how many of each Irmin operation occur per block (on average):

[Figure: average number of each Irmin operation per block]

This next chart displays where the time is spent during replay:

[Figure: where the time is spent during replay]

With irmin-pack, an OCaml thread managed by the index library (the merge thread) runs concurrently with the main thread, so a fraction of the durations shown above is actually spent in that thread. Refer to this blog post for more details on index's merges.

The following chart illustrates how memory usage evolves during replay:

[Figure: memory usage during replay]

On a logarithmic scale, this last chart shows the evolution of the write amplification, which indicates the amount of rewriting (e.g., at the end of the replay, 20TB of data have been written to disk in order to create a store that weighs 73GB).

[Figure: write amplification during replay, logarithmic scale]
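
For readers unfamiliar with the metric, write amplification can be read as the ratio between the bytes written to disk and the size of the resulting store. The Python sketch below applies that definition to the end-of-replay figures quoted above. It assumes decimal units (1TB = 1000GB), and the function name is invented for this example.

```python
def write_amplification(bytes_written, store_size):
    """Ratio of bytes written to disk over the final size of the store."""
    return bytes_written / store_size


TB = 10**12  # decimal units, an assumption of this sketch
GB = 10**9

ratio = write_amplification(20 * TB, 73 * GB)
print(round(ratio))  # prints: 274
```

In other words, under these assumptions each byte that ends up in the store was written roughly 274 times over the course of the replay.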

The merge operations of the index library are the source of this poor write amplification. The Irmin team is working hard on improving this metric:

  • on the one hand, the new structured keys feature of the upcoming Irmin 3.0 release will help to reduce the pressure on the index library,
  • on the other hand, we are working on algorithmic improvements of index itself.

Another nice way to use the trace is for testing. When replaying a trace, we can recompute the commit hashes and check that they correspond to the trace hashes, so the benchmark acts as additional tests to ensure we don't compromise the hashes computed in Tezos.
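
The testing idea can be sketched in a few lines of Python. This is a toy model, not how Irmin hashes commits: a dict stands in for the tree, SHA-256 over the sorted entries stands in for the commit hash, and the trace format and function names are hypothetical.

```python
import hashlib


def commit_hash(tree):
    """Toy commit hash: SHA-256 over the sorted (key, value) pairs."""
    h = hashlib.sha256()
    for key in sorted(tree):
        h.update(key.encode() + b"\0" + tree[key] + b"\0")
    return h.hexdigest()


def apply(tree, op, key, value):
    """Apply one write operation to the in-memory tree."""
    if op == "add":
        tree[key] = value
    elif op == "remove":
        tree.pop(key, None)
    else:
        raise ValueError(f"unknown operation: {op}")


def replay_and_check(trace):
    """Replay a trace and compare each recomputed hash with the recorded one."""
    tree = {}
    checked = 0
    for entry in trace:
        if entry[0] == "commit":
            expected = entry[1]
            if commit_hash(tree) != expected:
                raise AssertionError(f"commit {checked}: hash mismatch")
            checked += 1
        else:
            apply(tree, *entry)
    return checked


# Hypothetical trace: each commit entry carries the hash seen at recording time.
trace = [
    ("add", "contracts/alice", b"10"),
    ("commit", commit_hash({"contracts/alice": b"10"})),
    ("remove", "contracts/alice", None),
    ("commit", commit_hash({})),
]
print(replay_and_check(trace))  # prints: 2
```

Any change to the store that alters the content of a tree would make the replay fail at the first affected commit, which is what makes the benchmark double as a regression test.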

Complex changes to Tezos can be simulated first in Irmin. For example, the path flattening feature in Tezos (merged in August 2021) can now be tested earlier in the process with our benchmark. Prior to the trace benchmarks, we first had to make the changes in Tezos to understand their repercussions on Irmin directly from the Tezos benchmarks.

Lastly, we continue to test alternative libraries and compare them with the ones integrated in Tezos; however, using these alternative libraries to build Tezos nodes has proven to be more complicated than merely adding them in Irmin and running our benchmarks. While testing continues on most new libraries, we can definitely use replays to compare our new cactus library as a replacement for our index library.

Future Directions

While the action trace recording was only made possible on a development branch of Octez, we would next like to upstream the feature to the main branch of Octez, which would give all users the option to record Tezos-Irmin interactions. This would simplify bug reporting overall.

Although the first version only deals with the bootstrapping phase of Tezos, an upcoming goal is to make it possible to benchmark the bootstrapped phase of Tezos as well. Additionally, we plan to replay the multiprocess aspects of a Tezos node in the near future.

The first stable version of this benchmark has existed in Irmin’s development branch since Q2 2021, and we will release it as part of irmin-bench for Irmin 3.0 in Q4 2021. This release will allow integration into the Sandmark OCaml benchmarking suite.

Follow the Tarides blog for future Irmin updates.