Florence and beyond: the future of Tezos storage

by Craig Ferguson on Mar 4th, 2021

In collaboration with Nomadic Labs, Marigold and DaiLambda, we're happy to announce the completion of the next Tezos protocol proposal: Florence.

Tezos is an open-source decentralised blockchain network providing a platform for smart contracts and digital assets. A crucial feature of Tezos is self-amendment: the network protocol can be upgraded dynamically by the network participants themselves. This amendment process is initiated when a participant makes a proposal, which is then subject to a vote. After several years working on the Tezos storage stack, this is our first contribution to a proposal; we hope that it will be the first of many!

As detailed in today's announcement from Nomadic Labs, the Florence proposal contains several important changes, from the introduction of Baking Accounts to major quality-of-life improvements for smart contract developers. Of all of these changes, we're especially excited about the introduction of sub-trees to the blockchain context API. In this post, we'll give a brief tour of what these sub-trees will bring for the future of Tezos. But first, what are they?

Merkle sub-trees

The Tezos protocol runs on top of a versioned tree called the “context”, which holds the chain state (balances, contracts etc.). Ever since the pre-Alpha era, the Tezos context has been implemented using Irmin – an open-source Merkle tree database originally written for use by MirageOS unikernels.

For MirageOS, Irmin’s key strength is flexibility: it can run over arbitrary backends. This is a perfect fit for Tezos, which must be agile and widely-deployable. Indeed, the Tezos shell has already leveraged this agility many times, all the way from initial prototypes using a Git backend to the optimised irmin-pack implementation used today.

But Irmin can do more than just swap backends! It also allows users to manipulate the underlying Merkle tree structure of the store with a high-level API. This “Tree” API enables lots of interesting use-cases of Irmin, from mergeable data types (MRDTs) to zero-knowledge proofs. Tezos doesn't use these more powerful features directly yet; that’s where Merkle proofs come in!

Proofs and lightweight Tezos clients

Since the Tezos context keeps track of the current "state" of the blockchain, each participant needs their own copy of the tree to run transactions against. This context can grow to be very large, so it's important that it be stored as compactly as possible: this goal shaped the design of irmin-pack, our latest Irmin backend.

However, it's possible to reduce the storage requirements even further via the magic of Merkle trees: individuals only need to store a fragment of the root tree, provided they can demonstrate that this fragment is valid by sending “proofs” of its membership to the other participants.
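To make the idea of a membership proof concrete, here is a minimal sketch in Python. It uses a plain binary Merkle tree over a list of values, which is an assumption made for illustration: it is not the tree layout used by Irmin or the Tezos context, and the helper names (`build_levels`, `prove`, `verify`) and the sample entries are hypothetical. A proof is just the list of sibling hashes on the path from a leaf to the root, so a verifier holding only the root hash can check a single value.

```python
import hashlib

def leaf_hash(value: bytes) -> bytes:
    # Domain-separate leaves from internal nodes with a one-byte prefix.
    return hashlib.sha256(b"\x00" + value).digest()

def node_hash(left: bytes, right: bytes) -> bytes:
    return hashlib.sha256(b"\x01" + left + right).digest()

def build_levels(values):
    """Return every level of the tree, from leaf hashes up to the root."""
    levels = [[leaf_hash(v) for v in values]]
    while len(levels[-1]) > 1:
        cur, nxt = levels[-1], []
        for i in range(0, len(cur), 2):
            if i + 1 < len(cur):
                nxt.append(node_hash(cur[i], cur[i + 1]))
            else:
                nxt.append(cur[i])  # an unpaired node is promoted unchanged
        levels.append(nxt)
    return levels

def prove(levels, index):
    """Collect (sibling_is_left, sibling_hash) pairs from leaf to root."""
    proof = []
    for level in levels[:-1]:
        sibling = index ^ 1
        if sibling < len(level):  # promoted nodes have no sibling here
            proof.append((sibling < index, level[sibling]))
        index //= 2
    return proof

def verify(root, value, proof):
    """Recompute the root from one value and its sibling hashes."""
    acc = leaf_hash(value)
    for sibling_is_left, sibling in proof:
        acc = node_hash(sibling, acc) if sibling_is_left else node_hash(acc, sibling)
    return acc == root

# Hypothetical chain state: five entries, of which a client keeps only one.
values = [b"alice:10", b"bob:25", b"carol:7", b"dave:0", b"erin:42"]
levels = build_levels(values)
root = levels[-1][0]

proof = prove(levels, 2)
print(verify(root, b"carol:7", proof))    # the genuine value checks out
print(verify(root, b"carol:700", proof))  # a tampered value does not
```

The proof grows with the depth of the tree rather than with the number of entries, which is what makes storing only a fragment practical.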

This property can be used to support ultra-lightweight Tezos clients, a feature currently being developed by TweagIO. To make this a reality, the Tezos protocol needs fine-grained access to context sub-trees in order to build Merkle proofs out of them. Fortunately, Irmin already supports this! We extended the protocol to understand sub-trees, lifting the power of Merkle trees to the user.

We’re excited to work with TweagIO and Nomadic Labs on lowering the barriers to entering the Tezos ecosystem and look forward to seeing what they achieve with sub-trees!

Efficient Merkle proof representations

Simply exposing sub-trees in the Tezos context API isn’t quite enough: lightweight clients will also need to serialize them efficiently, since proofs must be exchanged over the network to establish trust between collaborating nodes. Enter Plebeia.

Plebeia is an alternative Tezos storage layer – developed by DaiLambda – with strengths that complement those of Irmin. In particular, Plebeia is capable of generating very compact Merkle proofs. This is partly due to its specialized store structure, and partly due to clever optimizations such as path compression and inlining.
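As a rough illustration of path compression in general, the Python sketch below collapses chains of single-child nodes in a nested-dictionary tree into one edge labelled with the whole path. This is not Plebeia's actual data structure or on-disk format; the `compress` and `count_nodes` helpers and the sample paths are hypothetical. The point is only that fewer intermediate nodes means fewer hashes to carry in a proof.

```python
def compress(node):
    """Collapse chains of single-child directories into one multi-segment edge."""
    if not isinstance(node, dict):
        return node  # leaf value: nothing to compress
    out = {}
    for segment, child in node.items():
        path = (segment,)
        # Follow the chain while the child is a directory with exactly one entry.
        while isinstance(child, dict) and len(child) == 1:
            (next_segment, child), = child.items()
            path += (next_segment,)
        out[path] = compress(child)
    return out

def count_nodes(node):
    """Count every node (directories and leaves) in a tree."""
    if not isinstance(node, dict):
        return 1
    return 1 + sum(count_nodes(child) for child in node.values())

# Hypothetical context layout, for illustration only.
tree = {
    "contracts": {"index": {"tz1abc": {"balance": 10, "counter": 3}}},
    "votes": {"current_period": 7},
}
compressed = compress(tree)

print(count_nodes(tree))        # 8 nodes before compression
print(count_nodes(compressed))  # 5 nodes after compression
```

Here the three-step chain `contracts/index/tz1abc` becomes a single edge, so a proof for `balance` would pass through two nodes instead of four.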

We’re working with the DaiLambda team to unite the strengths of Irmin and Plebeia, which will bring built-in Merkle proof support to the Tezos storage stack. The future is bright for Merkle proofs in Tezos!

Baking account migrations

Trees don’t just enable new features; they have a big impact on performance too! Currently, indexing into the context always happens from its root, which duplicates effort when accessing adjacent values deep in the tree. Fortunately, the new sub-trees provide a natural representation for “cursors” into the context, allowing the protocol to optimize its interactions with the storage layer.
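The saving from cursors can be sketched with a toy model in Python. The nested dictionary standing in for the context, the `descend` helper, the step counter and the key names are all assumptions made for illustration; this is not the Tezos context API. Each step stands for one node traversal, which in a real store may involve a lookup on disk.

```python
class StepCounter:
    """Counts how many tree nodes were traversed."""
    def __init__(self):
        self.steps = 0

def descend(node, path, counter):
    """Walk down `path` from `node`, counting one step per segment."""
    for segment in path:
        node = node[segment]
        counter.steps += 1
    return node

# Hypothetical context: three adjacent values stored deep in the tree.
context = {
    "contracts": {
        "index": {
            "tz1abc": {"balance": 10, "counter": 3, "delegate": "tz1xyz"},
        }
    }
}
prefix = ["contracts", "index", "tz1abc"]
fields = ["balance", "counter", "delegate"]

# Without sub-trees: every access starts again from the root.
from_root = StepCounter()
values_from_root = [descend(context, prefix + [f], from_root) for f in fields]

# With a sub-tree cursor: walk the shared prefix once, then read relative to it.
with_cursor = StepCounter()
subtree = descend(context, prefix, with_cursor)
values_with_cursor = [descend(subtree, [f], with_cursor) for f in fields]

print(from_root.steps)    # 12 steps: 3 lookups of depth 4
print(with_cursor.steps)  # 6 steps: 3 for the prefix, then 1 per field
```

Both approaches return the same values; the cursor simply avoids repeating the shared prefix, and the gap widens as more adjacent values are read.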

To take just one example, DaiLambda recently exploited this feature to cut the time needed for the migration that introduces Baking Accounts to the network by a factor of 15! We’ll be teaming up with Nomadic Labs and DaiLambda to ensure that Tezos extracts every bit of performance from its storage.

It's especially exciting to have access to lightning-fast storage migrations, since this enables Tezos to evolve rapidly even as the ecosystem expands.

Storage in other languages

Of course, Tezos isn’t just an OCaml project: the storage layer also has a performant Rust implementation as part of TezEdge. We’re working with Simple Staking to bring Irmin to the Rust community via an FFI toolchain, enabling closer alignment between the different Tezos shell implementations.

Conclusion

All in all, it’s an exciting time to work on Tezos storage, with many open-source collaborators from around the world. We’re especially happy to see Tezos taking greater advantage of Irmin’s features, which will strengthen both projects and help them grow together.

If all of this sounds interesting, you can play with it yourself using the recently-released Irmin 2.5.0. Thanks for reading, and stay tuned for future Tezos development updates!