Decompress: The New Decompress APIby Romain Calascibetta on Aug 26th, 2019
RFC 1951 is one of the most used standards. Indeed,
when you launch your Linux kernel, it inflates itself according zlib
standard, a superset of RFC 1951. Being a widely-used standard, we decided to
produce an OCaml implementation. In the process, we learned many lessons about
developing OCaml where we would normally use C. So, we are glad to present
One of the many users of RFC 1951 is Git, which uses it to pack data
objects into a PACK file. At the request of @samoht,
decompress appeared some years ago as a Mirage-compatible replacement for zlib
to be used for compiling a MirageOS unikernel with
ocaml-git. Today, this little project passes a major release with
substantial improvements in several domains.
decompress provides an API for inflating and deflating flows
. The main
goal is to provide a platform-agnostic library: one which may be compiled on
implementations like zstd or lz4, but we can play some
optimisation tricks to help bridge the gap. Additionally, OCaml can protect the
user against lot of bugs via the type-system and the runtime too (e.g. using
array bounds checking).
ocaml-tls was implemented partly in
response to the famous failure of
openssl; a vulnerability
which could not exist in OCaml.
: A flow, in MirageOS land, is an abstraction which wants to receive
and/or transmit something under a standard. So it's usual to say a TLS-flow
The API should be the most difficult part of designing a library - it reveals what we can do and how we should do it. In this way, an API should:
- constrain the user to avoid security issues; too much freedom can be a bad
thing. As an example, consider the
Hashtbl.createfunction, which allows the user to pass
~random:falseto select a fixed hash function. The resulting hashtable suffers deterministic key collisions, which can be exploited by an attacker.
An example of good security-awareness in API design can be seen in digestif, which provided an
unsafe_compareinstead of the common
eqaf.0.5). In this way, it enforced the user to create an alias of it if they want to use a hash in a
Map– however, by this action, they should know that they are not protected against a timing-attack.
- allow some degrees of freedom to fit within many environments; a
constrained API cannot support a hostile context. For example, when compiling
to an ESP32 target, even small details such as the length of a stream
input buffer must be user-definable. When deploying to a server, memory
consumption should be deterministic.
Of course, this is made difficult when too much freedom will enable misuse of the API – an example is dune which wants consciously to limit the user about what they can do with it.
- imply an optimal design of how to use it. Possibilities should serve the user, but these can make the API harder to understand; this is why documentation is important. Your API should tell your users how it wants to be treated.
From our experiences with protocol/format, one design stands out: the dbuenzli API. If you look into some famous libraries in the OCaml eco-system, you probably know uutf, jsonm or xmlm. All of these libraries provide the same API for computing a Unicode/JSON/XML flow – of course, the details are not the same.
From a MirageOS perspective, even if they use the
abstraction rather than a Mirage flow, these libraries
are system-agnostic since they let the user to choose input and output buffers.
Most importantly, they don't use the standard OCaml
Unix module, which cannot
be used in a unikernel.
The APIs are pretty consistent and try to do their best-effort
decoding. The design has a type state which represents the current system
status; the user passes this to
encode to carry out the processing.
Of course, these functions have a side-effect on the state internally, but
this is hidden from the user. One advantage of including states in a design is
that the underlying implementation is very amenable to compiler optimisations (e.g.
tail-call optimisation). Internally, of course, we have a porcelain
implementation where any details can have an rational explanation.
In the beginning,
decompress wanted to follow the same interface without the
mutability (a choice about performances) and it did. Then, the hard test was to
use it in a bigger project; in this case, ocaml-git. An iterative
process was used to determine what was really needed, what we should not provide
(like special cases) and what we should provide to reach an uniform API that is
not too difficult to understand.
From this experience, we finalised the initial
decompress API and it did not
change significantly for 4 versions (2 years).
: best-effort means an user control on the error branch where we don't
leak exception (or more generally, any interrupts)
: porcelain means implicit invariants held in the mind of the programmer
(or the assertions/comments).
decompress keeps the same inflation logic, but drastically changes the
deflator to make the flush operation clearer. For many purposes, people don't
want to hand-craft their compressed flows – they just want
to_string functions. However, in some contexts (like a PNG
encoder/decoder), the user should be able to play with
decompress in detail
(OpenPGP needs this too in RFC 4880).
decompress and zlib use Huffman coding, an algorithm
for building a dictionary of variable-length codewords for a given set of
symbols (in this case, bytes). The most common byte is assigned the shortest bit
sequence; less common bytes are assigned longer codewords. Using this
dictionary, we just translate each byte into its codeword and we should achieve
a good compression ratio. Of course, there are other details, such as the fact
that all Huffman codes are prefix-free. The compression can be
taken further with the LZ77 algorithm.
The zlib format, a superset of the RFC 1951 format, is easy to understand. We will only consider the RFC 1951 format, since zlib adds only minor details (such as checksums). It consists of several blocks: DEFLATE blocks, each with a little header, and the contents. There are 3 kinds of DEFLATE blocks:
- a FLAT block; no compression, just a blit from inputs to the current block.
- a FIXED block; compressed using a pre-computed Huffman code.
- a DYNAMIC block; compressed using a user-specified Huffman code.
The FIXED block uses a Huffman dictionary that is computed when the OCaml runtime is initialised. DYNAMIC blocks use dictionaries specified by the user, and so these must be transmitted alongside the data (after being compressed with another Huffman code!). The inflator decompresses this DYNAMIC dictionary and uses it to do the reverse translation from bit sequences to bytes.
The design of the inflator did not change a lot from the last version of
decompress. Indeed, it's about to take an input, compute it and return an
output like a flow. Of course, the error case can be reached.
So the API is pretty-easy:
val decode : decoder -> [ `Await | `Flush | `End | `Malformed of string ]
As you can see, we have 4 cases: one which expects more inputs (
second which asks to the user to flush internal buffer (
when we reach the end of the flow and the
Malformed case when we encounter an
For each case, the user can do several operations. Of course, about the
case, they can refill the contents with an other inputs buffer with:
val src : decoder -> bigstring -> off:int -> len:int -> unit
This function provides the decoder a new input with
len bytes to read
off in the given
Flush case, the user wants some information like how many bytes are
available in the current output buffer. Then, we should provide an action to
flush this output buffer. In the end, this output buffer should be given by
the user (how many bytes they want to allocate to store outputs flow).
type src = [ `Channel of in_channel | `Manual | `String of string ] val dst_rem : decoder -> int val flush : decoder -> unit val decoder : src -> o:bigstring -> w:bigstring -> decoder
The last function,
decoder, is the most interesting. It lets the user, at the
beginning, choose the context in which they want to inflate inputs. So they
src, where come from inputs flow
o, output buffer
w, window buffer
o will be used to store inflated outputs,
dst_rem will give to us how many
bytes inflator stored in
flush will just set
decoder to be able to
recompute the flow.
w is needed for lz77 compression. However, as we said, we let
the user give us this intermediate buffer. The idea behind that is to let the
user prepare an inflation. For example, in ocaml-git, instead of
w systematically when we want to decompress a Git object, we
w one time per threads and all are able to use it and re-use it.
In this way, we avoid systematic allocations (and allocate only once time) which
can have a serious impact about performances.
The design is pretty close to one idea, a description step by the
function and a real computation loop with the
decode function. The idea is to
prepare the inflation with some information (like
o) before the main
(and the most expensive) computation. Internally we do that too (but it's mostly
about a macro-optimization).
It's the purpose of OCaml in general, be able to have a powerful way to describe something (with constraints). In our case, we are very limited to what we need to describe. But, in others libraries like angstrom, the description step is huge (describe the parser according to the BNF) and then, we use it to the main computation, in the case of angstrom, the parsing (another example is [cmdliner][cmdliner]).
This is why
decoder can be considered as the main function where
be wrapped under a stream.
The deflator is a new (complex) deal. Indeed, behind it we have two concepts:
- the encoder (according to RFC 1951)
- the compressor
For this new version of
decompress, we decide to separate these concepts where
one question leads all: how to put my compression algorithm? (instead to use
The first idea was to provide a functor which expects an implementation of the compression algorithm. However, the indirection of a functor comes with (big) performance cost. Consider the following functor example:
module type S = sig type t val add : t -> t -> t val one : t end module Make (S : S) = struct let succ x = S.add x S.one end include Make (struct type t = int let add a b = a + b let one = 1 end) let f x = succ x
Currently, with OCaml 4.07.1, the
f function will be a
caml_apply2. We might
wish for a simple inlining optimisation, allowing
f to become an
addq instruction (indeed,
flambda does this), but optimizing
functors is hard. As we learned from Pierre Chambart, it is possible
for the OCaml compiler to optimize functors directly, but this requires
respecting several constraints that are difficult to respect in practice.
So, the choice was done to made the encoder which respects RFC 1951 and the compressor under some constraints. However, this is not what zlib did and, by this way, we decided to provide a new design/API which did not follow, in first instance, zlib (or some others implementations like miniz).
To be fair, the choice from zlib and miniz comes from the first point about API and the context where they are used. The main problem is the shared queue between the encoder and the compressor. In C code, it can be hard for the user to deal with it (where they are liable for buffer overflows).
In OCaml and for
decompress, the shared queue can be well-abstracted and API
can ensure assumptions (like bounds checking).
Even if this design is much more complex than before, coverage tests are better
where we can separately test the encoder and the compressor. It breaks down the
initial black-box where compression was intrinsec with encoding – which was
decompress had a bug about generation of
Huffman codes but we never reached it because the (bad)
compressor was not able to produce something (a specific lengh with a specific
distance) to get it.
NOTE: you have just read the main reason for the new version of
The compressor is the most easy part. The goal is to produce from an inputs flow, an outputs flow which is an other (more compacted) representation. This representation consists to:
- A literal, the byte as is
- A copy code with an offset and a length
The last one say to copy length byte(s) from offset. For example,
be compressed as
[ Literal 'a'; Copy (offset:1, len:3) ]. By this way, instead
to have 4 bytes, we have only 2 elements which will be compressed then by an
Huffman coding. This is the main idea of the lz77
However, the compressor should need to deal with the encoder. An easy interface, à la uutf should be:
val compress : state -> [ `Literal of char | `Copy of (int * int) | `End | `Await ]
But as I said, we need to feed a queue instead.
At this point, the purpose of the queue is not clear and not really explained.
The signature above still is a valid and understandable design. Then, we can
Copy directly to the encoder. However, we should
(for performance purpose) use a delaying tactic between the compressor and the
Behind this idea, it's to be able to implement an hot-loop on the encoder which will iter inside the shared queue and transmit/encode contents directly to the outputs buffer.
So, when we make a new
state, we let the user supply their queue:
val state : src -> w:bistring -> q:queue -> state val compress : state -> [ `Flush | `Await | `End ]
Flush case appears when the queue is full. Then, we refind the
buffer which is needed to produce the
Copy code. A copy code is limited
according RFC 1951 where offset can not be upper than the length of the window
(commonly 32ko). length is limited too to
258 (an arbitrary choice).
Of course, about the
Await case, the compressor comes with a
src function as
the inflator. Then, we added some accessors,
compressor does not build the Huffman coding which needs
frequencies, so we need firstly to keep counters about that inside the state and
a way to get them (and pass them to the encoder).
: About that, you should be interesting by the reason of why GNU yes is so
fast where the secret is just about buffering.
Finally, we can talk about the encoder which will take the shared queue filled by the compressor and provide an RFC 1951 compliant output flow.
However, we need to consider a special detail. When we want to make a DYNAMIC block from frequencies and then encode the inputs flow, we can reach a case where the shared queue contains an opcode (a literal or a copy) which does not appear in our dictionary.
In fact, if we want to encode
[ Literal 'a'; Literal 'b' ], we will not try to
make a dictionary which will contains the 256 possibilities of a byte but we
will only make a dictionary from frequencies which contains only
'b'. By this way, we can reach a case where the queue contains an opcode
Literal 'c') which can not be encoded by the pre-determined
Huffman coding – remember, the DYNAMIC block starts with
Another point is about inputs. The encoder expects, of course, contents from the shared queue but it wants from the user the way to encode contents: which block we want to emit. So it has two entries:
- the shared queue
- an user-entry
So for many real tests, we decided to provide this kind of API:
type dst = [ `Channel of out_channel | `Buffer of Buffer.t | `Manual ] val encoder : dst -> q:queue -> encoder val encode : encoder -> [ `Block of block | `Flush | `Await ] -> [ `Ok | `Partial | `Block ] val dst : encoder -> bigstring -> off:int -> len:int -> unit
As expected, we take the shared queue to make a new encoder. Then, we let the
user to specify which kind of block they want to encode by the
Flush operation tries to encode all elements present inside the shared
queue according to the current block and feed the outputs buffer. From it, the
encoder can returns some values:
Okand the encoder encoded all opcode from the shared queue
Partial, the outputs buffer is not enough to encode all opcode, the user should flush it and give to us a new empty buffer with
dst. Then, they must continue with the
Block, the encoder reachs an opcode which can not be encoded with the current block (the current dictionary). Then, the user must continue with a new
The hard part is about the ping-pong game between the user and the encoder
Block expects a
Block response from the user and a
Await response. But this design reveals something higher about zlib
this time: the flush mode.
Firstly, we talk about mode because zlib does not allow the user to
decide what they want to do when we reach a
Block or a
Ok case. So, it
defines some under-specified modes to apply a policy of what
to do in this case.
decompress, we followed the same design and see that it may be not a good
idea where the logic is not very clear and the user wants may be an another
behavior. It was like a black-box with a black-magic.
Because we decided to split encoder and compressor, the idea of the flush mode does not exists anymore where the user explicitly needs to give to the encoder what they want (make a new block? which block? keep frequencies?). So we broke the black-box. But, as we said, it was possible mostly because we can abstract safely the shared queue between the compressor and the encoder.
OCaml is an expressive language and we can really talk about a queue where, in
C, it will be just an other array. As we said, the deal is about performance,
but now, we allow the user the possibility to write their code in this corner-case
which is when they reachs
Block. Behaviors depends only on them.
The biggest challenge of building a library is defining the API - you must strike a compromise between allowing the user the flexibility to express their use-case and constraining the user to avoid API misuse. If at all possible, provide an intuitive API: force the user not to need to think about security issues, memory consumption or performance.
Avoid making your API so expressive that it becomes unusable, but beware that
this sets hard limits on your users: the current
decompress API can be used to
to_string functions, but the opposite is not true - you
definitely cannot build a stream API from
The best advice when designing a library is to keep in mind what you really want and let the other details fall into place gradually. It is very important to work in an iterative loop of repeatedly trying to use your library; only this can highlight bad design, corner-cases and details.
Finally, use and re-use it on your tests (important!) and inside higher-level
projects to give you interesting questions about your design. The last version
decompress was not used in ocaml-git mostly because the flush
mode was unclear.