Feature Parity Series: Compaction is Back!
Compaction is a feature that rearranges OCaml values in memory to free up space, which can then be returned to the operating system. In the OCaml 5.2 release, the technique returns to the OCaml Garbage Collector for the first time since its removal in the 5.0 multicore update.
This is part one of our feature parity series highlighting features returning to OCaml in an effort to restore feature parity with OCaml 4.14. When OCaml gained multicore support (that is, the ability to execute on multiple domains) it had far-reaching implications for the way the runtime worked, and as a result, support for some features was dropped for the 5.0 release. To address these gaps, a significant amount of work has been done behind the scenes to adapt tools and runtime features to work safely and performantly with multiple domains. Tarides is part of the effort to restore these familiar features to OCaml 5.
What is Compaction?
In OCaml 5.0, the major heap in the parallel Garbage Collector employs size-segregated pools attached to each domain. Over time, as OCaml domains allocate and discard heap values, many of these pools end up being only partially used. For example, a program might allocate millions of two-element tuples when initialising but no longer need most of these afterwards. This will result in lots of 'size two' pools, most of which will only be sparsely filled. This is inefficient as OCaml will still consume system memory for all the pools, even if the heap values only take up a tiny proportion of each pool.
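The tuple scenario above can be reproduced with a small experiment. This is a rough sketch using the standard Gc module; the helper names (heap_words, sparse_survivors, measure) are made up for illustration, and the numbers it prints will vary with the OCaml version, platform, and runtime settings.

```ocaml
(* Words currently held by the major heap, as reported by the runtime. *)
let heap_words () = (Gc.quick_stat ()).Gc.heap_words

(* Allocate [total] two-element tuples but keep only one in every
   [keep_every]; the others become garbage once this function returns. *)
let sparse_survivors ~total ~keep_every =
  let all = Array.init total (fun i -> (i, i + 1)) in
  Array.init (total / keep_every) (fun k -> all.(k * keep_every))

(* Collect the garbage first, then compact, sampling the heap size
   after each step. *)
let measure ~total ~keep_every =
  let kept = sparse_survivors ~total ~keep_every in
  Gc.full_major ();
  let after_major = heap_words () in
  Gc.compact ();
  let after_compact = heap_words () in
  (kept, after_major, after_compact)

let () =
  let kept, after_major, after_compact =
    measure ~total:1_000_000 ~keep_every:100
  in
  Printf.printf "survivors: %d\n" (Array.length kept);
  Printf.printf "heap after full major: %d words\n" after_major;
  Printf.printf "heap after compaction: %d words\n" after_compact
```

The surviving tuples stay valid after Gc.compact even though the runtime may have moved them; only the reported heap size is expected to change.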
Compaction is not new to OCaml; the last version to include the technique was 4.14. After OCaml adopted a multicore garbage collector, compaction needed to be rewritten to work with the major heap's new structure and to be safe in parallel execution. Compaction for OCaml 5 - as reintroduced in PR #12193 - achieved this by identifying a small number of shared pools in each size class big enough to contain all heap values of that size class and then moving all heap values in the remaining pools into the selected pools. This results in many empty pools, the memory of which can be returned to the operating system.
The new algorithm is entirely different from the previous one, so let's look at how it works!
Compaction in 5.2
Allocating Into the Major Heap
Simply put, compaction is a means of rearranging fragmented pieces of memory into larger, compact chunks. To understand how the compaction introduced in 5.2 works, we must first understand how allocation works in the GC's major heap (you may want to skip this part if you're already familiar with the process). When you first allocate a value in the major heap, it is given a size from a size class table. Each size class, in turn, keeps its pools in four different lists: unswept available, unswept full, available, and full.
These pools are divided into blocks, and each pool contains a number of blocks of the same size class. When we allocate a value, an appropriate size class is chosen depending on the size of the value, and the first pool with space available in that class is selected. The header of that pool has a pointer indicating the next available block, and the value is written into that memory block. Each domain is responsible for this process independent of other domains. This is crucial for acceptable performance in parallel programming.
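The allocation path described above can be modelled in a few lines of OCaml. This is only a toy sketch: the real allocator lives in the C runtime, and the names used here (heap, class_for, alloc, slots_per_pool) as well as the tiny size class table in the usage are hypothetical. It shows the two decisions the text describes, picking a size class and then taking the first pool in that class with a free block.

```ocaml
(* Hypothetical pool capacity; real pools hold many more blocks. *)
let slots_per_pool = 4

(* A pool only tracks which of its blocks are in use. *)
type pool = { live : bool array }

(* [classes] is a sorted table of block sizes in words;
   [pools.(c)] lists the pools of size class [c]. *)
type heap = { classes : int array; pools : pool list array }

let make_heap classes =
  { classes; pools = Array.make (Array.length classes) [] }

(* Smallest size class whose blocks can hold [words] words. *)
let class_for heap words =
  let n = Array.length heap.classes in
  let rec go i =
    if i = n then invalid_arg "too large for the size-class allocator"
    else if heap.classes.(i) >= words then i
    else go (i + 1)
  in
  go 0

let free_slot pool =
  let rec go i =
    if i = Array.length pool.live then None
    else if not pool.live.(i) then Some i
    else go (i + 1)
  in
  go 0

(* Returns (size class, pool index, slot index) of the new block. *)
let alloc heap words =
  let c = class_for heap words in
  let rec first_fit idx = function
    | [] -> None
    | p :: rest ->
      (match free_slot p with
       | Some s -> p.live.(s) <- true; Some (idx, s)
       | None -> first_fit (idx + 1) rest)
  in
  match first_fit 0 heap.pools.(c) with
  | Some (idx, s) -> (c, idx, s)
  | None ->
    (* No pool has room: add a fresh pool to this size class. *)
    let p = { live = Array.make slots_per_pool false } in
    p.live.(0) <- true;
    heap.pools.(c) <- heap.pools.(c) @ [ p ];
    (c, List.length heap.pools.(c) - 1, 0)
```

In this model each domain would own its own heap value, which is why allocation needs no coordination with other domains.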
When the GC sweeps the pools, it will free certain blocks and add them to the free list in the header of their pool. After a while, cycles of sweeping and allocation create pools with free ('empty') blocks interspersed among live ('full') blocks. This process results in inefficient memory use and many partially filled pools.
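To make the effect of sweeping concrete, here is a simplified sketch of one sweep over a single pool. It assumes a plain marked/unmarked scheme, which is a simplification of what the OCaml 5 collector actually does; the block type and the sweep and occupancy functions are invented for this illustration.

```ocaml
(* State of one block in a pool, in a simplified mark-and-sweep model. *)
type block = Free | Unmarked | Marked

(* Reclaim blocks the marker did not reach and reset the survivors
   so they can be marked again in the next cycle. *)
let sweep pool =
  Array.iteri
    (fun i b ->
      match b with
      | Unmarked -> pool.(i) <- Free
      | Marked -> pool.(i) <- Unmarked
      | Free -> ())
    pool

(* Fraction of the pool's blocks that are still in use. *)
let occupancy pool =
  let used =
    Array.fold_left (fun n b -> if b = Free then n else n + 1) 0 pool
  in
  float_of_int used /. float_of_int (Array.length pool)
```

A pool whose occupancy stays low across many cycles still holds on to all of its memory, which is the waste compaction is meant to recover.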
Compacting the Major Heap
To address this inefficiency, the developer can compact the major heap and move the live blocks into an optimised order among the pools. In OCaml 5, compaction proceeds through the following sequence of steps:
1. Barrier:
Because this is parallel compaction, we must synchronise all the domains before proceeding. Each domain has its own heap, and the heaps are compacted in parallel, with each domain responsible for its own compaction. Synchronisation is achieved with the help of a barrier.
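For readers unfamiliar with barriers, the idea can be sketched with OCaml 5's Domain, Mutex, and Condition modules. This is not the barrier the runtime uses internally (that one is written in C); the barrier type and the make_barrier and await functions are hypothetical names chosen for this example. Every caller of await blocks until all parties have arrived.

```ocaml
(* A reusable barrier for a fixed number of participating domains. *)
type barrier = {
  m : Mutex.t;
  c : Condition.t;
  parties : int;
  mutable waiting : int;  (* domains that have arrived in this phase *)
  mutable phase : int;    (* incremented each time the barrier opens *)
}

let make_barrier parties =
  { m = Mutex.create (); c = Condition.create ();
    parties; waiting = 0; phase = 0 }

let await b =
  Mutex.lock b.m;
  let phase = b.phase in
  b.waiting <- b.waiting + 1;
  if b.waiting = b.parties then begin
    (* Last domain to arrive: open the barrier for everyone. *)
    b.waiting <- 0;
    b.phase <- phase + 1;
    Condition.broadcast b.c
  end else
    (* Wait until the phase changes; the loop guards against
       spurious wake-ups. *)
    while b.phase = phase do Condition.wait b.c b.m done;
  Mutex.unlock b.m

let () =
  let n = 4 in
  let b = make_barrier n in
  let arrived = Atomic.make 0 in
  let worker () =
    Atomic.incr arrived;
    await b;
    (* Past the barrier, every domain must have incremented. *)
    Atomic.get arrived
  in
  let domains = List.init n (fun _ -> Domain.spawn worker) in
  let seen = List.map Domain.join domains in
  assert (List.for_all (fun v -> v = n) seen)
```

The same pattern separates the later phases of compaction: no domain moves on to the next phase until every domain has finished the current one.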
2. Size Class:
The compaction process iterates through each size class, processing one at a time, starting with the smallest. A stats table is allocated for each domain, with a slot available for each pool of the current size class being processed. Since the GC has already swept everything we will be compacting, there are only two states a pool can be in, full or available, where the latter means there is free space available in it.
The process then continues by using the stats table to check whether pools are full or available (meaning they have at least one free block). This process means we don't delve deeply into the memory to read from it, and there is no cache contamination. There is no synchronisation between domains in this step, and the compaction process for a domain only proceeds from here if there is at least one available pool.
3. Using the Stats Table:
By this time, each domain to be compacted will have a stats table with a list of all the available pools. In the next step, the process goes down each pool on the available list and counts the number of live and free blocks. This is done linearly through the pool.
Once the number of live and free blocks is known, the number of live blocks is deducted from the number of free blocks. The resulting number of free blocks lets us calculate which pools can be emptied and which will be retained. This is all the information we need from the stats table, so once this step is completed, the stats table is cleared.
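One plausible way to turn these counts into a decision is sketched below. It is an illustration of the idea rather than the runtime's exact code: the stats record and the split_point function are hypothetical, and the rule it applies (retain pools from the front of the list until their free blocks can absorb every live block in the rest) is an assumption consistent with the description above.

```ocaml
(* Per-pool counts gathered for the size class being processed. *)
type stats = { live : int; free : int }

(* Number of leading pools to retain. Pools from that index onward
   are evacuated: their live blocks fit into the free blocks of the
   retained pools. *)
let split_point pools =
  let total_live = Array.fold_left (fun acc s -> acc + s.live) 0 pools in
  let rec go i free remaining =
    (* [free]: free blocks in the pools retained so far.
       [remaining]: live blocks in the pools not yet retained. *)
    if i = Array.length pools || free >= remaining then i
    else go (i + 1) (free + pools.(i).free) (remaining - pools.(i).live)
  in
  go 0 0 total_live

let () =
  let pools =
    [| { live = 3; free = 1 }; { live = 1; free = 3 };
       { live = 1; free = 3 }; { live = 1; free = 3 } |]
  in
  (* The first two pools offer 4 free blocks, enough for the
     2 live blocks held by the last two pools. *)
  assert (split_point pools = 2)
```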
4. Pool Pointers and Live Links:
To summarise, we now know how much live space there is within the pools and how much free space we can liberate if we compact the live blocks together. To achieve compaction, we create pointers to two pools, one to the first pool we are evacuating and one to the first pool we are retaining (for those who are curious, these pointers are named current_pool and to_pool respectively).
The process starts with the first pool we know will be evacuated. It finds the first live block within that pool and uses the current_pool pointer to remove it from the pool and the to_pool pointer to insert it into one of the pools we know will still be live post-compaction (this information comes from the calculation we did using the stats table).
5. Compaction!:
This is the operative part of compaction: copying all the live blocks from pools that will be evacuated into pools that will remain live after compaction using the two pointers. As this is done, the process writes forward pointers that point from the block where something used to be stored forward to the block where it is now stored post-compaction.
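The evacuation step can be sketched on a toy heap where each pool is an array of optional payloads. Everything here is a simplified model: the addr type, the evacuate function, and the use of a hash table for forwarding are assumptions made for illustration (the runtime writes forwarding pointers into the old blocks themselves). The loop variable and cursor borrow the names current_pool and to_pool from the text.

```ocaml
(* Location of a block: which pool, and which slot inside it. *)
type addr = { pool : int; slot : int }

(* Move every live block out of pools [retain ..] into free slots of
   pools [0 .. retain - 1]. Returns a table mapping each old
   location to the block's new location. *)
let evacuate (pools : string option array array) ~retain =
  let forward = Hashtbl.create 16 in
  let to_pool = ref 0 and to_slot = ref 0 in
  (* Advance the destination cursor to the next free slot. *)
  let rec next_free () =
    if !to_pool >= retain then failwith "not enough free slots"
    else if !to_slot >= Array.length pools.(!to_pool) then begin
      incr to_pool;
      to_slot := 0;
      next_free ()
    end else if pools.(!to_pool).(!to_slot) <> None then begin
      incr to_slot;
      next_free ()
    end else { pool = !to_pool; slot = !to_slot }
  in
  for current_pool = retain to Array.length pools - 1 do
    Array.iteri
      (fun slot block ->
        match block with
        | None -> ()
        | Some _ ->
          let dst = next_free () in
          pools.(dst.pool).(dst.slot) <- block;
          pools.(current_pool).(slot) <- None;
          (* Remember where the block went so pointers can be fixed. *)
          Hashtbl.replace forward { pool = current_pool; slot } dst)
      pools.(current_pool)
  done;
  forward
```

After the call, every pool from index retain onward is empty and can be freed, while the returned table plays the role of the forward pointers.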
6. Barrier 2:
Again, another barrier syncs all the domains – a crucial part of compaction on multiple domains.
7. Scanning:
This part of compaction is the most expensive in terms of time. The entire OCaml heap has to be scanned for pointers pointing to old block locations (moved as the pool they were in was evacuated), and old pointers must be updated using the forward pointers. Each domain is responsible for updating its data.
This is a deceptively extensive process. For example, even pointers in objects that are too large for the size allocator (so over 128 words) and therefore never moved by compaction may still need to be updated after the compaction process as they may also contain pointers to the old block locations.
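The pointer-fixing pass can be illustrated in the same toy style. This is a sketch under stated assumptions: the addr and obj types and the update_pointers function are hypothetical, and forwarding information is modelled as a hash table from old to new locations. The point is that every object is scanned, including ones that were never moved.

```ocaml
(* Location of a block: which pool, and which slot inside it. *)
type addr = { pool : int; slot : int }

(* An object is modelled as a name plus the locations it points to. *)
type obj = { name : string; fields : addr array }

(* Rewrite every field that still refers to an evacuated block.
   Returns how many pointers were updated. *)
let update_pointers forward objects =
  let updated = ref 0 in
  List.iter
    (fun obj ->
      Array.iteri
        (fun i target ->
          match Hashtbl.find_opt forward target with
          | Some moved ->
            obj.fields.(i) <- moved;
            incr updated
          | None -> ())
        obj.fields)
    objects;
  !updated

let () =
  let forward = Hashtbl.create 4 in
  Hashtbl.replace forward { pool = 1; slot = 1 } { pool = 0; slot = 1 };
  (* A large object is never moved itself, yet one of its fields
     points at a block that was. *)
  let large =
    { name = "large";
      fields = [| { pool = 1; slot = 1 }; { pool = 0; slot = 0 } |] }
  in
  let n = update_pointers forward [ large ] in
  assert (n = 1);
  assert (large.fields.(0) = { pool = 0; slot = 1 })
```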
8. Barrier 3:
Another barrier synchronises between domains.
9. Freeing Evacuated Pools:
All the evacuated pools are freed and added to the free list.
10. Barrier 4:
Another barrier to synchronise between domains.
11. Release Memory:
One domain, whichever is the first one to get to that point, unmaps the free list. This means that the memory we asked the OS for initially, which currently belongs to the OCaml system, is released at this point and goes back to the OS. This is the end benefit of compaction; it reduces the size of the OCaml system on your machine and returns memory to the OS for use elsewhere.
Final Details & Next Steps
Before we wrap up, let's look at one more detail about how compaction works in 5.2. In 5.2, compaction uses a slab allocator and size classes, whereas OCaml 4 uses a free list. This means that OCaml 5.2 does not provide the option to set an allocation policy like OCaml 4.14. Our testing has found that for most workloads, the chosen allocation policy (using the size classes) performs well. However, expert users can tune the configuration of the size classes in gen_sizeclasses.ml (necessitating that they build their own OCaml), which they may find useful for their own projects. This is just one example of the challenge that comes with adapting a feature as complex as compaction to be compatible with multiple cores, and the careful weighing of pros and cons it requires on the part of the developers.
The next steps for OCaml 5 are the expected restoration of MSVC backends, Statmemprof support, and the return of the unloadable runtime coming in releases 5.3 and 5.4. Keep a look out for future posts detailing those features and the efforts put toward bringing them back.
It's great to have compaction restored to OCaml, and it is a testament to the hard work of several teams within Tarides and the wider open-source community surrounding OCaml. We are happy to be part of the team working on this secure and performant programming language.
Share Your Experience!
We want to understand your experience! We're open to suggestions and feedback about the process to help us optimise the feature and deal with any pain points. You can share your thoughts on OCaml's discussion forum or make suggestions directly in the repo.
You can stay in touch with us on X, Mastodon, Threads, and LinkedIn. We look forward to hearing from you!