The OCaml 5.2 Release: Features and Fixes!

by Isabella Leandersson on May 15th, 2024

There has been a new release of OCaml! The 5.2 release brings several new features, along with improvements, optimisations, and bug fixes. New features include compaction, ThreadSanitizer support, and restored native-code support for the POWER architecture, plus other crucial changes that prepare the ground for future updates.

This post highlights new and restored features and gives you a good overview of the release. We won’t cover everything, however, so if you’re looking for an exhaustive list I recommend that you read the Changes document on GitHub. Let’s get started!

Compaction

The 5.2 release reintroduces compaction to OCaml 5.*. Compaction is a technique that moves live blocks of memory so that they sit next to each other, releasing the fragmented free space between them in the pools back to the allocator. Because OCaml 5 runs code on multiple domains in parallel, compaction has to be multicore-compatible, so 5.2 adds a parallel compactor for the shared pools that make up the GC’s major heap.

This work is part of the ongoing effort to achieve feature parity between OCaml 5.* and OCaml 4.14 and to provide users with familiar favourites from previous versions of the language. The bulk of the compaction work can be found in PR #12193, which details how the compaction algorithm works and how the pools of memory are released back to the OS. Sadiq Jaffer and Nick Barnes' impressive efforts, alongside reviews and input from the wider OCaml community, have brought compaction back to OCaml.

There have also been two additional pull requests, #12859 and #12850, which update and fine-tune compaction-related functionality to make it more accurate and useful. #12850 adds caml_collect_stats_sample_stw to the stop-the-world (STW) phase that cycles the major heap, so that Gc.quick_stat accurately reflects the state of the heap after a major cycle or a compaction. #12859 ensures that Gc.compact completes a full major cycle before compacting (in contrast to Gc.quick_compact, which only performs a single one).
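To make this more concrete, here is a minimal sketch (our own example, not taken from the PRs) that fragments the major heap and then compares the heap size reported by Gc.quick_stat before and after Gc.compact. The helper names (allocate_blocks, drop_most, compact_and_measure) are hypothetical, the 64-byte block size and keep-every-16th ratio are arbitrary, and the exact numbers will depend on your OCaml version and runtime settings, so we don't claim any particular result.

```ocaml
(* Hypothetical helpers for illustration only. *)

(* Allocate [n] small blocks and keep them reachable from an array,
   so that they are promoted from the minor heap into major-heap pools. *)
let allocate_blocks n = Array.init n (fun _ -> Bytes.create 64)

(* Drop all but every 16th block, leaving many sparsely occupied pools. *)
let drop_most blocks =
  for i = 0 to Array.length blocks - 1 do
    if i mod 16 <> 0 then blocks.(i) <- Bytes.empty
  done

(* Current major heap size in words, as reported by Gc.quick_stat. *)
let heap_words () = (Gc.quick_stat ()).Gc.heap_words

(* Returns the heap size in words before and after compaction. *)
let compact_and_measure n =
  let blocks = allocate_blocks n in
  drop_most blocks;
  (* Collect the dropped blocks first, so "before" measures a heap that
     is fragmented rather than one still full of garbage. *)
  Gc.full_major ();
  let before = heap_words () in
  Gc.compact ();
  let after = heap_words () in
  (* Keep the surviving blocks alive until after the compaction. *)
  ignore (Sys.opaque_identity blocks);
  (before, after)

let () =
  let before, after = compact_and_measure 200_000 in
  Printf.printf "heap_words before compaction: %d, after: %d\n" before after
```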

Look out for a post on compaction coming to our blog soon!

ThreadSanitizer Support

ThreadSanitizer (or TSan) is a tool originally developed by Google that can detect data races that occur during a program's execution. Data races can happen in parallel programs and easily go undetected. Developers can use TSan to monitor their programs, flagging data races so that they can be eliminated before the program is released. Since OCaml 5 brings multicore capabilities to the language, adding support for a reliable way to detect data races has been a top priority.

In the 5.2 release, the big PR #12114 adds TSan support and introduces a new configure-time flag --enable-tsan to enable compilation with TSan instrumentation. When enabled, the OCaml compiler instruments your executables with calls to TSan's runtime, which keeps a record of previous memory accesses (at a cost to performance). Executables instrumented with TSan will report data races without false positives. The original TSan PR added support for Linux on the x86_64 architecture, and since then the community has added support for all actively maintained tier 1 platforms.

PRs #12876, #12809, #12810, #12907, and #12915 extend TSan support to further platforms, including FreeBSD on x86_64, Linux and macOS on arm64, and Linux on RISC-V, POWER, and s390x. PRs #12681 and #12746 fix false positives and tidy up some annotations, and PR #12802 adds a chapter on TSan to the OCaml reference manual. We applaud the hard work of Olivier Nicole, Fabrice Buoro, and Miod Vallat (based on initial efforts by Anmol Sahoo) to bring TSan to OCaml, with feedback and input from Jacques-Henri Jourdan, Luc Maranget, Sébastien Hinderer, Arthur Wendling, Guillaume Munch-Maccagnoni, and more!
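As an illustration of the kind of bug TSan catches, the sketch below (our own example, not from the TSan PRs or the manual) contains a deliberate data race: two domains increment a plain int ref without any synchronisation. With a compiler configured with --enable-tsan, running racy_count should make TSan print a report beginning with "WARNING: ThreadSanitizer: data race", while atomic_count, which uses the standard Atomic module, should run without a report. The function names are hypothetical; without TSan both versions simply run, and the racy one may silently lose updates.

```ocaml
(* Requires OCaml 5 (Domain module). Function names are illustrative. *)

(* Two domains increment a plain [int ref] with no synchronisation.
   [incr] is a non-atomic read followed by a write, so this is a data
   race and some increments can be lost. *)
let racy_count n =
  let counter = ref 0 in
  let work () = for _i = 1 to n do incr counter done in
  let d = Domain.spawn work in
  work ();
  Domain.join d;
  !counter

(* The same computation using an atomic counter: there is no data race,
   and the result is always 2 * n. *)
let atomic_count n =
  let counter = Atomic.make 0 in
  let work () = for _i = 1 to n do Atomic.incr counter done in
  let d = Domain.spawn work in
  work ();
  Domain.join d;
  Atomic.get counter

let () =
  Printf.printf "racy: %d, atomic: %d\n"
    (racy_count 100_000) (atomic_count 100_000)
```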

If you would like to learn more about TSan, you can check out our blog post on the tool.

TSan in Action

As part of the work on this update, Gabriel Scherer, Eutro, Olivier Nicole, Fabrice Buoro, and others have been able to use TSan to catch and fix data race bugs in different parts of the OCaml runtime. A direct benefit of TSan support is the number of data race fixes that this update brings to users, which include:

  • Fix for a Race in the Minor GC: PRs #12595 and #12597 describe a race condition occurring when caml_collect_gc_stats_sample makes calls to domain_terminate, and #12597 outlines the fix implemented in 5.2.
  • Data Race Between Marking and Sweeping: PR #12934 fixes a reported race between marking and sweeping caught by TSan.
  • Data Race on Global Pools Arrays: PR #12755 addresses races on the global_avail_pools and global_full_pools members of struct pool_freelist in shared_heap.c.
  • Data Races in minor_gc.c: PR #12737 fixes two races, one in the minor GC occurring when promoting the values that are in the remembered set, and one in caml_natdynlink_open.
  • Data Race fix for #12799: PR #12851 fixes a bug described in issue #12799 where runtime events teardown and event emission could race each other.
  • Data Race When Using the Debug Runtime: PR #12969 resolves a data race involving caml_scan_stack and caml_free_stack.

User Experience

Improving user experience is a high priority and iterative changes are made regularly to make OCaml easier for developers to use, with a special focus on newcomers. Each new release therefore brings quality-of-life improvements alongside the bigger features. In 5.2, examples of these user experience improvements include:

  • Improve Dynlink Error Messages: In PR #12213, Samuel Hym addresses some complexities in the Dynlink library that made certain error messages hard to parse. Changes to the way the errors are wrapped mean that they are now simpler and easier to understand.
  • New Chapter in the Manual: Olivier Nicole's PR #12840 adds a new chapter on custom events to the OCaml reference manual. Improved documentation is key to helping newcomers get the most out of OCaml and helping developers adopt new tools.

POWER Backend Restored

In PR #12276 Xavier Leroy restores native-code support for the POWER/PowerPC backend, specifically for the 64-bit little-endian architecture. In a subsequent PR, #12667, A. Wilcox extends the support to include 64-bit big-endian as well. Leroy's PR #12601 highlights some of the fine-tuning that took place in the transition from OCaml 4.* to 5.*, specifically implementing leaf functions in a way that works better for the 64-bit architecture.

The IBM POWER processor family, including PowerPC, is used in servers, supercomputers, embedded systems, and even personal computers. Historically, OCaml has supported POWER processors, and restoring this support lets users take advantage of OCaml 5 features on them as well.

Bug Fixes

There are dozens of bug fixes included in this release, and I can’t mention them all here, but let’s take a look at a few:

  • Fixing a Segmentation Fault: In PR #12726 Nicolas Ojeda Bär addresses a segmentation fault that happens when ocamlrun.exe is not found in the PATH.
  • Locking Bug: In PR #12897 Thomas Leonard identifies a locking bug affecting custom events tracing, where a program using runtime_events could stall indefinitely without being able to proceed. Gabriel Scherer introduces a fix in the form of a mutex in PR #12900.
  • Threads Crash: The threads library contained a bug that could cause a crash: updating Caml_state->backtrace_last_exn by direct assignment caused the program to crash. In PR #12861 Mark Shinwell identified and fixed the bug.

Preparing for Upcoming Features

Part of the contributions to each release focus on preparing the way for future features. 5.2 lays the foundation for some long-anticipated additions including:

  • Project-Wide Occurrences: Ulysse Gérard's PR #12508 provides the required steps to support project-wide occurrences in OCaml projects, a feature that many Merlin users in particular have been waiting for. Work is ongoing to implement this in Merlin to enhance code navigation and refactoring. This change is possible thanks to OCaml's Shapes feature.
  • Statmemprof: Statmemprof is a well-loved statistical memory profiler that was removed from OCaml before the multicore 5.0 release, due to unanswered questions about how it would perform. Significant efforts have gone into bringing statmemprof back, and 5.2 prepares the way in two PRs. PR #11911 features much of the initial conversation and collaboration to bring the feature back, and Nick Barnes's PR #12381 changes part of the memory profiler’s API to prepare it for multicore. This feature is expected to be included in 5.3 this autumn.
  • MSVC: MSVC is Microsoft's C/C++ compiler and users can compile OCaml 4.14 with MSVC. However, in OCaml 5.0 the runtime uses C features that MSVC doesn't support, making it incompatible with MSVC. Antonin Décimo's PR #12769 unifies MSVC and MinGW-w64 code paths to prepare for full MSVC support for OCaml 5, also expected to be re-introduced in OCaml 5.3 later this autumn.

What’s Next?

Work on OCaml never stops! The following months will bring more bug fixes and updates to OCaml in the lead-up to the 5.3 release, where popular features like MSVC support and statmemprof are being reintroduced. It’s great to see contributors from many different backgrounds coming together to work on improving the OCaml language. Anyone can gain more insight into how the language is developed by exploring the public OCaml GitHub repository and the official OCaml website.

Stay in touch with us on X (formerly known as Twitter) and LinkedIn – we would love to hear about your experience with 5.2 and how you are using OCaml!