Summer of Internships: Projects From the OCaml Compiler Team

by Isabella Leandersson on Sep 24th, 2024

We have had the pleasure of hosting several interns in the compiler team this past year. Their projects have tackled varied and challenging tasks touching on different aspects of compiler development, ranging from modularising the observability tool Olly to creating eBPF-based kernel-side performance monitoring, improving polyglot package management, and lifting limitations of the Ortac tool that helps developers test Gospel specifications for OCaml.

Let's take a look at what the interns have been up to over the past six months. Remember, if you want to intern with us, keep an eye on our careers page for upcoming opportunities!

Eutro and Olly

@eutro's internship focussed on making the observability tool Olly more useful for developers by addressing two shortcomings: an incompatibility between Olly and the Runtime Events API that would cause crashes in certain cases, and the number of Olly's dependencies, which made it difficult or impossible to use its core functions with unreleased ("trunk") OCaml (or at all on Windows). To resolve these issues, @eutro modularised Olly and implemented a table-based translation of runtime event names and tags.

@eutro modularised Olly by refactoring the binary into smaller libraries, with the awkward dependencies isolated into an optional library. Splitting up the libraries gives users greater control over their dependencies. You can learn more about the modularisation in the PR.

The second part of the internship focussed on the table-based translation of runtime event names and tags to allow different versions of OCaml to consume runtime events. The goal was to avoid two bugs that arise when Olly profiles a program compiled with a different version of OCaml: olly trace generating nonsensical names for the slices, and olly gc-stats silently generating garbage output. To delve into the details, check out @eutro's PR about runtime event names.
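To make the idea of a table-based translation more concrete, here is a minimal sketch. It is not Olly's actual implementation: the producer versions, the name tables, and the function event_name are all hypothetical, invented only to illustrate why looking names up in a table that belongs to the profiled program's OCaml version avoids mislabelled events.

```ocaml
(* Hypothetical sketch: NOT the real runtime's event numbering or Olly's API.
   Two imaginary OCaml versions number their runtime phases differently. *)
type producer = Version_a | Version_b

(* One name table per producer version; the index is the integer tag
   found in the event stream. The names are made up for illustration. *)
let names_a = [| "minor"; "major"; "compact" |]
let names_b = [| "minor"; "minor_finalizers"; "major"; "compact" |]

let table_of = function
  | Version_a -> names_a
  | Version_b -> names_b

(* Translate a raw tag using the table of the program that produced it,
   rather than the tag numbering compiled into the consumer. *)
let event_name producer tag =
  let table = table_of producer in
  if tag >= 0 && tag < Array.length table then Ok table.(tag)
  else Error (Printf.sprintf "unknown runtime event tag %d" tag)

let () =
  match event_name Version_b 1 with
  | Ok name -> print_endline name
  | Error msg -> prerr_endline msg
```

In this sketch the same tag, 1, means "major" for one producer and "minor_finalizers" for the other, which is the kind of mismatch that would otherwise yield nonsensical slice names. Out-of-range tags are reported as errors instead of being silently misread.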

Lastly, @eutro managed to squeeze in some bug-fixes as well! One PR addresses the incorrect use of snprintf_os in formatting the runtime events ring file path, and another fixes some memory bugs in runtime_events_consumer.c. Both of these fixes will be included in the 5.3 OCaml update.

Lee Koon Wen and Eio

The Eio library enables users to write high-performance I/O programs leveraging the new effects system that came with OCaml 5. The I/O library pairs well with the io_uring interface on Linux as a rule; however, the asynchronous nature of io_uring can make it hard for the developer to get a grasp on the performance of their Eio programs when they are bottlenecked by something within the kernel rather than the program itself.

The Linux kernel has a mature and extensive system for gathering data on its performance. In his internship project, Lee Koon Wen's task was to produce a library that used the Linux eBPF probes to gather kernel-side data on an Eio program's io_uring use and deliver them to the program's runtime events. This would let users analyse kernel-side performance data alongside their program's performance using an observability tool like Olly.

Two projects have sprung from this work: uring-trace and ocaml-libbpf. The former is a tracer that, using bindings provided by the latter, can extract events from a Linux kernel. These traces can then be generated in Fuchsia format and displayed in Perfetto. This project will benefit developers on the Linux platform, helping them understand and optimise their programs using accurate data.

Ryan and Polyglot Package Management

Using several programming languages in one project lets you take advantage of the particular strengths of each language and of its library ecosystem. For example, Python is well-known for its data science libraries, Rust for its ownership memory model, and OCaml for its type safety. Over the past couple of decades, many large programming language ecosystems (and even some smaller ones) have acquired language-specific package managers, e.g. pip (Python's; 2008), cargo (Rust's; 2015 - although it started with one) and, of course, opam (OCaml's; 2013). Managing these so-called 'polyglot programming' projects, with several languages working together, relies on coordinating these package managers to provide language libraries and toolchains like compilers and build systems. The need to use multiple package managers naturally increases the complexity of these projects. Additionally, dependencies are hard, or impossible, to express across different package managers.

Ryan Gibb's research internship at Tarides focussed on using Nix as an initial bridge towards these objectives, extending opam to support the provision of "external dependencies" using Nix, the language-agnostic functional package manager, instead of the OS's own package manager. There has been much work in this area already (e.g. opam-nix and opam2nix), but these have focussed more on being able to take opam packages themselves and install them using Nix.

Ryan's work moves in the other direction, allowing Nix packages to be used within the environment set up by opam, by adding a depext mechanism to opam. In parallel with this work, Ryan also extended a previous investigation with nixpkgs to allow users to specify versions of Nix dependencies. In general, Nix only supports the latest versions, but by analysing the nixpkgs repository history we can map the versions of the packages we’re interested in to the ranges in the nixpkgs commit history which provide them. Using opam’s solver we can then find the maximum commit of nixpkgs satisfying the version constraints on the packages (as long as a state exists meeting the conditions).
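The version-to-commit mapping can be illustrated with a small sketch. This is a simplification and not Ryan's implementation: it replaces opam's solver with a plain interval intersection, assumes the nixpkgs history is linear so that commits can be numbered, and uses made-up package names (libfoo, libbar), versions, and commit ranges.

```ocaml
(* Hypothetical sketch: commits are numbered by position in a linearised
   nixpkgs history, and each package version is provided by one
   contiguous range of commits. All data below is invented. *)
type availability = {
  package : string;
  version : string;
  first : int;  (* first commit providing this version *)
  last : int;   (* last commit providing this version *)
}

let head = 100  (* index of the newest commit in the imaginary history *)

let history = [
  { package = "libfoo"; version = "1.0"; first = 0;  last = 40 };
  { package = "libfoo"; version = "2.0"; first = 41; last = head };
  { package = "libbar"; version = "3.0"; first = 20; last = 70 };
  { package = "libbar"; version = "3.1"; first = 71; last = head };
]

(* Commit range providing an exact (package, version) pair, if any. *)
let range_of (package, version) =
  List.find_opt (fun a -> a.package = package && a.version = version) history
  |> Option.map (fun a -> (a.first, a.last))

(* Intersect the ranges of all constraints and return the newest commit
   in the intersection, or None when no single commit satisfies them all. *)
let latest_commit constraints =
  let intersect acc c =
    match acc, range_of c with
    | Some (lo, hi), Some (first, last) ->
      let lo = max lo first and hi = min hi last in
      if lo <= hi then Some (lo, hi) else None
    | _ -> None
  in
  List.fold_left intersect (Some (0, head)) constraints
  |> Option.map snd
```

With this invented data, asking for libfoo 1.0 together with libbar 3.0 yields commit 40, the newest commit that provides both. Asking for libfoo 1.0 together with libbar 3.1 yields None, because no commit provides both versions at once. That corresponds to the caveat above that a suitable state must exist.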

Future work will address current limitations and bring improvements to the workflow for users. For example, opam’s Nix depext mechanism picks up the environment variables from the builder's shell, meaning it must manually specify the environment it wants to extract. It may be possible to access the env attribute of derivations directly as the Nix binary does. Ryan intends to keep working on these limitations as well as on future goals, including shepherding opam's Nix depext support through review and possibly productionising opam-nix-repository for use in opam-repository.

For more information, check out the project's PRs, #5982 and #2.

Nikolaus, Gospel and Ortac

The culture around OCaml values safety and reliability, so it is no surprise that a suite of tools has been developed to ensure these qualities. One such tool, Gospel, is a contract-based behavioural specification language that can provide a logical model for OCaml types and describe the intended behaviour of functions using pre- and post-conditions. Ortac, in turn, is a tool that can generate a QCheck-STM test suite based on the Gospel specification of a library. You can find out more about them in the dedicated post Getting Specific: Announcing the Gospel and Ortac Projects.

The goal of Nikolaus Huber's internship was to lift some of the limitations of Ortac and QCheck-STM to expand their use cases for developers. It's essential to increase the types of tests that users can generate so that more OCaml code can be checked using Gospel. His project has produced three PRs: #235, which centres on allowing tests to run without a system under test in the signature; #237, which focusses on adding support for tests with tuples in their signature; and #247, which aims to introduce support for testing functions with multiple systems under test as arguments.

In addition to the PRs addressing Ortac limitations, Nikolaus has also fixed several issues regarding Gospel and Ortac. He added a new error for when there are no commands produced during the translation from Gospel to OCaml, fixed a QCheck-STM bug that occurred when testing functions that return integers, and addressed another bug where a type check would incorrectly indicate that code was correct. All in all, Nikolaus' project benefits developers who want to test their OCaml programs to ensure they perform predictably and correctly.

Stay in Touch

We want to hear from you! Follow us on X, Mastodon, Threads, and LinkedIn for the latest news from Tarides and to share your thoughts with us. Are you interested in completing an internship project with us? Keep an eye on our careers page, where we announce upcoming internship opportunities. Happy hacking!

Tarides champions open-source development. We create and maintain key features of the OCaml language in collaboration with the OCaml community. To learn more about how you can support our open-source work, discover our page on GitHub.

We are always happy to discuss commercial opportunities around OCaml. We provide core services, including training, tailor-made tools, and secure solutions. Contact us today to learn more about how Tarides can help your teams realise their vision.