Compiler Hacking in Cambridge is Back!

by Isabella Leandersson on Mar 22nd, 2023

What’s the best way to spend a Friday evening? We think most people would agree that hacking on OCaml is pretty much at the top of that list (although full disclosure, our sample size for this data could be larger).

On Friday the 24th of February, Tarides’s UK office hosted an evening of compiler hacking, presentations, and talks about all things OCaml. This was our 19th event in a tradition that began in 2013, when we (then known as OCaml Labs) were based at the Computer Lab in Cambridge. Just like back then, anyone with an interest in the OCaml compiler is welcome. At our recent event we had a mixture of students, industry professionals, and experts in attendance. If you'd like to create your own compiler hacking sessions, check out the wiki here.

Something that’s changed since 2013 is that OCaml now represents a large chunk of the undergraduate Computer Science tripos at the University of Cambridge: not only as the implementation language for courses such as Compiler Construction and Semantics of Programming Languages, but literally as the first language students are taught! This means that we had quite a few undergraduates turn up – it was great to see such an interest in OCaml across different backgrounds.

David's Talk

A Welcome and Introduction to the Compiler

The afternoon began with our very own David Allsopp giving the first of the day’s two talks. He briefly laid the foundation for what Tarides is and what we do, but focussed on introducing OCaml and outlining some examples of things to hack on. Since we had the pleasure of hosting many undergraduate students who were new to the OCaml community, as well as some grizzled veterans (sorry, Jon!), it was important to have a selection of projects for all abilities.

Suggestions included bug fixes (which are always welcomed), documentation edits and improvements (which are always needed), and issues labelled with the tag “good first issue” or “newcomer job.” Compilers that are self-bootstrapped (like OCaml) always require a complex build system, so David concluded with a demonstration of the sequence of build system targets, explaining each step along the way.

Once the introduction was over, the room settled into a hive of activity, with some people furiously typing and others scratching their heads and looking thoughtful. Many of the undergrads focused on getting familiar with the OCaml compiler, whereas more experienced developers began undertaking their own hacking projects. We had invited well-known OCaml compiler hackers (including some of our own) as an awesome resource for all levels of experience. A combination of in-person hacking and an informal setting provided the perfect environment for sharing imaginative new ideas – something we’ve all been missing since the pandemic.

With everyone divided up into smaller working groups, we worked our way around trying to help everyone make some progress. Groups were working on projects at all levels: some were trying to get the compiler to run hello world, whilst others (Patrick!) were forward-porting advanced modal type features between major versions of the compiler. A third-year undergraduate was debugging the OCaml compiler for her dissertation: she was attempting to use hash consing so that multiple identical values share the same piece of memory rather than occupying separate slots, as a space-saving measure for the compiler.
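For readers who haven't met hash consing before, here is a minimal sketch of the general idea. It is not the dissertation's implementation: the `term` type, the `table`, and the `leaf` and `node` constructors are all hypothetical names invented for illustration. Every value is built through a "smart constructor" that first looks in a table for a structurally equal value and hands back the existing copy if there is one.

```ocaml
(* A tiny expression tree, used only for illustration. *)
type term =
  | Leaf of int
  | Node of term * term

(* Hypothetical interning table: maps each term to its canonical shared copy.
   A production version would typically use a weak table (see the Weak
   module) so that unused terms can still be garbage collected. *)
let table : (term, term) Hashtbl.t = Hashtbl.create 64

(* Return the canonical copy of [t], registering [t] if it is new. *)
let hashcons (t : term) : term =
  match Hashtbl.find_opt table t with
  | Some shared -> shared
  | None ->
      Hashtbl.add table t t;
      t

(* Smart constructors: all terms are created through these. *)
let leaf n = hashcons (Leaf n)
let node l r = hashcons (Node (l, r))

let t1 = node (leaf 1) (leaf 2)
let t2 = node (leaf 1) (leaf 2)

(* Both names point at the same block in memory, so physical
   equality (==) holds, not just structural equality (=). *)
let () = assert (t1 == t2)
```

The trade-off is a table lookup on every construction in exchange for sharing, which pays off when many identical values would otherwise be allocated separately.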

Ryan hacking on modal types

Local Allocations and Pizza

After a couple of hours of hacking, Stephen Dolan gave us a tour of his and Leo White’s ground-breaking work on stack allocation. This work was presented at ICFP 2022 and is a compiler feature that aims to improve performance by reducing heap allocations in OCaml programs. Local allocations let programs use space on the stack (instead of allocating on the heap), which is automatically reclaimed without requiring the assistance of the (resource-heavy) garbage collector (GC). OCaml uses a stop-the-world parallel garbage collector for collecting recently allocated objects in the minor heap. This means that all the OCaml threads will need to stop when the minor heap is collected. Generating fewer heap allocations means less garbage, and less garbage means improved performance and reduced pause times. This is particularly important for parallel workloads. Local allocations are already being run in production internally at Jane Street, and there are plans to bring the associated benefits to the masses by upstreaming the work to mainline OCaml.
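To get a feel for what "heap allocations" means here, the sketch below measures how many words a function allocates on the minor heap using `Gc.minor_words` from the standard library. It deliberately does not use the local allocation syntax from the talk, since that is not part of mainline OCaml; the helper names `minor_words_during`, `build_list`, and `sum_to` are made up for this example.

```ocaml
(* Measure roughly how many words [f] allocates on the minor heap. *)
let minor_words_during f =
  let before = Gc.minor_words () in
  (* opaque_identity stops the compiler from discarding the result *)
  ignore (Sys.opaque_identity (f ()));
  Gc.minor_words () -. before

(* Builds a fresh list: every cons cell is a small heap block
   (a header word plus two fields). *)
let build_list n = List.init n (fun i -> i)

(* Computes a sum in a loop without building any intermediate structure. *)
let sum_to n =
  let total = ref 0 in
  for i = 0 to n - 1 do
    total := !total + i
  done;
  !total

let () =
  let with_list = minor_words_during (fun () -> build_list 1000) in
  let without_list = minor_words_during (fun () -> sum_to 1000) in
  Printf.printf "list version:  %.0f words\n" with_list;
  Printf.printf "loop version:  %.0f words\n" without_list
```

The list version allocates thousands of words that the minor collector later has to deal with, while the loop allocates next to nothing. Local allocations aim to move short-lived data of the first kind onto the stack.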

After Stephen’s talk, and a quick but much-needed pizza break, everyone went back to hacking. An all-too-common problem cropped up several times: trying to run the freshly built OCaml compiler from the build tree without first installing it. The error message in this circumstance is not particularly intuitive, complaining of a "bad interpreter: no such file or directory." The message refers to a bootstrap issue; the program is trying to find the interpreter, but the interpreter hasn’t been installed yet. Some people solved the issue and moved straight on to the next task (a very common thing to do), but one group decided to tackle this head-on by improving the error message to provide more detail. This will help other new OCaml compiler developers and will almost certainly make life easier at our future hack events! This kind of “simple” fix is incredibly important for reducing the barrier to entry for new developers, and it emphasises the benefits of mixed-experience hack events, with newcomers providing feedback and highlighting useful areas of improvement. We hope this work turns into a PR soon!

Another project focussed on ensuring OCaml programs can take advantage of new security features in Linux. There is a relatively new feature of the kernel that allows the user to create a secure temporary file that is isolated from other users. One participant was experimenting with different versions of OCaml and Linux to see how this feature might be used in OCaml. Implementing this in the Unix module is tempting, but because that module provides the "lowest common denominator" interface, it has to be compatible with all platforms and therefore does not cater to niche functions. A better option would be to write a separate library with a separate binding to address the compatibility issues, but that would require a lot of work for one feature. This illustrates the kinds of questions that form the debate around supporting new, platform-specific features.

The prize for “oldest bug addressed” for the evening went to one of our most junior attendees, a first-year computer scientist who took on a problem first reported in 2005. The almost-20-year-old issue involves structural comparison of cyclic data structures and is easily reproduced by pasting “let rec x = 1 :: x in x = x” into a toplevel. A pull request fixing the problem was made during the evening and has generated a lot of interesting discussion!
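As background, the sketch below shows why a cyclic value is awkward for structural comparison. It is not the fix from the pull request. The standard library documentation notes that structural equality (`=`) between cyclic data structures may not terminate, so the snippet avoids calling it on `x` and uses physical equality (`==`) and a bounded traversal instead. The helper `take` is a name introduced for this example.

```ocaml
(* A cyclic list: x is 1 :: 1 :: 1 :: ... with the tail pointing back at x. *)
let rec x = 1 :: x

(* Physical equality only compares addresses, so it terminates
   even on cyclic values. *)
let same_block = x == x

(* Walking a bounded number of cells is safe, because we never
   try to reach the (non-existent) end of the list. *)
let rec take n l =
  if n <= 0 then []
  else
    match l with
    | [] -> []
    | h :: t -> h :: take (n - 1) t

let () =
  assert same_block;
  (* The result of take is an ordinary finite list, so (=) is fine here. *)
  assert (take 3 x = [1; 1; 1])

(* By contrast, evaluating [x = x] asks for a structural traversal
   of an endless structure, which is what the 2005 report is about. *)
```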

Until Next Time

We’re thrilled that we could restart these events, and it was lovely to see so many familiar faces alongside all the newcomers. The next hack day is scheduled for March 31st, and we’re excited to see more people working on the compiler.

We’d love to see you at a future event, but even if you can’t come in person, there are loads of ways you can contribute. You can suggest projects and "good first issues," add and improve on documentation, and even set up your own local event! You can check out the wiki here.

We look forward to hanging out with more people around Cambridge who are curious or passionate about OCaml. If you’re interested in joining future events in Cambridge, please email us. See you next time!

Hacking Patrick