Benchmarking OCaml projects with current-bench
by Gargi Sharma on Aug 26th, 2021
Regular CI systems are optimised for workloads that do not require stable performance over time. This makes them unsuitable for running performance benchmarks.
current-bench provides a predictable environment for performance benchmarks and a UI for analysing results over time. Similar to a CI system, it runs on pull requests and branches, which allows performance to be analysed and compared. It can currently be enabled as an app on GitHub repositories with zero configuration. Several public repositories are running current-bench, including Irmin and Dune. We plan to enable it on more projects in the future.
In this article, we give a technical overview of
current-bench, showing how results are collected and analysed, requirements for using it and how we built the infrastructure for stable benchmarks. We also describe future work that would allow more OCaml projects to run
For performance-critical software, we must run benchmarks to ensure that changes do not introduce regressions. Running benchmarks by hand before submitting a pull request is tedious, and since every contributor might have a different machine, the results are not comparable: you can't be sure whether a change actually improved or regressed performance.
current-bench aims to solve this problem by providing a stable benchmarking platform that runs every time the user submits a pull request and compares the result to the benchmarks on the main branch. As
current-bench is zero-configuration, users can enroll their repository to run benchmarks with ease. This
current-bench has helped projects ensure that regression doesn't happen, so you can merge code with more confidence.
As shown in Figure 1 (above), the benchmarking infrastructure uses ocurrent [1], an embedded domain-specific language for writing pipelines. The ocurrent command computes the build incrementally and helps with static analysis. Whenever a pull request is opened on a repository monitored by
POST request is sent to the server running the pipeline. The pipeline fetches the head commit on the pull request and uses Docker to compile the code, and then it runs the
make bench command inside the generated Docker image.
The pipeline runs on a single node, and the process is pinned to a single core to ensure there is no resource contention while the benchmarks run. Once the benchmarks have finished, the raw JSON result is stored in a Postgres database, which the frontend can query using a GraphQL API, as shown in Figure 2 below.
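The steps described above can be sketched as a sequence of commands. This is only an illustration in Python, not the actual ocurrent pipeline: the function name `pipeline_commands`, the image naming scheme, and the core number are hypothetical, and storing the result in Postgres is left out.

```python
import shlex


def pipeline_commands(repo_dir, commit, core=3):
    """Return the commands one pipeline run might execute, in order.

    Hypothetical sketch: the real pipeline is written with ocurrent.
    """
    image = f"bench-{commit[:7]}"  # assumed image naming scheme
    return [
        # Check out the head commit of the pull request.
        ["git", "-C", repo_dir, "checkout", "--detach", commit],
        # Compile the code inside a Docker image.
        ["docker", "build", "-t", image, repo_dir],
        # Run the benchmarks in a container pinned to a single core.
        ["docker", "run", "--rm", f"--cpuset-cpus={core}", image, "make", "bench"],
    ]


if __name__ == "__main__":
    # Print the commands instead of running them.
    for cmd in pipeline_commands("/tmp/repo", "0123456789abcdef"):
        print(shlex.join(cmd))
```

Returning the commands as lists rather than executing them keeps the sketch free of side effects; a real runner would pass each list to something like `subprocess.run` and capture the JSON printed by the last step.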
The frontend supports historical navigation and provides comparison with the default branch. It allows users to select the pull request whose graphs they want to see. The graphs display the individual result of the head commit and a comparison with the commits on the default branch. Users can select the historical interval over which to compare benchmarks, and the frontend also shows the standard deviation. Once the benchmarks have run successfully, the pipeline sets the pull request status with a link to the frontend URL, so that the user can look at the graphs.
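To make the comparison with the default branch concrete, here is a minimal sketch of how a single pull-request measurement could be read against the history of the default branch. It is not the frontend's code: the function name `compare_to_baseline` and the `threshold` cut-off are assumptions made for illustration.

```python
from statistics import mean, stdev


def compare_to_baseline(baseline, candidate, threshold=2.0):
    """Compare one pull-request measurement with default-branch history.

    `baseline` holds one metric's values for recent default-branch commits
    (at least two values). `threshold` is a hypothetical cut-off expressed
    in standard deviations.
    """
    mu = mean(baseline)
    sigma = stdev(baseline)  # sample standard deviation
    delta = candidate - mu
    # With zero spread in the history, any difference counts as a change.
    significant = abs(delta) > threshold * sigma
    return {"mean": mu, "stdev": sigma, "delta": delta, "significant": significant}


if __name__ == "__main__":
    history = [10.0, 10.2, 9.8, 10.1, 9.9]  # made-up timings in seconds
    print(compare_to_baseline(history, 11.0))
```

A difference that stays within a couple of standard deviations of the default-branch history is hard to distinguish from noise, which is why showing the standard deviation next to the graphs is useful.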
current-bench uses the hardware optimisations developed for OCaml multicore compiler benchmarks (presented at the ICFP OCaml Workshop 2019) with a few modifications to allow the benchmarks to run inside Docker containers. To get stable performance, we configured the kernel to isolate some of the CPU cores. Linux then automatically avoids scheduling other user processes on those cores. We also disabled IRQ handling and power saving.
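As an illustration of what core isolation looks like from user space, the sketch below reads the list of isolated cores that Linux exposes under `/sys/devices/system/cpu/isolated` and pins the current process to the first one. It is an assumption-laden example rather than part of current-bench: the helper names are hypothetical, and it pins a plain process instead of a Docker container.

```python
import os


def parse_cpu_list(text):
    """Parse a kernel CPU list such as "2-3,6" into [2, 3, 6]."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue  # an empty file means no isolated cores
        if "-" in part:
            lo, hi = part.split("-")
            cpus.extend(range(int(lo), int(hi) + 1))
        else:
            cpus.append(int(part))
    return cpus


def isolated_cpus(path="/sys/devices/system/cpu/isolated"):
    """Return the cores the kernel reports as isolated (empty if none)."""
    try:
        with open(path) as f:
            return parse_cpu_list(f.read())
    except OSError:
        return []  # not Linux, or the file is not readable


def pin_to_isolated_core():
    """Pin the current process to the first isolated core, if there is one."""
    cpus = isolated_cpus()
    if cpus and hasattr(os, "sched_setaffinity"):  # Linux only
        os.sched_setaffinity(0, {cpus[0]})
        return cpus[0]
    return None
```

On a machine without isolated cores, `isolated_cpus()` returns an empty list and `pin_to_isolated_core()` does nothing.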
The container that runs the benchmark is pinned to one of the isolated cores. Since I/O operations can make the benchmarks less stable, we use an in-memory tmpfs partition in /dev/shm for all storage. For NUMA-enabled systems, we configure this partition to be allocated on the NUMA node of the isolated core. The pipeline automatically disables ASLR inside the container. Doing so is normally blocked by the default Docker seccomp profile, so we have modified the profile to allow the
To enroll a repository, you need to ensure the following:
- The ocaml-benchmarks GitHub app is enabled for your repository.
- The repository needs a bench Makefile target. This is triggered from the
- The output of the make bench target is JSON that can be parsed by the pipeline and displayed by the frontend.
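As a rough idea of what a benchmark entry point that prints JSON can look like, here is a minimal sketch in Python. The exact schema expected by the pipeline is not described here, so the field names ("name", "results", "metrics", "time_s") and the suite name are hypothetical; check the current-bench documentation for the real format.

```python
import json
import time


def time_call(fn, repeat=5):
    """Return the best wall-clock time in seconds over `repeat` runs of fn."""
    best = float("inf")
    for _ in range(repeat):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best


def run_benchmarks(benchmarks):
    """Run each named benchmark and return a JSON-serialisable report.

    The field names used here are hypothetical.
    """
    return {
        "name": "example-suite",
        "results": [
            {"name": name, "metrics": {"time_s": time_call(fn)}}
            for name, fn in benchmarks.items()
        ],
    }


if __name__ == "__main__":
    report = run_benchmarks({"sum_1e5": lambda: sum(range(100_000))})
    # Print only JSON on stdout so that the output can be parsed as-is.
    print(json.dumps(report))
```

A bench target would then only need to run a program like this one (for an OCaml project, the equivalent executable) and leave any logging to stderr, so that stdout stays valid JSON.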
Anyone who wants to roll out a continuous, zero-configuration benchmarking infrastructure can set up the current-bench infrastructure. In the future, we want to scale current-bench by isolating cores on multiple machines and adding a scheduler to ensure that benchmarks use only one core at a time per machine. We plan to add support for more benchmarking libraries that repositories can use; at present, we support repositories using bechamel. We also aim to make the adoption of current-bench easier by adding a conversion library that can convert any benchmark output into output parseable by current-bench. We intend to add support for slow benchmarks, which would give users faster feedback loops on pull requests while still letting them run more extensive, time-consuming benchmarks when they need a more detailed view of performance.
Thank you for reading! You can check out the implementation for