What are Data Races? And do They Threaten Your Business?by Isabella Leandersson on Jan 17th, 2024
Imagine you have a brand-new coffee machine. One morning, you traipse excitedly down the stairs only to discover that, alas, your appliance has turned into a potato. The next day, it turns back into a coffee machine.
And so it continues: on some days, it dispenses you perfect, frothy coffee; on other days, it dispenses Coca-Cola; and on some days, it doggedly persists with being a root vegetable. Overall, how satisfied would you be with your new coffee machine? Sure, sometimes the beverage you receive is perfect, but you can never be confident about what you will get.
Software users, just like coffee drinkers, value reliability. Your product could be excellent, but its value quickly diminishes if you can’t guarantee that it will meet your users’ expectations every time. This uncertainty is what data races do to code. Data races are a type of bug that introduces unpredictability into software programs, with potentially dire effects if not eliminated.
The term ‘data race’ describes a problem that can occur in programs that use multiple threads (also known as parallel programming). Multiple threads allow several operations to be performed simultaneously, significantly boosting the performance of programs, applications, and software.
Parallel programming introduces complexity that needs to be skillfully managed to ensure it does not have negative implications. For example, say two operations are happening simultaneously. If two threads try to write to the same memory location in parallel (and there are no rules that manage the event), there is no telling which access might be recorded, if any, or which might be dropped and why.
This is what a data race looks like. Formally, a data race occurs when two conflicting memory accesses are accessing the same piece of memory, are performed concurrently, at least one is a
write, and both are ungoverned by synchronisation or locks.
If code allows data races to occur, it introduces an element of randomness that can make programs act unpredictably. Critically, problems caused by data races are not deterministic – you can have your program crash upon launch one day and start-up seemingly normally on the next. This randomness makes it hard for developers to debug data races and frustrating for the end users who cannot predict the behaviour of the software they’re using.
In languages such as C, C++, Go, and unsafe Rust, this unpredictability causes crashes, memory corruption, and a variety of strange behaviours by the program. Importantly, this is not the case in OCaml.
In practice, data races undermine the trustworthiness of programs and data. Tech-savvy users may call the reliability of a program into question for even allowing data races – this happened in 2020 when the accuracy of Covid-19 simulation software was debated online due to the suspected presence of data races in the code. Even users who are unaware of what data races are notice their effects, such as random crashes and unexpected behaviours, which damage their perception of the program.
Now that you know what data races are and why they are undesirable let me tell you how Tarides manages the risks of data races and helps our partners avoid their pitfalls.
We use the programming language OCaml to power our solutions. There are many reasons why we have chosen to use OCaml, one of the most significant ones being its strong safety guarantees. The language is type- and memory- safe, avoiding the security vulnerabilities that would otherwise occur as a result of memory safety issues. OCaml’s expressive type system also catches many bugs for the developer at compile time, making debugging easier.
Thanks to the rigorous safety measures extant in OCaml, data races are also less destructive than in languages like C or C++. Furthermore, data races in OCaml do not result in crashes, and its memory model guarantees that memory safety is preserved even for programs with data races. These factors already mitigate two of the biggest problems brought about by data races: crashes and memory corruption.
Still, data races can create surprises for the developer, even in OCaml. This is because data races can cause behaviours that cannot be explained by mere interleavings of operations from different threads. Put simply, the code doesn’t operate in a way that makes it easy to reason about its behaviour. Developers who are unfamiliar with the symptoms will struggle when faced with a data race.
OCaml helps developers reason about unpredictable behaviours by imposing limits on the unpredictability. The OCaml Memory Model offers strong guarantees even for programs with data races. For example, data races cannot result in out-of-thin-air values, meaning that the only values that can be observed are those that have been previously written to – so no nasty surprises. OCaml offers what is known as a Local Data Race Freedom Sequential Consistency (LDRF-SC) guarantee. It helps the developer reason about a parallel program, guaranteeing that even if it has data races, the rest of it will follow predictable patterns.
These factors help the developer be more productive as they face fewer headaches. Yet, we still want to give programmers an easy way to find and eliminate data races in their OCaml programs. Even though data races in OCaml are less unpredictable than those in other languages, it is still good to eliminate them whenever possible. To this end, we have brought the data race detector ThreadSanitizer to OCaml.
For a deep dive into this topic, I recommend checking out this paper for more technical background on the OCaml Memory Model.
ThreadSanitizer, or TSan, is an open-source tool that reliably detects data races at runtime. TSan can be enabled, alert the developer of any data races in their code, and then disabled. This allows users to quickly identify and remove data race bugs without introducing overhead.
Tarides is one of the largest maintainers of OCaml libraries and related tools, and our team members worked hard to help bring TSan support to OCaml. With TSan support, users now have a comprehensive and straightforward way to detect and eliminate data races from their code. TSan makes the developer experience in OCaml smoother – less time spent bug-hunting makes for happier developers who can use their time and energy on better (and more fun!) things.
We build our technologies in the programming language OCaml, which brings several benefits to our partners. OCaml has always been known for its high levels of security, which is one of the reasons why the trading firm Jane Street uses the language for its services. The OCaml 5 update introduced Multicore support, drastically increasing the performance potential of programs and applications using OCaml. By bringing TSan to OCaml, we’re helping both the community and our partners benefit from the security and performance of Multicore OCaml with minimal headaches.
If you want to discover how we can leverage OCaml to your advantage, please contact us, and we will be happy to chat. You can also follow us on X (formerly Twitter) and LinkedIn to stay up to date on what we’re up to!