OCaml Matrix: A Virtual World

by Irina Mariuca Asavoea on Jun 9th, 2022

Introduction

One of Tarides' projects is to create an open and secure infrastructure for communication protocols, initially focusing on emails and Matrix. This will allow organisations to self-host their messaging services, using either personal cloud resources or low-cost embedded devices. Individuals and organisations can use this framework to avoid having their emails and messages read and managed by third parties.

Every component of our system is carefully designed as an independent library, using modern development techniques to avoid commonly reported threats and flaws. For instance, the protocol implementations are written in a type-safe language and tested with state-of-the-art, coverage-driven techniques, such as fuzzing. They are then deployed as unikernels for enhanced security, model quality, and library portability. Together, these techniques should give users the confidence to migrate their personal data to these new secure services.

The Matrix

When hearing the word Matrix, people invariably think about Neo and his ability to see the code behind his virtual world. Despite the cultural connection to the popular film series, the Matrix Communication Standard creators respond to the implicit assumption regarding their choice of name: “We are called Matrix because we provide a structure in which all communication can be matrixed together.”

Communication is essential to our society to both create and maintain relationships, whether personal or professional. As we progress further into this age of information, people communicate and stay connected through online and text-based communication. Gone are the days when someone would pick up a phone to call a friend or family member. Now, most people send a text message or email as the default. Thus, online communication has become the norm in our current society. Inevitably, this online communication is vulnerable to malicious actors trying to invade our privacy and hijack our correspondence. Tarides has addressed this issue and aims to host community discussions about open-source projects.

Matrix is an established protocol for human-to-human and human-to-machine communications, including instant messaging. OCaml Matrix is an OCaml implementation of the Matrix protocol. This provides a secure communication layer which is based on MirageOS’s unikernel technology in order to reduce the attack surface. It uses Irmin as storage for the communication content to ensure integrity, and we have integrated it into the CI system for all OCaml projects.

Let's take a closer look at the ocaml-matrix component and explore some details about the Matrix Communication Standard to see if it’s indeed communication matrixed together or if it’s comparable to Neo’s Matrix with people plugged into a virtual world.

Matrix Beginnings (History)

Matrix is an open standard for interoperable, decentralised, real-time communication over the Internet, created in 2014 inside Amdocs, a company specialised in software and services for communications. Matrix provides a fully decentralised and federated architecture, so it doesn’t store users’ information in a centralised location. This means when people join one of the Matrix virtual rooms to send messages, video chat, or share files, their exchanges are truly private, especially with Matrix’s end-to-end encryption. Matrix’s decentralised, federated architecture ensures communication integrity and availability in every room.

Matrix is openly specified and implemented with the open-source reference implementation server Synapse and client Element, previously Riot, which already have several security features and allow end-to-end encryption. Starting in 2018, the French Government deployed a private federation of Matrix homeservers and Tchap, an open-source client forked from Riot. The French National Cybersecurity Agency (ANSSI) works jointly with the Interdepartmental Digital Directorate (DINUM) on a cybersecurity audit of Tchap. Matrix’s interesting security features include search that works with end-to-end encryption, and end-to-end encryption enabled by default in private rooms.

Matrix Reloaded (Architecture)

Users interact by sending and receiving events in Matrix rooms. Each Matrix user registers with a homeserver and is identified by a unique ID, like “@neo:tarides.com.” The registration goes through a client application that connects to a Matrix homeserver via the client-server API. This allows users to perform actions such as sending messages, controlling rooms, or synchronising their conversation history. All communication in a Matrix room replicates across the room participants’ homeservers, so every homeserver connected to a room stores the content of the room’s history.
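In the Matrix specification, a user ID takes the form @localpart:domain, where the domain part names the user's homeserver. The following is a minimal sketch of splitting such an ID. The type and function names are hypothetical and are not taken from the ocaml-matrix codebase, and the sketch does not validate the character set that the specification allows.

```ocaml
(* Hypothetical record type for illustration; not taken from ocaml-matrix. *)
type user_id = { localpart : string; server_name : string }

(* Split "@localpart:server" at the first ':' after the leading '@'.
   Returns None when the '@' sigil, the localpart, or the server name
   is missing. Character-set validation is deliberately left out. *)
let parse_user_id (s : string) : user_id option =
  let len = String.length s in
  if len < 2 || s.[0] <> '@' then None
  else
    match String.index_opt s ':' with
    | None -> None
    | Some i ->
      let localpart = String.sub s 1 (i - 1) in
      let server_name = String.sub s (i + 1) (len - i - 1) in
      if localpart = "" || server_name = "" then None
      else Some { localpart; server_name }

let () =
  match parse_user_id "@neo:tarides.com" with
  | Some { localpart; server_name } ->
    Printf.printf "user %s is registered on homeserver %s\n" localpart server_name
  | None -> print_endline "not a valid Matrix user ID"
```

Splitting at the first colon keeps a server name with a port, such as tarides.com:8448, intact.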

Basically, the user communicates with a homeserver via a client application. Once the user decides to join a room, the client sends this request to the homeserver, and it’s the homeserver’s responsibility to connect the user to the room, to store the history of the messages of that room, and to send the messages back to the user. The homeserver gets all this information by talking with the other users’ homeservers in that room. This way, if a homeserver goes down, the conversation can continue as the remaining homeservers are still exchanging messages. When a homeserver comes back online, it resynchronises the messages. It receives old ones from other homeservers and inserts its own into others’ timelines.
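The resynchronisation idea can be sketched as a merge of two timelines. This is a deliberately simplified illustration, not how ocaml-matrix or the Matrix specification does it: real Matrix events form a graph and conflicts are settled by the specification's state resolution rules, whereas this sketch assumes a hypothetical event record, removes duplicates by event ID, and orders by timestamp.

```ocaml
(* Hypothetical, simplified event: real Matrix events form a graph and
   carry much more metadata. *)
type event = { event_id : string; origin_ts : int; body : string }

module Ids = Set.Make (String)

(* Merge the local timeline with one fetched from a peer homeserver:
   keep each event once (by event_id), then order by timestamp,
   breaking ties on event_id so the result is deterministic. *)
let resync (local : event list) (remote : event list) : event list =
  let _, unique =
    List.fold_left
      (fun (seen, acc) e ->
        if Ids.mem e.event_id seen then (seen, acc)
        else (Ids.add e.event_id seen, e :: acc))
      (Ids.empty, []) (local @ remote)
  in
  List.sort
    (fun a b ->
      match compare a.origin_ts b.origin_ts with
      | 0 -> compare a.event_id b.event_id
      | c -> c)
    unique

let () =
  (* The local server missed events $b and $c while it was offline. *)
  let local =
    [ { event_id = "$a"; origin_ts = 1; body = "hello" };
      { event_id = "$d"; origin_ts = 4; body = "I'm back" } ]
  in
  let remote =
    [ { event_id = "$a"; origin_ts = 1; body = "hello" };
      { event_id = "$b"; origin_ts = 2; body = "anyone there?" };
      { event_id = "$c"; origin_ts = 3; body = "still here" } ]
  in
  resync local remote
  |> List.iter (fun e -> Printf.printf "%d %s %s\n" e.origin_ts e.event_id e.body)
```

Running the same merge on the peer's side, with the arguments swapped, gives the same list, which is the property that lets every homeserver in a room converge on one history.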

Matrix Architecture

Matrix Architecture Image Description: Matrix users communicate via Matrix clients, which can be web clients, mobile clients, desktop clients, or embedded clients built into existing apps like Slack via Matrix bridges. A client could even be a piece of hardware (e.g., a drone) that is Matrix enabled. A user's client connects via a unique ID to a single homeserver, which stores the communication history and account information for that user. It also shares data with the wider Matrix federation by synchronising communication history with other homeservers. The conversations among users take place in rooms that have their contents replicated across all of the homeservers associated with the users present in a room.

Centralised communication architectures keep the data within their own systems, which introduces a series of security issues. For example, centralised systems usually offer very little transparency regarding their implementations. This means that, for the claimed purpose of security, a centralised system could either hide backdoors or have security flaws that pose serious issues to privacy. By contrast, an open-source system promotes transparent development, which provides assurance regarding the reliability of the implementation by allowing ad-hoc code audits. Moreover, the decentralised architecture empowers users to host their own conversations rather than having all their data stored by the service provider. This gives attackers fewer incentives to target massive data leaks and, in combination with the confidentiality ensured by the end-to-end encryption, provides an increased level of security while promoting ownership and data sovereignty.

Matrix Revolutions (in OCaml)

Matrix’s Hall of Fame shows several ethical researchers’ investigative work into Matrix’s security vulnerabilities. For example, a recently discovered buffer overflow produces a considerable information disclosure in other Matrix implementations, such as Element. At Tarides, we mitigate a consistent class of these vulnerabilities with the OCaml development environment, which provides secure-by-design guarantees for the OCaml Matrix project.

Matrix Servers

OCaml Matrix Architecture: The OCaml CI Client is a bot that communicates via the TLS protocol with Matrix servers, such as the ocaml-matrix server. The ocaml-matrix server is the unikernel that communicates with other Matrix servers in the federation to synchronise events in the Matrix rooms. For this purpose, OCaml Matrix exchanges DNS information with a unikernel that plays the role of a Primary DNS Server and connects with an Irmin storage unit to save the rooms’ states.

Our ocaml-matrix server manages its own clients, who create public rooms for events and messaging. It also handles foreign servers, whose users can ask to join these public rooms. This server interacts with other servers and manages their users’ requests for registration and event updates in public rooms via the server-to-server communication API. Our OCaml implementation follows the Matrix specification standard. From this, we extract the parts describing the subset of Matrix components that we choose to implement for our OCaml Matrix MVP (Minimum Viable Product). Because other servers are not aware of the MVP’s constraints, the MVP enforces them through the error responses and permission restrictions that the Matrix standard already provides.
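As an illustration of enforcing a constraint through the standard's own error mechanism, the Matrix specification defines a common JSON error body with an errcode and a human-readable error field, and M_FORBIDDEN is one of its standard codes. The sketch below assumes a hypothetical join handler that only accepts public rooms. The types, the function names, and the message text are made up for this example and are not the ocaml-matrix API.

```ocaml
(* Hypothetical types for illustration; not the ocaml-matrix API. *)
type visibility = Public | Private

type response = { status : int; body : string }

(* Standard Matrix error body: {"errcode": ..., "error": ...}.
   %S quotes and escapes the OCaml way, which coincides with JSON for
   plain ASCII text; a real server would use a JSON library instead. *)
let matrix_error ~status ~errcode ~message =
  { status;
    body = Printf.sprintf "{\"errcode\":%S,\"error\":%S}" errcode message }

(* Assumed MVP rule: joins are accepted for public rooms only. A peer
   that knows nothing about this restriction still understands the
   refusal, because it arrives as a standard M_FORBIDDEN error. *)
let handle_join ~room_id visibility =
  match visibility with
  | Public -> { status = 200; body = Printf.sprintf "{\"room_id\":%S}" room_id }
  | Private ->
    matrix_error ~status:403 ~errcode:"M_FORBIDDEN"
      ~message:"This server only supports public rooms"

let () =
  let r = handle_join ~room_id:"!zion:tarides.com" Private in
  Printf.printf "%d %s\n" r.status r.body
```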

We also implemented an OCaml-CI client that communicates with the Matrix servers via the client-server API. This client implements a subset of the actions defined in the specification and is meant to be used as a bot only (and would therefore not need to drift apart from this subset). The OCaml-CI client was specifically designed to allow an easy implementation for our OCaml server, but it is totally compatible with other Matrix homeservers. We tested the integration of the OCaml-CI client with both Synapse and our ocaml-matrix server, and we used it for testing throughout the ocaml-matrix server implementation.

For now, we’ve only given the OCaml Matrix access to public rooms because they don’t require the end-to-end encryption protocol. Nevertheless, we define support for encrypted communication via the Key module, and we note that most of the encryption algorithms used by the end-to-end encryption protocol are available in MirageOS unikernels via the mirage-crypto library.

We deployed the ocaml-matrix server as an end-to-end application and converted it into the unikernel format. Deploying it as a unikernel lets the ocaml-matrix server run on various platforms in isolation, increasing the security level of the Matrix server. The unikernel format of the Matrix server is completed for Unix and in the final stages for the platforms ported by Solo5. It’s worth noting that throughout the stage of ocaml-matrix unikernel deployment, we’ve had our share of dream-ing. Going through this experience was a game changer.

Matrix Resurrections (Future Work)

Although we’re thrilled about the progress thus far, there is still much work to do. We plan to revive the OCaml Matrix to improve or add certain features. First, we will add user access to private rooms with end-to-end encryption and more authentication methods that follow Matrix specifications and GDPR recommendations. We will also adopt a methodology for testing and benchmarking for both the ocaml-matrix client and server, integrate the ocaml-matrix codebase into OCaml Multicore, create other ocaml-matrix unikernel deployments, and evaluate the security model provided in the Matrix specifications. Finally, we’ll update and complete the implementation according to the latest Matrix specifications.

Conclusions

Having said all of the above, we invite you to decide whether the Matrix name comes from the provided federated structure in which all communication can be matrixed together or from the idea that it's creating a virtual world that is sustained by the users plugged into it. Do you want to know the truth behind the Matrix? It’s up to you. Will you choose the blue pill or the red pill?