Recent and upcoming changes to Merlin

by Thomas Refis on Jan 26th, 2021

Merlin is a language server for the OCaml programming language; that is, a daemon that connects to your favourite text editor and provides the usual services of an IDE: instant feedback on warnings and errors, autocompletion, "type of the code under the cursor", "go to definition", etc. As we (Frédéric Bour, Ulysse Gérard and I) are about to do a new major release, we thought now would be a good time to talk a bit about some of the changes that are going into this release.

Project configuration

Since its very first release, Merlin has been getting information about the project being worked on through a .merlin file, which used to be written by the user, but is now often generated by build systems.

This had the advantage of being fairly simple: Merlin would look for such a file in the current directory, then in each parent directory in turn until it found one, and then read it. But there were also some sore points: the granularity of the configuration is the directory rather than the file, and the information duplicates what is already in the build system configuration (be it dune, Makefiles, or, back in the day, ocamlbuild).
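To make the lookup concrete, here is a minimal sketch of that upward search, using only the OCaml standard library. The function name find_upwards is ours and purely illustrative; this is not Merlin's actual implementation.

```ocaml
(* Illustrative sketch (not Merlin's code): starting from [dir], walk up
   towards the filesystem root and return the first directory that
   contains a file called [name]. *)
let rec find_upwards ~name dir =
  if Sys.file_exists (Filename.concat dir name) then Some dir
  else
    let parent = Filename.dirname dir in
    (* [Filename.dirname] returns its argument unchanged once the root
       (or ".") is reached, so a fixed point means there is nowhere
       left to look. *)
    if String.equal parent dir then None else find_upwards ~name parent

(* Usage: look for a .merlin file starting from the current directory. *)
let () =
  match find_upwards ~name:".merlin" (Sys.getcwd ()) with
  | Some dir -> Printf.printf "found a .merlin file in %s\n" dir
  | None -> print_endline "no .merlin file found"
```

Note that the result is a directory, which is exactly why the configuration could not be more precise than "one configuration per directory".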

After years of thinking about it, we've finally decided to make some light changes to this process. Since version 3.4, when it scans the filesystem, Merlin looks for either a .merlin file or a dune (or dune-project) file. When it finds one of those, it starts an external process in the directory where that file lives, and asks that process for the configuration of the ml(i) file being edited.

The process in charge of communicating the configuration to Merlin will either be a specific dune subcommand (when a dune file is found), or a dedicated .merlin reader program.
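The choice between the two helpers can be pictured roughly as follows. This is only a sketch: the type and function names are made up for illustration, and checking for .merlin before dune is an assumption of the sketch rather than a statement about Merlin's actual precedence.

```ocaml
(* Which external helper answers configuration queries.
   Illustrative names, not Merlin's internal types. *)
type config_provider =
  | Dune_subcommand    (* a dune or dune-project file was found *)
  | Dot_merlin_reader  (* a .merlin file was found *)

(* Given the names of the files present in one directory, decide which
   helper to start there, if any.
   ASSUMPTION: .merlin is checked first; the real order may differ. *)
let provider_of_files files =
  if List.mem ".merlin" files then Some Dot_merlin_reader
  else if List.mem "dune" files || List.mem "dune-project" files then
    Some Dune_subcommand
  else None
```

A result of None simply means "keep looking in the parent directory".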

We see several advantages in doing things this way (rather than, for instance, changing the format of .merlin files):

  1. this change is entirely backward compatible, and indeed the transition has already happened silently: dune is still emitting .merlin files, and will only stop doing so with dune 2.8.
  2. externalizing the reading of .merlin files and simply requiring a "normalized" version of the config (i.e. with no mention of packages, just of flags and paths) allowed us to simplify the internals of Merlin.
  3. talking to the build system directly not only gets us a much finer-grained configuration (which is important when you build different executables with different flags in the same directory, or if you apply different ppxes to different files of a library), it also opens the door to nicer behavior from Merlin in some circumstances. For instance, the build system can (and does) tell Merlin when the project isn't built. Currently we only report that information to users when they ask for errors, alongside all the other (mostly rubbish) errors, which is already helpful in itself. But in the future we can start filtering the other errors to only report those that would remain even after building the project (e.g. parse errors).

There are however some changes to look out for:

  • people who still use .merlin files but do not install Merlin using opam need to make sure to also have the dot-merlin-reader binary in their PATH (it is available as an opam package, but is also buildable from Merlin's git repository)
  • vim and emacs users who could previously load packages interactively (by calling M-x merlin-use or :MerlinUse) cannot do that anymore, since Merlin itself stopped linking with findlib. They'll have to write a .merlin file.
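For instance, someone who used to load a package interactively could declare it in a .merlin file at the root of the project instead. The following is only an illustrative sketch: the directory names and the package (lwt) are placeholders to adapt to your own project.

```
S src
B _build
PKG lwt
```

Here S and B give the source and build directories, and PKG lists the findlib packages to make available.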

Dropping support for old versions of OCaml

Until now, every release of Merlin has supported all versions of OCaml from 4.02 up to the latest one available at the time of that release.

We have done this by having one version of "the frontend" (i.e. handling of buffer state, project configuration; analyses like jump-to-definition, prefix-completion, etc.), but several versions of "the backend" (OCaml's ASTs, parser and typechecker), and choosing at build time which one to use. The reason for doing this instead of having, for instance, one branch of Merlin per version of OCaml, is that while the backends are fairly stable once released, Merlin's frontend keeps evolving. Having just one version of it makes it easier to add features and fix bugs (patches don't need to be duplicated), whilst ensuring that Merlin's behavior is consistent across every version of OCaml that we support.

For this to work, however, one needs a well-defined API between the frontend and all the versions of the backend. This implies mapping every version of OCaml's internal ASTs (which are modified from one version to the next) to a unified one, so as to keep Merlin's various features version-agnostic. But it also means being resilient to OCaml's internal API changes. For instance, between 4.02 and 4.11 there were big refactorings impacting the way one accesses the typing environment, the way one accesses the "load path" (the part of the file system the compiler/Merlin is aware of), the way error messages are produced, ...
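As a toy illustration of what "mapping to a unified AST" means, consider the sketch below. The two backend modules and all the type and function names are invented for the example; they are not the compiler's real types nor Merlin's actual interface.

```ocaml
(* Two made-up backends whose representation of a string constant
   differs, standing in for two versions of the compiler.
   These are NOT the compiler's real types. *)
module Backend_old = struct
  type constant = Const_string of string
end

module Backend_new = struct
  (* The newer version also records the offset where the literal starts. *)
  type constant = Const_string of string * int
end

(* The single representation that the frontend is written against. *)
type unified_constant = { text : string; start : int option }

(* One small adapter per backend. *)
let of_old (Backend_old.Const_string text) = { text; start = None }

let of_new (Backend_new.Const_string (text, start)) =
  { text; start = Some start }

(* A frontend feature, written once and shared by every backend. *)
let describe c =
  match c.start with
  | None -> Printf.sprintf "string %S" c.text
  | Some pos -> Printf.sprintf "string %S at offset %d" c.text pos
```

Each new compiler version means another adapter, and every change that doesn't fit the unified type forces that type, and potentially every feature built on it, to be revisited.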

The rate of change in the compiler is a lot higher than it was when we first started Merlin (7 years ago now!), which doesn't just mean that we have to spend more and more time on updating the common interface, but also that the interface is getting harder to define. Recently (with the 4.11 release) some of the changes were significant enough that for some parts of the backend we just didn't manage to produce a single interface to access old and new versions, so instead we had to start duplicating and specializing parts of the frontend. And we don't expect things to get much better in the near future.

Furthermore, Merlin's backends are patched to be more resilient to parsing and typing errors in the user's code. Those patches also need to be updated at each new release of the compiler. The work required to keep the "unified interface" working was taking time away from updating our patches properly, and our support of user errors has slowly been getting worse over the past few years, resulting in less precise type information when asked, incomplete results when asking for auto-completion, etc.

Therefore we have decided to stop dragging older versions of OCaml along. We plan to switch to a system where we have one branch of Merlin per version of OCaml, and each opam release of Merlin will only be buildable with one version of OCaml. We will keep maintaining all the relatively recent branches (that is: 4.02 definitely will not get fixes, but 4.06 is still in the clear). However, all the new features will be developed against the latest version of OCaml and cherry-picked to older branches if, and only if, there are no merge conflicts and they work as expected without changes.

We hope that this will make it easier for us to update to new versions of OCaml (actually, we already know it does: adding support for 4.12 was easier than for any of the other recent versions), will allow us to clean up Merlin's codebase (let's call that a work in progress), and will free some time to work on new features.

You might wonder what all this changes for you, as a user, in practice. Well, it depends:

  • if you install Merlin from opam: nothing, or almost nothing. Everything that you currently do with Merlin will keep working. In the future, perhaps some new feature will appear that won't work on all versions. But that day hasn't come yet.
  • if you install Merlin some other way (manually?): you can't just fetch master and build it anymore. You have to pick the appropriate branch for your version of OCaml.
  • if you're reusing Merlin's codebase as part of another project and (even worse) have patches on it: come and talk to us if you haven't done so already! We can try and integrate your patches, so that you only need to worry about vendoring the right version(s) for your needs.

Over the years, Merlin has received bugfixes and improvements from a long list of people, but for the upcoming release Frédéric and I are particularly grateful to Rudi Grinberg, a long time and regular contributor who also maintains the OCaml LSP project, as well as Ulysse Gérard, who joined our team a year ago now. They are in particular the main authors of the work to improve the handling of projects' configuration.

We hope you'll be as excited as us by all these changes!