Project-Wide Occurrences: A New Navigation Feature for OCaml 5.2 Users
by Ulysse Gérard on Aug 28th, 2024Project-Wide Occurrences
With the release of merlin-lib
5.1-502
and associated ocaml-lsp-server
, we
brought a new, exciting feature to OCaml's editor tooling: project-wide
occurrences.
The traditional "occurrences" query in Merlin modes, named "Find All References" in LSP-based mode, was used to only return results in the active buffer. This is no longer the case!
Occurrences queries will now return every usage of the selected identifier across all of the project's source files.
There are some limitations that come with this initial release. When queried from an identifier's usage or its definition, all other usages of it are returned, but related declarations are not. In particular, this means that queries should be made from implementation files, not interfaces (
.mli
).
In this post, we will give an overview of the ecosystem's various parts that need to work together for this feature to work.
Try It!
Before diving into technical details, let's see how it works. You can try it on any project that builds with Dune and is compatible with OCaml 5.2.
Update your switch by running opam update && opam upgrade
to get the required tool versions:
- Dune
>= 3.16.0
- Merlin
>= 5.1-502
- OCam-LSP
>= 1.19.0
Since we are looking for all occurrences, we need to build an index for Merlin and LSP. Fortunately, this is well integrated in Dune, and you can build the index for your project by running:
dune build @ocaml-index
This alias ensures that all the artifacts needed by Merlin are built. You can
also add --watch
to always keep the configuration and the indexes up to date
while you edit your source files.
Note that unlike
dune build @check
, the@ocaml-index
will build the entire project, including tests.
Once the index is ready, you can query for project-wide occurrences:
merlin-project-occurrences
in EmacsMerlinOccurrencesProjectWide
in VimFind All References
in LSP-based plugins.
Here is a comparison of a references query before, and after building the index:
Now, let's dive into more technical details.
High-Level Overview
The base design is fairly simple. We want to iterate on every identifier of every source file, determine their definition, and group together those that share the same definition. This forms an index. Tools can then query that index to get the location list of identifiers that share the same definition.
The following section describes how we implemented that workflow:
- Compute definitions using two-step shape reduction
- Driving of the indexer tool by the build system
- Changes to Merlin to properly answer queries
Two-Step Shape Reduction
Finding an identifier's definition in OCaml is a difficult problem, mostly because of its powerful module system. A solution to this problem has been recently described in a presentation at the ML Workshop: shapes. In short, shapes are terms of a simple lambda-calculus that represent an abstraction of the module system. To find an identifier's definition, one can build a shape from its path and reduce (as in beta-reduction) that shape. The result should be a leaf with a UID that uniquely represents the definition.
This has been implemented in the compiler, and Merlin already takes advantage of
it to provide a precise jump-to-definition
feature.
For project-wide occurrences, we perform shape reduction in two steps:
First, at the end of a module's compilation, the compiler iterates on the
Typedtree and locally reduces every identifier's shape. If the
identifier is locally (in the same unit) defined, the result will be a
fully-reduced shape holding the definition's UID. However, if the identifier is
identified in another compilation unit, the result is a partially-reduced
shape, because we cannot load the .cmt
files of other compilation
units (that are required to finish the reduction) without breaking the
separate compilation assumptions. These resulting UIDs or partially-reduced
shapes are stored in the unit's .cmt
file:
type cmt_infos = {
...
cmt_ident_occurrences :
(ident_with_loc * def_uid_or_shape) list
}
Then, an external tool (called ocaml-index
) will read that list and finish
the reduction of the shapes when necessary. This step might load the .cmt
files
of any transitive dependency the unit relies on.
Indexation by the Build System
The tool we just introduced, ocaml-index
, plays two roles:
- It finishes the reduction of the shapes stored in the
.cmt
files. - It aggregates locations that share the same definition's UID.
The result is an index that is written to a file. Additionally, the tool can merge multiple indexes together. This allows build systems to handle indexation in the way they decide.
We only provide rules for Dune right now, but the tools themselves are built system agnostic. The Dune rules are as follow:
For every stanza library
or executable
, we index every .cmt
file and store
the results into an index
file for the stanza.
- This process, similar to linking, depends on every transitive dependency of
the stanza being indexed, since shape reduction might require loading those
cmt
files. - Additionally, if any of the dependencies' indexes have changed, each stanza's index must be rebuilt.
This is a somewhat simple but heavy process, and it could be refined in the future. Right now it performs well enough to provide a usable watch mode in small to fairly large projects (like Dune itself).
Index Configuration and Reading
Last but not least, we need a way for Merlin to know were the index
files are
located and how to read them.
This is done by using a new configuration directive INDEX
. It can be used to
provide one or more index
files to Merlin. Then, querying for all the usages of
the identifier under the cursor is done in the following way:
- Identify the path of the identifier under the cursor
- Reduce the shape corresponding to this path to get the definition's UID.
- Lookup this UID in the
index
files and in the current buffer's index (which is computed by Merlin). - Return all the locations
Future Work
Thank you for reading this post! We hope you will have a lot of fun exploring your codebases using this new feature. We have a lot of exciting improvements on our roadmap, some of which involve returning the declarations linked to an identifier and providing project-wide renaming queries.
If you are interested to learn more about these features or to contribute, please have a look at this tracking issue. You can also have a look at the announcement and wiki page. Finally, feel free to attend future Merlin public meetings and watch the talk at the OCaml Workshop during ICFP!
Tarides is an open-source company first. Our top priorities are to establish and tend to the OCaml community. Similarly, we’re dedicated to the development of the OCaml language and enjoy collaborating with industry partners and individual engineers to continue improving the performance and features of OCaml. We want you to join the OCaml community, test the language and tools, and actively be part of the language’s evolution.
Tarides is always happy to discuss commercial opportunities around OCaml. There are many areas where we can help industrial users to adopt OCaml 5 more quickly, including training, support, custom developments, etc. Please contact us if you are interested in discussing your needs.