
Internship Report: Refactoring Tools Coming to Merlin
Refactoring features have contributed to the popularity of editors like IntelliJ, as well as certain programming languages whose editor support offers interactive mechanisms to manage code — Gleam being an excellent example. Even though OCaml has some features related to refactoring (such as renaming occurrences, substituting typed holes with expressions, and case analysis for pattern matching), the goal of my internship was to kickstart work on a robust set of features to enable the smooth integration of multiple complementary refactoring support commands.
As part of my Tarides internship (on the editor side), I specified several useful commands, inspired by competitors and materialised in the form of RFCs, subject to discussion. There were multiple candidates, but we found that expression extraction to toplevel was the most suitable for a first experiment. Since it touched on several parts of the protocol and required tools that could be reused for other features, it was important to design the system with extensibility and modularity in mind.
In this article, I will present the results of this experiment, including the new command and some interesting use cases.
Examples
Expression extraction to toplevel will select the most inclusive expression that fits in your selection and propose to extract it. In this case, extract means that the selected expression will be moved into its own freshly generated toplevel let binding.
Extracting Constants
Here is a first example: Let's try to extract a constant. Let’s assume that the float 3.14159 is selected in the following code snippet:
let circle_area radius = 3.14159 *. (radius ** 2.)
(*                       ^^^^^^^ *)
The extract code action will then be proposed, and if you apply it, the code will look like this:
let const_name1 = 3.14159
let circle_area radius = const_name1 *. (radius ** 2.)
Here is an illustrated example (based on an experimental branch of ocaml-eglot):
We can see that the expression has been effectively extracted and replaced by a reference to the fresh let binding. We can also observe that in the absence of a specified name, the generated binding will be named with a generic name that is not taken in the destination scope. You also have the ability to supply the name you want for extraction.
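The naming rule described above can be sketched as follows. This is a minimal illustration, not Merlin's actual implementation: the function names fresh_name and binding_name, the prefix argument, and the representation of the destination scope as a list of strings are all assumptions made for the example.

```ocaml
(* Hypothetical sketch: pick the first "<prefix><n>" that is not already
   bound in the destination scope (modelled here as a list of names). *)
let fresh_name ~prefix taken =
  let rec go n =
    let candidate = prefix ^ string_of_int n in
    if List.mem candidate taken then go (n + 1) else candidate
  in
  go 1

(* A user-supplied name wins; otherwise fall back to a generated one. *)
let binding_name ?name ~prefix taken =
  match name with
  | Some n -> n
  | None -> fresh_name ~prefix taken
```

With this sketch, an empty scope yields const_name1, a scope that already contains const_name1 yields const_name2, and passing ~name:"pi" bypasses generation entirely.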
For example, here is the same example where the user can enter a name:
But the refactoring capabilities go much further than constant extraction!
Extracting an Expression
In our previous example, we did not have to worry about the purity of the expression, since we were only extracting a literal value. However, OCaml is an impure language, so extracting an arbitrary expression into a constant can change the program's behaviour. For example, let's imagine the following snippet:
let () =
  let () =
    print_endline "Hello World!";
    print_endline "Done"
  in ()
In this example, extracting into a constant would cause problems! Indeed, we would be changing the semantics of our program by executing both print statements beforehand. Fortunately, the command analyses the expression as not being a constant and delays its execution using a thunk: a function of type unit -> ... that only runs when it is called.
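To make the thunk idea concrete, here is a sketch of what the result could look like, assuming the sequence of the two print_endline calls is the selected expression. The name fun_name1 and the exact shape of the generated code are assumptions for illustration, not the command's confirmed output.

```ocaml
(* Hypothetical result of the extraction: the side effects are wrapped
   in a function taking unit, so nothing is printed at definition time. *)
let fun_name1 () =
  print_endline "Hello World!";
  print_endline "Done"

let () =
  (* The original expression is replaced by a call to the thunk, so both
     lines are still printed at the same point in the program. *)
  let () = fun_name1 () in
  ()
```

Had the expression been bound as a plain value (let const_name1 = print_endline ...; ...), both lines would be printed when the toplevel binding is evaluated rather than at the original call site.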
As we can see, our goal was to maximise the production of valid code, as much as possible, by carefully analysing how to perform the extraction. This is all the more challenging in OCaml, which allows for arbitrary (and potentially infinite) nesting of expressions.
Extracting an Expression That Uses Variables
The final point we’ll briefly cover is the most fun. Indeed, it’s possible that the expression we want to extract depends on values defined in the current scope. For example:
let z = 45
let a_complicated_function x y =
  let a = 10 in
  let b = 11 in
  let c = 12 in
  a + b + c + (c * x * y) + z
In this example, the extracted expression a + b + c + (c * x * y) + z will be placed between z and a_complicated_function. As a result, z will still be accessible; however, x, y, a, b, and c will be free variables in the extracted expression. Therefore, we generate a function that takes these free variables as arguments:
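As a sketch of the expected outcome for the snippet above: the generated name fun_name1 and the order of the parameters are assumptions here, and the actual command may choose differently.

```ocaml
let z = 45

(* Hypothetical generated function: a, b, c, x and y were free in the
   extracted expression, so they become parameters; z is already in
   scope at toplevel and is referenced directly. *)
let fun_name1 a b c x y = a + b + c + (c * x * y) + z

let a_complicated_function x y =
  let a = 10 in
  let b = 11 in
  let c = 12 in
  (* The original expression is replaced by a call to the new function. *)
  fun_name1 a b c x y
```

The behaviour of a_complicated_function is unchanged: the call site simply forwards the local values that the extracted body needs.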
Identifying free variables was one of the motivations for starting with this command. We are fairly certain that this is a function that we will need to reuse in many contexts! Note that the command behaves correctly in the presence of objects and modules.
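To illustrate what identifying free variables involves, here is a minimal sketch on a toy expression type. It is not Merlin's implementation, which works on OCaml's typed syntax tree: the expr type, free_vars, and params are hypothetical names introduced only for this example.

```ocaml
module S = Set.Make (String)

(* A toy expression language, far simpler than OCaml's real AST. *)
type expr =
  | Var of string
  | Int of int
  | Add of expr * expr
  | Mul of expr * expr
  | Let of string * expr * expr (* let x = e1 in e2 *)
  | Fun of string * expr        (* fun x -> body *)

(* Variables used in an expression without being bound inside it. *)
let rec free_vars = function
  | Var x -> S.singleton x
  | Int _ -> S.empty
  | Add (a, b) | Mul (a, b) -> S.union (free_vars a) (free_vars b)
  | Let (x, e1, e2) -> S.union (free_vars e1) (S.remove x (free_vars e2))
  | Fun (x, body) -> S.remove x (free_vars body)

(* Parameters of the extracted function: free variables that are not
   already visible at the destination (toplevel) scope. *)
let params ~toplevel e = S.elements (S.diff (free_vars e) toplevel)

(* a + b + c + (c * x * y) + z, with z defined at toplevel. *)
let example =
  Add
    ( Add
        ( Add (Add (Var "a", Var "b"), Var "c"),
          Mul (Mul (Var "c", Var "x"), Var "y") ),
      Var "z" )

let example_params = params ~toplevel:(S.singleton "z") example
```

Here example_params contains a, b, c, x and y (in alphabetical order, because a set is used), while z is left out since it is reachable from the destination scope.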
A Real World Example
Let’s try to extract something a little more complicated now. Let’s assume we have the following code and we want to refactor it, for example, by extracting the markup type pretty-printing logic out of our show function.
type t = markup list
and markup = Text of string | Bold of string
let show doc =
  let buf = Buffer.create 101 in
  let bold_tag = "**" in
  List.iter
    (fun markup ->
      Buffer.add_string buf
      @@
      match markup with
      | Text txt -> txt
      | Bold txt -> bold_tag ^ txt ^ bold_tag)
    doc;
  Buffer.contents buf
We can observe that the variables bound outside the extracted region are now passed as arguments, and the extracted expression is replaced by a call to the newly generated show_markup function.
let show_markup buf bold_tag =
  fun markup ->
    (Buffer.add_string buf)
      (match markup with
       | Text txt -> txt
       | Bold txt -> bold_tag ^ txt ^ bold_tag)
let show doc =
  let buf = Buffer.create 101 in
  let bold_tag = "**" in
  List.iter (show_markup buf bold_tag) doc;
  Buffer.contents buf
Here is an example of how it is used. Impressive, isn't it?
Editor Support
To understand how this new Merlin command can be used in your favourite editor, we have to take a closer look at how the Language Server Protocol works. LSP offers two mechanisms for extending the existing protocol with new features. First, there are code actions, which allow us to perform multiple LSP commands sequentially. This kind of request has the merit of working out of the box without requiring any plugin or specific command support on the editor side (which eases maintenance). Secondly, there are custom requests, which are more powerful than code actions and enable custom interactivity. So, if you want to prompt the user, a custom request is the way to go. The price to pay for this power is that client-side support must be implemented for each custom request in every editor plugin.
The current editor team approach is as follows: For each of Merlin's commands that don't map directly to a standard LSP request, we provide a code action associated with the Merlin command and potentially a dedicated custom request if the feature requires custom interactivity. Regarding the ‘extract’ feature, the associated code action does not allow us to choose the name of the generated let binding, but the custom request does.
What’s Next?
I hope this new command helps you get even more productive in OCaml! Don’t hesitate to experiment with it and report any bugs you encounter.
The development of Merlin’s refactoring tools is part of a broader vision to improve OCaml editor support and, perhaps, to offer an editor experience similar to JetBrains IDEs in the future!
The work done on the extract command gives us the opportunity to identify various problems pertaining to refactoring (substitution, code generation) and potentially to make the connection to refactoring commands that already exist in Merlin (like open refactoring and project-wide renaming). The next step is to add a small toolbox library in Merlin dedicated to refactoring in order to develop even more refactor actions. I hope this is just the first refactoring feature of a long series.
If you're curious and want to take a look at the feature, it's split into several PRs:
- ocaml/merlin#1948 which implements the extraction logic on the Merlin side and exposes it in the protocol,
- ocaml/ocaml-lsp#1545 which exposes the Custom Request enabling the use of the LSP-side functionality,
- ocaml/ocaml-lsp#1546 which exposes a Code Action that allows the functionality to be invoked without additional formalities on the Editor side,
- tarides/ocaml-eglot#65 which implements extraction behaviour in OCaml-Eglot, invocable either from a type enclosing or directly as a classic Emacs command.
All of these PRs are currently under review, and should be merged soon!
A big thanks to Xavier, Ulysse, and all the people that helped me during this internship. It was pretty interesting!