Our Experience at Tarides: Projects From Our Internships in 2023

by Dipesh Kafle, Shreyas Krishnakumar, Adithya Chandraserry on Sep 15th, 2023

Internships at Tarides

We regularly have the pleasure of hosting internships where we work with engineers from all over the world on a diverse range of projects. By collaborating with people who are relatively new to the OCaml ecosystem, we get to benefit from their perspective. Seeing things with fresh eyes helps with identifying holes in documentation, gaps in workflows, as well as other ways to improve user experience.

In turn, we offer interns the opportunity to work on a project in OCaml in close collaboration with a mentor. This affords participants a great deal of independence, while still having the support and expertise of an experienced engineer at their disposal. During the course of their internship, participants will learn more about OCaml and strengthen their skills in functional programming. They will also have the chance to complete a project with real-world implications, contributing meaningfully to an open-source ecosystem.

Does this sound like something you would like to do? Applications for our next round of internships open early next year, and you will be able to apply on our website around that time.

Let's check out some reports from this summer's internships, and see what the teams got up to!

Dipesh: Par_incr - A Library for Incremental Computation With Support for Parallelism

Background

I am a final year CS student from NIT Trichy. I had tried to learn Haskell in my second year but didn't really succeed. I enjoy learning about languages and their features, however, so I had learnt some OCaml by the end of my third year, though I had not tried out any fancy features.

I found out about the internship from X (Twitter) in one of KC's tweets, but I knew about Tarides and the good work they do since I had worked with KC in the past. I messaged him to check the rules and ask if recent graduates could apply. He confirmed that they could and encouraged me to apply.

The interview itself was very pleasant; it was as if it was just me talking and discussing things with interviewers (all interviews ever should be like this!). I thought I wouldn't get it but thankfully I did.

Goal of the Project

The goal of my project was to build an incremental computation library with support for parallelism constructs using OCaml 5.0. Incremental computation is a technique that improves efficiency by recomputing only the outputs that depend on changed data. The library we built, Par_incr, takes advantage of the new parallelism features in OCaml 5.0 to make incremental computation even more efficient.
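To make the idea concrete, here is a toy sketch of incremental computation. It is not Par_incr's API: the names var, set, map, and get are invented for illustration, and it leaves out parallelism entirely. It only shows the core idea of caching a result and recomputing it when the input has actually changed.

```ocaml
(* Toy illustration only; these names are hypothetical, not Par_incr's API. *)

(* An input cell with a version number that is bumped on every real change. *)
type 'a var = { mutable value : 'a; mutable version : int }

let var v = { value = v; version = 0 }

(* Writing the same value again does not count as a change. *)
let set x v =
  if x.value <> v then begin
    x.value <- v;
    x.version <- x.version + 1
  end

(* A derived value that remembers which input version it was computed from. *)
type ('a, 'b) map_node = {
  input : 'a var;
  f : 'a -> 'b;
  mutable cached : ('b * int) option;
  mutable recomputations : int;
}

let map input f = { input; f; cached = None; recomputations = 0 }

(* Reuse the cached result unless the input has changed since it was computed. *)
let get n =
  match n.cached with
  | Some (v, version) when version = n.input.version -> v
  | _ ->
    let v = n.f n.input.value in
    n.cached <- Some (v, n.input.version);
    n.recomputations <- n.recomputations + 1;
    v
```

Reading the derived value twice in a row runs the function once. It runs again only after set stores a different value in the input.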

Journey

I was somewhat familiar with OCaml so I brushed up on some concepts using the Real World OCaml textbook. OCaml.org also has a lot of resources for learning OCaml aimed at programmers of any level (beginner to advanced). For any non-trivial doubts, I would just ask my amazing mentor (Vesa) or someone else at Tarides (you can always find someone who's an expert in whatever question you have relating to OCaml) for help.

Initially, we wanted to finalise the module signature for the library. Vesa suggested a monadic interface for the library, and it felt like the right choice.

After that was done, I started on the implementation and got something working. We wanted to check how it fared against existing libraries, so I wrote benchmarks comparing the library to current_incr and incremental.

I remember one particular bug on which I wasted almost 2 full days. I had something like this in the code:

      (* Bug: only the assignment is guarded; the next line always runs *)
      if not is_same then t.value <- x;
      Reader_list.iter readers Rsp.RNode.mark_dirty

which should've actually been like this:

      (* Fix: the parentheses put both steps under the condition *)
      if not is_same then (t.value <- x;
      Reader_list.iter readers Rsp.RNode.mark_dirty)

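In OCaml, the sequence operator ; binds more loosely than if ... then, so in the first version only the assignment was conditional and every reader was marked dirty on each call, even when the value had not changed. This caused a huge performance hit because it triggered a lot of unnecessary work. You can learn more about the library from here.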
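The difference between the two snippets above comes down to how OCaml parses ; after if ... then. The following self-contained sketch reproduces the effect. The cell type and the mark_dirty counter are stand-ins invented for this example, not the library's code.

```ocaml
(* Stand-ins for the real library: a cell and a counter of "mark dirty" calls. *)
type cell = { mutable value : int }

let mark_dirty_calls = ref 0
let mark_dirty () = incr mark_dirty_calls

(* Buggy shape: parsed as (if not is_same then t.value <- x); mark_dirty (),
   so mark_dirty runs on every call. *)
let set_buggy t x =
  let is_same = t.value = x in
  if not is_same then t.value <- x;
  mark_dirty ()

(* Fixed shape: begin ... end (or parentheses) groups both steps under the if. *)
let set_fixed t x =
  let is_same = t.value = x in
  if not is_same then begin
    t.value <- x;
    mark_dirty ()
  end
```

Calling set_buggy with the value the cell already holds still increments the counter, while set_fixed leaves it untouched.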

Debugging this was quite fun and frustrating. It didn't even occur to me that this part could be the problem, so I was banging my head against the wall thinking I did something wrong somewhere else. I was trying out different things, but thankfully making changes to the code was enjoyable because the typechecker was always there holding my hand.

Overall it was an amazing journey. Getting to work in such an amazing environment here was a blessing for me, and I'm very grateful to have gotten this opportunity. I learnt a lot from Vesa throughout the internship and from many amazing folks at Tarides.

Challenges

The biggest challenge was to make the library performant. Since OCaml is a language with a garbage collector, you have to take special care when allocating things, since allocation isn't cheap. Another difficulty was finding information about compiler internals, such as how certain constructs get compiled and when certain optimisations kick in. This is something that can be improved, but I get that it's quite difficult to keep documentation up to date for large open-source compiler codebases that keep changing.

Takeaways and Best Parts

The best part was learning about optimisations, profiling, benchmarking, and improving performance, looking into assembly trying to figure out whether some things got inlined, as well as my discussions with Vesa.

The discussions with Vesa made me want to explore Emacs more, and his advice will definitely help me throughout my career. I'm also much more confident in OCaml and will probably use it whenever possible. I got to learn about all sorts of cool things being done by the Multicore team and other Tarides folks.

Shreyas: Olinkcheck

Background

I'm a final year CS student from NIT Trichy. I had never been exposed to functional programming before, but I had heard cool things about Haskell and OCaml and how Rust features were inspired by these languages. I had also been following KC on Twitter since the time I was researching internships and professors whose work I found interesting.

When KC tweeted about openings for interns at Tarides, I opened the application doc to read about all the cool projects listed, but I didn't know any functional programming. I applied anyway, thinking that the worst that could happen was that I would get rejected, no big deal.

Fast forward to a really fun interview. (No Data Structures and Algorithms? Yay! Easily my favorite interview experience so far.) It was more of a discussion than a question-and-answer.

Goal of the Project

The goal of my project was to create a tool that could be used to check for broken HTTP links, as well as present the broken link information to the user. The tool would then be integrated into OCaml.org through GitHub, to check for broken links on the website. Since OCaml.org is such a large website with lots of content, it is difficult to manually keep up with all the links. However, broken links negatively impact the user experience, and may also make pages on the website less visible to people who would otherwise be able to find the information they need.

Journey

Learning OCaml

I used these resources to learn OCaml:

  • The book 'Real World OCaml'
  • ocaml.org/learn
  • Reading others' code
  • Writing something and changing it until the compiler stops complaining
  • UTop
  • Stack Overflow
  • Setting up a developer environment (I was convinced by friends at college that 'real programmers' use Vim / Emacs on Arch Linux)

Categories of Programmers and Categories in Programming

I spent some time going through library code to figure out how to actually use it. I could hack something together to work for Markdown files, and I slowly learned how to write more idiomatic OCaml (thanks to my mentor Cuihtlauac). As an imperative programmer, I was used to giving names to intermediate things, which wasn't really necessary with OCaml.

I learnt a bit about Lwt and came across the term Monad, which is, of course, as is widely known - a monoid in the category of endofunctors. (Thankfully there were much better explanations and documentation online).

Everything was going fine - I was slowly iterating on the code, making it incrementally better and adding more tests, until the first major rewrite. I was using an outdated version of a library! That wasn't too painful, I knew what parsing code looked like already - but the structure of the document was now different. Another library (hyper) had unfixed issues for over a year, so I swapped that out too.

I went back to my old habit of writing imperative OCaml (!) using refs. They have their place, but are best avoided where possible. But this was important - it helped me really absorb the idea that functions are first class, what functional code looks like, and how I can start thinking like a functional programmer. The humble looking List.fold_left was the key to my enlightenment.
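As a small illustration of that shift, here is the same computation written both ways. The example is made up for this post rather than taken from Olinkcheck: it counts how many HTTP status codes in a list indicate a broken link, treating any status of 400 or above as broken.

```ocaml
(* Hypothetical helper: a status of 400 or above counts as a broken link. *)
let is_broken status = status >= 400

(* Imperative style: a mutable counter updated inside a loop. *)
let count_broken_imperative statuses =
  let count = ref 0 in
  List.iter (fun s -> if is_broken s then incr count) statuses;
  !count

(* Functional style: the running count is threaded through List.fold_left. *)
let count_broken_functional statuses =
  List.fold_left
    (fun acc s -> if is_broken s then acc + 1 else acc)
    0 statuses
```

Both functions return the same result. The fold version has no mutable state, and the accumulator makes explicit what is carried from one element to the next.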

Or so I thought. I hadn't met functors yet. It is, after all, just a mapping between two categories. (No, please.) Again, Cuiht really broke it down to a point where I could start understanding what a functor in OCaml is, which eventually led me to discover the power of the OCaml module system.
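For readers who have not met them yet, an OCaml functor is a module that takes another module as a parameter. The sketch below is a made-up example, not Olinkcheck's code: the names LINK_SOURCE, Make_report, and Plain_text are hypothetical, and the link extraction is deliberately naive. It assumes OCaml 4.13 or later for String.starts_with.

```ocaml
(* What a document format must provide (hypothetical signature). *)
module type LINK_SOURCE = sig
  type doc
  val extract_links : doc -> string list
end

(* A functor: given any LINK_SOURCE, build a small reporting module for it. *)
module Make_report (S : LINK_SOURCE) = struct
  let count_links (d : S.doc) = List.length (S.extract_links d)
end

(* One possible format: plain text, where a link is any word starting
   with http:// or https://. *)
module Plain_text = struct
  type doc = string

  let extract_links text =
    String.split_on_char ' ' text
    |> List.filter (fun word ->
           String.starts_with ~prefix:"http://" word
           || String.starts_with ~prefix:"https://" word)
end

(* Applying the functor produces a module specialised to plain text. *)
module Plain_report = Make_report (Plain_text)
```

Supporting another format then means writing one more module that matches LINK_SOURCE and applying Make_report to it, without touching the reporting code.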

Seeing it Work

After some "hacky" fixes and regular expression magic, I could get it to run as a GitHub CI action, which led to an automated pull request. (This came out of a lot of discussions with Sabine, because I thought I had hit a fundamental roadblock and that the project might be very hard to do!) I could also integrate it into Voodoo, the package documentation generator, and it is now being tested in the staging pipeline.

I've Had it All Wrong From the Beginning

By this time I had read a lot of other people's code and learnt enough from Cuiht to realise, yet again, that my code was bad. The functional programmer doesn't rely on the name of the function (what does the function v do? Or pp?). The meaning is taken from the context and the signature. So I had functions that looked like

val do_this_thing : a -> b -> c -> d -> ...

with no clue as to what those arguments mean. Someone reading the signature would be forced to look into the source code to understand what each argument is for. Now my target was to have a decent looking interface when someone typed #show Olinkcheck;; in utop. That's how I used other libraries, so I wanted others to be able to use mine like that too.

Biggest Challenge

My project was a practical problem, as opposed to a theoretical one like a data structure. So the challenges were also practical. Not everyone follows the same formatting while writing text-based files (let's first agree on tabs vs spaces?), and not all parsers are perfect. In the ideal world I could manipulate a syntax tree data structure which turns back into a string with the original formatting, webservers wouldn't care how many links I request from them, and there would be well defined regular expressions to find URLs amongst other text, but alas, no. None of these things are true. Text based data is convenient because of the loose requirements. Webservers can't realistically be fine with a user asking them for 7000+ links in a short time.

The Best Part

The best part for me was easily the opportunity to learn from people who are much more experienced than I am and to see something written by me be actually used in the real world.

Adithya: Domain-Safe Data Structures for Multicore OCaml

Background

I am a final year CS student at NITK Surathkal. Before this internship, I had only done a little bit of functional programming in Scala, so programming in OCaml was something very new to me. However, I was pretty excited to work on this because OCaml had only recently got Multicore support, and it was a niche area to explore.

I got to know about the internship from one of KC's tweets. How I came to know KC and his work was a pretty random incident: I needed his help to contact another professor to discuss some of my previous research internship work in a related area.

The interview experience was amongst the best ones I've had, with very open-ended discussions and friendly interviewers.

Goal of the Project

I was a part of the Multicore applications team and was mentored by Carine. The goal of my project was to add lock-based data structures to Saturn, a library of parallelism-safe data structures for Multicore OCaml.

The first step was to create a bounded queue based on the Michael-Scott queue. This type of queue has two locks, one for the head node and one for the tail node. I also investigated fine-grained versus coarse-grained lists, doubly linked lists, and finally a lock-free priority queue implemented on top of a lock-free skiplist.
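A minimal sketch of the two-lock idea is shown below. It is not the Saturn implementation: it assumes OCaml 5 (where Mutex is part of the standard library), it leaves out the capacity bound, and the type and function names are chosen only for this example. A dummy node at the head means that push only touches the tail and pop only touches the head, so each can take its own lock.

```ocaml
(* Sketch of a two-lock queue; unbounded, names are illustrative only. *)

type 'a node = { value : 'a option; next : 'a node option Atomic.t }

type 'a t = {
  mutable head : 'a node;  (* always a dummy node; guarded by head_lock *)
  mutable tail : 'a node;  (* last node in the list; guarded by tail_lock *)
  head_lock : Mutex.t;
  tail_lock : Mutex.t;
}

let create () =
  let dummy = { value = None; next = Atomic.make None } in
  { head = dummy;
    tail = dummy;
    head_lock = Mutex.create ();
    tail_lock = Mutex.create () }

(* Producers only need the tail lock. *)
let push q v =
  let node = { value = Some v; next = Atomic.make None } in
  Mutex.lock q.tail_lock;
  Atomic.set q.tail.next (Some node);
  q.tail <- node;
  Mutex.unlock q.tail_lock

(* Consumers only need the head lock. The node that held the popped value
   becomes the new dummy node. *)
let pop q =
  Mutex.lock q.head_lock;
  let result =
    match Atomic.get q.head.next with
    | None -> None
    | Some node ->
      q.head <- node;
      node.value
  in
  Mutex.unlock q.head_lock;
  result
```

The next pointer is an Atomic.t here because it is the one field that a producer and a consumer can touch at the same time while holding different locks. A bounded version would additionally need to track the number of elements and make push wait or fail when the queue is full.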

Towards the later part of the internship, I also worked on lock-free data structures.

Journey

Initially, I started off slow since I was just getting familiar with the OCaml environment and language features. My two main resources for learning OCaml were Real World OCaml and OCaml.org. Other than this, I spent a significant amount of time going through the book called The Art of Multiprocessor Programming, since that was the main reference point for my project. I also had to dive into some research papers cited in the book to get a better understanding of the implementation and some nitty-gritty details.

Over the course of the internship, I gained a lot of insights about minor details while programming for multicore systems, as well as OCaml language features that can have a significant impact on performance. Something that never struck me before was how much worse using structural equality (=) instead of physical equality (==) could be depending on the scenario.
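The difference between the two operators can be seen in a few lines. Structural equality (=) walks both values and compares their contents, so its cost grows with the size of the data, whereas physical equality (==) on heap-allocated values only checks whether both sides are the very same object. The values below are made up for illustration.

```ocaml
(* Two lists with the same contents but separate allocations. *)
let a = [ 1; 2; 3 ]
let b = List.map (fun x -> x) a  (* List.map builds a fresh copy *)

let same_contents = a = b    (* true: element-by-element comparison *)
let same_object = a == b     (* false: different allocations *)
let same_as_itself = a == a  (* true: the very same value *)

(* Atomic.compare_and_set compares the current contents with the expected
   value using physical equality, so a structurally equal copy does not
   match. *)
let slot = Atomic.make a
let swapped_with_copy = Atomic.compare_and_set slot b []      (* false *)
let swapped_with_original = Atomic.compare_and_set slot a []  (* true *)
```

In a retry loop around compare-and-set, using == to check whether a value has changed is therefore both cheaper and consistent with what the atomic operation itself compares.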

Since I was interning on-site at the Paris office, it was very easy for me to clarify any doubts or difficulties I faced whenever required, as most people at Tarides have a very high level of expertise in OCaml and are really helpful. I often had to rewrite many functions or make major changes, but thanks to OCaml features such as static checking and type inference, it was pretty easy and relatively quick to make those modifications.

Challenges

The biggest challenge was debugging and reasoning about the performance of one implementation compared to another. Since I was writing parallel programs, debugging was difficult because of the many edge cases that are hard to detect and can lead to deadlocks or incorrect output. I remember sometimes spending an entire day hunting down a single bug, but in the end it was really satisfying to fix it. Comparing different implementations and looking for possible optimisations was quite interesting and challenging.

The Best Part

Compared to my previous internships, Tarides was a unique experience since it is a pretty small company with a great culture working on some niche areas. There aren't many other places doing this kind of work. So if someone is interested in computer systems and programming languages, I would definitely recommend interning here. Getting the opportunity to work from the Paris office and visit Europe was definitely an unexpected yet pleasant surprise.

Want to Strengthen Your OCaml Skills?

If you're looking to learn more about functional programming in a supportive environment, you sound like an excellent candidate for our next round of internships! The next round is coming up early next year and we would be delighted if you would apply! Keep an eye on our website for more information or contact us here.