A Tutorial to OCaml -ppx Language Extensions

A brief introduction to the ppx extension mechanism in OCaml, with examples and further pointers - things I wish I'd known while preparing my Master's thesis.

How do you extend a programming language? The hacky, old-school way is to write your own preprocessor that takes programs written in your extended syntax and transforms them to the vanilla programming language (this is how early C++ compilers worked). The slightly less hacky option is to rely on tooling provided by the programming language ecosystem; OCaml is one of the few that provides such functionality out of the box. In this post, I try to complement the existing collective wisdom on how the OCaml ppx language extension mechanism works from a programmer's perspective.

ppx basics

The OCaml grammar has supported extension nodes since the 4.02 release, as described in the OCaml manual. Extension nodes allow the language to be extended in arbitrarily complex ways - they can stand in for expressions, type expressions, patterns, etc. Let's look at a silly example of an extension node: one which appends a "+ 1" term to any arithmetic expression. It may look as follows:

[%addone 1 + 2]

Extension nodes are made up of two parts: an attribute id (addone above) and a payload (the expression 1 + 2). The attribute id identifies which type of extension node it represents so that it can be handled by the appropriate rewriter, while the payload is the body of the expression that needs to be rewritten according to the logic of the language extension. In our case, upon expansion by the rewriter, the term above should read

(1 + 2) + 1

Extension nodes are generic placeholders in the syntax tree, which are rejected by the typechecker, and are intended to be expanded by ppx rewriters. A ppx rewriter is a binary which receives an AST produced by the parser, performs some transformations, and outputs a modified AST.
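To make the expansion step concrete, here is a minimal sketch that models it on a toy expression type rather than on the real Parsetree. The type expr and the functions expand and show are made up for illustration and are not part of compiler-libs; the real rewriter is developed further down.

```ocaml
(* Toy model of an extension node and its expansion.
   These types are illustrative only; the real AST lives in Parsetree. *)
type expr =
  | Int of int
  | Add of expr * expr
  | Extension of string * expr   (* [%id payload] *)

(* Expand a top-level [%addone e] into e + 1; anything else is unchanged. *)
let expand = function
  | Extension ("addone", payload) -> Add (payload, Int 1)
  | e -> e

(* Render an expression, parenthesising nested additions. *)
let rec show = function
  | Int n -> string_of_int n
  | Add (a, b) -> show_arg a ^ " + " ^ show_arg b
  | Extension (id, e) -> "[%" ^ id ^ " " ^ show e ^ "]"
and show_arg = function
  | Add _ as e -> "(" ^ show e ^ ")"
  | e -> show e

let () =
  let node = Extension ("addone", Add (Int 1, Int 2)) in
  print_endline (show node);           (* [%addone 1 + 2] *)
  print_endline (show (expand node))   (* (1 + 2) + 1 *)
```

The rewriter is just a function from trees to trees; everything else in this post is about doing the same thing on the compiler's own data types.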

The OCaml AST

So what does this AST look like? The AST data types produced by parsing are part of the compiler-libs package. You may find the definitions of the types in the Parsetree and Asttypes modules. To examine the AST produced by the parser on a particular expression, you may use the dumpast tool from the ppx_tools package by running:

ocamlfind ppx_tools/dumpast -e "[%addone 1 + 2]"

This produces the syntax tree fragment below, which uses the data types from the Parsetree/Asttypes modules of OCaml version 4.05. We can see that the parse tree contains an extension node with the addone attribute id and a payload consisting of a single expression. This expression is the application of the addition function to two sub-expressions, both integer constants. Trying to compile this trivial program as-is results in an error, because the compiler does not know how to interpret the extension node - expanding it is the job of the ppx rewriter.

{pexp_desc =
  Pexp_extension
   ({txt = "addone"},
    PStr
     [{pstr_desc =
        Pstr_eval
         ({pexp_desc =
            Pexp_apply ({pexp_desc = Pexp_ident {txt = Lident "+"}},
             [(Nolabel,
               {pexp_desc = Pexp_constant (Pconst_integer ("1", None))});
              (Nolabel,
               {pexp_desc = Pexp_constant (Pconst_integer ("2", None))})])},
         ...)}])}

AST Helpers

The ppx rewriter will need to pattern match against an AST fragment such as the one above and perform transformations. These transformations also need to generate valid OCaml AST fragments. Constructing these by hand is extremely cumbersome. To this end, the Ast_helper module provides helper functions for constructing fragments. For constructing the expression 1 + 2, we may use the Exp.apply and Exp.constant helpers as follows:

(* AST for 1 + 2; assumes Ast_helper, Asttypes, Parsetree and Longident are opened. *)
Exp.apply (Exp.ident { txt = Lident "+"; loc = !default_loc })
  [ (Nolabel, Exp.constant (Pconst_integer ("1", None)));
    (Nolabel, Exp.constant (Pconst_integer ("2", None))) ]
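Why the helpers save so much typing is easier to see on a small model. The sketch below is a toy: the types loc, expr and desc and the module Exp are invented here and only loosely mirror Ast_helper, which likewise fills in a default location when none is given. It compares a node built by hand with the same node built through helpers.

```ocaml
(* Toy model of the "helper" idea; the names loosely mirror Ast_helper,
   but none of this is the real compiler-libs API. *)
type loc = { line : int; col : int }

type expr = { desc : desc; loc : loc }
and desc =
  | Const of int
  | Ident of string
  | Apply of expr * expr list

(* Location used when the caller does not supply one. *)
let default_loc = ref { line = 0; col = 0 }

module Exp = struct
  (* Every helper funnels through [mk], which fills in the location. *)
  let mk ?(loc = !default_loc) desc = { desc; loc }
  let const ?loc n = mk ?loc (Const n)
  let ident ?loc name = mk ?loc (Ident name)
  let apply ?loc f args = mk ?loc (Apply (f, args))
end

(* 1 + 2 written out by hand: every node repeats the location. *)
let by_hand =
  let loc = !default_loc in
  { desc =
      Apply ({ desc = Ident "+"; loc },
             [ { desc = Const 1; loc }; { desc = Const 2; loc } ]);
    loc }

(* The same tree with helpers. *)
let with_helpers = Exp.(apply (ident "+") [ const 1; const 2 ])

let () = assert (by_hand = with_helpers)
```

The real records carry more bookkeeping fields than this model (attributes, for instance), so the saving is larger in practice.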

AST Mapper

The ppx rewriter for our task will use the Ast_mapper API, which provides a standard interface between the compiler and ppx rewriters. It also provides a default mapper, which is no more than a deep identity mapper - we therefore only need to override the parts of the syntax tree traversal that are interesting to us. With the necessary plumbing, our addone rewriter will look like:

open Ast_mapper
open Ast_helper
open Asttypes
open Parsetree
open Longident

let expr_mapper mapper expr =
  begin match expr with
  | { pexp_desc =
        Pexp_extension ({ txt = "addone"; loc }, pstr) } ->
    begin match pstr with
    | PStr [{ pstr_desc =
                Pstr_eval (expression, _) }] ->
      Exp.apply (Exp.ident { txt = Lident "+"; loc = !default_loc })
        [ (Nolabel, expression);
          (Nolabel, Exp.constant (Pconst_integer ("1", None))) ]
    | _ -> raise (Location.Error (Location.error ~loc "Syntax error"))
    end
  (* Delegate to the default mapper. *)
  | x -> default_mapper.expr mapper x
  end

let addone_mapper _argv =
  {
    default_mapper with
    expr = expr_mapper;
  }

let () = register "addone" addone_mapper

Let's examine this fragment in detail. The addone_mapper function defines our custom mapper, which replaces the expr field of the default mapper with our own expr_mapper. Our rewriter therefore only deals with expressions; patterns and other AST types are left untouched. The definition of expr_mapper matches the expression against an extension node with the identifier addone; any other expression is delegated to the default mapper. We then pattern match against the payload, which must be a single expression (Pstr_eval), and use the AST helpers to build one more function application: + applied to the original expression and the constant 1. Any other payload is rejected with an error reported at the location of the extension's identifier.
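The "record of functions that receive the mapper itself" design is what makes overriding a single field sufficient. Here is a minimal sketch of that pattern on a toy AST; the types expr, pat and mapper below are simplified stand-ins written for this post, not the definitions from Ast_mapper.

```ocaml
(* Toy AST with two syntactic categories. *)
type expr =
  | Int of int
  | Add of expr * expr
  | Extension of string * expr

type pat = Any | Var of string

(* A mapper is a record of functions, one per syntactic category.
   Each function receives the mapper itself, so overriding one field
   affects every recursive call (open recursion). *)
type mapper = {
  expr : mapper -> expr -> expr;
  pat : mapper -> pat -> pat;
}

(* Deep identity: rebuilds the tree, calling back into [m] on children. *)
let default_mapper = {
  expr = (fun m e ->
    match e with
    | Int _ -> e
    | Add (a, b) -> Add (m.expr m a, m.expr m b)
    | Extension (id, payload) -> Extension (id, m.expr m payload));
  pat = (fun _ p -> p);
}

(* Override only the expression case; patterns keep the default. *)
let addone_mapper = {
  default_mapper with
  expr = (fun m e ->
    match e with
    | Extension ("addone", payload) -> Add (payload, Int 1)
    | other -> default_mapper.expr m other);
}

(* 5 + [%addone 2]: the extension node is not at the root of the tree. *)
let rewritten =
  addone_mapper.expr addone_mapper
    (Add (Int 5, Extension ("addone", Int 2)))
(* rewritten = Add (Int 5, Add (Int 2, Int 1)) *)
```

Because the default case calls m.expr rather than itself, the addone node is found even though it sits below an ordinary addition.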

In order to build the rewriter, we can use the standard ocamlbuild tool, specifying the dependency on compiler-libs, where the necessary modules lie:

ocamlbuild -package compiler-libs.common addone_ppx.native

To check that the rewriter does what we want, we may use the rewriter tool from the ppx_tools package. Assuming that the [%addone 1 + 2] OCaml code is in file addone.ml:

ocamlfind ppx_tools/rewriter ./addone_ppx.native addone.ml

This will output (1 + 2) + 1, which is exactly what we wanted. The tool also allows outputting rewritten source code to a file rather than stdout.

What about recursion?

The rewriter given above does not recurse into its payload, so nested addone nodes such as [%addone 1 + [%addone 2]] are not fully expanded: the inner node is left in place and later rejected by the compiler. Supporting recursion is a key requirement for any meaningful language addition, and precisely what makes extensions powerful. The version below applies expr_mapper to the payload before wrapping it in the + 1 application; notice the recursive invocation of expr_mapper on the payload expression. Rewriting the [%addone 1 + [%addone 2]] expression will then give (1 + (2 + 1)) + 1.

open Ast_mapper
open Ast_helper
open Asttypes
open Parsetree
open Longident

let rec expr_mapper mapper expr =
  begin match expr with
  | { pexp_desc =
        Pexp_extension ({ txt = "addone"; loc }, pstr) } ->
    begin match pstr with
    | PStr [{ pstr_desc =
                Pstr_eval (expression, _) }] ->
      (* Expand nested addone nodes in the payload before wrapping it. *)
      Exp.apply (Exp.ident { txt = Lident "+"; loc = !default_loc })
        [ (Nolabel, expr_mapper mapper expression);
          (Nolabel, Exp.constant (Pconst_integer ("1", None))) ]
    | _ -> raise (Location.Error (Location.error ~loc "Syntax error in expression mapper"))
    end
  (* Delegate to the default mapper. *)
  | x -> default_mapper.expr mapper x
  end

let addone_mapper _argv =
  {
    default_mapper with
    expr = expr_mapper;
  }

let () = register "addone" addone_mapper
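The difference between the two rewriters can be checked on a toy model. In the sketch below, the expr type and the functions expand_shallow and expand_deep are hypothetical stand-ins for the non-recursive and recursive versions of expr_mapper; they do not use compiler-libs.

```ocaml
type expr =
  | Int of int
  | Add of expr * expr
  | Extension of string * expr

(* Non-recursive variant: the payload is inserted as-is. *)
let rec expand_shallow = function
  | Extension ("addone", payload) -> Add (payload, Int 1)
  | Extension (id, e) -> Extension (id, expand_shallow e)
  | Add (a, b) -> Add (expand_shallow a, expand_shallow b)
  | Int n -> Int n

(* Recursive variant: the payload is rewritten before it is wrapped. *)
let rec expand_deep = function
  | Extension ("addone", payload) -> Add (expand_deep payload, Int 1)
  | Extension (id, e) -> Extension (id, expand_deep e)
  | Add (a, b) -> Add (expand_deep a, expand_deep b)
  | Int n -> Int n

(* [%addone 1 + [%addone 2]] *)
let nested =
  Extension ("addone", Add (Int 1, Extension ("addone", Int 2)))

let shallow = expand_shallow nested
(* Add (Add (Int 1, Extension ("addone", Int 2)), Int 1): inner node survives *)

let deep = expand_deep nested
(* Add (Add (Int 1, Add (Int 2, Int 1)), Int 1), i.e. (1 + (2 + 1)) + 1 *)
```

Only the first match arm differs between the two functions, which mirrors the one-line change between the two listings.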

Building and packaging

Suppose you are finally happy with your ppx rewriter. The workflow we have followed so far (rewrite the file with the ppx rewriter, then compile the rewritten file) is acceptable for toy problems, but infeasible for larger projects. Ideally, we want to be able to write source files in our extended language and not have to worry about invoking the preprocessing step.

This is where packaging comes in: with the rewriter installed as a package (say, addone.ppx), you can simply specify it as a dependency, and the compiler will take care of the intermediary step for you: ocamlfind ocamlc -package addone.ppx -linkpkg addone.ml.

I personally found the oasis tool the easiest to use in order to build the extension under more complex scenarios. whitequark's tutorial gives a good example configuration; you can then use a tool such as oasis2opam to convert it to the opam format, pin it locally, and even publish it! You may also want to look at more complex projects (such as SLAP) to see example oasis configurations and project layouts in action, since the oasis documentation is brief and incomplete. It's also worth mentioning that the OCaml community seems to be moving towards dune (née jbuilder) as the de facto build tool, although I thought it was more difficult to use as a complete beginner, especially with ppx rewriters.

Parsetree versions

A caveat when working with ppx rewriters is that AST datatypes differ slightly between versions; thus, an extension written for OCaml version 4.02 is incompatible with a version 4.05 compiler, for example. The community has come up with an automatic way of converting your extension between different AST versions in the ocaml-migrate-parsetree library. If you only intend to support a limited number of versions, it may however be worthwhile to avoid the overhead and convert the rewriter to the appropriate version manually. For the examples given in this blog to work, you'll need to be using OCaml version 4.05. The two dumps below show how the AST for the same expression differs between versions.

{pexp_desc =
  Pexp_apply ({pexp_desc = Pexp_ident {txt = Lident "+"}},
   [(Nolabel, {pexp_desc = Pexp_constant (Pconst_integer ("1", None))});
    (Nolabel, {pexp_desc = Pexp_constant (Pconst_integer ("2", None))})])}

AST for 1 + 2 in OCaml version 4.05.

{pexp_desc =
  Pexp_apply ({pexp_desc = Pexp_ident {txt = Lident "+"}},
   [("", {pexp_desc = Pexp_constant (Const_int 1)});
    ("", {pexp_desc = Pexp_constant (Const_int 2)})])}

AST for 1 + 2 in OCaml version 4.02.
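To give a feel for what converting between AST versions involves, here is a toy sketch covering only the two differences visible in the dumps above (argument labels and integer constants). The modules V402 and V405 and the function migrate are invented for this illustration; they are not the API of ocaml-migrate-parsetree, and optional labels are deliberately left out.

```ocaml
(* Toy stand-ins for two Parsetree versions; only the pieces that differ
   in the dumps above are modelled. *)
module V402 = struct
  type constant = Const_int of int
  type expr =
    | Constant of constant
    | Ident of string
    | Apply of expr * (string * expr) list   (* "" means no label *)
end

module V405 = struct
  type arg_label = Nolabel | Labelled of string
  type constant = Pconst_integer of string * char option
  type expr =
    | Constant of constant
    | Ident of string
    | Apply of expr * (arg_label * expr) list
end

(* Hypothetical upward migration from the 4.02 shape to the 4.05 shape. *)
let rec migrate : V402.expr -> V405.expr = function
  | V402.Constant (V402.Const_int n) ->
    V405.Constant (V405.Pconst_integer (string_of_int n, None))
  | V402.Ident s -> V405.Ident s
  | V402.Apply (f, args) ->
    let label l = if l = "" then V405.Nolabel else V405.Labelled l in
    V405.Apply (migrate f, List.map (fun (l, e) -> (label l, migrate e)) args)

(* 1 + 2 in the 4.02 shape, then migrated. *)
let one_plus_two_402 =
  V402.(Apply (Ident "+",
               [ ("", Constant (Const_int 1)); ("", Constant (Const_int 2)) ]))

let one_plus_two_405 = migrate one_plus_two_402
```

A manual port of a rewriter amounts to applying this kind of change by hand at every place where the rewriter builds or matches AST nodes.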

Further resources

  • The ppx_tools libraries provide helpful functionality when writing ppx rewriters. This tutorial mentioned rewriter and dumpast; the authors also make available metaquot, which gives you an easy way to get the OCaml syntax tree for a particular expression inside your rewriter's code. For example, using it, you can obtain the AST for 1 + 2 by writing [%expr 1 + 2] instead of constructing it using the verbose parsetree datatypes. This comes in handy when writing tests.
  • whitequark's tutorial is the original tutorial about ppx rewriters, and was my initial starting point.
  • Shayne Fletcher has written an excellent tutorial, which uses more than one type of mapper (not expressions, but structures and type/constructor declarations). You may find it useful as another use case for rewriters.

EDIT 26/06/2018

There is a discussion about the post on the OCaml reddit here. The OCaml community has helpfully pointed out that the packaging part is outdated. You can check out how to set up your config with dune/jbuilder (the newer build tool) in Rudi Grinberg's tutorial, including a clever way of running tests to compare syntax trees with the diff tool. He provides a starter project you can clone here. Some libraries have been reorganised so you'd have to change imports for it to work out of the box.

Did you like this post? You can follow @VictorDarvariu on Twitter.