proc-macro-rules

I'm announcing a new library for procedural macro authors: proc-macro-rules (also available on crates.io). It lets you use macro_rules-like pattern matching inside a procedural macro. The goal is to smooth the transition from declarative to procedural macros (this works particularly well together with the quote crate).

(This is part of my Christmas yak mega-shave. That might someday get a blog post of its own, but I only managed to shave about 1/3 of my yaks, so it might take till next Christmas).

Here's an example:

rules!(tokens => {
    ($finish:ident ($($found:ident)*) # [ $($inner:tt)* ] $($rest:tt)*) => {
        for f in found {
            do_something(finish, f, inner, rest[0]);
        }
    }
    (foo $($bar:expr)?) => {
        match bar {
            Some(e) => foo_with_expr(e),
            None => foo_no_expr(),
        }
    }
});

The example is kind of nonsense. The interesting thing is that the syntax is very similar to macro_rules macros. The patterns which are matched are exactly the same as in macro_rules (modulo bugs, of course). Metavariables in the pattern (e.g., $finish or $found in the first arm) are bound to fresh variables in the arm's body (e.g., finish and found). The types reflect the type of the metavariable (for example, $finish has type syn::Ident). Because $found occurs inside a $(...)*, it is matched multiple times and so has type Vec<syn::Ident>.

The syntax is:

rules!( $tokens:expr => { $($arm)* })

where $tokens evaluates to a TokenStream and the syntax of an $arm is given by

($pattern) => { $body }

or

($pattern) => $body,

where $pattern is a valid macro_rules pattern (which is not yet verified by the library, but should be) and $body is Rust code (i.e., an expression or block).

The intent of this library is to make it easier to write the 'frontend' of a procedural macro, i.e., to make parsing the input a bit easier. In particular to make it easy to convert a macro_rules macro to a procedural macro and replace a small part with some procedural code, without having to roll off the 'procedural cliff' and rewrite the whole macro.

As an example of converting macros, here is a declarative macro which is sort-of like the vec macro (example usage: let v = vec![a, b, c]):

macro_rules! vec {
    () => {
        Vec::new()
    };
    ( $( $x:expr ),+ ) => {
        {
            let mut temp_vec = Vec::new();
            $(
                temp_vec.push($x);
            )*
            temp_vec
        }
    };
}
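As a quick sanity check of the declarative version before converting it (a local macro_rules! definition like this shadows the standard library's vec! for later uses in the same scope):

```rust
macro_rules! vec {
    () => {
        Vec::new()
    };
    ( $( $x:expr ),+ ) => {
        {
            let mut temp_vec = Vec::new();
            $(
                temp_vec.push($x);
            )*
            temp_vec
        }
    };
}

fn main() {
    // The empty arm produces Vec::new(); the repeat arm pushes each element.
    let empty: Vec<i32> = vec![];
    let v = vec![1, 2, 3];
    assert!(empty.is_empty());
    assert_eq!(v, [1, 2, 3]);
}
```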

Converting this to a procedural macro is mostly mechanical:

use quote::quote;
use proc_macro::TokenStream;
use proc_macro_rules::rules;

#[proc_macro]
pub fn vec(input: TokenStream) -> TokenStream {
    rules!(input.into() => {
        () => { quote! {
            Vec::new()
        }}
        ( $( $x:expr ),+ ) => { quote! {
            {
                let mut temp_vec = Vec::new();
                #(
                    temp_vec.push(#x);
                )*
                temp_vec
            }
        }}
    }).into()
}

Note that we are using the quote crate to write the bodies of the match arms. That crate allows writing the output of a procedural macro in a similar way to a declarative macro by using quasi-quoting.

How it works

I'm going to dive a little into the implementation because I think it is interesting. You don't need to know any of this to use proc-macro-rules, so if that's all you're after, you can stop reading now.

rules is a procedural macro, using syn for parsing, and quote for code generation. The high-level flow is that we parse all code passed to the macro into an AST, then handle each rule in turn (generating a big if/else). For each rule, we make a pass over the rule to collect variables and compute their types, then lower the AST to a 'builder' AST (which duplicates some work at the moment), then emit code for the rule. That generated code includes Matches and MatchesBuilder structs to collect and store bindings for metavariables. We also generate code which uses syn to parse the supplied tokenstream into the Matches struct by pattern-matching the input.
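To make that concrete, here is a rough, hand-written sketch of the shape of the Matches/MatchesBuilder pair for the first arm in the example above. This is illustrative only, not the library's actual generated code: the real structs use syn types (syn::Ident, proc_macro2::TokenStream, etc.), and I've omitted the $inner/$rest fields.

```rust
// Illustrative sketch only: plain Strings stand in for syn types here.

// Final bindings handed to the arm's body.
#[derive(Debug)]
struct Matches {
    finish: String,     // $finish:ident
    found: Vec<String>, // $found:ident occurs inside $(...)*, so a Vec
}

// Accumulates bindings while matching is still in progress; fields are
// optional (or empty) until the whole pattern has matched.
#[derive(Default)]
struct MatchesBuilder {
    finish: Option<String>,
    found: Vec<String>,
}

impl MatchesBuilder {
    // Succeeds only if every required metavariable was bound.
    fn build(self) -> Option<Matches> {
        Some(Matches {
            finish: self.finish?,
            found: self.found,
        })
    }
}

fn main() {
    let mut b = MatchesBuilder::default();
    b.finish = Some("end".to_string());
    b.found.push("a".to_string());
    b.found.push("b".to_string());
    let m = b.build().unwrap();
    assert_eq!(m.finish, "end");
    assert_eq!(m.found, ["a", "b"]);
}
```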

The pattern matching is a little bit interesting: because we are generating code (rather than interpreting the pattern) the implementation is very different from macro_rules. We generate a DFA, but the pattern is not reified in a data structure but in the generated code. We only execute the matching code once, so we must be at the same point in the pattern for all potential matches, but they can be at different points in the input. These matches are represented in the MatchSet. (I didn't look around for a nice way of doing this, so there may be something much better, or I might have made an obvious mistake).

The key functions on a MatchSet are expect and fork. Both take a function from the client which operates on the input. expect compares each in-progress match with the input: if the input can be matched, the match continues; if it cannot, the match is deleted. fork iterates over the in-progress matches, forking each one: one copy attempts to match the next element in the pattern, and one does not. For example, if we have the pattern ab?c and a single match which has matched a in the input, then we fork, and one match will attempt to match b then c, while the other will just match c.
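A minimal, self-contained sketch of that idea (this mirrors the description above but is not the library's actual MatchSet API; chars stand in for tokens):

```rust
// Each in-progress match records a position in the input; `expect` filters
// matches by consuming one expected token, `fork` duplicates each match so
// one copy tries an optional element and the other skips it.

#[derive(Clone)]
struct Match {
    pos: usize, // how much of the input this match has consumed
}

struct MatchSet<'a> {
    input: &'a [char],
    matches: Vec<Match>,
}

impl<'a> MatchSet<'a> {
    fn new(input: &'a [char]) -> Self {
        MatchSet { input, matches: vec![Match { pos: 0 }] }
    }

    // Keep only matches whose next input token is `tok`, advancing them.
    fn expect(&mut self, tok: char) {
        let input = self.input;
        self.matches.retain_mut(|m| {
            if input.get(m.pos) == Some(&tok) {
                m.pos += 1;
                true
            } else {
                false
            }
        });
    }

    // Fork every match: one copy runs `optional` (and is kept only if it
    // succeeds), the other skips the optional part entirely.
    fn fork<F: Fn(&mut Match, &[char]) -> bool>(&mut self, optional: F) {
        let mut forked = Vec::new();
        for m in &self.matches {
            let mut taken = m.clone();
            if optional(&mut taken, self.input) {
                forked.push(taken);
            }
            forked.push(m.clone());
        }
        self.matches = forked;
    }
}

fn main() {
    // Match the pattern ab?c against a few inputs.
    for (input, should_match) in [("abc", true), ("ac", true), ("ab", false)] {
        let chars: Vec<char> = input.chars().collect();
        let mut set = MatchSet::new(&chars);
        set.expect('a');
        // b? — fork: one branch consumes 'b', the other does not.
        set.fork(|m, inp| {
            if inp.get(m.pos) == Some(&'b') { m.pos += 1; true } else { false }
        });
        set.expect('c');
        // A successful match must also have consumed the whole input.
        let ok = set.matches.iter().any(|m| m.pos == chars.len());
        assert_eq!(ok, should_match, "input {:?}", input);
    }
}
```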

One interesting aspect of matching is handling metavariables in repeated parts of a pattern, e.g., in $($n:ident: $e:expr),*. Here we repeatedly try to match $n:ident: $e:expr, finding values for n and e each time; we then need to push each value into a Vec<Ident> and a Vec<Expr>. We call this 'hoisting' the variables (since we are moving values out of a scope while converting each T to a U<T>). We generate code for this which uses an implementation of hoist in the Fork trait for each MatchesBuilder, a MatchesHandler helper struct for the MatchSet, and generated code for each kind of repeat which can appear in a pattern.
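A rough sketch of the hoisting step, with Strings standing in for the syn types (the structs and the hoist body here are illustrative, not the library's actual generated code):

```rust
// Inside $($n:ident: $e:expr),* each iteration binds one n and one e;
// hoisting pushes those bindings into the outer Vec accumulators.

#[derive(Default)]
struct InnerBuilder {
    n: Option<String>, // one iteration's $n
    e: Option<String>, // one iteration's $e
}

#[derive(Default)]
struct OuterBuilder {
    n: Vec<String>,
    e: Vec<String>,
}

impl OuterBuilder {
    // Move one iteration's bindings out of the inner scope: each bound
    // Option<String> value becomes an element of a Vec<String>.
    fn hoist(&mut self, inner: InnerBuilder) {
        self.n.extend(inner.n);
        self.e.extend(inner.e);
    }
}

fn main() {
    let mut outer = OuterBuilder::default();
    for (n, e) in [("x", "1"), ("y", "2 + 3")] {
        // One pass over the repeated sub-pattern binds n and e ...
        let mut inner = InnerBuilder::default();
        inner.n = Some(n.to_string());
        inner.e = Some(e.to_string());
        // ... then the bindings are hoisted into the outer accumulators.
        outer.hoist(inner);
    }
    assert_eq!(outer.n, ["x", "y"]);
    assert_eq!(outer.e, ["1", "2 + 3"]);
}
```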