4 November 2015

Macros in Rust pt2

(Continuing from last time).

procedural macros

Also known as syntax extensions or syntax plugins or compiler plugins (which they are just a category of). I'll not get into the naming thing here, but I'll try and stick to 'procedural macro'. Procedural macros are implemented using pure Rust code at the meta level, rather than being written in the 'macro by example' syntax. They are extremely powerful since the output can be anything you can express as an AST. However, they are a bit rough around the edges and you have to use compiler internals (directly or indirectly), so they are subject to breaking changes. Therefore, procedural macros are not yet a stable part of the language (although we're working on it, more in later blog posts).

Unlike macro_rules macros which can only be used in one way, procedural macro uses can look very different. They can appear (like macro_rules macros) as function-like: foo!(...), they can be function-like with an ident: foo! bar (...) (like macro_rules itself, which looks like this), or they can look like attributes: #[foo(...)]. Attribute-like macros are defined by the following grammar:

attribute-like-macro ::= #[item] | #![item]
item ::= name | name = lit | name(item,*)

where lit is any Rust literal, e.g., a string, integer, floating point number, etc.

To use a procedural macro, it must be in scope. Rather than using extern crate (which pulls in a crate for runtime use), we use #![plugin(foo)] to load foo for use at compile time. Since procedural macros are unstable, you must also add #![feature(plugin)] to your crate to allow their use (and be using a nightly version of the compiler).

implementing a procedural macro

To implement a procedural macro, you implement a Rust trait, either by writing a function or by providing an expand method. There are several different traits you can implement, corresponding to the different kinds of macro:

MultiItemModifier an attribute-like macro which modifies the item it is attached to.
MultiItemDecorator an attribute-like macro which only creates new items. The Multi prefix in these two kinds is an historical artifact and should be removed.
TTMacroExpander a function-like macro which takes a token tree (basically a list of tokens, more on these in a later blog post).
IdentMacroExpander a function-like macro which takes an identifier and a token tree.
MacroRulesTT you shouldn't implement this one, it is just for defining macro_rules, listed here for completeness.

See base.rs for the details of these traits. Also in that file is the SyntaxExtension enum which has a variant for each kind of procedural macro and keeps a trait object and some other details.

You then register the implementation with the plugin registry, which allows users of the macro to access it. To do this you need a function which takes a rustc::plugin::Registry argument and has the #[plugin_registrar] attribute. In this function you register your macros. If the argument is called reg, then you call reg.register_syntax_extension passing an interned string for the name of the macro (e.g., ::syntax::parse::token::intern("foo")) and an instance of the SyntaxExtension enum. To do all this, you'll need the #![feature(plugin_registrar)] feature gate.

constructing code

Attribute-like macros operate on an AST node, given by the Annotatable enum. They also take an ast::MetaItem which contains any data passed to the attribute (see the item category in the little grammar above). They return an Annotatable. Function-like macros operate on a token tree and return an implementation of MacResult, this can return various kinds of AST node. So, macros take either an AST node or tokens and produce AST nodes.

When implementing the macro, you have various choices for constructing the AST output. The simplest way is to do it manually by assembling the various AST nodes (see ast.rs to see what they look like). This is pretty tedious. For most uses, it is quicker to use an AstBuilder, defined in build.rs, the usual implementor is ExtCtxt, which is passed in to your macro function.

The most convenient approach is quasi-quoting, you use the quote_expr, quote_stmt, quote_item, etc. macros which take an ExtCtxt and a bunch of tokens. The quasi-quoter parses the tokens and gives you AST nodes. You can even embed AST nodes you've created elsewhere in the input by using $foo where foo is a variable in scope which contains an AST node. For example,

let arg = ast::Expr { ... };
let call = quote_expr!(ctxt, foo.m($arg));

Here, call will be the AST for a an expression calling foo.m() with the actual argument given by the arg AST.

Finally, you could use a third party library. The only one I know of is ASTer which makes heavy use of the builder pattern and is (depending on your use case) usually more ergonomic than Rust's AstBuilder.

Talking of ASTer, it is worth mentioning Syntex, which ASTer can work with. This is a fork of libsyntax which allows you to use procedural macros with stable Rust via a pre-processing step. Note that there are limitations to this approach - the Rust compiler doesn't know about its expansion, so you don't get as much error information, and it does not cooperate with Rust's hygiene mechanism.

error handling

You can emit compile-time errors and warnings from procedural macros if there are errors in the input. There are methods on the ExtCtxt, such as span_err, which issue an error for a given span in the source code. If you cause an error, you must still return some result: function-like macros can return a DummyResult, attribute-like ones can usually just return the input - it won't be used for anything (as far as I know). Issuing an error will mean compilation stops after macro expansion.

You should not panic in a procedural macro, since doing so will cause the compiler to crash (ICE).

examples

For an example of putting all this together, see libhoare. This is a library of procedural macros I wrote for addding pre- and postconditions to functions and methods. The various macros are all MultiItemModifiers since they modify the functions they are attached to in order to add the pre- and postcondition code. Note that part of that code involves inserting macro uses (assert!), which are handled correctly by expansion.

Other examples of procedural macros are compile-time regexes and in HTML5ever.

built-in macros

Built-in macros look like regular macros when you use them, but are implemented inside the compiler, rather than in an external library. They are defined in libsyntax/ext and registered in ext::base::initial_syntax_expander_table. Some, such as include!, are explicitly declared in std::macros::builtin for documentation purposes. Others, such as asm! and #[derive(...)], are not. There is also #[cfg(...)], which acts like a macro but is deeply built-in to the compiler.

other ways to extend the compiler

Although these are not macros in any sense, for completeness I think it is worth mentioning other ways in which the compiler can be extended. These are all registered with the plugin registrar in the same way as procedural macros.

Lints are custom checks that the compiler runs on your code. They can be configured to warn or produce an error, or be ignored. They typically check for style issues or for common gotchas - which are not errors in the usual sense, but usually represent a mistake (such as an unused variable). Many are supplied with the compiler, others can be used from a library, such as Clippy. Lints can also be grouped into lint groups. Lints run later in the compiler than macros, either after expansion and before the analysis phase (operating on the AST), or at the end of the analysis phase, after all type checking is complete, but before we start generating code (operating on the HIR).

You can also register additional LLVM passes for the compiler to run. This has been used to run American Fuzzy Lop. This facility does not seem to be well-documented.

Finally, for the ultimate customisation experience, you can use the compiler as a library. For more information, see this tutorial and demo which collects some simple statistics for Rust programs.