/ Mozilla

Macro hygiene in all its guises and variations

Note, I'm not sure of the terminology for some of this stuff, so I might be making horrible mistakes, apologies.

Usually, when we talk about macro hygiene we mean the ability to not confuse identifiers with the same name but from different contexts. This is a big and interesting topic in it's own right and I'll discuss it in some depth later. Today I want to talk about other kinds of macro hygiene.

There is hygiene when naming items (I've heard this called "path hygiene", but I'm not sure if that is a standard term). For example,

mod a {
    fn f() {}

    pub macro foo() {
        f();
    }
}

a::foo!();

The macro use will expand to f(), but there is no f in scope. Currently this will be a name resolution error. Ideally, we would remember the scope where the call to f came from and look up f in that scope.

I believe that switching our hygiene algorithm to scope sets and using the scope sets for name resolution solves this issue.

Privacy hygiene

In the above example, f is private to a, so even if we can name it from the expansion of foo, we still can't access it due to its visibility. Again, scope sets comes to the rescue. The intuition is that we check privacy from the scope used to find f, not from its lexical position. There are a few more details than that, but nothing that will make sense before explaining the scope sets algorithm in detail.

Unsafety hygiene

The goal here is that when checking for unsafety, whether or not we are allowed to execute unsafe code depends on the context where the code is written, not where it is expanded. For example,

unsafe fn foo(x: i32) {}

macro m1($x: expr) {
    foo($x)
}

macro m2($x: expr) {
    $x
}

macro m3($x: expr) {
    unsafe {
        foo($x)
    }
}

macro m4($x: expr) {
    unsafe {
        $x
    }
}

fn main() {
    foo(42); // bad
    unsafe {
        foo(42);  // ok
    }
    m1(42); // bad
    m2(foo(42)); // bad
    m3(42); // ok
    m4(foo(42)); // bad
    unsafe {
        m1(42); // bad
        m2(foo(42)); // ok
        m3(42); // ok
        m4(foo(42)); // ok
    }
}

We could in theory use the same hygiene information as for the previous kinds. But when checking unsafety we are checking expressions, not identifiers, and we only record hygiene info for identifiers.

One solution would be to track hygiene for all tokens, not just identifiers. That might not be too much effort since groups of tokens passed together would have the same hygiene info. We would only be duplicating indices into a table, not more data than that. We would also have to track or be able to calculate the safety-status of scopes.

Alternatively, we could introduce a new kind of block into the token tree system - a block which can't be written by the user, only created by expansion or procedural macros. It would affect precedence but not scoping. Such a block is also the solution to having interpolated AST in the token stream - we just have tokens wrapped in the scope-less block. Such a block could be annotated with its safety-status. We would need to track unsafety during parsing/expansion to make this work. We have something similar to this in the HIR where we can push/pop unsafe blocks. I believe we want an absolute setting here rather than push/pop though, and we also don't want to introduce new scoping.

We could follow the current stability solution and annotate spans, but this is a bit of an abuse of spans, IMO.

I'm not super-happy with any of these solutions.

Stability hygiene

Finally, stability. We would like for macros in libraries with access to unstable code to be able to access unstable code when expanded. This is currently supported in Rust by having a bool on spans. We can probably continue to use this system or adapt either of the solutions proposed for unsafety hygiene.

It would be nice for macros to be marked as stable and unstable, I believe this is orthogonal to hygiene though.