RFC 2151: raw-identifiers

lang (syntax | resolve)

Summary

Add a raw identifier format r#ident, so crates written in future language editions/versions can still use an older API that overlaps with new keywords.

Motivation

One of the primary examples of breaking changes in the edition RFC is adding new keywords, with catch as the first candidate. However, because the edition RFC also seeks crate compatibility across editions, a crate in a newer edition would be unable to use catch identifiers in the API of a crate in an older edition. @matklad found 28 crates using catch identifiers, some public.

A raw syntax that's always an identifier would allow these to remain compatible, so one can write r#catch where catch-as-identifier is needed.

Guide-level explanation

Although some identifiers are reserved by the Rust language as keywords, it is still possible to write them as raw identifiers using the r# prefix, like r#ident. When written this way, it will always be treated as a plain identifier equivalent to a bare ident name, never as a keyword.

For instance, the following is an erroneous use of the match keyword:

fn match(needle: &str, haystack: &str) -> bool {
    haystack.contains(needle)
}

error: expected identifier, found keyword `match`
 --> src/lib.rs:1:4
  |
1 | fn match(needle: &str, haystack: &str) -> bool {
  |    ^^^^^

It can instead be written as fn r#match(needle: &str, haystack: &str), using the r#match raw identifier, and the compiler will accept this as a true match function.
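As a minimal sketch of the accepted form, assuming a compiler that supports raw identifiers, the function can be both defined and called through r#match. The argument values are illustrative only.

```rust
// `match` is a keyword, so the function name must be written as `r#match`
// both where it is defined and where it is called.
fn r#match(needle: &str, haystack: &str) -> bool {
    haystack.contains(needle)
}

fn main() {
    assert!(r#match("foo", "foobar"));
    assert!(!r#match("baz", "foobar"));
    println!("ok");
}
```

Every call site needs the prefix, which is the cost discussed in the next paragraph.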

Generally when defining items, you should just avoid keywords altogether and choose a different name. Raw identifiers require the r# prefix every time they are mentioned, making them cumbersome for both the developer and users. Usually an alternative name is preferable: crate -> krate, const -> constant, etc.

However, new Rust editions may add to the list of reserved keywords, so that a formerly legal identifier is now interpreted as a keyword. Since compatibility is maintained between crates of different editions, this could mean that code written in a new edition might not be able to name an identifier in the API of another crate. Using a raw identifier, it can still be named and used.

//! baseball.rs in edition 2015
pub struct Ball;
pub struct Player;
impl Player {
    pub fn throw(&mut self) -> Result<Ball> { ... }
    pub fn catch(&mut self, ball: Ball) -> Result<()> { ... }
}

//! main.rs in edition 2018 -- `catch` is now a keyword!
use baseball::*;
fn main() -> Result<()> {
    let mut player = Player;
    let ball = player.throw()?;
    player.r#catch(ball)?;
    Ok(())
}
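The two-crate scenario can be approximated in a single file. The following is only a sketch under stated assumptions: the baseball module stands in for the older-edition crate, the Result alias with a String error and the trivial method bodies are invented for illustration, and play is a hypothetical helper. The sketch does not depend on whether the compiling edition actually reserves catch, because r#catch always names the plain identifier catch.

```rust
// Stand-in for the older-edition crate, where `catch` is an ordinary name.
mod baseball {
    // Assumed alias; the original example does not say which `Result` it uses.
    pub type Result<T> = std::result::Result<T, String>;

    pub struct Ball;
    pub struct Player;

    impl Player {
        // Placeholder bodies, invented for this sketch.
        pub fn throw(&mut self) -> Result<Ball> {
            Ok(Ball)
        }
        pub fn catch(&mut self, _ball: Ball) -> Result<()> {
            Ok(())
        }
    }
}

use baseball::{Player, Result};

// Stand-in for the newer-edition caller: the method is named through the
// raw identifier, which resolves to the same `catch` defined above.
fn play() -> Result<()> {
    let mut player = Player;
    let ball = player.throw()?;
    player.r#catch(ball)?;
    Ok(())
}

fn main() {
    play().expect("play should succeed");
}
```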

Reference-level explanation

The syntax for identifiers allows an optional r# prefix for a raw identifier, otherwise following the normal identifier rules. Raw identifiers are always interpreted as plain identifiers and never as keywords, regardless of context. They are also treated as equivalent to an identifier that wasn't raw -- for instance, it's perfectly legal to write:

let foo = 123;
let bar = r#foo * 2;
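The equivalence runs in both directions. In this small sketch, with hypothetical function names double_foo and seven, a binding declared plainly is read through its raw form, and a function declared with the r# prefix is called without it.

```rust
// A binding declared as `foo` can be read as `r#foo`.
fn double_foo() -> i32 {
    let foo = 123;
    let bar = r#foo * 2;
    bar
}

// An item declared with the raw prefix can be named without it,
// because `seven` is not a keyword.
fn r#seven() -> i32 {
    7
}

fn main() {
    assert_eq!(double_foo(), 246);
    assert_eq!(seven(), 7);
    assert_eq!(r#seven(), 7);
}
```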

Drawbacks

Rationale and Alternatives

If we don't have any way to refer to identifiers that were legal in prior editions, but later became keywords, then this may hurt interoperability between crates of different editions. The r#ident syntax enables interoperability, and will hopefully evoke some intuition of being raw, similar to raw strings.

The br#ident syntax is also possible, but I see no advantage over r#ident. Identifiers don't need the same kind of distinction as str and [u8].

A small possible alternative is to also terminate it like r#ident#, which could allow non-identifier characters to be part of a raw identifier. This could take a cue from raw strings and allow repetition for internal #, like r##my #1 ident##. That doesn't allow a leading # or " though.

A different possibility is to use backticks for a string-like `ident`, like Kotlin, Scala, and Swift. If it allows non-identifier chars, it could embrace escapes like \u, and have a raw-string-identifier r`slash\ident` and even r#`tick`ident`#. However, backtick identifiers are annoying to write in markdown. (e.g. `` `ident` ``)

Backslashes could connote escaping identifiers, like \ident, perhaps surrounded like \ident\, \{ident}, etc. However, the infix RFC #1579 currently seems to be leaning towards \op syntax already.

Alternatives which already start legal tokens, like C#'s @ident, Dart's #ident, or alternate prefixes like identifier#catch, all break Macros 1.0 as @kennytm demonstrated:

macro_rules! x {
    (@ $a:ident) => {};
    (# $a:ident) => {};
    ($a:ident # $b:ident) => {};
    ($a:ident) => { should error };
}
x!(@catch);
x!(#catch);
x!(identifier#catch);
x!(keyword#catch);
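To make the conflict concrete, here is a reduced, runnable variant of that macro covering only the @ and # arms. The name which_arm and the string labels are illustrative and not part of the RFC. It shows that @catch and #catch are already accepted as two tokens by existing macro arms, so lexing either as a single identifier would silently route them to a different arm, whereas r#catch matches as one ident.

```rust
// Each arm returns a label so the caller can see which pattern matched.
macro_rules! which_arm {
    (@ $a:ident) => { "at-prefix" };
    (# $a:ident) => { "hash-prefix" };
    ($a:ident) => { "plain ident" };
}

fn main() {
    // Today these are two tokens each: a punctuation token, then an ident.
    assert_eq!(which_arm!(@catch), "at-prefix");
    assert_eq!(which_arm!(#catch), "hash-prefix");
    // A bare or raw identifier is a single ident token.
    assert_eq!(which_arm!(catch), "plain ident");
    assert_eq!(which_arm!(r#catch), "plain ident");
}
```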

C# allows Unicode escapes directly in identifiers, which also separates them from keywords, so both @catch and cl\u0061ss are valid class identifiers. Java also allows Unicode escapes, but they don't avoid keywords.

For some new keywords, there may be contextual mitigations. In the case of catch, it couldn't be a fully contextual keyword because catch { ... } could be a struct literal. That context might be worked around with a path, like old_edition::catch { ... } to use an identifier instead. Contexts that don't make sense for a catch expression can just be identifiers, like foo.catch(). However, this might not be possible for all future keywords.

There might also be a need for raw keywords in the other direction, e.g. so the older edition can still use the new catch functionality somehow. I think this particular case is already served well enough by do catch { ... }, if we choose to stabilize it that way. Perhaps br#keyword could be used for this, but that may not be a good intuitive relationship.

Unresolved questions