RFC 0486: std-ascii-reform

libs

Summary

Move the std::ascii::Ascii type and related traits to a new Cargo package on crates.io, and instead expose its functionality for u8, [u8], char, and str types.

Motivation

The std::ascii::Ascii type is a u8 wrapper that enforces (unless unsafe code is used) that the value is in the ASCII range, much as char is a u32 restricted to Unicode scalar values and String is a Vec<u8> restricted to well-formed UTF-8. [Ascii] and Vec<Ascii> are then naturally strings of text entirely in the ASCII range.
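To make the invariant concrete, here is a minimal sketch of such a wrapper. The name AsciiByte and its methods are hypothetical and chosen for illustration; this is not the actual std::ascii::Ascii API. Note that the private field only protects the invariant from code outside the defining module.

```rust
/// Hypothetical stand-in for std::ascii::Ascii: a u8 that safe code can
/// only construct when the value is in the ASCII range 0x00..=0x7F.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct AsciiByte(u8);

impl AsciiByte {
    /// Checked constructor: returns None for any byte >= 0x80.
    pub fn new(b: u8) -> Option<AsciiByte> {
        if b < 0x80 { Some(AsciiByte(b)) } else { None }
    }

    /// Unchecked constructor: the caller must guarantee b < 0x80.
    /// This is the "unless unsafe code is used" escape hatch.
    pub unsafe fn new_unchecked(b: u8) -> AsciiByte {
        AsciiByte(b)
    }

    pub fn as_byte(self) -> u8 { self.0 }
    pub fn as_char(self) -> char { self.0 as char }

    /// Lowercasing an ASCII byte yields an ASCII byte, so the invariant holds.
    pub fn to_lowercase(self) -> AsciiByte {
        AsciiByte(self.0.to_ascii_lowercase())
    }
}

/// A Vec<AsciiByte> is an ASCII-only string; building one fails as soon as
/// any input byte is outside the ASCII range.
pub fn from_bytes(bytes: &[u8]) -> Option<Vec<AsciiByte>> {
    bytes.iter().map(|&b| AsciiByte::new(b)).collect()
}
```

Note that the all-or-nothing conversion in from_bytes is exactly what makes the type awkward for real-world input.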

Using the type system like this to enforce data invariants is interesting, but in practice Ascii is not that useful. Data (such as from the network) is rarely guaranteed to be ASCII-only, nor is it desirable to remove or replace non-ASCII bytes, even when only ASCII-range operations are applied to it. (For example, ASCII case-insensitive matching is common in HTML and CSS.)
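A sketch of the kind of operation meant here, using only the inherent u8 method to_ascii_lowercase from today's standard library; the helper name eq_ignore_ascii_case_bytes is hypothetical. Only A-Z and a-z are folded, and every other byte (including the bytes of UTF-8 encoded non-ASCII text) must match exactly, so such input can be matched without first being validated or converted to an ASCII-only type.

```rust
/// ASCII case-insensitive comparison of raw bytes. Non-ASCII bytes are
/// not rejected or replaced; they simply have to be identical.
fn eq_ignore_ascii_case_bytes(a: &[u8], b: &[u8]) -> bool {
    a.len() == b.len()
        && a.iter()
            .zip(b)
            .all(|(x, y)| x.to_ascii_lowercase() == y.to_ascii_lowercase())
}
```

With this, "Café" and "CAFé" compare equal (only the ASCII letters differ in case), while "café" and "CAFÉ" do not, because é and É are outside the ASCII range.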

Every single use of the Ascii type in the Rust distribution is to call its to_lowercase or to_uppercase method and then immediately convert the result back to u8 or char.

Detailed design

The Ascii type as well as the AsciiCast, OwnedAsciiCast, AsciiStr, and IntoBytes traits should be copied into a new ascii Cargo package on crates.io. The std::ascii copy should be deprecated and removed at some point before Rust 1.0.

Currently, the AsciiExt trait is:

pub trait AsciiExt<T> {
    fn to_ascii_upper(&self) -> T;
    fn to_ascii_lower(&self) -> T;
    fn eq_ignore_ascii_case(&self, other: &Self) -> bool;
}

impl AsciiExt<String> for str { ... }
impl AsciiExt<Vec<u8>> for [u8] { ... }

It should gain new methods covering the functionality that is being removed along with Ascii, be implemented for u8 and char, and (if associated types are stable enough by then) use an associated type instead of the T parameter:

pub trait AsciiExt {
    type Owned = Self;
    fn to_ascii_upper(&self) -> Self::Owned;
    fn to_ascii_lower(&self) -> Self::Owned;
    fn eq_ignore_ascii_case(&self, other: &Self) -> bool;
    fn is_ascii(&self) -> bool;

    // Maybe? See unresolved questions
    fn is_ascii_lowercase(&self) -> bool;
    fn is_ascii_uppercase(&self) -> bool;
    ...
}

impl AsciiExt for str { type Owned = String; ... }
impl AsciiExt for [u8] { type Owned = Vec<u8>; ... }
impl AsciiExt for char { ... }
impl AsciiExt for u8 { ... }
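A minimal, compilable sketch of the proposed trait shape, for illustration only. It uses the hypothetical name AsciiExtSketch to avoid clashing with the existing AsciiExt, and it spells out Owned in every impl because the `type Owned = Self;` default relies on associated type defaults, which are not stable in current Rust. The method bodies lean on inherent u8 and char methods from today's standard library; the optional is_ascii_lowercase/is_ascii_uppercase methods are left out, matching their unresolved status.

```rust
/// Hypothetical name for the proposed AsciiExt with an associated type.
pub trait AsciiExtSketch {
    /// String for str, Vec<u8> for [u8], and the type itself for char and u8.
    type Owned;
    fn to_ascii_upper(&self) -> Self::Owned;
    fn to_ascii_lower(&self) -> Self::Owned;
    fn eq_ignore_ascii_case(&self, other: &Self) -> bool;
    fn is_ascii(&self) -> bool;
}

impl AsciiExtSketch for u8 {
    type Owned = u8;
    fn to_ascii_upper(&self) -> u8 { self.to_ascii_uppercase() }
    fn to_ascii_lower(&self) -> u8 { self.to_ascii_lowercase() }
    fn eq_ignore_ascii_case(&self, other: &u8) -> bool {
        self.to_ascii_lowercase() == other.to_ascii_lowercase()
    }
    fn is_ascii(&self) -> bool { *self < 0x80 }
}

impl AsciiExtSketch for char {
    type Owned = char;
    fn to_ascii_upper(&self) -> char { self.to_ascii_uppercase() }
    fn to_ascii_lower(&self) -> char { self.to_ascii_lowercase() }
    fn eq_ignore_ascii_case(&self, other: &char) -> bool {
        self.to_ascii_lowercase() == other.to_ascii_lowercase()
    }
    fn is_ascii(&self) -> bool { (*self as u32) < 0x80 }
}

impl AsciiExtSketch for [u8] {
    type Owned = Vec<u8>;
    fn to_ascii_upper(&self) -> Vec<u8> {
        self.iter().map(|b| b.to_ascii_uppercase()).collect()
    }
    fn to_ascii_lower(&self) -> Vec<u8> {
        self.iter().map(|b| b.to_ascii_lowercase()).collect()
    }
    fn eq_ignore_ascii_case(&self, other: &[u8]) -> bool {
        self.len() == other.len()
            && self.iter().zip(other).all(|(a, b)| a.to_ascii_lowercase() == b.to_ascii_lowercase())
    }
    fn is_ascii(&self) -> bool { self.iter().all(|&b| b < 0x80) }
}

impl AsciiExtSketch for str {
    type Owned = String;
    // Only ASCII letters change, so the result is still valid UTF-8.
    fn to_ascii_upper(&self) -> String {
        self.chars().map(|c| c.to_ascii_uppercase()).collect()
    }
    fn to_ascii_lower(&self) -> String {
        self.chars().map(|c| c.to_ascii_lowercase()).collect()
    }
    fn eq_ignore_ascii_case(&self, other: &str) -> bool {
        <[u8] as AsciiExtSketch>::eq_ignore_ascii_case(self.as_bytes(), other.as_bytes())
    }
    fn is_ascii(&self) -> bool { self.bytes().all(|b| b < 0x80) }
}

/// Generic code can name the result type through the associated type.
pub fn shout<T: AsciiExtSketch + ?Sized>(x: &T) -> T::Owned {
    x.to_ascii_upper()
}
```

The associated type lets one generic function such as shout return a String for str and a plain u8 for u8, without the caller supplying a type parameter.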

The OwnedAsciiExt trait should stay as it is:

pub trait OwnedAsciiExt {
    fn into_ascii_upper(self) -> Self;
    fn into_ascii_lower(self) -> Self;
}

impl OwnedAsciiExt for String { ... }
impl OwnedAsciiExt for Vec<u8> { ... }
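The point of taking self by value is that the conversion can reuse the existing buffer, since ASCII case mapping never changes a byte's length. A minimal sketch under the hypothetical name OwnedAsciiExtSketch, built on the make_ascii_uppercase/make_ascii_lowercase methods that str and [u8] provide in today's standard library:

```rust
/// Hypothetical name mirroring the OwnedAsciiExt trait above.
pub trait OwnedAsciiExtSketch {
    fn into_ascii_upper(self) -> Self;
    fn into_ascii_lower(self) -> Self;
}

impl OwnedAsciiExtSketch for String {
    // Mutates in place through String's DerefMut<Target = str>;
    // no new allocation is made.
    fn into_ascii_upper(mut self) -> String { self.make_ascii_uppercase(); self }
    fn into_ascii_lower(mut self) -> String { self.make_ascii_lowercase(); self }
}

impl OwnedAsciiExtSketch for Vec<u8> {
    // Mutates in place through Vec's DerefMut<Target = [u8]>.
    fn into_ascii_upper(mut self) -> Vec<u8> { self.make_ascii_uppercase(); self }
    fn into_ascii_lower(mut self) -> Vec<u8> { self.make_ascii_lowercase(); self }
}
```

For example, "Hello, wörld".to_string().into_ascii_upper() yields "HELLO, WöRLD" in the same heap buffer.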

The std::ascii::escape_default function has little to do with ASCII as such: it escapes a single u8 into a form that is valid inside a byte literal. I think it belongs with b'x' and b"foo" byte literals, which have types u8 and &'static [u8], so I suggest moving it into std::u8.
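For reference, escape_default still lives in std::ascii in current Rust and yields the escaped form of one byte as an iterator of bytes. A minimal sketch of escaping a whole byte string with it; the helper name escape_bytes is hypothetical:

```rust
use std::ascii::escape_default;

/// Escapes every byte the way it could be spelled in a byte literal:
/// \t, \r, \n, \', \", \\ get backslash escapes, other printable ASCII
/// is kept, and everything else becomes \xNN.
fn escape_bytes(bytes: &[u8]) -> String {
    let mut out = String::new();
    for &b in bytes {
        // escape_default(b) is an iterator over the ASCII bytes of the escape.
        for e in escape_default(b) {
            out.push(e as char);
        }
    }
    out
}
```

For instance, escape_bytes(b"a\tb\"\xff") produces the text a\tb\"\xff, which can be pasted back between b"..." quotes.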

I (@SimonSapin) can help with the implementation work.

Drawbacks

Code that uses Ascii for more than, e.g., to_lowercase would need to depend on a Cargo package to get it. This is strictly more work than having it in std, but should still be easy.

Alternatives

Unresolved questions