RFC 0401: Coercions and casts

lang | libs (typesystem | expressions | machine | dst | raw-pointers | coercions | cast)

Summary

Describe the various kinds of type conversions available in Rust and suggest some tweaks.

Provide a mechanism for smart pointers to be part of the DST coercion system.

Reform coercions from functions to closures.

The transmute intrinsic and other unsafe methods of type conversion are not covered by this RFC.

Motivation

It is often useful to convert a value from one type to another. This conversion might be implicit or explicit and may or may not involve some runtime action. Such conversions are useful for improving reuse of code, and avoiding unsafe transmutes.

Our current rules around type conversions are not well-described. The different conversion mechanisms interact poorly and the implementation is somewhat ad-hoc.

Detailed design

Rust has several kinds of type conversion: subtyping, coercion, and casting. Subtyping and coercion are implicit; they have no syntax. Casting is explicit and uses the as keyword. The syntax for a cast expression is:

e_cast ::= e as U

Where e is any valid expression and U is any valid type (note that we restrict in type checking the valid types for U).

These conversions (and type equality) form a total order in terms of their strength. For any types T and U, if T == U then T is also a subtype of U. If T is a subtype of U, then T coerces to U, and if T coerces to U, then T can be cast to U.

There is an additional kind of coercion which does not fit into that total order: the coercion of receiver expressions, which has its own section later in this RFC.

Finally, I will discuss function polymorphism, which is something of a coercion edge case.

Subtyping

Subtyping is implicit and can occur at any stage in type checking or inference. Subtyping in Rust is very restricted: it occurs only due to variance with respect to lifetimes and between types with higher-ranked lifetimes. If we were to erase lifetimes from types, then the only subtyping would be due to type equality.
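As an illustration of the two sources of subtyping named above, here is a minimal sketch in current stable Rust. The function names shorten, identity, and pick_static are hypothetical and exist only for this example.

```rust
// Lifetime variance: `&'static str` is a subtype of `&'a str` for any `'a`,
// so the value is returned unchanged and nothing happens at runtime.
fn shorten<'a>(s: &'static str) -> &'a str {
    s
}

// A function that works for every lifetime; as a pointer it has the
// higher-ranked type `for<'a> fn(&'a str) -> &'a str`.
fn identity<'a>(s: &'a str) -> &'a str {
    s
}

// Higher-ranked subtyping: the general pointer type can be used where the
// pointer type specialised to `'static` is expected.
fn pick_static() -> fn(&'static str) -> &'static str {
    let general: for<'a> fn(&'a str) -> &'a str = identity;
    general
}

fn main() {
    let short = shorten("hello");
    let specific = pick_static();
    println!("{} {}", short, specific("world"));
}
```

In both cases only the lifetimes differ between the source and target types, which matches the observation that erasing lifetimes would leave nothing but type equality.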

Coercions

A coercion is implicit and has no syntax. A coercion can only occur at certain coercion sites in a program; these are typically places where the desired type is explicit or can be derived by propagation from explicit types (without type inference). The base cases are:

If the expression in one of these coercion sites is a coercion-propagating expression, then the relevant sub-expressions in that expression are also coercion sites. Propagation recurses from these new coercion sites. Propagating expressions and their relevant sub-expressions are:
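The following sketch shows a few places that act as coercion sites in current stable Rust: a let statement with an explicit type, a function argument, and (by propagation) the elements of an array literal. Treat the choice of sites as an assumption for illustration rather than as the complete list this RFC defines; the helper names sum, read, and coercion_sites are hypothetical.

```rust
fn sum(xs: &[i32]) -> i32 {
    xs.iter().sum()
}

fn read(x: &i32) -> i32 {
    *x
}

fn coercion_sites() -> (i32, i32, i32, usize) {
    // `let` with an explicit type: `&[i32; 3]` is coerced to `&[i32]`.
    let slice: &[i32] = &[1, 2, 3];

    // Function arguments: `&mut i32` is coerced to `&i32`,
    // and `&[i32; 2]` is coerced to `&[i32]`.
    let mut n = 10;
    let read_back = read(&mut n);
    let total = sum(&[4, 5]);

    // Propagation: the array literal passes the expected element type
    // `&[i32]` down to each element, so each element is a coercion site.
    let rows: [&[i32]; 2] = [&[1, 2], &[3]];

    (sum(slice), read_back, total, rows[1].len())
}

fn main() {
    println!("{:?}", coercion_sites());
}
```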

Note that we do not perform coercions when matching traits (except for receivers, see below). If there is an impl for some type U, and T coerces to U, that does not constitute an implementation for T. For example, the following will not type check, even though it is OK to coerce t to &T and there is an impl for &T:

struct T;
trait Trait {}

fn foo<X: Trait>(t: X) {}

impl<'a> Trait for &'a T {}


fn main() {
    let t: &mut T = &mut T;
    foo(t); //~ ERROR failed to find an implementation of trait Trait for &mut T
}
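One way to make the example above type check is to perform the coercion at an explicit coercion site before the call, so that the type parameter is instantiated with &T directly. The sketch below assumes current stable Rust; the name method and the helper call_with_shared are hypothetical additions used only to make the result observable.

```rust
struct T;

trait Trait {
    // Hypothetical method, added so the example has visible output.
    fn name(&self) -> &'static str;
}

impl<'a> Trait for &'a T {
    fn name(&self) -> &'static str {
        "&T"
    }
}

fn foo<X: Trait>(t: X) -> &'static str {
    t.name()
}

fn call_with_shared(t: &mut T) -> &'static str {
    // `foo(t)` would be rejected here: `X` is inferred as `&mut T`,
    // and no coercion is attempted while matching the trait bound.
    // The `let` with an explicit type is a coercion site: `&mut T` to `&T`.
    let shared: &T = t;
    foo(shared)
}

fn main() {
    println!("{}", call_with_shared(&mut T));
}
```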

In a cast expression, e as U, the compiler will first attempt to coerce e to U, and only if that fails will the conversion rules for casts (see below) be applied.
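A short sketch of casts that succeed purely because the operand coerces to the target type, written against current stable Rust (where trait objects are spelled with dyn). The function name cast_as_coercion is hypothetical.

```rust
use std::fmt::Display;

fn cast_as_coercion() -> (usize, String) {
    let array = [1u8, 2, 3];

    // `&[u8; 3]` coerces to `&[u8]`, so the cast is accepted as a coercion.
    let slice = &array as &[u8];

    // `&i32` coerces to a trait object reference in the same way.
    let shown = &42i32 as &dyn Display;

    (slice.len(), shown.to_string())
}

fn main() {
    println!("{:?}", cast_as_coercion());
}
```

Both casts could be replaced by a let statement with an explicit type, since no cast-specific rule is involved.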

Coercion is allowed between the following types:

And where coerce_inner is defined as:

Note that coercing from a sub-trait to a super-trait is a new coercion and is non-trivial. One implementation strategy which avoids re-computation of vtables is given in RFC PR #250.

A note for the future: although there has been neither an RFC nor much discussion, it is likely that post-1.0 we will add type ascription to the language (see #354). That will (probably) allow any expression to be annotated with a type (e.g., foo(a, b: T, c), a function call where the second argument has a type annotation).

Type ascription is purely descriptive and does not cast the sub-expression to the required type. However, it seems sensible that type ascription would be a coercion site, and thus type ascription would be a way to make implicit coercions explicit. There is a danger that such coercions would be confused with casts. I hope the rule that casting should change the type and type ascription should not is enough of a discriminant. Perhaps we will need a style guideline to encourage either casts or type ascription to force an implicit coercion. Perhaps type ascription should not be a coercion site. Or perhaps we don't need type ascription at all if we allow trivial casts.

Custom unsizing coercions

It should be possible to coerce smart pointers (e.g., Rc) in the same way as the built-in pointers. In order to do so, we provide two traits and an intrinsic to allow users to make their smart pointers work with the compiler's coercions. It might be possible to implement some of the coercions described for built-in pointers using this machinery, and whether that is a good idea or not is an implementation detail.

// Cannot be impl'ed - it really is quite a magical trait, see the cases below.
trait Unsize<Sized? U> for Sized? {}

The Unsize trait is a marker trait and a lang item. It should not be implemented by users, and user implementations will be ignored. The compiler will assume the following implementations, which correspond to the definition of coerce_inner, above. Note that these cannot be expressed in real Rust:

impl<T, n: int> Unsize<[T]> for [T, ..n] {}

// Where T is a trait
impl<Sized? T, U: T> Unsize<T> for U {}

// Where T and U are traits
impl<Sized? T, Sized? U: T> Unsize<T> for U {}

// Where T and U are structs ... following the rules for coerce_inner
impl Unsize<T> for U {}

impl Unsize<(..., T)> for (..., U)
    where U: Unsize<T> {}

The CoerceUnsized trait should be implemented by smart pointers and containers which want to be part of the coercions system.

trait CoerceUnsized<U> {
    fn coerce(self) -> U;
}

To help implement CoerceUnsized, we provide an intrinsic - fat_pointer_convert. This takes and returns raw pointers. The common case will be to take a thin pointer, unsize the contents, and return a fat pointer. But the exact behaviour depends on the types involved. This will perform any computation associated with a coercion (for example, adjusting or creating vtables). The implementation of fat_pointer_convert will match what the compiler must do in coerce_inner as described above.

intrinsic fn fat_pointer_convert<Sized? T, Sized? U>(t: *const T) -> *const U
    where T : Unsize<U>;

Here is an example implementation of CoerceUnsized for Rc:

impl<Sized? T, Sized? U> CoerceUnsized<Rc<T>> for Rc<U>
    where U: Unsize<T>
{

    fn coerce(self) -> Rc<T> {
        let new_ptr: *const RcBox<T> = fat_pointer_convert(self._ptr);
        Rc { _ptr: new_ptr }
    }
}
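The effect of this mechanism can be observed from user code. The sketch below assumes current stable Rust, where Rc takes part in unsizing coercions even though the underlying traits are not available to implement on stable; the function name unsize_rc is hypothetical.

```rust
use std::fmt::Debug;
use std::rc::Rc;

fn unsize_rc() -> (usize, String) {
    // Array to slice inside a smart pointer: `Rc<[i32; 3]>` to `Rc<[i32]>`.
    let array: Rc<[i32; 3]> = Rc::new([1, 2, 3]);
    let slice: Rc<[i32]> = array;

    // Concrete type to trait object: `Rc<u8>` to `Rc<dyn Debug>`.
    let object: Rc<dyn Debug> = Rc::new(7u8);

    (slice.len(), format!("{:?}", object))
}

fn main() {
    println!("{:?}", unsize_rc());
}
```

Both conversions happen at let statements with explicit types, mirroring the coercions available for built-in pointers.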

Coercions of receiver expressions

These coercions occur when matching the type of the receiver of a method call with the self type (i.e., the type of e in e.m(...)) or in field access. These coercions can be thought of as a feature of the . operator; they do not apply when using the UFCS form with the self argument in argument position. Only an expression before the dot is coerced as a receiver. When using the UFCS form of method call, arguments are only coerced according to the expression coercion rules. This matches the rules for dispatch: dynamic dispatch only happens using the . operator, not the UFCS form.

In method calls the target type of the coercion is the concrete type of the impl in which the method is defined, modified by the type of self. Assuming the impl is for T, the target type is given by:

self               target type
self               T
&self              &T
&mut self          &mut T
self: Box<Self>    Box<T>

and likewise with any variations of the self type we might add in the future.

For field access, the target type is &T (or &mut T for field assignment), where T is a struct with the named field.

A receiver coercion consists of three parts, applied in order: some number of dereferences, either compiler built-in (of a borrowed reference or Box pointer, but not of raw pointers) or custom (given by the Deref trait); zero or one applications of coerce_inner or use of the CoerceUnsized trait (as defined above; note that this requires that we are at a type which has neither references nor dereferences at the top level); and up to two address-of operations (i.e., T to &T, &mut T, *const T, or *mut T, with a fresh lifetime). The usual mutability rules for taking a reference apply. (Note that the implementation of the coercion is not so simple, since it is embedded in the search for candidate methods, but that is not relevant from the point of view of type conversions.)

Alternatively, a receiver coercion may be thought of as a two-stage process. First, we dereference and then take the address until the source type has the same shape (i.e., the same kind and number of indirections) as the target type. Then we try to coerce the adjusted source type to the target type using the usual coercion machinery. I believe, but have not proved, that these two descriptions are equivalent.
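A few receiver coercions as they appear in current stable Rust, as a rough sketch: dereferencing, taking the address, and unsizing all happen implicitly before the dot. The Counter type and the receivers function are hypothetical.

```rust
struct Counter {
    hits: u32,
}

impl Counter {
    fn get(&self) -> u32 {
        self.hits
    }

    fn bump(&mut self) {
        self.hits += 1;
    }
}

fn receivers() -> (u32, u32, Option<i32>) {
    // Address-of: `c` has type `Counter`, `bump` expects `&mut Counter`.
    let mut c = Counter { hits: 0 };
    c.bump();

    // Dereference: `r` has type `&&Counter`, `get` expects `&Counter`.
    let r = &&c;
    let via_refs = r.get();

    // Built-in dereference of `Box`, then address-of, for both a method
    // call and a field assignment.
    let mut boxed = Box::new(Counter { hits: 5 });
    boxed.bump();
    boxed.hits += 1;

    // Unsizing: `first` is a slice method, the receiver is an array.
    let first = [3, 1, 2].first().copied();

    (via_refs, boxed.get(), first)
}

fn main() {
    println!("{:?}", receivers());
}
```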

Casts

Casting is indicated by the as keyword. A cast e as U is valid if one of the following holds:

where &.T and *T are references of either mutability, and where unsize_kind(T) is the kind of the unsize info in T - the vtable for a trait definition (e.g. fmt::Display or Iterator, not Iterator<Item=u8>) or a length (or () if T: Sized).

Note that lengths are not adjusted when casting raw slices: a cast from *const [u16] to *const [u8] keeps the element count, so it creates a slice that only includes half of the original memory.
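The length behaviour can be checked directly. This is a minimal sketch for current stable Rust; the function name raw_slice_lengths is hypothetical, and the unsafe blocks are only needed to read the lengths back through references.

```rust
use std::mem::size_of_val;

fn raw_slice_lengths() -> (usize, usize, usize, usize) {
    let words: [u16; 4] = [1, 2, 3, 4];

    // A raw slice pointer carrying the length 4.
    let wide: *const [u16] = &words[..];

    // The cast keeps the length 4; it is not scaled to 8 bytes.
    let narrow = wide as *const [u8];

    // SAFETY: both pointers point into `words`, which is live for the whole
    // function; `u8` has alignment 1 and the 4 bytes lie inside the array.
    let wide_ref: &[u16] = unsafe { &*wide };
    let narrow_ref: &[u8] = unsafe { &*narrow };

    (
        wide_ref.len(),
        narrow_ref.len(),
        size_of_val(wide_ref),
        size_of_val(narrow_ref),
    )
}

fn main() {
    println!("{:?}", raw_slice_lengths());
}
```

The two slices report the same length, but the byte slice spans 4 bytes while the original spans 8.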

Casting is not transitive, that is, even if e as U1 as U2 is a valid expression, e as U2 is not necessarily so (in fact it will only be valid if U1 coerces to U2).
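A small sketch of non-transitivity using the cast rules of current stable Rust: a chain of two casts is accepted while the direct cast is rejected. The function name chained_casts is hypothetical.

```rust
fn chained_casts() -> f32 {
    // Each step is a valid cast: `char` to `u32`, then `u32` to `f32`.
    let via_integer = 'a' as u32 as f32;

    // The direct cast is rejected by the compiler:
    // let direct = 'a' as f32;

    via_integer
}

fn main() {
    println!("{}", chained_casts());
}
```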

A cast may require a runtime conversion.
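For instance, numeric casts in current stable Rust truncate, wrap, or convert representation at runtime. The sketch below is illustrative only, and the function name runtime_casts is hypothetical.

```rust
fn runtime_casts() -> (i32, u8, u32, f64) {
    // Float to integer: the fractional part is discarded.
    let truncated = 3.9_f64 as i32;

    // Narrowing integer cast: only the low 8 bits are kept (300 - 256).
    let wrapped = 300_i32 as u8;

    // Signed to unsigned of the same width: the bit pattern is kept.
    let reinterpreted = -1_i32 as u32;

    // Integer to float: the value is converted to a floating point number.
    let widened = 7_u8 as f64;

    (truncated, wrapped, reinterpreted, widened)
}

fn main() {
    println!("{:?}", runtime_casts());
}
```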

There will be a lint for trivial casts. A trivial cast is a cast e as T where e has type U and U is a subtype of T. The lint will be warn by default.

Function type polymorphism

Currently, functions may be used where a closure is expected by coercing a function to a closure. We will remove this coercion and instead use the following scheme:

These steps should allow functions to be stored in variables with both closure and function type. They also allow a value with function type to be stored in a variable with closure type. Note that these have different dynamic semantics, as described below. For example,

fn foo() { ... }         // `foo` has a fresh and non-denotable type.

fn main() {
    let x: fn() = foo;   // `foo` is coerced to `fn()`.
    let y: || = x;       // `x` is coerced to `&Fn` (a closure object),
                         // legal due to the `fn()` auto-impls.

    let z: || = foo;     // `foo` is coerced to `&T` where `T` is fresh and
                         // bounded by `Fn`. Legal due to the fresh function
                         // type auto-impls.
}

The two kinds of auto-generated impls are rather different: the first case (for the fresh and non-denotable function types) is a static call to Fn::Call, which in turn calls the function with the given arguments. The first call would be inlined (in fact, the impls and calls to them may be special-cased by the compiler). In the second case (for fn() types), we must execute a virtual call to find the implementing method and then call the function itself because the function is 'wrapped' in a closure object.
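The same distinction can be sketched in current stable Rust, under the assumption that the closure type written || in this RFC corresponds roughly to a &dyn Fn trait object today. The helper names call_static, call_dynamic, and function_types are hypothetical.

```rust
use std::mem::size_of_val;

fn foo() -> i32 {
    42
}

// Static dispatch: `F` is instantiated with the unique type of `foo`.
fn call_static<F: Fn() -> i32>(f: F) -> i32 {
    f()
}

// Dynamic dispatch: the call goes through the vtable of the trait object.
fn call_dynamic(f: &dyn Fn() -> i32) -> i32 {
    f()
}

fn function_types() -> (usize, i32, i32, i32) {
    // The fresh, non-denotable type of `foo` carries no data.
    let item_size = size_of_val(&foo);

    // `foo` is coerced to the function pointer type `fn() -> i32`.
    let x: fn() -> i32 = foo;

    // A function pointer behind a trait object: a virtual call, then an
    // indirect call through the pointer.
    let y: &dyn Fn() -> i32 = &x;

    // The function item behind a trait object: a virtual call to an impl
    // that calls `foo` directly.
    let z: &dyn Fn() -> i32 = &foo;

    (item_size, call_static(foo), call_dynamic(y), call_dynamic(z))
}

fn main() {
    println!("{:?}", function_types());
}
```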

Changes required

Drawbacks

We are adding and removing some coercions. There is always a trade-off with implicit coercions on making Rust ergonomic vs making it hard to comprehend due to magical conversions. By changing this balance we might be making some things worse.

Alternatives

These rules could be tweaked in any number of ways.

Specifically for the DST custom coercions, the compiler could throw an error if it finds a user-supplied implementation of the Unsize trait, rather than silently ignoring it.

Amendments

Unresolved questions