RFC 1789: as-cell

lang | libs (conversions | machine | cell)

Summary

Note: https://github.com/rust-lang/rfcs/pull/1651 has been accepted recently, so no T: Copy bound is needed anymore.

Motivation

Rust's iterators offer a safe, fast way to iterate over collections while avoiding additional bound checks.

However, due to the borrow checker, they run into issues if we try to have more than one iterator into the same data structure while mutating elements in it.

Wanting to do this is not that unusual for many low level algorithms that deal with integers, floats or similar primitive data types.

For example, an algorithm might...

Todays answer for algorithms like that is to fall back to C-style for loops and indexing, which might look like this...


let v: Vec<i32> = ...;

// example 1
for i in 0..v.len() {
    for j in 0..v.len() {
        v[j] = f(v[i], v[j]);
    }
}

// example 2
for i in n..v.len() {
    v[i] = g(v[i - n]);
}

...but this reintroduces potential bound-checking costs.

The alternative, short of changing the actual algorithms involved, is to use internal mutability to enable safe mutations even with overlapping shared views into the data:

let v: Vec<Cell<i32>> = ...;

// example 1
for i in &v {
    for j in &v {
        j.set(f(i.get(), j.get()));
    }
}

// example 2
for (i, j) in v[n..].iter().zip(&v) {
    i.set(g(g.get()));
}

This has the advantages of allowing both bound-check free iteration and aliasing references, but comes with restrictions that makes it not generally applicable, namely:

This RFC proposes a way to address these in cases where Cell<T> could be used by introducing simple conversions functions to the standard library that allow the creation of shared borrowed Cell<T>s from mutably borrowed Ts.

This in turn allows the original data structure to remain unchanged, while allowing to temporary opt-in to the Cell API as needed. As an example, given Cell::from_mut_slice(&mut [T]) -> &[Cell<T>], the previous examples can be written as this:

let mut v: Vec<i32> = ...;

// convert the mutable borrow
let v_slice: &[Cell<T>] = Cell::from_mut_slice(&mut v);

// example 1
for i in v_slice {
    for j in v_slice {
        j.set(f(i.get(), j.get()));
    }
}

// example 2
for (i, j) in v_slice[n..].iter().zip(v_slice) {
    i.set(g(g.get()));
}

Detailed design

Language

The core of this proposal is the ability to convert a &T to a &Cell<T>, so in order for it to be safe, it needs to be guaranteed that T and Cell<T> have the same memory layout, and that there are no codegen issues based on viewing a reference to a type that does not contain a UnsafeCell as a reference to a type that does contain a UnsafeCell.

As far as the author is aware, both should already implicitly fall out of the semantic of Cell and Rusts/llvms notion of aliasing:

Std library

from_mut

We add a constructor to the cell API that enables the &mut T -> &Cell<T> conversion, implemented with the equivalent of a transmute() of the two pointers:

impl<T> Cell<T> {
    fn from_mut<'a>(t: &'a mut T) -> &'a Cell<T> {
        unsafe {
            &*(t as *mut T as *const Cell<T>)
        }
    }
}

In the future this could also be provided through AsRef, Into or From impls.

Unsized Cell<T>

We extend Cell<T> to allow T: ?Sized, and move all compatible methods to a less restricted impl block:

pub struct Cell<T: ?Sized> {
    value: UnsafeCell<T>,
}

impl<T: ?Sized> Cell<T> {
    pub fn as_ptr(&self) -> *mut T;
    pub fn get_mut(&mut self) -> &mut T;
    pub fn from_mut(value: &mut T) -> &Cell<T>;
}

This is purely done to enable cell slicing below, and should otherwise have no effect on any existing code.

Cell Slicing

We enable a conversion from &Cell<[T]> to &[Cell<T>]. This seems like it violates the "no interior references" API of Cell at first glance, but is actually safe:

For convenience, we expose this conversion by implementing Index for Cell<[T]>:

impl<T> Index<RangeFull> for Cell<[T]> {
    type Output = [Cell<T>];

    fn index(&self, _: RangeFull) -> &[Cell<T>] {
        unsafe {
            &*(self as *const Cell<[T]> as *const [Cell<T>])
        }
    }
}

impl<T> Index<Range<usize>> for Cell<[T]> {
    type Output = [Cell<T>];

    fn index(&self, idx: Range<usize>) -> &[Cell<T>] {
        &self[..][idx]
    }
}

impl<T> Index<RangeFrom<usize>> for Cell<[T]> {
    type Output = [Cell<T>];

    fn index(&self, idx: RangeFrom<usize>) -> &[Cell<T>] {
        &self[..][idx]
    }
}

impl<T> Index<RangeTo<usize>> for Cell<[T]> {
    type Output = [Cell<T>];

    fn index(&self, idx: RangeTo<usize>) -> &[Cell<T>] {
        &self[..][idx]
    }
}

impl<T> Index<usize> for Cell<[T]> {
    type Output = Cell<T>;

    fn index(&self, idx: usize) -> &Cell<T> {
        &self[..][idx]
    }
}

Using this, the motivation example can be written as such:

let mut v: Vec<i32> = ...;

// convert the mutable borrow
let v_slice: &[Cell<T>] = &Cell::from_mut(&mut v[..])[..];

// example 1
for i in v_slice {
    for j in v_slice {
        j.set(f(i.get(), j.get()));
    }
}

// example 2
for (i, j) in v_slice[n..].iter().zip(v_slice) {
    i.set(g(g.get()));
}

Possible extensions

The proposal only covers the base case &mut T -> &Cell<T> and the trivially implementable extension to [T], but in theory this conversion could be enabled for many "higher level mutable reference" types, like for example mutable iterators (with the goal of making them cloneable through this).

See https://play.rust-lang.org/?gist=d012cebf462841887323185cff8ccbcc&version=stable&backtrace=0 for an example implementation and a more complex use case, and https://crates.io/crates/alias for an existing crate providing these features.

How We Teach This

What names and terminology work best for these concepts and why? How is this idea best presented—as a continuation of existing Rust patterns, or as a wholly new one?

The API could be described as "temporarily opting-in to internal mutability". It would be a more flexible continuation of the existing usage of Cell<T> since the Cell<T> no longer needs to exist in the original location if you have mutable access to it.

Would the acceptance of this proposal change how Rust is taught to new users at any level? How should this feature be introduced and taught to existing Rust users?

As it is, the API just provides a few neat conversion functions. Nevertheless, with the legalization of the &mut T -> &Cell<T> conversion there is the potential for a major change in how accessors to data structures are provided:

In todays Rust, there are generally three different ways:

With this change, it would be possible in many cases to add a fourth accessor:

For example, today there exist:

We could then add a fourth iterator like this:

So there is the potential that we go away from teaching the "rule of three" of ownership and change it to a "rule of four".

What additions or changes to the Rust Reference, The Rust Programming Language, and/or Rust by Example does it entail?

Drawbacks

Why should we not do this?

Alternatives

Removing cell slicing

Instead of allowing unsized types in Cell and adding the Index impls, there could just be a single &mut [T] -> &[Cell<T>] conversions function:

impl<T> Cell<T> {
    /// [...]

    fn from_mut_slice<'a>(t: &'a mut [T]) -> &'a [Cell<T>] {
        unsafe {
            &*(t as *mut [T] as *const [Cell<T>])
        }
    }
}

Usage:

let mut v: Vec<i32> = ...;

// convert the mutable borrow
let v_slice: &[Cell<T>] = Cell::from_mut_slice(&mut v);

// example 1
for i in v_slice {
    for j in v_slice {
        j.set(f(i.get(), j.get()));
    }
}

// example 2
for (i, j) in v_slice[n..].iter().zip(v_slice) {
    i.set(g(g.get()));
}

This would be less modular than the &mut [T] -> &Cell<[T]> -> &[Cell<T>] conversions steps, while still offering essentially the same API.

Just the language guarantee

The conversion could be guaranteed as correct, but not be provided by std itself. This would serve as legitimization of external implementations like alias.

No guarantees

If the safety guarantees of the conversion can not be granted, code would have to use direct indexing as today, with either possible bound checking costs or the use of unsafe code to avoid them.

Replacing Index impls with Deref

Instead of the Index impls, have only this Deref impl:

impl<T> Deref for Cell<[T]> {
    type Target = [Cell<T>];

    fn deref(&self) -> &[Cell<T>] {
        unsafe {
            &*(self as *const Cell<[T]> as *const [Cell<T>])
        }
    }
}

Pro:

Cons:

Cast to &mut Cell<T> instead of &Cell<T>

Nothing that makes the &mut T -> &Cell<T> conversion safe would prevent &mut T -> &mut Cell<T> from being safe either, and the latter can be trivially turned into a &Cell<T> while also allowing mutable access - eg to call Cell::as_mut() to conversion back again.

Similar to that, there could also be a way to turn a &mut [Cell<T>] back into a &mut [T].

However, this does not seem to be actually useful since the only reason to use this API is to make use of shared internal mutability.

Exposing the functions differently

Instead of Cell constructors, we could just have freestanding functions in, say, std::cell:

fn ref_as_cell<T>(t: &mut T) -> &Cell<T> {
    unsafe {
        &*(t as *mut T as *const Cell<T>)
    }
}

fn cell_slice<T>(t: &Cell<[T]>) -> &[Cell<T>] {
    unsafe {
        &*(t as *const Cell<[T]> as *const [Cell<T>])
    }
}

On the opposite spectrum, and should this feature end up being used somewhat commonly, we could provide the conversions by dedicated traits, possibly in the prelude, or use the std coherence hack to implement them directly on &mut T and &mut [T]:

trait AsCell {
    type Cell;
    fn as_cell(self) -> Self::Cell;
}

impl<'a, T> AsCell for &'a mut T {
    type Cell = &'a Cell<T>;
    fn as_cell(self) -> Self::Cell {
        unsafe {
            &*(self as *mut T as *const Cell<T>)
        }
    }
}

But given the issues of adding methods to pointer-like types, this approach in general would probably be not a good idea (See the situation with Rc and Arc).

Unresolved questions

None so far.