RFC 0079: Struct layout is undefined by default


Summary

Leave structs with unspecified layout by default like enums, for optimisation purposes. Use something like #[repr(C)] to expose C compatible layout.

Motivation

The members of a struct are always laid out in memory in the order in which they were specified, e.g.

struct A {
    x: u8,
    y: u64,
    z: i8,
    w: i64,
}

will put the u8 first in memory, then the u64, the i8 and lastly the i64. Due to the alignment requirements of the various types, padding is often required to ensure that each member starts at an appropriately aligned byte. Hence the above struct is not 1 + 8 + 1 + 8 == 18 bytes, but rather 1 + 7 + 8 + 1 + 7 + 8 == 32 bytes, since it is laid out like

#[packed] // no automatically inserted padding
struct AFull {
    x: u8,
    _padding1: [u8, .. 7],
    y: u64,
    z: i8,
    _padding2: [u8, .. 7],
    w: i64
}
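The 32-byte figure can be checked with a minimal sketch like the following. It assumes a target where u64 and i64 are 8-byte aligned (typical 64-bit platforms), and it uses #[repr(C)] to pin declaration order, which reproduces the layout described above. The helper name layout_of_a is made up for illustration.

```rust
use std::mem::{align_of, size_of};

// #[repr(C)] keeps the fields in declaration order, so padding is
// inserted exactly as in the AFull illustration.
#[allow(dead_code)]
#[repr(C)]
struct A {
    x: u8,
    y: u64,
    z: i8,
    w: i64,
}

/// Returns (size, alignment) of `A` in bytes.
/// Hypothetical helper, named only for this example.
fn layout_of_a() -> (usize, usize) {
    (size_of::<A>(), align_of::<A>())
}
```

On such a target, layout_of_a() should report a size of 32 and an alignment of 8.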

If the fields were reordered to

struct B {
    y: u64,
    w: i64,

    x: u8,
    z: i8
}

then the struct's fields occupy (strictly) only 18 bytes, but the alignment requirement of u64 forces the struct to take up 24, since its size is rounded up to a multiple of 8.
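The rounding from 18 to 24 bytes can be sketched as follows, again assuming 8-byte alignment for u64 and i64 and using #[repr(C)] so that the declared order is the actual order. The names padded_size and size_of_b are hypothetical helpers for this example only.

```rust
use std::mem::size_of;

// Reordered struct: the two 8-byte fields first, then the two 1-byte fields.
#[allow(dead_code)]
#[repr(C)]
struct B {
    y: u64,
    w: i64,
    x: u8,
    z: i8,
}

/// Rounds `unpadded` up to the next multiple of `align`.
/// A struct's size is always a multiple of its alignment.
fn padded_size(unpadded: usize, align: usize) -> usize {
    (unpadded + align - 1) / align * align
}

/// Size of `B` in bytes as reported by the compiler.
fn size_of_b() -> usize {
    size_of::<B>()
}
```

Here padded_size(18, 8) yields 24, which should match size_of_b() on a typical 64-bit target: 6 bytes of trailing padding instead of the 14 bytes of interior padding in the original ordering.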

Having an undefined layout does allow for possible security improvements, like randomising struct fields, but this can trivially be done with a syntax extension that can be attached to a struct to reorder the fields in the AST itself. That said, there may be benefits from being able to randomise all structs in a program automatically/for testing, effectively fuzzing code (especially unsafe code).

Notably, Rust's enums already have undefined layout, and provide the #[repr] attribute to control layout more precisely (specifically, selecting the size of the discriminant).
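As an illustration of the enum case, the sketch below selects the discriminant size with #[repr(u8)] and #[repr(u32)] on two fieldless enums. The enum names and the helper discriminant_sizes are invented for this example.

```rust
use std::mem::size_of;

// Discriminant stored in a single byte.
#[allow(dead_code)]
#[repr(u8)]
enum Small {
    A = 1,
    B = 2,
}

// Same variants, but with a 4-byte discriminant.
#[allow(dead_code)]
#[repr(u32)]
enum Wide {
    A = 1,
    B = 2,
}

/// Returns the sizes in bytes of `Small` and `Wide`.
fn discriminant_sizes() -> (usize, usize) {
    (size_of::<Small>(), size_of::<Wide>())
}
```

For fieldless enums like these, the size of the type is the size of the chosen discriminant, so the helper returns 1 and 4.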

Drawbacks

Forgetting to add #[repr(C)] for a struct intended for FFI use can cause surprising bugs and crashes. There is already a lint for FFI use of enums without a #[repr(...)] attribute, so this can be extended to include structs.

Having an unspecified (or otherwise non-C-compatible) layout by default makes interfacing with C slightly harder. A particularly bad case is passing to C a struct from an upstream library that doesn't have a repr(C) attribute. This situation seems relatively similar to one where an upstream library type is missing an implementation of a core trait e.g. Hash if one wishes to use it as a hashmap key.

It would be slightly better if structs had a specified-but-C-incompatible layout and one had control over the C interface, because then one could manually arrange the fields in the C definition to match the Rust order.

That said, this scenario requires:

Detailed design

A struct declaration like

struct Foo {
    x: T,
    y: U,
    ...
}

has no fixed layout, that is, a compiler can choose whichever order of fields it prefers.

A fixed layout can be selected with the #[repr] attribute:

#[repr(C)]
struct Foo {
    x: T,
    y: U,
    ...
}

This will force a struct to be laid out like the equivalent definition in C.
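One way to observe what "laid out like the equivalent definition in C" means is to inspect field offsets. The sketch below substitutes concrete types (u8 and u32) for the placeholder T and U, purely as an assumption for illustration; field_offsets is a hypothetical helper.

```rust
use std::ptr::addr_of;

// Concrete stand-in for the RFC's `Foo`, with T = u8 and U = u32.
#[repr(C)]
struct Foo {
    x: u8,
    y: u32,
}

/// Returns the byte offsets of `x` and `y` from the start of `Foo`.
fn field_offsets() -> (usize, usize) {
    let foo = Foo { x: 0, y: 0 };
    let base = addr_of!(foo) as usize;
    let x_off = addr_of!(foo.x) as usize - base;
    let y_off = addr_of!(foo.y) as usize - base;
    (x_off, y_off)
}
```

With #[repr(C)], x sits at offset 0 and y at the next 4-byte boundary, offset 4, just as a C compiler would place them on common targets. Without the attribute, no particular offsets should be relied upon.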

There would be a lint for the use of non-repr(C) structs in related FFI definitions, for example:

struct UnspecifiedLayout {
   // ...
}

#[repr(C)]
struct CLayout {
   // ...
}


extern {
    fn foo(x: UnspecifiedLayout); // warning: use of non-FFI-safe struct in extern declaration

    fn bar(x: CLayout); // no warning
}

extern "C" fn baz(x: UnspecifiedLayout) { } // warning: use of non-FFI-safe struct in function with C ABI
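For comparison, here is a small sketch of the non-warning case in present-day Rust syntax. It assumes rustc's existing improper_ctypes_definitions lint as a stand-in for the lint proposed here; the struct fields and the function sum_fields are made up for the example.

```rust
/// A struct with C-compatible layout, suitable for passing by value
/// across an FFI boundary.
#[repr(C)]
#[derive(Clone, Copy)]
pub struct CLayout {
    pub a: u8,
    pub b: u32,
}

/// Defined with the C ABI so that C code could call it.
/// Denying the lint turns any non-FFI-safe parameter type into a hard error.
#[deny(improper_ctypes_definitions)]
pub extern "C" fn sum_fields(v: CLayout) -> u32 {
    v.a as u32 + v.b
}
```

Because CLayout carries #[repr(C)], the function compiles cleanly and can also be called from Rust, e.g. sum_fields(CLayout { a: 1, b: 2 }). Removing the attribute from the struct would be expected to trigger the lint.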

Alternatives

Unresolved questions