RFC 1291: promote-libc

libs (nursery | libc)

Summary

Promote the libc crate from the nursery into the rust-lang organization after applying changes such as:

Motivation

The current libc crate is a bit of a mess unfortunately, having long since departed from its original organization and scope of definition. As more platforms have been added over time as well as more APIs in general, the internal as well as external facing organization has become a bit muddled. Some specific concerns related to organization are:

Additionally, on the technical and tooling side of things some concerns are:

The purpose of this RFC is largely to propose a reorganization of the libc crate, along with tweaks to some of the mundane details such as internal organization, CI automation, how new additions are accepted, etc. These changes should all help push libc to a more robust position where it can be well trusted across all platforms both now and into the future!

Detailed design

All design can be previewed as part of an in-progress fork available on GitHub. Additionally, all mentions of the libc crate in this RFC refer to the external copy on crates.io, not the in-tree one in the rust-lang/rust repository. No changes (e.g. stabilization) are being proposed for the in-tree copy.

What is this crate?

The primary purpose of this crate is to provide all of the definitions necessary to easily interoperate with C code (or "C-like" code) on each of the platforms that Rust supports. This includes type definitions (e.g. c_int), constants (e.g. EINVAL) as well as function headers (e.g. malloc).
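As a rough illustration of those three kinds of definitions, the sketch below declares a type alias, a constant, and two function headers by hand instead of pulling them from the crate. The value 22 for EINVAL matches Linux and is an assumption for any other target, and the helper malloc_roundtrip is hypothetical, added only to exercise the declarations.

```rust
use std::os::raw::{c_int, c_void};

// Constant: EINVAL is 22 on Linux; treat the value as an assumption elsewhere.
pub const EINVAL: c_int = 22;

// Function headers mirroring <stdlib.h>.
// `unsafe extern` needs Rust 1.82 or newer; older toolchains use plain `extern "C"`.
unsafe extern "C" {
    fn malloc(size: usize) -> *mut c_void;
    fn free(ptr: *mut c_void);
}

/// Hypothetical helper: allocates `len` bytes with the C allocator, fills them
/// with `byte`, reads the first byte back, and frees the allocation.
pub fn malloc_roundtrip(len: usize, byte: u8) -> Option<u8> {
    if len == 0 {
        return None;
    }
    unsafe {
        let p = malloc(len) as *mut u8;
        if p.is_null() {
            return None;
        }
        std::ptr::write_bytes(p, byte, len);
        let first = *p;
        free(p as *mut c_void);
        Some(first)
    }
}
```

With the crate itself, the same names would simply be imported rather than declared locally.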

One question that typically comes up with this sort of purpose is whether the crate is "cross platform" in the sense that it basically just works across the platforms it supports. The libc crate, however, is not intended to be cross platform but rather the opposite: an exact binding to the platform in question. In essence, the libc crate is targeted as a "replacement for #include in Rust" for traditional system header files, but it makes no effort to be portable by tweaking type definitions and signatures.

The Home of libc

Currently this crate resides inside of the main rust repo of the rust-lang organization, but this unfortunately somewhat hinders its development as it takes a while to land PRs and isn't quite as quick to release as external repositories. As a result, this RFC proposes having the crate reside externally in the rust-lang organization so additions can be made through PRs (tested much more quickly).

The main repository will have a submodule pointing at the external repository to continue building libstd.

Public API

The libc crate will hide all internal organization of the crate from users of the crate. All items will be reexported at the top level as part of a flat namespace. This brings with it a number of benefits:

A downside of this approach, however, is that the public API of libc will be platform-specific (e.g. the set of symbols it exposes is different across platforms), which isn't seen very commonly throughout the rest of the Rust ecosystem today. This can be mitigated, however, by clearly indicating that this is a platform specific library in the sense that it matches what you'd get if you were writing C code across multiple platforms.

The API itself will include any number of definitions typically found in C header files such as:

As a technical detail, all struct types exposed in libc will be guaranteed to implement the Copy and Clone traits. There will be an optional feature of the library to implement Debug for all structs, but it will be turned off by default.
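A minimal sketch of what that guarantee could look like for one struct follows. The struct name timeval_like, the helper copy_twice, and the feature name "extra_traits" are all assumptions made for illustration; the RFC text does not name the feature.

```rust
use std::os::raw::c_long;

// Every struct derives Copy and Clone unconditionally; Debug is derived only
// when the (hypothetically named) "extra_traits" feature is enabled.
#[allow(non_camel_case_types)]
#[repr(C)]
#[derive(Copy, Clone)]
#[cfg_attr(feature = "extra_traits", derive(Debug))]
pub struct timeval_like {
    pub tv_sec: c_long,
    pub tv_usec: c_long,
}

/// Hypothetical helper showing that a value can be duplicated both implicitly
/// (Copy) and explicitly (Clone).
pub fn copy_twice(t: timeval_like) -> (timeval_like, timeval_like) {
    (t, t.clone())
}
```

Because the struct is Copy, the caller's value is still usable after being passed by value.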

Changes from today

The in progress implementation of this RFC has a number of API changes and breakages from today's libc crate. Almost all of them are minor and targeted at making bindings more correct in terms of faithfully representing the underlying platforms.

There is, however, one large notable change from today's crate. The size_t, ssize_t, ptrdiff_t, intptr_t, and uintptr_t types are all defined in terms of isize and usize instead of known sizes. As brought up by @briansmith on #28096, this helps decrease the number of casts necessary in normal code and matches the existing definitions on all platforms that libc supports today. In the future, if a platform is added where these type definitions are not correct, then new ones will simply be available for that target platform (and casts will be necessary if targeting it).
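To show the effect on calling code, here is a small sketch assuming size_t is an alias for usize. The alias and the strlen header are declared locally rather than taken from the crate, and c_len is a hypothetical wrapper.

```rust
use std::ffi::CString;
use std::os::raw::c_char;

// Local stand-in for the proposed definition.
#[allow(non_camel_case_types)]
pub type size_t = usize;

// `unsafe extern` needs Rust 1.82 or newer; older toolchains use plain `extern "C"`.
unsafe extern "C" {
    fn strlen(s: *const c_char) -> size_t;
}

/// Hypothetical wrapper: returns the length C sees for `s`.
pub fn c_len(s: &str) -> usize {
    let c = CString::new(s).expect("input must not contain interior NUL bytes");
    // No `as usize` cast is needed because size_t already is usize.
    unsafe { strlen(c.as_ptr()) }
}
```

The result can be used directly for indexing or slicing, which is where a fixed-width definition such as u64 would otherwise force a cast.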

Note that part of this change depends upon removing the compiler's lint-by-default about isize and usize being used in FFI definitions. This lint is mostly a holdover from when the types were named int and uint and it was easy to confuse them with C's int and unsigned int types.

The final change to the libc crate will be to bump its version to 1.0.0, signifying that breakage has happened (a bump from 0.1.x) as well as having a future-stable interface until 2.0.0.

Scope of libc

The name "libc" is a little nebulous as to what it means across platforms. It is clear, however, that this library must have a well defined scope up to which it can expand to ensure that it doesn't start pulling in dozens of runtime dependencies to bind all the system APIs that are found.

Unfortunately, however, this library also can't be "just libc" in the sense of "just libc.so on Linux," for example, as this would omit common APIs like pthreads and would also mean that pthreads would be included on platforms like MUSL (where it is literally inside libc.a). Additionally, the purpose of libc isn't to provide a cross platform API, so there isn't necessarily one true definition in terms of sets of symbols that libc will export.

In order to have a well defined scope while satisfying these constraints, this RFC proposes that this crate will have a scope that is defined separately for each platform that it targets. The proposals are:

New platforms added to libc can decide the set of libraries libc will link to and bind at that time.

Internal structure

The primary change being made is that the crate will no longer be one large file sprinkled with #[cfg] annotations. Instead, the crate will be split into a tree of modules, and all modules will reexport the entire contents of their children. Unlike most libraries, however, most modules in libc will be hidden via #[cfg] at compile time. Each platform supported by libc will correspond to a path from a leaf module to the root, picking up more definitions, types, and constants as the tree is traversed upwards.
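The shape described above can be sketched roughly as follows. The module names echo the RFC's examples, but the constants FAMILY and LEAF and the function platform_path are hypothetical, and the sketch only covers unix and windows targets.

```rust
// Root -> unix -> {linux | other}: each level reexports its children, and
// #[cfg] ensures only one path through the tree is compiled for a target.
#[cfg(unix)]
mod unix {
    // Shared by every Unix-like target.
    pub const FAMILY: &str = "unix";

    #[cfg(target_os = "linux")]
    mod linux {
        pub const LEAF: &str = "linux";
    }
    #[cfg(target_os = "linux")]
    pub use self::linux::*;

    #[cfg(not(target_os = "linux"))]
    mod other {
        pub const LEAF: &str = "other-unix";
    }
    #[cfg(not(target_os = "linux"))]
    pub use self::other::*;
}
#[cfg(unix)]
pub use self::unix::*;

#[cfg(windows)]
mod windows {
    pub const FAMILY: &str = "windows";
    pub const LEAF: &str = "windows";
}
#[cfg(windows)]
pub use self::windows::*;

/// Hypothetical helper: names the leaf-to-root path selected for this target,
/// using only items visible in the flat top-level namespace.
pub fn platform_path() -> String {
    format!("{}::{}", FAMILY, LEAF)
}
```

Users only ever see the flat top-level names; which module supplied them is an internal detail.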

This organization provides a simple method of deduplication between platforms. For example, libc::unix contains functions found across all unix platforms, whereas libc::unix::bsd is a refinement saying that the APIs within are common to only BSD-like platforms (these may or may not be present on non-BSD platforms as well). The benefits of this structure are:

Testing

The current set of bindings in the libc crate suffers a drawback in that the bindings are not verified. This is often a pain point for new platforms: when copying from an existing platform, it's easy to forget to update a constant here or there. This lack of testing leads to problems like a wrong definition of ioctl, which in turn leads to backwards compatibility problems when the API is fixed.

In order to solve this problem altogether, the libc crate will be enhanced with the ability to automatically test the FFI bindings it contains. As this crate will begin to live in rust-lang instead of the rust repo itself, this means it can leverage external CI systems like Travis CI and AppVeyor to perform these tasks.

The current implementation of the binding testing verifies attributes such as type size/alignment, struct field offset, struct field types, constant values, function definitions, etc. Over time it can be enhanced with more metrics and properties to test.
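The following sketch shows the flavor of such checks for a single struct. It is not the RFC's test harness: the struct Example is hypothetical, and the expected numbers are hard-coded assumptions standing in for values that a real test would obtain from the platform's C compiler.

```rust
use std::mem::{align_of, size_of};

// Hypothetical binding under test.
#[repr(C)]
pub struct Example {
    pub tag: u8,
    pub value: u32,
}

/// Byte offset of `value` within `Example`, computed from addresses.
pub fn offset_of_value() -> usize {
    let e = Example { tag: 0, value: 0 };
    let base = &e as *const Example as usize;
    let field = &e.value as *const u32 as usize;
    field - base
}

/// Compares each measured property with its expected value and reports the
/// first mismatch. The expected values assume a 4-byte-aligned u32.
pub fn check_layout() -> Result<(), String> {
    let checks = [
        ("size", size_of::<Example>(), 8usize),
        ("alignment", align_of::<Example>(), 4),
        ("offset of value", offset_of_value(), 4),
    ];
    for (what, actual, expected) in checks.iter() {
        if actual != expected {
            return Err(format!("{}: got {}, expected {}", what, actual, expected));
        }
    }
    Ok(())
}
```

Constant values and function signatures would need analogous comparisons against the C side, which this sketch leaves out.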

In theory adding a new platform to libc will be blocked until automation can be set up to ensure that the bindings are correct, but it is unfortunately not easy to add this form of automation for all platforms, so this will not be a requirement (beyond "tier 1 platforms"). There is currently automation for the following targets, however, through Travis and AppVeyor:

Drawbacks

Loss of module organization

The loss of an internal organization structure can be seen as a drawback of this design. While perhaps not precisely true today, the principle of the structure was that it is easy to constrain yourself to a particular C standard or subset of C to, in theory, write "more portable programs by default" by only using the contents of the respective module. Unfortunately, in practice this does not seem to be used much, and it's also not clear whether this can be expressed simply through headers in libc. For example, many platforms will have slight tweaks to common structures, definitions, or types in terms of signedness or value, so even if you were restricted to a particular subset it's not clear that a program would automatically be more portable.

That being said, it would still be useful to have these abstractions to some degree, but the flip side is that it's easy to build this sort of layer on top of libc as designed here externally on crates.io. For example, extern crate posix could just depend on libc and reexport all the contents for the POSIX standard, perhaps with tweaked signatures here and there to work better across platforms.
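A rough sketch of that layering is shown below. Both modules are stand-ins: raw plays the role of the libc crate and posix the role of the external crate, so that the example compiles without dependencies. The item NONSTANDARD_EXTENSION and the function owner_rw are hypothetical.

```rust
// Stand-in for the libc crate: exact, platform-specific definitions.
mod raw {
    // mode_t differs in width between platforms.
    #[cfg(target_os = "linux")]
    #[allow(non_camel_case_types)]
    pub type mode_t = u32;
    #[cfg(not(target_os = "linux"))]
    #[allow(non_camel_case_types)]
    pub type mode_t = u16;

    pub const S_IRUSR: mode_t = 0o400;
    pub const S_IWUSR: mode_t = 0o200;

    // Hypothetical item outside the standard; the layer above does not reexport it.
    #[allow(dead_code)]
    pub const NONSTANDARD_EXTENSION: i32 = 1;
}

// Stand-in for an external `posix` crate built on top of the raw bindings.
pub mod posix {
    // Reexport only the subset belonging to the standard.
    pub use super::raw::{mode_t, S_IRUSR, S_IWUSR};

    /// Tweaked signature: always returns u32, whatever width mode_t has.
    pub fn owner_rw() -> u32 {
        u32::from(S_IRUSR | S_IWUSR)
    }
}
```

Code written against posix then sees one signature on every target, while the raw layer stays an exact binding.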

Loss of Windows bindings

By only exposing the CRT functions on Windows, the contents of libc will be quite trimmed down, which means that crates accessing similar functions like send or connect will be required to link to at least two libraries.

This is also a bit of a maintenance burden on the standard library itself as it means that all the bindings it uses must move to src/libstd/sys/windows/c.rs in the immediate future.

Alternatives

Unresolved questions