RFC 1969: cargo-prepublish

tools (cargo | dependencies)

Summary

This RFC proposes the concept of patching sources for Cargo. Sources can have their existing versions of crates replaced with different copies, and sources can also gain "prepublished" crates: versions of a crate which do not currently exist in the source. Dependency resolution will work as if these additional or replacement crates actually existed in the original source.

One primary feature enabled by this is the ability to "prepublish" a crate to crates.io. Prepublication makes it possible to perform integration testing within a large crate graph before publishing anything to crates.io, and without requiring dependencies to be switched from the crates.io index to git branches. It can, to a degree, simulate an "atomic" change across a large number of crates and repositories, which can then actually be landed in a piecemeal, non-atomic fashion.

Motivation

Large Rust projects often end up pulling in dozens or hundreds of crates from crates.io, and those crates often depend on each other as well. If the project author wants to contribute a change to one of the crates nestled deep in the graph (say, xml-rs), they face a couple of related challenges:

The Goldilocks problem

It's likely that a couple of Cargo's existing features have already come to mind as potential solutions to the challenges above. But the existing features suffer from a Goldilocks problem:

Prepublication dependencies add another tool to this arsenal, with just the right amount of dependency unification: the precise amount you'd get after publication to crates.io.

Detailed design

The design itself is relatively straightforward. The Cargo.toml file will support a new section for patching a source of crates:

[patch.crates-io]
xml-rs = { path = "path/to/fork" }

The listed dependencies have the same syntax as the normal [dependencies] section, but they must all come from a different source than the source being patched. For example, you can't patch crates.io with other crates from crates.io! Cargo will load the crates and extract the version information for each dependency's name, supplementing the source specified with the version it finds. If the same name/version pair already exists in the source being patched, then this will act just like [replace], replacing its source with the one specified in the [patch] section.
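As a rough illustration of the different-source rule, the fragment below contrasts an entry Cargo would be expected to reject with one it accepts. The rejected line is commented out so the fragment stays valid, and the path is a placeholder rather than anything this RFC prescribes.

```toml
[patch.crates-io]
# Not allowed: a bare version requirement resolves to crates.io itself,
# which is the very source being patched.
# xml-rs = "0.9.2"

# Allowed: the replacement comes from a different source (a local path here;
# a git repository would work equally well).
xml-rs = { path = "path/to/fork" }
```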

Like [replace], the [patch] section is only taken into account for the root crate (or workspace root); allowing it to accumulate anywhere in the crate dependency graph creates intractable problems for dependency resolution.
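A minimal sketch of where the section lives in a workspace, assuming hypothetical member directories foo and bar and a local checkout of xml-rs next to the workspace:

```toml
# Cargo.toml at the workspace root
[workspace]
members = ["foo", "bar"]

# Applies to the dependency graph of every workspace member.
# A [patch] section written in foo/Cargo.toml or bar/Cargo.toml
# would not be taken into account.
[patch.crates-io]
xml-rs = { path = "../xml-rs" }
```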

The sub-table of [patch] (where crates-io is used above) specifies the source that's being patched. Cargo will know one identifier ahead of time, literally crates-io; any other value will currently be interpreted as the URL of a source. The name crates-io corresponds to the crates.io index, and other URLs, such as git repositories, may also be specified for patching. Eventually it's intended that we'll grow support for multiple registries here, each with its own identifier, but for now only the literal crates-io and other URLs are allowed.
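For a source other than crates.io, the sub-table key is the source's URL. The sketch below assumes a hypothetical git repository and crate name; the key has to be quoted because it contains characters that TOML does not permit in a bare key.

```toml
# Patch a crate that the graph normally pulls from a git repository.
# Both the URL and the crate name are placeholders.
[patch."https://github.com/example/upstream-repo"]
some-crate = { path = "../some-crate" }
```

The different-source rule still applies: the replacement here is a local path, not another checkout of the same repository URL.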

Examples

It's easiest to see how the feature works by looking at a few examples.

Let's imagine that xml-rs is currently at version 0.9.1 on crates.io, and we have the following dependency setup:

With this setup, the dependency graph for Servo will contain two versions of xml-rs: 0.9.1 and 0.8.0. That's because minor versions are coalesced; 0.9.1 is considered a minor release against 0.9.0, while 0.9.0 and 0.8.0 are incompatible.
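The coalescing follows from how Cargo reads a plain version string as a caret requirement. A small sketch, using the manifest of a hypothetical crate that wants the 0.9 series:

```toml
[dependencies]
# "0.9.0" is shorthand for ^0.9.0, i.e. >=0.9.0, <0.10.0.
# 0.9.1 satisfies it; 0.8.0 and 0.10.0 do not.
xml-rs = "0.9.0"
```

A crate that instead declares xml-rs = "0.8.0" accepts >=0.8.0, <0.9.0, so no single version can satisfy both requirements and both versions end up in the graph.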

Scenario: patching with a bugfix

Let's say that while developing foo we've got a lock file pointing to xml-rs 0.9.0, and we found the 0.9.0 branch of xml-rs that hasn't been touched since it was published. We then find a bug in the 0.9.0 publication of xml-rs which we'd like to fix.

First we'll check out xml-rs locally, in a directory next to foo, and implement what we believe is a fix for this bug. Next, we change Cargo.toml for foo:

[patch.crates-io]
xml-rs = { path = "../xml-rs" }

When compiling foo, Cargo will resolve the xml-rs dependency to 0.9.0, as it did before, but that version's been replaced with our local copy. The local path dependency, which has version 0.9.0, takes precedence over the version found in the registry.
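What makes this a replacement rather than an addition is the version declared by the local copy itself. A sketch of the relevant part of its manifest, assuming the checkout is untouched apart from the fix:

```toml
# ../xml-rs/Cargo.toml (the local checkout; only the relevant fields are shown)
[package]
name = "xml-rs"
# Same name/version pair as the registry copy locked for foo,
# so this copy replaces the registry's 0.9.0.
version = "0.9.0"
```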

Once we've confirmed that the bug is fixed, we continue to run tests in xml-rs itself, and then we'll send a PR to the main xml-rs repo. This leads us to the next section, where a new version of xml-rs comes into play!

Scenario: prepublishing a new minor version

Now, suppose that foo needs some changes to xml-rs, but we want to check that all of Servo compiles before pushing the changes through.

First, we change Cargo.toml for foo:

[patch.crates-io]
xml-rs = { git = "https://github.com/aturon/xml-rs", branch = "0.9.2" }

[dependencies]
xml-rs = "0.9.2"
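Note that the version Cargo adds to its view of crates.io comes from the fork's own manifest, not from the branch name. A sketch of what the manifest on that branch would presumably declare:

```toml
# Cargo.toml on the "0.9.2" branch of the xml-rs fork (relevant fields only)
[package]
name = "xml-rs"
# Not yet published on crates.io, so [patch] adds it as a new version
# that the "0.9.2" requirement above can resolve to.
version = "0.9.2"
```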

For servo, we also need to record the prepublication, but don't need to modify or introduce any xml-rs dependencies; it's enough to be using the fork of foo, which we would be anyway:

[patch.crates-io]
xml-rs = { git = "https://github.com/aturon/xml-rs", branch = "0.9.2" }
foo = { git = "https://github.com/aturon/foo", branch = "fix-xml" }

Note that if Servo depended directly on foo it would also be valid to do:

[patch.crates-io]
xml-rs = { git = "https://github.com/aturon/xml-rs", branch = "0.9.2" }

[dependencies]
foo = { git = "https://github.com/aturon/foo", branch = "fix-xml" }

With this setup:

The Cargo.toml files that needed to be changed here span from the crate that actually cares about the new version (foo) upward to the root of the crate we want to do integration testing for (servo); no sibling crates needed to be changed.

Once xml-rs version 0.9.2 is actually published, we will likely be able to remove the [patch] sections. This is a discrete step that crate authors must take, however (i.e. it doesn't happen automatically), because the actual published 0.9.2 may not be precisely what we thought it was going to be. For example, more changes could have been merged, or it may not actually fix the bug.
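For foo, the end state after publication would look roughly like this, with the dependency line unchanged and only the patch removed:

```toml
# foo's Cargo.toml once xml-rs 0.9.2 is on crates.io.
# The [patch.crates-io] section has been deleted; nothing else changes.
[dependencies]
xml-rs = "0.9.2"
```

It is worth rerunning the integration tests at this point, since the build now uses whatever was actually published.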

Scenario: prepublishing a breaking change

What happens if foo instead needs to make a breaking change to xml-rs? The workflow is identical. For foo:

[patch.crates-io]
xml-rs = { git = "https://github.com/aturon/xml-rs", branch = "0.10.0" }

[dependencies]
xml-rs = "0.10.0"

For servo:

[patch.crates-io]
xml-rs = { git = "https://github.com/aturon/xml-rs", branch = "0.10.0" }

[dependencies]
foo = { git = "https://github.com/aturon/foo", branch = "fix-xml" }

However, when we compile, we'll now get three versions of xml-rs: 0.8.0, 0.9.1 (retained from the previous lockfile), and 0.10.0. Assuming that xml-rs is a public dependency used to communicate between foo and bar, this will result in a compilation error, since the two crates are using distinct versions of xml-rs. To fix that, we'll need to update bar to also use the new, 0.10.0 prepublication version of xml-rs.
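One way the fix could look for servo is sketched below. The bar repository URL and branch name are placeholders, and the fork of bar is assumed to declare xml-rs = "0.10.0" in its own [dependencies].

```toml
# servo's Cargo.toml, extended to pull in a fork of bar as well.
[patch.crates-io]
xml-rs = { git = "https://github.com/aturon/xml-rs", branch = "0.10.0" }
# Hypothetical fork of bar that has been updated to xml-rs 0.10.0.
bar = { git = "https://github.com/example/bar", branch = "xml-rs-0.10" }

[dependencies]
foo = { git = "https://github.com/aturon/foo", branch = "fix-xml" }
```

Patching bar rather than listing it under [dependencies] works whether or not servo depends on bar directly.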

(Note that a private dependency distinction would help catch this issue at the Cargo level and give a maximally informative error message).

Impact on Cargo.lock

Usage of [patch] will perform backwards-incompatible modifications to Cargo.lock, meaning that usage of [patch] will prevent previous versions of Cargo from interpreting the lock file. Cargo will unconditionally resolve all entries in the [patch] section to precise dependencies, encoding them all in the lock file whether they're used or not.

Dependencies formed on crates listed in [patch] will then be listed directly in Cargo.lock, and the original listed crate will not be listed. In our example above we had:

We then update the crate foo to have a dependency of xml-rs = "0.10.0". This causes Cargo to encode in the lock file that foo depends directly on the git repository of xml-rs containing 0.10.0, but it does not mention that foo depends on the crates.io version of xml-rs-0.10.0 (it doesn't exist!). Note, however, that the lock file will still mention xml-rs-0.8.0 and xml-rs-0.9.1 because bar and baz depend on them.

To help put some TOML where our mouth is let's say we depend on env_logger but we're using [patch] to depend on a git version of the log crate, a dependency of env_logger. First we'll have our Cargo.toml including:

# Cargo.toml
[dependencies]
env_logger = "0.4"

With that we'll find this in Cargo.lock. Notably, everything comes from crates.io:

# Cargo.lock
[[package]]
name = "env_logger"
version = "0.4.2"
source = "registry+https://github.com/rust-lang/crates.io-index"
dependencies = [
 "log 0.3.7 (registry+https://github.com/rust-lang/crates.io-index)",
]

[[package]]
name = "log"
version = "0.3.7"
source = "registry+https://github.com/rust-lang/crates.io-index"

Next up we'll add a [patch] section for crates.io:

# Cargo.toml
[patch.crates-io]
log = { git = 'https://github.com/rust-lang-nursery/log' }

and that will generate a lock file that looks (roughly) like:

# Cargo.lock
[[package]]
name = "env_logger"
version = "0.4.2"
source = "registry+https://github.com/rust-lang/crates.io-index"
dependencies = [
 "log 0.3.7 (git+https://github.com/rust-lang-nursery/log)",
]

[[package]]
name = "log"
version = "0.3.7"
source = "git+https://github.com/rust-lang-nursery/log#cb9fa28812ac27c9cadc4e7b18c221b561277289"

Notably log from crates.io is not mentioned at all here, and crucially so! Additionally Cargo has the fully resolved version of the log patch available to it, down to the sha of what to check out.

When Cargo rebuilds from this Cargo.lock it will not query the registry for versions of log, instead seeing that there's an exact dependency on the git repository (from the Cargo.lock) and the repository is listed as a patch, so it'll follow that pointer.

Impact on [replace]

The [patch] section in the manifest can in many ways be seen as a "replace 2.0". It is, in fact, strictly more expressive than the current [replace] section! For example these two sections are equivalent:

[replace]
'log:0.3.7' = { git = 'https://github.com/rust-lang-nursery/log' }

# is the same as...

[patch.crates-io]
log = { git = 'https://github.com/rust-lang-nursery/log' }

This is not accidental! The initial development of the [patch] feature focused on prepublishing dependencies, and the section was called [prepublish]. During discussion it became clear that [prepublish] already allowed replacing existing versions in a registry, although it issued a warning when doing so. Dropping that warning turned the feature into a full-on replacement for [replace]!

At this time, though, it is not planned to deprecate the [replace] section, nor remove it. After the [patch] section is implemented, if it ends up working out this may change. If after a few cycles on stable the [patch] section seems to be working well we can issue an official deprecation for [replace], printing a warning if it's still used.

Documentation, however, will immediately begin to recommend [patch] over [replace].

How We Teach This

Patching is a feature intended for large-scale projects spanning many repos and crates, where you want to make something like an atomic change across the repos. As such, it should likely be explained in a dedicated section for large-scale Cargo usage, which would also include build system integration and other related topics.

The mechanism itself is straightforward enough that a handful of examples (as in this RFC) is generally enough to explain it. In the docs, these examples should be spelled out in greater detail.

Most notably, however, the "overriding dependencies" section of Cargo's documentation will be rewritten to primarily mention [patch]. [replace] will still be mentioned, with a recommendation to use [patch] instead if possible.

Drawbacks

This feature adds yet another knob around where, exactly, Cargo is getting its source and version information. In particular, it's basically deprecating [replace] if it works out, and it's typically a shame to deprecate major stable features.

Fortunately, because these features are intended to be relatively rarely used, checked in even more rarely, are only used for very large projects, and cannot be published to crates.io, the knobs are largely invisible to the vast majority of Cargo users, who are unaffected by them.

Alternatives

The primary alternative for addressing the motivation of this RFC would be to loosen the restrictions around [replace], allowing it to arbitrarily change the version of the crate being replaced.

As explained in the motivation section, however, such an approach does not fully address the desired workflow, for a few reasons:

Unresolved questions