RFC 1212: line-endings

libs (platform)

Summary

Change all functions dealing with reading "lines" to treat both '\n' and '\r\n' as a valid line-ending.

Motivation

The current behavior of these functions is to treat only '\n' as a line-ending. This is surprising for programmers experienced in other languages. Many languages open files in "text mode" by default, which means that when they iterate over the lines, they don't have to worry about the two kinds of line-endings. Such programmers will be surprised to learn that they have to take care of such details themselves in Rust. Some may not even have heard of the distinction between the two styles of line-endings.

The current design also violates the "do what I mean" principle. Both '\r\n' and '\n' are widely used as line-separators. By talking about the concept of "lines", a function makes clear that the current file (or buffer, really) is considered to be in text format. It is thus very reasonable to expect "lines" to handle both ways of encoding a line-ending in the underlying bytes.

In particular, if a crate is developed on Linux or Mac, the programmer will probably have most of their input encoded with only '\n' for the line-endings. They may use the functions talking about "lines", and those functions will work fine. It is only when someone runs this crate on input that contains '\r\n' that the bug will be uncovered. The editor has personally run into this issue when reading line-by-line from stdin, with the program suddenly failing on Windows.

Detailed design

The following functions will have to be changed: BufRead::lines and str::lines. Both should treat '\r\n' as marking the end of a line. This can be implemented, for example, by first splitting at '\n' as is done now and then removing a trailing '\r' right before returning the data to the caller.
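The splitting strategy described above can be sketched for the str case as follows. This is only an illustration under stated assumptions, not the proposed standard-library code: the free function lines_crlf is a hypothetical name, and it collects into a Vec for simplicity where the real method returns an iterator.

```rust
/// Hypothetical helper (not part of std): splits `text` at '\n' and then
/// removes at most one trailing '\r' from each resulting line.
fn lines_crlf(text: &str) -> Vec<&str> {
    text
        // `split_terminator` treats a final '\n' as ending the last line,
        // so no empty line is produced after it.
        .split_terminator('\n')
        // Strip a single trailing '\r' if present; otherwise keep the line.
        .map(|line| line.strip_suffix('\r').unwrap_or(line))
        .collect()
}
```

With this sketch, lines_crlf("a\r\nb\nc") yields the three lines "a", "b" and "c". A '\r' that is not directly followed by '\n' inside a line is left untouched.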

Furthermore, str::lines_any (the only function currently dealing with both kinds of line-endings) would be deprecated, as it then becomes functionally equivalent to str::lines.

Drawbacks

This is a semantics-breaking change that alters the behavior of a released, stable API. However, as argued above, the new behavior is much less surprising than the old one, so one could consider this a bug fix for the original implementation. Alternatives remain available for anyone who really wants to split at '\n' only, namely BufRead::split and str::split. However, BufRead::split iterates over Vec<u8> rather than String, so users have to insert an additional explicit call to String::from_utf8.
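For readers who need the old '\n'-only behavior on a reader, the workaround mentioned above might look roughly like this. The helper name lines_lf_only is hypothetical, and mapping a UTF-8 failure to io::ErrorKind::InvalidData is one possible choice rather than something this RFC prescribes.

```rust
use std::io::{self, BufRead};

/// Hypothetical helper: reads lines split strictly at b'\n', leaving any
/// '\r' in place, and converts each chunk to a String explicitly.
fn lines_lf_only<R: BufRead>(reader: R) -> io::Result<Vec<String>> {
    let mut lines = Vec::new();
    for chunk in reader.split(b'\n') {
        // Each item is an io::Result<Vec<u8>> without the delimiter byte.
        let bytes = chunk?;
        // The explicit conversion step that BufRead::split requires.
        let line = String::from_utf8(bytes)
            .map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))?;
        lines.push(line);
    }
    Ok(lines)
}
```

Called on the bytes "a\r\nb\n", this sketch returns "a\r" and "b", so the carriage return is kept as part of the first line.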

Alternatives

There's the obvious alternative of not doing anything. This leaves a gap in the features Rust provides to deal with text files, making it hard to treat both kinds of line-endings uniformly.

The second alternative is to add BufRead::lines_any, which works similarly to str::lines_any in that it deals with both '\n' and '\r\n'. This provides all the necessary functionality, but it still leaves people with the need to choose one of the two functions, and potentially to choose the wrong one. In particular, the functions with the shorter, nicer names (the existing ones) will almost never be the right choice.
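To make this alternative concrete, here is a rough sketch of what such a function could do, written as a free function because BufRead::lines_any does not exist. The name, the signature, and the error mapping are all assumptions for illustration only.

```rust
use std::io::{self, BufRead};

/// Hypothetical stand-in for the proposed BufRead::lines_any: yields lines
/// terminated by either "\n" or "\r\n", without the terminator.
fn lines_any<R: BufRead>(reader: R) -> impl Iterator<Item = io::Result<String>> {
    reader.split(b'\n').map(|chunk| -> io::Result<String> {
        let mut bytes = chunk?;
        // Drop a single trailing '\r' left over from a "\r\n" ending.
        if bytes.last() == Some(&b'\r') {
            bytes.pop();
        }
        String::from_utf8(bytes)
            .map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))
    })
}
```

Callers would still have to know that this function exists and pick it over BufRead::lines, which is exactly the drawback described above.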

Unresolved questions

None I can think of.