29 June 2022

Complexity

In programming and programming language design we often talk about complexity. And for good reason! Complexity often feels like the enemy we are battling when learning a new codebase or new programming language, or just trying to debug some code. In this post I want to think a bit about what complexity is and how we might reason about it when thinking of new features in Rust.

Motivation

Complexity in programming hurts because it makes programming less productive. Code which is more complex (either because the programming language is complex, or because the code itself or its dependencies are complex) exacts a higher cognitive load to read, learn, and modify.

Other things being equal, a more complex language (or codebase) will take longer to learn, both for beginners and for those progressing to become productive programmers or experts. More time also means more opportunities to give up. For some people, learning something more complex is also more frustrating, which makes it more likely that they stop learning.

Even when learning is no longer a primary activity, complexity slows us down: it makes reading code slower, it makes it easier to write bugs (either because we don't fully understand the language or existing code, or because it is easier to hide bugs); similarly, it makes finding bugs slower. It also makes programming more demanding and stressful and makes us slower in the long term.

In general, people don't like complexity. From the 2021 Rust community annual survey, in answer to "What are your biggest worries for the future of Rust?", 33% of respondents included "Becomes too complex" (the second highest ranked answer). Consider also the following quotes from the same survey (from among many similar ones):

"Rust isn't too complex per se, but what's annoying is when trying to do one thing requires recursively learning about ten others."
"I cannot convince my colleagues to use it, because they find it too complex and not worth learning"
"some libraries are so frustrating or too complex to work with"
"Try to not make rust too complex"
"Rust is a wonderful language. However, it's still one of the most complex language I've learned."
"C++ did a lot of things right except shake its reputation for complexity"
"Please stabilize the language, and try to limit the complexity"

On the other hand, let's not go too far. Complexity is often used as a vague criticism when what we mean is 'I don't like this' or 'I don't understand this' or even 'this isn't how I would do it'. Often, complexity is necessary, but we don't realise that because we don't appreciate the constraints or requirements which led to it (note that this is not a defence of complexity, but a defence of a decision made about a trade-off where complexity is still the downside). Sometimes, complexity is discussed in a way which can seem patronising or disrespectful to users, especially beginners.

Classifying complexity

If we are to properly consider and reason about complexity, then we must understand what it is and how it impacts us.

As mentioned above, complexity affects users by increasing cognitive load. That cognitive load might mean more things to keep in our working memory, which could be due to breadth (e.g., more variables in a function, more branches in a conditional, or more possible implementations of an interface) or depth (e.g., to understand some function call we have to read its implementation, and the functions it calls). The cognitive load could mean having to think harder to reason about some problem (analogous to requiring more CPU cycles), e.g., having to consider the variance of type parameters rather than being able to assume invariance. It might mean that we require more context to solve a problem, e.g., a programmer who has not encountered a new syntax having to look up the meaning, or having to understand a new convention for memory management in C.
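To make the variance example concrete, here is a minimal Rust sketch (the function names `shorten` and `shorten_mut` are hypothetical, chosen only for illustration). `Vec<T>` is covariant in `T`, so a vector of `&'static str` can be used where a vector of shorter-lived references is expected; `&mut T` is invariant in `T`, so the analogous conversion through a mutable reference is rejected by the compiler.

```rust
// Hypothetical helper illustrating variance; not from any library.

// Compiles: `Vec<T>` is covariant in `T`, so a `Vec<&'static str>`
// can be used as a `Vec<&'a str>` for any lifetime `'a`.
fn shorten<'a>(v: Vec<&'static str>) -> Vec<&'a str> {
    v
}

// Would NOT compile if uncommented: `&mut T` is invariant in `T`.
// If it were allowed, the caller could push a short-lived `&'a str`
// into a vector that the original owner still treats as holding `&'static str`.
//
// fn shorten_mut<'a>(v: &'a mut Vec<&'static str>) -> &'a mut Vec<&'a str> {
//     v
// }

fn main() {
    let words: Vec<&'static str> = vec!["simple", "complex"];
    let local = String::from("local");
    let mut shorter = shorten(words);
    // Fine: `shorter` only promises references valid for a shorter lifetime.
    shorter.push(local.as_str());
    assert_eq!(shorter, vec!["simple", "complex", "local"]);
}
```

A programmer who can assume invariance never has to think about which of these two functions is accepted; one who must reason about variance does.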

It is sometimes tempting to classify complexity as either necessary or unnecessary. I don't believe this is useful. It is much more common for complexity to be part of a trade-off. If we are trading off complexity against satisfying a requirement and that requirement is considered essential, then the complexity may be necessary. But it's also possible that there is a less complex solution, or that the requirement is not truly essential.

A different framing is intrinsic vs incidental. Intrinsic complexity is complexity which is inherited from the real world or other parts of the system which are out of our control. E.g., CPUs are extremely complicated things and there are many different kinds; if a program is to run on these CPUs, then there is some intrinsic complexity involved in doing so. Incidental complexity is complexity which is a property of the system alone, e.g., the choice to differentiate String from &str in Rust is purely incidental complexity. Incidental complexity is not always bad! It often has good reasons for existing, such as in the string example (which is why I prefer 'incidental' to 'accidental'). We should note that the boundary between intrinsic and incidental is a very grey area, e.g., one might argue that the string distinction is intrinsic following the decision to differentiate owned from borrowed data in general. It is also very easy to mistake intrinsic for incidental complexity (or vice versa), especially if you don't fully understand the domain.
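A minimal sketch of the `String`/`&str` distinction mentioned above: `String` owns a growable heap buffer, while `&str` borrows a view into string data owned elsewhere. The function names `make_greeting` and `first_word` are hypothetical.

```rust
// `String` owns its heap-allocated buffer and can grow.
fn make_greeting(name: &str) -> String {
    let mut s = String::from("Hello, ");
    s.push_str(name); // mutation requires ownership (or a `&mut String`)
    s
}

// `&str` is a borrowed view; accepting `&str` lets callers pass string
// literals, `&String` (via deref coercion), or slices of either.
fn first_word(s: &str) -> &str {
    s.split_whitespace().next().unwrap_or("")
}

fn main() {
    let owned: String = make_greeting("Rust");
    let borrowed: &str = first_word(&owned); // borrows from `owned`, no copy
    assert_eq!(borrowed, "Hello,");
    assert_eq!(first_word("owned vs borrowed"), "owned");
}
```

The second type is the incidental complexity; what it buys is that `first_word` can return a slice of its input without allocating or copying.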

Another framing I find useful is core vs peripheral. Is complexity used to achieve a core goal of a project, or is it being used to achieve a less important goal? For example, consider a program for text search (like ripgrep). If the regex engine is complex, then that is core complexity because it is in the core of what the program does. On the other hand, if the code for rendering error messages is complex, that is peripheral complexity. Again, peripheral complexity isn't necessarily bad: in this example, presenting a good UI is important and some code complexity there is probably justified. However, by identifying whether complexity is core or peripheral to the project we might better reason about trade-offs.

Finally, I think it is worth examining the relationship between complexity and abstraction. Whether we are discussing programs or programming languages, this relationship is often interesting. A small and simple API may seem less complex than a large and detailed one. However, complexity which is hidden may still cause issues. If a programmer has to frequently look at the implementation, or work around the API (rather than using it as intended), then the simpler API may be more complex to use and result in more complex programs. This is not to say that abstraction is useless, but that simpler doesn't always mean better.

Similarly, if a language feature has a simple syntax, it may still introduce complexity since reasoning about it might be difficult. C pointers are a great example of this. Rust references are much more complex in terms of syntax (which might include a lifetime) and semantics (the borrow checker!), but using C pointers is more complex: either the programmer must implement their own memory safety mechanisms, or the programmer must reason in detail about memory safety, both of which are complex tasks.
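To illustrate the extra syntax and semantics of Rust references, here is a lightly adapted version of the well-known `longest` example from The Rust Programming Language book. The lifetime `'a` in the signature is syntax a C programmer never writes, but it lets the borrow checker reject a dangling reference at compile time rather than leaving the programmer to reason about it.

```rust
// The lifetime `'a` says the returned reference is valid no longer than
// both inputs. This is extra syntax compared with a C `const char *`.
fn longest<'a>(x: &'a str, y: &'a str) -> &'a str {
    if x.len() >= y.len() { x } else { y }
}

fn main() {
    let a = String::from("borrow checker");
    {
        let b = String::from("pointer");
        // Fine: the result is used only while both `a` and `b` are alive.
        assert_eq!(longest(&a, &b), "borrow checker");
    }

    // Rejected at compile time if uncommented (`b` does not live long enough),
    // because `result` might point into `b` after `b` has been dropped:
    //
    // let result;
    // {
    //     let b = String::from("pointer");
    //     result = longest(&a, &b);
    // }
    // println!("{result}");
}
```

In C, the equivalent dangling pointer would compile, and catching it would be entirely the programmer's job.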

The complexity of the implementation is orthogonal to the complexity of the abstraction. Some abstractions are fairly simple but have complex implementations; whether they are complex to use and reason about depends on the nature of the abstraction and how it is used.
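As one illustration (my choice of example, not necessarily the one the author had in mind), the standard library's `HashMap` presents a small, simple abstraction, while its implementation has to deal with hashing, probing, and resizing. The `word_counts` function below is a hypothetical sketch of using it without knowing any of those details.

```rust
use std::collections::HashMap;

// Count word occurrences. The `entry` API is simple to use and reason about,
// even though the map's implementation is considerably more involved.
fn word_counts(text: &str) -> HashMap<&str, usize> {
    let mut counts = HashMap::new();
    for word in text.split_whitespace() {
        // Insert 0 if the word is absent, then increment the stored count.
        *counts.entry(word).or_insert(0) += 1;
    }
    counts
}

fn main() {
    let counts = word_counts("simple api complex implementation simple");
    assert_eq!(counts["simple"], 2);
    assert_eq!(counts.get("complex"), Some(&1));
    assert_eq!(counts.get("missing"), None);
}
```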

Complexity budgets

The idea of a complexity budget is useful but often misunderstood or taken too far. I think its most important aspect is that complexity has a cost which should be considered when planning. Secondly, as with money, complexity increases should be bounded, and we should take a global view of complexity when planning and designing. We might choose a more complex solution in one place at the expense of requiring something simpler elsewhere. In other words, not only should discussions about complexity usually be trade-off discussions, but they should often be about global trade-offs (as well as isolated ones).

Unlike money, complexity is not measurable or fungible. We cannot say that a certain amount of complexity in one area is equivalent to a certain amount in another. So at best, the concept of a complexity budget is a rough approximation.

The relationship between a complexity budget and time is also interesting. There is rarely an advantage in delaying the addition of complexity (cf. delaying spending). If we want to add a complex feature to Rust (or to a software project), the additional complexity (and its impact on users) is the same whether we add the feature now or next year. To the extent that we think of complexity budgets in any concrete sense, we should think of an all-time budget, rather than an annual budget. That implies that the amount of complexity added per year should diminish over time (obviously this is unlikely to be a smooth or monotonic decrease).

Reasoning about complexity

I hope the following principles are useful for discussing and reasoning about complexity.

Precisely identify the flavour of complexity

Try to identify exactly how a new feature makes programming more complex. How does it affect cognitive load? Is the complexity intrinsic or incidental, core or peripheral, in the design or implementation?

Accept the trade-off and discuss it

Complexity is part of the trade-off for any new feature. We should accept that and frame discussion around the trade-off. After identifying the kind of complexity, we should ask whether it is justified by the advantages of the feature, rather than discussing either the complexity or the advantages (or other disadvantages) in isolation. "Is this feature too complex?" is a bad question.

Sometimes the answer to a question about trade-offs is that there is another solution which has most of the advantages with fewer of the disadvantages; here specifically, a solution which is less complex. This isn't always the case, but it is worth investing significant energy to be sure.

Consider our users

Complexity affects different users in different ways. I have another post in the works on considering users, but in summary: identify and consider the effects of complexity on different groups such as beginners vs intermediate vs advanced users, library authors vs consumers, users with different programming backgrounds, and users in different domains.

Doing this takes considerable effort and empathy. It might be useful to conduct surveys or experiments to better understand our users. (I hope to help teams do that; get in touch if you're interested!)

We must also be aware of our biases. We are likely to be like some of our users in some ways, but unlike many of them in many ways. In particular, those who are interested enough in programming languages to discuss programming language design often have experience and interests which are unlike the majority of programmers (who, mostly, just want to get things done). Anyone with an understanding of compiler implementation, type theory, or very sophisticated type systems will have ways of thinking about programming languages which are not shared with most programmers.

Consider global effects

As mentioned in the section on complexity budgets, we should consider complexity as a global issue as well as specific to the feature being designed. We should not only consider the trade-off of the complexity increase due to a new feature vs the benefit of the new feature, but we should also consider the project as a whole: is this added complexity worth more than other plans to add complexity on our roadmap? How complex is the project going to be as a whole with this feature and other features we plan to add? How does this new feature interact with other features to present complexity to the user?

Consider second-order effects

We should consider how adding a feature might have indirect effects on users' perceptions of complexity. For example, if adding a feature means a different feature gets used less (or can be removed), then the net effect might be to reduce complexity. Or, if adding a feature encourages the use of more complex APIs or coding patterns, then it might increase complexity for users far beyond the complexity of the feature itself.