KCL part 1: units

This blog post is about numeric units. Numbers in KCL are not just a bare number like 42; they always include units, e.g., 42mm. This is not a new idea: F# has a famous and well-designed system, and more recently Swift added units. It is a bit different (and interesting) in KCL because KCL doesn't have static typing, and ease of use is a high priority.

Before we dig into things though, we need to quickly go over types in KCL. KCL is not a statically typed language. It has a dynamic type system and optional type annotations in various places (most obviously on function parameters). These types are primarily used for documentation, both for generating the actual docs and for IDE features like hovering the cursor and signature help. Types are checked dynamically where they are present (in the future, it would be good to check types statically too). Dynamic checking can include coercion, which can change runtime values. The type system is fairly simple: there is limited subtyping (which is mostly structural), no generics, etc.

Motivation

KCL is a language for CAD, and units in CAD are important! Designs are manufactured in the real world, and a design without units makes no sense.

We need multiple units because different users want to use different units. This is not just because the USA (and Liberia and Myanmar) are stuck in a weirdly complex parallel world of measurement, but also because different projects work at different scales and having a 0.000 prefix or 0000 suffix on all your numbers is annoying. Reusing components is an important motivator for KCL, so interoperation of components written using different units is important.

Making units first-class in KCL is motivated by three things: eliminating errors due to unit mismatches (again, important because of the CAD domain); eliminating precision errors (because ZMS is a programming system, not just a CAD tool, precision errors can grow due to arithmetic, etc.); and user experience (if a user writes a number using one unit, then we want the system to always use that unit when showing that number, and numbers derived from it, to the user, whether in output or tooling).

As well as the requirements derived from the above motivation, there are multiple requirements for a units system, many of which are in conflict with each other and so we have trade-offs.

Using units can be boilerplate-heavy, especially in KCL programs, which often have many more numeric types and literals than programs in other domains. Furthermore, although knowing the units for a design is important, for most of their work, programmers don't want to think about units. I found that the trade-off between correctness, expressiveness, and ergonomics was more apparent with units than with many other features.

One issue which I find interesting with units, more so than for most kinds of types, is that there is some intrinsic imprecision in the way that programmers use units. For example, programmers might use a single number as both a length and angle, they might use coordinates, vectors, and normalised vectors (i.e., a direction without a magnitude) without precise conversion, or they might do conversion between units 'manually' using arithmetic (which brings us to the tricky issue of how to handle π, more on that later).

On top of that, there were some considerations specific to KCL at the time. Historically KCL had units per file with some functions to convert units, but no checking of units. Lots of code existed without unit annotations in the code and we wanted to be as backwards compatible as possible. This also set expectations for the number of annotations and other syntax to be very low. Furthermore, implementation and conversion of existing code was time-constrained by the release date for ZMS 1.0.

Design

Numbers in KCL are either lengths (most commonly), angles, or unit-less. Lengths can be either metric (mm, cm, m) or imperial (inches, feet, yards); angles are either degrees or radians. Unit-less numbers are used for ratios or for counting (e.g., the length of an array or the current index when iterating an array). We don't often have areas, volumes, etc., so higher-powered units are not supported. Being perfectly expressive is a non-goal - there are occasions where a programmer might want to express a weight or some complex unit (e.g., a velocity) and these are also not supported. Furthermore, even when using the supported units, perfectly tracking units through arithmetic is a non-goal. In part, that's just because it is hard and there are quickly diminishing returns in benefits, but also because the places where the theoretical units get complicated are often the places where programmers don't want precise unit tracking (consider matrix and vector math, for example).

Like types, units annotations are optional (however, unlike types, sometimes units annotations are required to avoid errors). Numeric literals can take a unit as a suffix, e.g., 42mm or 90deg (_ is used as a suffix for unit-less numbers). If a suffix is not used (e.g., 42), the numeric value has 'default' units. Each file has a default length and angle unit (which can optionally be set with a file-scoped settings annotation, e.g., @settings(defaultLengthUnit = in)). A number with default units could be either a length, angle, or unit-less and KCL tracks the default unit for each possibility.

Numeric types can include a unit, e.g., number(mm). Where there is no unit (number), the type has 'default' units (inherited from the file using the setting if present). More recently, we allowed use of units without the number part of the type, e.g., mm. These numeric types can also be used with type ascription (e.g., (4 * x): mm) which asserts that the value of the sub-expression has numeric type and sets the units of the value to those specified. We also have 'partial' units: number(Length) and number(Angle) which are used when a function can accept any length or angle (exactly how these work in some edge cases is currently under-specified, but my preference is that they should be syntactic sugar for universal quantification using a single variable at function scope, so for example, a function fn foo(x: number(Length)): number(Length) would always return the same units as its argument).

The units of values are tracked as the program is evaluated. As well as the fully concrete units (derived from literal suffixes), the partial units, and the default units, the interpreter has concepts of unknown units (where the interpreter cannot compute a type for a value, e.g., 4mm * 2mm has unknown units since the type system has no concept of mm^2) and 'any' units (for a value which could take any units, only used temporarily to implement type ascription).

Where possible, units are implicitly converted, taking defaults into account. E.g., 4mm + 2in will evaluate to 54.8mm; 4mm + 2 would give 6mm if the current default is mm, or 54.8mm if the current default is in. If we have 4 + 2 where the two values have the same defaults, then the result is 6 with those defaults, but if the defaults are different, then the units are unknown. Similar rules are applied when passing arguments, etc.
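To make the addition rules above concrete, here is a toy model in Python. It is only a sketch of the behaviour described in this post, not how the KCL interpreter is implemented: it handles lengths only, and the names (Num, known, default, add) are made up for illustration. It also assumes that when both operands have known units, the result takes the left operand's unit, which matches the 4mm + 2in example.

```python
# Toy model of the addition rules described above. Lengths only; this is an
# illustration, not how the KCL interpreter is implemented.
from dataclasses import dataclass
from typing import Optional

# Millimetres per unit (1 in = 25.4 mm exactly).
MM_PER = {"mm": 1.0, "cm": 10.0, "m": 1000.0, "in": 25.4, "ft": 304.8, "yd": 914.4}


@dataclass(frozen=True)
class Num:
    value: float
    kind: str                   # "known", "default", or "unknown"
    unit: Optional[str] = None  # explicit unit, or the file default it came from


def known(value: float, unit: str) -> Num:
    """A literal with a suffix, e.g. 4mm."""
    return Num(value, "known", unit)


def default(value: float, file_default: str) -> Num:
    """A literal with no suffix, e.g. 2, in a file with the given default."""
    return Num(value, "default", file_default)


def add(a: Num, b: Num) -> Num:
    if "unknown" in (a.kind, b.kind):
        return Num(a.value + b.value, "unknown")
    if a.kind == "default" and b.kind == "default":
        if a.unit == b.unit:
            return Num(a.value + b.value, "default", a.unit)
        # Different defaults: no way to tell which unit was meant.
        return Num(a.value + b.value, "unknown")
    # At least one operand has a known unit; assume the result takes it
    # (the left operand's unit if both are known).
    target = a.unit if a.kind == "known" else b.unit
    total = sum(n.value * MM_PER[n.unit] / MM_PER[target] for n in (a, b))
    return Num(total, "known", target)


print(add(known(4, "mm"), known(2, "in")))           # known mm, value about 54.8
print(add(known(4, "mm"), default(2, "mm")))         # known mm, value 6.0
print(add(default(4, "mm"), default(2, "in")).kind)  # unknown
```

The interesting case is the last one: both numbers are plain literals, but because they came from files with different defaults, the model refuses to guess and the result is unknown.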

Using a number with unknown units is an error (for some definition of 'using' - it would be a bit obnoxious to give a separate error for every sub-expression with unknown units, etc.). Such errors require the user to specify the type using ascription. Type ascription does not do any conversion, so 2in: mm is 2mm, not 50.8mm.
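The difference between ascription and conversion is easy to get wrong, so here is a small Python sketch of it. Again, this is a toy model under my own assumptions, not KCL's implementation: the function names (mul, ascribe, convert, use) are hypothetical, and None stands in for unknown units.

```python
# Toy model: ascription relabels a value, conversion rescales it.
from dataclasses import dataclass
from typing import Optional

# Millimetres per unit (1 in = 25.4 mm exactly).
MM_PER = {"mm": 1.0, "cm": 10.0, "m": 1000.0, "in": 25.4, "ft": 304.8, "yd": 914.4}


@dataclass(frozen=True)
class Num:
    value: float
    unit: Optional[str]  # None models unknown units


def mul(a: Num, b: Num) -> Num:
    """Length * length: there is no mm^2, so the result has unknown units."""
    return Num(a.value * b.value, None)


def ascribe(n: Num, unit: str) -> Num:
    """Model of `n: unit`: set the unit, never touch the value."""
    return Num(n.value, unit)


def convert(n: Num, unit: str) -> Num:
    """Model of an explicit conversion: rescale the value into the new unit."""
    if n.unit is None:
        raise ValueError("cannot convert a number with unknown units")
    return Num(n.value * MM_PER[n.unit] / MM_PER[unit], unit)


def use(n: Num) -> Num:
    """Model of 'using' a number, e.g. passing it where a length is needed."""
    if n.unit is None:
        raise ValueError("unknown units: add a type ascription")
    return n


two_inches = Num(2.0, "in")
print(ascribe(two_inches, "mm"))  # value stays 2.0, unit becomes mm
print(convert(two_inches, "mm"))  # value is rescaled to about 50.8, unit mm

product = mul(Num(4.0, "mm"), Num(2.0, "mm"))  # unknown units
print(use(ascribe(product, "mm")))             # fine once the user ascribes a unit
```

Calling use(product) directly would raise, which corresponds to the unknown-units error described above.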

There are also functions in the standard library for explicit conversion between units.

Evaluation

I think the overall system works pretty well. The syntactic burden is pretty low, most of the time it feels like it just works without too much friction, and it is mostly correct, catching many kinds of units-related bugs in programs. It is not perfect: there is a bit of an ergonomic cliff - when you do need to specify the units because of the limits of the system, it feels a bit clunky; it would be nice if the system were more expressive and could track units more often; explaining and teaching the system is a bit more complicated than I would like; and because of some of the fuzziness around defaults, it is still possible to write some units bugs. However, I think it has ended up at a good point in the trade-off between correctness and ergonomics.

The syntactic overhead is kept low by the use of default units (and per-file defaults) and the relatively lightweight syntax (e.g., suffixes for literals, units as types). Default units also help a lot with backwards compatibility. The system is also ergonomic due to implicit conversion of units in most circumstances - programmers rarely have to think about units beyond choosing units for a file or (sometimes) values; the system Just Works. This facilitates easy reuse of KCL components. The system is expressive enough for most practical uses, including nearly all of the standard library, due to having fully specified and partial unit types (with the universal quantification extension, it satisfies all expressiveness requirements of the standard library, but that is not totally trivial to support). The system was also implemented in time for the 1.0 release and can probably be extended without significant breaking changes due to the handling of unknown units (which also provides an 'escape hatch' when the system is not expressive enough).

I think it is worth examining why there is sometimes friction with the system. The context is important: KCL is by design a very low-friction language. There are few type errors, and it is designed so that the right thing is usually either obvious or the only thing which can be done. Units are a pretty novel concept, so users with experience from other languages are used to following rules around types, but not around units (I think this is somewhat analogous to the borrow checker in Rust - it can be frustrating for new programmers because they are used to doing whatever they want around ownership rather than following rules). Unit errors are also rare; most of the time units work without the programmer thinking about them. So, when a unit error does occur it is surprising and therefore frustrating. The manner of errors is also frustrating - KCL will tell you that the units are unknown and that this is an error, but not why they are unknown, why that is a problem, or how to fix it (I believe this could be improved, but it is not trivial). Furthermore, when these errors occur, the code is often fine and to a programmer can seem obviously correct; the problem is that the system is not good enough to know that, rather than there being a bug in the code. The fix (usually adding type ascription or sometimes a units suffix) feels bureaucratic rather than truly improving the quality of the code.

One corner of the system which doesn't feel right is the PI constant. In an ideal world, PI would be a number(_), i.e., a unit-less/ratio number. However, in KCL (especially before the introduction of the units system) the majority of uses of PI were to convert between degrees and radians. If PI is typed as a ratio, then this introduces a silent error: after the conversion a number in radians is treated as a number in degrees or vice versa. This was the major source of false positives in the system. The solution was to make PI always have unknown units. This is a bad solution because it meant making unknown units expressible in the surface syntax and it doesn't fit with the mathematical concept of pi. It also means that any use of PI requires type annotation. This solution did avoid the correctness issues around manual conversion, but it is uncomfortable. I'm not sure what a better solution is, perhaps special handling for PI and TAU, but I don't know exactly how that should work.
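The silent error with a unit-less PI is easiest to see with numbers. The Python sketch below is a toy model with rules I have assumed for illustration (scaling an angle by a unit-less number keeps the angle's unit, anything involving unknown units stays unknown); as_radians is a hypothetical stand-in for what happens when a value is passed to a parameter that expects radians.

```python
# Toy model of manual degree-to-radian conversion with two typings of PI.
import math
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Num:
    value: float
    unit: Optional[str]  # "deg", "rad", "_" (unit-less), or None for unknown


def mul(a: Num, b: Num) -> Num:
    if a.unit is None or b.unit is None:
        unit = None          # unknown units are contagious
    elif b.unit == "_":
        unit = a.unit        # scaling keeps the other operand's unit
    elif a.unit == "_":
        unit = b.unit
    else:
        unit = None          # e.g. deg * deg is not representable
    return Num(a.value * b.value, unit)


def div(a: Num, b: Num) -> Num:
    unit = a.unit if (a.unit is not None and b.unit == "_") else None
    return Num(a.value / b.value, unit)


def as_radians(n: Num) -> float:
    """What a parameter expecting radians would do with its argument."""
    if n.unit == "rad":
        return n.value
    if n.unit == "deg":
        return math.radians(n.value)  # implicit conversion
    raise ValueError(f"expected an angle, got units {n.unit!r}")


angle = Num(90.0, "deg")
half = Num(180.0, "_")

# PI as a ratio: the value is now in radians, but it is still labelled deg.
manual = div(mul(angle, Num(math.pi, "_")), half)
print(manual.unit, as_radians(manual))  # deg, converted a second time (about 0.0274)

# PI with unknown units: the result is unknown until the user ascribes rad.
manual = div(mul(angle, Num(math.pi, None)), half)
print(manual.unit, as_radians(Num(manual.value, "rad")))  # None, then about 1.5708
```

In the first case nothing reports a problem, but the angle ends up roughly 57 times too small. In the second case the user is forced to say what they meant, which is the uncomfortable but correct behaviour described above.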

In terms of future work or changes I would make in retrospect, I think the following areas could be improved:

Support areas and volumes in a limited way. Supporting units which are a power of a single unit seems a fairly straightforward extension (e.g., mm^2, m^3, deg^-1, etc.), much more so than arbitrary combinations of units.
Disallow default angles, i.e., a number can only be a length or unit-less by default, angle units must always be explicit. This would simplify the system for users and its implementation, and have an acceptable cost in syntactic overhead; it may even improve the readability of code. I implemented this as a warning recently, and I hope it can become the way KCL always works in the future.
No implicit conversion of default units, i.e., units are only converted if they are known with certainty. I think this would make the system more reliable; however, it is currently impractical. I think that over time, type annotations (including units) will become more common in KCL (especially if KCL gets static typing), and if that happens to a large enough degree then default units will be rare enough that this idea becomes practical. We don't need to eliminate use of defaults entirely: if we have known units for function parameters (but not values), I think that is enough.
Better error messages and better handling of PI and TAU as described above.
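To give a feel for why the first item (powers of a single unit) seems straightforward, here is a rough Python sketch of one way it could work: store an integer exponent alongside the base unit, and add or subtract exponents on multiplication and division. This is purely hypothetical; nothing like it exists in KCL today, and the names are invented. Mixed bases (e.g., mm * in) are simply left unknown here, although a real design could convert first.

```python
# Hypothetical sketch: units that are an integer power of a single base unit.
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Unit:
    base: str   # e.g. "mm"
    power: int  # 1 for mm, 2 for mm^2, -1 for mm^-1, 0 for a unit-less ratio


def mul_units(a: Optional[Unit], b: Optional[Unit]) -> Optional[Unit]:
    """Unit of a * b, or None (unknown) if it cannot be represented."""
    if a is None or b is None or a.base != b.base:
        return None
    return Unit(a.base, a.power + b.power)


def div_units(a: Optional[Unit], b: Optional[Unit]) -> Optional[Unit]:
    """Unit of a / b, or None (unknown) if it cannot be represented."""
    if a is None or b is None or a.base != b.base:
        return None
    return Unit(a.base, a.power - b.power)


mm = Unit("mm", 1)
area = mul_units(mm, mm)           # Unit(base='mm', power=2)
volume = mul_units(area, mm)       # Unit(base='mm', power=3)
back = div_units(volume, area)     # Unit(base='mm', power=1)
mixed = mul_units(mm, Unit("in", 1))  # None: mixed bases stay unknown
print(area, volume, back, mixed)
```

Arbitrary combinations of units would need a map from base units to exponents rather than a single pair, which is where the extra complexity comes from.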

All in all, I think that the units system in KCL is good. It is a correct and mostly ergonomic solution to the significant problem of mixing components with different units. It also brings other correctness benefits, and has fairly low overhead/cost. The implementation is non-trivial, but not complex by type-system standards.
