How Many Types Is Too Many?


Position: Type annotations are ultimately undesirable when working with sufficiently complex type systems, but there is no better solution.

There is a point at which a compiler that formally verifies your code becomes indistinguishable from an automated theorem prover. Such a compiler can deduce a lot of information from a given program. Take, for example, this function in a hypothetical C-like language:

int array_nth(arr, n) {
  return arr[n];
}

A compiler can deduce all of the following from the above function definition, if it assumes that array_nth never causes an out-of-bounds access (an OCaml sketch of the inference half follows the list):

  • arr is an array of integers, of some size. Let’s represent its type as int[Size], where Size is an unsigned integer.
  • n is an integer (let’s say a signed int32, which is why the declaration below also has to demand n >= 0)
  • n >= 0 and n < Size
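
Real compilers already pull off the first two deductions. OCaml, for one, infers the entire signature of such a function from its body alone; only the bounds constraint is out of its reach. A minimal sketch (real OCaml, with the inferred type shown as a comment):

(* No annotations anywhere; the compiler infers the whole signature. *)
let array_nth arr n = arr.(n)
(* inferred: val array_nth : 'a array -> int -> 'a *)
(* the n >= 0 && n < Size part, however, is only enforced at runtime *)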

So, a full declaration of array_nth might look like:

// Dependent types? *groan*
int array_nth(arr: int[Size], n: int32) :: n >= 0 && n < Size;

This is great and all, but it’s really long. There are a few different ways the typing of array_nth can be expressed in code:

  • The developer must type out the entire signature. The compiler will raise an error if the written signature is inconsistent with the inferred signature.
  • The developer can leave out all written signatures; the compiler will infer them instead.
  • The developer can write out as much, or as little, of the signature as they desire. The compiler will infer the full signature and raise errors if the partial signature is logically inconsistent with the actual one. This is the most common approach, and such a middle-of-the-road solution is likely to piss off as few people as possible (see: OCaml, which is very good at inferring types; a sketch follows this list).
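
Here is what the three options look like for the simple array_nth, in real OCaml (the function names are just illustrative, and the dependent bounds constraint is out of scope for OCaml’s type system):

(* Option 1: spell out the full signature; it is checked against the
   inferred one, so writing int array -> int -> float here would be
   a compile error. *)
let array_nth_full : int array -> int -> int = fun arr n -> arr.(n)

(* Option 2: write nothing; the compiler infers 'a array -> int -> 'a. *)
let array_nth_bare arr n = arr.(n)

(* Option 3: annotate only what you care about and infer the rest. *)
let array_nth_partial (arr : int array) n = arr.(n)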

But none of these solutions is perfect. Ideally, you want to maximize both readability and developer experience, and none of these approaches manages both.

For another example, which isn’t even all that extreme, take a function that must neither cause an out-of-bounds access nor produce an integer overflow:

// Declaration
int array_nth_of_sum(arr: int[Size], x: int32, y: int32) :: (x + y) >= 0 && (x + y) < Size;

// Definition
int array_nth_of_sum(arr, x, y) {
  return arr[x + y];
}
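
Without such a signature, those constraints don’t disappear; they just migrate into the body as runtime checks. Here’s roughly what the dependent version proves away, sketched in OCaml (the error message is mine):

(* Everything the dependent signature would discharge at compile time
   must be re-checked here, on every single call. *)
let array_nth_of_sum (arr : int array) x y =
  let i = x + y in
  if i < 0 || i >= Array.length arr then
    invalid_arg "array_nth_of_sum: index out of bounds"
  else arr.(i)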

None of the options is optimal:

  • Typing out the full signature is a PITA.
  • Omitting the entire signature means you have to do all the type inference magic in your head, which is a PITA for longer, more complex functions.
  • Writing only part of the signature combines the downsides of both of the above approaches.

So what can we do? I really don’t know. I don’t think there is a good solution. To mathematically prove something, you must understand all the assumptions it makes, as well as all the constraints it places on its inputs. Whether it’s the compiler or the human that must ultimately understand them, all of those conditions have to be clearly defined eventually.

After reading this, I highly recommend reading Gilad Bracha’s Pluggable Type Systems (https://bracha.org/pluggableTypesPosition.pdf). It’s 16 years old now, but I think it offers an approach that might be better than what’s mainstream.

What if you could configure which aspects of a program the compiler verifies? Requiring overflow to be statically accounted for, for example, not only makes the compiler’s job harder, but also requires more type information to be passed around. Unless you’re writing software for airplanes, you could argue that there are certain static guarantees you don’t need. You could even incorporate more static checks gradually as your application grew, instead of having to account for everything from day one.
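
OCaml’s type checker isn’t pluggable, but you can approximate the spirit of this at the module level: write the application code once against an interface, then decide separately how much checking it pays for. A toy sketch (the module names and the overflow test are my own invention, and these checks run at runtime rather than statically, so this is an analogy, not Bracha’s actual proposal):

module type ARITH = sig
  val ( + ) : int -> int -> int
end

(* Plain wrapping arithmetic: no guarantees, no overhead. *)
module Unchecked : ARITH = struct
  let ( + ) = Stdlib.( + )
end

(* Opt-in overflow detection: two same-sign operands whose sum
   flips sign must have wrapped around. *)
module Checked : ARITH = struct
  let ( + ) a b =
    let s = Stdlib.( + ) a b in
    if (a > 0 && b > 0 && s < 0) || (a < 0 && b < 0 && s >= 0)
    then failwith "integer overflow"
    else s
end

(* Application code is written once, against the interface... *)
module Make (A : ARITH) = struct
  let array_nth_of_sum (arr : int array) x y = arr.(A.( + ) x y)
end

(* ...and the level of checking is picked at configuration time. *)
module Strict = Make (Checked)
module Fast = Make (Unchecked)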

There are many compilers that apply more rigorous checks to code than mainstream compilers do, yet those compilers, and their associated languages, see very little production use. Advanced type systems are very exciting, but they also require a deep understanding of type theory to be used to their fullest potential. There is a point at which a super-fancy type system becomes more of a burden than simply writing correct code by hand. The problem, though, is that there may not be a better solution.