Houston We Have An Error Part 2

There are a few new languages that provide systematic ways of handling errors, this blog post gives a general overview of these ideas.

Previously: Houston We Have An Error Part 1 discussed the various ways of handling errors.

The Semi-Predicate Problem

The Semipredicate Problem occurs when we have a function that wants to return both a valid value or failure (or status, predicate) value that denotes the function did not succeed.

The example from last week was the printf() function from the C language, which returns an int. The printf(3) manual states that:

These functions return the number of characters printed … or a negative value if an output error occurs.

The problem is that the type ‘int’ is used both for valid values and success status, although in this case positive values denote success and negative denotes failure.

Lets imagine another function int Parse(string); a function which takes some string and returns the integer value of that string. What should happen if the string is not a valid integer?

Unfortunately the function should parse any valid integer in the range support by the int type, and so there is not a spare value to use for “error”.

Function Based Solutions

Some solutions used in practice essentially add extra arguments, return values or raise exceptions:

Type Based Solutions

However lets restate our desire in English again: “Parse takes a string and returns either an integer or an error.” Lets write that a little more programmatically: Parse(string) -> int OR error.

What might really help is a type safe way to represent the type “X or Y” where X and Y are types. Sounds a bit like C or C++ unions?

Turns out there’s actually a few different sorts of Union:

Untagged Union Example

First attempt at a string to integer parser in C++ might read:

union Result
{
    int value;
    const char *error;
};

Result Parse(const char *str);

The problem with this is the caller cannot tell if the return Result stores an integer value or string error message. Typically untagged unions come with some way of telling which is stored.

Tagged Union Example

First style is emedding the union in a structure and adding a selector field:

struct Result
{
    enum { Value, Error } type;
    union
    {
        int value;
        const char *error;
    };
};

Result x = Parse("123");
if (x.type == Result::Value)
    printf("An integer %d!\n", x.value);
else
    printf("Failed to parse: %s\n", x.error);

Or if the types do not intersect by determining from the actual type. Hhere we take a lot of liberties in C++ and assume we’ve made Integer and Error classes, and that we can use dynamic_cast to determine the actual type of a pointer:

union Result
{
    void *ptr;
    Integer *value;
    Error *error;
};

Result x = Parse("123");
if (dynamic_cast<Integer *>(x.ptr))
    printf("An integer %d!\n", x->value);
else
    printf("Failed to parse: %s\n", x->error);

Set Theory Union

Some languages like Ceylon, allow the use of a set-theory like union type constructor:

Int|String Parse(string);

Here the | operator means “either the first type OR the second type”, and represents a notional super-type of both types which can be assigned values from either type and has only the common operations allowed on both types.

Let us imagine that C++ is extended to include this | syntax, and that Int and String only had the common super-type Object; this might only provide equality and type testing operations.

Therefore in some imaginary C++ hybrid language we can now write:

Int|String x = 1;
Int|String y = "hello";

if (x == y)
    printf("Same value\n");
else
    printf("Not same value\n");

That is the variables x and y can only be assigned Int or String values, but they can be compared assuming the super-type Object of both Int and String has an operator== defined to compare Object values.

If we also include the syntax:

Now we could write this in a type safe manner:

Int|String x = Parse("123");
if (x is Int)
    printf("x=%d\n", x as Int);
else
    printf("Error: %s\n", x as String);

To simplify this, lets add a special version of the if syntax: if (x as Type) A else B means “if x can be converted to Type then execute A with x narrowed to Type, otherwise execute B with x narrowed to the alternative type”. Now we can simplify this to:

Int|String x = Parse("123");
if (x as Int)
    printf("x=%d\n", x);
else
    printf("Error: %s\n", x);

Note that on line 3, x has been succesfully narrowed to have type Int and so can be printed as an integer. However if x is not of type Int it must be of type String so on line 5 we can print the error alternative.

Under the hood what actually might happen in the type deduction is a little more complex in that if (x as Type) A else B might work as follows:

Optional Types

Some languages like Ceylon and Swift also include a special syntax and short hand for optional types.

The notation for optional type like Int? is short hand for Int|Null, where Null is a special type which only has the null pointer or reference as a valid value. Almost all other types, such as String do not allow a null pointer or reference to be assigned (the exception is typically the Object, Any or Top type which is the super-class of all types including Null).

Let’s imagine we morph our C++-like language to include this syntax:

Int? Parse(String x);

Here Parse is a function which either parses the string as a number and returns the equivalent integer value, or it returns the null value (the only valid member of Null type).

Now we can write:

Int? x = Parse("123");
if (x is Int)
    printf("x=%d\n", x);
else
    printf("Error could not convert value\n");

Simplified Optional If Statements

Lets also allow if (x is T) to be shortened to just if (x) where x is some optional Type? (some languages might have exists or defined):

Int? x = Parse("123");
if (x)
    printf("x=%d\n", x);
else
    printf("Error could not convert value\n");

We can now also place the assignment into the if statement as per C++:

if (Int? x = Parse("123"))
    printf("x=%d\n", x);
else
    printf("Error could not convert value\n");

Optional Define Operator

Some languages like Groovy, Swift and Ceylon include a way to convert an optional value to a defined value.

In Groovy this is called the “Elvis Operator” and is written ?: (looks like a quiffed hair as modelled by Elvis). Here’s an example:

Int? x = Parse("123");
Int y = x ?: 0;

The expression x ?: y means “if x is not null then use x (typed without the Null) otherwise use y”. Therefore line 2 means “y is assigned the non-optional value of x, or if null the value 0”.

Optional Safe Navigation

Often functions or methods which return an optional value are chained to other methods which operate on those values. Here’s a contrived example which takes the integer 1, adds 1, then multiplies by 4 in a method chain:

Int x = 1.Add(1).Mul(4);

The value of this expression would be the Int value 8.

Can we use the Int? Parse(String) function in this chain in place of the value 1? Not directly, since it returns either an Int or Null.

Lets introduce a new syntax notation x?.Method(...) which means “if x is not null, define it to it’s non optional type and call Method on it, otherwise don’t call Method and evaluate to null”.

In other words:

x?.Method(...) == if (x) x.Method(...) else null
               == if (x is ~Null) (x as ~Null).Method(...) else null"

We can now write:

Int x = Parse("1")?.Add(1)?.Mul(4);

Notice that Parse("1")?.Add(1) has type Int? since it can return both Int from the call to the Add method or may return Null, and therefore we still need to use ?. to optionally call Mul.

Error Type

By analogy to Int? meaning Int|Null, we might introduce a new Error type and syntax for Int|Error. The short hand might be Int!.

I’m not aware of any programming languages that have this notation or syntax exactly.

The Error type would probably have a formatted message, and perhaps a stack trace to discover where the error originated. It would essentially be almost the same as exception types, though it would be passed and returned from functions as an other value. Sub-types might add extra detail where appropriate, for example IOError might describe the input/output stream that caused the error.

We would also extend the short hand forms of if, ?: and ?. to include both Null and Error types:

Int! x = Parse("123");
if (x)
    printf("x=%d\n", x);
else
    printf("%s:%d: Error: %s\n", x.GetFile(), x.GetLine(), x.GetMessage());

Other interesting properties of error might be useful:

comments powered by Disqus