Houston We Have An Error

Handling errors in programs is usually boring and tedious, but for most non-trivial programs it’s pretty important. This post discusses the various types of errors that might occur when writing programs, and how they might be handled. Examples use the C programming language.

Handling Program Errors (Part 1 of 2)

Requirements:

There are many categories of errors:

Styles Of Error Handling

As there are many sorts of errors, there are also many ways to handle them.

Ignore It

The simplest is to just ignore the problem and pretend it didn’t happen; perhaps everything will be okay anyway, or perhaps the problem is small enough to be a non-issue, or perhaps the user will just try again.

This really isn’t a solution unless it’s a trivial program or a trivial issue. Surprisingly, however, we probably use it all the time.

Take this famous trivial C program:

#include <stdio.h>

int main(int argc, char *argv[])
{
    printf("Hello World!\n");
    return 0;
}

Let’s ask ourselves the great engineering question: “What could possibly go wrong?” Let’s see:

Generally we tend to assume that simple output to the console works in our code. We don’t care if it doesn’t; that situation is so rare it’s not worth bothering about.

One problem (apart from unused arguments) with the code above is that we’ve not really been explicit about ignoring the errors from printf(). Somebody looking at our code doesn’t know if we knowingly made the “Ignore It” decision.

Back when I first started learning C in 1988, my good friend Mike Taylor pointed out two things:

So, if we were being explicit, our program should really read:

#include <stdio.h>

int main(int argc, char *argv[])
{
    (void)printf("Hello World!\n");
    return 0;
}

Terminate The Program

Another solution is to terminate the program when a particular error occurs. This is more suitable for very rare, unrecoverable errors or disasters. It also has its uses in situations where proceeding is impossible, doesn’t make sense, or might be dangerous.

Examples:

Back to our trivial example:

#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
    if (printf("Hello World!\n") < 0)
    {
        (void)fprintf(stderr, "Error: could not greet world!\n");
        exit(1);
    }
    return 0;
}

In the if statement we check for an error from printf(), signalled by a negative return value, and if one occurs we handle it as follows:

Terminate The Thread or Actor

Many programming languages support thread- or actor-based models for concurrent execution. Instead of terminating the entire program, we could terminate just the thread and hopefully signal the parent thread, or another actor, to notice and handle the situation.

This is more suitable for resilient systems or services such as web servers, or perhaps applications that need to save users’ work, such as document editors.

Let’s take our simple example and use POSIX threads to print “Hello World!”, retrying every 10 seconds until it succeeds:

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <pthread.h>

void *Okay = (void *)0;
void *Error = (void *)1;

void *print_message_thread(void *arg)
{
    if (printf("Hello World!\n") < 0)
        pthread_exit(Error);
    return Okay;
}

int main(int argc, char *argv[])
{
    void *status = Okay;

    do
    {
        pthread_t thread;
        int rc = pthread_create(&thread, NULL, print_message_thread, NULL);
        if (rc)
        {
            (void)fprintf(stderr, "Error: cannot create thread\n");
            exit(1);
        }

        rc = pthread_join(thread, &status);
        if (rc)
        {
            (void)fprintf(stderr, "Error: cannot join thread\n");
            exit(1);
        }
        if (status == Error)
        {
            (void)fprintf(stderr, "Error: message did not get written\n");
            (void)sleep(10);
        }
    } while (status == Error);

    return 0;
}

This complexity would only really be worthwhile for resilient systems; it is total overkill for our simple program!

Things to note are:

Return The Error To The Calling Function

We have already seen functions that signal errors by returning an error state:

This technique is pretty much the standard way of dealing with errors: defer the problem to somebody else, in this case the caller of the function. The caller will hopefully deal with the problem: ignore it, terminate the program or thread, handle it, pass it on, etc.
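A minimal sketch of this style in C, assuming a hypothetical parse_port() helper built on the standard strtol(). The function returns only a status code (0, or an errno value such as EINVAL or ERANGE, a choice made here purely for illustration) and hands the parsed value back through a pointer, leaving the caller to decide whether to ignore, report or pass on the failure.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical example: the status and the value travel separately.
   Returns 0 on success, EINVAL for malformed text, ERANGE if out of range. */
int parse_port(const char *text, int *port)
{
    char *end;
    long value;

    errno = 0;                        /* strtol() reports overflow via errno */
    value = strtol(text, &end, 10);
    if (end == text || *end != '\0')
        return EINVAL;                /* no digits, or trailing junk */
    if (errno == ERANGE || value < 0 || value > 65535)
        return ERANGE;                /* does not fit in a TCP/UDP port */
    *port = (int)value;               /* only written on success */
    return 0;
}
```

Because *port is only written on success, the caller’s variable keeps its old value after a failure, and the value itself never has to reserve a special number meaning “error”.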

The printf() function highlights a couple of interesting problems:

In the case of printf(), the return value is some negative integer on error. The C library (and Unix-like operating systems) provide more information about the error through a global (or thread-local) integer variable, errno:
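A minimal sketch of reading errno after a failed write, assuming a hypothetical write_greeting() helper that returns 0 on success or the errno value describing the failure. ISO C does not require fprintf() to set errno (POSIX does), so the sketch falls back to the POSIX code EIO, an arbitrary choice, when errno is left at zero.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write a greeting and report *why* it failed.
   Returns 0 on success, otherwise an errno value. */
int write_greeting(FILE *out)
{
    errno = 0;                                /* clear any stale error first */
    if (fprintf(out, "Hello World!\n") < 0)
    {
        int err = errno ? errno : EIO;        /* ISO C does not promise errno here */
        (void)fprintf(stderr, "Error: could not greet world: %s\n", strerror(err));
        return err;
    }
    return 0;
}
```

One catch: stdout is usually buffered, so a failed write to it may not show up until fflush() or fclose() is called.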

A more suitable return value might be an error string or even an Error class that describes the problem; however, we then run into the Semipredicate Problem: printf() wants to return either the number of bytes written (an int) or an error. The designers of the C library chose to use int for simplicity.

Interestingly, this design decision is highlighted even more by the int fgetc(FILE *stream) function. This reads one character from a given input stream and returns it. However, fgetc() also needs to signal two other possibilities: there are no more characters (end of file), or there was an error reading.
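The usual C idiom for telling these three outcomes apart is to store the result in an int (a char cannot hold every byte value plus the distinct EOF value), loop until EOF, and then ask the stream which kind of EOF it was with ferror() or feof(). A minimal sketch, using a hypothetical count_lines() helper:

```c
#include <stdio.h>

/* Hypothetical helper: count newlines in a stream.
   Returns 0 on a clean end-of-file, -1 on a read error. */
int count_lines(FILE *in, long *lines)
{
    int c;              /* int, not char: must hold every byte value AND EOF */
    long n = 0;

    while ((c = fgetc(in)) != EOF)
    {
        if (c == '\n')
            n++;
    }
    /* EOF alone is ambiguous: ask the stream whether it was an error. */
    if (ferror(in))
        return -1;
    *lines = n;
    return 0;
}
```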

The Semipredicate Problem for fgetc() is solved by:

Returning an error value is very simple to use, but the Semipredicate Problem causes all sorts of issues:

The most famous example of error returns is probably the “null” pointer or reference, which Tony Hoare called “[His] billion-dollar mistake”.

Functions which want to signal an error might return “null”, but using this value typically causes “null pointer exceptions”, possibly causing instant program termination or, at best, non-local error handling through exception handlers. Painful, and probably very familiar to the reader:

Document * CreateDocument()
{
    return (Document *)malloc(sizeof(Document));
}

The above function has either forgotten that malloc() returns a null pointer if it cannot allocate memory, or it’s hoping that the caller understands that the pointer can be null and will take appropriate action. Let’s see:

void OpenDocument(const char *path)
{
    Stream *s = Stream::Open(path, "r");
    Document *doc = CreateDocument();
    doc->Title = s->ReadLine();
    ...
}

Oh dear, it looks like the doc->Title assignment dereferences the return value from CreateDocument() without checking for null. This code will crash badly in low memory situations (but that’s so rare, who cares, right?).

The problem with “null” is that, to solve the Semipredicate Problem, we have introduced a value into the set of Document pointers that does not point to a valid Document. Document pointers can therefore be either valid or invalid, but we often forget and treat them as always valid.

What’s even more fun is that even if a function such as CreateDocument() correctly handled the failure internally and never returned null, callers could not tell that from the outside, so they could not safely skip the error checking. Look at the call to Stream::Open(): can we tell whether it returns null (in which case s->ReadLine() dereferences a null pointer), or does it handle all errors somehow and never return null? Oh dear.

Compilers or other static analysis tools might be able to detect these mistakes, but it would be nice if we could explicitly state that “CreateDocument can return a valid Document or a null pointer (or error)”. That way the caller would be forced to handle both situations, or the compiler would refuse to compile the code.

Again, being explicit about error handling seems to be important. Hiding or ignoring it is asking for trouble.
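A minimal C sketch of what explicit handling could look like, assuming a hypothetical Document struct with a fixed-size title and using fopen()/fgets() in place of the Stream class. Every failure is checked, resources are released on every path, and the document comes back through an out-parameter so the return value is free to carry an errno-style status. Treating an empty file as EINVAL is an assumption.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical document type for illustration. */
typedef struct
{
    char title[128];
} Document;

/* May return NULL: callers must check. */
Document *CreateDocument(void)
{
    return calloc(1, sizeof(Document));
}

/* Returns 0 and sets *out on success; on failure returns an errno value
   and leaves *out as NULL. */
int OpenDocument(const char *path, Document **out)
{
    FILE *s;
    Document *doc;
    int err;

    *out = NULL;
    errno = 0;
    s = fopen(path, "r");
    if (s == NULL)
        return errno ? errno : EIO;

    doc = CreateDocument();
    if (doc == NULL)
    {
        (void)fclose(s);                      /* explicitly ignored */
        return ENOMEM;
    }

    if (fgets(doc->title, sizeof doc->title, s) == NULL)
    {
        err = ferror(s) ? EIO : EINVAL;       /* read error vs empty file */
        free(doc);
        (void)fclose(s);
        return err;
    }
    doc->title[strcspn(doc->title, "\n")] = '\0';   /* drop the newline */

    (void)fclose(s);
    *out = doc;
    return 0;
}
```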

On Error Return A Special Object On Which All Actions Do Nothing

The Null Object Pattern works by returning a special return value in error scenarios which produces no useful or harmful action when that value is used.

This has limited use, often because the set of possible actions is too large or complex to give each one a do-nothing implementation.

However, this is easier to achieve in object-oriented languages. You would subclass the valid return class with a special Null variant that overrides all the base methods to do no work. Even so this has restrictions: all the methods need to be overridable, every method needs a sensible “no work” equivalent, the return values for each method may be tricky to construct or define, and there might be a huge number of methods and functionality to replicate.
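In C, without subclassing, a Null Object can be approximated with a struct of function pointers. A minimal sketch, assuming a hypothetical Logger “interface” whose open_logger() factory hands back a shared do-nothing logger instead of NULL when the log file cannot be opened:

```c
#include <stdio.h>

/* A tiny "interface": a struct holding a function pointer. */
typedef struct Logger Logger;
struct Logger
{
    int (*log_message)(Logger *self, const char *message);
    FILE *file;
};

/* The real implementation: append the message to a file. */
static int file_log_message(Logger *self, const char *message)
{
    return fprintf(self->file, "%s\n", message);
}

/* The Null Object implementation: accepts everything, does nothing. */
static int null_log_message(Logger *self, const char *message)
{
    (void)self;
    (void)message;
    return 0;
}

static Logger null_logger = { null_log_message, NULL };

/* Hypothetical factory: never returns NULL. On failure it returns the
   shared null logger, so callers can use the result without checking. */
Logger *open_logger(const char *path, Logger *storage)
{
    storage->file = fopen(path, "a");
    if (storage->file == NULL)
        return &null_logger;
    storage->log_message = file_log_message;
    return storage;
}
```

The caller never has to check for NULL, at the cost of never finding out that logging silently stopped working.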

This case is useful on occasion and is worth mentioning.

Store The Error Globally And Use Auxiliary Functions To Check For Errors

The printf() and fgetc() functions mentioned previously do return a value signalling that an error has occurred, but they also use global data and auxiliary functions to describe the error.

The fread() and fwrite() functions from the C standard library go a step further and never return errors directly; they return the number of objects transferred:

If an error occurs, or the end-of-file is reached, the return value is a short object count (or zero).

These handle errors and side-step the Semipredicate Problem by:

Unfortunately this solution is quite error prone: it is very easy to forget to call the error checking functions afterwards. Neither a human reader nor a compiler has any way of knowing that function XXXX might cause an error, or that function YYYY should be called to check whether one has occurred; both would have to learn this through some other means.
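A minimal sketch of the pattern, assuming a hypothetical copy_stream() helper. The short count from fread() only says “stop”; ferror() is the auxiliary call that says whether stopping meant end-of-file or failure. Forgetting that one ferror() check would make a read error look exactly like a successfully copied (but truncated) file.

```c
#include <stdio.h>

/* Hypothetical helper: copy all bytes from in to out.
   Returns 0 on success, -1 if either stream reported an error. */
int copy_stream(FILE *in, FILE *out)
{
    char buffer[4096];
    size_t n;

    /* A short count from fread() means end-of-file OR an error... */
    while ((n = fread(buffer, 1, sizeof buffer, in)) > 0)
    {
        if (fwrite(buffer, 1, n, out) != n)
            return -1;                /* a short write is always a failure */
    }
    /* ...so the auxiliary function decides which one it was. */
    if (ferror(in))
        return -1;
    return fflush(out) == 0 ? 0 : -1; /* buffered data can still fail here */
}
```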

Raise A Signal / Call An Error Handler Function

If a function detects an error, instead of terminating the program or thread or returning some error value it could instead call another function that will handle the error.

Essentially we could delegate the problem to another function (or class or error manager module) that knows what to do; it would of course have to choose the right technique to resolve the error.

For example, the error handler could write an error message, log the crash, and then terminate the program. If the handler detected a resource problem it might be able to resolve it, for example by freeing memory or disk space. For a transient error it might be able to wait or retry the operation.

The error handler could itself be parameterized, modularized and controllable. For example, the application could install special error handlers to save the user’s work before termination.
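A minimal sketch of an installable error handler in C, similar in spirit to the C++ standard library’s std::set_new_handler(). All the names are hypothetical: library code calls report_error() and leaves the policy to whichever handler the application installed, here an example counting_error_handler().

```c
#include <stdio.h>

/* Signature for an installable error handler. */
typedef void (*error_handler)(const char *message);

static void default_error_handler(const char *message)
{
    (void)fprintf(stderr, "Error: %s\n", message);
}

static error_handler current_handler = default_error_handler;

/* Install a new handler, returning the previous one so it can be restored.
   Passing NULL reinstates the default. */
error_handler set_error_handler(error_handler handler)
{
    error_handler previous = current_handler;
    current_handler = handler ? handler : default_error_handler;
    return previous;
}

/* Library code calls this instead of deciding the policy itself. */
void report_error(const char *message)
{
    current_handler(message);
}

/* Example application handler: record the failure instead of printing. */
static int errors_seen = 0;

static void counting_error_handler(const char *message)
{
    (void)message;
    errors_seen++;
}
```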

Some error handlers are triggered by asynchronous events: on Unix and similar operating systems a program can be interrupted or aborted externally (killed, interrupted with Control-C on the keyboard, or “Force Quit” from the menu). These signals are delivered as asynchronous function calls, which makes them trickier to handle because they can happen at any point in the code.
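A minimal sketch using the ISO C signal() function, assuming the program only wants to notice Control-C (SIGINT) and shut down at a convenient point. Because the handler may run between any two instructions, it does the only thing portable C guarantees is safe, setting a volatile sig_atomic_t flag, and the main program checks that flag. The names are hypothetical, and on POSIX systems sigaction() is the more robust choice, since signal()’s semantics vary between platforms.

```c
#include <signal.h>

/* Portable ISO C only lets a handler write a volatile sig_atomic_t. */
static volatile sig_atomic_t interrupted = 0;

static void on_interrupt(int signo)
{
    (void)signo;
    interrupted = 1;                 /* just record it; act on it later */
}

/* Returns 0 if the handler was installed, -1 otherwise. */
int install_interrupt_handler(void)
{
    return signal(SIGINT, on_interrupt) == SIG_ERR ? -1 : 0;
}
```

A main loop would then run something like while (!interrupted) { ... } and save the user’s work before exiting.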

Throw An Exception / Non-Local Goto

A very popular, but somewhat controversial, method to handle errors is to “Throw An Exception”. This is essentially a non-local jump up zero or more function calls to a higher level piece of code that will handle the error.

This makes sense: the caller should handle the problem, but often the immediate caller cannot do anything useful and should just pass the error on to a function higher up that can.
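C has no exceptions, but the standard setjmp()/longjmp() pair provides exactly this kind of non-local goto. A minimal sketch with hypothetical names: sum_digits() plays the role of a try/catch block, fail() plays the role of throw, and the intermediate sum_digits_unchecked() contains no visible error handling at all.

```c
#include <setjmp.h>
#include <stdlib.h>

/* The innermost active "try" block; NULL when none is active. */
static jmp_buf *error_target = NULL;

/* "throw": jump straight back to the nearest sum_digits() call. */
static void fail(void)
{
    if (error_target == NULL)
        abort();                     /* nobody is catching: terminate */
    longjmp(*error_target, 1);
}

static int parse_digit(char c)
{
    if (c < '0' || c > '9')
        fail();                      /* does not return */
    return c - '0';
}

/* No error handling here, just like code that lets exceptions pass. */
static int sum_digits_unchecked(const char *s)
{
    int total = 0;
    for (; *s != '\0'; ++s)
        total += parse_digit(*s);
    return total;
}

/* "try/catch": returns 0 and sets *result, or -1 if a digit was invalid. */
int sum_digits(const char *s, int *result)
{
    jmp_buf target;
    jmp_buf *saved = error_target;   /* not modified after setjmp(), so safe */

    error_target = &target;
    if (setjmp(target) != 0)
    {
        error_target = saved;        /* the "catch" block */
        return -1;
    }
    *result = sum_digits_unchecked(s);
    error_target = saved;
    return 0;
}
```

Note what longjmp() does not do: it skips the rest of sum_digits_unchecked() and parse_digit() entirely, so any memory or files they had acquired would simply leak.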

However Google “Exception” “Bad” “Harmful” and you can find opinions such as Exception Handling Considered Harmful, Exceptions Considered Harmful or Why Is Exception Handling Bad.

The problem is that exceptions jump through code, unwind the function call stack and arrive somewhere higher up the call chain, usually without the intermediate functions knowing. This has lots of technical and philosophical issues:

One big problem is when you read other people’s code using exceptions, there is no explicit mention that errors are being handled or that exceptions are passing through the code. You therefore cannot immediately tell if errors are being correctly handled. This leads to all sorts of invalid assumptions, missed error handling, bugs, etc. This can also happen when you are writing or rewriting code that uses exceptions.

Look at this code:

void WriteDocument(const Document &doc, Stream &output)
{
    WriteDocumentHeader(doc, output);
    for (auto &section : doc)
        section.Write(output);
    WriteDocumentFooter(doc, output);
}

Can you see the error handling? No. Sure, it’s nice and clean and simple to read, but there does not seem to be any error handling at all. Let’s assume the author is diligent and check whether each of these functions handles errors. Starting with the first call:

void WriteDocumentHeader(const Document &doc, Stream &output)
{
    output.Write("<Document>\n");
}

Hmmm, let’s hope that Stream::Write() handles errors. Best look in there:

void Stream::Write(const char *str)
{
    if (fputs(str, m_stream) < 0)
        throw IOException(m_stream, strerror(errno));
}

Ah! Yes, Stream::Write() seems to be handling the low-level error by raising an exception. Let’s check the other 32 functions just to make sure WriteDocument() uses exceptions correctly. Actually, my boss just asked me when I’ll be done; let’s hope the author was diligent. Hmmm, okay, I didn’t remember all the exceptions that were raised, so let’s just be cautious, catch everything and hope for the best:

bool SaveDocument(const Document &doc, const char *path)
{
    try
    {
        Stream *output = Stream::Open(path, "w");
        WriteDocument(doc, *output);
        output->Close();
    }
    catch (...)
    {
        return false;
    }
    return true;
}

Oh dear, so many problems: it assumes that Stream::Open() succeeds or throws an exception (and does not return null); if WriteDocument() fails, will the open stream be deallocated? Is path a valid non-null pointer? We catch everything but then just return false, so how will the user know what went wrong? I’m sure there are more problems here, but this is a pretty typical result of a coding session.
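For contrast, a minimal C sketch of the same operation with every failure handled explicitly, assuming a hypothetical Document struct with only a title field and using fopen()/fprintf()/fclose() in place of the Stream class. The arguments are validated, the file is closed on every path, fclose() is checked because it flushes buffered data, and the errno-style return value lets the caller tell the user what went wrong.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical document type for illustration. */
typedef struct
{
    const char *title;
} Document;

/* Returns 0 on success or an errno value describing the failure.
   Every exit path after fopen() closes the stream. */
int SaveDocument(const Document *doc, const char *path)
{
    FILE *output;
    int err = 0;

    if (doc == NULL || doc->title == NULL || path == NULL)
        return EINVAL;

    errno = 0;
    output = fopen(path, "w");
    if (output == NULL)
        return errno ? errno : EIO;

    errno = 0;
    if (fprintf(output, "<Document>\n<Title>%s</Title>\n</Document>\n",
                doc->title) < 0)
        err = errno ? errno : EIO;

    /* fclose() flushes buffered data, so it can fail too; keep the
       first error if one was already recorded. */
    errno = 0;
    if (fclose(output) != 0 && err == 0)
        err = errno ? errno : EIO;

    return err;
}
```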

Done well, exceptions seem like a good solution. Unfortunately, there be many dragons, death and destruction, and only valiant and very diligent software knights will succeed. I have yet to meet one (or become one), however. Worse still, we all work with people (and are ourselves people) who are not always diligent.

Solving The Error And Semipredicate Problems!?

Oh dear. Surely there has to be a better way? What we really need is an explicit, easy to understand, easy to write way to deal with errors. Next week we’ll look at some interesting new programming languages and constructs that might help us solve this problem, and then move on to something more interesting than error handling.

Read on: Houston We Have An Error Part 2
