Wednesday, March 28, 2007

One of the hardest bugs to catch in C: Integer overflow

Here's some simple C code implements an array that can be accessed with some functions. (There are some casts I left out for clarity. They should be (unsigned char *).)







static void **array, **end;


int allocate (unsigned int size) ...


int deallocate () ...



int write (unsigned int index, void *data)

{

if (array + index < end) {

array[index] = data;

return 1;

} else {

return 0;

}

}



It does what it should: make a bounds check before writing to memory, and returning an error when the index is out of bounds.

Here's the bug: what the index + the beginning of the array overflows? The sum will overflow, resulting in a modulo equivalent value that's before the beginning of the array.

This type of bug is called an integer overflow. When writing code, sometimes the most legible order for expressions isn't the safest. The best way to prevent bugs like this is to write it out as it is above, then ask yourself what you know about the values of those numbers, and how you can insure the result will be within a certain range. Since end will always be greater or equal to array, using simple algebra, you can reorder the inequality to read (index < end - array). It might not be as easy to read, but since the end of the array is always after the beginning, i.e. greater, this code, which algebraically equivalent, won't have the same vulnerability.

For those who say this example is contrived, I have encountered real world code that requires logic like this. True, this example could just store size, but in some cases, perhaps where the elements in the array are of a dynamic size, the size can vary, so the end of the array is needed.

Remember those times in math when kids asked when they'd need to know about algebraic inequalities and modulo arithmetic? Yes, programming doesn't require lots of math, but it requires some. On one hand, writing C requires enough knowledge in computer architecture to know about integer overflows, and how they're not only good, they're needed, while on the other, it requires the mathematical background to understand and fix these problems.

No comments: