Monday, April 12, 2010

C Is Not Assembly

In my last article I dug down into some of the joys of undefined pointer behavior in C. But I did it in the context of an architecture that most developers aren't too likely to ever see again, the Intel 8086. I wanted to show that this stuff matters even with more mainstream architectures because compilers are free to do a lot of non-obvious things. C is not assembly.

The United States Computer Emergency Readiness Team (US-CERT) "is charged with providing response support and defense against cyber attacks for the Federal Civil Executive Branch (.gov) and information sharing and collaboration with state and local government, industry and international partners."

With a U.S. Department of Homeland Security badge on their page, you know they're serious. When you overflow your buffers, the terrorists win. So you'd think they'd take C seriously.

I found a real WTF gem caused by programmers treating C like assembly: Vulnerability Note VU#162289.

"In the C language, given the following types:

        char *buf;
        int len;

some C compilers will assume that buf+len >= buf. As a result, code that performs wrapping checks similar to the following:

len = 1<<30;
[...]
if(buf+len < buf)  /* wrap check */
  [...overflow occurred...]

are optimized out by these compilers; no object code to perform the check will appear in the resulting executable program. In the case where the wrap test expression is optimized out, a subsequent manipulation of len could cause an overflow. As a result, applications that perform such checks may be vulnerable to buffer overflows."

The advisory is careful to admit that compilers are free to do just that. Here's why: greatly simplified, the C standard says a pointer must point at a valid object or just past the end. Any pointer arithmetic that might cause a pointer to step outside those bounds yields undefined behavior. Now, there's some code missing from their snippet, but I have to assume that the compiler knows that buf points at the beginning of something and not into the middle. So by definition either buf + len >= buf or the program is free to do anything up to and including launching shoulder mounted kitten missiles at Capitol Hill.
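
To make that concrete, here's a minimal sketch (mine, not US-CERT's) of the pattern as the compiler sees it:

    /* If buf points at the start of an object then, per the standard,
       either buf + len stays within bounds (or one past the end), in
       which case buf + len >= buf, or the addition itself is undefined
       behavior. Either way a conforming compiler may treat the test as
       always false. */
    int would_wrap(char *buf, int len)
    {
        return buf + len < buf;  /* may be folded to a constant 0 */
    }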

Still, there is a WTF to lay on the compiler writer here. If a programmer writes an "if" test, then presumably he or she had some reason to believe the test might sometimes be true. Before optimizing away the conditional, the compiler really should have issued a warning.

In order for this code to have any hope of working, a few assumptions must hold: sizeof(int) <= sizeof(char *), overflow must happen "silently", etc. But there's another major assumption here: the buffer pointed to by buf must be located at the end of its address space. With a check like this, if there are any objects located higher in the same address space then those objects are getting some kitten missiles. So another WTF is a developer making assumptions about how a compiler works in the face of undefined code.
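
Here's a sketch of that failure mode; the names and sizes are hypothetical:

    void sketch(void)
    {
        char buf[64];
        char neighbor[64];   /* may end up adjacent to buf in memory */
        int len = 100;

        /* Even if the compiler emits this check, nothing wraps:
           buf + 100 merely points into neighbor's territory, so the
           test is false while a later write of len bytes would trample
           neighbor. (The out-of-bounds arithmetic is itself undefined
           behavior, which is the whole problem.) */
        if (buf + len < buf) {
            /* "overflow" branch that never runs */
        }

        (void)neighbor;      /* silence unused-variable warnings */
    }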

Now, there are a few scenarios where all these assumptions might be justified. For instance, if you're targeting some special purpose embedded device then memory layout might be well understood. In such situations, the optimization performed by the compiler might be shocking indeed, even if technically permissible.

The problem is that the developer is thinking at the assembly code level but the C standard says the compiler doesn't have to "think" the same way. In assembly the distance between what you write and the object code generated is pretty small (barring some kind of complicated macro magic). In C the distance is quite a bit larger.

Repeat after me: C is not assembly.

The Solution?

In my mind the real WTF is US-CERT's proposed solution.

"Cast objects of type char* to uintptr_t before comparison. The faulty wrapping check listed above would be written

#include <stdint.h>
[...]
if((uintptr_t)buf+len < (uintptr_t)buf)
   [...]

Alternatively, developers can use size_t on platforms that do not provide the uintptr_t type."

Yes, that's right, the proposed solution is to replace one set of undefined behavior with another. Well...WTF? Didn't we get here because the compiler was free to do anything in the face of undefined semantics?

C is not assembly. Not even when you do some casting. Pointers might be segmented in such a way that overflow behavior for pointers is totally different from that of integral types. Or sizeof(int) > sizeof(uintptr_t). Or the cast to an integral type could do anything at all, like invert the bits. Or shoulder mounted kitten missiles. Whatever.

The Real Solution

The best solution is "don't do that." Give the buffer a size and compare len to that size. The assumption that the buffer is the last object in its address space is just playing with fire. But I'll assume they really did need to use memory at the end of the address space. Fine. What's the solution?
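
Something like this sketch, with hypothetical names, is all it takes:

    #include <stddef.h>
    #include <string.h>

    /* Carry the buffer's size alongside the pointer and range-check
       len before touching memory. Unsigned size_t arithmetic is fully
       defined, and no pointer ever steps out of bounds. Assumes the
       caller maintains used <= buf_size. */
    int append(char *buf, size_t buf_size, size_t used,
               const char *src, size_t len)
    {
        if (len > buf_size - used)
            return -1;            /* reject instead of overflowing */
        memcpy(buf + used, src, len);
        return 0;
    }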

It's this: you can't say "end of address space" portably in C, so tread carefully when you use undefined behavior. Minimize the amount of undefined behavior you depend on. Pull the non-portable stuff out into separate compilation units and document the assumptions. Maybe even write the non-portable stuff in assembly where you know what's going to happen, but at least write a unit test that ensures the compiler behaves the way you think it does. And always, always remember that C is not assembly.
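
For the unit test, something along these lines (a sketch; it can only probe behavior you can actually observe, and it cannot reliably exercise genuinely undefined operations):

    #include <assert.h>
    #include <stdint.h>

    /* Lives next to the non-portable code; every assertion here is an
       assumption the surrounding code documents. */
    void check_platform_assumptions(void)
    {
        assert(sizeof(uintptr_t) >= sizeof(char *)); /* can hold a pointer */
        assert((uintptr_t)-1 + 1 == 0);              /* unsigned wraparound */
    }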

11 comments:

Anson MacKeracher said...

A wonderful write-up! Thanks!

Unknown said...

It'd be useful to know which C compilers make assumptions like (buf + len >= buf).

Anonymous said...

@agit8d: Read the Vulnerability Note. It's fairly short, and it has all the info you're asking for.

Michael Stone said...

CERT's INT30-C guidance explains why casting to unsigned integral types helps: namely, that addition of unsigned integral types is defined for all inputs by the (C99) standard. uintptr_t is the logical type to use for the comparison because it is, by definition, appropriately sized for pointers. Finally, for another excellent writeup of how to deal with integer overflows, see "Catching Integer Overflows in C".

Anonymous said...

The if statements are removed because they might come from inline functions, or from macros; for example it may originally have been

if (buf + len >= buf + start)

where start is a parameter that is zero in this particular inlined call. It'd be very hard for the compiler to decide where to stop optimizing.

James Iry said...

Michael,

You're making an implicit assumption that the overflow behavior of integral types is the same as that of pointers. But see my last article for an architecture where that is definitely not true.

7vies said...

Pointer math is evil... I would just check that len is valid, i.e. within [0,size).

BCSd said...

Regarding the assumption about the buffer being at the end of the address space: that assumption is only needed if you assume that the original code comment is wrong. That is, if the intent of the code really is to check for a wrap/numeric overflow (not that I can think of a good reason to do that), then the location of the buffer is irrelevant.

Patrick said...

You're not a compiler writer, are you? :-)

If compilers were to assume every construct they see was put there intentionally, and warn when it was removed, you'd see thousands of useless warnings for every compile.

The if statement in this case could easily have come from preprocessing, or inlining, or a prior optimization. Trust me, you do not want your optimizing compiler to hold off on these kinds of transformations on your behalf in the quest for safety.

@agit8d: it would not be useful to know which C compilers make such assumptions, because the list would inevitably grow as optimizers get more and more powerful.

Josh Carter said...

You had me at "shoulder mounted kitten missile."

Jason M said...

An even simpler "C is Not Assembly" bug:

http://isc.sans.org/diary.html?storyid=6820