When I (George) started to write CIL I thought it was going to take two weeks. Exactly a year has passed since then and I am still fixing bugs in it. This gross underestimate was due to the fact that I thought parsing and making sense of C is simple. You probably think the same. What I did not expect was how many dark corners this language has, especially if you want to parse real-world programs such as those written for GCC or if you are more ambitious and you want to parse the Linux or Windows NT sources (both of these were written without any respect for the standard and with the expectation that compilers will be changed to accommodate the program).
The following examples were either encountered in real programs or taken from the ISO C99 standard or from GCC's test cases. My first reaction when I saw these was: "Is this C?" The second was: "What the hell does it mean?"
If you are contemplating doing program analysis for C on abstract-syntax trees then your analysis ought to be able to handle these things. Or, you can use CIL and let CIL translate them into clean C code.
int x; return x == (1 && x);
See the CIL output for this code fragment
return ((1 - sizeof(int)) >> 32);
See the CIL output for this code fragment
int x = 5;
int f() {
  int x = 3;
  {
    extern int x;
    return x;
  }
}
See the CIL output for this code fragment
int (*pf)(void);
int f(void) {
  pf = &f;               // This looks ok
  pf = ***f;             // Dereference a function?
  pf();                  // Invoke a function pointer?
  (****pf)();            // Looks strange but Ok
  (***************f)();  // Also Ok
}
See the CIL output for this code fragment
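The function-pointer fragment above is legal because, in standard C, a function designator converts to a pointer to the function in almost every expression, so each `*` yields a designator that immediately converts back to the same pointer. The following is a minimal sketch of that rule; the names `forty_two` and `call_all_ways` are hypothetical and exist only for illustration.

```c
#include <assert.h>

/* Hypothetical helper: any function will do. */
int forty_two(void) { return 42; }

/* Calls forty_two through several equivalent spellings and sums the results. */
int call_all_ways(void) {
    int (*pf)(void);
    int sum = 0;

    pf = &forty_two;             /* explicit address-of */
    assert(pf == forty_two);     /* the bare name converts to the same pointer */
    assert(pf == ***forty_two);  /* each '*' gives a designator that converts back */

    sum += pf();                         /* call through the pointer directly */
    sum += (****pf)();                   /* extra dereferences change nothing */
    sum += (***************forty_two)(); /* same for the function name itself */
    return sum;
}
```

All three calls reach the same function, so `call_all_ways()` returns three times the value of `forty_two()`.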
struct {
  int x;
  struct { int y, z; } nested;
} i = { .nested.y = 5, 6, .x = 1, 2 };
See the CIL output for this code fragment
typedef struct {
  char *key;
  char *value;
} T1;

typedef struct {
  long type;
  char *value;
} T3;

T1 a[] = {
  { "", ((char *)&((T3) {1, (char *) 1})) }
};

int main() {
  T3 *pt3 = (T3*)a[0].value;
  return pt3->value;
}
See the CIL output for this code fragment
return ((int []){1,2,3,4})[1];
See the CIL output for this code fragment
int foo() {
  static bar();
  static (*pbar)() = bar;
}

static bar() {
  return 1;
}

static (*pbar)() = 0;
See the CIL output for this code fragment
unsigned long foo() {
  return (unsigned long) - 1 / 8;
}
See the CIL output for this code fragment
The correct interpretation is ((unsigned long) - 1) / 8, which is a relatively large number, as opposed to (unsigned long) (- 1 / 8), which is 0.
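The two readings can be compared directly. The sketch below is only an illustration; the function names are hypothetical. It relies on two facts of standard C: a cast binds more tightly than division, and converting -1 to `unsigned long` yields `ULONG_MAX`.

```c
#include <assert.h>
#include <limits.h>

/* The expression as written: the cast applies to -1, then the division happens. */
unsigned long cast_then_divide(void) {
    return (unsigned long) - 1 / 8;
}

/* The tempting but wrong reading: divide first, then cast the result. */
unsigned long divide_then_cast(void) {
    return (unsigned long) (- 1 / 8);
}

/* Hypothetical self-check that confirms both interpretations. */
void check_cast_precedence(void) {
    assert(cast_then_divide() == ULONG_MAX / 8);  /* a large number */
    assert(divide_then_cast() == 0UL);            /* -1 / 8 truncates to 0 */
}
```

Calling `check_cast_precedence()` from a test harness passes on any conforming C99 compiler, whatever the width of `unsigned long`.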
int x, y, z; return &(x ? y : z) - & (x++, x);
See the CIL output for this code fragment
extern int f(); return f() ? : -1; // Returns the result of f unless it is 0
See the CIL output for this code fragment
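The `? :` form with the middle operand omitted is a GCC extension: the first operand is evaluated once and, if it is nonzero, becomes the value of the whole expression. A rough standard-C equivalent stores the result in a temporary. The sketch below assumes a hypothetical stand-in `f_stub` for `f` that counts its calls; it is not CIL's actual output.

```c
#include <assert.h>

/* Counts how many times the stand-in function has been called. */
static int calls = 0;

/* Hypothetical stand-in for f(): returns its argument and records the call. */
int f_stub(int value) {
    calls++;
    return value;
}

/* Standard-C spelling of "f() ? : -1": evaluate once, reuse the result. */
int result_or_default(int value) {
    int tmp = f_stub(value);
    return tmp ? tmp : -1;
}

/* Hypothetical self-check: one call to f_stub per evaluation. */
void check_single_evaluation(void) {
    int before = calls;
    assert(result_or_default(7) == 7);
    assert(result_or_default(0) == -1);
    assert(calls == before + 2);
}
```

Writing `f() ? f() : -1` instead would call the function twice, which matters whenever the call has side effects.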
static void *jtab[2]; // A jump table
static int doit(int x){
  static int jtab_init = 0;
  if(!jtab_init) { // Initialize the jump table
    jtab[0] = &&lbl1;
    jtab[1] = &&lbl2;
    jtab_init = 1;
  }
  goto *jtab[x]; // Jump through the table
lbl1:
  return 0;
lbl2:
  return 1;
}

int main(void){
  if (doit(0) != 0) exit(1);
  if (doit(1) != 1) exit(1);
  exit(0);
}
See the CIL output for this code fragment
return ({goto L; 0;}) && ({L: 5;});
See the CIL output for this code fragment
extern inline foo(void) { return 1; }
int firstuse(void) { return foo(); }

// A second, incompatible definition of foo
int foo(void) { return 2; }

int main() {
  return foo() + firstuse();
}
See the CIL output for this code fragment
The answer depends on whether optimizations are turned on. If they are, the answer is 3: the first definition is inlined at every call that occurs before the second definition. If optimizations are off, the first definition is ignored (treated like a prototype) and the answer is 4.
CIL will misbehave on this example if optimizations are turned off: the CIL output always returns 3.
union u {
  int i;
  struct s { int i1, i2; } s;
};

union u x = (union u)6;

int main() {
  struct s y = {1, 2};
  union u z = (union u)y;
}
See the CIL output for this code fragment
int __attribute__ ((__mode__ ( __QI__ ))) i8;
int __attribute__ ((__mode__ ( __HI__ ))) i16;
int __attribute__ ((__mode__ ( __SI__ ))) i32;
int __attribute__ ((__mode__ ( __DI__ ))) i64;
See the CIL output for this code fragment
static int bar(int x, char y) {
  return x + y;
}

//foo is considered another name for bar.
int foo(int x, char y) __attribute__((alias("bar")));
See the CIL output for this code fragment
This compiler has few extensions, so there is not much to say here.
return -3 >> (8 * sizeof(int));
struct {
  int x;
  struct {
    int y, z;
    struct {
      int u, v;
    };
  };
} a;
return a.x + a.y + a.z + a.u + a.v;
See the CIL output for this code fragment