We are developing a suite of modules that use CIL for program analyses and transformations that we have found useful. You can use these modules directly on your code, or generally as inspiration for writing similar modules. A particularly big and complex application written on top of CIL is CCured (../ccured/index.html).
The Cil.stmt datatype includes fields for intraprocedural control-flow information: the predecessor and successor statements of the current statement. This information is not computed by default. If you want to use the control-flow graph, or any of the extensions in this section that require it, you have to explicitly ask CIL to compute the CFG using one of these two methods:
The best way to compute the CFG is with the CFG module. Just invoke Cfg.computeFileCFG on your file. The Cfg API describes the rest of actions you can take with this module, including computing the CFG for one function at a time, or printing the CFG in dot form.
CIL can reduce high-level C control-flow constructs like switch and continue to lower-level gotos. This completely eliminates some possible classes of statements from the program and may make the result easier to analyze (e.g., it simplifies data-flow analysis).
You can invoke this transformation on the command line with --domakeCFG or programatically with Cil.prepareCFG. After calling Cil.prepareCFG, you can use Cil.computeCFGInfo to compute the CFG information and find the successor and predecessor of each statement.
For a concrete example, you can see how cilly --domakeCFG transforms the following code (note the fall-through in case 1):
int foo (int predicate) { int x = 0; switch (predicate) { case 0: return 111; case 1: x = x + 1; case 2: return (x+3); case 3: break; default: return 222; } return 333; }
See the CIL output for this code fragment
The Dataflow module (click for the ocamldoc) contains a parameterized framework for forward and backward data flow analyses. You provide the transfer functions and this module does the analysis. You must compute control-flow information (Section 8.1) before invoking the Dataflow module.
The file ext/inliner.ml contains a function inliner.
The module Dominators contains the computation of immediate dominators. It uses the Dataflow module.
The module ptranal.ml contains two interprocedural points-to analyses for CIL: Olf and Golf. Olf is the default. (Switching from olf.ml to golf.ml requires a change in Ptranal and a recompiling cilly.)
The analyses have the following characteristics:
The analysis itself is factored into two components: Ptranal, which walks over the CIL file and generates constraints, and Olf or Golf, which solve the constraints. The analysis is invoked with the function Ptranal.analyze_file: Cil.file -> unit. This function builds the points-to graph for the CIL file and stores it internally. There is currently no facility for clearing internal state, so Ptranal.analyze_file should only be called once.
The constructed points-to graph supports several kinds of queries, including alias queries (may two expressions be aliased?) and points-to queries (to what set of locations may an expression point?).
The main interface with the alias analysis is as follows:
The precision of the analysis can be customized by changing the values of several flags:
In practice, the best precision/efficiency tradeoff is achieved by setting
Ptranal.no_sub = false Ptranal.analyze_mono = true Ptranal.smart_aliases = false
These are the default values of the flags.
There are also a few flags that can be used to inspect or serialize the results of the analysis.
If Ptranal.analyze_mono and Ptranal.no_sub are both true, this output is sufficient to reconstruct the points-to graph. One nice feature is that there is a pretty printer for recursive types, so the print routine does not loop.
The module heapify.ml contains a transformation similar to the one described in “StackGuard: Automatic Adaptive Detection and Prevention of Buffer-Overflow Attacks”, Proceedings of the 7th USENIX Security Conference. In essence it modifies the program to maintain a separate stack for return addresses. Even if a buffer overrun attack occurs the actual correct return address will be taken from the special stack.
Although it does work, this CIL module is provided mainly as an example of how to perform a simple source-to-source program analysis and transformation. As an optimization only functions that contain a dangerous local array make use of the special return address stack.
For a concrete example, you can see how cilly --dostackGuard transforms the following dangerous code:
int dangerous() { char array[10]; scanf("%s",array); // possible buffer overrun! } int main () { return dangerous(); }
See the CIL output for this code fragment
The module heapify.ml also contains a transformation that moves all dangerous local arrays to the heap. This also prevents a number of buffer overruns.
For a concrete example, you can see how cilly --doheapify transforms the following dangerous code:
int dangerous() { char array[10]; scanf("%s",array); // possible buffer overrun! } int main () { return dangerous(); }
See the CIL output for this code fragment
The module oneret.ml contains a transformation the ensures that all function bodies have at most one return statement. This simplifies a number of analyses by providing a canonical exit-point.
For a concrete example, you can see how cilly --dooneRet transforms the following code:
int foo (int predicate) { if (predicate <= 0) { return 1; } else { if (predicate > 5) return 2; return 3; } }
See the CIL output for this code fragment
The partial.ml module provides a simple interprocedural partial evaluation and constant folding data-flow analysis and transformation. This transformation always requires the --domakeCFG option. It performs:
Several commandline options control the behavior of the feature.
For a concrete example, you can see how cilly --domakeCFG --dopartial transforms the following code (note the eliminated if-branch and the partial optimization of foo):
int foo(int x, int y) { int unknown; if (unknown) return y + 2; return x + 3; } int bar(void) { return -1; } int main(void) { int a, b, c; a = foo(5, 7) + foo(6, 7) + bar(); b = 4; c = b * b; if (b > c) return b - c; else return b + c; }
See the CIL output for this code fragment
The reachingdefs.ml module uses the dataflow framework and CFG information to calculate the definitions that reach each statement. After computing the CFG (Section 8.1) and calling computeRDs on a function declaration, ReachingDef.stmtStartData will contain a mapping from statement IDs to data about which definitions reach each statement. In particular, it is a mapping from statement IDs to a triple the first two members of which are used internally. The third member is a mapping from variable IDs to Sets of integer options. If the set contains Some(i), then the definition of that variable with ID i reaches that statement. If the set contains None, then there is a path to that statement on which there is no definition of that variable. Also, if the variable ID is unmapped at a statement, then no definition of that variable reaches that statement.
To summarize, reachingdefs.ml has the following interface:
The availexps.ml module uses the dataflow framework and CFG information to calculate something similar to a traditional available expressions analysis. After computeAEs is called following a CFG calculation (Section 8.1), AvailableExps.stmtStartData will contain a mapping from statement IDs to data about what expressions are available at that statement. The data for each statement is a mapping for each variable ID to the whole expression available at that point(in the traditional sense) which the variable was last defined to be. So, this differs from a traditional available expressions analysis in that only whole expressions from a variable definition are considered rather than all expressions.
The interface is as follows:
The liveness.ml module uses the dataflow framework and CFG information to calculate which variables are live at each program point. After computeLiveness is called following a CFG calculation (Section 8.1), LiveFlow.stmtStartData will contain a mapping for each statement ID to a set of varinfos for varialbes live at that program point.
The interface is as follows:
Also included in this module is a command line interface that will cause liveness data to be printed to standard out for a particular function or label.
The module deadcodeelim.ml uses the reaching definitions analysis to eliminate assignment instructions whose results are not used. The interface is as follows:
The simplemem.ml module allows CIL lvalues that contain memory accesses to be even futher simplified via the introduction of well-typed temporaries. After this transformation all lvalues involve at most one memory reference.
For a concrete example, you can see how cilly --dosimpleMem transforms the following code:
int main () { int ***three; int **two; ***three = **two; }
See the CIL output for this code fragment
The simplify.ml module further reduces the complexity of program expressions and gives you a form of three-address code. After this transformation all expressions will adhere to the following grammar:
basic::= Const _ Addrof(Var v, NoOffset) StartOf(Var v, NoOffset) Lval(Var v, off), where v is a variable whose address is not taken and off contains only "basic" exp::= basic Lval(Mem basic, NoOffset) BinOp(bop, basic, basic) UnOp(uop, basic) CastE(t, basic) lval ::= Mem basic, NoOffset Var v, off, where v is a variable whose address is not taken and off contains only "basic"
In addition, all sizeof and alignof forms are turned into constants. Accesses to arrays and variables whose address is taken are turned into "Mem" accesses. All field and index computations are turned into address arithmetic.
For a concrete example, you can see how cilly --dosimplify transforms the following code:
int main() { struct mystruct { int a; int b; } m; int local; int arr[3]; int *ptr; ptr = &local; m.a = local + sizeof(m) + arr[2]; return m.a; }
See the CIL output for this code fragment
The module canonicalize.ml performs several transformations to correct differences between C and C++, so that the output is (hopefully) valid C++ code. This may be incomplete — certain fixes which are necessary for some programs are not yet implemented.
Using the --doCanonicalize option with CIL will perform the following changes to your program:
The llvm.ml module generates LLVM (“Low Level Virtual Machine”, http://llvm.org) code from a CIL file. The current version only targets 32-bit x86 code (as of version 2.5, LLVM’s 64-bit x86 support is still incomplete), and has a few significant limitations:
LLVM support is enabled by running configure with the --with-llvm option before compiling CIL. You can then convert C code to LLVM assembly with cilly --dollvm somefile.c. Or you can call Llvm.generate on a CIL file to get LLVM assembly as a doc string.