Project contact: Daniel Wilkerson **** Intro We have a C++ front end, Elsa, which was written by and is maintained by Scott McPeak with a little help from me. I also wrote a Build Interceptor which when installed on your machine intercepts your gcc toolchain such that when you have build a project the .i files that went into the compiler end up in unused sections in the ELF binary. We have used Build Interceptor to get most of the .i files for Debian. Elsa is quite good: 200 million (yes million) lines of of the 260 million lines of code (before preprocessing) in Debian goes through Build Interceptor and then Elsa end-to-end. Of code that does not, it is often due to it not going through our Build Interceptor instead, or of us simply disqualifying it because of inline assembly; neither of these are failures of Elsa. I have written a whole-program dataflow analysis client of Elsa called Cqual++. It uses the new polymorphic version of the Cqual qualifier analysis of Cqual to provide a C++ version of Cqual: Cqual++. It lives in a meta-analysis project called Oink, which is actually intended to be and is structured as a whole framework for writing analyses for Elsa; Cqual++ is just the first tool in that framework. The tools of Oink are expected to be designed so that different analyses can use each other's results so that we will get a multiplying effect of possible analyses as tools are added. I maintain Oink and can facilitate that. **** Owner/Surfer analysis Scott McPeak proposed the following idea which I have modified and probably over-simplified in the re-telling of it here. Scott observes that most programmers do manual memory management (malloc()/free()) by thinking of some of the pointers in their programs as "owner" pointers and others as "non-owner" pointers. The idea is that we maintain the invariant that 1) a piece of heap data has exactly one owner at all times and 2) that all non-NULL owner pointers are themselves live data. A created object is immediately pointed to by an owner pointer and ownership can be transfered from one pointer to another by a special kind of assignment. An owner pointer is "responsible" for the heap data it points to: 1) free() is only allowed to be called on an owner pointer, and 2) the object pointed-to cannot contain non-null owner pointers at free-time. This invariant guarantees that both memory leaks and dangling references cannot occur. There is some debate as to what to call non-owner pointers; I like the metaphor suggested by "surfer": the surfer's hang out at the beach cottage, but the owner is responsible for it, only the owner can destroy it and must if he moves away, and he can sell it to one of the surfers. The neighborhood thus never has unattended derelict cottages nor arguments over who can live at a cottage. We need a boolean variable for each instance of a pointer expression that has value owner or surfer. Each pointer assignment is either an ownership transfer or it is not. We also need a NULL/non-NULL state for each variable. The constraints above could be encoded as a SAT problem. **** Helping the SAT solver with programmer annotations It is straightforward to extend the Elsa parser and we do that in Cqual++ so that people can attach qualifiers to their variables. These qualifiers say that the data returned by getenv() is not to be trusted whereas the format string to printf() must be trustable. char $tainted *getenv(const char *name); int printf(char const $untainted *fmt, ...); The programmer often knows which expressions are intended to be owners and surfers; to help the owner/surfer analysis, the programmer could work in an extended language with $owner and $surfer annotations (and probably $NULL and $non_NULL annotations). Adding a few such constraints could help the SAT solver considerably. Since they would always be optional, the programmer would not have a mandatory annotation burden. Of course it is possible to write programs that never leak nor dangle memory and yet this fact is too subtle to be proved at static time; I suggest that such programs violate the rule that "it is not enough for programs to be correct, they must be obviously correct". Given our infrastructure, this could be done with a SAT solver without too much trouble. It would be joint work with Scott and perhaps me as well. **** Note that the release tarball of Oink you will find at the oink site is very old and we have something much better that we will ship soon. Any student working on this project would be given direct access to the five CVS repositories required. References: Build Interceptor http://build-interceptor.tigris.org/ Elsa http://www.cs.berkeley.edu/~smcpeak/elkhound/sources/elsa/ Oink http://www.cs.berkeley.edu/~dsw/oink.html Cqual http://www.cs.umd.edu/~jfoster/cqual/ Mops http://www.cs.berkeley.edu/~daw/mops/