Compiler-directed program fault coverage for highly available Internet services
C. Fu, R. Martin et al.  Rutgers Univ.
Summary by
AF

One-line summary:

Use compiler analysis on Java bytecode to determine (a) what faults to inject; (b) what % of eligible catch handlers are actually exercised as a result.

Overview/Main Points

  • Use compiler-like analysis to determine where to insert faults, and what kind, in Java inet apps.

Main approach differences from AFPI:

  • We induce exceptions by modifying the EJB container to throw them directly.  They inject faults, and monitor whether the faults immediately manifest as Java exceptions or remain latent (eg, a fault may cause a later exception somewhere else).
  • They explicitly construct a table of mappings: "If fault type F occurs during operation O, the result is Java exception E."  Quoting from the paper: "Unfortunately, the construction and use of this table are complicated by the layers of software between the hardware and the app... these layers can have a dramatic impact on the way in which low-level faults translate to exceptions at the program level."  Two examples they provide:
    1. A fault can be masked: if a HW error occurs on the link, TCP timeout and retransmit will completely mask that failure, so no app-level exception corresponding to the link error will be seen.
    2. Latent fault: suppose a disk read fault occurs, but due to input buffering, the faulted block is not consumed by the app till much later; the exception will occur when read() finally consumes the (corrupted) buffered block.

    I believe that with our approach of only considering the app-level exceptions that do occur, we don't have to deal with this issue.
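Their mapping-table idea reduces to a lookup keyed on (fault type, operation), with masked faults simply absent from the table. A minimal sketch; all fault, operation, and exception names here are illustrative assumptions, not the paper's actual table:

```python
# Hypothetical sketch of the paper's mapping table:
# "If fault type F occurs during operation O, the result is Java exception E."
# Entries below are made up for illustration.
FAULT_TO_EXCEPTION = {
    ("NET_EPIPE", "socket.read"): "java.net.SocketException",
    ("NIC_DOWN", "socket.read"): "java.net.SocketTimeoutException",
    ("DISK_READ_ERROR", "file.read"): "java.io.IOException",
}

def expected_exception(fault, operation):
    """Return the app-level exception a low-level fault should surface as,
    or None if the fault is masked (e.g., hidden by TCP retransmission)."""
    return FAULT_TO_EXCEPTION.get((fault, operation))

# A masked fault has no entry: TCP retransmit hides a transient link error,
# so no app-level exception is expected.
assert expected_exception("LINK_TRANSIENT", "socket.read") is None
```

Latent faults complicate this further: the table predicts which exception eventually appears, but not where or when, since (as in their buffered-read example) the corrupted data may not be consumed until much later.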

  • They inject faults on particular instruction boundaries: they analyze the bytecode for specific operations that are vulnerable to certain faults (eg, a socket read() call is vulnerable to NIC_DOWN, NET_EPIPE, etc.)
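The instruction-boundary idea can be sketched as a linear scan over operations, pairing each fault-vulnerable call site with the fault types that apply to it. The operation names and fault sets below are assumptions for illustration, not the paper's actual vulnerability map:

```python
# Which fault types are plausible for which operation kinds (illustrative).
VULNERABLE_OPS = {
    "socket.read": {"NIC_DOWN", "NET_EPIPE"},
    "file.read": {"DISK_READ_ERROR"},
}

def injection_points(ops):
    """Given (index, operation) pairs standing in for bytecode instructions,
    yield (index, operation, fault) triples: one injection per applicable
    fault at each vulnerable instruction boundary."""
    for index, op in ops:
        for fault in sorted(VULNERABLE_OPS.get(op, ())):
            yield index, op, fault

ops = [(0, "socket.read"), (1, "compute"), (2, "file.read")]
points = list(injection_points(ops))
# yields injections at instructions 0 and 2; instruction 1 has none
```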

The most marked difference is that they rely on static analysis for two things:

  1. determining where to inject faults.  For a given JNI routine or Java method that raises some exception but doesn't handle it, they go up the (static) call tree to find the nearest exception handler for it (ie the one that actually would be used if the exception happened at runtime).  Compare to our approach: we inject the exception, then examine the stack trace to see which handler actually was used.
  2. Determining which data objects may be used as arguments to an operation that could cause an exception.  For example, in a read() call, it matters whether the argument of the read is a FileInputStream or a NetworkInputStream (eg) because that determines which kinds of low-level faults are reasonable to inject.  To figure this out, they use a form of static analysis called points-to analysis, which approximates this set of objects.
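Item 1 amounts to walking up a static call tree from the raising method until a frame with a matching handler is found. A minimal sketch, assuming a parent map and per-method handler sets (both structures, and the exact-type match instead of real Java subtype matching, are simplifications of mine):

```python
# Illustrative static call tree: callee -> caller.
CALL_TREE_PARENT = {
    "Socket.read": "fetchPage",
    "fetchPage": "handleRequest",
    "handleRequest": "main",
}
# Illustrative handler sets: method -> exception types its catch blocks cover.
HANDLERS = {
    "handleRequest": {"java.net.SocketException"},
    "main": {"java.lang.Exception"},
}

def nearest_handler(method, exc_type):
    """Walk up the static call tree from `method` and return the nearest
    enclosing method that catches exc_type (exact-type match only; real
    analysis would also match supertypes like java.lang.Exception)."""
    m = method
    while m is not None:
        if exc_type in HANDLERS.get(m, ()):
            return m
        m = CALL_TREE_PARENT.get(m)
    return None

assert nearest_handler("Socket.read", "java.net.SocketException") == "handleRequest"
```

Our approach sidesteps this computation: we inject the exception and read the handler off the actual runtime stack trace, so we never need the static approximation (or the points-to analysis) at all.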

The kicker is that their prototype is still under construction (the preliminary results in the paper are based on hand-simulating their technique, ie manually determining which faults to inject & where and then telling Mendosus to do it).  They admit that they are investigating various possible "approximate" representations of the call tree (from the literature) since the full call tree can be exponential in the size of the program.

Other points in the paper

  • They point out that the term coverage already means something different in dependability than it does in formal verification.  In dependability, coverage is defined as: "Given that a particular fault occurs, does the system handle it correctly?" (ie, "100% coverage" == we tried injecting every possible kind of fault to confirm that the system will handle them properly).  In testing/verification, it refers to what fraction of possible code paths have been executed.
  • Their proposed metric is fault-catch coverage: with respect to a particular test run and a particular catch() block, of the possible faults that could trigger this catch() block, what fraction of them did we actually inject? 
  • There is a reference for the statement that rarely-executed code exhibits a higher failure rate than frequently-executed code.  I added it to all.bib as hecht:rare.  [Hecht & Crane, Rare Conditions and their effect on SW failures, Proc. Annual Reliab. and Maintainability Symp., Anaheim CA, 1/94]
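Their fault-catch coverage metric reduces to a per-catch-block ratio. A minimal sketch, with hypothetical fault sets:

```python
def fault_catch_coverage(possible_faults, injected_faults):
    """Fault-catch coverage for one catch block in one test run: of the
    faults that could trigger this block, what fraction did we inject?"""
    possible = set(possible_faults)
    if not possible:
        return 1.0  # vacuously covered: nothing can trigger this block
    return len(possible & set(injected_faults)) / len(possible)

# Hypothetical catch block reachable by three faults; the run injected two.
cov = fault_catch_coverage({"NIC_DOWN", "NET_EPIPE", "NET_SLOW"},
                           {"NIC_DOWN", "NET_EPIPE"})
assert abs(cov - 2 / 3) < 1e-9
```

Note the set of "possible" faults for a given catch block is exactly what their static analysis (handler search plus points-to) is meant to compute.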

Comments from our discussion


Summaries may be used for non-commercial purposes only, provided the summary's author and origin are acknowledged. For all other uses, please contact us.