Proof-Carrying Code
Overview
Proof-Carrying Code (PCC) is a technique for the safe execution of untrusted code. In a typical instance of PCC, a code receiver establishes a set of safety rules that guarantee safe behavior of programs, and the code producer creates a formal safety proof showing that the untrusted code adheres to those rules. The receiver then uses a simple and fast proof validator to check, with certainty, that the proof is valid and hence that the untrusted code is safe to execute.
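As a minimal sketch of this division of labor, the consumer-side check can be pictured as follows in OCaml; the component names (vcgen, check) and their types are illustrative assumptions, not our actual interfaces:

    (* A minimal sketch of the PCC consumer; all names are illustrative. *)
    module Consumer (P : sig
      type program                            (* untrusted native code *)
      type proof                              (* the proof shipped with the code *)
      type predicate                          (* a formula of the safety policy *)
      val vcgen : program -> predicate        (* extracts the safety obligations *)
      val check : predicate -> proof -> bool  (* small, trusted proof checker *)
    end) =
    struct
      (* Accept untrusted code only if the proof it carries establishes
         the safety predicate that the consumer computes for itself. *)
      let accept code proof = P.check (P.vcgen code) proof
    end

The essential point is that only vcgen and check need to be trusted; the arbitrarily complex proof generation on the producer's side does not.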
PCC has many uses in systems whose trusted computing base is dynamic, either because of mobile code or because of regular bug fixes and updates. Examples include, but are not limited to, extensible operating systems, Internet browsers capable of downloading code, active network nodes, and safety-critical embedded controllers.
Advantages over related techniques
- Although establishing and formally proving the safety of the untrusted code might require a large amount of effort, almost the entire burden falls on the code producer. The code consumer, on the other hand, has only to perform a fast, simple, and easy-to-trust proof-checking process. The trustworthiness of the proof checker is an important advantage over approaches that place complex compilers or interpreters in the code consumer.
- The code consumer does not care how the proofs are constructed. In our current experiments we rely on a theorem prover, but in general there is no reason (except the effort required) that the proofs could not be generated by hand. No matter how the proofs are generated, the consumer gains the further advantage of not having to trust the proof-generation process.
- PCC programs are "tamperproof," in the sense that any modification (either accidental or malicious) will result in one of three possible outcomes: (1) the proof will no longer be valid (that is, it will no longer typecheck), and so the program will be rejected, (2) the proof will be valid, but will not be a safety proof for the program, and so again the program will be rejected, or (3) the proof will still be valid and will still be a safety proof for the program, despite the modifications. In the third case, even though the behavior of the program might have been changed, the guarantee of safety still holds.
- No cryptography or trusted third parties are required, because PCC checks intrinsic properties of the code, not its origin. In this sense, PCC programs are "self-certifying." On the other hand, PCC is completely compatible with other approaches to mobile-code security. For example, in an essay we discuss how trust management and PCC can be used together for mobile-code security. We also have some experience in using PCC to determine the correctness of applying Software Fault Isolation to network packet filters. In engineering terms, combining approaches leads to different tradeoffs (e.g., less effort required in proof generation at the cost of slower run-time performance) and hence to greater system design flexibility.
- Because the untrusted code is verified statically, before being executed, we not only save execution time but also detect potentially hazardous operations early, avoiding situations in which the code consumer must kill the untrusted process after it has acquired resources or modified state.
These five advantages are essentially statements about the benefits of static checking over dynamic checking. We believe that static checking is essential for mobile-code security, and that system designers in general have a somewhat limited view of how static checking can be used.
Technical difficulties to overcome
The idea of Proof-Carrying Code is easy to comprehend but much more
difficult to implement efficiently. Here are some obstacles that we had to overcome:
- How to encode the formal proof? Naive encodings of proofs of program properties are very large.
- How to check the proof? This is not an easy task if you want the proofs to be terse and the checker to be small, fast, and mostly independent of the particular safety policy being enforced.
- How to relate the proof to the program? It is of no use to validate the proof if we cannot ensure that it says something about the program at hand.
The next section briefly describes our solutions to these problems and gives pointers to papers containing more implementation details.
Implementation
A typical implementation of PCC contains the following five
elements:
- A formal specification language used to express the safety policy. We currently use first-order predicate logic extended with predicates for type safety and memory safety (an illustrative rule appears after this list).
- A formal semantics of the language used by the untrusted code, usually in the form of a logic relating programs to specifications. In our current implementation the untrusted code is in the form of machine code (the DEC Alpha, Intel x86, StrongARM, and Motorola 68000 are supported). As a means of relating machine code to specifications we use a form of Floyd's verification-condition generator, which extracts the safety properties of a machine-code program as a predicate in first-order logic (a small sketch of such a generator appears after this list). This predicate must then be proved by the code producer using axioms and inference rules supplied by the code consumer as part of the safety policy.
- A language used to express the proofs. In the past we used a variant of the Edinburgh Logical Framework (LF), which is essentially a typed lambda calculus, to encode and check the proofs. LF is suitable for proof encoding because it is general and can easily encode a wide variety of logics, including higher-order logics. Of particular benefit is the higher-order nature of LF, which allows for a natural representation of common features of logics, such as quantifiers and hypothetical and parametric judgments (a standard LF fragment is shown after this list). One drawback of the LF representation of proofs is that it is usually large, due to a significant amount of redundancy. To address this problem we currently use a much more compact representation of proofs in the form of oracles that guide a simple non-deterministic theorem prover in verifying the existence of the proof.
- An algorithm for validating the proofs. For proofs represented as LF terms we use a simple LF type checker. The basic tenet of LF is that predicates are represented as types and proofs as expressions, so in order to check the validity of a proof we only need to typecheck its representation. For proofs represented as oracles we use a simple non-deterministic logic interpreter. The safety policy is represented as a logic program and the verification condition is the goal to be proved. The interpreter is non-deterministic in that it assumes an "oracle" will select, at each step, the right clause to be used. This oracle is actually encoded in the proof as a sequence of bits (essentially one bit for each binary choice that the interpreter faces). The availability of the oracle both simplifies and speeds up the interpretation considerably, since there is no need to implement or perform backtracking (a sketch of such an interpreter appears after this list).
- A method for generating the safety proofs. This component of PCC is used only by the code producer. For this part, we have implemented a theorem prover that emits the required proofs. First, the code is scanned by the same verification-condition generator that the consumer uses (see point 2 above). Then the predicate is submitted to the theorem prover, which attempts to prove it and, in case of success, emits an oracle or an LF representation of the proof. The prover contains state-of-the-art decision procedures for linear arithmetic, equality, and modular arithmetic.
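To give the flavor of the specification language from point 1, here is an illustrative memory-safety rule, written in first-order logic with hypothetical predicates acc (denoting an accessible buffer of a given length) and saferd; it is a placeholder, not a verbatim rule from our implementation:

    \forall a.\ \forall b.\ \forall n.\ (acc(b, n) \land b \le a \land a < b + n) \supset saferd(a)

That is, reading address a is safe whenever a falls within some accessible buffer starting at b with length n.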
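The following OCaml fragment sketches the verification-condition generator from point 2, for an invented three-instruction machine and straight-line code only; a real VCGen must also handle branches and loop invariants, and all names here are illustrative:

    (* Symbolic expressions and safety predicates. *)
    type exp =
      | Reg of int               (* initial value of register i *)
      | Add of exp * exp
      | Sel of exp               (* contents of memory at an address *)

    type pred =
      | True
      | And of pred * pred
      | SafeRd of exp            (* this address may be read *)
      | SafeWr of exp            (* this address may be written *)

    type instr =
      | IAdd of int * int * int  (* rd := rs1 + rs2 *)
      | ILoad of int * int       (* rd := mem[rs]   *)
      | IStore of int * int      (* mem[rd] := rs   *)

    (* Forward symbolic execution: registers hold symbolic expressions,
       and every memory access emits a proof obligation. *)
    let vcgen (prog : instr list) : pred =
      let regs = Array.init 32 (fun i -> Reg i) in
      let obs = ref [] in
      List.iter
        (function
          | IAdd (rd, rs1, rs2) -> regs.(rd) <- Add (regs.(rs1), regs.(rs2))
          | ILoad (rd, rs) ->
              obs := SafeRd regs.(rs) :: !obs;
              regs.(rd) <- Sel regs.(rs)
          | IStore (rd, _rs) -> obs := SafeWr regs.(rd) :: !obs)
        prog;
      List.fold_left (fun p o -> And (p, o)) True !obs

For example, vcgen [ILoad (1, 0); IStore (1, 2)] produces the obligations SafeRd (Reg 0) and SafeWr (Sel (Reg 0)): reading mem[r0] must be safe, and writing to the address just loaded must be safe. It is exactly such a predicate that the producer must prove.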
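To illustrate the proof language from point 3, here is the standard LF encoding of implication in a minimal logic (the usual textbook fragment, not verbatim from our implementation). Predicates become types, proofs become terms, and the hypothetical judgment in the premise of imp_i is represented directly by an LF function type:

    o : \mathrm{type}                  (formulas)
    pf : o \to \mathrm{type}           (proofs of a formula)
    imp : o \to o \to o                (implication)
    imp_i : \Pi A{:}o.\ \Pi B{:}o.\ (pf\,A \to pf\,B) \to pf\,(imp\,A\,B)
    imp_e : \Pi A{:}o.\ \Pi B{:}o.\ pf\,(imp\,A\,B) \to pf\,A \to pf\,B

Checking a putative proof of a verification condition P then amounts to checking that a term has type pf(P).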
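Finally, the oracle-guided checker from point 4 can be sketched in OCaml as follows. For clarity, this toy version works on propositional Horn clauses and encodes the oracle as a list of clause indices rather than packed bits; all names are illustrative:

    (* A safety policy as a propositional Horn-clause logic program. *)
    type clause = { head : string; body : string list }

    (* Prove [goal] from [policy], letting the oracle pick the clause
       to use at each step.  There is no backtracking: if an oracle
       entry selects a clause that does not apply, checking fails. *)
    let rec solve (policy : clause array) (oracle : int list ref)
                  (goal : string) : bool =
      match !oracle with
      | [] -> false                       (* oracle exhausted *)
      | i :: rest ->
          oracle := rest;
          i < Array.length policy
          && policy.(i).head = goal       (* chosen clause must match *)
          && List.for_all (solve policy oracle) policy.(i).body

    (* Example: the oracle [0; 1; 2] proves "safe" in a single pass. *)
    let policy =
      [| { head = "safe";    body = ["typed"; "bounded"] };
         { head = "typed";   body = [] };
         { head = "bounded"; body = [] } |]

    let () = assert (solve policy (ref [0; 1; 2]) "safe")

Because the oracle resolves every choice point, the checker runs in a single deterministic pass over the proof.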
Note the many instances where elements from logic, type theory, and programming-language semantics arise in a realization of PCC. Extensive use of these formal systems is required in order to make any guarantees about the safety of the approach. In fact, we are able to prove theorems that guarantee the safety of the PCC technique, modulo a correct implementation of the proof checker and a sound safety policy.
Experiments
In order to gain some preliminary experience with PCC, we
have performed several experiments with operating system extensions and mobile-code
components. Below is a list of the most interesting experiments:
- Java bytecode. In our most extensive set of experiments we use a compiler from Java bytecode to x86. The compiler produces both the optimized assembly code and a set of annotations (e.g., loop invariants) that allow a companion theorem prover to reconstruct the proof that the assembly language is type safe. In this case the safety policy is the same as that enforced by a Java bytecode verifier. The main difference from a bytecode verifier is that the code that is verified is the same code that is run, without the need for an interpreter or a complex just-in-time compiler. We have recently created a web-based demo of this system, which is described in a PLDI'00 paper.
- Network packet filters. We used PCC to ensure the safety of packet filters that are downloaded into an operating-system kernel to inspect every incoming packet. For example, in our OSDI'96 paper we show that, using PCC, we can safely write the packet filters in hand-optimized assembly language, with obvious performance benefits over approaches using interpretation or safe languages. All this comes with safety proofs smaller than 1KB and a one-time proof validation below 2ms.
- IP checksum. One of the lessons of the Fox project at CMU was that by writing networking code in a type-safe language, the programmer runs the risk that the compiler is not powerful enough to eliminate the array-bounds checks and other checks mandated by type safety. PCC gives the programmer a way to use hand-optimized assembly-language routines without compromising the type safety of the entire system. This experiment is described in a technical report.
- Type-safe assembly language. This is just a
more general instance of the problem that we addressed for the IP checksum experiment. In
our POPL'97 paper, we show how PCC can be used to ensure that
native code coexists peacefully with code written in Standard ML.
- Mobile agents. In this set of experiments we extend the safety policy to include resource-usage bounds and data abstraction, in addition to memory safety. In a paper that appeared in a special LNCS issue on mobile-code security, we describe the use of PCC to ensure the safety of untrusted agents that access a database of airfares. Such agents are assigned an access level when they are received and are required to look only at those database records whose access level is lower. This requirement is not enforced at run time; instead it is part of the proved safety properties. In addition, the agents must prove that they terminate within a given number of instructions and that, if they send network packets, they do not exceed a preset bandwidth. For an example agent using this safety policy, the proof is about 6 times the size of the code.