Proof-Carrying Code
Overview
Proof-Carrying Code (PCC) is a technique for the safe execution of untrusted code. In a typical instance of PCC, a code receiver establishes a set of safety rules that guarantee safe behavior of programs, and the code producer creates a formal safety proof showing that the untrusted code adheres to those rules. The receiver then uses a simple and fast proof validator to check, with certainty, that the proof is valid and hence that the untrusted code is safe to execute.
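As a minimal sketch of this division of labor, the consumer-side check can be pictured as follows in OCaml; the component names (vcgen, check) and their types are illustrative assumptions, not our actual interfaces:

    (* A minimal sketch of the PCC consumer; all names are illustrative. *)
    module Consumer (P : sig
      type program                            (* untrusted native code *)
      type proof                              (* the proof shipped with the code *)
      type predicate                          (* a formula of the safety policy *)
      val vcgen : program -> predicate        (* extracts the safety obligations *)
      val check : predicate -> proof -> bool  (* small, trusted proof checker *)
    end) =
    struct
      (* Accept untrusted code only if the proof it carries establishes
         the safety predicate that the consumer computes for itself. *)
      let accept code proof = P.check (P.vcgen code) proof
    end

The essential point is that only vcgen and check need to be trusted; the arbitrarily complex proof generation on the producer's side does not.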
PCC has many uses in systems whose trusted computing base is dynamic, either because of mobile code or because of regular bug fixes and updates. Examples include, but are not limited to, extensible operating systems, Internet browsers capable of downloading code, active network nodes, and safety-critical embedded controllers.
Advantages over related techniques
- Although establishing and formally proving the safety of the untrusted code might require a large amount of effort, almost the entire burden falls on the code producer. The code consumer, on the other hand, has only to perform a fast, simple, and easy-to-trust proof-checking process. The trustworthiness of the proof checker is an important advantage over approaches that place complex compilers or interpreters in the code consumer.
- The code consumer does not care how the proofs are constructed. In our current experiments we rely on a theorem prover, but in general there is no reason (except the effort required) that the proofs could not be generated by hand. No matter how the proofs are generated, the consumer gains the further advantage of not having to trust the proof-generation process.
- PCC programs are "tamperproof," in the sense that any modification (either accidental or malicious) will result in one of three possible outcomes: (1) the proof will no longer be valid (that is, it will no longer typecheck), and so the program will be rejected, (2) the proof will be valid, but will not be a safety proof for the program, and so again the program will be rejected, or (3) the proof will still be valid and will still be a safety proof for the program, despite the modifications. In the third case, even though the behavior of the program might have been changed, the guarantee of safety still holds.
- No cryptography or trusted third parties are required, because PCC checks intrinsic properties of the code, not its origin. In this sense, PCC programs are "self-certifying." On the other hand, PCC is completely compatible with other approaches to mobile-code security. For example, in an essay we discuss how trust management and PCC can be used together for mobile-code security. We also have some experience in using PCC to determine the correctness of applying Software Fault Isolation to network packet filters. In engineering terms, combining approaches leads to different tradeoffs (e.g., less effort required in proof generation at the cost of slower run-time performance) and hence to greater system design flexibility.
- Because the untrusted code is verified statically, before being executed, we not only save execution time but also detect potentially hazardous operations early, avoiding situations in which the code consumer must kill the untrusted process after it has acquired resources or modified state.
These five advantages are essentially statements about the benefits of static checking over dynamic checking. We believe that static checking is essential for mobile-code security, and that system designers in general have a somewhat limited view of how static checking can be used.
Technical difficulties to overcome
The idea of Proof-Carrying Code is easy to comprehend but much more
difficult to implement efficiently. Here are some obstacles that we had to overcome:
- How to encode the formal proof? Naive encodings of proofs of program properties are very large.
- How to check the proof? This is not an easy task if you want the proofs to be terse and the checker to be small, fast, and mostly independent of the particular safety policy being enforced.
- How to relate the proof to the program? It is of no use to validate the proof if we cannot ensure that it says something about the program at hand.
The next section briefly describes our solutions to these problems and gives pointers to papers containing more implementation details.
Implementation
A typical implementation of PCC contains the following five
elements:
- A formal specification language used to express the safety policy. We currently use first-order predicate logic extended with predicates for type safety and memory safety (an illustrative rule appears after this list).
- A formal semantics of the language used by the untrusted code, usually in the form of a logic relating programs to specifications. In our current implementation the untrusted code is in the form of machine code (the DEC Alpha, Intel x86, StrongARM, and Motorola 68000 are supported). As a means of relating machine code to specifications we use a form of Floyd's verification-condition generator, which extracts the safety properties of a machine-code program as a predicate in first-order logic (a small sketch of such a generator appears after this list). This predicate must then be proved by the code producer using axioms and inference rules supplied by the code consumer as part of the safety policy.
- A language used to express the proofs. In the past we used a variant of the Edinburgh Logical Framework (LF), which is essentially a typed lambda calculus, to encode and check the proofs. LF is suitable for proof encoding because it is general and can easily encode a wide variety of logics, including higher-order logics. Of particular benefit is the higher-order nature of LF, which allows for a natural representation of common features of logics, such as quantifiers and hypothetical and parametric judgments (a standard LF fragment is shown after this list). One drawback of the LF representation of proofs is that it is usually large, due to a significant amount of redundancy. To address this problem we currently use a much more compact representation of proofs in the form of oracles that guide a simple non-deterministic theorem prover in verifying the existence of the proof.
- An algorithm for validating the proofs. For proofs represented as LF terms we use a simple LF type checker. The basic tenet of LF is that predicates are represented as types and proofs as expressions, so in order to check the validity of a proof we only need to typecheck its representation. For proofs represented as oracles we use a simple non-deterministic logic interpreter. The safety policy is represented as a logic program and the verification condition is the goal to be proved. The interpreter is non-deterministic in that it assumes an "oracle" will select, at each step, the right clause to be used. This oracle is actually encoded in the proof as a sequence of bits (essentially one bit for each binary choice that the interpreter faces). The availability of the oracle both simplifies and speeds up the interpretation considerably, since there is no need to implement or perform backtracking (a sketch of such an interpreter appears after this list).
- A method for generating the safety proofs. This component of PCC is used only by the code producer. For this part, we have implemented a theorem prover that emits the required proofs. First, the code is scanned by the same verification-condition generator that the consumer uses (see point 2 above). Then the predicate is submitted to the theorem prover, which attempts to prove it and, in case of success, emits an oracle or an LF representation of the proof. The prover contains state-of-the-art decision procedures for linear arithmetic, equality, and modular arithmetic.
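To give the flavor of the specification language from point 1, here is an illustrative memory-safety rule, written in first-order logic with hypothetical predicates acc (denoting an accessible buffer of a given length) and saferd; it is a placeholder, not a verbatim rule from our implementation:

    \forall a.\ \forall b.\ \forall n.\ (acc(b, n) \land b \le a \land a < b + n) \supset saferd(a)

That is, reading address a is safe whenever a falls within some accessible buffer starting at b with length n.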
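The following OCaml fragment sketches the verification-condition generator from point 2, for an invented three-instruction machine and straight-line code only; a real VCGen must also handle branches and loop invariants, and all names here are illustrative:

    (* Symbolic expressions and safety predicates. *)
    type exp =
      | Reg of int               (* initial value of register i *)
      | Add of exp * exp
      | Sel of exp               (* contents of memory at an address *)

    type pred =
      | True
      | And of pred * pred
      | SafeRd of exp            (* this address may be read *)
      | SafeWr of exp            (* this address may be written *)

    type instr =
      | IAdd of int * int * int  (* rd := rs1 + rs2 *)
      | ILoad of int * int       (* rd := mem[rs]   *)
      | IStore of int * int      (* mem[rd] := rs   *)

    (* Forward symbolic execution: registers hold symbolic expressions,
       and every memory access emits a proof obligation. *)
    let vcgen (prog : instr list) : pred =
      let regs = Array.init 32 (fun i -> Reg i) in
      let obs = ref [] in
      List.iter
        (function
          | IAdd (rd, rs1, rs2) -> regs.(rd) <- Add (regs.(rs1), regs.(rs2))
          | ILoad (rd, rs) ->
              obs := SafeRd regs.(rs) :: !obs;
              regs.(rd) <- Sel regs.(rs)
          | IStore (rd, _rs) -> obs := SafeWr regs.(rd) :: !obs)
        prog;
      List.fold_left (fun p o -> And (p, o)) True !obs

For example, vcgen [ILoad (1, 0); IStore (1, 2)] produces the obligations SafeRd (Reg 0) and SafeWr (Sel (Reg 0)): reading mem[r0] must be safe, and writing to the address just loaded must be safe. It is exactly such a predicate that the producer must prove.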
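To illustrate the proof language from point 3, here is the standard LF encoding of implication in a minimal logic (the usual textbook fragment, not verbatim from our implementation). Predicates become types, proofs become terms, and the hypothetical judgment in the premise of imp_i is represented directly by an LF function type:

    o : \mathrm{type}                  (formulas)
    pf : o \to \mathrm{type}           (proofs of a formula)
    imp : o \to o \to o                (implication)
    imp_i : \Pi A{:}o.\ \Pi B{:}o.\ (pf\,A \to pf\,B) \to pf\,(imp\,A\,B)
    imp_e : \Pi A{:}o.\ \Pi B{:}o.\ pf\,(imp\,A\,B) \to pf\,A \to pf\,B

Checking a putative proof of a verification condition P then amounts to checking that a term has type pf(P).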
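Finally, the oracle-guided checker from point 4 can be sketched in OCaml as follows. For clarity, this toy version works on propositional Horn clauses and encodes the oracle as a list of clause indices rather than packed bits; all names are illustrative:

    (* A safety policy as a propositional Horn-clause logic program. *)
    type clause = { head : string; body : string list }

    (* Prove [goal] from [policy], letting the oracle pick the clause
       to use at each step.  There is no backtracking: if an oracle
       entry selects a clause that does not apply, checking fails. *)
    let rec solve (policy : clause array) (oracle : int list ref)
                  (goal : string) : bool =
      match !oracle with
      | [] -> false                       (* oracle exhausted *)
      | i :: rest ->
          oracle := rest;
          i < Array.length policy
          && policy.(i).head = goal       (* chosen clause must match *)
          && List.for_all (solve policy oracle) policy.(i).body

    (* Example: the oracle [0; 1; 2] proves "safe" in a single pass. *)
    let policy =
      [| { head = "safe";    body = ["typed"; "bounded"] };
         { head = "typed";   body = [] };
         { head = "bounded"; body = [] } |]

    let () = assert (solve policy (ref [0; 1; 2]) "safe")

Because the oracle resolves every choice point, the checker runs in a single deterministic pass over the proof.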
Note the many instances where elements from logic, type theory, and programming-language semantics arise in a realization of PCC. Extensive use of these formal systems is required in order to make any guarantees about the safety of the approach. In fact, we are able to prove theorems that guarantee the safety of the PCC technique, modulo a correct implementation of the proof checker and a sound safety policy.
Experiments
In order to gain some preliminary experience with PCC, we
have performed several experiments with operating system extensions and mobile-code
components. Below is a list of the most interesting experiments:
- Java bytecode. In our most extensive set of experiments we use a compiler from Java bytecode to x86. The compiler produces both the optimized assembly code and a set of annotations (e.g., loop invariants) that allow a companion theorem prover to reconstruct the proof that the assembly language is type safe. In this case the safety policy is the same as that enforced by a Java bytecode verifier. The main difference from a bytecode verifier is that the code that is verified is the same code that is run, without the need for an interpreter or a complex just-in-time compiler. We have recently created a web-based demo of this system, which is described in a PLDI'00 paper.
- Network packet filters. We used PCC to ensure the safety of packet filters that are downloaded into an operating-system kernel to inspect every incoming packet. For example, in our OSDI'96 paper we show that, using PCC, we can safely write the packet filters in hand-optimized assembly language, with obvious performance benefits over approaches using interpretation or safe languages. All this comes with safety proofs smaller than 1KB and a one-time proof validation below 2ms.
- IP checksum. One of the lessons of the Fox project at CMU was that by writing networking code in a type-safe language, the programmer runs the risk that the compiler is not powerful enough to eliminate the array-bounds checks and other checks mandated by type safety. PCC gives the programmer a way to use hand-optimized assembly-language routines without compromising the type safety of the entire system. This experiment is described in a technical report.
- Type-safe assembly language. This is just a
more general instance of the problem that we addressed for the IP checksum experiment. In
our POPL'97 paper, we show how PCC can be used to ensure that
native code coexists peacefully with code written in Standard ML.
- Mobile agents. In this set of experiments we extend the safety policy to include resource-usage bounds and data abstraction, in addition to memory safety. In a paper that appeared in a special LNCS issue on mobile-code security, we describe the use of PCC to ensure the safety of untrusted agents that access a database of airfares. Such agents are assigned an access level when they are received and are required to look only at those database records whose access level is lower. This requirement is not enforced at run time; instead it is part of the proved safety properties. In addition, the agents must prove that they terminate within a given number of instructions and that, if they send network packets, they do not exceed a preset bandwidth. For an example agent using this safety policy, the proof is about 6 times the size of the code.