CS 261 Homework 3
Instructions
This problem set is due Tuesday, November 20th.
You may work together and discuss the questions on this homework
with others, but the writeup you turn in must be your own, and you
should list anyone with whom you collaborated.
You may use any source you like (including other papers or textbooks),
but if you use any source not discussed in class,
you must cite it.
Question 1
An authenticated set is a signed data structure that
enables set-membership queries to be answered with a proof that the
answer is correct.
In more detail, we assume that the creator of the set has a
public/private keypair (pk,k), and that everyone knows the
public key pk.
There are two operations:
- create(S,k) accepts a set S and a signing key k,
and produces auxiliary data v (used below).
- query(x,v) accepts a value x and tests whether x ∈ S.
If x ∈ S, query(x,v) outputs (true, w), where
w is a witness that anyone with knowledge of the public key
pk can use to convince themselves that x is indeed an
element of S.
If x ∉ S, query(x,v) just outputs false.
The idea is that this represents a set that has been signed in such a
way that membership queries can be answered efficiently and each answer
proven correct. Notice that there are no insert() or delete() operations;
once the set is created, it is frozen.
We impose several security and performance requirements:
- Publishing the auxiliary data v must not reveal anything
about the signing key k or endanger the security of the scheme.
This is a nice property, because it means that the set can be
signed in advance and the auxiliary data given to an untrusted third
party, who can answer membership queries even if no one trusts them.
- The auxiliary data v should not be too large.
In particular, its size should be at most
O(n), where n represents the size of S.
The witness w should be much smaller: say, of size
O(1) or O(log n) or so.
- Anyone who does not know the signing key k should be
unable to come up with a value x ∉ S and a fake
witness w' so that (true, w') would be accepted by
a recipient as a valid response to query(x,v).
It is easy to design a simple authenticated set data structure:
if S={x1,..,xn}, define
create(S,k)=v=(v1,..,vn), where
vi = sign(k,xi);
then we can define
query(xi,v) = (true, vi).
The recipient of a response (true, w) to query(x,v)
can check the validity of this response by using the public key
pk to verify that w is a valid signature on x.
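The simple scheme above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the problem does not fix a signature scheme, so the sketch substitutes a toy, unpadded RSA signature with a deliberately small modulus (not secure), and it stores v as a dictionary rather than a tuple. The names PK, K, sign, and verify are hypothetical helpers, not part of the problem statement.

```python
import hashlib

# Toy RSA parameters, for illustration only (ASSUMPTION: the homework does
# not specify a signature scheme). The modulus is far too small and there
# is no padding, so this is NOT secure.
P = 2**61 - 1                        # a Mersenne prime
Q = 2**89 - 1                        # another Mersenne prime
N = P * Q
E = 65537
D = pow(E, -1, (P - 1) * (Q - 1))    # private exponent (needs Python 3.8+)

PK = (N, E)   # public key pk, known to everyone
K = (N, D)    # signing key k, known only to the creator of the set


def _digest(x: str, n: int) -> int:
    """Hash x to an integer modulo n."""
    return int.from_bytes(hashlib.sha256(x.encode()).digest(), "big") % n


def sign(k, x: str) -> int:
    """Toy signature on x under signing key k."""
    n, d = k
    return pow(_digest(x, n), d, n)


def verify(pk, x: str, w: int) -> bool:
    """Recipient's check: is w a valid signature on x under pk?"""
    n, e = pk
    return pow(w, e, n) == _digest(x, n)


def create(S, k):
    """Return auxiliary data v: one signature per element of S."""
    return {x: sign(k, x) for x in S}


def query(x, v):
    """Return (True, w) if x is in S; otherwise an unauthenticated False."""
    if x in v:
        return (True, v[x])
    return False


v = create({"alice", "bob", "carol"}, K)
answer = query("bob", v)
print(answer[0], verify(PK, "bob", answer[1]))
print(query("mallory", v))
```

Note that query() needs only v, never the signing key, so it can be run by an untrusted party; the auxiliary data holds one signature per element, and each witness is a single signature.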
One shortcoming of the above framework is that negative
responses to query() are unauthenticated. If an untrusted third
party is answering membership queries, then they can freely lie
and claim that the value x is not in S. This is a
point of vulnerability.
Your job is to devise an authenticated set data structure so
that both positive and negative responses to query() are authenticated.
In particular, the above definition is modified so that if
x ∉ S, then query(x,v) should return (false, w),
where w is a witness that anyone with knowledge of the public key
pk can use to convince themselves that x is indeed not an
element of S.
Everything else is as above.
Your solution should satisfy all of the above security and performance
requirements, as well as the following additional requirement:
- Anyone who does not know the signing key k should be
unable to come up with a value x ∈ S and a fake
witness w' so that (false, w') would be accepted by
a recipient as a valid response to query(x,v).
You should provide a definition of the create() and query() algorithms,
as well as specify how a recipient of an answer from query() can verify
its validity.
Motivation for the curious: This question is inspired by the DNS
security scheme sketched in lecture. As I described it in lecture,
each domain has a public key and an authenticated set of DNS records
for that domain. The administrator creates and signs the set in
advance using the create() operation, and then the DNS server responds
to queries using the query() operation. Note that the private key
can be stored on a computer that is never directly connected to the
Internet, and doesn't need to be known to the DNS server, which is
a nice property. Also DNS clients can verify the validity of these
DNS records, even if they have passed through several other caching
DNS servers on their way to the end client. However, the scheme I
described in lecture has the shortcoming that it leaves negative
responses unauthenticated: if you ask a caching DNS server whether
there is any record for foo.bar.com, it can lie to you and claim that
there is no such record, when in fact such a record does exist.
This is bad, and can open up subtle and nasty security vulnerabilities
if the end host has a DNS search path that spans multiple organizations.
In Question 1, you'll invent a data structure that could be used
to efficiently authenticate both positive and negative responses to
DNS queries.
Question 2
This question asks you to explore some of the consequences
of active networks, where packets
can contain mobile code that is executed by the routers along the
path.
For concreteness, we can think of 'adaptive routing'
as a sample application: if your TCP connection to France is too slow
because of poor bandwidth on the transatlantic link and for some reason
you happen to know that there is a much faster route to France via
China, you might wish to adaptively update the route your TCP packets
take. In this case, you would "push" some mobile code into each
router along the way; the mobile code would run at each router before
the packet is forwarded and select which interface to send it out over.
We describe below a series of extensions to the IP protocol suite
that allow for progressively more sophisticated active network applications.
For each of the four extensions below, list the security threats
that might arise for that extension and how they could be addressed.
The purpose of this question is to study issues that are inherent
in the functionality; you may ignore the risk of implementation bugs such as
buffer overruns.
- In the simplest variant, we'd extend the IP packet format to allow
an optional extra header which contains some mobile code to run at each
router. The mobile code is specified in
the BPF
(Berkeley Packet Filter) bytecode language.
Each router which receives such a packet first verifies
that the bytecode contains no backwards jumps, and then interprets the
bytecode. The only memory locations the bytecodes are allowed to read
are (1) the packet itself, and (2) a global list of interfaces available
at the router. (Each interface in the list is annotated with a little bit
of relevant information that can be read by the handler, such as the IP
address of the next hop along that interface.)
No writes to memory are allowed.
There are no function calls, computed gotos, exceptions, or other forms
of indirect control flow.
Just before exiting,
the bytecode should store the name of the desired outbound interface
in a fixed register, and the router will forward the packet out via that
interface on towards its destination.
- One obvious performance issue with the previous scheme is that
it requires an overhead of potentially hundreds of bytes of code
in every packet.
So we introduce the notion of "flows" to amortize the cost of specifying
the mobile code.
Each packet is associated with a flow.
In TCP, the flow ID might be the (src host, dst host, src port, dst port)
tuple. For other protocols, we might simply extend the packet format to allow
for a 32-bit flow ID.
We add a "set handler" IP option which allows endpoints to specify
a single chunk of mobile code which will be run at the router
every time a packet is received on the same flow.
Thus one endpoint can send a packet with the "set handler" IP option
and containing a lengthy chunk of mobile code; that mobile code will
then be applied to all subsequent packets on that flow, and does not need
to be sent again.
This allows us to specify a chunk of mobile code once; then all
subsequent packets in the flow
will inherit the same code without incurring any bandwidth overhead.
- It occurs to us that we might like to allow the mobile code to make routing
policy decisions based on the payload of the packets, or even to compress
packets for us on the fly when bandwidth is scarce.
Since this might require scanning the entire packet and possibly
interpreting higher-level protocols, we will need to be able to write
loops in bytecode.
Therefore, we eliminate the restriction on backwards jumps, and allow
arbitrary control flow in the bytecode.
To implement compression, the handler will need to be able to modify
the contents of the packet.
Therefore, we also relax our security policy so that handlers are allowed
both read and write access to the packet itself. If the handler
modifies the packet during execution, the router will forward the
modified packet instead of the original contents.
Also, we allow handlers to maintain state across packet reception events.
Thus, when a new flow is created, we set aside a chunk of memory for
use by that flow's handler; the handler is allowed read and write
access only to its own chunk of memory.
- An astute reader points out that decompression may increase the
size of a packet. If the decompressed packet exceeds the network's MTU, our decompression
handler may need to send multiple packets.
Therefore, we extend the scheme so that
handlers can construct whole IP packets in their own memory
space and invoke a special operation to send that packet over the wire.
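To make the verification step in the first variant concrete, here is a minimal sketch of a "no backwards jumps" check. It does not use real BPF encoding: the instruction format (an opcode string paired with an integer operand), the opcode names, and the use of absolute instruction indices as jump targets are all assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical encoding: (opcode, operand). For jump opcodes the operand is
# taken to be the absolute index of the target instruction.
Instruction = Tuple[str, int]

JUMP_OPCODES = {"jmp", "jeq", "jgt"}   # assumed names for branch opcodes


def only_forward_jumps(program: List[Instruction]) -> bool:
    """Return True if every jump targets a strictly later instruction.

    A target equal to len(program) is treated as falling off the end
    of the program (that is, exiting).
    """
    for pc, (opcode, operand) in enumerate(program):
        if opcode in JUMP_OPCODES:
            target = operand
            if target <= pc or target > len(program):
                return False
    return True


straight_line = [("ld", 0), ("jeq", 3), ("ld", 1), ("ret", 0)]
looping = [("ld", 0), ("jmp", 0)]
print(only_forward_jumps(straight_line))
print(only_forward_jumps(looping))
```

With only forward jumps and no indirect control flow, each instruction executes at most once per packet, so the handler's running time is bounded by the length of the program.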
Clarification (added 11/15): Don't forget: In each part,
you should list security threats, and also propose a way that
those threats could be addressed (e.g., propose a fix).