A version of this paper appeared in the Proceedings of the Fifteenth Symposium on Operating Systems Principles.

Extensibility, Safety and Performance in the SPIN Operating System

Brian N. Bershad, Stefan Savage, Przemysław Pardyak, Emin Gün Sirer, Marc E. Fiuczynski, David Becker, Craig Chambers, Susan Eggers

Department of Computer Science and Engineering
University of Washington
Seattle, WA 98195

This research was sponsored by the Advanced Research Projects Agency, the National Science Foundation (Grants no. CDA-9123308 and CCR-9200832) and by an equipment grant from Digital Equipment Corporation. Bershad was partially supported by a National Science Foundation Presidential Faculty Fellowship. Chambers was partially sponsored by a National Science Foundation Presidential Young Investigator Award. Sirer was supported by an IBM Graduate Student Fellowship. Fiuczynski was partially supported by a National Science Foundation GEE Fellowship.

Abstract

This paper describes the motivation, architecture and performance of SPIN, an extensible operating system. SPIN provides an extension infrastructure, together with a core set of extensible services, that allow applications to safely change the operating system's interface and implementation. Extensions allow an application to specialize the underlying operating system in order to achieve a particular level of performance and functionality. SPIN uses language and link-time mechanisms to inexpensively export fine-grained interfaces to operating system services. Extensions are written in a type safe language, and are dynamically linked into the operating system kernel. This approach offers extensions rapid access to system services, while protecting the operating system code executing within the kernel address space. SPIN and its extensions are written in Modula-3 and run on DEC Alpha workstations.

1 Introduction

SPIN is an operating system that can be dynamically specialized to safely meet the performance and functionality requirements of applications. SPIN is motivated by the need to support applications that present demands poorly matched by an operating system's implementation or interface. A poorly matched implementation prevents an application from working well, while a poorly matched interface prevents it from working at all. For example, the implementations of disk buffering and paging algorithms found in modern operating systems can be inappropriate for database applications, resulting in poor performance [Stonebraker 81]. General purpose network protocol implementations are frequently inadequate for supporting the demands of high performance parallel applications [von Eicken et al. 92]. Other applications, such as multimedia clients and servers, and realtime and fault tolerant programs, can also present demands that poorly match operating system services. Using SPIN, an application can extend the operating system's interfaces and implementations to provide a better match between the needs of the application and the performance and functional characteristics of the system.

1.1 Goals and approach

The goal of our research is to build a general purpose operating system that provides extensibility, safety and good performance. Extensibility is determined by the interfaces to services and resources that are exported to applications; it depends on an infrastructure that allows fine-grained access to system services.
Safety determines the exposure of applications to the actions of others, and requires that access be controlled at the same granularity at which extensions are defined. Finally, good performance requires low overhead communication between an extension and the system.

The design of SPIN reflects our view that an operating system can be extensible, safe, and fast through the use of language and runtime services that provide low-cost, fine-grained, protected access to operating system resources. Specifically, the SPIN operating system relies on four techniques implemented at the level of the language or its runtime:

• Co-location. Operating system extensions are dynamically linked into the kernel virtual address space. Co-location enables communication between system and extension code to have low cost.

• Enforced modularity. Extensions are written in Modula-3 [Nelson 91], a modular programming language for which the compiler enforces interface boundaries between modules. Extensions, which execute in the kernel's virtual address space, cannot access memory or execute privileged instructions unless they have been given explicit access through an interface. Modularity enforced by the compiler enables modules to be isolated from one another with low cost.

• Logical protection domains. Extensions exist within logical protection domains, which are kernel namespaces that contain code and exported interfaces. Interfaces, which are language-level units, represent views on system resources that are protected by the operating system. An in-kernel dynamic linker resolves code in separate logical protection domains at runtime, enabling cross-domain communication to occur with the overhead of a procedure call.

• Dynamic call binding. Extensions execute in response to system events. An event can describe any potential action in the system, such as a virtual memory page fault or the scheduling of a thread. Events are declared within interfaces, and can be dispatched with the overhead of a procedure call.

Co-location, enforced modularity, logical protection domains, and dynamic call binding enable interfaces to be defined and safely accessed with low overhead. However, these techniques do not guarantee the system's extensibility. Ultimately, extensibility is achieved through the system service interfaces themselves, which define the set of resources and operations that are exported to applications. SPIN provides a set of interfaces to core system services, such as memory management and scheduling, that rely on co-location to efficiently export fine-grained operations, enforced modularity and logical protection domains to manage protection, and dynamic call binding to define relationships between system components and extensions at runtime.

1.2 System overview

The SPIN operating system consists of a set of extension services and core system services that execute within the kernel's virtual address space. Extensions can be loaded into the kernel at any time. Once loaded, they integrate themselves into the existing infrastructure and provide system services specific to the applications that require them. SPIN is primarily written in Modula-3, which allows extensions to directly use system interfaces without requiring runtime conversion when communicating with other system code.

Although SPIN relies on language features to ensure safety within the kernel, applications can be written in any language and execute within their own virtual address space.
Only code that requires low-latency access to system services is written in the system's safe extension language. For example, we have used SPIN to implement a UNIX operating system server. The bulk of the server is written in C, and executes within its own address space (as do applications). The server consists of a large body of code that implements the DEC OSF/1 system call interface, and a small number of SPIN extensions that provide the thread, virtual memory, and device interfaces required by the server.

We have also used extensions to specialize SPIN to the needs of individual application programs. For example, we have built a client/server video system that requires few control and data transfers as images move from the server's disk to the client's screen. Using SPIN, the server defines an extension that implements a direct stream between the disk and the network. The client viewer application installs an extension into the kernel that decompresses incoming network video packets and displays them to the video frame buffer.

1.3 The rest of this paper

The rest of this paper describes the motivation, design, and performance of SPIN. In the next section we motivate the need for extensible operating systems and discuss related work. In Section 3 we describe the system's architecture in terms of its protection and extension facilities. In Section 4 we describe the core services provided by the system. In Section 5 we discuss the system's performance and compare it against that of several other operating systems. In Section 6 we discuss our experiences writing an operating system in Modula-3. Finally, in Section 7 we present our conclusions.

2 Motivation

Most operating systems are forced to balance generality and specialization. A general system runs many programs, but may run few well. In contrast, a specialized system may run few programs, but runs them all well. In practice, most general systems can, with some effort, be specialized to address the performance and functional requirements of a particular application's needs, such as interprocess communication, synchronization, thread management, networking, virtual memory and cache management [Draves et al. 91, Bershad et al. 92b, Stodolsky et al. 93, Bershad 93, Yuhara et al. 94, Maeda & Bershad 93, Felten 92, Young et al. 87, Harty & Cheriton 91, McNamee & Armstrong 90, Anderson et al. 92, Fall & Pasquale 94, Wheeler & Bershad 92, Romer et al. 94, Romer et al. 95, Cao et al. 94]. Unfortunately, existing system structures are not well-suited for specialization, often requiring a substantial programming effort to effect even a small change in system behavior. Moreover, changes intended to improve the performance of one class of applications can often degrade that of others. As a result, system specialization is a costly and error-prone process.

An extensible system is one that can be changed dynamically to meet the needs of an application. The need for extensibility in operating systems is shown clearly by systems such as MS-DOS, Windows, or the Macintosh Operating System. Although these systems were not designed to be extensible, their weak protection mechanisms have allowed application programmers to directly modify operating system data structures and code [Schulman et al. 92]. While individual applications have benefited from this level of freedom, the lack of safe interfaces to either operating system services or operating system extension services has created system configuration "chaos" [Draves 93].
2.1 Related work

Previous efforts to build extensible systems have demonstrated the three-way tension between extensibility, safety and performance. For example, Hydra [Wulf et al. 81] defined an infrastructure that allowed applications to manage resources through multi-level policies. The kernel defined the mechanism for allocating resources between processes, and the processes themselves implemented the policies for managing those resources. Hydra's architecture, although highly influential, had high overhead due to its weighty capability-based protection mechanism. Consequently, the system was designed with "large objects" as the basic building blocks, requiring a large programming effort to effect even a small extension.

Researchers have recently investigated the use of microkernels as a vehicle for building extensible systems [Black et al. 92, Mullender et al. 90, Cheriton & Zwaenepoel 83, Cheriton & Duda 94, Thacker et al. 88]. A microkernel typically exports a small number of abstractions that include threads, address spaces, and communication channels. These abstractions can be combined to support more conventional operating system services implemented as user-level programs. Application-specific extensions in a microkernel occur at or above the level of the kernel's interfaces. Unfortunately, applications often require substantial changes to a microkernel's implementation to compensate for limitations in interfaces [Lee et al. 94, Davis et al. 93, Waldspurger & Weihl 94].

Although a microkernel's communication facilities provide the infrastructure for extending nearly any kernel service [Barrera 91, Abrossimov et al. 89, Forin et al. 91], few have been so extended. We believe this is because of high communication overhead [Bershad et al. 90, Draves et al. 91, Chen & Bershad 93], which limits extensions mostly to coarse-grained services [Golub et al. 90, Stevenson & Julin 95, Bricker et al. 91]. Otherwise, protected interaction between system components, which occurs frequently in a system with fine-grained extensions, can be a limiting performance factor.

Although the performance of cross-domain communication has improved substantially in recent years [Hamilton & Kougiouris 93, Hildebrand 92, Engler et al. 95], it still does not approach that of a procedure call, encouraging the construction of monolithic, non-extensible systems. For example, the L3 microkernel, even with its aggressive design, has a protected procedure call implementation with overhead of nearly 100 procedure call times [Liedtke 92, Liedtke 93, Int 90]. As a point of comparison, the Intel 432 [Int 81], which provided hardware support for protected cross-domain transfer, had a cross-domain communication overhead on the order of about 10 procedure call times [Colwell 85], and was generally considered unacceptable.

Some systems rely on "little languages" to safely extend the operating system interface through the use of interpreted code that runs in the kernel [Lee et al. 94, Mogul et al. 87, Yuhara et al. 94]. These systems suffer from three problems. First, the languages, being little, make the expression of arbitrary control and data structures cumbersome, and therefore limit the range of possible extensions. Second, the interface between the language's programming environment and the rest of the system is generally narrow, making system integration difficult. Finally, interpretation overhead can limit performance.
Many systems provide interfaces that enable arbitrary code to be installed into the kernel at runtime [Heidemann & Popek 94, Rozier et al. 88]. In these systems the right to define extensions is restricted because any extension can bring down the entire system; application-specific extensibility is not possible.

Several projects [Lucco 94, Engler et al. 95, Small & Seltzer 94] are exploring the use of software fault isolation [Wahbe et al. 93] to safely link application code, written in any language, into the kernel's virtual address space. Software fault isolation relies on a binary rewriting tool that inserts explicit checks on memory references and branch instructions. These checks allow the system to define protected memory segments without relying on virtual memory hardware. Software fault isolation shows promise as a co-location mechanism for relatively isolated code and data segments. It is unclear, though, if the mechanism is appropriate for a system with fine-grained sharing, where extensions may access a large number of segments. In addition, software fault isolation is only a protection mechanism and does not define an extension model or the service interfaces that determine the degree to which a system can be extended.

Aegis [Engler et al. 95] is an operating system that relies on efficient trap redirection to export hardware services, such as exception handling and TLB management, directly to applications. The system itself defines no abstractions beyond those minimally provided by the hardware [Engler & Kaashoek 95]. Instead, conventional operating system services, such as virtual memory and scheduling, are implemented as libraries executing in an application's address space. System service code executing in a library can be changed by the application according to its needs. SPIN shares many of the same goals as Aegis although its approach is quite different. SPIN uses language facilities to protect the kernel from extensions and implements protected communication using procedure call. Using this infrastructure, SPIN provides an extension model and a core set of extensible services. In contrast, Aegis relies on hardware protected system calls to isolate extensions from the kernel and leaves unspecified the manner by which those extensions are defined or applied.

Several systems [Cooper et al. 91, Redell et al. 80, Mossenbock 94, Organick 73], like SPIN, have relied on language features to extend operating system services. Pilot, for instance, was a single-address space system that ran programs written in Mesa [Geschke et al. 77], an ancestor of Modula-3. In general, systems such as Pilot have depended on the language for all protection in the system, not just for the protection of the operating system and its extensions. In contrast, SPIN's reliance on language services applies only to extension code within the kernel. Virtual address spaces are used to otherwise isolate the operating system and programs from one another.

3 The SPIN Architecture

The SPIN architecture provides a software infrastructure for safely combining system and application code. The protection model supports efficient, fine-grained access control of resources, while the extension model enables extensions to be defined at the granularity of a procedure call. The system's architecture is biased towards mechanisms that can be implemented with low cost on conventional processors.
Consequently, SPIN makes few demands of the hardware, and instead relies on language-level services, such as static typechecking and dynamic linking.

Relevant properties of Modula-3

SPIN and its extensions are written in Modula-3, a general purpose programming language designed in the early 1990's. The key features of the language include support for interfaces, type safety, automatic storage management, objects, generic interfaces, threads, and exceptions. We rely on the language's support for objects, generic interfaces, threads, and exceptions for aesthetic reasons only; we find that these features simplify the task of constructing a large system.

The design of SPIN depends only on the language's safety and encapsulation mechanisms; specifically interfaces, type safety, and automatic storage management. An interface declares the visible parts of an implementation module, which defines the items listed in the interface. All other definitions within the implementation module are hidden. The compiler enforces this restriction at compile-time. Type safety prevents code from accessing memory arbitrarily. A pointer may only refer to objects of its referent's type, and array indexing operations must be checked for bounds violation. The first restriction is enforced at compile-time, and the second is enforced through a combination of compile-time and run-time checks. Automatic storage management prevents memory used by a live pointer's referent from being returned to the heap and reused for an object of a different type.

3.1 The protection model

A protection model controls the set of operations that can be applied to resources. For example, a protection model based on address spaces ensures that a process can only access memory within a particular range of virtual addresses. Address spaces, though, are frequently inadequate for the fine-grained protection and management of resources, being expensive to create and slow to access [Lazowska et al. 81].

Capabilities

All kernel resources in SPIN are referenced by capabilities. A capability is an unforgeable reference to a resource which can be a system object, an interface, or a collection of interfaces. An example of each of these is a physical page, a physical page allocation interface, and the entire virtual memory system. Individual resources are protected to ensure that extensions reference only the resources to which they have been given access. Interfaces and collections of interfaces are protected to allow different extensions to have different views on the set of available services.

Unlike other operating systems based on capabilities, which rely on special-purpose hardware [Carter et al. 94], virtual memory mechanisms [Wulf et al. 81], probabilistic protection [Engler et al. 94], or protected message channels [Black et al. 92], SPIN implements capabilities directly using pointers, which are supported by the language. A pointer is a reference to a block of memory whose type is declared within an interface. Figure 1 demonstrates the definition and use of interfaces and capabilities (pointers) in SPIN.

The compiler, at compile-time, prevents a pointer from being forged or dereferenced in a way inconsistent with its type. There is no run-time overhead for using a pointer, passing it across an interface, or dereferencing it, other than the overhead of going to memory to access the pointer or its referent.

A pointer can be passed from the kernel to a user-level application, which cannot be assumed to be type safe, as an externalized reference. An externalized reference is an index into a per-application table that contains type safe references to in-kernel data structures. The references can later be recovered using the index. Kernel services that intend to pass a reference out to user level externalize the reference through this table and instead pass out the index.
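The following Modula-3 sketch suggests how such an externalization table might look. The ExternRef interface, its procedure names, and the fixed table size are our own illustration, not part of SPIN's published interfaces, and the per-application aspect of the table is elided.

INTERFACE ExternRef;
TYPE Index = CARDINAL;
PROCEDURE Externalize(ref: REFANY): Index;
(* Record an in-kernel reference; the returned index is what crosses
   to user level. *)
PROCEDURE Recover(i: Index): REFANY;
(* Map an index received from user level back to the type safe
   in-kernel reference. *)
END ExternRef.

MODULE ExternRef;
VAR
  table: ARRAY [0..1023] OF REFANY;  (* fixed size for the sketch; a real
                                        table would grow and recycle slots *)
  next: CARDINAL := 0;

PROCEDURE Externalize(ref: REFANY): Index =
BEGIN
  table[next] := ref;    (* the table, not user space, holds the pointer *)
  INC(next);
  RETURN next - 1;
END Externalize;

PROCEDURE Recover(i: Index): REFANY =
BEGIN
  RETURN table[i];       (* out-of-range indices fail the bounds check *)
END Recover;

BEGIN
END ExternRef.

Because user level holds only an index, type safety in the kernel is preserved: a corrupted or forged index can at worst name a wrong entry in the application's own table.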
Protection domains

A protection domain defines the set of accessible names available to an execution context. In a conventional operating system, a protection domain is implemented using virtual address spaces. A name within one domain, a virtual address, has no relationship to that same name in another domain. Only through explicit mapping and sharing operations is it possible for names to become meaningful between protection domains.

INTERFACE Console;              (* An interface. *)
TYPE T <: REFANY;               (* Read as "Console.T is opaque." *)
CONST InterfaceName = "ConsoleService";  (* A global name *)

PROCEDURE Open(): T;
(* Open returns a capability for the console. *)
PROCEDURE Write(t: T; msg: TEXT);
PROCEDURE Read(t: T; VAR msg: TEXT);
PROCEDURE Close(t: T);
END Console.

MODULE Console;                 (* An implementation module. *)

(* The implementation of Console.T *)
TYPE Buf = ARRAY [0..31] OF CHAR;
REVEAL T = BRANDED REF RECORD   (* T is a pointer *)
  inputQ: Buf;                  (* to a record *)
  outputQ: Buf;
  (* device specific info *)
END;

(* Implementations of interface functions
   have direct access to the revealed type. *)
PROCEDURE Open(): T = ...

END Console.

MODULE Gatekeeper;              (* A client *)
IMPORT Console;

VAR c: Console.T;               (* A capability for the console device *)

PROCEDURE IntruderAlert() =
BEGIN
  c := Console.Open();
  Console.Write(c, "Intruder Alert");
  Console.Close(c);
END IntruderAlert;

BEGIN
END Gatekeeper.

Figure 1: The Gatekeeper module interacts with SPIN's Console service through the Console interface. Although Gatekeeper.IntruderAlert manipulates objects of type Console.T, it is unable to access the fields within the object, even though it executes within the same virtual address space as the Console module.

In SPIN the naming and protection interface is at the level of the language, not of the virtual memory system. Consequently, namespace management must occur at the language level. For example, if the name c is an instance of the type Console.T, then both c and Console.T occupy a portion of some symbolic namespace. An extension that redefines the type Console.T, creates an instance of the new type, and passes it to a module expecting a Console.T of the original type creates a type conflict that results in an error. The error could be avoided by placing all extensions into a global module space, but since modules, procedures, and variable names are visible to programmers, we felt that this would introduce an overly restrictive programming model for the system. Instead, SPIN provides facilities for creating, coordinating, and linking program-level namespaces in the context of protection domains.

INTERFACE Domain;
TYPE T <: REFANY;               (* Domain.T is opaque *)

PROCEDURE Create(coff: CoffFile.T): T;
(* Returns a domain created from the specified object file
   ("coff" is a standard object file format). *)

PROCEDURE CreateFromModule(): T;
(* Create a domain containing interfaces defined by the calling
   module. This function allows modules to name and export
   themselves at runtime. *)

PROCEDURE Resolve(source, target: T);
(* Resolve any undefined symbols in the target domain against any
   exported symbols from the source. *)

PROCEDURE Combine(d1, d2: T): T;
(* Create a new aggregate domain that exports the interfaces of the
   given domains. *)
END Domain.

Figure 2: The Domain interface. This interface operates on instances of type Domain.T, which are described by type safe pointers. The implementation of the Domain interface is unsafe with respect to Modula-3 memory semantics, as it must manipulate linker symbols and program addresses directly.
A SPIN protection domain defines a set of names, or program symbols, that can be referenced by code with access to the domain. A domain, named by a capability, is used to control dynamic linking, and corresponds to one or more safe object files with one or more exported interfaces. An object file is safe if it is unknown to the kernel but has been signed by the Modula-3 compiler, or if the kernel can otherwise assert the object file to be safe. For example, SPIN's lowest level device interface is identical to the DEC OSF/1 driver interface [Dig 93], allowing us to dynamically link vendor drivers into the kernel. Although the drivers are written in C, the kernel asserts their safety. In general, we prefer to avoid using object files that are "safe by assertion" rather than by compiler verification, as they tend to be the source of more than their fair share of bugs.

Domains can be intersecting or disjoint, enabling applications to share services or define new ones. A domain is created using the Create operation, which initializes a domain with the contents of a safe object file. Any symbols exported by interfaces defined in the object file are exported from the domain, and any imported symbols are left unresolved. Unresolved symbols correspond to interfaces imported by code within the domain for which implementations have not yet been found.

The Resolve operation serves as the basis for dynamic linking. It takes a target and a source domain, and resolves any unresolved symbols in the target domain against symbols exported from the source. During resolution, text and data symbols are patched in the target domain, ensuring that, once resolved, domains are able to share resources at memory speed. Resolution only resolves the target domain's undefined symbols; it does not cause additional symbols to be exported. Cross-linking, a common idiom, occurs through a pair of Resolve operations.

The Combine operation creates linkable namespaces that are the union of existing domains, and can be used to bind together collections of related interfaces. For example, the domain SpinPublic combines the system's public interfaces into a single domain available to extensions. Figure 2 summarizes the major operations on domains.

The domain interface is commonly used to import or export particular named interfaces. A module that exports an interface explicitly creates a domain for its interface, and exports the domain through an in-kernel nameserver. The exported name of the interface, which can be specified within the interface, is used to coordinate the export and import as in many RPC systems [Schroeder & Burrows 90, Brockschmidt 94]. The constant Console.InterfaceName in Figure 1 defines a name that exporters and importers can use to uniquely identify a particular version of a service.
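As a concrete illustration, the sketch below shows how the Console module of Figure 1 might export itself, and how an extension might be loaded and cross-linked against the public interfaces. The NameServer interface and the Loader module are hypothetical names we introduce for this example; only Domain, CoffFile, and Console come from the paper.

MODULE ConsoleExport;
IMPORT Domain, NameServer, Console;
BEGIN
  (* Export this module's interfaces as a domain, registered under the
     agreed-upon name from the Console interface of Figure 1. *)
  NameServer.Register(Console.InterfaceName, Domain.CreateFromModule());
END ConsoleExport.

MODULE Loader;
IMPORT Domain, CoffFile;

PROCEDURE LoadExtension(obj: CoffFile.T; public: Domain.T): Domain.T =
VAR ext: Domain.T;
BEGIN
  ext := Domain.Create(obj);    (* a namespace from a compiler-signed object file *)
  Domain.Resolve(public, ext);  (* patch ext's imports against the public exports *)
  Domain.Resolve(ext, public);  (* second Resolve of the cross-linking idiom *)
  RETURN ext;
END LoadExtension;

BEGIN
END Loader.

Once Resolve has patched the symbols, calls between the extension and the public interfaces proceed at procedure call speed.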
Some interfaces, such as those for devices, restrict access at the time of the import. An exporter can register an authorization procedure with the nameserver that will be called with the identity of the importer whenever the interface is imported. This fine-grained control has low cost because the importer, exporter, and authorizer interact through direct procedure calls.
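A sketch of such an authorizer follows. The Identity interface, the service name, and NameServer.RegisterAuthorizer are illustrative names of ours; the paper does not spell out the nameserver's interface.

MODULE DiskExport;
IMPORT Domain, NameServer, Identity;

PROCEDURE AllowImport(who: Identity.T): BOOLEAN =
BEGIN
  (* Called by the nameserver with the importer's identity each time
     the interface is imported; untrusted importers are rejected. *)
  RETURN Identity.IsTrusted(who);
END AllowImport;

BEGIN
  NameServer.Register("RawDiskService", Domain.CreateFromModule());
  NameServer.RegisterAuthorizer("RawDiskService", AllowImport);
END DiskExport.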
3.2 The extension model

An extension changes the way in which a system provides service. All software is extensible in one way or another, but it is the extension model that determines the ease, transparency, and efficiency with which an extension can be applied. SPIN's extension model provides a controlled communication facility between extensions and the base system, while allowing for a variety of interaction styles. For example, the model allows extensions to passively monitor system activity, and provide up-to-date performance information to applications. Other extensions may offer hints to the system to guide certain operations, such as page replacement. In other cases, an extension may entirely replace an existing system service, such as a scheduler, with a new one more appropriate to a specific application.

Extensions in SPIN are defined in terms of events and handlers. An event is a message that announces a change in the state of the system or a request for service. An event handler is a procedure that receives the message. An extension installs a handler on an event by explicitly registering the handler with the event through a central dispatcher that routes events to handlers. Event names are protected by the domain machinery described in the previous section. An event is defined as a procedure exported from an interface and its handlers are defined as procedures having the same type. A handler is invoked with the arguments specified by the event raiser. (The dispatcher also allows a handler to specify an additional closure to be passed to the handler during event processing. The closure allows a single handler to be used within more than one context.) The kernel is preemptive, ensuring that a handler cannot take over the processor.

The right to call a procedure is equivalent to the right to raise the event named by the procedure. In fact, the two are indistinguishable in SPIN, and any procedure exported by an interface is also an event. The dispatcher exploits this similarity to optimize event raise as a direct procedure call where there is only one handler for a given event. Otherwise, the dispatcher uses dynamic code generation [Engler & Proebsting 94] to construct optimized call paths from the raiser to the handlers.

The primary right to handle an event is restricted to the default implementation module for the event, which is the module that statically exports the procedure named by the event. For example, the module Console is the default implementation module for the event Console.Open() shown in Figure 1. Other modules may request that the dispatcher install additional handlers or even remove the primary handler. For each request, the dispatcher contacts the primary implementation module, passing the event name provided by the installer. The implementation module can deny or allow the installation. If denied, the installation fails. If allowed, the implementation module can provide a guard to be associated with the handler. The guard defines a predicate, expressed as a procedure, that is evaluated by the dispatcher prior to the handler's invocation. If the predicate is true when the event is raised, then the handler is invoked; otherwise the handler is ignored. Guards are used to restrict access to events at a granularity finer than the event name, allowing events to be dispatched on a per-instance basis.

For example, the SPIN extension that implements IP layer processing defines the event IP.PacketArrived(pkt: IP.Packet), which it raises whenever an IP packet is received. The IP module, which defines the default implementation of the PacketArrived event, upon each installation constructs a guard that compares the type field in the header of the incoming packet against the set of IP protocol types that the handler may service. In this way, IP does not have to export a separate interface for each event instance. A handler can stack additional guards on an event, further constraining its invocation.

There may be any number of handlers installed on a particular event. The default implementation module may constrain a handler to execute synchronously or asynchronously, in bounded time, or in some arbitrary order with respect to other handlers for the same event. Each of these constraints reflects a different degree of trust between the default implementation and the handler. For example, a handler may be bounded by a time quantum so that it is aborted if it executes too long. A handler may be asynchronous, which causes it to execute in a separate thread from the raiser, isolating the raiser from handler latency. When multiple handlers execute in response to an event, a single result can be communicated back to the raiser by associating with each event a procedure that ultimately determines the final result [Pardyak & Bershad 94]. By default, the dispatcher mimics procedure call semantics, and executes handlers synchronously, to completion, in undefined order, and returns the result of the final handler executed.
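The sketch below suggests what an extension handling only UDP packets might look like under this model. Dispatcher.Install, the guard's placement in the call, and the IP.ProtocolOf accessor are our own illustration; in SPIN it is the IP module itself, as default implementation module, that constructs the guard when it approves the installation.

MODULE UDPService;
IMPORT IP, Dispatcher;

CONST UDPProtocol = 17;   (* the IP protocol number for UDP *)

PROCEDURE IsUDP(pkt: IP.Packet): BOOLEAN =
BEGIN
  (* Guard: a predicate the dispatcher evaluates before invoking
     the handler; false means the handler is skipped. *)
  RETURN IP.ProtocolOf(pkt) = UDPProtocol;
END IsUDP;

PROCEDURE Arrived(pkt: IP.Packet) =
BEGIN
  (* Handler: a procedure of the same type as the IP.PacketArrived
     event; UDP processing happens here, entirely within the kernel. *)
END Arrived;

BEGIN
  (* The dispatcher forwards this request to IP, the default
     implementation module, which may deny it or attach the guard. *)
  Dispatcher.Install(IP.PacketArrived, Arrived, IsUDP);
END UDPService.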
4 The core services

The SPIN protection and extension mechanisms described in the previous section provide a framework for managing interfaces between services within the kernel. Applications, though, are ultimately concerned with manipulating resources such as memory and the processor. Consequently, SPIN provides a set of core services that manage memory and processor resources. These services, which use events to communicate between the system and extensions, export interfaces with fine-grained operations. In general, the service interfaces that are exported to extensions within the kernel are similar to the secondary internal interfaces found in conventional operating systems; they provide simple functionality over a small set of objects. In SPIN it is straightforward to allocate a single virtual page, a physical page, and then create a mapping between the two. Because the overhead of accessing each of these operations is low (a procedure call), it is feasible to provide them as interfaces to separate abstractions, and to build up higher level abstractions through direct composition. By contrast, traditional operating systems aggregate simpler abstractions into more complex ones, because the cost of repeated access to the simpler abstractions is too high.

4.1 Extensible memory management

A memory management system is responsible for the allocation of virtual addresses, physical addresses, and mappings between the two. Other systems have demonstrated significant performance improvements from specialized or "tuned" memory management policies that are accessible through interfaces exposed by the memory management system. Some of these interfaces have made it possible to manipulate large objects, such as entire address spaces [Young et al. 87, Khalidi & Nelson 93], or to direct expensive operations, for example page-out [Harty & Cheriton 91, McNamee & Armstrong 90], entirely from user level. Others have enabled control over relatively small objects, such as cache pages [Romer et al. 94] or TLB entries [Bala et al. 94], entirely from the kernel. None have allowed for fast, fine-grained control over the physical and virtual memory resources required by applications. SPIN's virtual memory system provides such control, and is enabled by the system's low-overhead invocation and protection services.

The SPIN memory management interface decomposes memory services into three basic components: physical storage, naming, and translation. These correspond to the basic memory resources exported by processors, namely physical addresses, virtual addresses, and translations. Application-specific services interact with these three services to define higher level virtual memory abstractions, such as address spaces.

Each of the three basic components of the memory system is provided by a separate service interface, described in Figure 3. The physical address service controls the use and allocation of physical pages. Clients raise the Allocate event to request physical memory with a certain size and an optional series of attributes that reflect preferences for machine specific parameters such as color or contiguity. A physical page represents a unit of high speed storage. It is not, for most purposes, a nameable entity and may not be addressed directly from an extension or a user program. Instead, clients of the physical address service receive a capability for the memory. The virtual address service allocates capabilities for virtual addresses, where the capability's referent is composed of a virtual address, a length, and an address space identifier that makes the address unique. The translation service is used to express the relationship between virtual addresses and physical memory. This service interprets references to both virtual and physical addresses, constructs mappings between the two, and installs the mappings into the processor's memory management unit (MMU).

The translation service raises a set of events that correspond to various exceptional MMU conditions. For example, if a user program attempts to access an unallocated virtual memory address, the Translation.BadAddress event is raised. If it accesses an allocated, but unmapped virtual page, then the Translation.PageNotPresent event is raised. Implementors of higher level memory management abstractions can use these events to define services, such as demand paging, copy-on-write [Rashid et al. 87], distributed shared memory [Carter et al. 91], or concurrent garbage collection [Appel & Li 91].

The physical page service may at any time reclaim physical memory by raising the PhysAddr.Reclaim event. The interface allows the handler for this event to volunteer an alternative page, which may be of less importance than the candidate page. The translation service ultimately invalidates any mappings to a reclaimed page.

INTERFACE PhysAddr;
TYPE T <: REFANY;               (* PhysAddr.T is opaque *)

PROCEDURE Allocate(size: Size; attrib: Attrib): T;
(* Allocate some physical memory with particular attributes. *)

PROCEDURE Deallocate(p: T);

PROCEDURE Reclaim(candidate: T): T;
(* Request to reclaim a candidate page. Clients may handle this
   event to nominate alternative candidates. *)
END PhysAddr.
INTERFACE VirtAddr;
TYPE T <: REFANY;               (* VirtAddr.T is opaque *)

PROCEDURE Allocate(size: Size; attrib: Attrib): T;
PROCEDURE Deallocate(v: T);
END VirtAddr.

INTERFACE Translation;
IMPORT PhysAddr, VirtAddr;
TYPE T <: REFANY;               (* Translation.T is opaque *)

PROCEDURE Create(): T;
PROCEDURE Destroy(context: T);
(* Create or destroy an addressing context *)

PROCEDURE AddMapping(context: T; v: VirtAddr.T;
                     p: PhysAddr.T; prot: Protection);
(* Add [v,p] into the named translation context with the
   specified protection. *)

PROCEDURE RemoveMapping(context: T; v: VirtAddr.T);
PROCEDURE ExamineMapping(context: T; v: VirtAddr.T): Protection;

(* A few events raised during illegal translations *)
PROCEDURE PageNotPresent(v: T);
PROCEDURE BadAddress(v: T);
PROCEDURE ProtectionFault(v: T);
END Translation.

Figure 3: The interfaces for managing physical addresses, virtual addresses, and translations.

The SPIN core services do not define an address space model directly, but can be used to implement a range of models using a variety of optimization techniques. For example, we have built an extension that implements UNIX address space semantics for applications. It exports an interface for copying an existing address space, and for allocating additional memory within one. For each new address space, the extension allocates a new context from the translation service. This context is subsequently filled in with virtual and physical address resources obtained from the memory allocation services. Another kernel extension defines a memory management interface supporting Mach's task abstraction [Young et al. 87]. Applications may use these interfaces, or they may define their own in terms of the lower-level services.
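To make the composition concrete, the fragment below sketches how an extension might allocate one virtual page and one physical page and bind them, using only the Figure 3 interfaces. We assume the Size, Attrib, and Protection types come from a shared interface that Figure 3 does not show; the module and procedure names are ours.

MODULE MapOnePage;
IMPORT PhysAddr, VirtAddr, Translation;
(* Size, Attrib, and Protection are assumed to come from a shared
   interface elided here. *)

PROCEDURE Map(context: Translation.T;
              pageSize: Size;
              attrib: Attrib;
              readWrite: Protection): VirtAddr.T =
VAR
  p: PhysAddr.T;
  v: VirtAddr.T;
BEGIN
  p := PhysAddr.Allocate(pageSize, attrib);   (* capability for a frame *)
  v := VirtAddr.Allocate(pageSize, attrib);   (* capability for a region of names *)
  Translation.AddMapping(context, v, p, readWrite);  (* install [v,p] in the MMU *)
  RETURN v;
END Map;

BEGIN
END MapOnePage.

Higher level abstractions, such as the UNIX address space extension described above, are built by iterating exactly this composition over larger regions.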
4.2 Extensible thread management

An operating system's thread management system provides applications with interfaces for scheduling, concurrency, and synchronization. Applications, though, can require levels of functionality and performance that a thread management system is unable to deliver. User-level thread management systems have addressed this mismatch [Wulf et al. 81, Cooper & Draves 88, Marsh et al. 91, Anderson et al. 92], but only partially. For example, Mach's user-level C-Threads implementation [Cooper & Draves 88] can have anomalous behavior because it is not well-integrated with kernel services [Anderson et al. 92]. In contrast, scheduler activations, which are integrated with the kernel, have high communication overhead [Davis et al. 93].

In SPIN an application can provide its own thread package and scheduler that executes within the kernel. The thread package defines the application's execution model and synchronization constructs. The scheduler controls the multiplexing of the processor across multiple threads. Together these packages allow an application to define arbitrary thread semantics and to implement those semantics close to the processor and other kernel services.

Although SPIN does not define a thread model for applications, it does define the structure on which an implementation of a thread model rests. This structure is defined by a set of events that are raised or handled by schedulers and thread packages. A scheduler multiplexes the underlying processing resources among competing contexts, called strands. A strand is similar to a thread in traditional operating systems in that it reflects some processor context. Unlike a thread, though, a strand has no minimal or requisite kernel state other than a name. An application-specific thread package defines an implementation of the strand interface for its own threads.

Together, the thread package and the scheduler implement the control flow mechanisms for user-space contexts. Figure 4 describes this interface. The interface contains two events, Block and Unblock, that can be raised to signal changes in a strand's execution state. A disk driver can direct a scheduler to block the current strand during an I/O operation, and an interrupt handler can unblock a strand to signal the completion of the I/O operation. In response to these events, the scheduler can communicate with the thread package managing the strand using Checkpoint and Resume events, allowing the package to save and restore execution state.

INTERFACE Strand;
TYPE T <: REFANY;               (* Strand.T is opaque *)

PROCEDURE Block(s: T);
(* Signal to a scheduler that s is not runnable. *)

PROCEDURE Unblock(s: T);
(* Signal to a scheduler that s is runnable. *)

PROCEDURE Checkpoint(s: T);
(* Signal that s is being descheduled and that it should save any
   processor state required for subsequent rescheduling. *)

PROCEDURE Resume(s: T);
(* Signal that s is being placed on a processor and that it should
   reestablish any state saved during a prior call to Checkpoint. *)
END Strand.

Figure 4: The Strand interface. This interface describes the scheduling events affecting control flow that can be raised within the kernel. Application-specific schedulers and thread packages install handlers on these events, which are raised on behalf of particular strands. A trusted thread package and scheduler provide default implementations of these operations, and ensure that extensions do not install handlers on strands for which they do not possess a capability.

Application-specific thread packages only manipulate the flow of control for application threads executing outside of the kernel. For safety reasons, the responsibility for scheduling and synchronization within the kernel belongs to the kernel. As a thread transfers from user mode to kernel mode, it is checkpointed and a Modula-3 thread executes in the kernel on its behalf. As the Modula-3 thread leaves the kernel, the blocked application-specific thread is resumed.

A global scheduler implements the primary processor allocation policy between strands. Additional application-specific schedulers can be placed on top of the global scheduler using Checkpoint and Resume events to relinquish or receive control of the processor. That is, an application-specific scheduler presents itself to the global scheduler as a thread package. The delivery of the Resume event indicates that the new scheduler can schedule its own strands, while Checkpoint signals that the processor is being reclaimed by the global scheduler.

The Block and Unblock events, when raised on strands scheduled by application-specific schedulers, are routed by the dispatcher to the appropriate scheduling implementation. This allows new scheduling policies to be implemented and integrated into the kernel, provided that an application-specific policy does not conflict with the global policy. While the global scheduling policy is replaceable, it cannot be replaced by an arbitrary application, and its replacement can have global effects. In the current implementation, the global scheduler implements a round-robin, preemptive, priority policy.
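The outline below suggests how an application-specific thread package might install handlers on the Figure 4 events for its own strands. Dispatcher.Install and the two-argument form without a guard are illustrative; the per-strand capability check and guard construction performed by the trusted scheduler are elided.

MODULE MyThreads;
IMPORT Strand, Dispatcher;

TYPE MyStrand = Strand.T;   (* this package's implementation of a strand *)

PROCEDURE Checkpoint(s: MyStrand) =
BEGIN
  (* Save the user-visible register state of s so the global
     scheduler can reclaim the processor. *)
END Checkpoint;

PROCEDURE Resume(s: MyStrand) =
BEGIN
  (* Restore the state saved by Checkpoint and continue running s. *)
END Resume;

BEGIN
  (* The trusted scheduler verifies that this package holds a
     capability for the strands these handlers are installed on. *)
  Dispatcher.Install(Strand.Checkpoint, Checkpoint);
  Dispatcher.Install(Strand.Resume, Resume);
END MyThreads.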
We have used the strand interface to implement as kernel extensions a variety of thread management interfaces, including DEC OSF/1 kernel threads [Dig 93], C-Threads [Cooper & Draves 88], and Modula-3 threads. The implementations of these interfaces are built directly from strands and not layered on top of others. The interface supporting DEC OSF/1 kernel threads allows us to incorporate the vendor's device drivers directly into the kernel. The C-Threads implementation supports our UNIX server, which uses the Mach C-Threads interface for concurrency. Within the kernel, a trusted thread package and scheduler implements the Modula-3 thread interface [Nelson 91].

4.3 Implications for trusted services

The processor and memory services are two instances of SPIN's core services, which provide interfaces to hardware mechanisms. The core services are trusted, which means that they must perform according to their interface specification. Trust is required because the services access underlying hardware facilities and at times must step outside the protection model enforced by the language. Without trust, the protection and extension mechanisms described in the previous section could not function safely, as they rely on the proper management of the hardware. Because trusted services mediate access to physical resources, applications and extensions must trust the services that are trusted by the SPIN kernel.

In designing the interfaces for SPIN's trusted services, we have worked to ensure that an extension's failure to use an interface correctly is isolated to the extension itself (and any others that rely on it). For example, the SPIN scheduler raises events that are handled by application-specific thread packages in order to start or stop threads. Although it is in the handler's best interests to respect, or at least not interfere with, the semantics implied by the event, this is not enforced. An application-specific thread package may ignore the event that a particular user-level thread is runnable, but only the application using the thread package will be affected. In this way, the failure of an extension is no more catastrophic than the failure of code executing in the runtime libraries found in conventional systems.

5 System performance

In this section we show that SPIN enables applications to compose system services in order to define new kernel services that perform well. Specifically, we evaluate the performance of SPIN from four perspectives:

• System size. The size of the system in terms of lines of code and object size demonstrates that advanced runtime services do not necessarily create an operating system kernel of excessive size. In addition, the size of the system's extensions shows that they can be implemented with reasonable amounts of code.

• Microbenchmarks. Measurements of low-level system services, such as protected communication, thread management and virtual memory, show that SPIN's extension architecture enables us to construct communication-intensive services with low overhead. The measurements also show that conventional system mechanisms, such as a system call and cross-address space protected procedure call, have overheads that are comparable to those in conventional systems.

• Networking. Measurements of a suite of networking protocols demonstrate that SPIN's extension architecture enables the implementation of high-performance network protocols.

• End-to-end performance. Finally, we show that end-to-end application performance can benefit from SPIN's architecture by describing two applications that use system extensions.
We compare the performance of operations on three operating systems that run on the same platform: SPIN (V0.4 of August 1995), DEC OSF/1 V2.1, which is a monolithic operating system, and Mach 3.0, which is a microkernel. We collected our measurements on DEC Alpha 133MHz AXP 3000/400 workstations, which are rated at 74 SPECint 92. Each machine has 64 MBs of memory, a 512KB unified external cache, an HP C2247-300 1GB disk drive, a 10Mb/sec Lance Ethernet interface, and a FORE TCA-100 155Mb/sec ATM adapter card connected to a FORE ASX-200 switch. The FORE cards use programmed I/O and can maximally deliver only about 53Mb/sec between a pair of hosts [Brustoloni & Bershad 93]. We avoid comparisons with operating systems running on different hardware as benchmarks tend to scale poorly for a variety of architectural reasons [Anderson et al. 91]. All measurements are taken while the operating systems run in single-user mode.

5.1 System components

SPIN runs as a standalone kernel on DEC Alpha workstations. The system consists of five main components, sys, core, rt, lib and sal, that support different classes of service. Table 1 shows the size of each component in source lines, object bytes, and percentages. The first component, sys, implements the extensibility machinery, domains, naming, linking, and dispatching. The second component, core, implements the virtual memory and scheduling services described in the previous section, as well as device management, a disk-based and network-based file system, and a network debugger [Redell 88]. The third component, rt, contains a version of the DEC SRC Modula-3 runtime system that supports automatic memory management and exception processing. The fourth component, lib, includes a subset of the standard Modula-3 libraries and handles many of the more mundane data structures (lists, queues, hash tables, etc.) generally required by any operating system kernel. The final component, sal, implements a low-level interface to device drivers and the MMU, offering functionality such as "install a page table entry," "get a character from the console," and "read block 22 from SCSI unit 0." We build sal by applying a few dozen file diffs against a small subset of the files from the DEC OSF/1 kernel source tree. This approach, while increasing the size of the kernel, allows us to track the vendor's hardware without requiring that we port SPIN to each new system configuration.

Component      Source lines     %    Text bytes     %    Data bytes     %
sys                    1646    2.5        42182    5.2        22397    5.0
core                  10866   16.5       170380   21.0        89586   20.0
rt                    14216   21.7       176171   21.8       104738   23.4
lib                    1234    1.9        10752    1.3         3294    0.8
sal                   37690   57.4       411065   50.7       227259   50.8
Total kernel          65652    100       810550    100       447274    100

Table 1: This table shows the size of different components of the system. The sys, core and rt components contain the interfaces visible to extensions. The column labeled "Source lines" does not include comments. We use the DEC SRC Modula-3 compiler, release 3.5.

5.2 Microbenchmarks

Microbenchmarks reveal the overhead of basic system functions, such as protected procedure call, thread management, and virtual memory. They define the bounds of system performance and provide a framework for understanding larger operations.
Times presented in this section, measured with the Alpha's internal cycle counter, are the average of a large number of iterations, and may therefore be overly optimistic regarding cache effects [Bershad et al. 92a].

Protected communication

In a conventional operating system, applications, services and extensions communicate using two protected mechanisms: system calls and cross-address space calls. The first enables applications and kernel services to interact. The second enables interaction between applications and services that are not part of the kernel. The overhead of using either of these mechanisms is the limiting factor in a conventional system's extensibility. High overhead discourages frequent interaction, requiring that a system be built from coarse-grained interfaces to amortize the cost of communication over large operations.

SPIN's extension model offers a third mechanism for protected communication. Simple procedure calls, rather than system calls, can be used for communication between extensions and the core system. Similarly, simple procedure calls, rather than cross-address space procedure calls, can be used for communication between applications and other services installed into the kernel.

In Table 2 we compare the performance of the different protected communication mechanisms when invoking the "null procedure call" on DEC OSF/1, Mach, and SPIN. The null procedure call takes no arguments and returns no results; it reflects only the cost of control transfer. The protected in-kernel call in SPIN is implemented as a procedure call between two domains that have been dynamically linked. Although this test does not measure data transfer, the overhead of passing arguments between domains, even large arguments, is small because they can be passed by reference. System call overhead reflects the time to cross the user-kernel boundary, execute a procedure and return. In Mach and DEC OSF/1, system calls flow from the trap handler through to a generic, but fixed, system call dispatcher, and from there to the requested system call (written in C). In SPIN, the kernel's trap handler raises a Trap.SystemCall event which is dispatched to a Modula-3 procedure installed as a handler. The third line in the table shows the time to perform a protected, cross-address space procedure call. DEC OSF/1 supports cross-address space procedure call using sockets and SUN RPC. Mach provides an optimized path for cross-address space communication using messages [Draves 94]. SPIN's cross-address space procedure call is implemented as an extension that uses system calls to transfer control in and out of the kernel and cross-domain procedure calls within the kernel to transfer control between address spaces.

Operation                   DEC OSF/1    Mach    SPIN
Protected in-kernel call          n/a     n/a    0.13
System call                         5       7       4
Cross-address space call          845     104      89

Table 2: Protected communication overhead in microseconds. Neither DEC OSF/1 nor Mach support protected in-kernel communication.

The table illustrates two points about communication and system structure. First, the overhead of protected communication in SPIN can be that of procedure call for extensions executing in the kernel's address space. SPIN's protected in-kernel calls provide the same functionality as cross-address space calls in DEC OSF/1 and Mach, namely the ability to execute arbitrary code in response to an application's call. Second, SPIN's extensible architecture does not preclude the use of traditional communication mechanisms having performance comparable to that in non-extensible systems. However, the disparity between the performance of a protected in-kernel call and the other mechanisms encourages the use of in-kernel extensions.

SPIN's in-kernel protected procedure call time is conservative. Our Modula-3 compiler generates code for which an intermodule call is roughly twice as slow as an intramodule call. A more recent version of the Modula-3 compiler corrects this disparity. In addition, our compiler does not perform inlining, which can be an important optimization when calling many small procedures. These optimizations do not affect the semantics of the language and will therefore not change the system's protection model.
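The sketch below shows what an application-specific system call installed through this path might look like. The signature of the Trap.SystemCall event and the Dispatcher.Install call are our reading of the text, not interfaces the paper publishes; the service number and its meaning are hypothetical.

MODULE MySyscall;
IMPORT Trap, Dispatcher;

CONST MyServiceNumber = 1001;  (* hypothetical system call number *)

PROCEDURE Service(arg: INTEGER): INTEGER =
BEGIN
  RETURN arg + 1;   (* stand-in for real application-specific work *)
END Service;

PROCEDURE HandleSyscall(number: INTEGER; arg: INTEGER): INTEGER =
BEGIN
  (* Installed on Trap.SystemCall; the kernel's trap handler raises
     the event and the dispatcher routes it here at procedure call
     cost. A guard supplied at installation would normally filter
     on the number. *)
  IF number = MyServiceNumber THEN
    RETURN Service(arg);
  END;
  RETURN 0;
END HandleSyscall;

BEGIN
  Dispatcher.Install(Trap.SystemCall, HandleSyscall);
END MySyscall.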
Thread management

Thread management packages implement concurrency control operations using underlying kernel services. As previously mentioned, SPIN's in-kernel threads are implemented with a trusted thread package exporting the Modula-3 thread interface. Application-specific extensions also rely on threads executing in the kernel to implement their own concurrent operations. At user level, thread management overhead determines the granularity with which threads can be used to control concurrent user-level operations.

Table 3 shows the overhead of thread management operations for kernel and user threads using the different systems. Fork-Join measures the time to create, schedule, and terminate a new thread, synchronizing the termination with another thread. Ping-Pong reflects synchronization overhead, and measures the time for a pair of threads to synchronize with one another; the first thread signals the second and blocks, then the second signals the first and blocks.

We measure kernel thread overheads using the native primitives provided by each kernel (thread sleep and thread wakeup in DEC OSF/1 and Mach, and locks with condition variables in SPIN). At user level, we measure the performance of the same program using C-Threads on Mach and SPIN, and P-Threads, a C-Threads superset, on DEC OSF/1. The table shows measurements for two implementations of C-Threads on SPIN. The first implementation, labeled "layered," is implemented as a user-level library layered on a set of kernel extensions that implement Mach's kernel thread interface. The second implementation, labeled "integrated," is structured as a kernel extension that exports the C-Threads interface using system calls. The latter version uses SPIN's strand interface, and is integrated with the scheduling behavior of the rest of the kernel. The table shows that SPIN's extensible thread implementation does not incur a performance penalty when compared to non-extensible ones, even when integrated with kernel services.

              DEC OSF/1           Mach                SPIN
Operation     kernel    user      kernel    user      kernel    user (layered)    user (integrated)
Fork-Join        198    1230         101     338          22               262                  111
Ping-Pong         21     264          71     115          17               159                   85

Table 3: Thread management overhead in microseconds.

Virtual memory

Applications can exploit the virtual memory fault path to extend system services [Appel & Li 91]. For example, concurrent and generational garbage collectors can use write faults to maintain invariants or collect reference information. A longstanding problem with fault-based strategies has been the overhead of handling a page fault in an application [Thekkath & Levy 94, Anderson et al. 91].
Virtual memory

Applications can exploit the virtual memory fault path to extend system services [Appel & Li 91]. For example, concurrent and generational garbage collectors can use write faults to maintain invariants or collect reference information. A longstanding problem with fault-based strategies has been the overhead of handling a page fault in an application [Thekkath & Levy 94, Anderson et al. 91]. There are two sources of this overhead. First, handling each fault in a user application requires crossing the user/kernel boundary several times. Second, conventional systems provide quite general exception interfaces that can perform many functions at once; as a result, applications requiring only a subset of an interface's functionality must pay for all of it. SPIN allows applications to define specialized fault handling extensions that avoid user/kernel boundary crossings and implement precisely the functionality that is required.

Table 4 shows the time to execute several commonly referenced virtual memory benchmarks [Appel & Li 91, Engler et al. 95]. The line labeled Dirty measures the time for an application to query the status of a particular virtual page. Neither DEC OSF/1 nor Mach provide this facility. The time shown in the table is for an extension to invoke the virtual memory system; an additional 4 microseconds (the system call time) is required to invoke the service from user level. Trap measures the latency between a page fault and the time when a handler executes. Fault is the perceived latency of the access from the standpoint of the faulting thread: it measures the time to reflect a page fault to an application, enable access to the page within a handler, and resume the faulting thread. Prot1 measures the time to increase the protection of a single page. Similarly, Prot100 and Unprot100 measure the time to increase and decrease the protection over a range of 100 pages. Mach's unprotection is faster than its protection since the operation is performed lazily; SPIN's extension does not lazily evaluate the request, but enables the access as requested. Appel1 and Appel2 measure a combination of traps and protection changes. The Appel1 benchmark measures the time to fault on a protected page, resolve the fault in the handler, and protect another page in the handler. Appel2 measures the time to protect 100 pages and fault on each one, resolving the fault in the handler (Appel2 is shown as the average cost per page).

SPIN outperforms the other systems on the virtual memory benchmarks for two reasons. First, SPIN uses kernel extensions to define application-specific system calls for virtual memory management. The calls provide access to the virtual and physical memory interfaces described in the previous section, and install handlers for Translation.ProtectionFault events that occur within the application's virtual address space. In contrast, DEC OSF/1 requires that applications use the UNIX signal and mprotect interfaces to manage virtual memory, and Mach requires that they use the external pager interface [Young et al. 87]. Neither signals nor external pagers have especially efficient implementations, as the focus of each is generalized functionality [Thekkath & Levy 94]. Second, each virtual memory event, which requires a series of interactions between the kernel and the application, is reflected to the application through a fast in-kernel protected procedure call; DEC OSF/1 and Mach instead communicate these events by means of more expensive traps or messages.

Operation   DEC OSF/1   Mach   SPIN
Dirty       n/a         n/a    2
Fault       329         415    29
Trap        260         185    7
Prot1       45          106    16
Prot100     1041        1792   213
Unprot100   1016        302    214
Appel1      382         819    39
Appel2      351         608    29

Table 4: Virtual memory operation overheads in microseconds. Neither DEC OSF/1 nor Mach provide an interface for querying the internal state of a page frame.
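As an illustration, a minimal write-fault extension might look as follows. Translation.ProtectionFault is the event named above; the handler signature, the Translation.T type, and the Translation.Protect procedure are illustrative assumptions, since the text only summarizes the memory interfaces.

    (* Sketch of a dirty-page tracker, e.g. for a generational
       collector.  Faults are handled entirely in the kernel, with no
       user/kernel boundary crossings. *)
    MODULE DirtyTracker;

    IMPORT Dispatcher, Translation;

    CONST Pages = 1024;  (* toy map covering one small region *)

    VAR dirty: ARRAY [0 .. Pages - 1] OF BOOLEAN;

    PROCEDURE OnWriteFault (map: Translation.T; page: INTEGER) =
      BEGIN
        dirty[page MOD Pages] := TRUE;                        (* record the page *)
        Translation.Protect (map, page, writable := TRUE);    (* re-enable access *)
      END OnWriteFault;

    BEGIN
      Dispatcher.InstallHandler (Translation.ProtectionFault, OnWriteFault);
    END DirtyTracker.

A user-level client of such an extension pays only the single system call needed to query the dirty map, corresponding to the additional 4 microseconds noted for the Dirty benchmark.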
5.3 Networking

We have used SPIN's extension architecture to implement a set of network protocol stacks for Ethernet and ATM networks [Fiuczynski & Bershad 96]. Figure 5 illustrates the structure of the protocol stacks, which are similar to the x-kernel's [Hutchinson et al. 89] except that SPIN permits user code to be dynamically placed within the stack. Each incoming packet is ``pushed'' through the protocol graph by events and ``pulled'' by handlers. The handlers at the top of the graph can process the message entirely within the kernel, or copy it out to an application. The RPC and A.M. extensions, for example, implement the network transport for a remote procedure call package and active messages [von Eicken et al. 92]. The video extension provides a direct path for video packets from the network to the framebuffer. The UDP and TCP extensions support the Internet protocols.[2] The Forward extension provides transparent UDP/IP and TCP/IP forwarding for packets arriving on a specific port. Finally, the HTTP extension implements the HyperText Transfer Protocol [Berners-Lee et al. 94] directly within the kernel, enabling a server to respond quickly to HTTP requests by splicing together the protocol stack and the local file system.

[2] We currently use the DEC OSF/1 TCP engine as a SPIN extension, and manually assert that the code, which is written in C, is safe.

[Figure 5: This figure shows a protocol stack that routes incoming network packets to application-specific endpoints within the kernel. Ovals represent events (Ether.PktArrived, ATM.PktArrived, IP.PktArrived, UDP.PktArrived, TCP.PktArrived, ICMP.PktArrived) raised to route control to handlers, which are represented by boxes (the Lance and Fore device drivers, IP, UDP, TCP, ICMP, Ping, RPC, A.M., Video, HTTP, and Forward). Handlers implement the protocol corresponding to their label.]

Latency and Bandwidth

Table 5 shows the round trip latency and reliable bandwidth between two applications using UDP/IP on DEC OSF/1 and SPIN. For DEC OSF/1, the application code executes at user level, and each packet sent involves a trap and several copy operations as the data moves across the user/kernel boundary. For SPIN, the application code executes as an extension in the kernel, where it has low-latency access to both the device and the data. Each incoming packet causes a series of events to be generated, one for each layer in the UDP/IP protocol stack (Ethernet/ATM, IP, UDP) shown in Figure 5. For SPIN, protocol processing is done by a separately scheduled kernel thread outside of the interrupt handler. We do not present networking measurements for Mach, as the system neither provides a path to the Ethernet more efficient than DEC OSF/1's, nor supports our ATM card.

           Latency              Bandwidth
           DEC OSF/1   SPIN    DEC OSF/1   SPIN
Ethernet   789         565     8.9         8.9
ATM        631         421     27.9        33

Table 5: Network protocol latency in microseconds and receive bandwidth in Mb/sec. We measure latency using small packets (16 bytes), and bandwidth using large packets (1500 bytes for Ethernet and 8132 bytes for ATM).

The table shows that processing packets entirely within the kernel can reduce round-trip latency when compared to a system in which packets are handled in user space. Throughput, which tends not to be latency sensitive, is roughly the same on both systems. We use the same vendor device drivers for both DEC OSF/1 and SPIN to isolate differences due to system architecture from those due to the characteristics of the underlying device driver. Neither the Lance Ethernet driver nor the FORE ATM driver is optimized for latency [Thekkath & Levy 93], and only the Lance Ethernet driver is optimized for throughput. Using different device drivers, we achieve a round-trip latency of 337 µsecs on Ethernet and 241 µsecs on ATM, while reliable ATM bandwidth between a pair of hosts rises to 41 Mb/sec. We estimate the minimum round trip time using our hardware at roughly 250 µsecs on Ethernet and 100 µsecs on ATM, and the maximum usable Ethernet and ATM bandwidths between a pair of hosts at roughly 9 Mb/sec and 53 Mb/sec.
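The sketch below illustrates the shape of an in-kernel endpoint like those in Figure 5: a handler that consumes UDP packets without ever copying them out of the kernel. UDP.PktArrived is the event from the figure; the Dispatcher call and the packet type are illustrative assumptions.

    (* Sketch of an application-specific endpoint installed on the
       UDP.PktArrived event; the payload is processed on a kernel
       thread and never crosses the user/kernel boundary. *)
    MODULE PacketCounter;

    IMPORT Dispatcher, UDP;

    VAR count := 0;  (* packets seen so far *)

    PROCEDURE OnPacket (<*UNUSED*> packet: UDP.Packet) =
      BEGIN
        INC (count);
      END OnPacket;

    BEGIN
      Dispatcher.InstallHandler (UDP.PktArrived, OnPacket);
    END PacketCounter.

The HTTP, RPC, and video extensions of Figure 5 follow the same pattern, with protocol processing in place of the counter.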
Protocol forwarding

SPIN's extension architecture can be used to provide protocol functionality not generally available in conventional systems. For example, some TCP redirection protocols [Balakrishnan et al. 95] that have otherwise required kernel modifications can be straightforwardly defined by an application as a SPIN extension. A forwarding protocol can also be used to load balance service requests across multiple servers.

In SPIN, an application installs a node into the protocol stack which redirects all data and control packets destined for a particular port number to a secondary host. We have implemented a similar service on DEC OSF/1 with a user-level process that splices together an incoming and an outgoing socket. The DEC OSF/1 forwarder is not able to forward protocol control packets because it executes above the transport layer. As a result it cannot maintain a protocol's end-to-end semantics. In the case of TCP, end-to-end connection establishment and termination semantics are violated. A user-level intermediary also interferes with the protocol's algorithms for window size negotiation, slow start, failure detection, and congestion control, possibly degrading the overall performance of connections between the hosts. Moreover, with the user-level forwarder, each packet makes two trips through the protocol stack and is twice copied across the user/kernel boundary. Table 6 compares the latency of the two implementations, and reveals the additional work done by the user-level forwarder.

           TCP                  UDP
           DEC OSF/1   SPIN    DEC OSF/1   SPIN
Ethernet   2080        1420    1607        1344
ATM        1730        1067    1389        1024

Table 6: Round trip latency in microseconds to route 16 byte packets through a protocol forwarder.
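A forwarding node in this style might be sketched as follows; only the UDP.PktArrived event comes from Figure 5, while the packet field names, IP.Send, and the address variable are illustrative assumptions.

    (* Sketch of a UDP forwarder: any packet destined for a watched
       port is re-sent to a secondary host.  Because control traffic
       takes the same in-kernel path, end-to-end semantics are
       preserved. *)
    MODULE Forwarder;

    IMPORT Dispatcher, UDP, IP;

    CONST WatchedPort = 8080;   (* illustrative port number *)

    VAR secondary: IP.Address;  (* host receiving redirected traffic;
                                   initialization elided *)

    PROCEDURE OnPacket (packet: UDP.Packet) =
      BEGIN
        IF packet.destPort = WatchedPort THEN
          IP.Send (secondary, packet);  (* re-inject below the transport layer *)
        END;
      END OnPacket;

    BEGIN
      Dispatcher.InstallHandler (UDP.PktArrived, OnPacket);
    END Forwarder.

Unlike the user-level forwarder, the packet is neither copied across the user/kernel boundary nor pushed through the stack a second time on the inbound side.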
5.4 End-to-end performance

We have implemented several applications that exploit SPIN's extensibility. One is a networked video system that consists of a server and a client viewer. The server is structured as three kernel extensions: one that uses the local file system to read video frames from the disk, another that sends the video out over the network, and a third that registers itself as a handler on the SendPacket event, transforming the single send into a multicast to a list of clients. The server transmits 30 frames per second to each client. On the client, an extension awaits incoming video packets, then decompresses and writes them directly to the frame buffer using the structure shown in Figure 5.

Because each outgoing packet is pushed through the protocol graph only once, and not once per client stream, SPIN's server can support a larger number of clients than one that processes each packet in isolation. To show this, we measure processor utilization as a function of the number of clients for the SPIN server and for a server that runs on DEC OSF/1. The DEC OSF/1 server executes in user space and communicates with clients using sockets; each outgoing packet is copied into the kernel and pushed through the kernel's protocol stack into the device driver. We determine processor utilization by measuring the progress of a low-priority idle thread that executes on the server.

Using the FORE interface, we find that both SPIN and DEC OSF/1 consume roughly the same fraction of the server's processor for a given number of clients. Although the SPIN server does less work in the protocol stack, the majority of the server's CPU resources are consumed by the programmed I/O that copies data to the network one word at a time. Using a network interface that supports DMA, though, we find that the SPIN server's processor utilization grows more slowly than the DEC OSF/1 server's. Figure 6 shows server processor utilization as a function of the number of supported client streams when the server is configured with a Digital T3PKT adapter. The ``T3'' is an experimental network interface that can send 45 Mb/sec using DMA. We use the same device driver in both operating systems. At 15 streams, both SPIN and DEC OSF/1 saturate the network, but SPIN consumes only half as much of the processor. Compared to DEC OSF/1, SPIN can support more clients on a faster network, or as many clients on a slower processor.

[Figure 6: Server utilization as a function of the number of client video streams. Each stream requires approximately 3 Mb/sec.]

Another application that can benefit from SPIN's architecture is a web server. To service requests quickly, a web server should cache recently accessed objects, avoid caching large objects that are infrequently accessed [Chankhunthod et al. 95], and avoid double buffering with other caching agents [Stonebraker 81]. A server that does not itself cache but is built on top of a conventional caching file system avoids the double buffering problem, but is unable to control the caching policy. In contrast, a server that controls its own cache on top of the file system's suffers from double buffering. SPIN allows a server to both control its cache and avoid double buffering. A SPIN web server implements its own hybrid caching policy based on file type: LRU for small files, and no-cache for large files, which tend to be accessed infrequently. The client-side latency of an HTTP transaction to a SPIN web server running as a kernel extension is 5 milliseconds when the requested file is in the server's cache. Otherwise, the server goes through a non-caching file system to find the file. A comparable user-level web server on DEC OSF/1 that relies on the operating system's caching file system (no double buffering) takes about 8 milliseconds per request for the same cached file.

5.5 Other issues

Scalability and the dispatcher

SPIN's event dispatcher matches event raisers to handlers. Since every procedure in the system is effectively an event, the latency of the dispatcher is critical. As mentioned, in the case of a single synchronous handler, an event raise is implemented as a procedure call from the raiser to the handler. In other cases, such as when there are many handlers registered for a particular event, the dispatcher takes a more active role in event delivery. For each guard/handler pair installed on an event, the dispatcher evaluates the guard and, if true, invokes the handler.
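Concretely, a guard/handler pair might be installed as shown below. Unlike the forwarding sketch earlier, the port test here is factored into a guard that the dispatcher evaluates before deciding whether to invoke the handler; the three-argument InstallHandler is an illustrative assumption.

    (* Sketch of guarded event handling: the dispatcher evaluates Guard
       on each UDP.PktArrived event and invokes Handler only when it
       returns TRUE. *)
    MODULE PortWatcher;

    IMPORT Dispatcher, UDP;

    PROCEDURE Guard (packet: UDP.Packet): BOOLEAN =
      BEGIN
        RETURN packet.destPort = 7;  (* cheap predicate, evaluated per event *)
      END Guard;

    PROCEDURE Handler (<*UNUSED*> packet: UDP.Packet) =
      BEGIN
        (* runs only for packets that satisfy Guard *)
      END Handler;

    BEGIN
      Dispatcher.InstallHandler (UDP.PktArrived, Guard, Handler);
    END PortWatcher.

Each additional registration of this kind adds one guard evaluation to every raise of the event, which is the linear cost measured below.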
Consequently, dispatcher latency depends on the number and complexity of the guards and on the number of event handlers ultimately invoked. In practice, the overhead of an event dispatch is linear in the number of guards and handlers installed on the event. For example, round trip Ethernet latency, which we measure at 565 µsecs, rises to about 585 µsecs when 50 additional guards and handlers register interest in the arrival of some UDP packet but all 50 guards evaluate to false. When all 50 guards evaluate to true, latency rises to 637 µsecs. Presently, we perform no guard-specific optimizations such as evaluating common subexpressions [Yuhara et al. 94] or representing guard predicates as decision trees. As the system matures, we plan to apply these optimizations.

Impact of automatic storage management

An extensible system cannot depend on the correctness of unprivileged clients for its memory integrity. As previously mentioned, memory management schemes that allow extensions to return objects to the system heap are unsafe, because a rogue client can violate the type system by retaining a reference to a freed object. SPIN uses a trace-based, mostly-copying garbage collector [Bartlett 88] to safely reclaim memory resources. The collector serves as a safety net for untrusted extensions, and ensures that resources released by an extension, whether through inaction or as a result of premature termination, are eventually reclaimed.

Clients that allocate large amounts of memory can trigger frequent garbage collections with adverse global effects. In practice, this is less of a problem than might be expected, because SPIN and its extensions avoid allocation on fast paths. For example, none of the measurements presented in this section change when we disable the collector during the tests. Even in systems without garbage collection, generalized allocation is avoided because of its high latency; instead, subsystems implement their own allocators, optimized for some expected usage pattern. SPIN services do this as well, and for the same reason (dynamic memory allocation is relatively expensive). As a consequence, there is less pressure on the collector, and the pressure that remains is least likely to be applied during a critical path.

Size of extensions

Table 7 shows the size of some of the extensions described in this section. SPIN extensions tend to require an amount of code commensurate with their functionality. For example, the Null syscall and IPC extensions are conceptually simple, and also have simple implementations. Extensions tend to import relatively few (about a dozen) interfaces, and use the domain and event system in fairly stylized ways. As a result, we have not found building extensions to be exceptionally difficult. In contrast, we had more trouble correctly implementing a few of our benchmarks on DEC OSF/1 or Mach, because we were sometimes forced to follow circuitous routes to achieve a particular level of functionality. Mach's external pager interface, for instance, required us to implement a complete pager in user space, although we were only interested in discovering write protect faults.

Component           Source size   Text size   Data size
                    (lines)       (bytes)     (bytes)
NULL syscall        19            96          656
IPC                 127           1344        1568
CThreads            219           2480        1792
DEC OSF/1 threads   305           2304        3488
VM workload         263           5712        1472
IP                  744           19008       13088
UDP                 1046          23968       16704
TCP                 5077          69040       9840
HTTP                392           5712        4176
TCP Forward         187           4592        2080
UDP Forward         138           4592        2144
Video Client        95            2736        1952
Video Server        304           9228        3312

Table 7: The size of some of the system extensions described in this paper.

6 Experiences with Modula-3

Our decision to use Modula-3 was made with some care. Originally, we had intended to define and implement a compiler for a safe subset of C. All of us, being C programmers, were certain that it was infeasible to build an efficient operating system without using a language having the syntax, semantics and performance of C.
As the design of our safe subset proceeded, we faced the difficult issues that typically arise in any language design or redesign. For each major issue that we considered in the context of a safe version of C (type semantics, objects, storage management, naming, etc.), we found the issue already satisfactorily addressed by Modula-3. Moreover, we understood that the definition of our service interfaces was more important than the language with which we implemented them.

Ultimately, we decided to use Modula-3 for both the system and its extensions. Early on we found evidence to abandon our two main prejudices about the language: that programs written in it are slow and large, and that C programmers could not be effective using another language. In terms of performance, we have found nothing remarkable about the language's code size or execution time, as shown in the previous section. In terms of programmer effectiveness, we have found that it takes less than a day for a competent C programmer to learn the syntax and more obvious semantics of Modula-3, and another few days to become proficient with its more advanced features. Although anecdotal, our experience has been that the portions of the SPIN kernel written in Modula-3 are much more robust and easier to understand than those portions written in C.

7 Conclusions

The SPIN operating system demonstrates that it is possible to achieve good performance in an extensible system without compromising safety. The system provides a set of efficient mechanisms for extending services, as well as a core set of extensible services. Co-location, enforced modularity, logical protection domains and dynamic call binding allow extensions to be dynamically defined and accessed at the granularity of a procedure call. In the past, system builders have relied on the programming language only to translate operating system policies and mechanisms into machine code. Using a programming language with the appropriate features, we believe that operating system implementors can rely more heavily on compiler and language runtime services to construct systems in which structure and performance are complementary.

Additional information about the SPIN project is available at http://www-spin.cs.washington.edu, an Alpha workstation running SPIN and the HTTP extension described in this paper.

Acknowledgements

Many people have contributed to the SPIN project. David Dion has been responsible for bringing up the system's UNIX server. Jan Sanislo made it possible for us to use the DEC OSF/1 SCSI driver from SPIN. Anthony LaMarca, Dylan McNamee, Geoff Voelker, and Alec Wolman assisted in understanding system performance on DEC OSF/1 and Mach. David Nichols, Hank Levy, and Terri Watson provided feedback on earlier drafts of this paper. David Boggs provided us with the T3 cards that we used in the video server experiment. Special thanks are due to DEC SRC, who provided us with much of our compiler infrastructure.

References

[Abrossimov et al. 89] Abrossimov, V., Rozier, M., and Shapiro, M.
Generic Virtual Memory Management for Operating System Kernels. In Proceedings of the Thirteenth ACM Symposium on Operating Systems Principles, pages 123--136, Litchfield Park, AZ, December 1989.

[Anderson et al. 91] Anderson, T. E., Levy, H. M., Bershad, B. N., and Lazowska, E. D. The Interaction of Architecture and Operating System Design. In Proceedings of the Fourth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-IV), pages 108--120, Santa Clara, CA, April 1991.

[Anderson et al. 92] Anderson, T. E., Bershad, B. N., Lazowska, E. D., and Levy, H. M. Scheduler Activations: Effective Kernel Support for the User-Level Management of Parallelism. ACM Transactions on Computer Systems, 10(1):53--79, February 1992.

[Appel & Li 91] Appel, A. W. and Li, K. Virtual Memory Primitives for User Programs. In Proceedings of the Fourth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-IV), pages 96--107, Santa Clara, CA, April 1991.

[Bala et al. 94] Bala, K., Kaashoek, M. F., and Weihl, W. E. Software Prefetching and Caching for Translation Lookaside Buffers. In Proceedings of the First USENIX Symposium on Operating Systems Design and Implementation (OSDI), pages 243--253, Monterey, CA, November 1994.

[Balakrishnan et al. 95] Balakrishnan, H., Seshan, S., Amir, E., and Katz, R. H. Improving TCP/IP Performance over Wireless Networks. In Proceedings of the First ACM Conference on Mobile Computing and Networking, November 1995.

[Barrera 91] Barrera, J. S. A Fast Mach Network IPC Implementation. In Proceedings of the Second USENIX Mach Symposium, pages 1--11, Monterey, CA, November 1991.

[Bartlett 88] Bartlett, J. F. Compacting Garbage Collection with Ambiguous Roots. Technical Report WRL-TR-88-2, Digital Equipment Corporation Western Research Labs, February 1988.

[Berners-Lee et al. 94] Berners-Lee, T., Cailliau, R., Luotonen, A., Nielsen, H. F., and Secret, A. The World-Wide Web. Communications of the ACM, 37(8):76--82, August 1994.

[Bershad 93] Bershad, B. N. Practical Considerations for Non-Blocking Concurrent Objects. In Proceedings of the Thirteenth International Conference on Distributed Computing Systems, pages 264--274, Pittsburgh, PA, May 1993.

[Bershad et al. 90] Bershad, B. N., Anderson, T. E., Lazowska, E. D., and Levy, H. M. Lightweight Remote Procedure Call. ACM Transactions on Computer Systems, 8(1):37--55, February 1990.

[Bershad et al. 92a] Bershad, B. N., Draves, R. P., and Forin, A. Using Microbenchmarks to Evaluate System Performance. In Proceedings of the Third Workshop on Workstation Operating Systems, pages 148--153, Key Biscayne, FL, April 1992.

[Bershad et al. 92b] Bershad, B. N., Redell, D. D., and Ellis, J. R. Fast Mutual Exclusion for Uniprocessors. In Proceedings of the Fifth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-V), pages 223--233, Boston, MA, October 1992.

[Black et al. 92] Black, D. L. et al. Microkernel Operating System Architecture and Mach. In Proceedings of the USENIX Workshop on Micro-Kernels and Other Kernel Architectures, pages 11--30, Seattle, WA, April 1992.

[Bricker et al. 91] Bricker, A., Gien, M., Guillemont, M., Lipkis, J., Orr, D., and Rozier, M. A New Look at Microkernel-based UNIX Operating Systems: Lessons in Performance and Compatibility. In Proceedings of the EurOpen Spring '91 Conference, Tromsoe, Norway, May 1991.
[Brockschmidt 94] Brockschmidt, K. Inside OLE 2. Microsoft Press, 1994.

[Brustoloni & Bershad 93] Brustoloni, J. C. and Bershad, B. N. Simple Protocol Processing for High-Bandwidth Low-Latency Networking. Technical Report CMU-CS-93-132, Carnegie Mellon University, March 1993.

[Cao et al. 94] Cao, P., Felten, E. W., and Li, K. Implementation and Performance of Application-Controlled File Caching. In Proceedings of the First USENIX Symposium on Operating Systems Design and Implementation (OSDI), pages 165--177, Monterey, CA, November 1994.

[Carter et al. 91] Carter, J. B., Bennett, J. K., and Zwaenepoel, W. Implementation and Performance of Munin. In Proceedings of the Thirteenth ACM Symposium on Operating Systems Principles, pages 152--164, Pacific Grove, CA, October 1991.

[Carter et al. 94] Carter, N. P., Keckler, S. W., and Dally, W. J. Hardware Support for Fast Capability-Based Addressing. In Proceedings of the Sixth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-VI), pages 319--327, San Jose, CA, October 1994.

[Chankhunthod et al. 95] Chankhunthod, A., Danzig, P., Neerdaels, C., Schwartz, M., and Worrell, K. A Hierarchical Internet Object Cache. Technical Report CU-CS-766-95, Department of Computer Science, University of Colorado, July 1995.

[Chen & Bershad 93] Chen, J. B. and Bershad, B. N. The Impact of Operating System Structure on Memory System Performance. In Proceedings of the Fourteenth ACM Symposium on Operating Systems Principles, pages 120--133, Asheville, NC, December 1993.

[Cheriton & Duda 94] Cheriton, D. R. and Duda, K. J. A Caching Model of Operating System Kernel Functionality. In Proceedings of the First USENIX Symposium on Operating Systems Design and Implementation (OSDI), pages 179--194, Monterey, CA, November 1994.

[Cheriton & Zwaenepoel 83] Cheriton, D. R. and Zwaenepoel, W. The Distributed V Kernel and its Performance for Diskless Workstations. In Proceedings of the Ninth ACM Symposium on Operating Systems Principles, pages 129--140, Bretton Woods, NH, October 1983.

[Colwell 85] Colwell, R. The Performance Effects of Functional Migration and Architectural Complexity in Object-Oriented Systems. Technical Report CMU-CS-85-159, Carnegie Mellon University, August 1985.

[Cooper & Draves 88] Cooper, E. C. and Draves, R. P. C Threads. Technical Report CMU-CS-88-154, Carnegie Mellon University, June 1988.

[Cooper et al. 91] Cooper, E., Harper, R., and Lee, P. The Fox Project: Advanced Development of Systems Software. Technical Report CMU-CS-91-178, Carnegie Mellon University, August 1991.

[Davis et al. 93] Barton-Davis, P., McNamee, D., Vaswani, R., and Lazowska, E. Adding Scheduler Activations to Mach 3.0. In Proceedings of the Third USENIX Mach Symposium, pages 119--136, Santa Fe, NM, April 1993.

[Dig 93] Digital Equipment Corporation. DEC OSF/1 Writing Device Drivers: Advanced Topics, 1993.

[Draves 93] Draves, R. The Case for Run-Time Replaceable Kernel Modules. In Proceedings of the Fourth Workshop on Workstation Operating Systems, pages 160--164, Napa, CA, October 1993.

[Draves 94] Draves, R. P. Control Transfer in Operating System Kernels. Technical Report CMU-CS-94-142, Carnegie Mellon University, May 1994.

[Draves et al. 91] Draves, R. P., Bershad, B. N., Rashid, R. F., and Dean, R. W. Using Continuations to Implement Thread Management and Communication in Operating Systems. In Proceedings of the Thirteenth ACM Symposium on Operating Systems Principles, pages 122--136, Pacific Grove, CA, October 1991.
[Engler & Kaashoek 95] Engler, D. and Kaashoek, M. F. Exterminate All Operating System Abstractions. In Proceedings of the Fifth Workshop on Hot Topics in Operating Systems, pages 78--83, Orcas Island, WA, May 1995.

[Engler & Proebsting 94] Engler, D. R. and Proebsting, T. A. DCG: An Efficient, Retargetable Dynamic Code Generation System. In Proceedings of the Sixth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-VI), pages 263--272, San Jose, CA, October 1994.

[Engler et al. 94] Engler, D., Kaashoek, M. F., and O'Toole, J. The Operating System Kernel as a Secure Programmable Machine. In Proceedings of the 1994 ACM European SIGOPS Workshop, September 1994.

[Engler et al. 95] Engler, D. R., Kaashoek, M. F., and O'Toole Jr., J. Exokernel: An Operating System Architecture for Application-Level Resource Management. In Proceedings of the Fifteenth ACM Symposium on Operating Systems Principles, Copper Mountain, CO, December 1995.

[Fall & Pasquale 94] Fall, K. and Pasquale, J. Improving Continuous-Media Playback Performance with In-Kernel Data Paths. In Proceedings of the First IEEE International Conference on Multimedia Computing and Systems, pages 100--109, Boston, MA, May 1994.

[Felten 92] Felten, E. W. The Case for Application-Specific Communication Protocols. In Intel Supercomputer Systems Technology Focus Conference, pages 171--181, April 1992.

[Fiuczynski & Bershad 96] Fiuczynski, M. and Bershad, B. An Extensible Protocol Architecture for Application-Specific Networking. In Proceedings of the 1996 Winter USENIX Conference, San Diego, CA, January 1996.

[Forin et al. 91] Forin, A., Golub, D., and Bershad, B. N. An I/O System for Mach 3.0. In Proceedings of the Second USENIX Mach Symposium, pages 163--176, Monterey, CA, November 1991.

[Geschke et al. 77] Geschke, C., Morris, J., and Satterthwaite, E. Early Experiences with Mesa. Communications of the ACM, 20(8):540--553, August 1977.

[Golub et al. 90] Golub, D., Dean, R., Forin, A., and Rashid, R. Unix as an Application Program. In Proceedings of the 1990 Summer USENIX Conference, pages 87--95, June 1990.

[Hamilton & Kougiouris 93] Hamilton, G. and Kougiouris, P. The Spring Nucleus: A Microkernel for Objects. In Proceedings of the 1993 Summer USENIX Conference, pages 147--159, Cincinnati, OH, June 1993.

[Harty & Cheriton 91] Harty, K. and Cheriton, D. R. Application-Controlled Physical Memory using External Page-Cache Management. In Proceedings of the Fourth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-IV), pages 187--197, Santa Clara, CA, April 1991.

[Heidemann & Popek 94] Heidemann, J. and Popek, G. File-System Development with Stackable Layers. ACM Transactions on Computer Systems, 12(1):58--89, February 1994.

[Hildebrand 92] Hildebrand, D. An Architectural Overview of QNX. In Proceedings of the USENIX Workshop on Micro-Kernels and Other Kernel Architectures, pages 113--126, Seattle, WA, April 1992.

[Hutchinson et al. 89] Hutchinson, N. C., Peterson, L., Abbott, M. B., and O'Malley, S. RPC in the x-Kernel: Evaluating New Design Techniques. In Proceedings of the Thirteenth ACM Symposium on Operating Systems Principles, pages 91--101, Litchfield Park, AZ, December 1989.

[Int 81] Intel Corporation. Introduction to the iAPX 432 Architecture, 1981.

[Int 90] Intel Corporation. i486 Microprocessor Programmer's Reference Manual, 1990.

[Khalidi & Nelson 93] Khalidi, Y. A. and Nelson, M.
An Implementation of UNIX on an Object-Oriented Operating System. In Proceedings of the 1993 Winter USENIX Conference, pages 469--480, San Diego, CA, January 1993.

[Lazowska et al. 81] Lazowska, E. D., Levy, H. M., Almes, G. T., Fischer, M., Fowler, R., and Vestal, S. The Architecture of the Eden System. In Proceedings of the Eighth ACM Symposium on Operating Systems Principles, pages 148--159, December 1981.

[Lee et al. 94] Lee, C. H., Chen, M. C., and Chang, R. C. HiPEC: High Performance External Virtual Memory Caching. In Proceedings of the First USENIX Symposium on Operating Systems Design and Implementation (OSDI), pages 153--164, Monterey, CA, November 1994.

[Liedtke 92] Liedtke, J. Fast Thread Management and Communication Without Continuations. In Proceedings of the USENIX Workshop on Micro-Kernels and Other Kernel Architectures, pages 213--221, Seattle, WA, April 1992.

[Liedtke 93] Liedtke, J. Improving IPC by Kernel Design. In Proceedings of the Fourteenth ACM Symposium on Operating Systems Principles, pages 175--188, Asheville, NC, December 1993.

[Lucco 94] Lucco, S. High-Performance Microkernel Systems. In Proceedings of the First USENIX Symposium on Operating Systems Design and Implementation (OSDI), page 199, Monterey, CA, November 1994.

[Maeda & Bershad 93] Maeda, C. and Bershad, B. N. Protocol Service Decomposition for High-Performance Networking. In Proceedings of the Fourteenth ACM Symposium on Operating Systems Principles, pages 244--255, Asheville, NC, December 1993.

[Marsh et al. 91] Marsh, B., Scott, M., LeBlanc, T., and Markatos, E. First-Class User-Level Threads. In Proceedings of the Thirteenth ACM Symposium on Operating Systems Principles, pages 110--121, Pacific Grove, CA, October 1991.

[McNamee & Armstrong 90] McNamee, D. and Armstrong, K. Extending the Mach External Pager Interface to Accommodate User-Level Page Replacement Policies. In Proceedings of the USENIX Mach Symposium, pages 17--29, Burlington, VT, October 1990.

[Mogul et al. 87] Mogul, J., Rashid, R., and Accetta, M. The Packet Filter: An Efficient Mechanism for User-level Network Code. In Proceedings of the Eleventh ACM Symposium on Operating Systems Principles, pages 39--51, Austin, TX, November 1987.

[Mossenbock 94] Mossenbock, H. Extensibility in the Oberon System. Nordic Journal of Computing, 1(1):77--93, February 1994.

[Mullender et al. 90] Mullender, S. J., Rossum, G. V., Tanenbaum, A. S., Renesse, R. V., and van Staveren, H. Amoeba -- A Distributed Operating System for the 1990's. IEEE Computer, pages 44--54, May 1990.

[Nelson 91] Nelson, G., editor. System Programming in Modula-3. Prentice Hall, 1991.

[Organick 73] Organick, E., editor. Computer System Organization: The B5700/B6700 Series. Academic Press, 1973.

[Pardyak & Bershad 94] Pardyak, P. and Bershad, B. A Group Structuring Mechanism for a Distributed Object-Oriented Language. In Proceedings of the Fourteenth International Conference on Distributed Computing Systems, pages 312--319, Poznan, Poland, June 1994.

[Rashid et al. 87] Rashid, R., Tevanian, Jr., A., Young, M., Golub, D., Baron, R., Black, D., Bolosky, W., and Chew, J. Machine-Independent Virtual Memory Management for Paged Uniprocessor and Multiprocessor Architectures. In Proceedings of the Second International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-II), pages 31--39, Palo Alto, CA, April 1987.

[Redell 88] Redell, D. Experience with Topaz Teledebugging.
In Proceedings of the ACM SIGPLAN and SIGOPS Workshop on Parallel and Distributed Debugging, October 1988.

[Redell et al. 80] Redell, D. D., Dalal, Y. K., Horsley, T. R., Lauer, H. C., Lynch, W. C., McJones, P. R., Murray, H. G., and Purcell, S. C. Pilot: An Operating System for a Personal Computer. Communications of the ACM, 23(2):81--92, February 1980.

[Romer et al. 94] Romer, T. H., Lee, D., and Bershad, B. N. Dynamic Page Mapping Policies for Cache Conflict Resolution on Standard Hardware. In Proceedings of the First USENIX Symposium on Operating Systems Design and Implementation (OSDI), pages 255--266, Monterey, CA, November 1994.

[Romer et al. 95] Romer, T., Ohlrich, W., Karlin, A., and Bershad, B. Reducing TLB and Memory Overhead Using Online Superpage Promotion. In Proceedings of the Twenty-Second International Symposium on Computer Architecture, pages 176--187, 1995.

[Rozier et al. 88] Rozier, M., Abrossimov, V., Armand, F., Boule, I., Gien, M., Guillemont, M., Herrmann, F., Leonard, P., Langlois, S., and Neuhauser, W. The Chorus Distributed Operating System. Computing Systems, 1(4):305--370, 1988.

[Schroeder & Burrows 90] Schroeder, M. D. and Burrows, M. Performance of Firefly RPC. ACM Transactions on Computer Systems, 8(1):1--17, February 1990.

[Schulman et al. 92] Schulman, A., Maxey, D., and Pietrek, M. Undocumented Windows. Addison-Wesley, 1992.

[Small & Seltzer 94] Small, C. and Seltzer, M. VINO: An Integrated Platform for Operating System and Database Research. Technical Report TR-30-94, Harvard University, 1994.

[Stevenson & Julin 95] Stevenson, J. M. and Julin, D. P. Mach-US: Unix On Generic OS Object Servers. In Proceedings of the 1995 Winter USENIX Conference, New Orleans, LA, January 1995.

[Stodolsky et al. 93] Stodolsky, D., Bershad, B. N., and Chen, B. Fast Interrupt Priority Management for Operating System Kernels. In Proceedings of the Second USENIX Workshop on Microkernels and Other Kernel Architectures, pages 105--110, San Diego, CA, September 1993.

[Stonebraker 81] Stonebraker, M. Operating System Support for Database Management. Communications of the ACM, 24(7):412--418, July 1981.

[Thacker et al. 88] Thacker, C. P., Stewart, L. C., and Satterthwaite, Jr., E. H. Firefly: A Multiprocessor Workstation. IEEE Transactions on Computers, 37(8):909--920, August 1988.

[Thekkath & Levy 93] Thekkath, C. A. and Levy, H. M. Limits to Low-Latency Communication on High-Speed Networks. ACM Transactions on Computer Systems, 11(2):179--203, May 1993.

[Thekkath & Levy 94] Thekkath, C. A. and Levy, H. M. Hardware and Software Support for Efficient Exception Handling. In Proceedings of the Sixth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-VI), pages 145--156, San Jose, CA, October 1994.

[von Eicken et al. 92] von Eicken, T., Culler, D. E., Goldstein, S. C., and Schauser, K. E. Active Messages: A Mechanism for Integrated Communication and Computation. In Proceedings of the Nineteenth International Symposium on Computer Architecture, pages 256--266, Gold Coast, Australia, May 1992.

[Wahbe et al. 93] Wahbe, R., Lucco, S., Anderson, T. E., and Graham, S. L. Efficient Software-Based Fault Isolation. In Proceedings of the Fourteenth ACM Symposium on Operating Systems Principles, pages 203--216, Asheville, NC, December 1993.

[Waldspurger & Weihl 94] Waldspurger, C. A. and Weihl, W. E. Lottery Scheduling: Flexible Proportional-Share Resource Management.
In Proceedings of the First USENIX Symposium on Operating Systems Design and Implementation (OSDI), pages 1--11, Monterey, CA, November 1994.

[Wheeler & Bershad 92] Wheeler, B. and Bershad, B. N. Consistency Management for Virtually Indexed Caches. In Proceedings of the Fifth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-V), pages 124--136, Boston, MA, October 1992.

[Wulf et al. 81] Wulf, W. A., Levin, R., and Harbison, S. P. Hydra/C.mmp: An Experimental Computer System. McGraw-Hill, 1981.

[Young et al. 87] Young, M., Tevanian, A., Rashid, R., Golub, D., Eppinger, J., Chew, J., Bolosky, W., Black, D., and Baron, R. The Duality of Memory and Communication in the Implementation of a Multiprocessor Operating System. In Proceedings of the Eleventh ACM Symposium on Operating Systems Principles, pages 63--76, Austin, TX, November 1987.

[Yuhara et al. 94] Yuhara, M., Bershad, B. N., Maeda, C., and Moss, J. E. B. Efficient Packet Demultiplexing for Multiple Endpoints and Large Messages. In Proceedings of the 1994 Winter USENIX Conference, pages 153--165, San Francisco, CA, January 1994.