5. User-Level Processes

Nachos runs each user program in its own private address space. Nachos can run any COFF MIPS binaries that meet a few restrictions. Most notably, the code must only make system calls that Nachos understands. Also, the code must not use any floating point instructions, because the Nachos MIPS simulator does not support coprocessors.

5.1. Loading COFF Binaries

COFF (Common Object File Format) binaries contain a lot of information, but very little of it is actually relevant to Nachos programs. Further, Nachos provides a COFF loader class, nachos.machine.Coff, that abstracts away most of the details. But a few details are still important.

A COFF binary is broken into one or more sections. A section is a contiguous chunk of virtual memory, all the bytes of which have similar attributes (code vs. data, read-only vs. read-write, initialized vs. uninitialized). When Nachos loads a program, it creates a new process, and then copies each section into the program's virtual memory, at the start address specified by the section. A COFF binary also specifies an initial value for the PC register. The kernel must initialize this register, as well as the stack pointer, and then instruct the processor to start executing the program.

The Coff constructor takes one argument, an OpenFile referring to the MIPS binary file. If there is any error parsing the headers of the specified binary, an EOFException is thrown. Note that if this constructor succeeds, the file belongs to the Coff object; it should not be closed or accessed anymore, except through Coff operations.

There are four Coff methods:

The CoffSection class allows Nachos to access a single section within a COFF executable. Note that while the MIPS cross-compiler generates a variety of sections, the only important distinction to the Nachos kernel is that some sections are read-only (i.e. the program should never write to any byte in the section), while some sections are read-write (i.e. non-const data). There are four methods for accessing COFF sections:

5.2. Starting a Process

The kernel starts a process in two steps. First, it calls UserProcess.newUserProcess() to instantiate a process of the appropriate class. This is necessary because the process class changes as more functionality is added to each process. Second, it calls execute() to load and execute the program, passing the name of the file containing the binary and an array of arguments.

execute() in turn takes two steps. It first loads the program into the process's address space by calling load(). It then forks a new thread, which initializes the processor's registers and address translation information and then calls Machine.processor().run() to start executing user code.

load() opens the executable's file, instantiates a COFF loader to process it, verifies that the sections are contiguously placed in virtual memory, verifies that the arguments will fit within a single page, calculates the size of the program in pages (including the stack and arguments), calls loadSections() to actually load the contents of each section, and finally writes the command line arguments to virtual memory.
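The check that the arguments fit within a single page can be sketched as below. This is a minimal illustration, not the Nachos source: the class ArgPageSketch and its methods are hypothetical, and it assumes each argument costs one 4-byte pointer in the argv array plus the bytes of the string and a terminating null byte.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: estimates the space the command line arguments need. */
final class ArgPageSketch {
    /** Assumed cost per argument: 4-byte argv pointer + string bytes + null byte. */
    static int argsSize(String[] args) {
        int size = 0;
        for (String arg : args) {
            size += 4 + arg.getBytes(StandardCharsets.US_ASCII).length + 1;
        }
        return size;
    }

    /** True if all arguments fit within one page of the given size. */
    static boolean fits(String[] args, int pageSize) {
        return argsSize(args) <= pageSize;
    }
}
```

Under these assumptions, the arguments "cat" and "file.txt" need 21 bytes, which fits easily in a page; a loader would reject the program only when the total exceeds the page size.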

load() lays out the program in virtual memory as follows: first, starting at virtual address 0, the sections of the executable occupy a contiguous region of virtual memory. Next comes the stack, the size of which is determined by the variable stackPages. Finally, one page is reserved for command line arguments (the argv array).
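The layout arithmetic can be illustrated with a short sketch. The class LayoutSketch is hypothetical and the 1024-byte page size is an assumption made for the example; the sketch also assumes the stack grows downward, so the initial stack pointer sits at the top of the stack region, which is also where the argument page begins.

```java
/** Hypothetical helper that mirrors the layout described above. */
final class LayoutSketch {
    static final int PAGE_SIZE = 1024; // assumed page size in bytes

    final int numPages;   // sections + stack + one argument page
    final int initialSP;  // top of the stack region (stack assumed to grow down)
    final int argvAddr;   // first byte of the argument page

    LayoutSketch(int[] sectionLengthsInPages, int stackPages) {
        int pages = 0;
        // Sections occupy a contiguous region starting at virtual page 0.
        for (int length : sectionLengthsInPages) {
            pages += length;
        }
        // The stack follows the sections.
        pages += stackPages;
        initialSP = pages * PAGE_SIZE;
        // One final page holds the command line arguments.
        argvAddr = pages * PAGE_SIZE;
        numPages = pages + 1;
    }
}
```

For example, three sections of 2, 1, and 1 pages with an 8-page stack give 13 pages in total, and both the initial stack pointer and the argument page fall at address 12288.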

loadSections() allocates physical memory for the program and initializes its page table, and then loads sections to physical memory (though for the VM project, this loading is done lazily, delayed until pages are demanded). This is separated from the rest of load() because the loading mechanism depends on the details of the paging system.

In the code you are given, Nachos assumes that only a single process can exist at any given time. Therefore, loadSections() assumes that no one else is using physical memory, and it initializes its page table so as to map virtual memory addresses directly to physical memory addresses, without any translation (i.e. virtual address n maps to physical address n).
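A one-to-one mapping of this kind might look like the sketch below. The Entry class is a simplified, hypothetical stand-in for a page table entry, not the Nachos type, and translate() is included only to show that the mapping leaves addresses unchanged.

```java
/** Hypothetical sketch of a page table that maps each virtual page to itself. */
final class IdentityPageTableSketch {
    /** Simplified stand-in for a page table entry. */
    static final class Entry {
        final int vpn;
        final int ppn;
        final boolean valid;
        final boolean readOnly;

        Entry(int vpn, int ppn, boolean valid, boolean readOnly) {
            this.vpn = vpn;
            this.ppn = ppn;
            this.valid = valid;
            this.readOnly = readOnly;
        }
    }

    /** Builds a table in which virtual page i maps to physical page i. */
    static Entry[] build(int numPages) {
        Entry[] table = new Entry[numPages];
        for (int i = 0; i < numPages; i++) {
            table[i] = new Entry(i, i, true, false);
        }
        return table;
    }

    /** Translates a virtual address by splitting it into page number and offset. */
    static int translate(Entry[] table, int vaddr, int pageSize) {
        int vpn = vaddr / pageSize;
        int offset = vaddr % pageSize;
        return table[vpn].ppn * pageSize + offset;
    }
}
```

Once more than one process can exist, the build step is the part that has to change: each process would need physical pages that no other process is using, so the ppn of an entry would generally differ from its vpn.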

The method initRegisters() zeros out the processor's registers, and then initializes the program counter, the stack pointer, and the two argument registers (which hold argc and argv) with the values computed by load(). initRegisters() is called exactly once by the thread forked in execute().
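As a rough sketch of this step, the code below treats the register file as a plain int array. The class is hypothetical; the indices for a0, a1, and sp follow the usual MIPS numbering (r4, r5, r29), while the slot used for the program counter and the total register count are assumptions made for the example.

```java
import java.util.Arrays;

/** Hypothetical sketch of register initialization for a new user program. */
final class InitRegistersSketch {
    static final int REG_A0 = 4;    // first argument register (argc)
    static final int REG_A1 = 5;    // second argument register (argv)
    static final int REG_SP = 29;   // stack pointer
    static final int REG_PC = 34;   // assumed slot for the program counter
    static final int NUM_REGS = 38; // assumed size of the register file

    static void initRegisters(int[] regs, int initialPC, int initialSP,
                              int argc, int argv) {
        // Start from a clean register file.
        Arrays.fill(regs, 0);
        // Install the values computed while loading the program.
        regs[REG_PC] = initialPC;
        regs[REG_SP] = initialSP;
        regs[REG_A0] = argc;
        regs[REG_A1] = argv;
    }
}
```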

5.3. User Threads

User threads (that is, kernel threads that will be used to run user code) require additional state. Specifically, whenever a user thread starts running, it must restore the processor's registers, and possibly restore some address translation information as well. Right before a context switch, a user thread needs to save the processor's registers.

To accomplish this, there is a new thread class, UThread, that extends KThread. It is necessary to know which process, if any, the current thread belongs to. Therefore each UThread is bound to a single process.

UThread overrides saveState() and restoreState() from KThread so as to save/restore the additional information. These methods deal only with the user register set, and then direct the current process to deal with process-level state (i.e. address translation information). This separation makes it possible to allow multiple threads to run within a single process.
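The division of labor between thread and process can be sketched as follows. All names here are hypothetical stand-ins: cpu models the processor's user registers with an arbitrary register count, Proc models the process-level state, and UserThread models the per-thread register copy. It is not the Nachos implementation.

```java
/** Hypothetical sketch of per-thread register save/restore. */
final class UserThreadSketch {
    static final int NUM_REGS = 8; // arbitrary register count for the example

    /** Stand-in for the simulated processor's user registers. */
    static final int[] cpu = new int[NUM_REGS];

    /** Stand-in for a process; owns process-level state such as translations. */
    static final class Proc {
        int restores = 0;

        void saveState() {
            // Nothing to save in this sketch.
        }

        void restoreState() {
            // A real process might reinstall its page table here.
            restores++;
        }
    }

    /** Stand-in for a user thread bound to exactly one process. */
    static final class UserThread {
        final Proc process;
        final int[] saved = new int[NUM_REGS];

        UserThread(Proc process) {
            this.process = process;
        }

        /** Called right before this thread is switched out. */
        void saveState() {
            process.saveState();
            System.arraycopy(cpu, 0, saved, 0, NUM_REGS);
        }

        /** Called when this thread starts running again. */
        void restoreState() {
            System.arraycopy(saved, 0, cpu, 0, NUM_REGS);
            process.restoreState();
        }
    }
}
```

Because each thread keeps its own register copy while the process keeps the shared state, two threads created with the same Proc can be switched in and out without disturbing each other's registers.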

5.4. System Calls and Exception Handling

User programs invoke system calls by executing the MIPS syscall instruction, which causes the Nachos kernel exception handler to be invoked (with the cause register set to Processor.exceptionSyscall). The kernel must first tell the processor where the exception handler is by calling Machine.processor().setExceptionHandler().

The default kernel exception handler, UserKernel.exceptionHandler(), reads the value of the processor's cause register, determines the current process, and invokes handleException() on the current process, passing the cause of the exception as an argument. Again, for a syscall, this value will be Processor.exceptionSyscall.

The syscall instruction indicates that a system call is requested, but doesn't indicate which system call to perform. By convention, user programs place the value indicating the particular system call desired into MIPS register r2 (the first return register, v0) before executing the syscall instruction. Arguments to the system call, when necessary, are passed in MIPS registers r4 through r7 (i.e. the argument registers, a0 ... a3), following the standard C procedure call convention. Function return values, including system call return values, are expected to be in register r2 (v0) on return.
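The register convention can be made concrete with a small sketch in which the register file is a plain int array. The class and the system call number SYSCALL_SUM are hypothetical, invented purely for illustration; they are not part of Nachos.

```java
/** Hypothetical sketch of system call dispatch using the register convention. */
final class SyscallSketch {
    static final int REG_V0 = 2; // syscall number on entry, return value on exit
    static final int REG_A0 = 4; // first argument
    static final int REG_A1 = 5; // second argument
    static final int REG_A2 = 6; // third argument
    static final int REG_A3 = 7; // fourth argument

    static final int SYSCALL_SUM = 100; // made-up syscall: returns a0 + a1

    /** Selects the system call by number and returns its result. */
    static int handleSyscall(int number, int a0, int a1, int a2, int a3) {
        switch (number) {
            case SYSCALL_SUM:
                return a0 + a1;
            default:
                return -1; // unknown system call
        }
    }

    /** Reads the number and arguments from the registers, then stores the result. */
    static void handleSyscallException(int[] regs) {
        int result = handleSyscall(regs[REG_V0], regs[REG_A0], regs[REG_A1],
                                   regs[REG_A2], regs[REG_A3]);
        regs[REG_V0] = result;
        // A real handler would also need to move the program counter past
        // the syscall instruction so it is not executed again.
    }
}
```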

Note: When accessing user memory from within the exception handler (or within Nachos in general), user-level addresses cannot be referenced directly. Recall that user-level processes execute in their own private address spaces, which the kernel cannot reference directly. Use readVirtualMemory(), readVirtualMemoryString(), and writeVirtualMemory() to make use of pointer arguments to syscalls.
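To see why a user-level pointer cannot simply be dereferenced, consider the sketch below, which copies bytes out of a simulated physical memory by translating each page through a page table. The class UserMemorySketch is a hypothetical illustration of the idea, with simplified signatures; it is not the Nachos implementation of these methods.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch of reading user memory through a page table. */
final class UserMemorySketch {
    final int pageSize;
    final byte[] physical;  // simulated main memory
    final int[] pageTable;  // pageTable[vpn] = ppn

    UserMemorySketch(int pageSize, int physicalPages, int[] pageTable) {
        this.pageSize = pageSize;
        this.physical = new byte[pageSize * physicalPages];
        this.pageTable = pageTable;
    }

    /** Copies up to length bytes from user address vaddr; returns bytes copied. */
    int readVirtualMemory(int vaddr, byte[] dst, int length) {
        int copied = 0;
        while (copied < length) {
            int addr = vaddr + copied;
            int vpn = addr / pageSize;
            if (addr < 0 || vpn >= pageTable.length) {
                break; // address is outside the process's address space
            }
            int offset = addr % pageSize;
            // Never copy past the end of the current page: the next virtual
            // page may live somewhere else entirely in physical memory.
            int chunk = Math.min(length - copied, pageSize - offset);
            System.arraycopy(physical, pageTable[vpn] * pageSize + offset,
                             dst, copied, chunk);
            copied += chunk;
        }
        return copied;
    }

    /** Reads a null-terminated string of at most maxLength bytes, or null. */
    String readVirtualMemoryString(int vaddr, int maxLength) {
        byte[] buffer = new byte[maxLength + 1];
        int n = readVirtualMemory(vaddr, buffer, buffer.length);
        for (int i = 0; i < n; i++) {
            if (buffer[i] == 0) {
                return new String(buffer, 0, i, StandardCharsets.US_ASCII);
            }
        }
        return null; // no terminator found within the limit
    }
}
```

A string that spans two virtual pages may be split across non-adjacent physical pages, which is why the copy proceeds one page at a time.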