NOTE: For this phase, we provide you with a protocol documented here and with a state diagram.

Phase 4: Networks and Distributed Systems

The final assignment is to experiment with networking. We will provide you with some low-level network communications facilities; you will build a nicer abstraction on top of those, and then use that abstraction in building a distributed chat program.

Each node in the network will be implemented as a separate instance of Nachos. Therefore you will have one JVM running for each network node; these JVMs will be running on the same actual machine, even though we are pretending as though they are distributed on a network.

Each Nachos "node" has a single connection to the network, which is implemented using UDP sockets. The class nachos.machine.NetworkLink provides this functionality for you. Each node has a unique link address which represents the physical network address of that node. You can access the network link using the method Machine.networkLink(). Calling getLinkAddress() on the NetworkLink object will return the link address of this node; the type of the address is just an int. Since multiple Nachos nodes will be running on the same actual machine, you will need to use different swap files for each node to avoid interference. An easy way to do this is to append the network link address to the swap filename.

You can test out the basic network functionality by building in the proj4 directory and simultaneously running nachos in two different windows on the same machine. As Nachos initializes, the first line of text contains in it the address of its network link. For example, if you see:

nachos 5.0j initializing... config interrupt timer processor console network(1) user-check grader
this indicates that the network link address is 1 (the value in parenthesis after network in the above output).

The new machine files for this assignment include:

The new kernel files for this assignment, found in the network directory, include:

PostOffice provides a more convenient communications abstraction than the raw network link. It provides user-to-user communication on top of the network link's node-to-node communication, but does not provide reliable message delivery. In this project you will have to implement other layers on top of the PostOffice and network link abstractions. For example, you will need to implement reliable delivery on top of the underlying unreliable communications link. Since the PostOffice is not entirely suitable for this project, you are free to use it as a starting point, but you are also welcome to start from scratch if you wish.

Note that NetKernel extends VMKernel and NetProcess extends VMProcess, meaning that this project depends upon your phase 3 solution working correctly. If you wish, you may modify the classes to extend UserKernel and UserProcess instead, so that your code only depends upon project phase 2. In order for this to work, ensure that the total number of physical memory pages is large enough to hold the entire code for your test programs. This is controlled by the line

Processor.numPhysPages = 16
in the nachos.conf file.

  1. (75%) Implement the two networking syscalls (connect() and accept(), documented in syscall.h), which provide reliable, connection-oriented, byte-stream communication between connection endpoints. A connection endpoint is a link address combined with a port number. A user program connects to a remote node by calling connect() with a remote network link ID (also called a host ID) and a port number. The remote node must call the accept() system call to accept the connection; the system call takes a local port number on which connections can be made.

    When the connection is established, each system call returns a new file descriptor representing the connection (in UNIX terminology, this file descriptor is called a "socket"). The application can then send messages to the network by calling write() on the socket, and can read messages from the network by calling read() on the socket. The connection is closed by one end of the connection performing a close() operation on the socket file descriptor. After a connection is closed, any read() or write() calls on the socket (on either end of the connection!) should return -1, indicating an error.

    There is one important detail to consider here. If a program does something like:

         write(socket, buf, 100);
    close(socket);
    then although the connection has been closed, the 100 bytes of data have probably not been acknowledged yet by the remote host. Your implementation must ensure that all data written to a socket is delivered to the remote host, even if the socket is closed before all outgoing data is acknowledged. This also means that any read calls on the socket by the remote host must succeed until all of pending data has been read, even though the socket has been closed. This means that you cannot simply "drop" the state for a socket when it is closed; you have to keep the socket "alive" until all of the data is delivered!

    Note that the close() call given above should not block waiting for all outgoing data to be transmitted; the close() should return immediately and the network layer should handle transmitting the remaining data.

    Your implementation of read() and write() on sockets must provide full-duplex, reliable, byte-stream communication with no message size limits. This means that packets can flow in both directions on the connection simultaneously. Also, if one node calls write() with a buffer of size N, then the remote host will receive exactly the same data by calling read() (although multiple calls to read() might be required). The read() call may not return all of the data at once. For example, if node A calls

    write(socket, buffer, 100);  /* Sender */
    and node B calls
    read(socket, buffer, 10); /* Receiver */ 
    then B will have to call read() ten times before receiving all of the data. Also, read() may return less data than the user requested. For example, if B calls
    read(socket, buffer, 1000); /* Receiver */ 
    then if only 100 bytes are available, read() will return only those 100 bytes. In general the amount of data returned by each call to read need not be equal to the corresponding write() call on the remote host. Also, read() does not block waiting for data to be received on the network; it returns immediately (with a return value of 0).

    You are free to use the PostOffice class as a starting point for your implementation, or you can start from scratch on top of the network link. It's best that you not modify PostOffice itself, but rather you should implement your own communications interface which makes use of PostOffice (if you wish).

    Your implementation should use the protocol documented here. A state transition table for this protocol is provided here. As you design your implementation, think about all of the possible cases of failure in the underlying network layer. The underlying network link may drop arbitrary packets, but it may not reorder packets or corrupt their contents. (Hint: This will make your job a lot easier!)

    To test your implementation, the Nachos network link can be made to randomly drop packets by modifying the value of the NetworkLink.reliability key in nachos.conf. This is a floating-point number, between 0 and 1, representing the likelihood that a packet will be successfully delivered.

  2. (25%) Demonstrate your message passing primitives work by using them to implement an IRC-like "network chat" application among multiple (at least 3) users. Network chat is like the UNIX talk utility, except it can be used for N-way conversations (not just 2-way); each user sits at a different machine, and anything anyone types is broadcast to everyone else connected to the chat room.

    You should write two programs: a chat server (called chatserver.c) and a chat client (called chat.c). The server runs on one node and accepts connections from chat clients. The chat client connects to the chat server and accepts user input from the console. After the user types a line of text, the line is transmitted to the chat server. The server then broadcasts the message to all of the clients connected to it. Each message received by the chat client should then be displayed on the console. Users can dynamically connect and disconnect from the chat room.