Computer Science 252: Graduate Computer Architecture

University of California
Dept. of Electrical Engineering and Computer Sciences

David E. Culler
TA: Jason Hill

Project Suggestions

Spring 2001

A collection of projects was outlined at the end of Jason's Wireless Network Sensor lecture, and some appeared in Kurt's lecture on reconfigurable hardware.
Hardening the IP stack into silicon. There have been many efforts over the years to offload substantial portions of the TCP/IP stack into co-processors, network interfaces, etc. However, most of these projects focused on support for general-purpose machines. Now we are seeing all sorts of small, embedded networked devices. Printers, cameras, all sorts of things sit on the end of an Ethernet connection. Given that these function in very limited modes, it may be possible to cast the limited slice of the stack required for these devices into hardware, or at least cast major portions into hardware. Is this true? How do the networking characteristics differ from those of general-purpose PCs? What are the opportunities for simplification? What hardware structures would be appropriate? What do you gain?
Phil Buonadonna has proposed a compromise between Infiniband and TCP/IP, which takes the simplified queue-pair abstraction from Infiniband and the lower layers of the stack from TCP/IP in this simplified context. He's done a nice implementation comparison in firmware on the Myricom LANai network interface. How would you implement this in hardware? How would its complexity compare to a full-blown Infiniband implementation?
Infiniband is proposed for Storage Area Networks; however, I've never seen a concrete proposal for how to implement storage over it. There are concrete proposals for NASD and now some storage-over-IP proposals. Propose and evaluate a strategy for providing storage access over Infiniband. Which mode would you use: connection vs. datagram, reliable vs. unreliable?
Many of the performance aspects of multithreaded and SMT architectures have been analyzed and some decent power models exist for pipelined processors. What are the energy implications of multithreading? Are there energy optimizations? Is it possible to use the multithreading structure to get cheap wake-up?
Many of you have probably noticed that network access to and from campus is the pits these days. The campus limited its external bandwidth to 70 Mb/s as a cost-cutting measure, and now all the traffic is backed up behind the bottleneck. Many complain that their interactive connections are dismal in the face of all that streaming Napster traffic. We have all sorts of new network processor techniques: classifying packets on the fly and scheduling various flows. These have usually been employed to provide QoS for streams. How could you utilize network processors to improve interactive performance?
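As a starting point for thinking about the classify-then-schedule idea, here is a minimal software sketch, not a model of any real network processor. It assumes packets can be labeled interactive by destination port alone (the port set, the `Packet` fields, and all class and function names are hypothetical) and uses strict priority between two FIFO queues. A real design would need a better classifier and some protection against starving bulk traffic.

```python
from collections import deque
from dataclasses import dataclass

# Assumption for illustration: these ports stand in for "interactive" traffic.
INTERACTIVE_PORTS = {22, 23}  # ssh, telnet


@dataclass
class Packet:
    flow: str      # hypothetical flow label, used only to show the output order
    dst_port: int
    size: int      # bytes (not used by this simple scheduler)


def classify(pkt):
    """Label a packet 'interactive' or 'bulk' from its destination port."""
    return "interactive" if pkt.dst_port in INTERACTIVE_PORTS else "bulk"


class PriorityScheduler:
    """Two FIFO queues; the interactive queue is always drained first."""

    def __init__(self):
        self.queues = {"interactive": deque(), "bulk": deque()}

    def enqueue(self, pkt):
        self.queues[classify(pkt)].append(pkt)

    def dequeue(self):
        for cls in ("interactive", "bulk"):
            if self.queues[cls]:
                return self.queues[cls].popleft()
        return None  # both queues are empty


def drain_order(packets):
    """Enqueue every packet, then return flow labels in transmission order."""
    sched = PriorityScheduler()
    for p in packets:
        sched.enqueue(p)
    order = []
    pkt = sched.dequeue()
    while pkt is not None:
        order.append(pkt.flow)
        pkt = sched.dequeue()
    return order
```

Feeding it a backlog in which small ssh and telnet packets are stuck behind large bulk transfers shows the interactive packets leaving first, with FIFO order preserved inside each class. Whether strict priority is the right policy at a 70 Mb/s bottleneck is exactly the kind of question the project should answer.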
In the coming year you will see two on-chip techniques for exploiting thread-level parallelism: simultaneous multithreading (SMT) and chip multi-processors (CMP). What are the trade-offs that favor one or the other? Is there a continuum of designs between them? Can you develop a quantitative framework for identifying optimal design points?
There have been numerous studies of cache design, but multithreading introduces new behavior. How does MT impact cache design? What should you do differently?
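To make the "new behavior" concrete, here is a toy experiment, assuming a tiny direct-mapped cache and two hypothetical threads whose working sets map to the same sets. The function names, cache parameters, and address traces are all made up for illustration. It compares running the threads back to back against interleaving their references one at a time, as a fine-grained multithreaded core might.

```python
def simulate(trace, num_sets=4, line_bytes=16):
    """Count misses in a direct-mapped cache for a list of byte addresses."""
    tags = [None] * num_sets          # one tag per set; None means empty
    misses = 0
    for addr in trace:
        block = addr // line_bytes
        index = block % num_sets
        tag = block // num_sets
        if tags[index] != tag:        # cold or conflict miss
            misses += 1
            tags[index] = tag
    return misses


def interleave(a, b):
    """Round-robin merge of two equal-length traces (one reference each)."""
    merged = []
    for x, y in zip(a, b):
        merged.extend((x, y))
    return merged


# Each thread loops three times over four lines; the two working sets
# collide in every set of the 4-set cache.
thread_a = [0, 16, 32, 48] * 3
thread_b = [64, 80, 96, 112] * 3

sequential_misses = simulate(thread_a + thread_b)
interleaved_misses = simulate(interleave(thread_a, thread_b))
```

With these traces the back-to-back run takes only cold misses (8 of 24 references), while the interleaved run misses on every reference (24 of 24) because each thread keeps evicting the other's lines. Real workloads will be far less extreme, but the sketch suggests the questions to ask: how much associativity, capacity, or partitioning does it take to recover the single-thread hit rate?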
Several researchers have proposed reconfigurable hardware as a means of application-specific optimization and late binding of functionality. However, there are many ways to introduce reconfigurable hardware into a design: function units of definable function, new instructions, co-processors. Survey the space of approaches. What are the trade-offs? Does this lead to some new techniques for integrating reconfigurable logic into processors and/or memories?
Virtual machine monitors have become extremely popular. Many of you probably use VMware. It relies on a mix of fast traps, instruction emulation, and dynamic translation. What is the performance impact of virtual machine monitors? Where does the time go? How does it change cache, TLB, and VM behavior?
What about running many virtual machines per physical machine, as is done in many web hosting applications (cf. ensim.com)?
What are the power trade-offs implied by virtualization?
IA-64 provides a whole host of performance counters. Use these to verify various classical studies on REAL workloads.