Computer Science 252: Graduate Computer Architecture

University of California
Dept. of Electrical Engineering and Computer Sciences

David E. Culler
TA: Steven Sorkin

Project Suggestions

Spring 2003

Jason Hill has designed and implemented Mote Spec, a chip of roughly 5 mm², following a series of TinyOS-based designs. BWRC has conducted a series of picoradio designs, with a chip scheduled to come out this year. Compare and contrast the two design approaches. Develop a methodology for a quantitative comparison.
Jason Hill's Mote Spec design has primitive support for multiple threads. It would seem that multiple hardware contexts are another important option for using extra real estate to reduce power. By avoiding context-switch time, you eliminate energy-consuming cycles, but you'd like to avoid paying a penalty for extra register sets sitting idle. Survey what has been done in using multithreading for low-power operation. Identify an area where it would be an attractive approach (if there is one), develop a concrete proposal, and evaluate it.
The paper at http://www.ccs.neu.edu/home/rraj/Pubs/topology.html develops an interesting algorithm for building network topologies from local information and for performing MAC and routing within them. It is backed by some interesting models of radio network behavior. Explore translating these ideas into practice on low-power wireless systems, and evaluate how well they work.
A real cycle-level simulator for whole networks.
Commodity "effector motes": TinyOS on LEGO Mindstorms extended with a radio. Interface a CC or RFM radio to a LEGO Mindstorms brick, perhaps with a Dot mote on the serial port. Port TinyOS to the 8051 and explore the impact of continuously wirelessly connected robots.
From Dave Patterson: IA-64 studies

From Bob Brodersen: What is the best FPGA architecture for general-purpose computing?
Analysis of Zigbee directions. Zigbee is an industrial effort to produce standards for sensor nets. Unfortunately, it is pursuing a closed process, so it is hard to seriously evaluate the proposals from the outside. However, we can observe the proprietary starting points that motivate the participants, including Motorola's ClusterTree and Ember's GRAD. We also have a huge amount of open work on AODV. It is possible to interview members to get a sense of what the goals and requirements of the standard are perceived to be. (No, they don't seem to be coherently written down.) Then one can look at how well existing approaches, or extensions of them, might meet those goals.
Address-based routing in irregular ad hoc networks. In parallel machines, we have long routed based on simple algebraic relationships among the addresses assigned to nodes, e.g., in grids, n-cubes, and butterflies. Without such regular structures, we have tended to use either table-driven routing or source-based routing, which also relies on tables at the endpoints. We have nice ways to produce the tables, such as up*/down*, which essentially embeds a spanning tree as a reference structure. We could use such embedded structures as a basis for address-based routing. For example, it is easy to see how to number nodes according to their position in a spanning tree so that, to go from any node to any other, you know whether to route up to the parent or down to a particular child. The interesting part is that the underlying connectivity is constantly changing, so you'd like to leave enough room in the namespace to adapt to small changes with limited renaming. A couple of other issues grow out of this:

Names vs. addresses. You'd like each node to have a name that is unchanging and distinct from its routing name.
Translation. Can you use the same or a related structure to translate from the name to the routing name (the current routable address)?
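The spanning-tree numbering idea can be made concrete with interval labeling, a known technique sketched here under the assumption that nodes already agree on a rooted spanning tree (for instance, one produced by an up*/down*-style construction). Each node gets a DFS preorder address plus the address range of its subtree, so a node routes down to the child whose range contains the destination and otherwise up to its parent. The `gap` parameter is an illustrative way to leave room in the namespace: unused addresses after each node let a newly attached child be numbered without renaming anything else. All function names are hypothetical.

```python
def label_tree(children, root, gap=1):
    """DFS-preorder interval labeling of a rooted spanning tree.

    Node v gets address addr[v]; the inclusive range [addr[v], hi[v]]
    covers every address in v's subtree. With gap > 1, the gap - 1 unused
    addresses after each node can later be handed to newly attached
    children without renaming the rest of the tree.
    """
    addr, hi = {}, {}
    counter = 0

    def visit(v):
        nonlocal counter
        addr[v] = counter
        counter += gap
        for c in children.get(v, []):
            visit(c)
        hi[v] = counter - 1

    visit(root)
    return addr, hi


def next_hop(node, dest, parent, children, addr, hi):
    """Local decision from addresses alone: deliver, go down, or go up."""
    if dest == addr[node]:
        return None                  # arrived
    for c in children.get(node, []):
        if addr[c] <= dest <= hi[c]:
            return c                 # destination is inside this child's subtree
    return parent[node]              # otherwise head toward the root


def route(src, dst, children, root, gap=1):
    """Return the hop-by-hop path from node src to node dst."""
    parent = {c: p for p, cs in children.items() for c in cs}
    addr, hi = label_tree(children, root, gap)
    dest = addr[dst]
    path, node = [src], src
    while (nxt := next_hop(node, dest, parent, children, addr, hi)) is not None:
        path.append(nxt)
        node = nxt
    return path


tree = {"a": ["b", "c"], "b": ["d", "e"], "c": ["f"]}
print(route("d", "f", tree, "a"))  # ['d', 'b', 'a', 'c', 'f']
```

This covers only a static tree; keeping the labels valid as links come and go is exactly the open part of the project.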

Connection between DHTs and sensor nets. Current DHTs describe a way of doing translation in a very large namespace that is continuously evolving in parallel, by combining routing and lookup. The proposals rely on an underlying routing infrastructure (IP) that essentially connects any point in the overlay graph to any other (through possibly many underlying hops). It is interesting to look at combining DHTs with proposals for ad hoc routing. A core problem with the ad hoc routing proposals is the amount of state (albeit soft state) that they build up per node. DHTs have the property that you take extra hops, but you only need to learn about a fixed number of routes. Does this give a way to manage the routing state for tiny devices? Given that you only need to learn a few paths, can you eliminate the need for an underlying routing infrastructure?
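To make the "fixed number of routes" point concrete, here is a minimal Chord-style sketch (Chord is one well-known DHT; the ring size and node IDs are assumptions chosen for illustration). Each node keeps only M finger entries, finger[i] = successor(n + 2^i), and forwards a lookup to the farthest finger that precedes the key. Note that the sketch builds the finger tables from a global node list, which is exactly the step an ad hoc network would have to replace with something local.

```python
M = 6                 # identifier bits (assumption); the ring has 2**M positions
RING = 2 ** M


def successor(nodes, ident):
    """First node at or clockwise after ident; nodes must be sorted."""
    for n in nodes:
        if n >= ident:
            return n
    return nodes[0]   # wrap around past zero


def build_fingers(nodes):
    """Each node stores M entries: finger[i] = successor(n + 2**i)."""
    return {n: [successor(nodes, (n + 2 ** i) % RING) for i in range(M)]
            for n in nodes}


def between(x, a, b, inclusive_right=False):
    """Is x on the clockwise arc from a to b (a excluded)?
    When a == b the arc is the whole ring except a."""
    if a < b:
        return a < x < b or (inclusive_right and x == b)
    return x > a or x < b or (inclusive_right and x == b)


def lookup(start, key, fingers):
    """Route a lookup for key from node start; return the nodes visited.
    The last node in the path is the one responsible for key."""
    path, n = [start], start
    while key != n:
        succ = fingers[n][0]              # finger[0] is the immediate successor
        if between(key, n, succ, inclusive_right=True):
            if succ != n:
                path.append(succ)
            break
        # closest preceding finger: scan from the farthest entry inward
        nxt = next(f for f in reversed(fingers[n]) if between(f, n, key))
        path.append(nxt)
        n = nxt
    return path


nodes = [1, 8, 14, 21, 32, 38, 42, 48, 51, 56]
fingers = build_fingers(nodes)
print(lookup(8, 54, fingers))  # [8, 42, 51, 56]
```

Per-node state is M entries regardless of traffic, at the cost of several overlay hops per lookup; whether those overlay hops can ride directly on radio neighbors is the question posed above.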
Investigate the use of erasure codes (e.g., LT codes) for robust storage.

Investigate the use of erasure codes for sensor network communication.
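For readers new to LT codes, the core mechanism is small. Each encoded symbol is the XOR of a randomly chosen set of source blocks, and the receiver can rebuild the data from any sufficiently large collection of symbols by repeatedly "peeling" symbols that cover a single unknown block. That property is what makes them attractive for both lossy radio links and storage spread over unreliable nodes. This sketch uses the ideal soliton degree distribution for brevity; Luby's LT codes use the robust soliton distribution, which performs much better in practice. Block values are small integers standing in for packets, and the function names are hypothetical.

```python
import random


def ideal_soliton(k, rng):
    """Sample a degree d in 1..k with P(1) = 1/k and P(d) = 1/(d(d-1))."""
    u = rng.random()
    if u < 1.0 / k:
        return 1
    for d in range(2, k + 1):
        if u < 1.0 / k + 1.0 - 1.0 / d:   # CDF at d
            return d
    return k                              # guard against float rounding


def encode_symbol(blocks, rng):
    """One encoded symbol: (set of source indices, XOR of those blocks)."""
    d = ideal_soliton(len(blocks), rng)
    idx = frozenset(rng.sample(range(len(blocks)), d))
    value = 0
    for i in idx:
        value ^= blocks[i]
    return idx, value


def peel_decode(symbols, k):
    """Peeling decoder: return the k source blocks, or None if it stalls."""
    pending = [[set(idx), val] for idx, val in symbols]
    recovered = {}
    progress = True
    while progress and len(recovered) < k:
        progress = False
        for sym in pending:
            idx = sym[0]
            known = [j for j in idx if j in recovered]
            for j in known:
                idx.discard(j)           # XOR out blocks we already know
                sym[1] ^= recovered[j]
            if len(idx) == 1:            # symbol now covers one unknown block
                recovered[idx.pop()] = sym[1]
                progress = True
    if len(recovered) < k:
        return None
    return [recovered[i] for i in range(k)]


rng = random.Random(1)
blocks = [rng.randrange(256) for _ in range(20)]
received = []
decoded = None
while decoded is None:                   # keep receiving until peeling succeeds
    received.append(encode_symbol(blocks, rng))
    decoded = peel_decode(received, len(blocks))
print(f"decoded {len(blocks)} blocks from {len(received)} symbols")
```

The receiver never asks for a specific missing packet; it just keeps collecting symbols, which is the property worth evaluating against retransmission-based schemes.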

On one hand, it is most efficient for sensor networks to communicate in a nearest-neighbor mesh (cf. Airplane Routing). However, it is

From CS252 S02

A collection of projects was outlined at the end of Jason's Wireless Network Sensor Lecture, and some appeared in Kurt's reconfigurable lecture.
Hardening the IP stack into silicon. There have been many efforts over the years to offload substantial portions of the TCP/IP stack into co-processors, network interfaces, etc. However, most of these projects focused on support for general-purpose machines. Now we are seeing all sorts of small, embedded networked devices: printers, cameras, and all sorts of other things sit on the end of an Ethernet connection. Given that these devices function in very limited modes, it may be possible to cast the limited slice of the stack they require, or at least major portions of it, into hardware. Is this true? How do their networking characteristics differ from those of general-purpose PCs? What are the opportunities for simplification? What hardware structures would be appropriate? What do you gain?
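As one illustration of how regular some of these pieces are, the IP/TCP/UDP checksum (the 16-bit ones' complement sum described in RFC 1071) reduces to a 16-bit adder with an end-around carry, which is why it is a standard offload target. This is a software reference model of that datapath, not a claim about which slice of the stack a given embedded device needs; the test vector is a sample IPv4 header.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit ones' complement of the ones' complement sum (RFC 1071)."""
    if len(data) % 2:
        data += b"\x00"                           # pad odd-length input
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]     # big-endian 16-bit word
        total = (total & 0xFFFF) + (total >> 16)  # end-around carry
    return ~total & 0xFFFF


# IPv4 header with its checksum field (bytes 10-11) zeroed
header = bytes.fromhex("450000730000400040110000c0a80001c0a800c7")
print(hex(internet_checksum(header)))  # 0xb861
```

A receiver runs the same function over the header including the transmitted checksum and expects 0.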
Phil Buonadonna has proposed a compromise between Infiniband and TCP/IP that takes the simplified queue pairs abstraction from Infiniband and combines it with the lower layers of the TCP/IP stack. He has done a nice implementation comparison in firmware on the Myricom Lanai network interface. How would you implement this in hardware? How would its complexity compare to a full-blown Infiniband implementation?
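A hardware version has to hold per-connection state for the queue pair abstraction. The following hypothetical Python model (the class, method, and status names are illustrative, not the Infiniband verbs API or the proposal's actual interface) shows a minimum: a send queue, a receive queue of pre-posted buffers, a completion queue, and the matching step a NIC datapath would perform.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class QueuePair:
    """Hypothetical queue pair: two work queues plus a completion queue."""
    send_q: deque = field(default_factory=deque)   # (wr_id, payload)
    recv_q: deque = field(default_factory=deque)   # (wr_id, buffer size)
    cq: deque = field(default_factory=deque)       # (wr_id, status, data)

    def post_send(self, wr_id, payload: bytes):
        self.send_q.append((wr_id, payload))

    def post_recv(self, wr_id, size: int):
        self.recv_q.append((wr_id, size))


def deliver(src: QueuePair, dst: QueuePair):
    """Stand-in for the NIC: drain src's sends into dst's posted buffers."""
    while src.send_q:
        wr_id, payload = src.send_q.popleft()
        if not dst.recv_q:
            # no buffer posted; a reliable transport would retry later
            src.cq.append((wr_id, "NO_RECV", None))
            continue
        rwr_id, size = dst.recv_q.popleft()
        if len(payload) > size:
            src.cq.append((wr_id, "LEN_ERR", None))
            dst.cq.append((rwr_id, "LEN_ERR", None))
            continue
        src.cq.append((wr_id, "OK", None))
        dst.cq.append((rwr_id, "OK", payload))


a, b = QueuePair(), QueuePair()
b.post_recv(1, 64)
a.post_send(10, b"hello")
a.post_send(11, b"again")
deliver(a, b)
print(list(a.cq))  # [(10, 'OK', None), (11, 'NO_RECV', None)]
print(list(b.cq))  # [(1, 'OK', b'hello')]
```

Each FIFO and the matching logic map naturally onto hardware queues and a small state machine; the interesting cost comparison is what the TCP/IP lower layers add underneath.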
Infiniband has been proposed for storage area networks; however, I've never seen a concrete proposal for how to implement storage over it. There are concrete proposals for NASD and now some storage-over-IP proposals. Propose and evaluate a strategy for providing storage access over Infiniband. Which mode would you use: connection vs. datagram, reliable vs. unreliable?
Many of the performance aspects of multithreaded and SMT architectures have been analyzed and some decent power models exist for pipelined processors. What are the energy implications of multithreading? Are there energy optimizations? Is it possible to use the multithreading structure to get cheap wake-up?
Many of you have probably noticed that network access to and from campus is the pits these days. The campus limited its external bandwidth to 70 Mb/s as a cost-cutting measure, and now all the traffic is backed up behind the bottleneck. Many complain that their interactive connections are dismal in the face of all that streaming Napster traffic. We have all sorts of new network processor techniques: classifying packets on the fly and scheduling various flows. These have usually been employed to provide QoS for streams. How could you utilize network processors to improve interactive performance?
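One simple starting point is to classify packets into an interactive class and a bulk class and serve the interactive queue first. This is a software model, not network processor code; the port set and the small-packet threshold are assumptions chosen for illustration, and a real classifier would need something harder to game than packet size.

```python
from collections import deque
from typing import NamedTuple


class Packet(NamedTuple):
    src_port: int
    dst_port: int
    length: int                      # bytes


INTERACTIVE_PORTS = {22, 23}         # assumption: ssh and telnet
SMALL_PACKET = 128                   # assumption: keystrokes and acks are tiny


def classify(p: Packet) -> str:
    if p.src_port in INTERACTIVE_PORTS or p.dst_port in INTERACTIVE_PORTS:
        return "interactive"
    if p.length <= SMALL_PACKET:
        return "interactive"
    return "bulk"


class PriorityScheduler:
    """Strict priority: bulk is served only when no interactive packet waits."""

    def __init__(self):
        self.queues = {"interactive": deque(), "bulk": deque()}

    def enqueue(self, p: Packet):
        self.queues[classify(p)].append(p)

    def dequeue(self):
        for cls in ("interactive", "bulk"):
            if self.queues[cls]:
                return self.queues[cls].popleft()
        return None                  # link idle
```

Strict priority can starve bulk traffic if interactive load ever grows large; weighted or deficit round-robin scheduling is the usual remedy and would be a natural next step.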
In the coming year you will see two on-chip techniques for exploiting thread-level parallelism: simultaneous multithreading (SMT) and chip multiprocessors (CMP). What are the trade-offs that favor one or the other? Is there a continuum of designs between them? Can you develop a quantitative framework for identifying optimal design points?
There have been numerous studies of cache design, but multithreading introduces new behavior. How does MT impact cache design? What should you do differently?
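A first experiment for the multithreading question is simply to measure interference: replay each thread's reference stream alone and then interleaved through the same cache. This minimal sketch assumes a fully associative LRU cache counted in blocks and threads that share no data; the two loop traces are synthetic, chosen to show what happens when combined working sets exceed capacity.

```python
from collections import OrderedDict


def lru_misses(trace, capacity):
    """Per-thread miss counts for a fully associative LRU cache.
    trace is a list of (thread_id, block_address) references."""
    cache = OrderedDict()              # keys kept in LRU -> MRU order
    misses = {}
    for tid, block in trace:
        misses.setdefault(tid, 0)
        if block in cache:
            cache.move_to_end(block)   # hit: mark most recently used
        else:
            misses[tid] += 1
            cache[block] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict least recently used
    return misses


# Two threads each looping over a 4-block working set (synthetic traces)
a = [("A", i % 4) for i in range(40)]
b = [("B", 100 + i % 4) for i in range(40)]
interleaved = [ref for pair in zip(a, b) for ref in pair]

print(lru_misses(a, 4))            # {'A': 4}
print(lru_misses(interleaved, 4))  # {'A': 40, 'B': 40}
```

Fine-grained interleaving turns two cache-friendly threads into a thrashing pair here; set associativity, per-thread partitioning, and coarser switching are the kinds of design responses worth evaluating.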
Several researchers have proposed reconfigurable hardware as a means of application-specific optimization and late binding of functionality. However, there are many ways to introduce reconfigurable hardware into a design: function units with definable functions, new instructions, co-processors, and so on. Survey the space of approaches. What are the trade-offs? Does this lead to some new techniques for integrating reconfigurable logic into processors and/or memories?
Virtual machine monitors have become extremely popular. Many of you probably use VMware. It relies on a mix of fast traps, instruction emulation, and dynamic translation. What is the performance impact of virtual machine monitors? Where does the time go? How do they change cache, TLB, and VM behavior?
What about running many virtual machines per physical machine, as is done in many web hosting applications (cf. ensim.com)?
What are the power trade-offs implied by virtualization?
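To see where VMM time goes, it helps to separate instructions that run directly from those that trap into the monitor. This toy model (the instruction set, the `Trap` class, and the virtual CPU fields are all hypothetical) shows only the trap-and-emulate path: unprivileged instructions execute "on hardware", while privileged ones trap and the monitor applies their effect to virtual state and counts the event. Dynamic translation, which VMware relies on for x86 code containing instructions that do not trap cleanly, is not modeled.

```python
class Trap(Exception):
    """Raised when deprivileged guest code issues a privileged instruction."""
    def __init__(self, instr):
        super().__init__(instr[0])
        self.instr = instr


PRIVILEGED = {"cli", "sti"}   # hypothetical toy ISA: disable/enable interrupts


def hw_execute(instr, regs):
    """The 'real' CPU running guest code in user mode."""
    op = instr[0]
    if op in PRIVILEGED:
        raise Trap(instr)
    if op == "addi":
        _, reg, imm = instr
        regs[reg] = regs.get(reg, 0) + imm
    else:
        raise ValueError(f"unknown op {op!r}")


class ToyVMM:
    def __init__(self):
        self.vcpu = {"interrupts_enabled": True}   # virtual privileged state
        self.direct = 0                            # instructions run natively
        self.traps = 0                             # instructions emulated

    def emulate(self, instr):
        if instr[0] == "cli":
            self.vcpu["interrupts_enabled"] = False
        elif instr[0] == "sti":
            self.vcpu["interrupts_enabled"] = True

    def run(self, program, regs):
        for instr in program:
            try:
                hw_execute(instr, regs)
                self.direct += 1
            except Trap as t:
                self.traps += 1
                self.emulate(t.instr)


vmm, regs = ToyVMM(), {}
vmm.run([("addi", "r1", 5), ("cli",), ("addi", "r1", 2), ("sti",), ("cli",)], regs)
print(regs, vmm.direct, vmm.traps, vmm.vcpu)
# {'r1': 7} 2 3 {'interrupts_enabled': False}
```

Multiplying the trap count by a measured per-trap cost gives a first-order estimate of direct monitor overhead; the questions above ask how that compares with the indirect costs on caches, TLBs, and power.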
IA-64 provides a whole host of performance counters. Use these to verify various classical studies on REAL workloads.