University of California Dept. of Electrical Engineering and Computer Sciences
David E. Culler TA: Steven Sorkin
Project Suggestions
Spring 2003
Jason Hill has designed and implemented a Mote Spec of roughly 5 mm^2 following
a series of TinyOS-based designs. BWRC has conducted a series of
picoradio designs with a chip scheduled out this year. Compare and
contrast the two design approaches. Develop a methodology for a quantitative
comparison.
Jason Hill's Mote Spec design has primitive support for multiple threads.
It would seem that multiple hardware contexts are another important option
in using extra real-estate to reduce power. By avoiding context switch
times, you eliminate energy-consuming cycles, but you'd like to avoid paying
a penalty for having the registers sitting around. Survey what has
been done in the uses of multithreading for low-power operation.
Identify an area where it would be an attractive approach (if such is the
case), develop a concrete proposal, and evaluate it.
In [http://www.ccs.neu.edu/home/rraj/Pubs/topology.html]
an interesting algorithm is developed for building network topologies
from local information and for MAC and routing within it. It is backed
by some interesting models of radio network behavior. Explore translating
these ideas into practice on low-power wireless systems and evaluate how
well they work.
Real cycle-level simulator for whole networks ***
Commodity "effector motes". TinyOS on Lego Mindstorms extended with
a radio. Build CC or RFM interfaced to Lego Mindstorms, perhaps dot on serial
port. Port TinyOS to 8051 and push on the impact of continuously wirelessly
connected robots.
From Dave Patterson: IA-64 studies
* gcc vs. native compiler performance
* how does performance change if you simplify the flags to just -O
* code size: how does it stack up as you vary compiler and optimization
as compared to x86, RISC machines
* how does code size affect I cache performance, which I presume you
can measure on IA and x86
* Can you replicate the SPEC numbers at Berkeley with what is available?
* FP performance: how does IA-64 stack up against P4 SSE2 instructions?
Can any compiler produce SSE2 instructions? Code size?
From Bob Brodersen: What is the best FPGA architecture to do general
purpose computing?
Analysis of Zigbee directions. Zigbee is an industrial effort to
produce standards for sensor nets. Unfortunately, it is pursuing
a closed process, so it is hard to seriously evaluate the proposals from
the outside. However, we can observe the proprietary starting points
that motivate the participants, including Motorola's ClusterTree and Ember's
GRAD. We also have a huge amount of open work in AODV. It is
possible to interview members to get a sense of what the goal/requirements
of the standard are perceived to be. (No, they don't seem to be coherently
written down.) Then one can look at how well existing approaches
or extensions thereof might meet those.
Address-based routing in irregular ad hoc networks. In parallel machines
we have done routing forever based on simple algebraic relationships of
the addresses assigned to nodes, e.g., grids, n-cubes, butterflies, etc.
Without such regular structures, we have tended to use either table driven
routing or source based routing, which also relies on tables at the endpoints.
We have nice ways to produce the tables, such as up*-down*, which essentially
embeds a spanning tree as a reference structure. We could use such
embedded structures as a basis for address based routing. For example,
it is easy to see how to number nodes according to their position in a
spanning tree so that, to go from any node to any other, you know whether to
route up to the parent or down to a child. The interesting part is that the underlying
connectivity is constantly changing. So you'd like to leave enough
room in the namespace that you can adapt to small changes with limited
renaming. There are a couple of other issues that grow out of this.
Names vs addresses. You'd like to have a name for each node that
is different from its routing name and unchanging.
Translation. Can you use the same or a related structure to translate
from the name to the routing name (current routable address)?
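The spanning-tree numbering idea above can be sketched as follows. This is only one possible scheme, not a worked-out proposal: each node gets a half-open address range, its own routing address is the low end of that range, and one extra slice is left unassigned at every node as slack for future children. The function names, the 16-bit namespace, and the example tree are all assumptions made for illustration.

```python
def assign_ranges(children, root, lo=0, hi=1 << 16):
    """Give every node a half-open address range [lo, hi).

    A node's own routing address is the low end of its range. The rest
    is cut into len(kids) + 1 equal slices: one per child, plus one
    spare slice left unassigned so a new child can join without
    renaming anybody else.
    """
    ranges = {}
    stack = [(root, lo, hi)]
    while stack:
        node, a, b = stack.pop()
        ranges[node] = (a, b)
        kids = children.get(node, [])
        if not kids:
            continue
        slice_size = (b - a - 1) // (len(kids) + 1)
        if slice_size < 1:
            raise ValueError("namespace too small for this tree")
        for i, kid in enumerate(kids):
            start = a + 1 + i * slice_size
            stack.append((kid, start, start + slice_size))
    return ranges


def next_hop(node, dest_addr, ranges, parent, children):
    """Decide the next hop using only local ranges: no routing table."""
    lo, hi = ranges[node]
    if dest_addr == lo:
        return None                      # arrived
    if not (lo <= dest_addr < hi):
        return parent[node]              # destination is outside our subtree: go up
    for kid in children.get(node, []):
        klo, khi = ranges[kid]
        if klo <= dest_addr < khi:
            return kid                   # destination is inside this child's subtree
    raise LookupError("address lies in unassigned slack")


def route(src, dst, ranges, parent, children):
    """Follow next_hop from src until the destination is reached."""
    dest_addr = ranges[dst][0]
    path = [src]
    while True:
        hop = next_hop(path[-1], dest_addr, ranges, parent, children)
        if hop is None:
            return path
        path.append(hop)


# Hypothetical six-node spanning tree rooted at A.
children = {"A": ["B", "C"], "B": ["D", "E"], "C": ["F"]}
parent = {"B": "A", "C": "A", "D": "B", "E": "B", "F": "C"}
ranges = assign_ranges(children, "A")
print(route("D", "F", ranges, parent, children))
```

The slack is where the open questions start: how much to leave, and what to do when a subtree outgrows its slice or a link in the tree breaks.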
Connection between DHTs and Sensor Nets. Current DHTs describe a
way of doing translation in a very large namespace that is continuously
evolving in parallel by combining routing and lookup. The proposals
rely on an underlying routing infrastructure (IP) that essentially connects
any point in the overlay graph to any other (through possibly many underlying
hops). It is interesting to look at combining DHTs with proposals
for ad hoc routing. A core problem with the ad hoc routing proposals
is the amount of state (albeit soft state) that they build up per node.
DHTs have the property that you take extra hops, but you only need to learn
about a fixed number of routes. Does this give a way to manage the
routing state for tiny devices? Given that you only need to learn
a few paths, can you eliminate the need for an underlying routing infrastructure?
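To make the "extra hops, but few routes" property concrete, here is a small simulation in the style of a Chord finger table. The choice of Chord is an assumption for illustration; the text above does not commit to a particular DHT. Each node keeps only M entries (M is the number of identifier bits), and a lookup is forwarded greedily to the known node closest to the key. The sketch builds the finger tables from global knowledge, which is exactly the part a real ad hoc network would have to replace.

```python
import bisect

M = 8                # identifier bits (assumed small for the example)
RING = 1 << M        # size of the identifier ring


def successor(sorted_ids, x):
    """First node at or clockwise after identifier x."""
    i = bisect.bisect_left(sorted_ids, x % RING)
    return sorted_ids[i % len(sorted_ids)]


def fingers(sorted_ids, n):
    """The M routes node n keeps: successor(n + 2^i) for i in 0..M-1."""
    return [successor(sorted_ids, n + (1 << i)) for i in range(M)]


def in_interval(x, a, b):
    """True if x lies in the ring interval (a, b]; a == b means the whole ring."""
    if a < b:
        return a < x <= b
    return x > a or x <= b


def lookup(sorted_ids, start, key):
    """Return the nodes visited while resolving key, ending at its owner."""
    path = [start]
    node = start
    while True:
        table = fingers(sorted_ids, node)
        succ = table[0]
        if in_interval(key, node, succ):
            if succ != node:
                path.append(succ)
            return path
        # Forward to the farthest known node that still precedes the key.
        # One always exists here, because succ itself precedes the key.
        for f in reversed(table):
            if in_interval(f, node, (key - 1) % RING):
                node = f
                break
        path.append(node)


# Hypothetical node identifiers on the 8-bit ring.
node_ids = [5, 20, 61, 99, 130, 177, 201, 240]
print(lookup(node_ids, 5, 150))
```

This sketch keeps no predecessor pointer, so a key owned by the starting node itself is resolved only after travelling around the ring.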
Investigate the use of erasure codes (e.g., LT codes) for robust storage
schemes a la RAID.
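As a starting point, the simplest erasure code used in RAID is a single XOR parity block. This is not an LT code. It is the baseline such a project would compare against, and it survives the loss of any one block in a stripe. The function names are made up for the example, and all blocks are assumed to be the same length.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def encode(data_blocks):
    """Return the stripe: the data blocks followed by one parity block."""
    return list(data_blocks) + [xor_blocks(data_blocks)]


def recover(stripe, missing_index):
    """Rebuild the block at missing_index from all the surviving blocks."""
    survivors = [b for i, b in enumerate(stripe) if i != missing_index]
    return xor_blocks(survivors)


stripe = encode([b"abcd", b"efgh", b"ijkl"])
print(recover(stripe, 1))   # the second data block, rebuilt from the others
```

LT codes and other rateless codes trade this fixed one-loss tolerance for the ability to recover from many losses given enough received blocks. That trade-off is what there is to investigate.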
Investigate the use of erasure codes for sensor network communication
protocols.
On one hand, it is most efficient for sensor networks to communicate in
a nearest neighbor mesh (cf. Airplane Routing). However, it is
possible that there are a few base stations with unlimited power.
Investigate routing algorithms that take this into account.
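One simple way to frame the question is as a shortest-path problem in which each hop costs the battery energy spent by the sending node, and base stations send for free. The cost model, the node names, and the long-range link between the two base stations are all assumptions for the sake of the example; a real study would need a measured energy model.

```python
import heapq
from math import inf


def undirected(edges):
    """Build an adjacency list from a list of undirected links."""
    links = {}
    for u, v in edges:
        links.setdefault(u, []).append(v)
        links.setdefault(v, []).append(u)
    return links


def cheapest_path(links, tx_cost, src, dst):
    """Dijkstra where each hop costs the energy of the sending node."""
    dist = {src: 0.0}
    prev = {}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            break
        if d > dist.get(u, inf):
            continue                      # stale heap entry
        for v in links.get(u, ()):
            nd = d + tx_cost[u]
            if nd < dist.get(v, inf):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if dst not in dist:
        return None
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1], dist[dst]


# Four battery-powered motes in a line, plus two powered base stations X and Y.
links = undirected([("a", "b"), ("b", "c"), ("c", "d"),
                    ("a", "X"), ("X", "Y"), ("Y", "d")])
tx_cost = {"a": 1.0, "b": 1.0, "c": 1.0, "d": 1.0, "X": 0.0, "Y": 0.0}
print(cheapest_path(links, tx_cost, "a", "d"))
```

In this toy topology the detour through the base stations costs one battery-powered transmission instead of three along the mesh.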
From CS252 S02
A collection of projects were outlined at the end of Jason's Wireless Network
Sensor Lecture and some appeared in Kurt's reconfigurable lecture.
Hardening the IP stack into silicon. There have been many efforts over
the years to offload substantial portions of the TCP/IP stack into co-processors,
network interfaces, etc. However, most of these projects focused
on support for general purpose machines. Now we are seeing all sorts
of small, embedded networked devices. Printers, cameras, all sorts
of things sit on the end of an ethernet connection. Given that these
function in very limited modes, it may be possible to cast the limited
slice of the stack required for these devices into hardware, or at least
cast major portions into hardware. Is this true? How do the
networking characteristics vary from general purpose PCs? What are
the opportunities for simplification? What hardware structures would
be appropriate? What do you gain?
Phil Buonadonna has proposed a compromise between Infiniband and TCP/IP
which takes the simplified queue pairs abstraction from Infiniband and the
lower layers of the stack from TCP/IP in this simplified context.
He's done a nice implementation comparison in firmware on the Myricom
Lanai network interface. How would you implement this in hardware?
How would its complexity compare to a full blown Infiniband implementation?
Infiniband is proposed for Storage Area Networks; however, I've never seen
a concrete proposal as to how to implement storage over it. There
are concrete proposals for NASD and now some storage over IP proposals.
Propose and evaluate a strategy for providing storage access over Infiniband.
Which mode would you use? Connection vs datagram, reliable vs unreliable?
Many of the performance aspects of multithreaded and SMT architectures
have been analyzed and some decent power models exist for pipelined processors.
What are the energy implications of multithreading? Are there energy
optimizations? Is it possible to use the multithreading structure
to get cheap wake-up?
Many of you have probably noticed that network access to and from campus
is the pits these days. The campus limited its external bandwidth
to 70 Mb/s as a cost-cutting measure and now all the traffic is backed
up behind the bottleneck. Many complain that their interactive connections
are dismal in the face of all that streaming Napster traffic. We
have all sorts of new network processor techniques - classifying packets
on the fly and scheduling various flows. These have usually been
employed to provide QoS for streams. How could you utilize network
processors to improve interactive performance?
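A minimal sketch of the classify-then-schedule idea, in software rather than on a network processor: packets are sorted into an interactive or bulk class, and the interactive queue is served first, with bulk traffic guaranteed a turn every few packets so it does not starve. The classification rule (ssh/telnet ports or small packets), the Packet fields, and the bulk_every parameter are assumptions chosen for illustration.

```python
from collections import deque
from dataclasses import dataclass

INTERACTIVE_PORTS = {22, 23}     # ssh and telnet; an assumed rule
SMALL_PACKET_BYTES = 128         # assumed size threshold for "interactive"


@dataclass
class Packet:
    flow: str
    dst_port: int
    size: int


def classify(pkt):
    """Label a packet 'interactive' or 'bulk' from header fields only."""
    if pkt.dst_port in INTERACTIVE_PORTS or pkt.size <= SMALL_PACKET_BYTES:
        return "interactive"
    return "bulk"


class TwoClassScheduler:
    """Serve interactive packets first, but give bulk a turn after
    bulk_every consecutive interactive packets."""

    def __init__(self, bulk_every=4):
        self.queues = {"interactive": deque(), "bulk": deque()}
        self.bulk_every = bulk_every
        self.sent_since_bulk = 0

    def enqueue(self, pkt):
        self.queues[classify(pkt)].append(pkt)

    def dequeue(self):
        inter = self.queues["interactive"]
        bulk = self.queues["bulk"]
        if bulk and (not inter or self.sent_since_bulk >= self.bulk_every):
            self.sent_since_bulk = 0
            return bulk.popleft()
        if inter:
            self.sent_since_bulk += 1
            return inter.popleft()
        return None              # both queues are empty


sched = TwoClassScheduler()
for i in range(3):
    sched.enqueue(Packet(f"stream{i}", 80, 1500))
for i in range(2):
    sched.enqueue(Packet(f"ssh{i}", 22, 64))
order = []
while (pkt := sched.dequeue()) is not None:
    order.append(pkt.flow)
print(order)
```

Even though the bulk packets arrived first, the two ssh packets leave ahead of them. A fixed rule like this is easy to game and says nothing about fairness among flows, so there is plenty left for the project to explore.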
In the coming year you will see two on-chip techniques for exploiting thread-level
parallelism: simultaneous multithreading (SMT) and chip multi-processors
(CMP). What are the trade-offs that favor one or the other?
Is there a continuum of designs between? Can you develop a quantitative
framework for identifying optimal design points?
There have been numerous studies of cache design, but multithreading introduces
new behavior. How does MT impact cache design? What should
you do differently?
Several researchers have proposed reconfigurable hardware as a means of
application specific optimization and late binding of functionality.
However, there are many ways to introduce reconfigurable hardware into
a design: functional units of definable function, new instructions, co-processors.
Survey the space of approaches. What are the trade-offs? Does
this lead to some new techniques for integrating reconfigurable logic into
processors and/or memories?
Virtual machine monitors have become extremely popular. Many of you
probably use VMWare. It relies on a mix of fast traps, instruction
emulation, and dynamic translation. What is the performance impact
of virtual machine monitors? Where does the time go? How does
it change cache, TLB, and VM behavior?
What about many virtual machines per physical machine, as is used in many
web hosting applications (cf. ensim.com)?
What are the power trade-offs implied by virtualization?
IA64 provides a whole host of performance counters. Use these to
verify various classical studies on REAL workloads.