Adaptive Construction of Hierarchy from a Peer-to-Peer Network

by Yitao Duan and Ling Huang
duan@eecs.berkeley.edu
hlion@newton.berkeley.edu

In our initial proposal we set out to attack the service discovery problem in a dynamic 
network. In particular, we planned to improve or modify algorithms that discover the 
existence of all network nodes. As has been pointed out, this area is too broad and has 
already been studied extensively. In rethinking our project plan, we realized that, 
along the same general direction as our initial proposal, many systems that provide or 
use some kind of service discovery service (such as SDS, distributed web caching, or ad 
hoc sensor networks) either assume the existence of a hierarchical structure in the 
network or do without one. We have found little literature showing how to construct such a 
hierarchy other than by manual configuration. We feel that having some form of structure 
in the network is beneficial in many ways. For example, nodes are asymmetric in terms of 
capabilities (CPU, memory, bandwidth, etc.), so it would be more efficient to have them 
perform asymmetric tasks. As another example, if a sensor network consisting of many 
small sensors and a few big sensors can self-organize into a hierarchy, it can be more 
efficient and robust than simple peer-to-peer cooperation.

Therefore we narrowed our focus to finding a reasonably good algorithm/protocol for 
constructing a network hierarchy from a flat peer-to-peer node mesh, to support efficient 
routing and data queries, based on the notion of a "supernode". A supernode is simply a node 
that is powerful in terms of capabilities (CPU, bandwidth, etc.) and willing to be a query 
server. All the peer-to-peer clients can be partitioned into disjoint "local network groups". 
Each group is served by a nearby, strategically selected supernode. The collection of 
supernodes collaboratively provides the entire query service. Supernodes can also form higher 
levels of hierarchy if desired. Clients can locate a nearby supernode and tap into the query 
service through it. The problem can be abstracted as follows:

Given a partially connected directed graph in which each edge represents a node's initial 
knowledge of another, find a tree (possibly of a given depth) or forest, according to 
certain criteria, that encompasses all nodes.
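
To make the abstraction concrete, here is a minimal sketch of one possible instance. It is 
not the protocol we intend to build, only an illustration under stated assumptions: the 
selection criterion is "the k most capable willing nodes", closeness is measured in hops, 
and knowledge edges are treated as undirected (a node that is contacted learns its caller). 
The names pick_supernodes, build_forest and root_of are hypothetical.

```python
from collections import deque

def pick_supernodes(capability, willing, k):
    """Return the k most capable willing nodes (ties broken by node id)."""
    return sorted(willing, key=lambda n: (-capability[n], n))[:k]

def build_forest(knows, supernodes):
    """Attach every reachable node to a tree rooted at a closest supernode.

    knows maps each node to the set of nodes it initially knows about.
    Assumption: edges are treated as undirected. Returns a parent map in
    which each supernode (root) maps to None.
    """
    neighbors = {n: set() for n in knows}
    for u, vs in knows.items():
        for v in vs:
            neighbors.setdefault(u, set()).add(v)
            neighbors.setdefault(v, set()).add(u)
    parent = {s: None for s in supernodes}
    queue = deque(supernodes)  # multi-source breadth-first search
    while queue:
        u = queue.popleft()
        for v in sorted(neighbors[u]):
            if v not in parent:
                parent[v] = u
                queue.append(v)
    return parent

def root_of(parent, node):
    """Follow parent links up to the supernode that serves this node."""
    while parent[node] is not None:
        node = parent[node]
    return node

# Made-up example data: five nodes, three of them willing to serve.
knows = {"a": {"b"}, "b": {"c"}, "c": set(), "d": {"c", "e"}, "e": set()}
capability = {"a": 5, "b": 1, "c": 2, "d": 1, "e": 9}
supernodes = pick_supernodes(capability, {"a", "c", "e"}, k=2)
forest = build_forest(knows, supernodes)
groups = {n: root_of(forest, n) for n in forest}
```

The resulting groups are disjoint by construction, since each node receives exactly one 
parent. A depth limit or a different selection criterion would change only the two helper 
functions.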

Different variants of the problem specification apply to other applications. For 
example, in a modified Napster service model, there is no central server (nothing to be shut 
down!); instead, a number of participating machines elect a local supernode, which 
cooperates with supernodes from other groups and provides services to its clients. Whatever 
protocol is used to form such a two-level hierarchy must be adaptive, in that 
adding or removing supernodes and clients must not disrupt operation.

We believe this is more manageable than our initial proposal because it is clearly defined 
and we already have some rough ideas of how it will work. Also, many of the difficulties we 
anticipate are design decisions. Issues we expect to handle include, but are not limited to:

1. Node selection: which node, out of those willing, should be selected as a supernode, and 
   how the selection is carried out.
2. Node discovery between supernodes: how would the roots of the trees in the resulting 
   forest discover each other (the Name Dropper algorithm?).
3. Dynamic joining and departure of supernodes and clients.
4. Optimization of the architecture: optimizing the supernode mesh as more supernodes become 
   known, so that the mesh is better connected and a query travels the minimum number of 
   "hops" before it finds the metadata for its destination.
5. Evaluation: we have three ways to evaluate our algorithm: (1) mathematical 
   analysis, (2) simulation, and (3) comparing application performance with systems that 
   do not use a hierarchy (e.g., is routing more efficient?).
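
For issue 2, the following is a simplified, in-memory sketch of the Name Dropper idea, which 
we have so far only considered as a candidate. In each synchronous round, every node picks 
one peer it knows at random and sends that peer everything it knows, plus its own id. The 
function names, the fixed random seed and the round limit are assumptions made for 
illustration; a real protocol would exchange messages over the network and would have to 
cope with nodes joining and leaving.

```python
import random

def name_dropper_round(known, rng):
    """Run one synchronous round, updating the known sets in place.

    known maps each node id to the set of ids it currently knows.
    Assumption: every id that appears in a set is also a key of known.
    """
    messages = []
    for u in sorted(known):
        if known[u]:  # a node that knows nobody cannot send yet
            v = rng.choice(sorted(known[u]))
            messages.append((v, known[u] | {u}))  # snapshot taken at send time
    for v, ids in messages:
        known[v] |= ids - {v}

def discover_all(known, seed=0, max_rounds=1000):
    """Repeat rounds until every node knows every other node.

    Returns the number of rounds used, or None if max_rounds is reached.
    """
    rng = random.Random(seed)
    everyone = set(known)
    rounds = 0
    while any(known[u] != everyone - {u} for u in known):
        if rounds == max_rounds:
            return None
        name_dropper_round(known, rng)
        rounds += 1
    return rounds

# Made-up example: four tree roots, each initially knowing at most one other.
roots = {"s1": {"s2"}, "s2": {"s3"}, "s3": {"s4"}, "s4": set()}
rounds_used = discover_all(roots)
```

Counting the rounds and messages needed on larger random graphs would be one input to the 
simulation-based evaluation in issue 5.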

Tentative Timeline

Week 8 : Literature survey. We have been doing this for a while and are trying to get a better 
         idea of the algorithm and application space. We will also develop our algorithm 
         concurrently.
Week 9 : Node selection/(discovery?).
Week 10: Message passing mechanism and format. Local data structure.
Week 11: Prove algorithm correctness. Optimization.
Week 12 - 13: Simulation/Algorithm refinement.
Week 14 - 15: Paper write-up.

The above is only tentative. We will try to stay ahead of schedule to avoid any problems.