Adaptive Construction of Hierarchy from Peer-to-Peer Network

by Yitao Duan and Ling Huang
duan@eecs.berkeley.edu
hlion@newton.berkeley.edu

In our initial proposal we tried to attack the service discovery problem in a dynamic
network. In particular, we planned to improve or modify algorithms that discover the
existence of all network nodes. As has been pointed out, this area is too broad and has
already been studied extensively. In rethinking our project plan, we realized that,
along the same general direction as our initial proposal, many systems that provide or
use some kind of service discovery service (such as SDS, distributed web caching, or ad
hoc sensor networks) either assume the existence of a hierarchical structure in the
network or do without one. There is little literature showing how to construct such a
hierarchy other than by manual configuration. We feel that having some form of structure
in the network is beneficial in many ways. For example, nodes are asymmetric in terms of
capabilities (CPU, memory, bandwidth, etc.), and it would be more efficient to have them
perform asymmetric tasks. As another example, if a sensor network consisting of many
small sensors and a few big sensors can self-organize into a hierarchy, it can be more
efficient and robust than simple peer-to-peer cooperation.

Therefore we narrowed our focus to finding a reasonably good algorithm/protocol that
constructs a network hierarchy from a flat peer-to-peer node mesh, to support efficient
routing and data queries based on the notion of a "supernode". A supernode is just a node
that is powerful in terms of capabilities (CPU, bandwidth, etc.) and willing to act as a
query server. All the peer-to-peer clients can be partitioned into disjoint "local network
groups". Each group is served by a nearby, strategically selected supernode. The collection
of supernodes collaboratively provides the query service. Supernodes can also form higher
levels of hierarchy if desired. Clients locate a nearby supernode and tap into the query
service through it. The problem can be abstracted as follows:

Given a partially connected directed graph in which each edge represents a node's initial
knowledge of another, find a tree (possibly of a given depth) or forest, according to
certain criteria, that encompasses all nodes.
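
As a rough illustration of this abstraction, the sketch below builds a forest rooted at a
given set of supernodes using a multi-source breadth-first search. It is not the algorithm
we propose, only one simple instance of the problem. It assumes that hop count is the
selection criterion, that an edge becomes usable in both directions once one node contacts
the other, and that the supernodes are already chosen. The names build_forest, root_of,
knows and max_depth are hypothetical.

```python
from collections import deque


def build_forest(knows, supernodes, max_depth=None):
    """Attach every reachable node to a tree rooted at a supernode.

    knows: dict mapping a node to the nodes it initially knows (directed edges).
    supernodes: nodes chosen as tree roots.
    max_depth: optional bound on tree depth (None means unbounded).
    Returns (parent, unreached); roots map to None in parent.
    """
    # Assumption: once u contacts v, v learns about u, so edges work both ways.
    adj = {}
    for u, vs in knows.items():
        adj.setdefault(u, set())
        for v in vs:
            adj[u].add(v)
            adj.setdefault(v, set()).add(u)

    parent, depth, queue = {}, {}, deque()
    for s in sorted(supernodes):
        adj.setdefault(s, set())
        parent[s] = None
        depth[s] = 0
        queue.append(s)

    # Multi-source BFS: a node joins the first tree that reaches it.
    while queue:
        u = queue.popleft()
        if max_depth is not None and depth[u] >= max_depth:
            continue
        for v in sorted(adj[u]):
            if v not in parent:
                parent[v] = u
                depth[v] = depth[u] + 1
                queue.append(v)

    unreached = set(adj) - set(parent)
    return parent, unreached


def root_of(parent, node):
    """Follow parent pointers up to the supernode at the root of the tree."""
    while parent[node] is not None:
        node = parent[node]
    return node


knows = {"a": ["b"], "b": ["c"], "c": ["S1"], "d": ["S2"], "e": ["d"], "S1": ["S2"]}
parent, unreached = build_forest(knows, ["S1", "S2"])
groups = {n: root_of(parent, n) for n in parent}
```

With a depth bound, nodes too far from every supernode are reported as unreached rather
than attached, which is one way the "given depth" variant could signal that more
supernodes are needed.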

Different variants of the problem specification apply to other applications. For
example, in a modified Napster service model, there is no central server (nothing to be shut
down!); instead, a number of participating machines elect a local supernode, which
cooperates with supernodes from other groups to provide services to its clients. Whatever
protocol is used to form such a two-level hierarchy must be adaptive, in that
adding or removing supernodes and clients must not disrupt operation.
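
To make the adaptivity requirement concrete, here is a minimal, centralized sketch of the
bookkeeping for a two-level hierarchy. It is an illustration under stated assumptions, not
the proposed protocol: a real version would be distributed and message-based. The Overlay
class and the "fewest clients" placement rule are hypothetical choices made only for this
example.

```python
class Overlay:
    """Two-level hierarchy: supernodes plus the clients each one serves."""

    def __init__(self):
        self.groups = {}  # supernode -> set of clients it serves
        self.home = {}    # client -> its current supernode

    def _least_loaded(self):
        # Hypothetical placement rule: fewest clients, ties broken by name.
        return min(self.groups, key=lambda s: (len(self.groups[s]), s))

    def add_supernode(self, s):
        self.groups.setdefault(s, set())

    def add_client(self, c):
        if not self.groups:
            raise RuntimeError("no supernode available")
        s = self._least_loaded()
        self.groups[s].add(c)
        self.home[c] = s
        return s

    def remove_client(self, c):
        s = self.home.pop(c, None)
        if s is not None:
            self.groups[s].discard(c)

    def remove_supernode(self, s):
        # Clients of a departing supernode are re-homed, not dropped.
        orphans = self.groups.pop(s, set())
        for c in sorted(orphans):
            del self.home[c]
            if self.groups:
                self.add_client(c)
        return orphans


overlay = Overlay()
overlay.add_supernode("S1")
overlay.add_supernode("S2")
for client in ["c1", "c2", "c3", "c4"]:
    overlay.add_client(client)
overlay.remove_supernode("S1")  # c1 and c3 move to S2
```

The point of the sketch is only that every join or departure is handled by a local update
to the affected group, so the remaining groups keep serving queries throughout.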

We believe this is more manageable than our initial proposal because it is clearly defined
and we already have some rough ideas of how it will work. Also, many of the difficulties we
anticipate are design decisions. Issues we expect to handle include, but are not limited to:

1. Node selection: which node, out of those willing, should be selected as a supernode, and
how the selection is carried out.
2. Node discovery between supernodes: how the roots of the trees in the resulting forest
discover each other (Name-Dropper algorithm?).
3. Dynamic joining and departure of supernodes and clients.
4. Optimization of the architecture: optimizing the supernode mesh as more supernodes
become known, so that the mesh is better connected and a query travels fewer "hops"
before it finds the metadata for its destination.
5. Evaluation: we have three ways to evaluate our algorithm: (1) mathematical
analysis, (2) simulation, and (3) comparing application performance with systems that
do not use a hierarchy (e.g., is routing more efficient?).
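
For issue 1, one simple starting point is to rank the willing nodes by a weighted
capability score and take the top k. The sketch below shows that idea only. The Node
fields, the linear score and the equal default weights are assumptions for illustration;
choosing the actual criteria is one of the design decisions listed above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    cpu: float        # capability figures in arbitrary, comparable units
    memory: float
    bandwidth: float
    willing: bool     # whether the node volunteers to act as a query server


def score(node, weights=(1.0, 1.0, 1.0)):
    """Hypothetical capability score: a weighted sum of the three resources."""
    w_cpu, w_mem, w_bw = weights
    return w_cpu * node.cpu + w_mem * node.memory + w_bw * node.bandwidth


def select_supernodes(nodes, k, weights=(1.0, 1.0, 1.0)):
    """Return the k highest-scoring willing nodes, ties broken by name."""
    candidates = [n for n in nodes if n.willing]
    candidates.sort(key=lambda n: (-score(n, weights), n.name))
    return candidates[:k]


nodes = [
    Node("a", 1, 1, 1, True),
    Node("b", 4, 4, 4, False),  # most capable, but unwilling, so never chosen
    Node("c", 2, 3, 1, True),
    Node("d", 3, 3, 3, True),
]
chosen = [n.name for n in select_supernodes(nodes, k=2)]
```

A purely capability-based ranking ignores where supernodes sit relative to their clients,
so a fuller version would likely combine this score with some measure of proximity.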

Tentative Timeline

Week 8 : Literature survey. This has already been under way for a while. We are trying to get
a better idea of the algorithm and application space. We will also develop our algorithm concurrently.
Week 9 : Node selection/(discovery?).
Week 10: Message passing mechanism and format. Local data structure.
Week 11: Prove algorithm correctness. Optimization.
Week 12 - 13: Simulation/Algorithm refinement.
Week 14 - 15: Paper write-up.

The above is only tentative. We will try to stay ahead of schedule to avoid problems.