A Reliable Multicast Framework for Light-weight Sessions and Application Level Framing

Sally Floyd, Van Jacobson, Steve McCanne, LBL; C.G. Liu, USC; Lixia Zhang, Xerox PARC

One-line summary: Reliable multicast that leaves reliability and ordering policies to the application rather than building them into the protocol. Scalability comes from suppression: a member that detects a loss multicasts a repair request only if it has not seen one from another member after a small random delay.

Overview/Main Points

Many-to-many multicast (built on IP multicast) in which the source IDs of multicast senders are persistent. All data has a unique name.
Ordering and naming/sequence semantics live only at the application level, so they are used only when needed, and each application can choose whatever naming convention suits it; e.g., in wb a member needs to learn which sequence numbers it has missed right after it joins.
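
A minimal sketch of how a newly joined wb member might work out what it has missed, assuming names of the form (source ID, sequence number), sequence numbers that start at 0 and are contiguous per source, and some announcement (assumed here to be a session message) of the highest sequence number seen from each source. The function name and data layout are hypothetical, not taken from the paper.

```python
def missing_names(received, advertised):
    """List the data names a newly joined member still needs.

    received:   dict mapping source ID -> set of sequence numbers already held
    advertised: dict mapping source ID -> highest sequence number announced
                for that source (assumed to arrive in a session message)
    Sequence numbers are assumed to start at 0 and be contiguous per source.
    """
    needed = []
    for source, highest in sorted(advertised.items()):
        have = received.get(source, set())
        # Every sequence number up to the advertised maximum that we lack is a gap.
        needed.extend((source, seq) for seq in range(highest + 1) if seq not in have)
    return needed
```

For example, `missing_names({"alice": {0, 1, 3}}, {"alice": 4, "bob": 1})` returns `[("alice", 2), ("alice", 4), ("bob", 0), ("bob", 1)]`; each of those names would then go through the request/repair mechanism.
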
Bandwidth management and allocation are also handled at the application level.
To prevent ACK implosion, NAKs are used instead of ACKs.
When a member detects missing data, it waits a random amount of time for someone else to multicast a repair request; if none is seen, it multicasts the request itself.
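
The suppression rule above can be sketched as a one-round simulation. This is illustrative only and rests on several assumptions: every receiver detects the loss at the same moment, each draws its timer uniformly from a window scaled by its estimated delay to the source (the `c1`/`c2` window is modeled loosely on the paper's scheme, with names chosen here), and any two receivers are a single constant `peer_delay` apart. The function name is hypothetical.

```python
import random


def count_requests(dists, c1, c2, peer_delay, seed=0):
    """Simulate one round of randomized NAK suppression for a single lost packet.

    dists:      estimated one-way delay from each receiver to the original source
    c1, c2:     timer constants; receiver i waits U[c1*d_i, (c1+c2)*d_i]
    peer_delay: assumed constant delay between any two receivers (simplification)
    Returns how many receivers actually multicast a request.
    """
    rng = random.Random(seed)
    # All receivers are assumed to detect the loss at time 0.
    fire = sorted(rng.uniform(c1 * d, (c1 + c2) * d) for d in dists)
    sent = []  # send times of requests already multicast
    for t in fire:
        # Suppressed if some earlier request has reached this receiver by time t.
        if any(s + peer_delay <= t for s in sent):
            continue
        sent.append(t)
    return len(sent)
```

With `peer_delay=0` only the first timer to fire produces a request; when receivers are far apart relative to the width of the timer window, suppression fails and duplicates appear. Widening the window (larger `c2`) trades slower recovery for fewer duplicate requests, which is why the choice of timer constants matters.
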
Since application data unit (ADU) names can be independent of the original sender, anyone who has a copy of the requested data may transmit it.
If the repair is lost, a timer expiry triggers a retransmission of the request.
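
A rough sketch of timer-driven retransmission, assuming the request timer window doubles on every retry (exponential backoff). The default constants and function names are illustrative assumptions, not values or code from the paper. It shows why each lost repair costs progressively more waiting.

```python
import random


def backoff_window(attempt, dist, c1=2.0, c2=2.0):
    """Request-timer window for a given attempt (0 = first request).

    Assumes the window doubles on every retry (exponential backoff);
    c1 and c2 are illustrative defaults, not values from the paper.
    """
    scale = 2 ** attempt
    return scale * c1 * dist, scale * (c1 + c2) * dist


def recover(repair_lost, dist, rng, max_attempts=8):
    """Return (requests_sent, time_waiting) until one repair gets through.

    repair_lost(attempt) -> True if the repair answering that attempt is dropped.
    Only request-timer waits are counted; propagation of the repair is ignored.
    """
    waited = 0.0
    for attempt in range(max_attempts):
        lo, hi = backoff_window(attempt, dist)
        waited += rng.uniform(lo, hi)  # timer expires, so (re)send the request
        if not repair_lost(attempt):
            return attempt + 1, waited
    raise TimeoutError(f"no repair after {max_attempts} requests")
```

If the first two repairs are dropped (`recover(lambda a: a < 2, 1.0, random.Random(1))`), three requests go out and the total timer wait falls between 14 and 28 time units, versus 2 to 4 when the first repair arrives.
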
Wb instantiation: most operations are idempotent, and timestamps on drawops determine rendering order. This captures reasonable temporal causality without heavyweight causal delivery. For repairs, a member holding the requested data multicasts it after a random delay unless it first hears another member send the same repair; this suppression prevents response implosion.
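
A small sketch of the idempotence and timestamp-ordering idea, assuming each drawop carries its persistent (source, seq) name plus a sender timestamp, and that ties are broken by name so all members render identically (the tie-break is an assumption). `DrawOp` and `Page` are hypothetical names; the repair-response timers are not modeled here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DrawOp:
    source: str       # persistent source ID
    seq: int          # per-source sequence number; (source, seq) names the op
    timestamp: float  # sender's timestamp, used only for rendering order
    action: str


class Page:
    """Hypothetical wb page: a set of named drawops rendered in timestamp order."""

    def __init__(self):
        self.ops = {}

    def apply(self, op):
        # Idempotent: a duplicate (e.g. a repair that arrives after the original)
        # overwrites an identical entry and changes nothing.
        self.ops[(op.source, op.seq)] = op

    def render_order(self):
        # Sort by timestamp; (source, seq) breaks ties so every member agrees.
        ops = sorted(self.ops.values(), key=lambda o: (o.timestamp, o.source, o.seq))
        return [o.action for o in ops]
```

Because arrival order does not matter, a member that receives ops late, out of order, or twice (original plus repair) still draws the same page as everyone else.
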
Some nice simulations of request/repair behavior on various topologies suggest that it works well even with large numbers of nodes.
Other applications for which this kind of reliable multicast would be interesting: distributed Web caching; Usenet; Internet routing information exchange.
Related work: mostly distinguishes between token-based and distributed responsibility for reliability and ordering.

Relevance

Convincing application of the end-to-end argument to reliable multicast. Effective prevention of ACK implosion and response implosion.

Flaws

Many simulation parameters (timeouts, etc.) were held fixed, and it is not clear how their values affect performance.
If a repair is dropped, recovery has to wait for a request timer to expire and the request to be resent; on lossy networks this could make recovery slow.
The WWW application seems somewhat gratuitous; it is not clear that multicast is the right way to deliver WWW updates to distributed caches.
