Scalability Issues for High Performance Digital Libraries on the World
Wide Web
Dan Andresen, Tao Yang, Omer Egecioglu, Oscar H. Ibarra, Terence
R. Smith, UC Santa Barbara
One-line summary:
Httpd distributed across multiple NOW nodes and disks using DNS rotation and
HTTP redirect. Redirection decisions are made by a distributed scheduler that
measures CPU and network load (due largely to NFS traffic) at each node. A
particular instance of distillation and refinement for maps, based on prefix
encodings using wavelet transforms, is explicitly supported.
Overview/Main Points
- Each node is a SPARC-10 with a local disk. The database is
  partitioned, not replicated, so a node may have to satisfy a request
  from a neighbor's disk (via NFS -- yuck).
- Maps in the digital library are initially delivered as a coarse
  thumbnail; refinement can zoom in on a region or add layers of
  resolution. Both are achieved using wavelet transforms; maps are
  evidently stored in this form, with thumbnails and additional layers
  precomputed. (A wavelet sketch follows this list.)
- Distributed scheduling was chosen to avoid a single point of
  failure. A load daemon at each node periodically broadcasts its
  load info. An "oracle" analyzes the communication/computation
  needs of each request (similar to our per-distiller cost
  functions). (A cost-model sketch also follows this list.)
- "Smart" scheduler is 18-54% better than round-robin, since
CPU time neither dominates all requests nor is the same across
different requests. (We expect similar result with PTM.)
- Under heavy load, superlinear speedup is observed as nodes are
  added, because a single overloaded node suffers VM thrashing, extra
  network interrupts, etc. (same effect Inktomi sees).
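A minimal sketch of the thumbnail/refinement idea above, assuming the
PyWavelets library and a synthetic image; the paper itself precomputes and
stores the layers with its own prefix encoding rather than decomposing per
request, so the names and parameters here are illustrative only.

```python
# Sketch: coarse thumbnail + progressive refinement via a 2-D wavelet
# decomposition (PyWavelets assumed installed; not the authors' code).
import numpy as np
import pywt

image = np.random.rand(512, 512)       # stand-in for a stored map image

# Multi-level decomposition: coeffs[0] is the coarse approximation,
# coeffs[1:] are successively finer detail layers.
coeffs = pywt.wavedec2(image, "haar", level=3)
thumbnail = coeffs[0]                  # 64x64 coarse view, sent first

def refine(coeffs, layers):
    """Reconstruct using only the first `layers` detail levels; finer
    levels are zeroed, mimicking delivery of a prefix of the encoding."""
    kept = [coeffs[0]]
    for i, detail in enumerate(coeffs[1:], start=1):
        if i <= layers:
            kept.append(detail)
        else:
            kept.append(tuple(np.zeros_like(d) for d in detail))
    return pywt.waverec2(kept, "haar")

coarse = refine(coeffs, layers=0)      # thumbnail quality at full size
better = refine(coeffs, layers=2)      # two extra resolution layers
full   = refine(coeffs, layers=3)      # all layers: exact reconstruction
print(thumbnail.shape, coarse.shape, np.allclose(full, image))
```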
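Also, a sketch of how such a scheduling "oracle" might combine the broadcast
load figures with an NFS penalty for non-local data; the weights, field names,
and cost formula are assumptions for illustration, not the paper's actual
model.

```python
# Sketch of a per-request scheduling decision: estimate the cost of serving
# the request on each node from periodically broadcast load info, charging
# extra when the file lives on a neighbor's disk (NFS). Illustrative only.
from dataclasses import dataclass

@dataclass
class NodeLoad:              # broadcast periodically by each node's load daemon
    cpu_util: float          # 0.0 - 1.0
    net_util: float          # 0.0 - 1.0 (includes NFS traffic)

def estimated_cost(node, load, data_home, cpu_demand, bytes_out,
                   w_cpu=1.0, w_net=1.0, nfs_penalty=2.0):
    """Higher is worse. A node that must pull the file from a neighbor's
    disk over NFS pays extra network cost on top of its measured load."""
    cost = (w_cpu * cpu_demand * (1 + load.cpu_util)
            + w_net * bytes_out * (1 + load.net_util))
    if node != data_home:
        cost += nfs_penalty * bytes_out * (1 + load.net_util)
    return cost

def choose_node(loads, data_home, cpu_demand, bytes_out):
    return min(loads, key=lambda n: estimated_cost(
        n, loads[n], data_home, cpu_demand, bytes_out))

loads = {"node1": NodeLoad(cpu_util=0.1, net_util=0.8),
         "node2": NodeLoad(cpu_util=0.7, net_util=0.2)}
# node1 has the lighter CPU but is network-bound and lacks the data locally;
# the combined estimate favors node2, which round-robin would not notice.
print(choose_node(loads, data_home="node2", cpu_demand=1.0, bytes_out=1.5e6))
```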
Differences from TCS Proxy
- They address a specific case of distillation/refinement where the
layers and representations are precomputed. We address the
general case using datatype-specific on-demand techniques.
- They do distributed scheduling. We do centralized scheduling
  because our nodes are not all identical (we need a mapping facility
  per request) and because we want to be able to implement flexible
  policies. PTM is already fault-tolerant (F/T) with respect to
  distillers dying, and will soon be F/T with respect to itself dying.
- We always assume a fast interconnect between the nodes. They
  don't, so inter-node NFS traffic can contribute significantly to
  the decision of which node to assign a request to. (E.g., the
  benefit of choosing a node with a light CPU load may be negated if
  that node has heavy network utilization and must contact a neighbor
  via NFS.)
- They use URL redirection for request distribution; we can't,
  because we are a proxy. But there are other good reasons (extra
  round trips) to avoid this approach if possible. (A minimal sketch
  of redirect-based distribution follows this list.)
- They use DNS rotation for initial coarse load balancing; we will
  use a Magic Router, which can detect and exclude failed nodes and,
  if desired, react more quickly to coarse load changes (avoiding the
  DNS caching problem).
- The authors got only 5-15% of the Meiko CS-2 Elan network's
  potential throughput over TCP. Hopefully Fast Sockets and GAM will
  do better for us.
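Their redirect-based distribution, sketched under assumptions (the hostnames,
port, load table, and pick_node() policy are all hypothetical): DNS rotation
hands the client some node, and that node's httpd either serves the request
or bounces it to a less-loaded peer with an HTTP 302.

```python
# Sketch of load-based HTTP redirection at one server node (not the authors'
# code; the real system extended an httpd on SPARC-10 nodes).
from http.server import BaseHTTPRequestHandler, HTTPServer

PEERS = {"node1.example.edu": 0.2,      # hypothetical load estimates,
         "node2.example.edu": 0.9}      # refreshed by load-daemon broadcasts
MY_NAME = "node2.example.edu"

def pick_node():
    """Pick the least-loaded node currently known to this node."""
    return min(PEERS, key=PEERS.get)

class RedirectingHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        best = pick_node()
        if best != MY_NAME:
            # Bounce the client to a lighter-loaded peer; note the cost of
            # the extra round trip mentioned above.
            self.send_response(302)
            self.send_header("Location", f"http://{best}{self.path}")
            self.end_headers()
        else:
            self.send_response(200)
            self.send_header("Content-Type", "text/plain")
            self.end_headers()
            self.wfile.write(b"served locally\n")

if __name__ == "__main__":
    HTTPServer(("", 8080), RedirectingHandler).serve_forever()
```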
Flaws
- Claim that one SPARC-10 can handle 4 HTTP requests/sec delivering
  ~1.5MB files (roughly 6 MB/s, or about 48 Mbit/s). Is this limited
  by network stack throughput, storage I/O, or CPU?
- The critical path of a request goes through the HTTPd, then through
  the "oracle" that analyzes the request's computation/communication
  needs, then to the "broker" that either satisfies it locally or
  sends it to another node. Significant CPU resources have already
  been invested in the request by this point! Does the decision to
  redirect account for this?