Alex Slivkins talked about work on metric space embeddings inspired by some recent systems projects. To start us off, he gave us an example problem. Suppose you have a list of servers that hold a certain file. You want to download the file from the closest server. How do you tell which server is closest? You could ping each server, but perhaps that is too much work.

Instead, what if the servers had well-known "labels" corresponding to coordinates in, say, a low-dimensional Euclidean space? What if the distance in this low-dimensional Euclidean space were a close approximation of the round-trip time? You could then compute the distance in label-space offline, without needing to measure network round-trip time directly.

While this may sound silly, it turns out recent systems projects have been
successful at computing just such coordinates in a meaningful way.
In more detail, these projects pick a small number of **beacons**
scattered around the network. Then, the "label" for a server is the vector
of its distances to each beacon. We approximate the real distance d(u,v)
between nodes u and v by the distance between the appropriate beacon nodes
B_u and B_v.

"For some fairly magical reasons it worked pretty well." I didn't catch the name of any specific project, but Alex mentioned that one such project had tested its approach on 900 machines randomly chosen on the Internet and found the pairwise label distance to be close to the real pairwise round-trip times; that is, the relative error was small in 90% of cases.

So empirically the results are good. The question, then, is whether we can prove anything. What is the right approach, what can we ask for from such an approach, and what can we prove?

For Alex, the right approach is to think of the beacon mechanism as an embedding from the underlying routing metric into the beacon distance metric, then use the triangle inequality to derive upper and lower bounds on the true distance d(u,v). Writing out the triangle through each beacon b gives |d(u,b) - d(v,b)| <= d(u,v) <= d(u,b) + d(v,b), so Alex defined a lower bound d-(u,v) as the maximum of the underestimates over all beacons, and an upper bound d+(u,v) as the minimum of the overestimates over all beacons. He pointed out that if we trust the triangle inequality, either bound is just as good. The key issue is then whether the bounds are close to the true value, given the number of beacons, the configuration of nodes, the metric of interest, and so on.
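The two bounds are easy to state concretely. Here is a minimal Python sketch (the function names are mine, not from the talk) that computes labels and the triangle bounds from them:

```python
import math

def make_labels(nodes, beacons, d):
    """The label of a node is the vector of its distances to each beacon."""
    return {u: [d(u, b) for b in beacons] for u in nodes}

def triangle_bounds(label_u, label_v):
    """For every beacon b the triangle inequality gives
    |d(u,b) - d(v,b)| <= d(u,v) <= d(u,b) + d(v,b);
    take the tightest bound over all beacons."""
    d_minus = max(abs(du - dv) for du, dv in zip(label_u, label_v))
    d_plus = min(du + dv for du, dv in zip(label_u, label_v))
    return d_minus, d_plus

# Tiny example in the plane: u=(0,0), v=(3,4), beacons at (1,0) and (0,5)
pts = {"u": (0.0, 0.0), "v": (3.0, 4.0), "b1": (1.0, 0.0), "b2": (0.0, 5.0)}
d = lambda a, b: math.dist(pts[a], pts[b])
labels = make_labels(["u", "v"], ["b1", "b2"], d)
lo, hi = triangle_bounds(labels["u"], labels["v"])
# lo <= d("u","v") == 5.0 <= hi
```

By construction lo <= d(u,v) <= hi always holds; the question the talk addresses is how close together the two bounds can be made.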

Ideally, we would want a beacon embedding with the following properties:

- For all delta > 0, d+/d- <= 1 + delta (for a "small" number of beacons)
- Works for all metrics
- Guarantee holds for all node pairs

Alex first showed us a counterexample for the all-node-pairs case: a metric over k "clusters," where the number of clusters is greater than the number of beacons. Define the distance between two nodes to be 0 if they are in the same cluster, and 1 otherwise. By the pigeonhole principle, some cluster has no beacon, and any pair of nodes inside that cluster has true distance 0 but upper bound 2, so its distance is badly overestimated by any beacon method. The relaxation here is to allow epsilon-slack: we allow error on a fraction epsilon of the node pairs, and then let the number of beacons depend on epsilon.
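The pigeonhole counterexample is small enough to run directly. This sketch (my own, with k = 3 clusters and k - 1 beacons) exhibits a same-cluster pair whose true distance is 0 but whose triangle upper bound is 2:

```python
def cluster_dist(u, v):
    """Nodes are (cluster, index) pairs: distance 0 within a cluster, 1 across."""
    return 0 if u[0] == v[0] else 1

k = 3
nodes = [(c, i) for c in range(k) for i in range(3)]
beacons = [(c, 0) for c in range(k - 1)]  # one fewer beacon than clusters

# By pigeonhole, cluster k-1 has no beacon; pick u, v inside it.
u, v = (k - 1, 1), (k - 1, 2)
d_minus = max(abs(cluster_dist(u, b) - cluster_dist(v, b)) for b in beacons)
d_plus = min(cluster_dist(u, b) + cluster_dist(v, b) for b in beacons)
# True distance is 0; d_minus == 0 (fine) but d_plus == 2,
# so no multiplicative guarantee can hold for this pair.
```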

For
the case of all metrics, Alex then pointed out that the uniform metric, in
which all distances are 1, has a triangle lower bound of 0. As a result,
it cannot be approximated well by the beacon mechanism. Here, the
relaxation is to **doubling metrics**, which are metrics with the
following property: for all balls of radius R, there exists a cover of the
ball by a **constant** number of sub-balls of radius R/2. In the case
of point sets in a k-dimensional Euclidean metric, this constant is
2^(O(k)). Alex noted that James Lee and several others have done work on
doubling metrics before, so there is a library of results to use here.
Intuitively, a doubling metric also generalizes the notion of a
growth-constrained metric. Such a metric also roughly approximates a
slightly perturbed plane metric, which may make it appropriate for
characterizing Internet routing.
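The uniform-metric obstruction is just as easy to check. In this sketch (mine, not from the talk), every pair of non-beacon nodes has true distance 1 but triangle lower bound 0:

```python
def uniform_dist(u, v):
    """The uniform metric: all distinct points are at distance 1."""
    return 0 if u == v else 1

nodes = list(range(10))
beacons = nodes[:3]
u, v = 5, 6  # any pair of non-beacon nodes
d_minus = max(abs(uniform_dist(u, b) - uniform_dist(v, b)) for b in beacons)
# d_minus == 0 even though d(u, v) == 1: the lower bound collapses,
# so no 1 + delta approximation is possible for this metric.
```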

Alex then presented a mechanism that works for all doubling metrics, for all delta > 0 and all epsilon > 0, where the number of beacons depends on the doubling dimension of the metric, delta, and epsilon. The idea is actually fairly simple. First, we pick O((1/epsilon) (1/delta)^dim) beacons in the space uniformly at random. Then we compute the label for each node as above: the vector of its distances to each beacon.
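A quick simulation of this mechanism (my own sketch, not an experiment from the talk) on random points in the unit square, which form a doubling metric:

```python
import math
import random

random.seed(0)

def beacon_bounds(pts, num_beacons):
    """Pick beacons uniformly at random, label every point by its
    distances to the beacons, and return the triangle bounds per pair."""
    beacons = random.sample(pts, num_beacons)
    labels = {p: [math.dist(p, b) for b in beacons] for p in pts}
    bounds = {}
    for i, u in enumerate(pts):
        for v in pts[i + 1:]:
            lo = max(abs(a - c) for a, c in zip(labels[u], labels[v]))
            hi = min(a + c for a, c in zip(labels[u], labels[v]))
            bounds[(u, v)] = (lo, hi)
    return bounds

pts = [(random.random(), random.random()) for _ in range(200)]
bounds = beacon_bounds(pts, 20)

# The bounds always sandwich the true distance (up to floating-point
# rounding); how often they are also *tight* is the epsilon-slack question.
sandwiched = sum(lo <= math.dist(u, v) + 1e-9 and math.dist(u, v) <= hi + 1e-9
                 for (u, v), (lo, hi) in bounds.items())
frac = sandwiched / len(bounds)
```

The interesting experiment is then to measure the fraction of pairs with hi/lo <= 1 + delta as the number of beacons grows.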

To see why this works, consider a pair of nodes u and v. Let B be the smallest ball centered at u that contains more than epsilon*n nodes, and let r be its radius. The idea is that with high probability B contains a beacon, since each of the roughly 1/epsilon uniformly random beacons lands in B with probability at least epsilon.

Now let B' be a ball centered at u with radius r*sqrt(3delta). Alex then enumerated the cases for the node v: it can be inside both B and B', in B' but not in B, or outside both. The easiest case is the last one, in which v is outside B': here u is so much closer to its beacon than to v that the triangle bounds are within the desired factor.

The more interesting cases are when v is closer. If v is in B', then we use the doubling property of the metric to conclude that B' can be covered with smaller balls of radius delta*r. Call the smaller ball of this cover that contains v by the name B''. Either B'' contains more than epsilon*n/k nodes or it does not. If it does, then with high probability it contains a beacon near v and the triangle bounds are again tight. If it does not, then it can be shown that ignoring the affected pairs does not throw you off by too much, since we are allowing an epsilon fraction of pairs to be off.

We then took questions. Someone asked what would happen if the beacons were miraculously picked for you - could you do better? Alex replied that picking better beacons had been tried experimentally in the systems projects he mentioned and there seemed to be a factor of 1/2 improvement, but he had not considered it formally or proved anything. James had an observation here about how picking randomly from the densest regions is your best bet in any case (but I didn't catch the details). I thought the speaker was trying to establish a notion similar to competitive ratio in online algorithms, but I didn't get a chance to follow up at lunch.

Dick asked what would happen if we first fix a pair of nodes (e.g. the distance from Boston to LA), then randomized over the choice of beacons. Is the probability of error in this case similar to the probability of error over all pairs of nodes? In other words, are some pairs always bad? Alex replied that the method would not work well for node pairs with small true distances, since the lower bound would tend to be too low, but you could still get some kind of estimate.

We then thanked the speaker and headed out.