Robi Krauthgamer talked about his work on lower bounds for embedding edit distances to l1.

Given two strings x and y, the ** edit distance ** ed(x,y) between them is the number of character insertions, deletions and substitutions required to convert x to y. Such a distance measure is commonly used to compare biological sequences. Given the set of strings V {0,1}^{n}, we would like to find an embedding of V into the l1 metric. Such an embedding is said to have ** distortion ** at most D if the distance between every pair of points is preserved within a factor of D. Robi's result says that any embedding of edit distance on V to l1 incurs a distortion of at least log d.

E _{p} [ deltaA(x,y) ] / E_{q} [deltaA(x,y)] <= D E _{p} [ ed(x,y) ] / E_{q} [ ed(x,y)]

The demand distribution q was chosen to be the uniform distribution on VxV. [ADGIR03] shows that the expected distance between any pair of vertices in this distribution is Theta(n).

The following was selected as the edge distribution. Let p_{H} be the uniform distribution of edges on the n dimensional hypercube, and let p_{S} be the uniform distribution over edges of the form (x, S(x)). (Here for any string x, S(x) denotes the string obtained by applying a shift on x.) The edge distribution p is

p = 1/2 (p_{H} + p_{S})

p(A, V\A) <= O(log n/n) q(A, V\A)

Any cut which separates vertices of a n-dimensional hypercube can be thought of as a boolean function of n variables. Observe that the number of hypercube edges of a particular dimension i crossing such a cut is actually the ** influence** of variable i in the corresponding function, and the total number of hypercube edges crossing a cut is the total influence of variables on the corresponding boolean function. Also note that for any cut (A, V\A), in the chosen edge distribution, the total influence is at most 2np(A,V\A).

A lemma by [KKL88] says that if the total influence of a balanced function f is at most t, then there exists a variable j with influence at least 2^{ -ct} for some constant c. It one assumes for contradiction that

p(A, V\A) >= O(log n/n) q(A, V\A)

then this lemma implies that there exists a variable with influence 1/n
It was then shown that if one variable j has high influence, then so do many, namely, n^{1/8} variables. This leads to a contradiction as then the total influence becomes Omega(1/n^{1/8}), which is more than O(log n/n). The proof for this works as follows. If the variable j has high influence, then with high probability, x and x + e_{j} are on different sides of the cut. With at most O(log n/n) probability, x and S(x) are on the same side and with at most O(log n/n) probability, x + e_{j} and S(x + e_{j}) (which is also S(x) + e_{j+1}) are on the same side of the cut. Since S(x) has the uniform distribution if x has the uniform distribution, this implies that the influence of the variable j+1 is also high. A similar argument extends to the variables e_{j+2}, .., e_{j+n1/8},and therefore the total influence is high contradicting the earlier assumption.

- [ADGIR03] Andoni, Deza, Gupta, Indyk and Rashkhodnikova. "Lower Bounds for Embeddings of Edit Distance into Normed Spaces.", SODA 2003
- [KKL88] Kahn, Kalai and Linial. "The Influence of Variables on Boolean Functions." 29th FOCS, 1988.
- [LLR95] Linial, London and Rabinovich. "The Geometry of Graphs and Some of Its Algorithmic Applications", Combinatorica 1995.