SPEEDING UP NN (continued)
==============
[Recall from last lecture that Voronoi diagrams can be used in very low
dimensions, 2 through perhaps 5; and k-d trees are used for moderate
dimensions, up to about 30. In high-dimensional spaces, people usually use
exhaustive search or they apply dimensionality reduction so they can use
k-d trees. Occasionally people use locality-sensitive hashing, and it will be
interesting to see if it develops to the point where it reliably beats
exhaustive search on many applications.]
Voronoi diagrams
----------------
Let P be a point set. The _Voronoi_cell_ of w in P is
Vor w = { p in R^d : |pw| <= |pv| for all v in P }
[A Voronoi cell is always a convex polyhedron or polytope.]
The _Voronoi_diagram_ of P is the set of P's Voronoi cells.
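[The definition gives a brute-force point location method for free: a query
point q lies in Vor w exactly when w is a nearest point of P to q, so finding
q's cell is just a nearest-neighbor scan. A minimal sketch (the function name
is mine, not from any library):]

```python
import numpy as np

def voronoi_cell_of(q, P):
    """Return the index of the point w in P such that q lies in Vor w."""
    P = np.asarray(P, dtype=float)
    q = np.asarray(q, dtype=float)
    dists = np.linalg.norm(P - q, axis=1)   # |qv| for every v in P
    return int(np.argmin(dists))            # nearest v defines q's cell

P = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0)]
print(voronoi_cell_of((1.0, 0.5), P))   # -> 0: (0, 0) is nearest
```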
[Show Voronoi diagram illustrations (voronoi.pdf, vormcdonalds.jpg,
voronoiGregorEichinger.jpg, saltflat-1.jpg, giraffe-1.jpg, vortex.pdf).]
[Voronoi diagrams sometimes arise in nature. The first deep study of Voronoi
diagrams occurred in the field of crystallography.]
[Believe it or not, the first published Voronoi diagram dates back to 1644, in
the book "Principia Philosophiae" by the famous mathematician and philosopher
Rene Descartes. He claimed that the solar system consists of vortices.
In each region, matter is revolving around one of the fixed stars. His
physics was wrong, but his idea of dividing space into polyhedral regions has
survived (vortex.pdf).]
Size (e.g. # of vertices) is in O(n^ceiling(d/2))
[This upper bound is tight when d is a small constant. As d grows, the
tightest upper bound is somewhat smaller than this, but the complexity still
grows exponentially with d.]
...but often in practice it is O(n).
[Here I'm leaving out a constant that grows exponentially with d.]
_Point_location_: Given query point v, find the point w for which v in Vor w.
2D: O(n log n) time to compute V.d. and a _trapezoidal_map_ for pt location
O(log n) query time [because of the trapezoidal map]
[That's a pretty great running time compared to the linear query time of
exhaustive search.]
dD: Use _binary_space_partition_tree_ (BSP tree) for pt location
[Unfortunately, it's difficult to characterize the running time of this
strategy, although it is likely to be reasonably fast.]
1-NN only! What about k-NN?
_order-k_Voronoi_diagram_ has a cell for each possible k nearest neighbors.
[Show figure of order-2 Voronoi diagram (2ordervoronoi.pdf).]
In 2D, size is in O(k^2 n)
[Order-k Voronoi diagrams are rarely used in practice for k > 1, partly
because the size grows rapidly with k and partly because there's no software
available. They're good for 1-nearest neighbor in 2 or 3 dimensions, maybe 4
or 5, but for anything beyond that, k-d trees are much simpler and probably
faster.]
[There are also Voronoi diagrams for other distance metrics, like the L_1 and
L_infinity norms.]
[If you want to know how to compute Voronoi diagrams and point location data
structures for them, consider my course CS 274, Computational Geometry, which
I'll probably teach next spring.]
k-d Trees
---------
Decision trees for NN search. Differences:
- No entropy. Split dimension w/greatest variance or width (max - min).
[With nearest neighbor search, we don't care about the entropy. Instead, what
we want is that if we draw a sphere around the query point, it won't intersect
very many boxes of the decision tree. So it helps if the boxes are nearly
cubical, rather than long and thin.]
- Each internal node stores a sample point. [...that lies in the node's box.]
[We don't have nodes only at the leaves; we have them at internal nodes,
because when we search the tree for a query point, we want to stop searching
as early as possible.]
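[The width-based split rule above can be sketched in a few lines; the helper
name here is hypothetical:]

```python
import numpy as np

def split_dimension(points):
    """Pick the split dimension with the greatest width (max - min)."""
    points = np.asarray(points, dtype=float)
    widths = points.max(axis=0) - points.min(axis=0)
    return int(np.argmax(widths))   # splitting here keeps boxes near-cubical

pts = [(0.0, 10.0), (1.0, -5.0), (2.0, 3.0)]
print(split_dimension(pts))   # -> 1: width 15 in dim 1 vs. width 2 in dim 0
```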
[Draw k-d tree. See PDF file.]
Given query pt q, find a sample pt p such that |qp| <= (1 + epsilon) |qs|,
where s is the closest pt.
[If epsilon = 0, we're asking for an exact nearest neighbor; if epsilon is
positive, we're asking for an approximate nearest neighbor.]
The alg. maintains:
- Nearest neighbor found so far (or k nearest). goes down v
- Heap of unexplored subtrees, keyed by distance from q. goes up ^
[Each subtree represents a box, and we measure the distance from q to the
nearest point in that box and use it as a key for the subtree in the heap.
The search stops when the distance to the kth nearest neighbor found so far
and the distance to the nearest unexplored box meet in the middle.]
[Draw figure of search in progress. See PDF file.]
Q <- heap containing root node of tree
r <- infinity
while Q has a cell closer to q than r / (1 + epsilon)
C <- removemin(Q)
p <- C's sample point
r <- min { r, |qp| } [optimization: store square of distance instead.]
C', C'' <- child cells of C
insert(Q, C') [optimization: check whether
insert(Q, C'') distance >= r / (1 + epsilon) before inserting.]
return point that determined r
[This procedure tries to avoid searching most of the subtrees by searching
the boxes close to q first.]
[Draw figure of worst-case exact search. See PDF file. In the worst case, we
may have to look at every node in the k-d tree to find the nearest neighbor.
In that case, the k-d tree is slower than simple exhaustive search. This is
an example where the *approximate* nearest neighbor algorithm can be much
faster. In practice, settling for an approximate nearest neighbor sometimes
improves the speed by a factor of 10 or even 100, because you don't need to
look at most of the tree to do a query.]
For k-NN, replace "r" with a max-heap holding the k nearest neighbors
[...just like in the exhaustive search algorithm I discussed last lecture.]
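[The max-heap bookkeeping looks like this in an exhaustive-search setting.
Python's heapq is a min-heap, so this sketch negates distances to simulate a
max-heap keyed on the kth-nearest distance; the function name is mine.]

```python
import heapq

def k_nearest(q, points, k):
    """Return the k nearest points to q as (distance, point) pairs."""
    heap = []   # max-heap via negated distances: root holds kth nearest
    for p in points:
        d = sum((qi - pi) ** 2 for qi, pi in zip(q, p)) ** 0.5
        if len(heap) < k:
            heapq.heappush(heap, (-d, p))
        elif d < -heap[0][0]:               # closer than current kth nearest
            heapq.heapreplace(heap, (-d, p))
    return sorted((-nd, p) for nd, p in heap)

print(k_nearest((0.0, 0.0), [(1.0, 0.0), (3.0, 0.0), (0.0, 2.0)], 2))
```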
Works with any L_p norm for p in [1, infinity].
[k-d trees are not limited to the Euclidean, L_2 norm.]
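[For concreteness, the L_p (Minkowski) distances for one pair of points; any
of these can serve as the distance in the search above. The helper name is
mine.]

```python
def lp_dist(u, v, p):
    """Minkowski distance between u and v; p = inf gives the max norm."""
    if p == float('inf'):
        return max(abs(a - b) for a, b in zip(u, v))
    return sum(abs(a - b) ** p for a, b in zip(u, v)) ** (1.0 / p)

print(lp_dist((0, 0), (3, 4), 1))             # 7.0  (L_1, taxicab)
print(lp_dist((0, 0), (3, 4), 2))             # 5.0  (L_2, Euclidean)
print(lp_dist((0, 0), (3, 4), float('inf')))  # 4    (L_infinity, max)
```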
Software:
- ANN (David Mount & Sunil Arya, U. Maryland)
- FLANN (Marius Muja & David Lowe, U. British Columbia)
- GeRaF (Georgios Samaras, U. Athens) [random forests!]
[I want to punctuate the fact that exhaustive nearest neighbor search really is
one of the first algorithms you should try in practice, even if it seems too
simple. So here's an example of a modern research paper that uses 1-NN and
120-NN search to solve a problem.]
Example: im2gps
----------------
Paper by James Hays and [our own] Prof. Alexei Efros.
[Goal: given a query photograph, determine where on the planet the photo was
taken. Called _geolocalization_. They evaluated both 1-NN and 120-NN with
a complex set of features. What they did not do, however, is treat each
photograph as one long vector. That's okay for tiny digits, but too expensive
for online travel photographs. Instead, they reduced each photo to a small
_descriptor_ made up of a variety of features that extract the essence of each
photo.]
[Show slides (im2gps.pdf).]
[Features, in rough order from most effective to least:
1. GIST: A compact "image descriptor" based on oriented edge detection
(_Gabor_filters_) + histograms.
2. Textons: A histogram of textures, created after assembling a dictionary of
common textures.
3. A shrunk 16x16 image.
4. A color histogram.
5. Another histogram of edges, this one based on the Canny edge detector,
invented by our own Prof. John Canny.
6. A geometric descriptor that's particularly good for identifying ground, sky,
and vertical lines.]
[Bottom line: With 120-NN, their most sophisticated implementation came within
64 km of the correct location about 50% of the time.]
RELATED CLASSES
===============
[If you like machine learning and you'll still be here next year, here are
some courses you might want to take.]
CS C281A: Statistical Learning Theory (fall)
[C281A is the most direct continuation of CS 189/289A.]
EE 127 (spring), EE 227BT (fall): Numerical optimization [a core of ML]
[It's hard to overemphasize the importance of numerical optimization to
machine learning, as well as other fields of CS like graphics and theory.]
EE 126 (both): Random Processes [Markov chains, expectation max., PageRank]
EE C106A/B (fall/spring): Intro to Robotics [dynamics, control, sensing]
Math 110: Linear Algebra [but the real gold is in Math 221]
Math 221: Matrix Computations (fall?) [How to compute SVDs + eigen, etc.]
CS C280 (spring): Computer Vision
CS C267 (spring): Scientific Computing
[Parallelization, practical matrix algebra, some graph partitioning]
CS 298-115 (fall): Interactive Robotics; Anca Dragan
CS 194-26 (fall): Computational Photography; Alyosha Efros
CS 274 (spring): Computational Geometry; me
[k-d trees, Voronoi diagrams, other geometric search structures]