SPEEDING UP NN (continued)
==========================

[Recall from last lecture that Voronoi diagrams can be used in very low
 dimensions, 2 through perhaps 5, and k-d trees are used for moderate
 dimensions, up to about 30. In high-dimensional spaces, people usually use
 exhaustive search, or they apply dimensionality reduction so they can use
 k-d trees. Occasionally people use locality-sensitive hashing, and it will
 be interesting to see if it develops to the point where it reliably beats
 exhaustive search on many applications.]

Voronoi diagrams
----------------
Let P be a point set. The _Voronoi_cell_ of w in P is

  Vor w = { p in R^d : |pw| <= |pv| for all v in P }

[A Voronoi cell is always a convex polyhedron or polytope.]

The _Voronoi_diagram_ of P is the set of P's Voronoi cells.

[Show Voronoi diagram illustrations (voronoi.pdf, vormcdonalds.jpg,
 voronoiGregorEichinger.jpg, saltflat-1.jpg, giraffe-1.jpg, vortex.pdf).]

[Voronoi diagrams sometimes arise in nature. The first deep study of
 Voronoi diagrams occurred in the field of crystallography.]

[Believe it or not, the first published Voronoi diagram dates back to 1644,
 in the book "Principia Philosophiae" by the famous mathematician and
 philosopher Rene Descartes. He claimed that the solar system consists of
 vortices. In each region, matter is revolving around one of the fixed
 stars. His physics was wrong, but his idea of dividing space into
 polyhedral regions has survived (vortex.pdf).]

Size (e.g., # of vertices) is in O(n^ceiling(d/2))

[This upper bound is tight when d is a small constant. As d grows, the
 tightest upper bound is somewhat smaller than this, but the complexity
 still grows exponentially with d.]

...but often in practice it is O(n).

[Here I'm leaving out a constant that grows exponentially with d.]

_Point_location_: Given a query point v, find the point w for which
                  v in Vor w.

2D: O(n log n) time to compute the V.d. and a _trapezoidal_map_ for point
    location; O(log n) query time [because of the trapezoidal map].

[That's a pretty great running time compared to the linear query time of
 exhaustive search.]

dD: Use a _binary_space_partition_tree_ (BSP tree) for point location.

[Unfortunately, it's difficult to characterize the running time of this
 strategy, although it is likely to be reasonably fast.]

1-NN only! What about k-NN?

An _order-k_Voronoi_diagram_ has a cell for each possible set of k nearest
neighbors.

[Show figure of order-2 Voronoi diagram (2ordervoronoi.pdf).]

In 2D, its size is in O(k^2 n).

[Order-k Voronoi diagrams are rarely used in practice for k > 1, partly
 because the size grows rapidly with k, and partly because there's no
 software available. Voronoi diagrams are good for 1-nearest neighbor in
 2 or 3 dimensions, maybe 4 or 5, but for anything beyond that, k-d trees
 are much simpler and probably faster.]

[There are also Voronoi diagrams for other distance metrics, like the L_1
 and L_infinity norms.]

[If you want to know how to compute Voronoi diagrams and point location
 data structures for them, consider my course CS 274, Computational
 Geometry, which I'll probably teach next spring.]
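[Here is a minimal sketch, not from the lecture, of the connection between
 Voronoi cells and nearest neighbors: by definition, the cell containing a
 query point q is the cell of q's nearest site. It assumes NumPy and SciPy
 are available; scipy.spatial.Voronoi builds the 2D diagram, and the
 point-location query is answered by brute-force 1-NN over the sites, where
 a real system would use a trapezoidal map or BSP tree instead.]

  # Sketch: locating the Voronoi cell that contains q is the same question
  # as finding q's nearest site.
  import numpy as np
  from scipy.spatial import Voronoi

  rng = np.random.default_rng(0)
  P = rng.random((20, 2))            # 20 random sites in the unit square
  vor = Voronoi(P)                   # vertices, edges, cells of the diagram

  def locate(q, sites):
      """Return the index of the site whose Voronoi cell contains q."""
      return int(np.argmin(np.linalg.norm(sites - q, axis=1)))

  q = np.array([0.37, 0.52])
  w = locate(q, P)
  print("q lies in the Voronoi cell of site", w, "at", P[w])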
k-d Trees
---------
Decision trees for NN search. Differences:

- No entropy. Split the dimension with the greatest variance or the
  greatest width (max - min).
  [With nearest neighbor search, we don't care about the entropy. Instead,
   what we want is that if we draw a sphere around the query point, it
   won't intersect very many boxes of the decision tree. So it helps if the
   boxes are nearly cubical, rather than long and thin.]
- Each internal node stores a sample point [...that lies in the node's box.
   We don't have sample points only at the leaves; we have them at internal
   nodes too, because when we search the tree for a query point, we want to
   stop searching as early as possible.]

[Draw k-d tree. See PDF file.]

Given a query pt q, find a sample pt p such that |qp| <= (1 + epsilon) |qs|,
where s is the closest pt.

[If epsilon = 0, we're asking for an exact nearest neighbor; if epsilon is
 positive, we're asking for an approximate nearest neighbor.]

The alg. maintains:
- Nearest neighbor found so far (or k nearest).            goes down  v
- Heap of unexplored subtrees, keyed by distance from q.   goes up    ^

[Each subtree represents a box, and we measure the distance from q to the
 nearest point in that box and use it as a key for the subtree in the heap.
 The search stops when the distance to the kth nearest neighbor found so
 far and the distance to the nearest unexplored box meet in the middle.]

[Draw figure of search in progress. See PDF file.]

  Q <- heap containing root node of tree
  r <- infinity
  while Q has a cell closer to q than r / (1 + epsilon)
    C <- removemin(Q)
    p <- C's sample point
    r <- min { r, |qp| }     [optimization: store the square of the
                              distance instead.]
    C', C'' <- child cells of C
    insert(Q, C')            [optimization: before inserting, check whether
    insert(Q, C'')            the cell's distance from q is
                              >= r / (1 + epsilon); if so, skip it.]
  return point that determined r

[This procedure tries to avoid searching most of the subtrees by searching
 the boxes close to q first.]

[Draw figure of worst-case exact search. See PDF file. In the worst case,
 we may have to look at every node in the k-d tree to find the nearest
 neighbor. In that case, the k-d tree is slower than simple exhaustive
 search. This is an example where the *approximate* nearest neighbor
 algorithm can be much faster. In practice, settling for an approximate
 nearest neighbor sometimes improves the speed by a factor of 10 or even
 100, because you don't need to look at most of the tree to do a query.]

For k-NN, replace "r" with a max-heap holding the k nearest neighbors
[...just like in the exhaustive search algorithm I discussed last lecture.]

Works with any L_p norm for p in [1, infinity].

[k-d trees are not limited to the Euclidean, L_2 norm.]
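[Below is a runnable Python sketch of this search, with simplifying choices
 that are mine rather than the lecture's: the tree splits the widest
 dimension at the median sample point, distances are Euclidean, and the
 priority queue is Python's heapq. It illustrates the control flow above;
 it is not meant to compete with the libraries listed below.]

  import heapq
  import numpy as np

  class Node:
      def __init__(self, point, left, right, lo, hi):
          self.point = point                    # sample point stored at this node
          self.left, self.right = left, right   # child subtrees (may be None)
          self.lo, self.hi = lo, hi             # opposite corners of the node's box

  def build(points, lo, hi):
      if len(points) == 0:
          return None
      dim = int(np.argmax(points.max(axis=0) - points.min(axis=0)))  # widest dim
      order = np.argsort(points[:, dim])
      mid = len(points) // 2
      p = points[order[mid]]                    # median point is stored here
      left_hi, right_lo = hi.copy(), lo.copy()
      left_hi[dim] = right_lo[dim] = p[dim]     # split the box at p
      return Node(p, build(points[order[:mid]], lo, left_hi),
                  build(points[order[mid + 1:]], right_lo, hi), lo, hi)

  def box_dist(q, lo, hi):
      # distance from q to the nearest point of the axis-aligned box [lo, hi]
      return np.linalg.norm(np.maximum(np.maximum(lo - q, q - hi), 0.0))

  def ann_query(root, q, epsilon=0.0):
      Q = [(box_dist(q, root.lo, root.hi), 0, root)]   # heap of unexplored boxes
      best, r, tiebreak = None, np.inf, 1
      while Q and Q[0][0] < r / (1.0 + epsilon):       # nearest box vs. r
          _, _, C = heapq.heappop(Q)
          d = np.linalg.norm(q - C.point)              # check C's sample point
          if d < r:
              best, r = C.point, d
          for child in (C.left, C.right):
              if child is not None:
                  key = box_dist(q, child.lo, child.hi)
                  if key < r / (1.0 + epsilon):        # the insertion optimization
                      heapq.heappush(Q, (key, tiebreak, child))
                      tiebreak += 1
      return best, r

  pts = np.random.default_rng(1).random((1000, 3))
  tree = build(pts, pts.min(axis=0), pts.max(axis=0))
  p, r = ann_query(tree, np.array([0.5, 0.5, 0.5]), epsilon=0.1)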
Software:
- ANN (David Mount & Sunil Arya, U. Maryland)
- FLANN (Marius Muja & David Lowe, U. British Columbia)
- GeRaF (Georgios Samaras, U. Athens) [random forests!]

[I want to emphasize that exhaustive nearest neighbor search really is one
 of the first algorithms you should try in practice, even if it seems too
 simple. So here's an example of a modern research paper that uses 1-NN and
 120-NN search to solve a problem.]

Example: im2gps
---------------
Paper by James Hays and [our own] Prof. Alexei Efros.

[Goal: given a query photograph, determine where on the planet the photo
 was taken. This is called _geolocalization_. They evaluated both 1-NN and
 120-NN with a complex set of features. What they did not do, however, is
 treat each photograph as one long vector. That's okay for tiny digits, but
 too expensive for online travel photographs. Instead, they reduced each
 photo to a small _descriptor_ made up of a variety of features that
 extract the essence of each photo.]

[Show slides (im2gps.pdf).]

[Features, in rough order from most effective to least:
 1. GIST: a compact "image descriptor" based on oriented edge detection
    (_Gabor_filters_) + histograms.
 2. Textons: a histogram of textures, created after assembling a dictionary
    of common textures.
 3. A shrunk 16x16 image.
 4. A color histogram.
 5. Another histogram of edges, this one based on the Canny edge detector,
    invented by our own Prof. John Canny.
 6. A geometric descriptor that's particularly good for identifying ground,
    sky, and vertical lines.]

[Bottom line: with 120-NN, their most sophisticated implementation came
 within 64 km of the correct location about 50% of the time.]

RELATED CLASSES
===============

[If you like machine learning and you'll still be here next year, here are
 some courses you might want to take.]

CS C281A: Statistical Learning Theory (fall)
  [C281A is the most direct continuation of CS 189/289A.]
EE 127 (spring), EE 227BT (fall): Numerical Optimization [a core part of ML]
  [It's hard to overemphasize the importance of numerical optimization to
   machine learning, as well as to other fields of CS like graphics and
   theory.]
EE 126 (both semesters): Random Processes
  [Markov chains, expectation maximization, PageRank.]
EE C106A/B (fall/spring): Intro to Robotics [dynamics, control, sensing]
Math 110: Linear Algebra [but the real gold is in Math 221]
Math 221: Matrix Computations (fall?) [how to compute SVDs, eigenvectors, etc.]
CS C280 (spring): Computer Vision
CS C267 (spring): Scientific Computing
  [Parallelization, practical matrix algebra, some graph partitioning.]
CS 298-115 (fall): Interactive Robotics; Anca Dragan
CS 194-26 (fall): Computational Photography; Alyosha Efros
CS 274 (spring): Computational Geometry; me
  [k-d trees, Voronoi diagrams, other geometric search structures.]