One of the fundamental problems in computer vision is to identify feature points or salient regions in images. Since these distinctive points or regions are normally used to establish correspondence or measure similarity across different images, they should be stable or invariant under the transformations induced by changes in viewpoint or illumination. The problem, then, is to extract regions in a 2D image that correspond to a rich class of regular patterns on a planar surface in 3D.
The basic idea of our approach is to view each image as a matrix and seek a transformation that yields a low-rank matrix, subject to sparse errors. We consider an m × n image I^0 of a texture to be low-rank if the rank of the matrix I^0 is much smaller than both m and n. Suppose we are given an image I of an object, with a window containing a low-rank texture. The scene may be warped and corrupted by errors or occlusions, so I itself is not low-rank. We model the warping by a global domain transformation belonging to a finite-dimensional group; that is, there exists a transformation τ such that the transformed image I ∘ τ is low-rank, up to a sparse error term E that compensates for large, gross errors such as partial occlusions. We can formulate this problem in Lagrangian form as follows:

    minimize_{I^0, E, τ}  rank(I^0) + γ ||E||_0    subject to    I ∘ τ = I^0 + E,
where ||E||_0 denotes the number of non-zero entries of the matrix E, and γ is an appropriately chosen positive constant. Although this formulation follows naturally from the problem statement, the cost function is highly non-convex and discontinuous, and the equality constraint is highly non-linear. The problem is therefore intractable and of little practical use.
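To make the notion of a low-rank texture concrete, here is a small numerical illustration of our own (not taken from the paper): a regular periodic pattern viewed fronto-parallel has very low matrix rank, while the same pattern seen under a mild shear plus a perspective-like distortion has much higher numerical rank.

```python
import numpy as np

n = 100
x = np.arange(n)

def texture(u, v):
    # a simple periodic "texture" function with period 10 in both directions
    return np.cos(2 * np.pi * u / 10) + np.cos(2 * np.pi * v / 10)

ii, jj = np.meshgrid(x, x, indexing="ij")

# Fronto-parallel view: the image is a sum of two rank-1 outer products,
# so its rank is exactly 2.
I0 = texture(ii, jj)
rank0 = np.linalg.matrix_rank(I0)

# The same texture under a shear (0.15 * j) and a perspective-like term
# (0.005 * i * j): the regular structure is destroyed and the rank jumps.
W = texture(ii + 0.15 * jj, jj + 0.005 * ii * jj)
rank_w = np.linalg.matrix_rank(W)

print(rank0, rank_w)  # the warped image has a much higher numerical rank
```

This is exactly the cue TILT exploits: the correct inverse transformation τ is the one that restores the low-rank structure.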
Recent work in compressed sensing and matrix completion suggests that, under quite general conditions, the above problem can be solved efficiently by replacing the cost function with its convex surrogate:

    minimize_{I^0, E, τ}  ||I^0||_* + λ ||E||_1    subject to    I ∘ τ = I^0 + E,
where || · ||_* denotes the matrix nuclear norm (the sum of singular values), || · ||_1 denotes the matrix 1-norm (the sum of the absolute values of the entries), and λ is a positive weighting factor. Although the cost function is now convex and continuous, the constraint is still non-linear, so we solve the optimization problem by iteratively linearizing the constraint. We also add a few extra constraints to avoid converging to a local minimum. For more details on the algorithm and its implementation, please refer to our paper.
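With τ held fixed (or linearized), the inner convex subproblem is a robust-PCA-type program, minimize ||A||_* + λ||E||_1 subject to D = A + E. Below is a minimal sketch of solving it with the inexact augmented Lagrange multiplier method; the function name and parameter defaults are our own illustration, not the authors' released code.

```python
import numpy as np

def shrink(X, t):
    # soft-thresholding: the proximal operator of the l1 norm
    return np.sign(X) * np.maximum(np.abs(X) - t, 0.0)

def rpca_inexact_alm(D, lam=None, tol=1e-7, max_iter=500):
    """Solve  min ||A||_* + lam * ||E||_1  s.t.  D = A + E  (inexact ALM sketch)."""
    m, n = D.shape
    if lam is None:
        lam = 1.0 / np.sqrt(max(m, n))          # standard RPCA weighting
    norm_D = np.linalg.norm(D, "fro")
    spec = np.linalg.norm(D, 2)
    Y = D / max(spec, np.abs(D).max() / lam)    # dual variable initialization
    mu, rho = 1.25 / spec, 1.5                  # penalty and its growth rate
    E = np.zeros_like(D)
    for _ in range(max_iter):
        # A-step: singular value thresholding at level 1/mu
        U, s, Vt = np.linalg.svd(D - E + Y / mu, full_matrices=False)
        A = (U * np.maximum(s - 1.0 / mu, 0.0)) @ Vt
        # E-step: entrywise soft-thresholding at level lam/mu
        E = shrink(D - A + Y / mu, lam / mu)
        R = D - A - E                           # constraint residual
        Y = Y + mu * R
        mu = min(mu * rho, 1e7)
        if np.linalg.norm(R, "fro") / norm_D < tol:
            break
    return A, E
```

In TILT's outer loop, this kind of subproblem is solved repeatedly, once per linearization of the constraint in the transformation parameters.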
The videos below demonstrate the TILT algorithm in action. The input image shows a building whose facade is not parallel to the image plane. Each frame of the videos corresponds to one iteration of the TILT algorithm. In the input video (left), the input window is shown as a red box, and the output of each iteration as a green box. In the output video (right), the recovered transformation is applied to the entire image. The final frame shows the low-rank representation returned by TILT. The algorithm computes the correct transformation despite occlusions such as the tree branches in the image.
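Applying the recovered transformation to the whole image, as in the output video, amounts to evaluating I ∘ τ. For a projective τ this is an inverse warp by a 3×3 homography; a self-contained sketch (our own illustrative helper, assuming a grayscale NumPy image) is:

```python
import numpy as np

def warp_homography(img, H):
    """Inverse-warp a grayscale image by a 3x3 homography H.

    For each output pixel (x, y), look up the source location
    H^{-1} (x, y, 1)^T and sample the input with bilinear interpolation;
    pixels that map outside the input are set to zero.
    """
    m, n = img.shape
    yy, xx = np.meshgrid(np.arange(m), np.arange(n), indexing="ij")
    pts = np.stack([xx.ravel(), yy.ravel(), np.ones(m * n)])
    src = np.linalg.inv(H) @ pts
    x = src[0] / src[2]                      # dehomogenize source coords
    y = src[1] / src[2]
    # bilinear interpolation (clip corner indices so the borders still work)
    x0 = np.clip(np.floor(x).astype(int), 0, n - 2)
    y0 = np.clip(np.floor(y).astype(int), 0, m - 2)
    xa, ya = x - x0, y - y0
    val = ((1 - ya) * ((1 - xa) * img[y0, x0] + xa * img[y0, x0 + 1])
           + ya * ((1 - xa) * img[y0 + 1, x0] + xa * img[y0 + 1, x0 + 1]))
    inside = (x >= 0) & (x <= n - 1) & (y >= 0) & (y <= m - 1)
    return np.where(inside, val, 0.0).reshape(m, n)
```

With H equal to the identity the image is returned unchanged, and a translation homography shifts it, which makes the convention (H maps source to destination coordinates) easy to verify.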