Natural Image Segmentation via Lossy Compression |
||||
AbstractWe present a novel algorithm for segmentation of natural images that harnesses the principle of minimum description length (MDL). Our method is based on observations that a homogeneously textured region of a natural image can be well modeled by a Gaussian distribution and the region boundary can be effectively coded by an adaptive chain code. The optimal segmentation of an image is the one that gives the shortest coding length for encoding all textures and boundaries in the image, and is obtained via an agglomerative clustering process applied to a hierarchy of decreasing window sizes as multi-scale texture features. The optimal segmentation also provides an accurate estimate of the overall coding length and hence the true entropy of the image. We test our algorithm on the publicly available Berkeley Segmentation Dataset. It achieves state-of-the-art segmentation results compared to other existing methods. Adaptive Texture and Boundary EncodingConstructing Texture FeaturesFor each pixel, we apply a w × w cut-off window around the pixel across the three color channels, and stack the color values inside the window in a vector form. For ease of computation, we reduce the dimensionality of these features by projecting the set of all features X onto their first D principal components.
Adaptive Texture EncodingThe coding length of a region is uniquely determined by the mean and covariance (μ, Σ ) of the texture windows in the region. To estimate (μ, Σ ) empirically we need to exclude the windows that cross the boundary of R:
We now describe encoding these texture vectors based on the lossy MDL principle. For a region R containing N pixels, a good approximation for the optimal coding length of R up to distortion ε^{2} is:
Adaptive Boundary EncodingIn the Freeman chain code coding scheme, the orientation of an edge o_{t}is quantized along eight discrete directions. The difference chain code between two consecutive edges o_{t} and o_{t+1} is Δo_{t} = mod(o_{t} - o_{t+1}, 8). Given the prior distribution P[Δo] of difference chain codes in natural images, the optimal coding of the boundary B(R) is given by:
Segmentation by Minimizing Coding LengthThe total coding length of an image I with respect to a segmentation with non-overlapping regions R_{1},...,R_{k} is: The optimal segmentation of I minimizes this total coding length function, and can often be found through the following agglomerative process. Given an oversegmentation of the image, at each iteration, we find the pair of regions R_{i} and R_{j} that maximally decrease the total coding length if merged: To model the spatial locality of textures, we further construct a region adjacency graph where each vertex v_{i }corresponds to a region R_{i} and the edge e_{ij} is present if and only if regions R_{i} and R_{j} are adjacent in the image. To perform image segmentation, we simply apply a constrained version of the agglomerative procedure - only merging regions that are adjacent in the image. Automatic Choice of DistortionThe opitmality of the distortion ε for each image is measured by the segmentation that best matches with human perception. We learn to predict ε from a set of training images {I_{k}} and their corresponding ground truth {R_{g}(I_{k})}. Consider a discrepancy function d(R_{ε }(I_{k}),R_{g}(I_{k})) which measures the difference between the output and the ground truth segmentation. For each image, as ε deviates from its optimal value, it leads to over or under segmentation. Thus, we approximate d by a convex quadratic function. Denote some feature vector extracted from image I_{k} by f_{k }. The relationship between ε and f_{k} is model by a linear function ε=w^{T}f_{k} , where w is a parameter vector. The model and the quadratic approximation of d lead to the following unconstrained convex program which as closed-form solution. Results on Public Image DatabaseWe report the results of our image segmentation on all images in the Berkeley Segmentation Dataset. For each result, the top image is PRI-Optimized and the bottom image is VOI-Optimized segmentation. Please click on the link below to see results. For the dataset itself, please refer to the link below:MATLAB SoftwareTexture and Boundary Encoding-based Segmentation 1.0 InstallationThis texture segmentation algorithm uses the superpixel code of Mori et al., which includes some c files that need to be compiled. To do this open MATLAB and then run the following commands:> cd superpixels/yu_imncut > mex csparse.c > mex ic.c > mex imnb.c > mex parmatV.c > mex spmd_imncut.c TestingTo test our texture segmentation algorithm, run the following command in MATLAB:> test_texture_seg This will run TBES on an example image from the Berkeley Segmentation Dataset, compute the probabilistic Rand index and variation of information segmentation metrics (in comparison with human results), and display the segmentation results. For information on computing the precision and recall curve, please refer to the Berkeley Segmentation Dataset. CopyrightAuthors: Shankar Rao, Hossein Mobahi, Allen YangContact: Hossein Mobahi: hmobahi2 AT illinois DOT EDU (c) Copyright. University of Illinois, University of California, 2009. Notice: The packages should NOT be used for any commercial purposes without direct consent of the authors. Publications
In the News
References
Comments? Questions? Contact Hossein Mobahi at: hmobahi2 AT illinois DOT EDU |