
Modeling smooth and/or linear appearance functions

Traditional interpolation networks work well when object appearance can be modeled either as a linear manifold or as a smooth function over the parameters of interest (describing pose, expression, identity, configuration, etc.). As mentioned above, both PCA and RBF approaches have been successfully applied to model facial expression.

In both approaches, a key step in modeling non-rigid shape appearance from examples is to couple shape and texture into a single representation. Interpolation of shape has been well studied in the computer graphics literature (e.g., splines for key-frame animation) but does not alone render realistic images. PCA or RBF models of images without a shape model can only represent and interpolate within a very limited range of pose or object configuration.

In a coupled representation, texture is modeled in shape-normalized coordinates, and shape is modeled as disparity between examples or as displacement from a canonical example to all examples. Image warping is used to generate images for a particular texture and shape. Given a training set $\Omega = \{ (y_i, x_i, d_i),\ 0 \leq i \leq n \}$, where $y_i$ is the image of example $i$, $x_i$ is the associated pose or configuration parameter, and $d_i$ is a dense correspondence map relative to a canonical pose, a set of shape-aligned texture images can be computed such that texture $t_i$ warped with displacement $d_i$ renders example image $y_i$: $y_i = t_i \circ d_i$ [5,1,6]. A new image is constructed using a coupled shape model G and texture model F, based on input u:

\begin{displaymath}
\hat{y}(\Omega,u) = F_{T}(G_{D}(u),u) ~,\end{displaymath}

where $D = [d_0\ d_1\ \cdots\ d_n]$ and $T = [t_0\ t_1\ \cdots\ t_n]$ are the matrices of example displacements and textures, respectively.
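
A minimal NumPy sketch may make the coupled synthesis step concrete. The warp routine below (one plausible rendering of the $t \circ d$ operation, using backward resampling on a grayscale image and a dense two-channel displacement field) and the function names are assumptions for illustration, not part of the original formulation:

\begin{verbatim}
import numpy as np
from scipy.ndimage import map_coordinates

def warp(texture, disparity):
    # The "t o d" operation: resample the shape-normalized texture along
    # a dense displacement field (assumed backward flow; disparity[..., 0]
    # holds column offsets, disparity[..., 1] holds row offsets).
    h, w = texture.shape
    rows, cols = np.mgrid[0:h, 0:w].astype(float)
    coords = np.stack([rows + disparity[..., 1], cols + disparity[..., 0]])
    return map_coordinates(texture, coords, order=1, mode='nearest')

def synthesize(u, G_D, F_T):
    # Coupled model: the shape model G_D maps the input u to a
    # displacement field, and the texture model F_T reconstructs a
    # texture and warps it to that shape.
    s = G_D(u)
    return F_T(u, s)
\end{verbatim}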

In PCA-based approaches, G projects a portion of u onto an optimal linear subspace found from D, and F projects a portion of u onto a subspace found from T [6,5]. For example, $G_D(u) = P_D^m S_g u$, where $S_g$ is a diagonal boolean matrix which selects the shape parameters from u and $P_D^m$ is a matrix containing the m largest principal components of D. F warps the reconstructed texture according to the given shape: $F_T(u,s) = [ P_T^m S_t u ] \circ s$. While interpolation is simple using a PCA approach, the parameters used in PCA models often do not have any direct physical interpretation. For the task of view synthesis, an additional mapping u = H(x) is needed to map from task parameters to PCA input values; a backpropagation neural net was used to perform this function for the task of eye gaze analysis [10].
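
As a rough illustration of the PCA-based construction (not the authors' implementation), the sketch below builds $P_D^m$ and $P_T^m$ from the example matrices and composes them as above. The variables D, T, S_g, S_t, m, img_shape, the omission of mean-centering, and the warp helper from the earlier sketch are all assumptions:

\begin{verbatim}
import numpy as np

def leading_components(M, m):
    # Columns of M are vectorized examples; return the m leading
    # principal directions (left singular vectors). Mean-centering is
    # omitted to mirror the formula in the text.
    U, _, _ = np.linalg.svd(M, full_matrices=False)
    return U[:, :m]

P_D = leading_components(D, m)   # shape basis from displacement maps
P_T = leading_components(T, m)   # texture basis from aligned textures

def G_D(u):
    return P_D @ (S_g @ u)       # G_D(u) = P_D^m S_g u

def F_T(u, s):
    t = P_T @ (S_t @ u)          # reconstructed texture P_T^m S_t u
    return warp(t.reshape(img_shape), s.reshape(img_shape + (2,)))
\end{verbatim}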

Using the RBF-based approach [1], the application to view synthesis is straightforward. Both G and F are networks which compute locally-weighted regression, and the task parameters are used directly (u = x). G computes an interpolated shape, and F warps and blends the example texture images according to that shape: $G_D(x) = \sum_i c_i f(x - x_i)$ and $F_T(x,s) = [ \sum_i c'_i f(x - x_i) ] \circ s$, where f is a radial basis function. The coefficients $c_i$ and $c'_i$ are derived from D and T, respectively: $C = D R^+$, where $r_{ij} = f(x_i - x_j)$ and C is the matrix with columns $c_i$; similarly $C' = T R^+$ [9]. We have found that both vector-norm and Gaussian basis functions give good results when the appearance data come from a smooth function; the results below use $f(r) = \|r\|$.
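
The RBF coefficients above can be computed with a pseudoinverse. The sketch below is again only a plausible rendering of the equations, with hypothetical names (X holds the example parameters $x_i$ row-wise, D and T hold the example displacements and aligned textures column-wise, and warp and img_shape come from the first sketch), using $f(r) = \|r\|$:

\begin{verbatim}
import numpy as np

def f(r):
    return np.linalg.norm(r)                             # f(r) = ||r||

R = np.array([[f(xi - xj) for xj in X] for xi in X])     # r_ij = f(x_i - x_j)
C       = D @ np.linalg.pinv(R)                          # C  = D R^+
C_prime = T @ np.linalg.pinv(R)                          # C' = T R^+

def G_D(x):
    w = np.array([f(x - xi) for xi in X])
    return C @ w                                         # sum_i c_i f(x - x_i)

def F_T(x, s):
    w = np.array([f(x - xi) for xi in X])
    t = C_prime @ w                                      # blended texture
    return warp(t.reshape(img_shape), s.reshape(img_shape + (2,)))
\end{verbatim}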

The method presented below for grouping examples into locally valid spaces applies to both the PCA- and RBF-based view synthesis techniques. However, our initial implementation, and the results reported in this paper, use RBF-based models.

