
Modeling smooth and/or linear appearance functions

Traditional interpolation networks work well when object appearance can be modeled either as a linear manifold or as a smooth function over the parameters of interest (describing pose, expression, identity, configuration, etc.). As mentioned above, both PCA and RBF approaches have been successfully applied to model facial expression.

In both approaches, a key step in modeling non-rigid shape appearance from examples is to couple shape and texture into a single representation. Interpolation of shape has been well studied in the computer graphics literature (e.g., splines for key-frame animation) but does not alone render realistic images. PCA or RBF models of images without a shape model can only represent and interpolate within a very limited range of pose or object configuration.

In a coupled representation, texture is modeled in shape-normalized coordinates, and shape is modeled as disparity between examples or as displacement from a canonical example to all examples. Image warping is used to generate images for a particular texture and shape. Given a training set $\Omega = \{ (y_i, x_i, d_i),\ 0 \leq i \leq n \}$, where $y_i$ is the image of example $i$, $x_i$ is the associated pose or configuration parameter, and $d_i$ is a dense correspondence map relative to a canonical pose, a set of shape-aligned texture images can be computed such that texture $t_i$ warped with displacement $d_i$ renders example image $y_i$: $y_i = t_i \circ d_i$ [5,1,6]. A new image is constructed using a coupled shape model G and texture model F, based on input u:

\begin{displaymath}
\hat{y}(\Omega,u) = F_{T}(G_{D}(u),u) ~,\end{displaymath}

where $D = [d_0\ d_1\ \cdots\ d_n]$ and $T = [t_0\ t_1\ \cdots\ t_n]$ are the matrices of example displacements and textures, respectively.
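
A minimal NumPy sketch may make the coupled synthesis step concrete. The warp routine below (one plausible rendering of the $t \circ d$ operation, using backward resampling on a grayscale image and a dense two-channel displacement field) and the function names are assumptions for illustration, not part of the original formulation:

\begin{verbatim}
import numpy as np
from scipy.ndimage import map_coordinates

def warp(texture, disparity):
    # The "t o d" operation: resample the shape-normalized texture along
    # a dense displacement field (assumed backward flow; disparity[..., 0]
    # holds column offsets, disparity[..., 1] holds row offsets).
    h, w = texture.shape
    rows, cols = np.mgrid[0:h, 0:w].astype(float)
    coords = np.stack([rows + disparity[..., 1], cols + disparity[..., 0]])
    return map_coordinates(texture, coords, order=1, mode='nearest')

def synthesize(u, G_D, F_T):
    # Coupled model: the shape model G_D maps the input u to a
    # displacement field, and the texture model F_T reconstructs a
    # texture and warps it to that shape.
    s = G_D(u)
    return F_T(u, s)
\end{verbatim}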

In PCA-based approaches, G projects a portion of u onto an optimal linear subspace found from D, and F projects a portion of u onto a subspace found from T [6,5]. For example, $G_D(u) = P_D^m S_g u$, where $S_g$ is a diagonal boolean matrix which selects the shape parameters from u and $P_D^m$ is a matrix containing the m largest principal components of D. F warps the reconstructed texture according to the given shape: $F_T(u,s) = [ P_T^m S_t u ] \circ s$. While interpolation is simple using a PCA approach, the parameters used in PCA models often do not have any direct physical interpretation. For the task of view synthesis, an additional mapping u = H(x) is needed to map from task parameters to PCA input values; a backpropagation neural net was used to perform this function for the task of eye gaze analysis [10].
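
As a rough illustration of the PCA-based construction (not the authors' implementation), the sketch below builds $P_D^m$ and $P_T^m$ from the example matrices and composes them as above. The variables D, T, S_g, S_t, m, img_shape, the omission of mean-centering, and the warp helper from the earlier sketch are all assumptions:

\begin{verbatim}
import numpy as np

def leading_components(M, m):
    # Columns of M are vectorized examples; return the m leading
    # principal directions (left singular vectors). Mean-centering is
    # omitted to mirror the formula in the text.
    U, _, _ = np.linalg.svd(M, full_matrices=False)
    return U[:, :m]

P_D = leading_components(D, m)   # shape basis from displacement maps
P_T = leading_components(T, m)   # texture basis from aligned textures

def G_D(u):
    return P_D @ (S_g @ u)       # G_D(u) = P_D^m S_g u

def F_T(u, s):
    t = P_T @ (S_t @ u)          # reconstructed texture P_T^m S_t u
    return warp(t.reshape(img_shape), s.reshape(img_shape + (2,)))
\end{verbatim}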

Using the RBF-based approach [1], the application to view synthesis is straightforward. Both G and F are networks which compute locally-weighted regression, and the task parameters are used directly (u = x). G computes an interpolated shape, and F warps and blends the example texture images according to that shape: $G_D(x) = \sum_i c_i f(x - x_i)$ and $F_T(x,s) = [ \sum_i c'_i f(x - x_i) ] \circ s$, where f is a radial basis function. The coefficients $c_i$ and $c'_i$ are derived from D and T, respectively: $C = D R^+$, where $r_{ij} = f(x_i - x_j)$ and C is the matrix with columns $c_i$; similarly $C' = T R^+$ [9]. We have found that both vector-norm and Gaussian basis functions give good results when the appearance data come from a smooth function; the results below use $f(r) = \|r\|$.
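
The RBF coefficients above can be computed with a pseudoinverse. The sketch below is again only a plausible rendering of the equations, with hypothetical names (X holds the example parameters $x_i$ row-wise, D and T hold the example displacements and aligned textures column-wise, and warp and img_shape come from the first sketch), using $f(r) = \|r\|$:

\begin{verbatim}
import numpy as np

def f(r):
    return np.linalg.norm(r)                             # f(r) = ||r||

R = np.array([[f(xi - xj) for xj in X] for xi in X])     # r_ij = f(x_i - x_j)
C       = D @ np.linalg.pinv(R)                          # C  = D R^+
C_prime = T @ np.linalg.pinv(R)                          # C' = T R^+

def G_D(x):
    w = np.array([f(x - xi) for xi in X])
    return C @ w                                         # sum_i c_i f(x - x_i)

def F_T(x, s):
    w = np.array([f(x - xi) for xi in X])
    t = C_prime @ w                                      # blended texture
    return warp(t.reshape(img_shape), s.reshape(img_shape + (2,)))
\end{verbatim}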

The method presented below for grouping examples into locally valid spaces applies to both the PCA- and RBF-based view synthesis techniques. However, our initial implementation, and the results reported in this paper, use RBF-based models.

