
Introduction


Estimating 3D object pose is an important problem in computer vision. In particular, it is a challenging aspect of novel human interface applications, which require fast, accurate head or body tracking. Estimates of the user's body position must arrive quickly enough to update the interface display in a meaningful, timely manner. For a method to work on body parts with varied clothing or appearance, it should rely on direct image measurements of motion rather than on tracking a priori features or fixed models.

Previous approaches to pose tracking often rely on assumed shape models to track 3D motion from intensity data. This leads to inaccuracies when the real object deviates from the model, which is typically planar or ellipsoidal.

In this paper we show how to take advantage of recently developed video-rate range sensors to dramatically improve pose tracking performance. We use models that express parametric motion constraints directly on range and intensity image values; such constraints effectively integrate measurement uncertainty both over image regions and over the motion model. The resulting linearized models yield closed-form solutions, which may be computed quickly.
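As a sketch of why linearization leads to a fast solver (the notation here is ours, for illustration only): if each pixel x in a region R contributes a residual that is linear in the motion parameter vector \theta, the region-integrated estimate is an ordinary least-squares problem with the closed-form normal-equation solution

  \hat{\theta} = \arg\min_{\theta} \sum_{\mathbf{x} \in R} \bigl( \mathbf{a}(\mathbf{x})^{\top} \theta - b(\mathbf{x}) \bigr)^{2} = \Bigl( \sum_{\mathbf{x} \in R} \mathbf{a}(\mathbf{x}) \, \mathbf{a}(\mathbf{x})^{\top} \Bigr)^{-1} \sum_{\mathbf{x} \in R} b(\mathbf{x}) \, \mathbf{a}(\mathbf{x}),

where \mathbf{a}(\mathbf{x}) collects the per-pixel gradient coefficients and b(\mathbf{x}) the temporal term.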

Our method brings two key innovations to existing direct pose estimation frameworks. First, we use the range information to determine the shape of the object, rather than assume a generic model or estimate structure from motion. This shape is updated with each frame, offering a more accurate representation across time than one provided by an initial or off-line range scan. Second, we derive the depth counterpart to the classic brightness change constraint equation. We use both constraints to jointly solve for motion estimates. Observing the change in depth directly, rather than inferring it from intensity change over time (or subtle perspective effects), can yield more accurate estimates of object motion, particularly for rotation out of the image plane and for translation in depth. Depth information is also less sensitive than intensity data to illumination and shading effects as an object translates and rotates through space, and hence the depth change constraint equation is more reliable than the traditional brightness constraint when these photometric effects are significant.
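To preview the form of the two constraints (a sketch in our notation; the precise derivation appears later in the paper): the classic brightness change constraint equation assumes intensity I is conserved along the image-plane flow (u, v),

  I_x u + I_y v + I_t = 0,

whereas depth Z is not conserved: a point moving with out-of-plane velocity v_z changes its observed depth, so the depth counterpart takes the form

  Z_x u + Z_y v + Z_t = v_z,

where subscripts denote partial derivatives with respect to image coordinates and time.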

We use a hardware stereo implementation which offers images of registered depth and intensity at video frame rate. This system relies on the non-parametric census stereo correspondence algorithm [12] and currently runs on a single FPGA PCI card attached to a personal computer [11]. Other real-time range sensing technology may also be used as input to our pose estimation method, as long as registered depth images are available at video rate. "RGBZ" data can directly resolve many of the usual ambiguities present in a single intensity image; in previous papers we have demonstrated the utility of this information for background segmentation [6] and face detection and tracking [5].

The remainder of this paper proceeds as follows. We first summarize previous work on parametric motion methods for pose estimation and head tracking. We then introduce our joint depth and brightness constraint, suitable for image sequences where gradients can be computed both on intensity and depth information. Next, we show how these constraints can be integrated over image regions according to a rigid-body motion model. This results in a single linear system with an efficient closed-form solution. We derive the system for both perspective and orthographic camera models. We demonstrate results for tracking objects with known motion in synthetic sequences and for tracking the pose of a user's head in real video sequences.
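To make the integration step concrete, the following NumPy sketch implements the orthographic case: every valid pixel contributes one brightness row and one depth row, each linear in the six rigid-motion parameters, and the stacked system is solved in closed form by least squares. The function name, argument layout, and the linearization V = t + w x P are our illustrative assumptions, not the paper's implementation.

import numpy as np

def estimate_rigid_motion(Ix, Iy, It, Zx, Zy, Zt, X, Y, Z, mask):
    """Closed-form rigid motion from joint brightness and depth constraints.

    A sketch under an orthographic camera model. Inputs are HxW arrays of
    spatial/temporal derivatives of intensity I and depth Z, per-pixel 3D
    coordinates (X, Y, Z), and a boolean mask of valid pixels. Returns
    theta = (tx, ty, tz, wx, wy, wz).
    """
    x, y, z = X[mask], Y[mask], Z[mask]
    ix, iy, it = Ix[mask], Iy[mask], It[mask]
    zx, zy, zt = Zx[mask], Zy[mask], Zt[mask]
    one, zero = np.ones(x.size), np.zeros(x.size)

    # Velocity of a rigid point P under (t, w) is V = t + w x P, so under
    # orthography u = tx + wy*z - wz*y, v = ty + wz*x - wx*z, and
    # vz = tz + wx*y - wy*x. Write each as coefficients over theta.
    U = np.stack([one, zero, zero, zero, z, -y], axis=1)   # u rows
    V = np.stack([zero, one, zero, -z, zero, x], axis=1)   # v rows
    W = np.stack([zero, zero, one, y, -x, zero], axis=1)   # vz rows

    # Brightness constraint: Ix*u + Iy*v + It = 0.
    A_b = ix[:, None] * U + iy[:, None] * V
    b_b = -it
    # Depth constraint: Zx*u + Zy*v + Zt = vz.
    A_d = zx[:, None] * U + zy[:, None] * V - W
    b_d = -zt

    # Stack both sets of rows and solve the single linear system.
    A = np.vstack([A_b, A_d])
    b = np.concatenate([b_b, b_d])
    theta, *_ = np.linalg.lstsq(A, b, rcond=None)
    return theta

In practice one would also weight the brightness and depth rows by their respective measurement noise before solving; this is exactly the kind of uncertainty integration that the linear formulation makes cheap.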

