However, we must also take into account low-confidence values, as well as the effect of shadows. The treatment of low-confidence values differs slightly between the range and color comparisons. At each pixel we describe conservative foreground criteria, for range and for color respectively, based on the above general case; our final segmentation is then the disjunction of the two criteria. The following sections describe the use of range, color, and their combination in more detail, and results of the combined segmentation are compared with those using range or color alone.
The presence of low-confidence range values, which we have been referring to as invalid, in either the image or the background model complicates our segmentation process. The most conservative approach would be to discount range in the segmentation decision unless the range values in frame $i$ and in the model, $r_i$ and $r_m$ respectively, are both valid. We actually allow foreground decisions to be made when $r_m$ is invalid but $r_i$ is valid and smoothly connected to regions where foreground decisions have been made in the presence of valid background data:
\[
F_{\mathrm{range}} \;=\; \mathit{valid}(r_i)\,\wedge\,\Big[\big(\mathit{valid}(r_m)\wedge|r_i-r_m|>\delta_r\big)\;\vee\;\big(\lnot\mathit{valid}(r_m)\wedge|\nabla r_i|<G\big)\Big]
\]
where $\nabla r_i$ is the local gradient of $r_i$, $\delta_r$ is the range difference threshold of the general case above, and the second disjunct is applied only at pixels connected to regions satisfying the first. Gradient values above $G$ represent discontinuities in range, so this threshold is set based on the expected smoothness of foreground objects. As shown in Figure 5, using the background model we can correctly classify the table (refer to the original scene image in Figure 1) as background even though it is at the same depth as the person. Note that Z-keying methods would fail in this case [5].
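The range criterion above can be sketched as follows; this is a minimal Python/NumPy sketch, not the authors' implementation. The threshold values, the sentinel used to mark invalid depth, and the use of `binary_propagation` to realize "smoothly connected to a confident foreground region" are all assumptions of this sketch.

```python
import numpy as np
from scipy.ndimage import binary_propagation

def range_foreground(r_i, r_m, delta_r=0.5, grad_max=0.3, invalid=0.0):
    """Conservative range-based foreground test (sketch).

    r_i, r_m: current-frame and background-model range images; pixels
    equal to `invalid` carry no depth estimate. delta_r (the minimum
    range difference) and grad_max (the gradient limit G) are
    illustrative values, not the paper's settings.
    """
    valid_i = r_i != invalid
    valid_m = r_m != invalid
    # Case 1: both values valid and the frame differs from the model.
    differs = valid_i & valid_m & (np.abs(r_i - r_m) > delta_r)
    # Case 2: model invalid, frame valid, and the surface is locally
    # smooth (gradient magnitude below G).
    gy, gx = np.gradient(r_i)
    smooth = valid_i & ~valid_m & (np.hypot(gx, gy) < grad_max)
    # Keep case-2 pixels only if they connect to a confident foreground
    # region: propagate the confident mask through the smooth pixels.
    return binary_propagation(differs, mask=differs | smooth)
```

Propagating the confident mask through the smooth, valid-depth pixels approximates the "smoothly connected" requirement: an isolated smooth patch with no valid background data next to it is not promoted to foreground.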
Shadows of foreground elements will cause appearance changes on the
background. Without special treatment, these appearance changes will
be included in the foreground segmentation, which is usually not
desirable. We attempt to minimize the impact of shadows in several
ways. First, we use a luminance-normalized color space, which reduces
the differences between a background object and itself under lighting
changes induced by shadows or interreflections. We will refer to the
distance between a pixel's value and the model in this color space as
the color distance. This color representation becomes unstable or
undefined when the luminance is close to zero, hence we treat
luminance values near zero as invalid.
Our primary criterion for foreground segmentation is that this color
distance exceed a threshold, which essentially corresponds to a hue
difference in the context of valid luminance. We augment this
comparison with a luminance-ratio criterion, and with a final
luminance comparison in the context of invalid model luminance.
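These color criteria can be sketched as follows, in Python with NumPy. The chromaticity space $(r/s, g/s)$ with $s = R+G+B$ is used here as one common luminance-normalized representation, standing in for the paper's (unspecified here) color space; the thresholds and the luminance-ratio bounds are likewise illustrative assumptions.

```python
import numpy as np

def color_foreground(rgb_i, rgb_m, delta_c=0.1, lum_min=10.0,
                     ratio_lo=0.5, ratio_hi=2.0):
    """Sketch of the color-based foreground test; all settings illustrative."""
    s_i = rgb_i.sum(axis=-1)          # luminance proxy s = R + G + B
    s_m = rgb_m.sum(axis=-1)
    valid_i = s_i > lum_min           # near-zero luminance is invalid
    valid_m = s_m > lum_min
    # Chromaticity (r/s, g/s); guard against division by zero.
    c_i = rgb_i[..., :2] / np.maximum(s_i, 1e-6)[..., None]
    c_m = rgb_m[..., :2] / np.maximum(s_m, 1e-6)[..., None]
    dist = np.linalg.norm(c_i - c_m, axis=-1)
    # Primary criterion: chromaticity ("hue") difference with valid luminance.
    hue_diff = valid_i & valid_m & (dist > delta_c)
    # Luminance-ratio criterion: large brightness changes are foreground,
    # while moderate darkening (a shadow) is tolerated.
    ratio = s_i / np.maximum(s_m, 1e-6)
    lum_change = valid_i & valid_m & ((ratio < ratio_lo) | (ratio > ratio_hi))
    # Fallback when the model luminance is invalid: valid luminance in the
    # frame alone marks the pixel as foreground.
    invalid_model = ~valid_m & valid_i
    return hue_diff | lum_change | invalid_model
```

Note that a pure intensity change within the ratio bounds (the shadow case) triggers neither criterion, which is the intended behavior.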
As we mention above, we minimize the impact of shadows by using a luminance-normalized color space. However, there remains a tradeoff in setting the color threshold: it must be tolerant of the remaining artifacts of strong shadows while maintaining the integrity of true foreground regions. We alleviate this tradeoff by using depth information to dynamically adjust our color matching criterion: we increase the color threshold wherever the depth data indicates that a pixel belongs to the background. This allows us to be more lenient in our color matching within regions which appear to be at background depth, and thus to do a better job of ignoring shadows in these regions, while not compromising the restrictiveness of our color matching within regions in which depth is uncertain. (Where depth indicates that a pixel is in the foreground, color matching is unimportant, since the depth information alone is sufficient for correct segmentation.)
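The depth-adaptive threshold amounts to a per-pixel threshold map; a minimal sketch, in which the base threshold and the relaxation factor are illustrative assumptions:

```python
import numpy as np

def adaptive_color_threshold(color_dist, depth_matches_bg,
                             delta_c=0.1, relax=3.0):
    """Raise the color threshold where depth says 'background' (sketch).

    color_dist: per-pixel distance in the normalized color space.
    depth_matches_bg: boolean mask where range agrees with the model.
    delta_c and the relax factor are illustrative values.
    """
    # Lenient threshold at background depth, strict elsewhere.
    thresh = np.where(depth_matches_bg, relax * delta_c, delta_c)
    return color_dist > thresh
```

A shadow pixel typically has a moderate color distance but matches the background in depth, so the raised threshold suppresses it; where depth is uncertain, the strict threshold keeps the color test intact.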
Figure 4 shows a case where a person casts a strong shadow on the wall. The middle left image shows the combined range and color-based segmentation when the color threshold is not adapted according to depth information. In this case, the shadow on the wall is sufficiently dark that it exceeds the color threshold setting, and causes the shadow to be labeled as foreground even though depth information indicates that it is background. If this color threshold is simply increased in order to remove the shadow (middle right image), valid parts of the foreground are eroded. The bottom image shows the combined range and color-based segmentation when the original color threshold is adaptively raised wherever the depth matches the background. The shadow is largely eliminated, while the remainder of the foreground is not impacted.
We take the disjunction of the previous results to produce our final segmentation criterion: a pixel identified as foreground based on either depth or color is taken to be foreground in the combined segmentation.
This result will often contain small isolated foreground points caused by noise in color or range, and there may also be some small holes remaining in the foreground. We fill the foreground holes using a morphological closing with a small structuring element, and then take connected components over a certain minimum area as the final foreground segmentation result. The minimum-area criterion can be set conservatively, to eliminate only noise-related foreground elements, or at higher values based on the expected absolute size of ``interesting'' foreground elements, e.g. to select people and not pets.
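The disjunction and cleanup steps can be sketched as follows; the 3x3 structuring element and the min_area value are illustrative, and SciPy's morphology routines stand in for whatever implementation the authors used.

```python
import numpy as np
from scipy.ndimage import binary_closing, label

def clean_segmentation(fg_range, fg_color, min_area=10):
    """Combine the two foreground masks and clean the result (sketch)."""
    fg = fg_range | fg_color                            # disjunction of criteria
    fg = binary_closing(fg, structure=np.ones((3, 3)))  # fill small holes
    labels, n = label(fg)                               # connected components
    keep = np.zeros_like(fg)
    for k in range(1, n + 1):
        comp = labels == k
        if comp.sum() >= min_area:                      # drop noise specks
            keep |= comp
    return keep
```

With a small min_area this removes only noise; raising it selects components by absolute size, e.g. keeping person-sized regions while dropping smaller ones.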
The most compelling demonstration of this segmentation algorithm is to compare the segmentation results based on color or range alone with those achieved by the combined process. In particular, we use the examples presented in our introduction in Figures 1 and 2. Comparisons are presented in Figures 5 and 6 respectively.
We see that in both cases the combined method produces a more complete foreground segmentation: the holes present in the range-based results are filled based on the color comparison, and the holes present in the color-based results are filled based on the range comparison. With the joint segmentation approach, the only remaining problem areas are large regions with no valid range and colors similar to the background.
It is relevant to note that our use of range data does tend to produce a ``halo'' around foreground objects that is not present in the color-only segmentation. Disparity maps produced by the census algorithm often include this halo effect, in which pixels just outside the perimeter of a foreground object are labeled as being at the depth of that object. This error results from the fact that correlation-based stereo algorithms use windows much larger than a single pixel to determine correspondence, which works well when the disparity over the entire window is constant. At depth discontinuities, however, the correlation window includes pixels with quite distinct disparities. Such depth discontinuities are often correlated with marked intensity change, and this intensity change is often the most significant feature in a correlation window. For a point just outside the perimeter of the foreground object, windows centered at the point in both views will share the significant intensity change, and hence the point will be labeled as being at the depth of the foreground object. Although not presented here, we are also investigating the use of color discontinuities to correct for the halo effect in range, which slightly corrupts the silhouette boundaries in these results.
G. Gordon, T. Darrell, M. Harville, and J. Woodfill, "Background estimation and removal based on range and color," Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (Fort Collins, CO), June 1999.