However, we must also take into account low-confidence values, as well as the effect of shadows. The treatment of low-confidence values differs slightly between the range and color comparisons. At each pixel we describe conservative foreground criteria, for range and for color respectively, based on the above general case; our final segmentation is then the disjunction of the two criteria. The following sections describe the use of range, color, and their combination in more detail, and results of the combined segmentation are compared with those using range or color alone.
The presence of low-confidence range values, which we have been referring to as invalid, in either the image or the background model complicates our segmentation process. The most conservative approach would be to discount range in the segmentation decision unless the range values in frame $i$ and in the model, $r_i$ and $r_m$ respectively, are both valid. We actually allow foreground decisions to be made when $r_m$ is invalid but $r_i$ is valid and smoothly connected to regions where foreground decisions have been made in the presence of valid background data:
\[
F_{\mathrm{range}} \;=\; \mathit{valid}(r_i)\,\wedge\,\Big[\big(\mathit{valid}(r_m)\wedge|r_i-r_m|>\delta_r\big)\;\vee\;\big(\lnot\mathit{valid}(r_m)\wedge|\nabla r_i|<G\big)\Big]
\]
where $\nabla r_i$ is the local gradient of $r_i$, $\delta_r$ is the range difference threshold of the general case above, and the second disjunct is applied only at pixels connected to regions satisfying the first. Gradient values above $G$ represent discontinuities in range, so this threshold is set based on the expected smoothness of foreground objects. As shown in Figure 5, using the background model we can correctly classify the table (refer to the original scene image in Figure 1) as background even though it is at the same depth as the person. Note that Z-keying methods would fail in this case [5].
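The range criterion above can be sketched as follows; this is a minimal Python/NumPy sketch, not the authors' implementation. The threshold values, the sentinel used to mark invalid depth, and the use of `binary_propagation` to realize "smoothly connected to a confident foreground region" are all assumptions of this sketch.

```python
import numpy as np
from scipy.ndimage import binary_propagation

def range_foreground(r_i, r_m, delta_r=0.5, grad_max=0.3, invalid=0.0):
    """Conservative range-based foreground test (sketch).

    r_i, r_m: current-frame and background-model range images; pixels
    equal to `invalid` carry no depth estimate. delta_r (the minimum
    range difference) and grad_max (the gradient limit G) are
    illustrative values, not the paper's settings.
    """
    valid_i = r_i != invalid
    valid_m = r_m != invalid
    # Case 1: both values valid and the frame differs from the model.
    differs = valid_i & valid_m & (np.abs(r_i - r_m) > delta_r)
    # Case 2: model invalid, frame valid, and the surface is locally
    # smooth (gradient magnitude below G).
    gy, gx = np.gradient(r_i)
    smooth = valid_i & ~valid_m & (np.hypot(gx, gy) < grad_max)
    # Keep case-2 pixels only if they connect to a confident foreground
    # region: propagate the confident mask through the smooth pixels.
    return binary_propagation(differs, mask=differs | smooth)
```

Propagating the confident mask through the smooth, valid-depth pixels approximates the "smoothly connected" requirement: an isolated smooth patch with no valid background data next to it is not promoted to foreground.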
Shadows of foreground elements will cause appearance changes on the
background. Without special treatment, these appearance changes will
be included in the foreground segmentation, which is usually not
desirable. We attempt to minimize the impact of shadows in several
ways. First, we use a luminance-normalized color space, which reduces
the differences between a background object and itself under lighting
changes induced by shadows or interreflections. We will refer to the
distance between a pixel's value and the model in this color space as
the color distance. This color representation becomes unstable or
undefined when the luminance is close to zero, hence we treat
luminance values near zero as invalid.
Our primary criterion for foreground segmentation is that this color
distance exceed a threshold, which essentially corresponds to a hue
difference in the context of valid luminance. We augment this
comparison with a luminance-ratio criterion, and with a final
luminance comparison in the context of invalid model luminance.
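These color criteria can be sketched as follows, in Python with NumPy. The chromaticity space $(r/s, g/s)$ with $s = R+G+B$ is used here as one common luminance-normalized representation, standing in for the paper's (unspecified here) color space; the thresholds and the luminance-ratio bounds are likewise illustrative assumptions.

```python
import numpy as np

def color_foreground(rgb_i, rgb_m, delta_c=0.1, lum_min=10.0,
                     ratio_lo=0.5, ratio_hi=2.0):
    """Sketch of the color-based foreground test; all settings illustrative."""
    s_i = rgb_i.sum(axis=-1)          # luminance proxy s = R + G + B
    s_m = rgb_m.sum(axis=-1)
    valid_i = s_i > lum_min           # near-zero luminance is invalid
    valid_m = s_m > lum_min
    # Chromaticity (r/s, g/s); guard against division by zero.
    c_i = rgb_i[..., :2] / np.maximum(s_i, 1e-6)[..., None]
    c_m = rgb_m[..., :2] / np.maximum(s_m, 1e-6)[..., None]
    dist = np.linalg.norm(c_i - c_m, axis=-1)
    # Primary criterion: chromaticity ("hue") difference with valid luminance.
    hue_diff = valid_i & valid_m & (dist > delta_c)
    # Luminance-ratio criterion: large brightness changes are foreground,
    # while moderate darkening (a shadow) is tolerated.
    ratio = s_i / np.maximum(s_m, 1e-6)
    lum_change = valid_i & valid_m & ((ratio < ratio_lo) | (ratio > ratio_hi))
    # Fallback when the model luminance is invalid: valid luminance in the
    # frame alone marks the pixel as foreground.
    invalid_model = ~valid_m & valid_i
    return hue_diff | lum_change | invalid_model
```

Note that a pure intensity change within the ratio bounds (the shadow case) triggers neither criterion, which is the intended behavior.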
As we mention above, we minimize the impact of shadows by using a luminance-normalized color space. However, there remains a tradeoff in setting the color threshold: it must be tolerant of the remaining artifacts of strong shadows while maintaining the integrity of true foreground regions. We alleviate this tradeoff by using depth information to dynamically adjust our color matching criterion: we increase the color threshold wherever the depth data indicates that a pixel belongs to the background. This allows us to be more lenient in our color matching within regions which appear to be at background depth, and thus to do a better job of ignoring shadows in these regions, while not compromising the restrictiveness of our color matching within regions in which depth is uncertain. (Where depth indicates that a pixel is in the foreground, color matching is unimportant, since the depth information alone is sufficient for correct segmentation.)
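The depth-adaptive threshold amounts to a per-pixel threshold map; a minimal sketch, in which the base threshold and the relaxation factor are illustrative assumptions:

```python
import numpy as np

def adaptive_color_threshold(color_dist, depth_matches_bg,
                             delta_c=0.1, relax=3.0):
    """Raise the color threshold where depth says 'background' (sketch).

    color_dist: per-pixel distance in the normalized color space.
    depth_matches_bg: boolean mask where range agrees with the model.
    delta_c and the relax factor are illustrative values.
    """
    # Lenient threshold at background depth, strict elsewhere.
    thresh = np.where(depth_matches_bg, relax * delta_c, delta_c)
    return color_dist > thresh
```

A shadow pixel typically has a moderate color distance but matches the background in depth, so the raised threshold suppresses it; where depth is uncertain, the strict threshold keeps the color test intact.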
Figure 4 shows a case where a person casts a strong shadow on the wall. The middle left image shows the combined range and color-based segmentation when the color threshold is not adapted according to depth information. In this case, the shadow on the wall is sufficiently dark that it exceeds the color threshold setting, and causes the shadow to be labeled as foreground even though depth information indicates that it is background. If this color threshold is simply increased in order to remove the shadow (middle right image), valid parts of the foreground are eroded. The bottom image shows the combined range and color-based segmentation when the original color threshold is adaptively raised wherever the depth matches the background. The shadow is largely eliminated, while the remainder of the foreground is not impacted.
We take the disjunction of the previous results to produce our final segmentation criterion: a pixel identified as foreground based on either depth or color is taken to be foreground in the combined segmentation.
This result will often contain small isolated foreground points caused by noise in color or range, and there may also be some small holes remaining in the foreground. We fill the foreground holes using a morphological closing with a small structuring element, and then take connected components over a certain minimum area as the final foreground segmentation result. The minimum-area criterion can be set conservatively, to eliminate only noise-related foreground elements, or at higher values based on the expected absolute size of ``interesting'' foreground elements, e.g. to select people and not pets.
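The disjunction and cleanup steps can be sketched as follows; the 3x3 structuring element and the min_area value are illustrative, and SciPy's morphology routines stand in for whatever implementation the authors used.

```python
import numpy as np
from scipy.ndimage import binary_closing, label

def clean_segmentation(fg_range, fg_color, min_area=10):
    """Combine the two foreground masks and clean the result (sketch)."""
    fg = fg_range | fg_color                            # disjunction of criteria
    fg = binary_closing(fg, structure=np.ones((3, 3)))  # fill small holes
    labels, n = label(fg)                               # connected components
    keep = np.zeros_like(fg)
    for k in range(1, n + 1):
        comp = labels == k
        if comp.sum() >= min_area:                      # drop noise specks
            keep |= comp
    return keep
```

With a small min_area this removes only noise; raising it selects components by absolute size, e.g. keeping person-sized regions while dropping smaller ones.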
The most compelling demonstration of this segmentation algorithm is to compare the segmentation results based on color or range alone with those achieved by the combined process. In particular, we use the examples presented in our introduction in Figures 1 and 2. Comparisons are presented in Figures 5 and 6 respectively.
We see that in both cases the combined method produces a more complete foreground segmentation: the holes present in the range-based results are filled based on the color comparison, and the holes present in the color-based results are filled based on the range comparison. With the joint segmentation approach, the only remaining problem areas are large regions with no valid range and colors similar to the background.
It is relevant to note that our use of range data does tend to produce a ``halo'' around foreground objects that is not present in the color-only segmentation. Disparity maps produced by the census algorithm often include this halo effect, in which pixels just outside the perimeter of a foreground object are labeled as being at the depth of that object. This error results from the fact that correlation-based stereo algorithms use windows much larger than a single pixel to determine correspondence, which works well when the disparity over the entire window is constant. At depth discontinuities, however, the correlation window includes pixels with quite distinct disparities. Such depth discontinuities are often correlated with marked intensity change, and this intensity change is often the most significant feature in a correlation window. For a point just outside the perimeter of the foreground object, windows centered at the point in both views will share the significant intensity change, and hence the point will be labeled as being at the depth of the foreground object. Although not presented here, we are also investigating the use of color discontinuities to correct for the halo effect in range, which slightly corrupts the silhouette boundaries in these results.
G. Gordon, T. Darrell, M. Harville, and J. Woodfill, "Background estimation and removal based on range and color," Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (Fort Collins, CO), June 1999.