We classify approaches for human body segmentation into the following categories.
The first includes interactive methods that expect user input in order to
discriminate the foreground and background. Interactive segmentation methods
are useful for generic applications, and have the potential to produce very
accurate results in complex cases. However, since they rely on low-level cues
and do not employ object-specific knowledge, they often require user input to
guide their process, and are inappropriate for many real-world problems, where
automation is necessary. In general, this category differs from the other two,
which are automatic and often task-specific.

The second category includes top-down approaches, which are based upon a priori
knowledge, and use the image content to further refine an initial model.
Top-down approaches have been proposed as solutions to the problem of
segmenting human bodies from static images. The main characteristic of these
approaches is that they require high-level knowledge about the foreground,
which in the case of humans is their pose. One method for object recognition
and pose estimation is the pictorial structures (PS) model and its variations.
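For reference, the PS model is commonly written as an energy minimization over
part configurations. The form below is the standard one from the general
literature, not a formula taken from this text, and every symbol in it is
notation assumed here for illustration.

```latex
% Assumed notation: L = (l_1, ..., l_n) are the locations of the n body parts,
% m_i is the appearance mismatch cost of part i at location l_i,
% d_{ij} is the deformation cost between connected parts i and j,
% and E is the edge set of the part graph.
L^{*} = \arg\min_{L} \left( \sum_{i=1}^{n} m_i(l_i)
        + \sum_{(v_i, v_j) \in E} d_{ij}(l_i, l_j) \right)
```

The unary terms tie each part to the image evidence, while the pairwise terms
keep connected parts in plausible relative positions, which is what lets the
model accommodate different poses.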
In general, human body segmentation approaches based on PS models can deal with
various poses, but they rely on high-level models that might fail in complex
scenarios, restricting the success of the end results. Moreover, high-level
inference is time consuming, and these methods usually inherit that cost.

Bottom-up approaches use low-level elements, such as pixels or superpixels, and try to
group them into semantic entities of higher levels. One such method employs
multiple levels of a segmentation hierarchy, and its algorithm involves a
part-searching step over all produced segments, which is computationally
expensive. In another, foreground regions are sampled using small masks, which
is not sufficient to
model the clothing in complex scenarios (nonuniform clothing, cluttered
background, different poses). In this methodology, we combine cues from
multiple levels of segmentation. In a further approach, the human body is assumed to be inside a
large mask, but due to the variability of human poses, this assumption often
fails, and the sampling may lead to unrecoverable errors. In this study, we
propose a more refined searching process for the torso and legs, in which we
look for arbitrary salient regions within the image areas that correspond to
them. Salient regions consist of segments that appear strongly inside the
hypothesized foreground and weakly in the background. By considering the
foreground and background conjunctively, we alleviate the need for exact mask
fitting and dense searching, and we allow the masks to be large according to
anthropometric constraints so that they may perform sufficient sampling in
fewer steps. Pose estimation can be considered a higher-level problem than
body segmentation, and many authors prefer a bottom-up approach to
facilitate body part estimation and pose recognition.
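As a rough illustration of the salient-region idea described above (segments
that appear strongly inside the hypothesized foreground and weakly in the
background), the sketch below scores each segment of a label map against a
hypothesized mask. The scoring rule (foreground coverage minus background
coverage), the function names `segment_saliency` and `salient_segments`, and
the `threshold` parameter are all assumptions made for this example; they are
not the formulation used in this study.

```python
from collections import Counter


def segment_saliency(labels, mask):
    """Score each segment against a hypothesized foreground mask.

    labels: 2-D list of segment ids (e.g. superpixel labels), one per pixel.
    mask:   2-D list of booleans, True inside the hypothesized foreground.

    Hypothetical score: the share of the mask covered by the segment minus
    the share of the remaining image covered by it, so values lie in [-1, 1].
    """
    fg = Counter()  # pixels of each segment inside the mask
    bg = Counter()  # pixels of each segment outside the mask
    for label_row, mask_row in zip(labels, mask):
        for seg, inside in zip(label_row, mask_row):
            if inside:
                fg[seg] += 1
            else:
                bg[seg] += 1

    n_fg = sum(fg.values())
    n_bg = sum(bg.values())
    if n_fg == 0 or n_bg == 0:
        raise ValueError("mask must contain both foreground and background")

    # Counter returns 0 for segments absent from one of the two sides.
    return {seg: fg[seg] / n_fg - bg[seg] / n_bg for seg in set(fg) | set(bg)}


def salient_segments(labels, mask, threshold=0.0):
    """Return the ids of segments whose score exceeds the assumed threshold."""
    scores = segment_saliency(labels, mask)
    return sorted(seg for seg, score in scores.items() if score > threshold)


# Toy example: segment 2 fills the central mask, segments 0 and 1 surround it.
toy_labels = [
    [0, 0, 1, 1],
    [0, 2, 2, 1],
    [0, 2, 2, 1],
    [0, 0, 1, 1],
]
toy_mask = [
    [False, False, False, False],
    [False, True, True, False],
    [False, True, True, False],
    [False, False, False, False],
]
toy_scores = segment_saliency(toy_labels, toy_mask)
toy_salient = salient_segments(toy_labels, toy_mask)
```

Because the score compares the inside and the outside of the mask together, a
segment that merely overlaps a loosely placed mask is penalized for its
presence elsewhere, which is the intuition behind not requiring an exact mask
fit.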

Other methodologies leverage the advantages of both bottom-up and top-down
approaches. Perceptual groupings from a bottom-up approach in many cases provide
a good foundation to cope with the high number of poses. Shape templates of body
parts applied on image segments are one way of solving the segmentation puzzle.
To alleviate the need for numerous shape templates, some authors decompose the
problem into upper and lower body estimation, similarly to our approach, but
employ fitting of an accurate torso model. Elsewhere, two deformable models are
designed to segment the human body on two-scale superpixels, and a
coarse-to-fine strategy has also been used. Li
et al. extend these concepts by combining a kinematic model and a data-driven
graph cut. Tauscher and Collins present a Bayesian framework for jointly
estimating articulated body pose and the pixel-level segmentation of each body
part. The results of the methodology are very promising, but to
tackle the complexity of the problem, even for standing positions, the proposed
model needs to be optimized over a high number of parameters.