Figure 3.3: First-order Hidden Markov Model
§3.3 Particle Filter on Visual Tracking 47
a sequence of states at successive times, with each state associated with observations or measurements. In reality, the true states are usually unknown and hidden; only the observations are accessible. Inferring the true states and analysing the behaviour of the system, given the available observations, is therefore a substantial challenge. The first-order Hidden Markov Model is often used to model these dynamic and sequential properties of systems. It can be represented in the graphical form illustrated in Figure 3.3. Given a set of sequential states $\{x_t; t \in \mathbb{N}\}$ and its associated observations $\{y_t; t \in \mathbb{N}\}$ (conforming in the sense of temporal propagation), the causal relationships between the state $x_t$ at time $t$, its predecessors $x_{1:t-1}$ (where $x_{1:t-1}$ denotes the set $\{x_i; i = 1, \dots, t-1\}$), and the observations $y_{1:t}$ are characterised by conditional probabilities that account for the uncertainty of the dependencies. Essentially, the first-order Hidden Markov Model assumes that the true state is conditionally independent of all states prior to the immediately previous state, and that the observation depends solely upon the associated true state and is conditionally independent of all other states. These assumptions are formulated mathematically as:
\[ p(x_t \mid x_{1:t-1}) = p(x_t \mid x_{t-1}) \tag{3.3.1} \]
\[ p(y_t \mid x_{1:t}) = p(y_t \mid x_t) \tag{3.3.2} \]
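As an illustration of what the two assumptions mean generatively, the following minimal sketch simulates a one-dimensional first-order Hidden Markov Model. The Gaussian random-walk dynamics, the Gaussian observation noise, the initial state of zero, and the names `simulate_hmm`, `process_std` and `obs_std` are all assumptions made for this example and are not taken from the text.

```python
import random


def simulate_hmm(num_steps, process_std=1.0, obs_std=0.5, seed=0):
    """Simulate a 1D first-order HMM (hypothetical Gaussian random-walk model)."""
    rng = random.Random(seed)
    states, observations = [], []
    x = 0.0  # assumed initial state
    for _ in range(num_steps):
        # Assumption (3.3.1): x_t is generated from x_{t-1} only.
        x = rng.gauss(x, process_std)
        # Assumption (3.3.2): y_t is generated from x_t only.
        y = rng.gauss(x, obs_std)
        states.append(x)
        observations.append(y)
    return states, observations


states, observations = simulate_hmm(5)
```

In a tracking setting only `observations` would be available; `states` plays the role of the hidden sequence to be inferred.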
With the Hidden Markov Model, the posterior probability $p(x_t \mid y_{1:t})$, which is the quantity of interest in the Bayesian framework, can be reformulated in the context of the state space model. The derivation starts from the joint probability:
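\begin{align*}
p(x_t, y_{1:t}) &= p(x_t, y_{1:t}) \\
p(x_t \mid y_{1:t})\, p(y_{1:t}) &= p(y_t \mid y_{1:t-1}, x_t)\, p(y_{1:t-1}, x_t) \\
p(x_t \mid y_{1:t}) &= \frac{p(y_t \mid y_{1:t-1}, x_t)\, p(x_t \mid y_{1:t-1})\, p(y_{1:t-1})}{p(y_{1:t})} \\
p(x_t \mid y_{1:t}) &= \frac{p(y_t \mid y_{1:t-1}, x_t)\, p(x_t \mid y_{1:t-1})}{p(y_t \mid y_{1:t-1})}
\end{align*}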
With the Markov assumptions (3.3.1) and (3.3.2), the term $p(y_t \mid y_{1:t-1}, x_t)$ reduces to $p(y_t \mid x_t)$, and the equation can be rewritten:
\[ p(x_t \mid y_{1:t}) = \frac{p(y_t \mid x_t)\, p(x_t \mid y_{1:t-1})}{p(y_t \mid y_{1:t-1})} \tag{3.3.3} \]
By the Chapman-Kolmogorov equation:
\[ p(x_t \mid y_{1:t-1}) = \int p(x_t \mid x_{t-1}, y_{1:t-1})\, p(x_{t-1} \mid y_{1:t-1})\, dx_{t-1} \]
Again using Markov assumptions:
\[ p(x_t \mid y_{1:t-1}) = \int p(x_t \mid x_{t-1})\, p(x_{t-1} \mid y_{1:t-1})\, dx_{t-1} \tag{3.3.4} \]
Combining 3.3.3 and 3.3.4, the recursive formula can be written as:
\[ p(x_t \mid y_{1:t}) = \frac{p(y_t \mid x_t) \int p(x_t \mid x_{t-1})\, p(x_{t-1} \mid y_{1:t-1})\, dx_{t-1}}{p(y_t \mid y_{1:t-1})} \]
And by removing the normalising constant $p(y_t \mid y_{1:t-1})$:
\[ p(x_t \mid y_{1:t}) \propto p(y_t \mid x_t) \int p(x_t \mid x_{t-1})\, p(x_{t-1} \mid y_{1:t-1})\, dx_{t-1} \tag{3.3.5} \]
This is called the recursive Bayesian filter formula. It states that the posterior is proportional to the likelihood $p(y_t \mid x_t)$ multiplied by the expectation of the temporal dynamics (the transition prior $p(x_t \mid x_{t-1})$) taken with respect to the previous posterior $p(x_{t-1} \mid y_{1:t-1})$.
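When the state space is finite, the integral in the recursion becomes a sum and the filter can be evaluated exactly. The sketch below shows one such step under that assumption; the function name `bayes_filter_step`, the two-state transition table and the likelihood values are hypothetical and chosen only for illustration.

```python
def bayes_filter_step(prior, transition, likelihood):
    """One step of the recursive Bayesian filter on a finite state space.

    prior[j]         : p(x_{t-1} = j | y_{1:t-1})
    transition[j][k] : p(x_t = k | x_{t-1} = j)
    likelihood[k]    : p(y_t | x_t = k) for the observed y_t
    """
    n = len(prior)
    # Prediction (3.3.4): the integral over x_{t-1} becomes a sum.
    predicted = [sum(transition[j][k] * prior[j] for j in range(n))
                 for k in range(n)]
    # Update (3.3.3): weight the prediction by the likelihood.
    unnormalised = [likelihood[k] * predicted[k] for k in range(n)]
    # The normalising constant is p(y_t | y_{1:t-1}).
    evidence = sum(unnormalised)
    return [u / evidence for u in unnormalised]


# Hypothetical two-state example.
prior = [0.5, 0.5]
transition = [[0.9, 0.1],
              [0.2, 0.8]]
likelihood = [0.7, 0.1]
posterior = bayes_filter_step(prior, transition, likelihood)
```

Feeding `posterior` back in as the next `prior` gives the recursion over time. For continuous, high-dimensional states this exact evaluation is generally unavailable, which is what motivates sampling approximations.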
As the integral in the recursive Bayesian filter equation does not always have a closed form solution, researchers have investigated various sampling techniques to approximate the posterior calculation. The optimum way to explore the search space is to allocate evaluations to samples in proportion to their observation likelihood relative to the rest of the population. In this way, good candidates receive an exponentially increasing number of evaluations in successive resampling steps. Sequential Importance
Resampling is one of the Importance Sampling (A.2) techniques, which construct a set of samples with associated weights (called importance weights) to approximate a probability distribution. Over time, it automatically updates and adjusts the samples and weights according to the currently available observations, so that the posterior distribution at any time can be approximated by the samples and their weights.
Overall, a generic Particle Filter algorithm is summarised in Algorithm 4 and visually depicted in Figure 3.4.
Algorithm 4 A generic Particle Filter algorithm
1: for $i = 1$ to $N$ do
     draw $x_t^i$ from $\pi(x_t^i \mid x_{t-1}^i, y_t^i)$
   end for
2: for $i = 1$ to $N$ do
     $w_t^i = w_{t-1}^i \, \dfrac{p(y_t^i \mid x_t^i)\, p(x_t^i \mid x_{t-1}^i)}{\pi(x_t^i \mid x_{t-1}^i, y_t^i)}$
   end for
3: for $i = 1$ to $N$ do
     compute the normalised importance weights $w_t^i = w_t^i / \sum_{j=1}^{N} w_t^j$
   end for
4: Compute an estimate of the effective number of samples as $\widehat{N}_{eff} = 1 / \sum_{i=1}^{N} (w_t^i)^2$
5: if $\widehat{N}_{eff} < N_{thr}$ then
     Draw $N$ samples from the current sample set with probabilities proportional to their weights. Replace the current sample set with the new one.
     for $i = 1$ to $N$ do
       set $w_t^i = 1/N$
     end for
   end if
6: Use the temporal model to predict the next state $x_{t+1}$ with Gaussian noise to simulate uncertainty
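The steps of Algorithm 4 can be sketched in code for a simple one-dimensional case. This is a minimal illustration under stated assumptions, not the tracker described in this work: the importance distribution is taken to be the transition prior (the bootstrap choice), so the weight update reduces to multiplication by the likelihood; the dynamics are a Gaussian random walk; the observation model is Gaussian; and the default threshold of N/2 is arbitrary. All function and parameter names are hypothetical.

```python
import math
import random


def gaussian_pdf(value, mean, std):
    """Density of a normal distribution N(mean, std^2) at value."""
    z = (value - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def particle_filter_step(particles, weights, observation, rng,
                         process_std=1.0, obs_std=0.5, n_thr=None):
    """One iteration of a bootstrap particle filter (1D, hypothetical model)."""
    n = len(particles)
    if n_thr is None:
        n_thr = n / 2.0  # assumed resampling threshold
    # Step 1: draw from the importance distribution, here the transition
    # prior p(x_t | x_{t-1}). This also plays the role of step 6 of the
    # previous iteration (prediction with Gaussian noise).
    particles = [rng.gauss(x, process_std) for x in particles]
    # Step 2: with pi equal to the transition prior, the weight ratio
    # reduces to the likelihood p(y_t | x_t).
    weights = [w * gaussian_pdf(observation, x, obs_std)
               for w, x in zip(weights, particles)]
    # Step 3: normalise (fall back to uniform if every weight underflowed).
    total = sum(weights)
    if total == 0.0:
        weights = [1.0 / n] * n
    else:
        weights = [w / total for w in weights]
    # Step 4: estimate the effective number of samples.
    n_eff = 1.0 / sum(w * w for w in weights)
    # Step 5: resample in proportion to the weights when degenerate.
    if n_eff < n_thr:
        particles = rng.choices(particles, weights=weights, k=n)
        weights = [1.0 / n] * n
    return particles, weights, n_eff


rng = random.Random(0)
num_particles = 500
particles = [rng.gauss(0.0, 1.0) for _ in range(num_particles)]
weights = [1.0 / num_particles] * num_particles
for y in [0.2, 0.5, 0.9, 1.4]:  # made-up observations
    particles, weights, n_eff = particle_filter_step(particles, weights, y, rng)
# Posterior mean estimate from the weighted sample set.
estimate = sum(w * x for w, x in zip(weights, particles))
```

Multinomial resampling via `random.choices` is used here for brevity; other resampling schemes would fit the same step.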
It is worth noting that step five of Algorithm 4 above is a resampling step. Its purpose is to deal with the degeneracy problem of the common importance sampling algorithm: after a few iterations, all but one sample will have negligible normalised weight. Such samples contribute almost nothing to the estimate of the posterior $p(x_t \mid y_{1:t})$ yet still consume computation, which reduces the effectiveness of the algorithm. It is therefore sensible to eliminate samples when their weights fall below a certain threshold. A measure of the effectiveness of the estimate, called the number of effective samples $N_{eff}$, was proposed in [Liu, Jun S. and Chen, Rong 1995; Liu and Chen 1998]. It was defined as:
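\[ N_{eff} = \frac{N}{1 + \mathrm{Var}(w_t^{*i})} \]
where $w_t^{*i} = p(x_t^i \mid y_{1:t}) / \pi(x_t^i \mid x_{t-1}^i, y_t)$. Although $N_{eff}$ cannot be evaluated exactly, an estimate $\widehat{N}_{eff}$ can be approximated by:
\[ \widehat{N}_{eff} = \frac{1}{\sum_{i=1}^{N} (w_t^i)^2} \tag{3.3.6} \]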
The number of effective samples indicates how uniformly the probability mass is spread amongst the samples. For instance, an even distribution of probability mass gives a large value of $N_{eff}$, whereas a peaked distribution of probability mass gives a small value of $N_{eff}$, which indicates severe degeneracy. At the fifth step of Algorithm 4 above, when the estimate $\widehat{N}_{eff}$ is smaller than the threshold $N_{thr}$, the resampling procedure is performed in order to eliminate samples that have small weights and therefore concentrate on samples with large weights. Alternatively, selecting a better importance distribution $\pi(x_t^i \mid x_{t-1}^i, y_t^i)$ can also reduce the variance of $w_t^{*i}$, resulting in a larger $N_{eff}$. Theoretically, when $\pi(x_t^i \mid x_{t-1}^i, y_t^i) = p(x_t^i \mid x_{t-1}^i, y_t^i)$, the importance distribution is optimal, with minimum variance. However, this optimal distribution is generally difficult to obtain in practice.
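A small numerical check of estimate (3.3.6) illustrates the contrast between even and peaked weight distributions. The function name `effective_sample_size` and the weight vectors are hypothetical; the function normalises its input first so that unnormalised weights can also be passed.

```python
def effective_sample_size(weights):
    """Estimate of the number of effective samples, as in (3.3.6)."""
    total = sum(weights)
    normalised = [w / total for w in weights]
    return 1.0 / sum(w * w for w in normalised)


uniform = [0.25, 0.25, 0.25, 0.25]   # mass spread evenly over 4 samples
peaked = [0.97, 0.01, 0.01, 0.01]    # almost all mass on one sample

n_eff_uniform = effective_sample_size(uniform)  # equals N = 4
n_eff_peaked = effective_sample_size(peaked)    # close to 1 (about 1.06)
```

With a threshold such as $N_{thr} = N/2$ (an assumed value), only the peaked case would trigger resampling.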
Chapter 4
Subject Specific Body Shape Modelling and Automatic Initialisation
We start with a fundamental question: how much reliable information about the target is available at the upper bounds of tracking accuracy, and is it possible to track without knowing anything about the target? To some extent, the information acquired offline is a fundamental prerequisite for guaranteeing successful tracking. Considering tracking as an inference process, the goal is to infer the current state of a subject from the available information, which is often hidden in both observations and prior knowledge. How much effort should be put into extracting information from observations and from prior knowledge depends on the situation. Sometimes it is easier to extract the current state directly from observations, even without prior knowledge. On the other hand, if observations are entangled, or the information contained in prior knowledge is easier to acquire, it is wise to put more effort into prior knowledge. We believe human tracking belongs to the latter case, in which the relationship between the current posture and image observations is a complicated function in a high dimensional space. Thus, the incorporation of more prior knowledge, such as building a subject specific appearance model for the tracking subject, is a sensible way to attack the problem.
The appearance model can be learned online and dynamically updated throughout the tracking process. This is an ideal scenario for many tracking problems. However, it imposes a much greater computational burden online, and tracking errors easily accumulate in the appearance model, compromising tracking accuracy. On the other hand, subject specific modelling is a one-time effort that pre-builds a close-resemblance model of the tracking subject (possibly in a controlled environment). It offers a level of precision and detail that is not achievable by the online learning method. One-time modelling shifts the expensive online computations to the initialisation stage and provides sound prior knowledge, so that tracking becomes more robust and more accurate. This chapter begins with a review of related work in human body modelling. It then describes a general articulated human body skeleton and its 3D joint rotation parameterisation in the form of Euler angles, the axis angle and the exponential map, as well as how to avoid rotation parameterisation singularities in optimisation. Two body shape parameterisations, needle based and data-driven, are described for generating the subject specific model. Finally, an automatic initialisation is proposed that simultaneously estimates the body shape and initial posture through a hierarchical optimisation method.
4.1 Related Works
Tremendous efforts have been made to study how to model the human body so that its digital version can be processed and utilised in different scenarios. In the computer graphics community, for example, researchers are concerned with the automatic generation and animation of realistic human models. The applications of a system that simultaneously models pose and body shape include crowd generation for movie or game projects, creation of custom avatars for visual assistants, and usability testing of virtual prototypes. The studies in this category [Allen et al. 2003; Magnenat-Thalmann and Seo 2004; Anguelov et al. 2005a; Allen et al. 2006; Hasler et al. 2009] often use a general data driven approach to build a parameterised template model. In this approach, hundreds of high resolution 3D human body scans are fitted with a template model mesh to
register correspondences. Subsequently, the parameterisation is derived for non-rigid surface deformation as a function of pose, body shape, or some combination of the two, and solved by a nonlinear optimiser under correspondence constraints. The established parameterisation allows the template model to simulate a great variety of distinctive individual shapes and poses in realistic detail.
To model body shape variation across different people, Allen et al. [Allen et al. 2003] morph a generic template shape into 250 3D scans of different humans in the same pose. The variability of human shape is captured by performing principal component analysis (PCA) over displacements of the template points. The model is used for hole-filling of scans and for fitting a set of sparse markers for people captured in standard poses. Magnenat-Thalmann and Seo [Magnenat-Thalmann and Seo 2004] introduce a framework for collecting and managing a set of range scan data to build a modeller that synthesises the realistic appearance of the body model directly from the control parameters. The developed tools are used to help annotate landmarks, automatically estimate skeletal structures for animation, and establish correspondences within the population of captured data. Their modeller then uses this structurally annotated data to synthesise new body models by blending different models in a way that implicitly exploits the statistics. Consequently, their technique offers time-saving generation of realistic, animatable body models, primarily for real-time applications. The SCAPE (Shape Completion and Animation of People) model [Anguelov et al. 2005a] represents both articulated and non-rigid deformation of the human body. The pose deformation model captures how the body shape of a person varies as a function of their pose and is parameterised by a set of 3D rotations of skeletal bones. The shape deformation model captures the variability in body shape across different people by shape parameters: a coefficient vector corresponding to a point in a low-dimensional shape space obtained by Principal Component Analysis. More recently, Allen et al. [Allen et al. 2006] presented a method that learns skinning weights for corrective enveloping from 3D scans, using maximum a posteriori estimation to solve a highly nonlinear function that simultaneously describes the pose, skinning weights, bone parameters, and vertex positions. The weight or height of a character can be changed during the animation, and muscle deformation looks significantly more realistic than with linear blend skinning. Recently, a fully differential model describing the pose and shape has been presented by [Hasler et al. 2009]. The representation is uniform in pose and shape but involves solving two equation systems to reconstruct one mesh. Since pose and shape variations are expressed by a differential encoding invariant to rotation and translation, the main drawback of this approach is that the pose and shape cannot be analysed independently.
On the other hand, template based human tracking or articulated pose estimation in some relatively early studies of markerless motion capture focuses more on modelling prior information about the tracking subject, avoiding complicated and expensive geometric computations. Many studies [Sminchisescu and Triggs 2003; Balan and Black 2006; Kehl and Van Gool 2006; Sigal et al. 2004] use simple geometric primitives (e.g. cylinders and boxes) to approximate the shape of the human body. With the advancement of hardware computational capability, an increasing number of studies have begun to utilise more complex and highly detailed template models to incorporate more prior knowledge. Vlasic et al. [Vlasic et al. 2008] address the capture of details in mesh animation from multi-view video recordings. Their approach performs fast pose tracking with minor manual interaction, providing meshes readily usable for editing operations including texturing, deformation transfer, and deformation model learning. Gall et al. [Gall et al. 2009] recover the movement of the skeleton as well as the non-rigid temporal deformation of the 3D mesh, provided that a laser scanned 3D mesh is available. Balan et al. [Balan et al. 2007] extended Anguelov et al.’s SCAPE [Anguelov et al. 2005a] deformation scheme to recover detailed human shape and pose from images. Their results show the SCAPE model is 10% better at explaining image foreground silhouettes than the cylindrical model and makes the likelihood function better behaved. Corazza and Mundermann et al.’s work [Corazza et al. 2010]