There are cases in which the optimal filtering problem can be solved exactly. One such case is when X is a finite set [Rabiner, 1989]. Also, in linear Gaussian state-space models the densities in (3.9) and (3.10) are obtained by the Kalman filter [Kalman, 1960]. In general, however, these densities do not admit a closed-form expression and one has to use methods based on numerical approximations. One such approach is to use grid-based methods, where the continuous X is approximated by a finite discretised version and the update rules are used as in the case of finite-state HMMs. Another approach is the extended Kalman filter, which approximates the nonlinear state-space model by a linear
one and performs the Kalman filter afterwards. The method fails if the nonlinearity in the HMM is substantial. An improved approach based on the Kalman filter is the unscented
Kalman filter [Julier and Uhlmann, 1997], which is based on a deterministic selection of
sigma-points from the support of the state distribution of interest such that the mean and the variance of the true distribution are preserved by the sample mean and covariance calculated at these selected sigma-points. All of these methods are deterministic and not capable of dealing with the most general state-space models; in particular they will fail when the dimensions or the nonlinearities increase.
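For the finite-state case mentioned above, the exact filter alternates an update by the observation likelihood with a prediction through the transition matrix. The following is a minimal sketch of that recursion; the names `forward_filter`, `init`, `trans` and `emis` are hypothetical and not taken from the text. A grid-based method would apply the same recursion to a discretised version of a continuous state space.

```python
def forward_filter(init, trans, emis, ys):
    """Exact filtering for a finite-state HMM (illustrative sketch).

    init[i]     : P(X_1 = i)
    trans[i][j] : P(X_n = j | X_{n-1} = i)
    emis[i][y]  : P(Y_n = y | X_n = i)
    ys          : observed symbols, given as integer indices
    Returns a list whose n-th entry is the vector P(X_n = . | y_{1:n}).
    """
    k = len(init)
    predicted = list(init)  # P(X_1 = .) before any observation is used
    filtered = []
    for y in ys:
        # Update: multiply the prediction by the likelihood and normalise.
        unnormalised = [predicted[i] * emis[i][y] for i in range(k)]
        total = sum(unnormalised)
        posterior = [u / total for u in unnormalised]
        filtered.append(posterior)
        # Predict: push the filtering distribution through the transition matrix.
        predicted = [sum(posterior[i] * trans[i][j] for i in range(k))
                     for j in range(k)]
    return filtered
```

For example, `forward_filter([0.5, 0.5], [[0.9, 0.1], [0.2, 0.8]], [[0.8, 0.2], [0.3, 0.7]], [0, 1])` returns one probability vector per observation.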
As an alternative to the deterministic approximation methods, Monte Carlo can provide a robust and efficient solution to the optimal filtering problem. SMC methods for optimal filtering, also known as particle filters, have been shown to produce more accurate estimates than the deterministic methods mentioned [Doucet et al., 2000b; Durbin and Koopman, 2000; Kitagawa, 1996; Liu and Chen, 1998]. Good tutorials on SMC methods for filtering as well as smoothing in HMMs include, from the earliest to the most recent, Doucet et al. [2000b], Arulampalam et al. [2002], Cappé et al. [2007], Fearnhead [2008], and Doucet and Johansen [2009]. One can also see Doucet et al. [2001] as a reference book, although it is a bit outdated. Also, the book of Del Moral [2004] contains a rigorous review of numerous theoretical aspects of the SMC methodology in a different framework, where an SMC method is treated as an interacting particle system associated with the mean field interpretation of a Feynman-Kac flow.
With reference to the Monte Carlo methodology covered in Chapter 2, the filtering problem in state-space models can be considered as a sequential inference problem for the sequence of probability distributions $\pi_{\theta,n}$ on the product measurable spaces $(\mathcal{X}_n = \mathcal{X}^n, \mathcal{E}_n = \mathcal{E}^{\otimes n})$, given by
$$\pi_{\theta,n}(dx_{1:n}) := p_\theta(x_{1:n}|y_{1:n})\,\lambda(dx_{1:n}).$$
As we saw in Section 2.5, we can perform SIS and SISR methods targeting $\{\pi_{\theta,n}\}_{n\geq 1}$. The SMC proposal distribution at time $n$, denoted by $q_{\theta,n}$, is designed conditional on the observations up to time $n$ and the state values up to time $n-1$; in the most general case it can be written as
$$q_{\theta,n}(dx_{1:n}) := Q_{\theta,1}(y_1, dx_1)\prod_{t=2}^{n} Q_{\theta,t}[(x_{1:t-1}, y_{1:t}), dx_t] = q_{\theta,n-1}(dx_{1:n-1})\,Q_{\theta,n}[(x_{1:n-1}, y_{1:n}), dx_n]. \tag{3.12}$$
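In fact, most of the time the transition kernel $Q_{\theta,n}$ depends only on the current observation and the previous state, hence we simplify (3.12) by defining $Q_\theta : \mathcal{X} \times \mathcal{Y} \to \mathcal{P}(\mathcal{E})$ and taking
$$Q_{\theta,n}[(x_{1:n-1}, y_{1:n}), dx_n] = Q_\theta[(x_{n-1}, y_n), dx_n]$$
for all $n \geq 1$, with the convention $Q_\theta[(x_0, y_1), dx_1] = Q_\theta(y_1, dx_1)$. Suppose we design $Q_\theta[(x, y), \cdot]$ such that it is absolutely continuous with respect to $\lambda$ with density $q_\theta(\cdot|x, y)$. Then we can write
$$q_{\theta,n}(dx_{1:n}) = \left[ q_\theta(x_1|y_1)\prod_{t=2}^{n} q_\theta(x_t|x_{t-1}, y_t) \right] \lambda(dx_{1:n}). \tag{3.13}$$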
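If we wanted to perform SMC using the target distribution $\pi_{\theta,n}$ directly, then we would have to calculate the following incremental weight at time $n$:
$$\frac{d\pi_{\theta,n}}{d(\pi_{\theta,n-1}\otimes Q_\theta)}(x_{1:n}) = \frac{f_\theta(x_n|x_{n-1})\,g_\theta(y_n|x_n)}{p_\theta(y_n|y_{1:n-1})\,q_\theta(x_n|x_{n-1}, y_n)} \propto \frac{f_\theta(x_n|x_{n-1})\,g_\theta(y_n|x_n)}{q_\theta(x_n|x_{n-1}, y_n)}.$$
In most applications $p_\theta(y_n|y_{1:n-1})$ cannot be calculated, hence $\frac{d\pi_{\theta,n}}{d(\pi_{\theta,n-1}\otimes Q_\theta)}(x_{1:n})$ is not available. For this reason, instead of $\pi_{\theta,n}$, SMC methods use the following unnormalised measure for importance sampling:
$$\widehat{\pi}_{\theta,n}(dx_{1:n}) = p_\theta(x_{1:n}, y_{1:n})\,\lambda(dx_{1:n}),$$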
where the normalising constant is pθ(y1:n), the likelihood of observations up to time n.
In that case, the importance weight for the whole path $X_{1:n}$ is given by
$$w_n(x_{1:n}) = w_{n-1}(x_{1:n-1})\,w_{n|n-1}(x_{n-1}, x_n),$$
where the incremental importance weight $w_{n|n-1}(x_{n-1}, x_n)$ is
$$w_{n|n-1}(x_{n-1}, x_n) = \frac{f_\theta(x_n|x_{n-1})\,g_\theta(y_n|x_n)}{q_\theta(x_n|x_{n-1}, y_n)}.$$
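As a brief check of this recursion, the incremental weight can be read off as the ratio of the unnormalised target density at time n to the product of the unnormalised target density at time n−1 and the proposal density. The sketch below assumes the usual HMM factorisation of the joint density, which is not restated in this section.

```latex
\begin{align*}
w_{n|n-1}(x_{n-1}, x_n)
  &= \frac{p_\theta(x_{1:n}, y_{1:n})}
          {p_\theta(x_{1:n-1}, y_{1:n-1})\, q_\theta(x_n|x_{n-1}, y_n)} \\
  &= \frac{p_\theta(x_{1:n-1}, y_{1:n-1})\, f_\theta(x_n|x_{n-1})\, g_\theta(y_n|x_n)}
          {p_\theta(x_{1:n-1}, y_{1:n-1})\, q_\theta(x_n|x_{n-1}, y_n)} \\
  &= \frac{f_\theta(x_n|x_{n-1})\, g_\theta(y_n|x_n)}
          {q_\theta(x_n|x_{n-1}, y_n)} .
\end{align*}
```

Unlike the weight for the normalised target, no factor $p_\theta(y_n|y_{1:n-1})$ appears, so every term can be evaluated pointwise.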
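Algorithm 3.1. SISR (particle filter) for HMM
For $n = 1$: for $i = 1, \ldots, N$, sample $X_1^{(i)} \sim q_\theta(\cdot|y_1)$ and set
$$W_1^{(i)} \propto \frac{\mu_\theta(X_1^{(i)})\,g_\theta(y_1|X_1^{(i)})}{q_\theta(X_1^{(i)}|y_1)}.$$
For $n = 2, 3, \ldots$
• Resample $\{X_{1:n-1}^{(i)}\}_{1\leq i\leq N}$ according to the weights $\{W_{n-1}^{(i)}\}_{1\leq i\leq N}$ to get resampled particles $\{\widetilde{X}_{1:n-1}^{(i)}\}_{1\leq i\leq N}$ with weight $1/N$.
• For $i = 1, \ldots, N$: sample $X_n^{(i)} \sim q_\theta(\cdot|\widetilde{X}_{n-1}^{(i)}, y_n)$, set $X_{1:n}^{(i)} = (\widetilde{X}_{1:n-1}^{(i)}, X_n^{(i)})$, and set
$$W_n^{(i)} \propto \frac{f_\theta(X_n^{(i)}|\widetilde{X}_{n-1}^{(i)})\,g_\theta(y_n|X_n^{(i)})}{q_\theta(X_n^{(i)}|\widetilde{X}_{n-1}^{(i)}, y_n)}.$$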
We present the SISR algorithm, also known as the particle filter, for general state-space models in Algorithm 3.1, recalling that SIS is a special case of SISR in which no resampling is performed.
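A minimal Python sketch of Algorithm 3.1 is given here for illustration. It makes several assumptions that are not part of the text: a scalar linear Gaussian toy model with hypothetical parameters `a`, `sigma_x` and `sigma_y`; the prior $f_\theta$ used as the proposal, so that the incremental weight reduces to $g_\theta(y_n|x_n)$; multinomial resampling at every step; and storage of the current state only rather than the full path.

```python
import math
import random


def normal_pdf(x, mean, std):
    """Density of N(mean, std^2) evaluated at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def normalise(weights):
    """Scale non-negative weights so that they sum to one."""
    total = sum(weights)
    return [w / total for w in weights]


def bootstrap_filter(ys, n_particles=500, a=0.9, sigma_x=1.0, sigma_y=0.5, seed=0):
    """SISR with the prior as proposal for the assumed toy model
        X_1 ~ N(0, sigma_x^2),  X_n | x_{n-1} ~ N(a * x_{n-1}, sigma_x^2),
        Y_n | x_n ~ N(x_n, sigma_y^2).
    Returns (filtering mean estimates, final particles, final normalised weights).
    """
    rng = random.Random(seed)
    # n = 1: the proposal equals the initial law, so W_1 is proportional to g(y_1 | X_1).
    particles = [rng.gauss(0.0, sigma_x) for _ in range(n_particles)]
    weights = normalise([normal_pdf(ys[0], x, sigma_y) for x in particles])
    means = [sum(w * x for w, x in zip(weights, particles))]
    for y in ys[1:]:
        # Resample ancestors according to W_{n-1} (multinomial resampling).
        ancestors = rng.choices(particles, weights=weights, k=n_particles)
        # Propagate each ancestor through the state dynamics f.
        particles = [rng.gauss(a * x, sigma_x) for x in ancestors]
        # Incremental weight f * g / q reduces to g(y_n | X_n) because q = f.
        weights = normalise([normal_pdf(y, x, sigma_y) for x in particles])
        means.append(sum(w * x for w, x in zip(weights, particles)))
    return means, particles, weights
```

A call such as `means, particles, weights = bootstrap_filter([0.3, -0.1, 0.8, 1.2, 0.5])` returns one filtering mean estimate per observation. Keeping full paths, as in Algorithm 3.1, would only require storing the ancestor lineage alongside each particle.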
In the following we list some of the aspects of the particle filter.
• As in the general SISR algorithm, we can use an optional resampling scheme, in which we resample only when the estimated effective sample size falls below a threshold value.
• A by-product of the particle filter is that it can provide unbiased estimates of the unknown normalising constants of the target distribution [Del Moral, 2004, Chapter 7]. For example, when SISR is used with an optional resampling scheme, if the last time prior to $n$ at which resampling was performed is $k$, an unbiased estimator of $p_\theta(y_{k+1:n}|y_{1:k})$ can be obtained as
$$p_\theta(y_{k+1:n}|y_{1:k}) \approx \frac{1}{N}\sum_{i=1}^{N}\prod_{t=k+1}^{n} w_{t|t-1}(X_{t-1}^{(i)}, X_t^{(i)}).$$
We will come back to this aspect of the particle filter in Section 3.4.1.
• The choice of the kernel $Q_\theta$ for the proposal distribution in the particle filter is important to ensure an effective SMC approximation. The first genuine particle filter in the literature, proposed by Gordon et al. [1993], involved proposing from the prior distribution of $X_{1:n}$, that is, taking $q_\theta(x_n|x_{n-1}, y_n) = f_\theta(x_n|x_{n-1})$; the resulting particle filter with this particular choice of $Q_\theta$ is called the bootstrap filter. Another interesting choice is to take $q_\theta(x_n|x_{n-1}, y_n) = q_\theta(x_n|y_n)$, which can be useful when the observations provide significant information about the hidden state but the state dynamics are weak. This proposal was introduced in Lin et al. [2005], and the resulting particle filter was called the independent particle filter. The optimal choice that minimises the variance of the incremental importance weights is, from equation (2.12),
$$q_\theta^{\mathrm{opt}}(x_n|x_{n-1}, y_n) = p_\theta(x_n|x_{n-1}, y_n).$$
The resulting optimal incremental weight is
$$w^{\mathrm{opt}}_{n|n-1}(x_{n-1}, x_n) = p_\theta(y_n|x_{n-1}),$$
which does not depend on the value of $x_n$. Early works in which $q_\theta^{\mathrm{opt}}$ was used include Kong et al. [1994]; Liu [1996]; Liu and Chen [1995].
• The auxiliary particle filter for optimal filtering [Pitt and Shephard, 1999] is implemented by sampling $X_{1:n-1}$ from among the set of particle paths up to time $n-1$ and a new $X_n$ from $\mathcal{X}$ in order to target
$$\bar{\pi}_{\theta,n}(dx_{1:n}) = \left[\sum_{i=1}^{N} W_{n-1}^{(i)}\, w^{\mathrm{opt}}_{n|n-1}(X_{1:n-1}^{(i)})\, \delta_{X_{1:n-1}^{(i)}}(dx_{1:n-1})\right] p_\theta(x_n|x_{n-1}, y_n)\,\lambda(dx_n).$$
If this target is possible to sample from, then all the particles at time $n$ will have equal weights. If this is not the case, the proposal distribution used to sample from this target distribution can be written generally as
$$\bar{q}_{\theta,n}(dx_{1:n}) = \left[\sum_{i=1}^{N} \alpha_{n-1}^{(i)}\, \delta_{X_{1:n-1}^{(i)}}(dx_{1:n-1})\right] q_\theta(x_n|x_{n-1}, y_n)\,\lambda(dx_n),$$
where $\alpha_{n-1}^{(i)}$ and $q_\theta(x_n|x_{n-1}, y_n)$ are design choices and should be as close as possible to the ideal choice. One attempt to make $\alpha_{n-1}^{(i)}$ close to $W_{n-1}^{(i)}\, p_\theta(y_n|X_{n-1}^{(i)})$ (up to normalisation), suggested in the original work of Pitt and Shephard [1999] on the auxiliary particle filter, is to take $\alpha_{n-1}^{(i)} = g_\theta(y_n|x_n^{*(i)})$, where $x_n^{*(i)}$ is a prediction of $X_n$ given $X_{n-1}^{(i)}$ based on the dynamics of the process, e.g. $x_n^{*(i)} = E_\theta[X_n|X_{n-1}^{(i)}]$.
• Although the particle filter presented in Algorithm 3.1 targets the path filtering distributions $\pi_{\theta,n}(dx_{1:n}) = p_\theta(x_{1:n}|y_{1:n})\,\lambda(dx_{1:n})$, it can easily be modified, or used directly, to make inference on other distributions that might be of interest. For example, consider the one-step path prediction distribution
$$\pi^{p}_{\theta,n}(dx_{1:n}) = p_\theta(x_{1:n}|y_{1:n-1})\,\lambda(dx_{1:n}).$$
The following relations hold between $\pi_{\theta,n}$ and $\pi^{p}_{\theta,n}$:
$$\pi^{p}_{\theta,n}(dx_{1:n}) = \pi_{\theta,n-1}(dx_{1:n-1})\,f_\theta(x_n|x_{n-1})\,\lambda(dx_n), \qquad \frac{d\pi_{\theta,n}}{d\pi^{p}_{\theta,n}}(x_{1:n}) = \frac{g_\theta(y_n|x_n)}{\pi^{p}_{\theta,n}(g_\theta(y_n|\cdot))}.$$
Therefore, it is easy to derive approximations to these distributions from each other: obtaining $\pi^{p,N}_{\theta,n}$ from $\pi^{N}_{\theta,n-1}$ requires a simple extension of the path $X_{1:n-1}$ to $X_{1:n}$ through $f_\theta$; this is done by sampling $X_n^{(i)}$ conditioned on the existing particle path $X_{1:n-1}^{(i)}$ for each $i = 1, \ldots, N$. Obtaining $\pi^{N}_{\theta,n}$ from $\pi^{p,N}_{\theta,n}$, on the other hand, requires a simple reweighting of the measure (or the approximate measure) according to $g_\theta(y_n|\cdot)$. As a second example, the approximations to the marginal distributions $\pi^{N}_{\theta,n}(dx_k)$, $k \leq n$ (or $\pi^{p,N}_{\theta,n}(dx_k)$) are simply obtained from the $k$'th components of the particles, e.g.
$$\pi^{N}_{\theta,n}(dx_{1:n}) = \sum_{i=1}^{N} W_n^{(i)}\,\delta_{X_{1:n}^{(i)}}(dx_{1:n}) \;\Rightarrow\; \pi^{N}_{\theta,n}(dx_k) = \sum_{i=1}^{N} W_n^{(i)}\,\delta_{X_k^{(i)}}(dx_k).$$
Note that the optimal filtering problem corresponds to the case $k = n$. Therefore, it may be sufficient to have a good approximation for the marginal posterior distribution of the current state $X_n$ rather than the whole path $X_{1:n}$. This justifies the resampling step, which trades off accuracy for states $X_k$ with $k \ll n$ for a good approximation of the marginal posterior distribution of $X_n$.
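Two items from the list above lend themselves to a short numerical sketch: the effective-sample-size test used for optional resampling, and the estimator of $p_\theta(y_{k+1:n}|y_{1:k})$ built from the incremental weights accumulated since the last resampling time $k$. The code below is illustrative only; the function names, the $1/\sum_i (W^{(i)})^2$ form of the effective sample size estimate, and the default threshold of $N/2$ are assumptions that the text does not fix.

```python
import math


def effective_sample_size(weights):
    """Assumed ESS estimate 1 / sum_i W_i^2, computed from normalised weights."""
    total = sum(weights)
    normalised = [w / total for w in weights]
    return 1.0 / sum(w * w for w in normalised)


def should_resample(weights, threshold_fraction=0.5):
    """Resample only when the ESS estimate drops below threshold_fraction * N."""
    return effective_sample_size(weights) < threshold_fraction * len(weights)


def block_likelihood_estimate(incremental_weights):
    """Estimate p(y_{k+1:n} | y_{1:k}) as (1/N) * sum_i prod_t w_t^(i).

    incremental_weights[i] holds the incremental weights of particle i
    for t = k+1, ..., n, where k is the last resampling time.
    """
    n_particles = len(incremental_weights)
    return sum(math.prod(row) for row in incremental_weights) / n_particles
```

With equal weights the estimate from `effective_sample_size` equals the number of particles, and with all mass on one particle it equals one, so `should_resample` triggers only in the second case under the default threshold.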