Sequential Monte Carlo samplers [Del Moral et al., 2006] cover a very large class of SMC methods. Assume that we have a sequence of related distributions $\pi_1, \ldots, \pi_p$, where each $\pi_n$ is defined on an arbitrary measurable space $(\mathsf{X}_n, \mathcal{E}_n)$. There are many potential choices for $\pi_1, \ldots, \pi_p$, leading to various integration and optimisation algorithms. Examples can be found in Chopin [2002] for static parameter estimation, Gelman and Meng [1998] and Neal [2001] for targeting a distribution through a sequence of intermediate distributions, Del Moral et al. [2006] for global optimisation, Johansen et al. [2005] and Del Moral et al. [2006] for rare event simulation and density estimation, and Del Moral et al. [2012] for approximate Bayesian computation. Approximating these distributions sequentially using Monte Carlo is beyond the scope of the classical SIS and SISR methods, since these require the target distributions to be defined on spaces of increasing dimension.
The first approach that comes to mind is to treat each $\pi_n$ individually and perform importance sampling for each of them independently. This approach inherits the difficulties of importance sampling: unless the distribution of interest is a standard low-dimensional one, importance sampling is almost never used when alternatives exist, mainly because designing a good proposal distribution is difficult. A more reasonable way is still to perform importance sampling for each $\pi_n$ individually, but to design the importance distributions sequentially using an initial distribution $\eta_1$ and a sequence of transition kernels $\{K_n : \mathsf{X}_{n-1} \to \mathcal{P}(\mathsf{X}_n)\}_{n \geq 2}$. The idea is that if the distributions $\pi_n$ change slowly with $n$, the samples obtained to approximate $\pi_{n-1}$ can be moved slightly using $K_n$ to obtain samples for approximating $\pi_n$. Let us assume that we begin by sampling $X_1^{(1)}, \ldots, X_1^{(N)}$ from $\eta_1$ to approximate $\pi_1$. At times $n \geq 2$, we sample $X_n^{(i)}$ from $K_n(X_{n-1}^{(i)}, \cdot)$. The importance weight of $X_n^{(i)}$ is given by

$$w_n^{(i)} = \frac{d\pi_n}{d\eta_n}(X_n^{(i)}), \qquad \eta_n(dx_n) = \eta_{n-1}K_n(dx_n) = \int_{\mathsf{X}_{n-1}} \eta_{n-1}(dx_{n-1})\, K_n(x_{n-1}, dx_n).$$
The kernels $K_n$ can be chosen freely, subject only to the requirement $\pi_n \ll \eta_{n-1}K_n$; however, this choice is crucial for the performance of the method. In the literature, several types of moves have been used, such as independent proposals [West, 1993], local random moves [Givens and Raftery, 1996], and MCMC and Gibbs moves [Del Moral et al., 2006].
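To make the scheme concrete, the following Python sketch runs it on a hypothetical Gaussian toy problem in which $\eta_n$ happens to be available in closed form: with $\eta_1 = \mathcal{N}(0, 1)$ and Gaussian random-walk kernels $K_n(x, \cdot) = \mathcal{N}(x, s^2)$, we have $\eta_n = \mathcal{N}(0, 1 + (n-1)s^2)$. The targets $\pi_n = \mathcal{N}(0.2(n-1), 1)$, the step size and the number of particles are illustrative assumptions, not part of the method.

```python
import numpy as np


def normal_logpdf(x, mean, var):
    """Log density of N(mean, var) evaluated element-wise at x."""
    return -0.5 * (np.log(2.0 * np.pi * var) + (x - mean) ** 2 / var)


def sequential_is(n_particles=100_000, p=5, step_sd=0.5, seed=0):
    """Sequential importance sampling on the marginal spaces (hypothetical toy model).

    eta_1 = N(0, 1), K_n(x, .) = N(x, step_sd^2) and pi_n = N(0.2 * (n - 1), 1).
    Because K_n is a Gaussian random walk, eta_n = eta_{n-1} K_n is the closed-form
    N(0, 1 + (n - 1) * step_sd^2), so the weights d pi_n / d eta_n can be evaluated.
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(0.0, 1.0, size=n_particles)  # X_1^{(i)} ~ eta_1
    estimates = []
    for n in range(1, p + 1):
        if n >= 2:
            # X_n^{(i)} ~ K_n(X_{n-1}^{(i)}, .): move each sample slightly
            x = x + step_sd * rng.normal(size=n_particles)
        eta_var = 1.0 + (n - 1) * step_sd ** 2
        # w_n^{(i)} = (d pi_n / d eta_n)(X_n^{(i)}), computed in log space
        logw = normal_logpdf(x, 0.2 * (n - 1), 1.0) - normal_logpdf(x, 0.0, eta_var)
        w = np.exp(logw - logw.max())
        w /= w.sum()
        estimates.append(float(np.sum(w * x)))  # self-normalised estimate of E_{pi_n}[X]
    return estimates


print(sequential_is())
```

Each self-normalised estimate should be close to the target mean $0.2(n-1)$. The closed form for $\eta_n$ is what makes this toy case work, and it is exactly the quantity that is unavailable in general.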
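This sequential implementation of importance sampling is attractive and, as we will see shortly, optimal in a certain sense; however, it has a quite restrictive limitation: in most cases the importance distribution $\eta_n = \eta_1 K_2 \cdots K_n$ cannot be evaluated, since it involves an integral over $\mathsf{X}_1 \times \cdots \times \mathsf{X}_{n-1}$ that is rarely available in closed form.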
SMC samplers come into play at this point, circumventing the need to calculate $\eta_n$. The main idea of the method is to construct synthetic distributions $\tilde{\pi}_n$ on the extended spaces $(\mathsf{X}_1 \times \cdots \times \mathsf{X}_n, \mathcal{E}_1 \otimes \cdots \otimes \mathcal{E}_n)$ as

$$\tilde{\pi}_n(dx_{1:n}) = \pi_n(dx_n) \prod_{i=1}^{n-1} L_i(x_{i+1}, dx_i), \qquad (2.16)$$

where each $L_n : \mathsf{X}_{n+1} \to \mathcal{P}(\mathsf{X}_n)$ is a backward Markov kernel. Since $\tilde{\pi}_n$ admits $\pi_n$ as a marginal by construction, importance sampling on $\tilde{\pi}_n$ using the proposal distribution

$$\tilde{\eta}_n(dx_{1:n}) = \eta_1(dx_1) \prod_{i=2}^{n} K_i(x_{i-1}, dx_i)$$
can provide an approximation for $\pi_n$ as well. Although the freedom to choose the $K_n$'s and $L_n$'s contributes to the method's generality, the performance of the method depends crucially on their choice. In fact, the central limit theorem presented in Del Moral et al. [2006] demonstrates that the variance of the estimator depends strongly on the choice of these kernels. The importance weight for this method is given by

$$w_n(x_{1:n}) = \frac{d\tilde{\pi}_n}{d\tilde{\eta}_n}(x_{1:n}).$$
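When all the measures involved have densities with respect to Lebesgue measure (an assumption made here for illustration), $w_n$ is simply the ratio of the density in (2.16) to that of $\tilde{\eta}_n$, evaluated along a particle path. A minimal Python sketch of this ratio in log space, with the densities passed in as hypothetical callables, is:

```python
import math


def normal_logpdf(x, mean, var):
    """Log density of N(mean, var) at a scalar x."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def extended_log_weight(path, log_pi_n, log_eta1, log_K, log_L):
    """log w_n(x_{1:n}) = log (d tilde_pi_n / d tilde_eta_n)(x_{1:n}), assuming densities.

    path[k] holds x_{k+1}; log_K(i, x_prev, x) is log K_i(x_prev, x) and
    log_L(i, x_next, x) is log L_i(x_next, x). All callables are hypothetical inputs.
    """
    n = len(path)
    num = log_pi_n(path[-1])
    for i in range(1, n):  # backward kernels L_1, ..., L_{n-1}
        num += log_L(i, path[i], path[i - 1])
    den = log_eta1(path[0])
    for i in range(2, n + 1):  # forward kernels K_2, ..., K_n
        den += log_K(i, path[i - 2], path[i - 1])
    return num - den


# Usage with a symmetric random walk K_i(x, .) = N(x, 0.25) and the
# hypothetical backward choice L_i(x', .) = N(x', 0.25):
step_var = 0.25
log_K = lambda i, x_prev, x: normal_logpdf(x, x_prev, step_var)
log_L = lambda i, x_next, x: normal_logpdf(x, x_next, step_var)
log_pi_n = lambda x: normal_logpdf(x, 1.0, 1.0)  # pi_n = N(1, 1)
log_eta1 = lambda x: normal_logpdf(x, 0.0, 1.0)  # eta_1 = N(0, 1)
path = [0.3, -0.1, 0.7, 1.2]  # x_1, ..., x_4
lw = extended_log_weight(path, log_pi_n, log_eta1, log_K, log_L)
print(lw)
```

With a symmetric random walk and this choice of $L_i$, the kernel terms cancel and the weight reduces to $\pi_n(x_n)/\eta_1(x_1)$, which illustrates how strongly the choice of backward kernels shapes the weights.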
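It was shown in Del Moral et al. [2006] that, given $K_n$, the optimal backward kernel $L^{\mathrm{opt}}_{n-1}$, which minimises the variance of the importance weights, satisfies the relation $\eta_n \otimes L^{\mathrm{opt}}_{n-1} = \eta_{n-1} \otimes K_n$, that is, $\eta_n(dx_n)\, L^{\mathrm{opt}}_{n-1}(x_n, dx_{n-1}) = \eta_{n-1}(dx_{n-1})\, K_n(x_{n-1}, dx_n)$. It can be shown that the importance weight for the optimal backward kernel is

$$w^{\mathrm{opt}}_n(x_{1:n}) = \frac{d\pi_n}{d\eta_n}(x_n).$$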
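The optimal weight follows by applying the defining relation of the optimal backward kernel repeatedly. Assuming $L_i = L_i^{\mathrm{opt}}$ for every $i = 1, \ldots, n-1$ (each defined from $\eta_i$ and $K_{i+1}$), the proposal on the extended space can be rewritten backwards in time:

$$
\begin{aligned}
\tilde{\eta}_n(dx_{1:n}) &= \eta_1(dx_1)\, K_2(x_1, dx_2) \prod_{i=3}^{n} K_i(x_{i-1}, dx_i)
= \eta_2(dx_2)\, L^{\mathrm{opt}}_1(x_2, dx_1) \prod_{i=3}^{n} K_i(x_{i-1}, dx_i) \\
&= \cdots = \eta_n(dx_n) \prod_{i=1}^{n-1} L^{\mathrm{opt}}_i(x_{i+1}, dx_i).
\end{aligned}
$$

Since $\tilde{\pi}_n$ in (2.16) is built from the same backward kernels, they cancel in the Radon-Nikodym derivative:

$$
w^{\mathrm{opt}}_n(x_{1:n}) = \frac{d\tilde{\pi}_n}{d\tilde{\eta}_n}(x_{1:n}) = \frac{d\pi_n}{d\eta_n}(x_n).
$$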
This result reveals that the optimal backward kernel takes us back to the case where one performs importance sampling on the marginal space instead of the extended one. However, since $\eta_n$ usually cannot be calculated, other, sub-optimal backward kernels must be used. It was shown in Del Moral et al. [2006] that when $L^{\mathrm{opt}}_{n-1}$ is not used, the variance of $w_n(x_{1:n})$ cannot be stabilised as $n$ increases. For that reason, the samples used to approximate $\pi_{n-1}$ must be resampled before moving on to the approximation of $\pi_n$. This is possible because $\tilde{\pi}_n$ can be constructed so that the importance weights are expressed as a product of incremental weights. Assume that $\pi_1 \ll \eta_1$, $\pi_n \ll K_n(x_{n-1}, \cdot)$ and $L_n(x_{n+1}, \cdot) \ll \pi_n$ for all $n$ and all $x_{n-1}$, $x_{n+1}$. Then it can be shown that for a bounded measurable function $\varphi_n$ on $\mathsf{X}_1 \times \cdots \times \mathsf{X}_n$ we have $\tilde{\pi}_n(\varphi_n) = \tilde{\eta}_n(\varphi_n w_n)$, where the importance weights $w_n$ are given by

$$w_n(x_{1:n}) = \frac{d\pi_1}{d\eta_1}(x_1) \prod_{i=2}^{n} \frac{dL_{i-1}(x_i, \cdot)}{d\pi_{i-1}}(x_{i-1})\, \frac{d\pi_i}{dK_i(x_{i-1}, \cdot)}(x_i). \qquad (2.17)$$
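To see where (2.17) comes from, suppose for illustration that all measures have densities with respect to Lebesgue measure, denoted by the same symbols. Dividing the density of (2.16) by that of $\tilde{\eta}_n$ and inserting the telescoping product $\prod_{i=2}^{n} \pi_i(x_i)/\pi_{i-1}(x_{i-1}) = \pi_n(x_n)/\pi_1(x_1)$ gives

$$
w_n(x_{1:n}) = \frac{\pi_n(x_n) \prod_{i=1}^{n-1} L_i(x_{i+1}, x_i)}{\eta_1(x_1) \prod_{i=2}^{n} K_i(x_{i-1}, x_i)}
= \frac{\pi_1(x_1)}{\eta_1(x_1)} \prod_{i=2}^{n} \frac{L_{i-1}(x_i, x_{i-1})}{\pi_{i-1}(x_{i-1})}\, \frac{\pi_i(x_i)}{K_i(x_{i-1}, x_i)},
$$

and each factor is the density form of the corresponding Radon-Nikodym derivative in (2.17).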
Equation (2.17) admits a recursion in $n$,

$$w_n(x_{1:n}) = w_{n|n-1}(x_{n-1}, x_n)\, w_{n-1}(x_{1:n-1}),$$

where the incremental weight $w_{n|n-1}(x_{n-1}, x_n)$ is given by

$$w_{n|n-1}(x_{n-1}, x_n) = \frac{d\pi_n}{dK_n(x_{n-1}, \cdot)}(x_n)\, \frac{dL_{n-1}(x_n, \cdot)}{d\pi_{n-1}}(x_{n-1}). \qquad (2.18)$$
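The recursion can be checked numerically. The Python sketch below uses a hypothetical one-dimensional Gaussian model (the choices of $\pi_i$, $\eta_1$, $K_i$ and $L_i$ are illustrative assumptions, all with densities with respect to Lebesgue measure), accumulates the incremental weights (2.18) along a fixed path, and compares the result with the ratio of the densities of $\tilde{\pi}_n$ and $\tilde{\eta}_n$ computed directly.

```python
import math


def normal_logpdf(x, mean, var):
    """Log density of N(mean, var) at a scalar x."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


# Hypothetical toy model: pi_i = N(0.5 * i, 1), eta_1 = N(0, 2),
# K_i(x, .) = N(x, 0.3) and a non-symmetric backward kernel L_i(x', .) = N(0.5 * x', 1).
log_pi = lambda i, x: normal_logpdf(x, 0.5 * i, 1.0)
log_eta1 = lambda x: normal_logpdf(x, 0.0, 2.0)
log_K = lambda i, x_prev, x: normal_logpdf(x, x_prev, 0.3)
log_L = lambda i, x_next, x: normal_logpdf(x, 0.5 * x_next, 1.0)


def log_incremental_weight(n, x_prev, x):
    """log w_{n|n-1}(x_{n-1}, x_n) from (2.18), written with densities."""
    return (log_pi(n, x) - log_K(n, x_prev, x)
            + log_L(n - 1, x, x_prev) - log_pi(n - 1, x_prev))


def log_weights_recursive(path):
    """[log w_1(x_1), ..., log w_n(x_{1:n})] via w_n = w_{n|n-1} * w_{n-1}."""
    logw = [log_pi(1, path[0]) - log_eta1(path[0])]
    for n in range(2, len(path) + 1):
        logw.append(logw[-1] + log_incremental_weight(n, path[n - 2], path[n - 1]))
    return logw


def log_weight_direct(path):
    """log of the tilde_pi_n / tilde_eta_n density ratio over the whole path."""
    n = len(path)
    num = log_pi(n, path[-1]) + sum(log_L(i, path[i], path[i - 1]) for i in range(1, n))
    den = log_eta1(path[0]) + sum(log_K(i, path[i - 2], path[i - 1]) for i in range(2, n + 1))
    return num - den


path = [0.1, 0.9, 1.4, 2.2]  # x_1, ..., x_4
print(log_weights_recursive(path)[-1], log_weight_direct(path))  # agree up to rounding
```

Each factor involves only the current pair $(x_{n-1}, x_n)$, so the weight of a particle can be updated at constant cost per step, which is what makes an SISR implementation possible.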
The recursive form of the weights enables us to implement an SMC method for the synthetic distributions $\tilde{\pi}_n$. Indeed, when the factorisation (2.17) holds, the SMC sampler for $\pi_1, \ldots, \pi_p$ is the SISR algorithm targeting $\tilde{\pi}_1, \ldots, \tilde{\pi}_p$ with initial and transition proposal distributions $\eta_1$ and $K_n$, $n = 2, \ldots, p$, respectively, and with the incremental weights given in (2.18).
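A compact Python sketch of the resulting SISR algorithm is given below. It assumes a hypothetical tempering sequence $\pi_n \propto \gamma_0^{1-\beta_n} \gamma^{\beta_n}$ bridging $\mathcal{N}(0, 1)$ and $\mathcal{N}(3, 0.5^2)$, takes $\eta_1 = \pi_1$, uses random-walk Metropolis-Hastings moves for $K_n$ that leave $\pi_{n-1}$ invariant, and takes $L_{n-1}$ to be the time reversal of $K_n$ as in (2.19). Under that choice the incremental weight (2.18) reduces to $\gamma_n(x_n)/\gamma_{n-1}(x_n)$ up to a constant that cancels after normalisation, so only unnormalised densities are needed. Multinomial resampling is triggered when the effective sample size falls below $N/2$; this threshold, the temperature schedule and the move parameters are illustrative assumptions.

```python
import numpy as np


def log_gamma0(x):
    """Unnormalised log density of the hypothetical start N(0, 1)."""
    return -0.5 * x ** 2


def log_gamma_target(x):
    """Unnormalised log density of the hypothetical final target N(3, 0.5^2)."""
    return -0.5 * (x - 3.0) ** 2 / 0.25


def log_gamma(beta, x):
    """Unnormalised log density of the tempered distribution at inverse temperature beta."""
    return (1.0 - beta) * log_gamma0(x) + beta * log_gamma_target(x)


def mh_move(beta, x, rng, n_steps=5, scale=0.5):
    """Random-walk Metropolis-Hastings kernel leaving the beta-tempered target invariant."""
    for _ in range(n_steps):
        prop = x + scale * rng.normal(size=x.shape)
        log_accept = log_gamma(beta, prop) - log_gamma(beta, x)
        accept = np.log(rng.uniform(size=x.shape)) < log_accept
        x = np.where(accept, prop, x)
    return x


def smc_sampler(n_particles=2000, p=30, seed=1):
    rng = np.random.default_rng(seed)
    betas = np.linspace(0.0, 1.0, p)  # betas[0] gives pi_1, betas[-1] gives pi_p
    x = rng.normal(size=n_particles)  # eta_1 = pi_1, so w_1 = 1
    logw = np.zeros(n_particles)
    for n in range(1, p):
        # Resample (multinomial) when the effective sample size drops below N / 2.
        w = np.exp(logw - logw.max())
        w /= w.sum()
        if 1.0 / np.sum(w ** 2) < n_particles / 2:
            idx = rng.choice(n_particles, size=n_particles, p=w)
            x, logw = x[idx], np.zeros(n_particles)
        # Propagate with K_n, which leaves the previous target invariant.
        x = mh_move(betas[n - 1], x, rng)
        # Incremental weight gamma_n(x_n) / gamma_{n-1}(x_n) under the (2.19)-type choice.
        logw += log_gamma(betas[n], x) - log_gamma(betas[n - 1], x)
    w = np.exp(logw - logw.max())
    w /= w.sum()
    return x, w


x, w = smc_sampler()
print(np.sum(w * x))  # self-normalised estimate of E_{pi_p}[X]; the exact value is 3
```

Removing the resampling step turns this sketch into annealed importance sampling with the same kernels.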
Note that in practice, even if $L_n$ is not absolutely continuous with respect to $\pi_n$, we can still obtain importance weights that factorise into incremental weights by restricting the $L_n$'s to the supports of the $\pi_n$'s. Note also that, as in importance sampling, SIS and SISR, even if $\tilde{\pi}_n$ and $\tilde{\eta}_n$ are known only up to normalising constants, we can still run the SMC sampler to approximate integrals with respect to $\pi_n$ and to estimate the unknown normalising constants.
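For the normalising constants, the standard SMC estimate of $Z_n/Z_{n-1}$, where $Z_n$ is the normalising constant of an unnormalised density $\gamma_n \propto \pi_n$, is the weighted average $\sum_i W_{n-1}^{(i)} w_{n|n-1}^{(i)}$ of the incremental weights computed from unnormalised densities, with $W_{n-1}^{(i)}$ the normalised weights before the incremental update. The Python helper below computes this estimate in log space; its name and interface are hypothetical. The check uses exact draws from $\pi_{n-1}$ together with the incremental weight $\gamma_n(x)/\gamma_{n-1}(x)$ that arises under the kernel choice (2.19).

```python
import numpy as np


def log_normalising_constant_ratio(logw_prev, log_incr):
    """Estimate log(Z_n / Z_{n-1}) = log sum_i W_{n-1}^{(i)} w_{n|n-1}^{(i)}.

    logw_prev: unnormalised log weights of the particles before the update;
    log_incr:  log incremental weights computed from unnormalised densities.
    """
    a = logw_prev - np.max(logw_prev)
    log_W = a - np.log(np.sum(np.exp(a)))  # normalised log weights W_{n-1}
    b = log_W + log_incr
    m = np.max(b)
    return m + np.log(np.sum(np.exp(b - m)))  # log-sum-exp for numerical stability


# Hypothetical check: gamma_{n-1}(x) = exp(-x^2 / 2) with Z_{n-1} = sqrt(2 pi) and
# gamma_n(x) = exp(-x^2) with Z_n = sqrt(pi), so Z_n / Z_{n-1} = 1 / sqrt(2).
rng = np.random.default_rng(0)
x = rng.normal(size=200_000)  # exact draws from pi_{n-1}, equal weights
log_incr = -x ** 2 + 0.5 * x ** 2  # log gamma_n(x) - log gamma_{n-1}(x)
est = log_normalising_constant_ratio(np.zeros_like(x), log_incr)
print(np.exp(est))  # should be close to 1 / sqrt(2)
```

Multiplying these ratio estimates over $n$ gives an estimate of $Z_p/Z_1$.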
Several existing algorithms can be seen as special cases of SMC samplers. For example, the annealed importance sampling method, proposed by Neal [2001] for sequences of slightly varying distributions, corresponds to the SMC sampler without resampling in which $K_n$ is such that $\pi_{n-1}$ is $K_n$-invariant and $L_{n-1}$ satisfies

$$\pi_{n-1}K_n \otimes L_{n-1} = \pi_{n-1} \otimes K_n. \qquad (2.19)$$

To deal with the variance problem in more general cases, the same choice of kernels is used (among others) by Chopin [2002] and Gilks and Berzuini [2001] in resample-move strategies, which correspond to the SMC sampler with resampling. Population Monte Carlo, presented by Cappé et al. [2004] and extended by Celeux et al. [2006], is another special case of SMC samplers, in which the authors consider the homogeneous case $\pi_n = \pi$, $L_n(x, dx') = \pi(dx')$ and $K_n(x, dx') = K_n(dx')$. Finally, Liang [2002] presents a related algorithm where $\pi_n = \pi$ and $K_n(x, dx') = L_n(x, dx') = K(x, dx')$.
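As an illustration of the Population Monte Carlo special case: with $L_{n-1}(x, dx') = \pi(dx')$ the backward factor in (2.18) equals one, and with $K_n(x, dx') = K_n(dx')$ the incremental weight reduces to $\frac{d\pi}{dK_n}(x_n)$. With resampling at every iteration, as in PMC, each iteration is therefore plain importance sampling with a proposal that may change over time. The Python sketch below assumes a Gaussian target $\pi = \mathcal{N}(1, 0.5^2)$ and Gaussian proposals $K_n = \mathcal{N}(\mu_n, 1)$ whose mean is re-centred at the previous weighted mean; this adaptation rule is a hypothetical choice for illustration, not the scheme of Cappé et al. [2004].

```python
import numpy as np


def normal_logpdf(x, mean, sd):
    """Log density of N(mean, sd^2) evaluated element-wise at x."""
    return -0.5 * np.log(2.0 * np.pi * sd ** 2) - 0.5 * ((x - mean) / sd) ** 2


def log_target(x):
    """Hypothetical static target pi = N(1, 0.5^2)."""
    return normal_logpdf(x, 1.0, 0.5)


def pmc(n_particles=50_000, iterations=5, seed=0):
    rng = np.random.default_rng(seed)
    mu = -2.0  # hypothetical initial proposal mean
    for _ in range(iterations):
        # K_n(x, dx') = K_n(dx') = N(mu, 1): new draws ignore the previous particles,
        # so resampling those particles would not change anything here.
        x = rng.normal(mu, 1.0, size=n_particles)
        # L_{n-1}(x, dx') = pi(dx') makes the backward factor in (2.18) equal to 1,
        # leaving the weight pi(x_n) / K_n(x_n).
        logw = log_target(x) - normal_logpdf(x, mu, 1.0)
        w = np.exp(logw - logw.max())
        w /= w.sum()
        mu = float(np.sum(w * x))  # hypothetical adaptation: re-centre the proposal
    return mu


print(pmc())  # weighted estimate of E_pi[X], whose exact value is 1
```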