
The term perfect Monte Carlo refers to those methods in which the distribution of interest $\pi$ is approximated by $N > 0$ independent, identically distributed (i.i.d.) samples from $\pi$, and the integral of $\phi$ with respect to $\pi$ is approximated using this approximation. The approximation to $\pi$ using $N$ i.i.d. samples $X^{(1)}, \ldots, X^{(N)}$ is given by

$$\pi^N_{MC}(dx) := \frac{1}{N} \sum_{i=1}^{N} \delta_{X^{(i)}}(dx).$$

Then, the perfect Monte Carlo approximation to $\pi(\phi)$ is obtained by substituting $\pi$ with $\pi^N_{MC}$ in (2.1) as

$$\pi^N_{MC}(\phi) = \frac{1}{N} \sum_{i=1}^{N} \phi(X^{(i)}).$$
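The estimator above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function name `perfect_monte_carlo`, its `sampler` argument, and the choice $\pi = \mathcal{N}(0, 1)$ with $\phi(x) = x^2$ (so that $\pi(\phi) = 1$) are assumptions made only for the example.

```python
import random


def perfect_monte_carlo(phi, sampler, n, rng):
    """Average phi over n i.i.d. draws X(1), ..., X(n) produced by sampler(rng)."""
    if n <= 0:
        raise ValueError("n must be a positive integer")
    return sum(phi(sampler(rng)) for _ in range(n)) / n


# Illustrative assumption: pi = N(0, 1) and phi(x) = x**2, so that pi(phi) = 1.
rng = random.Random(0)
estimate = perfect_monte_carlo(
    phi=lambda x: x * x,
    sampler=lambda r: r.gauss(0.0, 1.0),
    n=100_000,
    rng=rng,
)
print(f"Monte Carlo estimate of pi(phi): {estimate:.4f}")
```

Passing the random number generator explicitly keeps the sketch reproducible; any routine that returns i.i.d. draws from $\pi$ could take the place of `sampler`.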

It is this approach which was originally referred to as the Monte Carlo method in Metropolis and Ulam [1949], although the term has since come to encompass a broader class of methods.

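It is easy to show that $\pi^N_{MC}(\phi)$ is an unbiased estimator of $\pi(\phi)$ for any $N > 0$. Also, if $\pi(\phi)$ is finite, the strong law of large numbers (e.g. Shiryaev [1995], p. 391) ensures almost sure (a.s.) convergence of $\pi^N_{MC}(\phi)$ to $\pi(\phi)$ as the number of i.i.d. samples tends to infinity,

$$\pi^N_{MC}(\phi) \xrightarrow{a.s.} \pi(\phi).$$

The variance of $\pi^N_{MC}(\phi)$ is given by

$$\mathrm{var}\left[\pi^N_{MC}(\phi)\right] = \frac{1}{N^2} \sum_{i=1}^{N} \mathrm{var}_\pi\left[\phi(X^{(i)})\right] = \frac{1}{N} \mathrm{var}_\pi\left[\phi(X)\right],$$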

which indicates the improvement in accuracy with increasing $N$, provided that $\mathrm{var}_\pi[\phi(X)]$ is finite. Note that this holds regardless of the dimension of $X$, which makes Monte Carlo preferable to deterministic numerical methods, particularly for high-dimensional integration [Newman and Barkema, 1999]. Also, if $\mathrm{var}_\pi[\phi(X)]$ is finite, the central limit theorem (e.g. Shiryaev [1995], p. 335) gives

$$\sqrt{N} \left[\pi^N_{MC}(\phi) - \pi(\phi)\right] \xrightarrow{d} \mathcal{N}\left(0, \mathrm{var}_\pi[\phi(X)]\right).$$
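The $1/N$ decay of the variance can be checked empirically. The sketch below is only an illustration under assumed choices: $\pi = \mathrm{Unif}(0, 1)$ and $\phi(x) = x$, for which $\mathrm{var}_\pi[\phi(X)] = 1/12$. The helper names `mc_mean` and `estimator_variance` are hypothetical.

```python
import random
import statistics


def mc_mean(n, rng):
    """Perfect Monte Carlo estimate of E[X] for X ~ Unif(0, 1), using n draws."""
    return sum(rng.random() for _ in range(n)) / n


def estimator_variance(n, replications, rng):
    """Sample variance of the estimator across independent replications."""
    estimates = [mc_mean(n, rng) for _ in range(replications)]
    return statistics.variance(estimates)


rng = random.Random(0)
for n in (100, 400, 1600):
    v = estimator_variance(n, replications=1000, rng=rng)
    # n * v should stay close to var[phi(X)] = 1/12 for every n.
    print(f"N={n:5d}  var={v:.6f}  N*var={n * v:.4f}")
```

If the variance formula holds, the last column should hover around $1/12 \approx 0.083$ for each $N$, while the variance itself shrinks by roughly a factor of four each time $N$ is quadrupled.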

Perfect Monte Carlo requires the ability to obtain i.i.d. samples from $\pi$. There are several methods for obtaining i.i.d. samples from distributions. We cover the two most common ones in the following.

2.2.1 Inversion sampling

If $\pi$ is a distribution on $\mathbb{R}$, then its cumulative distribution function can be defined as

$$F_\pi : \mathbb{R} \to [0, 1], \qquad F_\pi(x) = \pi((-\infty, x]).$$

If it is possible to invert $F_\pi$, then it is possible to sample from $\pi$ by transforming a uniform sample $U$ distributed over $(0, 1)$ as

$$X = F_\pi^{-1}(U) := \inf\{x \in \mathcal{X} : F_\pi(x) \ge U\}.$$

This approach was considered by Ulam prior to 1947 [Eckhardt, 1987] and some extensions to the method are provided by Robert and Casella [2004].
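As an illustration of inversion, the sketch below treats two assumed cases that do not come from the text: an exponential distribution, whose distribution function has a closed-form inverse, and a discrete distribution, where the infimum in the generalized inverse becomes a search for the first value whose cumulative probability reaches $U$. The function names are hypothetical.

```python
import bisect
import itertools
import math
import random


def sample_exponential(rate, rng):
    """Inversion for Exp(rate): F(x) = 1 - exp(-rate * x), so F^{-1}(u) = -log(1 - u) / rate."""
    u = rng.random()  # uniform on [0, 1), hence 1 - u lies in (0, 1]
    return -math.log(1.0 - u) / rate


def sample_discrete(values, probs, rng):
    """Generalized inverse for a discrete law on sorted values: smallest value with CDF >= U."""
    cdf = list(itertools.accumulate(probs))
    u = rng.random()
    index = bisect.bisect_left(cdf, u)  # first index with cdf[index] >= u
    return values[min(index, len(values) - 1)]  # guard against rounding in the last CDF entry


rng = random.Random(0)
exp_draws = [sample_exponential(2.0, rng) for _ in range(50_000)]
print("mean of Exp(2) draws:", sum(exp_draws) / len(exp_draws))  # theoretical mean is 0.5

disc_draws = [sample_discrete([0, 1, 2], [0.2, 0.5, 0.3], rng) for _ in range(50_000)]
print("frequency of value 1:", disc_draws.count(1) / len(disc_draws))  # theoretical value is 0.5
```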

2.2.2 Rejection sampling

Another common method of obtaining i.i.d. samples from $\pi$ is rejection sampling, which is available when there exists an instrumental distribution $\mu$ such that $\pi \ll \mu$ with bounded Radon-Nikodým derivative $\frac{d\pi}{d\mu}$. Rejection sampling was first mentioned in a 1947 letter by von Neumann [Eckhardt, 1987]; it was also presented a few years later in von Neumann [1951]. The method for obtaining one sample from $\pi$ can be implemented with any $M \ge \sup_x \frac{d\pi}{d\mu}(x)$ by (i) generating $X$ from $\mu$, (ii) accepting it with probability $\frac{1}{M} \frac{d\pi}{d\mu}(X)$, and otherwise repeating steps (i) and (ii) until acceptance. Letting $A = \{U \le \frac{1}{M} \frac{d\pi}{d\mu}(X)\}$, with $U \sim \mathrm{Unif}(0, 1)$ independent of $X$, be the event of acceptance in a single trial, its probability is given by

$$P(A) = E_\mu\left[\frac{1}{M} \frac{d\pi}{d\mu}(X)\right] = \frac{1}{M} \mu\left(\frac{d\pi}{d\mu}\right) = \frac{1}{M}, \tag{2.2}$$

which is also the long-term proportion of accepted samples among all trials. Therefore, taking $\mu$ as close to $\pi$ as possible to avoid large Radon-Nikodým derivatives, and taking $M = \sup_x \frac{d\pi}{d\mu}(x)$, are sensible choices to make the acceptance probability $P(A)$ as high as possible.

Algorithm 2.1. Rejection sampling: Choose $M \ge \sup_x \frac{d\pi}{d\mu}(x)$. To generate a single sample,

1. Generate $X \sim \mu$ and $U \sim \mathrm{Unif}(0, 1)$.

2. If $U \le \frac{1}{M} \frac{d\pi}{d\mu}(X)$, accept $X$; else go to 1.
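The two steps of the algorithm can be sketched as follows, assuming that $\pi$ and $\mu$ have densities so that the Radon-Nikodým derivative is a ratio of densities. The example target (the Beta(2, 2) density $6x(1-x)$ on $[0, 1]$), the uniform proposal, and the function name `rejection_sample` are illustrative assumptions; with these choices $\sup_x \pi(x)/\mu(x) = 1.5$.

```python
import random


def rejection_sample(target_pdf, proposal_pdf, proposal_sampler, m, rng):
    """Return (accepted sample, number of trials used) for a single draw from the target."""
    trials = 0
    while True:
        trials += 1
        x = proposal_sampler(rng)  # step 1: X ~ mu
        u = rng.random()           # step 1: U ~ Unif(0, 1)
        if u <= target_pdf(x) / (m * proposal_pdf(x)):  # step 2: accept, else repeat
            return x, trials


def beta22_pdf(x):
    """Assumed example target: Beta(2, 2) density on [0, 1]."""
    return 6.0 * x * (1.0 - x)


rng = random.Random(0)
M = 1.5  # sup of beta22_pdf(x) / 1.0, attained at x = 0.5
results = [
    rejection_sample(beta22_pdf, lambda x: 1.0, lambda r: r.random(), M, rng)
    for _ in range(20_000)
]
samples = [x for x, _ in results]
total_trials = sum(t for _, t in results)
print("sample mean:", sum(samples) / len(samples))        # Beta(2, 2) has mean 0.5
print("acceptance rate:", len(samples) / total_trials)    # should be near 1 / M
```

The observed acceptance rate gives a quick sanity check on the choice of $M$: it should be close to $1/M$, here about $0.667$.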

The rejection sampling algorithm is given in Algorithm 2.1. The validity of this algorithm can be verified by considering the distribution of the accepted samples. Using Bayes' theorem,

$$P(X \in dx \mid A) = \frac{\mu(dx) P(A \mid x)}{P(A)} = \mu(dx) \frac{1}{M} \frac{d\pi}{d\mu}(x) \Big/ \frac{1}{M} = \pi(dx). \tag{2.3}$$

One advantage of rejection sampling is that we can implement it even when we know $\pi$ and $\mu$ only up to some proportionality constants $Z_\pi$ and $Z_\mu$, that is, when $\pi = \widehat{\pi}/Z_\pi$, $\mu = \widehat{\mu}/Z_\mu$ and we only know $\widehat{\pi}$ and $\widehat{\mu}$. It is easy to check that one can perform the steps (i) and (ii) of the rejection sampling method for any $M \ge \sup_x \frac{d\widehat{\pi}}{d\widehat{\mu}}(x)$ using $\frac{d\widehat{\pi}}{d\widehat{\mu}}$ instead of $\frac{d\pi}{d\mu}$, and justification of this modification would follow from similar steps to those in (2.3). Also, in that case, the acceptance probability would be $\frac{1}{M} \frac{Z_\pi}{Z_\mu}$. Finally, when $\pi$ and $\mu$ have densities (denoted as $\pi$ and $\mu$ also) with respect to a common dominating measure, then the Radon-Nikodým derivative $\frac{d\pi}{d\mu}(x)$ becomes equal to $\frac{\pi(x)}{\mu(x)}$.
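The unnormalized variant can be illustrated with an assumed example that is not taken from the text: the target is the standard normal known only through $\exp(-x^2/2)$ (so $Z_\pi = \sqrt{2\pi}$), and the proposal is the standard Cauchy known only through $1/(1+x^2)$ (so $Z_\mu = \pi$). The ratio of the two unnormalized densities is maximized at $x = \pm 1$, giving the bound $2/\sqrt{e}$. All names in the sketch are hypothetical.

```python
import math
import random


def unnormalized_target(x):
    """exp(-x^2 / 2): standard normal density without its constant sqrt(2 * pi)."""
    return math.exp(-0.5 * x * x)


def unnormalized_proposal(x):
    """1 / (1 + x^2): standard Cauchy density without its constant pi."""
    return 1.0 / (1.0 + x * x)


def sample_cauchy(rng):
    """Draw from the standard Cauchy distribution by inversion."""
    return math.tan(math.pi * (rng.random() - 0.5))


# Supremum of unnormalized_target / unnormalized_proposal, attained at x = +/- 1.
M_HAT = 2.0 / math.sqrt(math.e)


def rejection_sample_unnormalized(rng):
    """Return (accepted sample, number of trials used)."""
    trials = 0
    while True:
        trials += 1
        x = sample_cauchy(rng)
        if rng.random() * M_HAT * unnormalized_proposal(x) <= unnormalized_target(x):
            return x, trials


rng = random.Random(0)
results = [rejection_sample_unnormalized(rng) for _ in range(20_000)]
observed = len(results) / sum(t for _, t in results)
predicted = math.sqrt(2.0 * math.pi) / (math.pi * M_HAT)  # (1 / M) * Z_pi / Z_mu
print(f"observed acceptance rate {observed:.3f}, predicted {predicted:.3f}")
```

Neither normalizing constant is used by the sampler itself; they appear only in the final line, to compare the observed acceptance rate with the value predicted by the formula for the acceptance probability.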

The drawback of rejection sampling is that, in practice, a rejection-based procedure is usually not viable when $X$ is high-dimensional, since $P(A)$ gets smaller and more computation is required to evaluate acceptance probabilities as the dimension increases. In the literature there exist approaches to improve the computational efficiency of rejection sampling. For example, assuming the densities exist, when it is difficult to compute $\pi(x)$, tests like $u \le \frac{1}{M} \frac{\pi(x)}{\mu(x)}$ can be slow to evaluate. In this case, one may use a squeezing function $s : \mathcal{X} \to [0, \infty)$ such that $\frac{s(x)}{\mu(x)}$ is cheap to evaluate and $\frac{s(x)}{\pi(x)}$ is tightly bounded from above by 1. For such an $s$, not only would $u \le \frac{1}{M} \frac{s(x)}{\mu(x)}$ guarantee $u \le \frac{1}{M} \frac{\pi(x)}{\mu(x)}$, hence acceptance, but also, if $u \le \frac{1}{M} \frac{\pi(x)}{\mu(x)}$, then $u \le \frac{1}{M} \frac{s(x)}{\mu(x)}$ would hold with high probability. Therefore, in case of acceptance, evaluation of $\frac{\pi(x)}{\mu(x)}$ would largely be avoided by checking $u \le \frac{1}{M} \frac{s(x)}{\mu(x)}$ first. In Marsaglia [1977], the author proposed to squeeze $\pi$ from above and below by $\mu$ and $s$ respectively, where $\mu$ is easy to sample from and $s$ is easy to evaluate. There are also adaptive methods to squeeze $\pi$ from both below and above; they involve an adaptive scheme to gradually modify $\mu$ and $s$ from the samples that have already been obtained [Gilks, 1992; Gilks et al., 1995; Gilks and Wild, 1992].
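A squeezing test can be sketched as below. This is a hedged illustration, not the scheme of any of the cited papers: the target is assumed to be the standard normal density, the proposal the standard Cauchy density, and the squeezing function is the assumed lower bound $s(x) = \max(0, 1 - x^2/2)/\sqrt{2\pi}$, which sits below the normal density because $e^{-t} \ge 1 - t$. The counters record how often the target density actually has to be evaluated.

```python
import math
import random

SQRT_2PI = math.sqrt(2.0 * math.pi)
M = math.sqrt(2.0 * math.pi / math.e)  # sup of target_pdf / proposal_pdf, attained at x = +/- 1


def target_pdf(x):
    """Standard normal density; stands in for a density that is costly to evaluate."""
    return math.exp(-0.5 * x * x) / SQRT_2PI


def squeeze(x):
    """Cheap lower bound on target_pdf, from exp(-t) >= 1 - t."""
    return max(0.0, 1.0 - 0.5 * x * x) / SQRT_2PI


def proposal_pdf(x):
    """Standard Cauchy density."""
    return 1.0 / (math.pi * (1.0 + x * x))


def squeezed_rejection_sample(rng, counters):
    """Draw one sample, trying the cheap squeeze test before the full test."""
    while True:
        counters["trials"] += 1
        x = math.tan(math.pi * (rng.random() - 0.5))  # Cauchy draw by inversion
        threshold = rng.random() * M * proposal_pdf(x)
        if threshold <= squeeze(x):      # cheap test: acceptance is already guaranteed
            return x
        counters["target_evals"] += 1    # only now is the target density needed
        if threshold <= target_pdf(x):
            return x


rng = random.Random(0)
counters = {"trials": 0, "target_evals": 0}
samples = [squeezed_rejection_sample(rng, counters) for _ in range(20_000)]
print("trials:", counters["trials"], "target evaluations:", counters["target_evals"])
```

The accepted samples have the same distribution as in plain rejection sampling, since a sample accepted by the squeeze test would also have passed the full test; only the number of evaluations of the target density changes.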
