
3.1 Background

3.1.3 Gaussian processes

Parametric models, which characterise data using functions of finite collections of parameters, are useful when an underlying mechanism for the process that generated the data is hypothesised or known. When the mechanism is unclear, non-parametric models, for which the number of parameters grows indefinitely with the amount of available data, can help to identify meaningful patterns in the data. Rasmussen and Williams (2006) [196], p. 166, provide a helpful discussion of parametric and non-parametric models. A useful type of non-parametric model is the Gaussian process. A Gaussian process is a flexible random surface that can be regressed onto data. Gaussian processes are infinite-dimensional generalisations of multivariate normal distributions, and have many of the advantages of analytic tractability that accompany normal distributions.

1 In traditional survival analysis texts, the hazard is denoted $\lambda$, rather than $h$. In epidemiological literature, however, the symbol $\lambda$ often corresponds to the force of infection, which is why $\lambda$ has been reserved for the force function here.

Formally, a Gaussian process is defined as a collection of random variables, any finite number of which follow a multivariate normal distribution [196]. Following [196], a Gaussian process is fully specified by its mean function $m(x)$ and covariance function $k(x, x')$, and is denoted

$$f(x) \sim \mathcal{GP}\big(m(x), k(x, x')\big). \tag{3.13}$$

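The Gaussian process at any finite collection of points $\mathbf{x} = (x_1, x_2, \ldots, x_n)$ follows the multivariate normal distribution

$$\mathrm{MVN}\left( \big(m(x_1), m(x_2), \ldots, m(x_n)\big)^{T},\; \begin{bmatrix} k(x_1, x_1) & k(x_1, x_2) & \cdots & k(x_1, x_n) \\ k(x_2, x_1) & k(x_2, x_2) & \cdots & k(x_2, x_n) \\ \vdots & \vdots & \ddots & \vdots \\ k(x_n, x_1) & k(x_n, x_2) & \cdots & k(x_n, x_n) \end{bmatrix} \right).$$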

The covariance function $k$ must be positive semidefinite and symmetric in its arguments, analogous to the constraints on the covariance matrix of a multivariate normal distribution [196]. Different choices of covariance function yield processes that differ in smoothness and in how rapidly they vary with distance. The squared exponential (SE) covariance function is a simple yet versatile example of a covariance function. It is defined as

$$k_{\mathrm{SE}}(d) = \exp\left(-\frac{d^2}{2l^2}\right) \tag{3.14}$$

where $d$ is the distance $|x - x'|$ between any two input points $x$ and $x'$, and $l$ defines the characteristic length scale of the process, which is roughly the distance that must be travelled for the function value to change ‘significantly’. Gaussian processes with an SE covariance function are smooth, with mean-square derivatives of all orders [196]. This may be contrasted with the exponential covariance function, which has form

$$k_{\mathrm{E}}(d) = \exp\left(-\frac{d}{l}\right). \tag{3.15}$$

Gaussian processes with the exponential covariance function are jagged (see Fig 3.1): they are mean-square continuous, but not mean-square differentiable.
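The contrast between the two covariance functions can be seen by drawing from the finite-dimensional multivariate normal distribution directly. The following is a minimal sketch using only the Python standard library, not the procedure used to produce Fig 3.1. The grid, the diagonal jitter of 1e-6 (added for numerical stability), the seed, and the helper names `cov_matrix`, `cholesky`, `draw_gp` and `roughness` are all illustrative assumptions.

```python
import math
import random


def k_se(d, length=1.0):
    """Squared exponential covariance, Eq 3.14."""
    return math.exp(-d**2 / (2.0 * length**2))


def k_exp(d, length=1.0):
    """Exponential covariance, Eq 3.15."""
    return math.exp(-d / length)


def cov_matrix(xs, kernel, jitter=1e-6):
    """Matrix with entries k(|x_i - x_j|); the jitter on the diagonal is an
    assumption made here to keep the Cholesky factorisation stable."""
    n = len(xs)
    return [[kernel(abs(xs[i] - xs[j])) + (jitter if i == j else 0.0)
             for j in range(n)] for i in range(n)]


def cholesky(a):
    """Lower-triangular L with L L^T = a, for symmetric positive definite a."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low


def draw_gp(xs, kernel, rng):
    """One zero-mean draw f = L z with z ~ N(0, I), so that cov(f) = L L^T."""
    low = cholesky(cov_matrix(xs, kernel))
    z = [rng.gauss(0.0, 1.0) for _ in xs]
    return [sum(low[i][k] * z[k] for k in range(i + 1)) for i in range(len(xs))]


def roughness(f):
    """Mean absolute increment between neighbouring grid points."""
    return sum(abs(b - a) for a, b in zip(f, f[1:])) / (len(f) - 1)


rng = random.Random(0)
grid = [0.2 * i for i in range(51)]  # x from 0 to 10
f_se = draw_gp(grid, k_se, rng)
f_exp = draw_gp(grid, k_exp, rng)
print(roughness(f_se), roughness(f_exp))
```

With the length scale fixed at 1 for both kernels, the draw from the exponential covariance function should show markedly larger increments between neighbouring grid points than the draw from the SE covariance function, which is the jaggedness described above.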

The SE covariance function is a limiting case of both the rational quadratic (RQ) and the Matérn covariance functions. The RQ covariance function has form

$$k_{\mathrm{RQ}}(d) = \left(1 + \frac{d^2}{2\alpha l^2}\right)^{-\alpha} \tag{3.16}$$

which approaches the SE covariance function as $\alpha \to \infty$. The RQ covariance function may be interpreted as an infinite sum of SE covariance functions with different length scales, and also has mean-square differentiability of all orders, regardless of $\alpha$ [196]. The Matérn covariance function has form

$$k_{\mathrm{Matérn}}(d) = \frac{2^{1-\nu}}{\Gamma(\nu)} \left(\frac{\sqrt{2\nu}\,d}{l}\right)^{\nu} K_\nu\!\left(\frac{\sqrt{2\nu}\,d}{l}\right) \tag{3.17}$$

with $\nu$ and $l$ both positive and $K_\nu$ a modified Bessel function of the second kind. The argument $\nu$ is a smoothness parameter, such that processes with the Matérn covariance function are $k$-times mean-square differentiable if and only if $\nu > k$ [196]. The SE covariance function is obtained when $\nu \to \infty$. The exponential covariance function is obtained when $\nu = 1/2$. Figure 3.1 depicts draws from Gaussian processes with the squared exponential, exponential, and RQ covariance functions.
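The limiting relationships among Eqs 3.14 to 3.17 can be checked numerically. The sketch below is illustrative only. It assumes the closed form $K_{1/2}(z) = \sqrt{\pi/(2z)}\,e^{-z}$ for the Bessel function at $\nu = 1/2$, uses a large but finite value $\alpha = 10^6$ to stand in for $\alpha \to \infty$, and evaluates the kernels at an arbitrary set of distances; the function names are hypothetical.

```python
import math


def k_se(d, length=1.0):
    """Squared exponential covariance, Eq 3.14."""
    return math.exp(-d**2 / (2.0 * length**2))


def k_exp(d, length=1.0):
    """Exponential covariance, Eq 3.15."""
    return math.exp(-d / length)


def k_rq(d, alpha, length=1.0):
    """Rational quadratic covariance, Eq 3.16."""
    return (1.0 + d**2 / (2.0 * alpha * length**2)) ** (-alpha)


def bessel_k_half(z):
    """Closed form K_{1/2}(z) = sqrt(pi / (2 z)) * exp(-z), valid for z > 0."""
    return math.sqrt(math.pi / (2.0 * z)) * math.exp(-z)


def k_matern_half(d, length=1.0):
    """Matérn covariance, Eq 3.17, evaluated term by term at nu = 1/2."""
    nu = 0.5
    if d == 0.0:
        return 1.0  # value in the limit d -> 0
    z = math.sqrt(2.0 * nu) * d / length
    return 2.0 ** (1.0 - nu) / math.gamma(nu) * z**nu * bessel_k_half(z)


distances = [0.1, 0.5, 1.0, 2.0, 4.0]

# RQ with a very large alpha should be close to SE.
rq_gap = max(abs(k_rq(d, alpha=1e6) - k_se(d)) for d in distances)

# Matérn with nu = 1/2 should agree with the exponential kernel.
matern_gap = max(abs(k_matern_half(d) - k_exp(d)) for d in distances)

print(rq_gap, matern_gap)
```

Both gaps should be small: the first shrinks as $\alpha$ grows, while the second is zero up to floating-point rounding because the two expressions are algebraically identical.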

When developing a Gaussian process model, a covariance function is normally chosen a priori by the modeller to match the anticipated ‘character’ of the underlying process to be described, such as its smoothness and rough rate of variation.

Gaussian process regression

In practice, one often wishes to fit a Gaussian process $f(\cdot)$ with covariance function $k(\cdot,\cdot)$ to $n$ observed data points $\mathbf{y}$ taken at locations $\mathbf{x}$. For example, one might wish to model how temperature varies across a geographic region, given readings $\mathbf{y}$ from a set of $n$ weather stations at coordinates $\mathbf{x}$. One way to approach this problem is to model the observations as a Gaussian process $f(\mathbf{x})$ with additional noise:

$$\mathbf{y} = f(\mathbf{x}) + \boldsymbol{\varepsilon} \tag{3.18}$$

where $\boldsymbol{\varepsilon}$ is a vector of the measurement errors at each site. If the measurement errors are assumed to be independent and identically distributed draws from a normal distribution with mean 0 and variance $\sigma_n^2$, the joint distribution of the observations $\mathbf{y}$ and a set of $m$ unobserved temperatures $\mathbf{f}_*$ at locations $\mathbf{x}_*$ is

$$\begin{bmatrix} \mathbf{y} \\ \mathbf{f}_* \end{bmatrix} \sim \mathrm{MVN}\left(\mathbf{0},\; \begin{bmatrix} K(X,X) + \sigma_n^2 I & K(X,X_*) \\ K(X_*,X) & K(X_*,X_*) \end{bmatrix}\right) \tag{3.19}$$

where $K(X,X)$ is the $n \times n$ covariance matrix obtained by evaluating the covariance function for each pair of observation points $\mathbf{x}$, $K(X,X_*)$ is the $n \times m$ matrix of covariances between the observation points $\mathbf{x}$ and the prediction points $\mathbf{x}_*$, $K(X_*,X_*)$ is the $m \times m$ covariance matrix of the prediction points $\mathbf{x}_*$ with themselves, and $K(X_*,X) = K(X,X_*)^T$.

[Figure 3.1: three panels, each plotting $f(x)$ against $x$.]

Fig. 3.1 Five draws each from Gaussian processes with the squared exponential (SE) covariance function (Eq 3.14; top left plot), exponential covariance function (Eq 3.15; top right plot), and rational quadratic (RQ) covariance function (Eq 3.16; bottom plot) with α = 12. The length scale $l$ is fixed at 1 for all three processes, and the process mean $m(x)$ (see Eq 3.13) is fixed at 0. The processes differ primarily in smoothness: the squared exponential and RQ covariance functions yield smooth processes with mean-square differentiability of all orders, while the exponential covariance function yields a jagged process without guaranteed mean-square differentiability of any order. Choice of covariance function is normally made a priori by the modeller, depending on the anticipated qualitative behaviour of the underlying process to be described.

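The posterior distribution of $\mathbf{f}_*$, conditional on the observed data points $\mathbf{y}$, is given in [196]:

$$\mathbf{f}_* \mid X, \mathbf{y}, X_* \sim \mathrm{MVN}\big(\bar{\mathbf{f}}_*, \operatorname{cov}(\mathbf{f}_*)\big) \quad \text{where} \tag{3.20}$$

$$\bar{\mathbf{f}}_* = K(X_*,X)\,[K(X,X) + \sigma_n^2 I]^{-1}\,\mathbf{y}, \tag{3.21}$$

$$\operatorname{cov}(\mathbf{f}_*) = K(X_*,X_*) - K(X_*,X)\,[K(X,X) + \sigma_n^2 I]^{-1}\,K(X,X_*). \tag{3.22}$$

So, the posterior mean and variance of the process can be evaluated at any set of points $\mathbf{x}_*$ from just the covariance function and the observed data. This is the simplest case of Gaussian process regression.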
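The posterior mean and covariance above can be computed with a single Cholesky factorisation of $K(X,X) + \sigma_n^2 I$. The following is a minimal standard-library sketch under stated assumptions: a one-dimensional input, the SE covariance function with $l = 1$, a zero prior mean, and made-up observations. The names `gp_posterior` and `chol_solve` and the data values are hypothetical.

```python
import math


def k_se(x1, x2, length=1.0):
    """Squared exponential covariance (Eq 3.14) between two scalar inputs."""
    return math.exp(-(x1 - x2) ** 2 / (2.0 * length**2))


def cholesky(a):
    """Lower-triangular L with L L^T = a, for symmetric positive definite a."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low


def chol_solve(low, b):
    """Solve (L L^T) v = b by forward, then backward, substitution."""
    n = len(b)
    w = [0.0] * n
    for i in range(n):
        w[i] = (b[i] - sum(low[i][k] * w[k] for k in range(i))) / low[i][i]
    v = [0.0] * n
    for i in reversed(range(n)):
        v[i] = (w[i] - sum(low[k][i] * v[k] for k in range(i + 1, n))) / low[i][i]
    return v


def gp_posterior(x_obs, y_obs, x_new, noise_var, kernel=k_se):
    """Posterior mean (Eq 3.21) and covariance (Eq 3.22) at the points x_new."""
    m = len(x_new)
    # K(X, X) + sigma_n^2 I
    a = [[kernel(p, q) + (noise_var if i == j else 0.0)
          for j, q in enumerate(x_obs)] for i, p in enumerate(x_obs)]
    low = cholesky(a)
    weights = chol_solve(low, y_obs)  # [K(X,X) + sigma_n^2 I]^{-1} y
    # k_cols[j] holds column j of K(X, X*), which is also row j of K(X*, X).
    k_cols = [[kernel(p, s) for p in x_obs] for s in x_new]
    solved = [chol_solve(low, col) for col in k_cols]
    mean = [sum(c * w for c, w in zip(col, weights)) for col in k_cols]
    cov = [[kernel(x_new[i], x_new[j])
            - sum(p * q for p, q in zip(k_cols[i], solved[j]))
            for j in range(m)] for i in range(m)]
    return mean, cov


# Made-up readings at five hypothetical station coordinates.
x_obs = [1.0, 3.0, 5.0, 6.0, 8.0]
y_obs = [0.5, -0.3, 1.2, 0.8, -0.6]
x_new = [3.0, 4.0, 20.0]  # at a station, between stations, far from all stations

mean, cov = gp_posterior(x_obs, y_obs, x_new, noise_var=0.01)
for x, mu, row_i in zip(x_new, mean, range(len(x_new))):
    print(x, mu, cov[row_i][row_i])
```

In this sketch the posterior variance should be smallest at the prediction point that coincides with a station, larger between stations, and essentially equal to the prior variance of 1 at the distant point, where the posterior mean also reverts to the prior mean of 0.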

Often, however, the training data $\mathbf{y}$ are not observed directly. Instead, some process that depends on $\mathbf{y}$ is observed, and the posterior process $f$ must be inferred indirectly. This may be accomplished using a link function $\Phi$ that connects some set of direct observations $\mathbf{t}$ with the process $f(\cdot)$. In one common case, the link function defines the probability with which a particular event $\mathbf{t}$ is observed:

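$$P(T = \mathbf{t}) = \Phi(\Omega, \mathbf{y}, \mathbf{t}) \quad \text{where} \tag{3.23}$$

$$\mathbf{y} = f(\mathbf{x}) + \boldsymbol{\varepsilon} \quad \text{and} \tag{3.24}$$

$$f(\mathbf{x}) \sim \mathcal{GP}\big(0, k(x, x')\big). \tag{3.25}$$

Here, $T$ is a random variable that describes the observation process, and $\Omega$ is a finite list of additional model parameters. In this formulation, $\Phi$ is a semiparametric model, which means that it is composed of both parametric ($\Omega$) and nonparametric ($\mathbf{y}$) elements. To estimate $\mathbf{y}$ and $\Omega$, one can consider the likelihood function

$$L(\mathbf{y}, \Omega \mid \mathbf{t}) = \Phi(\Omega, \mathbf{y}, \mathbf{t}). \tag{3.26}$$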

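If it is sufficiently easy to evaluate $\Phi$, posterior estimates of both $\mathbf{y}$ and $\Omega$ may be obtained using a Metropolis-Hastings algorithm. First, a prior distribution for the parameters $\Omega$ must be specified. The prior for $\mathbf{y}$ is given by the Gaussian process, Eq 3.13. Proposals for $\mathbf{y}$ and $\Omega$ are drawn from their respective prior distributions, and the likelihood (3.26) is evaluated. Each proposal is accepted with probability $\min(1, r)$, where $r$ is the ratio of the likelihood of the proposal to the likelihood of the most recently accepted proposal; because proposals are drawn from the prior, the prior and proposal densities cancel in the acceptance ratio. If a proposal is rejected, the most recently accepted values are retained. The resulting chain eventually yields a good estimate of the posterior densities of $\mathbf{y}$ and $\Omega$.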
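The sampling scheme just described can be sketched on a toy problem. Everything specific in the example below is an assumption made for illustration and is not taken from the source: the link function $\Phi$ is a Bernoulli-logistic model for binary observations, $\Omega$ is a single intercept with a standard normal prior, the covariance function is SE with $l = 1$, the noise variance is 0.1, and the three data points are made up.

```python
import math
import random


def k_se(x1, x2, length=1.0):
    """Squared exponential covariance (Eq 3.14) between two scalar inputs."""
    return math.exp(-(x1 - x2) ** 2 / (2.0 * length**2))


def cholesky(a):
    """Lower-triangular L with L L^T = a, for symmetric positive definite a."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low


def sample_prior(low, rng):
    """Draw y ~ MVN(0, L L^T) and the hypothetical intercept omega ~ N(0, 1)."""
    z = [rng.gauss(0.0, 1.0) for _ in low]
    y = [sum(low[i][k] * z[k] for k in range(i + 1)) for i in range(len(low))]
    return y, rng.gauss(0.0, 1.0)


def log_likelihood(y, omega, t):
    """log Phi(Omega, y, t) for an assumed Bernoulli-logistic link:
    P(t_i = 1) = 1 / (1 + exp(-(omega + y_i)))."""
    total = 0.0
    for y_i, t_i in zip(y, t):
        eta = omega + y_i
        signed = eta if t_i == 1 else -eta
        total -= math.log1p(math.exp(-signed))
    return total


def metropolis_hastings(x, t, n_iter, rng, noise_var=0.1):
    """Metropolis-Hastings with proposals drawn from the prior."""
    n = len(x)
    # Prior covariance of y = f(x) + eps, following Eqs 3.24 and 3.25.
    cov = [[k_se(x[i], x[j]) + (noise_var if i == j else 0.0)
            for j in range(n)] for i in range(n)]
    low = cholesky(cov)
    y, omega = sample_prior(low, rng)
    log_l = log_likelihood(y, omega, t)
    chain, accepted = [], 0
    for _ in range(n_iter):
        y_new, omega_new = sample_prior(low, rng)
        log_l_new = log_likelihood(y_new, omega_new, t)
        # Prior and proposal densities cancel, leaving min(1, likelihood ratio).
        if rng.random() < math.exp(min(0.0, log_l_new - log_l)):
            y, omega, log_l = y_new, omega_new, log_l_new
            accepted += 1
        chain.append((list(y), omega))
    return chain, accepted / n_iter


rng = random.Random(1)
x = [0.0, 1.0, 2.0]
t = [1, 1, 0]  # made-up binary observations
chain, acc_rate = metropolis_hastings(x, t, n_iter=5000, rng=rng)
post_mean_y = [sum(state[0][i] for state in chain) / len(chain) for i in range(len(x))]
print(acc_rate, post_mean_y)
```

Drawing every proposal from the prior keeps the acceptance rule simple, but the acceptance rate can be expected to fall as the number of latent values grows or the likelihood becomes more concentrated, so this sketch is only practical for small problems.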

In addition to the detailed background on Gaussian processes and Gaussian process regression provided by Rasmussen and Williams in [196], Gelfand et al. (2003) [85] introduce a theoretical framework for fitting generalised linear models with coefficients that vary spatially or temporally according to a Gaussian process. Building upon the related work of Banerjee et al. (2003) and Banerjee and Gelfand (2006) [11], Goldstein et al. (2015) [93] identify local trends in the speed and direction of the spread of an invasion wave of the gypsy moth Lymantria dispar in the north-eastern United States.