, (5.16)
as in the building of a standard emulator as given in Chapter 3.
Now, to find the distribution of the derivatives of the simulator conditional only on the function output in the training data, the next step is to take the product of (5.11) and (5.15) and then integrate out $\beta$. This results in:
˜
Finally, integrating out $\sigma^2$ leaves us with, conditional on $\Theta$, a t process with $n - q$ degrees of freedom. The posterior mean is $\tilde{m}^{**}(x, d)$, which can be used as a fast approximation to the derivative of $\eta(x)$ with respect to input $d$. The posterior covariance between the derivative of $\eta(x_i)$ with respect to input $d$ and the derivative of $\eta(x_j)$ with respect to input $d$ is $\hat{\sigma}^2\tilde{c}^{**}(x, d)$ where
The covariance function includes a matrix of roughness parameters, $\Theta$. As when emulating the response, we cannot analytically integrate out $\Theta$ from the posterior distribution. In this chapter, as throughout this thesis, we choose to fix $\Theta$, having estimated its value from the training data.
5.2.2 The general case: emulating model response and derivatives
If we can afford to run an adjoint model, or derivative information has already been generated, perhaps as a result of some sensitivity analysis, we can include that information when building an emulator of derivatives. The framework is similar to that of Chapter 4; the difference is that in this chapter our goal is to emulate derivatives, not the model response. Given the similarities in setup with Chapter 4 and the parallel objectives with Section 5.2.1, in this section we present the methodology for the general case, in which both derivatives and model response are available for emulating model derivatives. So far in this thesis we have treated emulating model response and emulating derivatives separately, with and without derivative information in the training data. In this section we bring these situations together and present the general case for emulating model response and derivatives, with or without derivatives in the training data. As this section is effectively a summary of methods described earlier in this thesis, some repetition is inevitable.
We begin by describing the uncertainty about the simulator output by a Gaussian process. Assuming $\eta(\cdot)$ is differentiable everywhere, we can proceed by modelling the derivatives of $\eta(\cdot)$ also by a Gaussian process:
We specify the prior mean for the function output as
$$E[\eta(x) \mid \beta] = h(x)^T\beta, \tag{5.20}$$
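and the prior mean for the derivative is:
$$E\left[\frac{\partial \eta(x)}{\partial x^{(d)}} \,\middle|\, \beta\right] = \frac{\partial}{\partial x^{(d)}} h(x)^T\beta. \tag{5.21}$$
We bring these functions together in $\tilde{h}(x, d)$, which is defined as:
$$\tilde{h}(x, d)^T\beta = \begin{cases} h(x)^T\beta & \text{for } d = 0, \\ \dfrac{\partial}{\partial x^{(d)}} h(x)^T\beta & \text{for } d \neq 0, \end{cases}$$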
where $d \in \{0, 1, \ldots, p\}$. The location in the input space is represented by $x$, and the value of $d$ determines whether we are interested in the model response or in a derivative at that point. For example, $(x_i, d = 0)$ refers to the model response at point $i$ in the input space, while $(x_j, d = 1)$ refers to the derivative with respect to input 1 at point $j$. The row vector $h(x)^T$, of dimension $1 \times q$, comprises known, differentiable functions of $x$, and $\beta$ is a $q \times 1$ vector of unknown coefficients. We choose the form of $h(\cdot)$ based on our prior beliefs about $\eta(\cdot)$.
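As an illustration of this definition, the following minimal sketch assumes a linear prior mean, $h(x) = (1, x_1, \ldots, x_p)^T$. Both this choice of $h(\cdot)$ and the function name `h_tilde` are assumptions made for the example; any known, differentiable basis could be used in the same way.

```python
import numpy as np

def h_tilde(x, d):
    """Basis vector for the response (d = 0) or for the derivative
    with respect to input d (d = 1, ..., p).

    Assumes the linear basis h(x) = (1, x_1, ..., x_p)^T, so the
    derivative with respect to x_d is the unit vector with a 1 in
    position d (position 0 holds the constant term).
    """
    x = np.asarray(x, dtype=float)
    if d == 0:
        return np.concatenate(([1.0], x))
    out = np.zeros(x.size + 1)
    out[d] = 1.0
    return out

# Response at a point, then the derivative with respect to input 2.
print(h_tilde([0.3, 0.7], 0))
print(h_tilde([0.3, 0.7], 2))
```

With this basis, $q = p + 1$, and a derivative row of the regression matrix carries information only about the coefficient of the differentiated input.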
The covariances between $\eta(x_i)$ and $\eta(x_j)$ are defined, for some twice-differentiable correlation function $c(\cdot, \cdot)$, as:
$$\operatorname{Cov}[\eta(x_i), \eta(x_j)] = \sigma^2 c(x_i, x_j), \tag{5.22}$$
and covariances between derivatives are:
As we are interested in both the model response and derivatives, we require the correlations between points, between derivatives and points, and between derivatives themselves. These correlations are all incorporated in $\tilde{c}\{(x_i, d_i), (x_j, d_j)\}$:
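Assuming the usual construction, in which correlations involving derivatives are obtained by differentiating $c(\cdot, \cdot)$ with respect to the relevant inputs, the combined correlation function can be sketched as follows. This is the standard form for a differentiable Gaussian process and is offered as a guide, not as a transcription of the original equation.

```latex
\tilde{c}\{(x_i, d_i), (x_j, d_j)\} =
\begin{cases}
  c(x_i, x_j) & d_i = 0,\ d_j = 0, \\[4pt]
  \dfrac{\partial}{\partial x_i^{(d_i)}}\, c(x_i, x_j) & d_i \neq 0,\ d_j = 0, \\[8pt]
  \dfrac{\partial}{\partial x_j^{(d_j)}}\, c(x_i, x_j) & d_i = 0,\ d_j \neq 0, \\[8pt]
  \dfrac{\partial^2}{\partial x_i^{(d_i)}\, \partial x_j^{(d_j)}}\, c(x_i, x_j) & d_i \neq 0,\ d_j \neq 0.
\end{cases}
```

The last case is the correlation between two derivatives, which is why $c(\cdot, \cdot)$ must be twice differentiable.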
A common form of correlation function is the infinitely differentiable Gaussian form $c(x_i, x_j) = \exp\{-(x_i - x_j)^T\Theta(x_i - x_j)\}$, where $\Theta$ is a diagonal matrix of positive smoothness parameters $\theta_k$, $k \in \{1, \ldots, p\}$, and $p$ is the number of inputs. While the Gaussian form is a popular choice of correlation function, it may not be suitable for every simulator; we discuss alternative correlation functions in Chapter 4.
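For the Gaussian form, the derivatives needed for correlations involving derivative information are available in closed form. The sketch below is one way they might be coded; the function name `c_tilde`, the argument order, and the representation of $\Theta$ by the vector of its diagonal entries are assumptions made for the example.

```python
import numpy as np

def c_tilde(xi, di, xj, dj, theta):
    """Correlation between (xi, di) and (xj, dj) under the Gaussian form
    c(xi, xj) = exp(-sum_k theta_k (xi_k - xj_k)^2).

    di, dj = 0 selects the response; di, dj = k >= 1 selects the
    derivative with respect to input k. theta is the diagonal of Theta.
    """
    xi, xj, theta = (np.asarray(v, dtype=float) for v in (xi, xj, theta))
    r = xi - xj
    c = np.exp(-np.sum(theta * r**2))
    if di == 0 and dj == 0:
        return c
    if dj == 0:                      # derivative with respect to xi only
        a = di - 1
        return -2.0 * theta[a] * r[a] * c
    if di == 0:                      # derivative with respect to xj only
        b = dj - 1
        return 2.0 * theta[b] * r[b] * c
    a, b = di - 1, dj - 1            # mixed second derivative
    return (2.0 * theta[a] * (a == b) - 4.0 * theta[a] * theta[b] * r[a] * r[b]) * c

# Compare the analytic first derivative with a central finite difference.
x, xp, th = np.array([0.3, -0.2]), np.array([0.1, 0.4]), np.array([1.5, 0.7])
step = np.array([1e-6, 0.0])
fd = (c_tilde(x + step, 0, xp, 0, th) - c_tilde(x - step, 0, xp, 0, th)) / 2e-6
print(fd, c_tilde(x, 1, xp, 0, th))
```

The finite-difference comparison is a cheap safeguard worth keeping when a different correlation function is substituted, since sign errors in the cross terms are easy to make.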
We assume that our prior information about β and σ2 will be weak, and so for the prior distribution use
$$p(\beta, \sigma^2) \propto \frac{1}{\sigma^2}. \tag{5.24}$$
In summary our prior beliefs take the form:
$$\tilde{\eta}(x, d) \mid \beta, \sigma^2, \Theta \sim GP\left(\tilde{h}(x, d)^T\beta,\; \sigma^2\tilde{c}\{(x, d), (x, d)\}\right), \tag{5.25}$$
where $d \in \{0, 1, \ldots, p\}$.
The next stage is to create a design: a set of points in the input space at which the simulator or adjoint is to be run to create the training data. We are not restricted to a design that has either the model response at every point or all first derivatives at a point. Expert knowledge may help to identify at which points in the design space the model response would be beneficial and at which points the derivatives with respect to particular inputs are most informative.
Having specified the location of the design points and determined at which points we require function output and at which points we require first derivatives, we arrange this information in $\tilde{D} = \{(x_k, d_k)\}$. Here $x_k$ is the location in the design, and $d_k$ determines whether at point $x_k$ we require function output or a first derivative with respect to one of the inputs. The simulator, $\eta(\cdot)$, or the adjoint of the simulator, $\tilde{\eta}(\cdot)$, depending on the value of each $d_k$, is then run at each of the input configurations. This results in our training data:
$$\tilde{y} = \{\tilde{\eta}(x_1, d_1), \tilde{\eta}(x_2, d_2), \ldots, \tilde{\eta}(x_{\tilde{n}}, d_{\tilde{n}})\} = \tilde{\eta}(\tilde{D}), \tag{5.26}$$
a vector of length $\tilde{n}$.
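To make the mixed design concrete, the sketch below builds a small design and the corresponding training vector. The toy simulator `eta`, its analytic gradient `eta_grad` (standing in for an adjoint model), and the particular design points are all hypothetical and chosen only for illustration.

```python
import numpy as np

def eta(x):
    """Hypothetical two-input simulator."""
    return np.sin(x[0]) + x[1] ** 2

def eta_grad(x):
    """Analytic gradient of the toy simulator, standing in for an adjoint run."""
    return np.array([np.cos(x[0]), 2.0 * x[1]])

def eta_tilde(x, d):
    """Response if d = 0, derivative with respect to input d otherwise."""
    x = np.asarray(x, dtype=float)
    return eta(x) if d == 0 else eta_grad(x)[d - 1]

# Each design entry is a pair (x_k, d_k). Response and derivatives need not
# be observed at the same locations, nor for every input.
design = [
    ((0.0, 0.0), 0),   # response
    ((0.0, 0.0), 1),   # derivative with respect to input 1 at the same point
    ((1.0, 0.5), 0),   # response
    ((1.0, 0.5), 2),   # derivative with respect to input 2
    ((2.0, 1.0), 0),   # response only
]

y_tilde = np.array([eta_tilde(x, d) for x, d in design])
print(y_tilde.shape)
```

Storing the design as a list of pairs keeps the bookkeeping for the later correlation matrix simple: entry $(k, l)$ of that matrix depends only on the $k$th and $l$th pairs.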
We begin the process of deriving the posterior process by writing the distribution of the training data, $\tilde{y}$, conditional on the parameters $\beta$ and $\sigma^2$. The training data can consist of derivatives and model response, and so from (5.25) and (5.26) we get:
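$$\tilde{y} \mid \beta, \sigma^2, \Theta \sim N(\tilde{H}\beta, \sigma^2\tilde{A}), \tag{5.27}$$
where $\tilde{H} = [\tilde{h}(x_1, d_1), \ldots, \tilde{h}(x_{\tilde{n}}, d_{\tilde{n}})]^T$ and $\tilde{A}$ is the $\tilde{n} \times \tilde{n}$ matrix of correlations between points, between points and derivatives, and between derivatives themselves, in the training data: $\tilde{A} = \tilde{c}(\tilde{D}, \tilde{D})$. Now we wish to update (5.25), the distribution of $\tilde{\eta}(\cdot)$, and we partition in the following way: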
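$$\begin{pmatrix} \tilde{\eta}(x, d) \\ \tilde{y} \end{pmatrix} \Bigg|\, \beta, \sigma^2, \Theta \sim N\left( \begin{pmatrix} \tilde{h}(x, d)^T\beta \\ \tilde{H}\beta \end{pmatrix},\; \sigma^2 \begin{pmatrix} \tilde{c}\{(x, d), (x, d)\} & \tilde{t}(x, d)^T \\ \tilde{t}(x, d) & \tilde{A} \end{pmatrix} \right),$$
where $\tilde{t}(x, d)^T$ consists of the correlations between the output or derivative we are emulating and the training data:
$$\tilde{t}(x, d)^T = \left(\operatorname{Corr}[\tilde{\eta}(x, d), \tilde{\eta}(x_1, d_1)], \ldots, \operatorname{Corr}[\tilde{\eta}(x, d), \tilde{\eta}(x_{\tilde{n}}, d_{\tilde{n}})]\right) = \tilde{c}\{(x, d), \tilde{D}\}. \tag{5.28}$$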
We can now use standard techniques for conditioning multivariate normal distributions to give
and $d$ can take any value in $\{0, 1, \ldots, p\}$. Equation (5.29) is the joint distribution of $\eta(\cdot)$ and its derivatives, conditional on the parameters $\beta$, $\sigma^2$ and $\Theta$.
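For orientation, the standard result for conditioning one block of a multivariate normal on another suggests that the conditional process has the following form. This is a sketch under that assumption: the symbols $\tilde{m}^{*}$ and $\tilde{c}^{*}$ are notation introduced here by analogy with $\tilde{m}^{**}$ and $\tilde{c}^{**}$, and the expressions should be checked against the corresponding standard-emulator equations in Chapter 3.

```latex
\tilde{\eta}(x, d) \mid \tilde{y}, \beta, \sigma^2, \Theta
  \sim GP\!\left(\tilde{m}^{*}(x, d),\; \sigma^2 \tilde{c}^{*}\{(x, d), (x, d)\}\right),
\\[6pt]
\tilde{m}^{*}(x, d) = \tilde{h}(x, d)^T\beta
  + \tilde{t}(x, d)^T \tilde{A}^{-1} (\tilde{y} - \tilde{H}\beta),
\\[6pt]
\tilde{c}^{*}\{(x_i, d_i), (x_j, d_j)\} = \tilde{c}\{(x_i, d_i), (x_j, d_j)\}
  - \tilde{t}(x_i, d_i)^T \tilde{A}^{-1}\, \tilde{t}(x_j, d_j).
```

The structure is identical to the standard emulator; only the ingredients $\tilde{h}$, $\tilde{t}$ and $\tilde{A}$ change to account for derivative information.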
The next part of building our general emulator is the same as building a standard emulator, as given in Chapter 3 and repeated here. We apply Bayes' theorem with (5.24) and (5.27), which results in a joint normal inverse-gamma posterior distribution for $(\beta, \sigma^2)$:
f (β, σ2|˜y, Θ) ∝ σ2
and integrating (5.30) with respect to β gives us:
$$\sigma^2 \mid \tilde{y}, \Theta \sim \text{InvGam}\left(\frac{\tilde{n} - q}{2},\; \frac{(\tilde{n} - q - 2)\hat{\sigma}^2}{2}\right), \tag{5.34}$$
as in the building of a standard emulator.
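The quantities entering (5.34) can be computed directly from the training data. The sketch below assumes the usual generalised least squares forms, $\hat{\beta} = (\tilde{H}^T\tilde{A}^{-1}\tilde{H})^{-1}\tilde{H}^T\tilde{A}^{-1}\tilde{y}$ and $\hat{\sigma}^2 = (\tilde{y} - \tilde{H}\hat{\beta})^T\tilde{A}^{-1}(\tilde{y} - \tilde{H}\hat{\beta})/(\tilde{n} - q - 2)$; these expressions, the function name `gls_estimates`, and the small data set are assumptions made for the example.

```python
import numpy as np

def gls_estimates(H, A, y):
    """Return beta_hat, sigma2_hat and the inverse-gamma (shape, scale)
    implied by the assumed generalised least squares forms."""
    n, q = H.shape
    A_inv_H = np.linalg.solve(A, H)          # A^{-1} H without forming A^{-1}
    A_inv_y = np.linalg.solve(A, y)          # A^{-1} y
    beta_hat = np.linalg.solve(H.T @ A_inv_H, H.T @ A_inv_y)
    resid = y - H @ beta_hat
    sigma2_hat = resid @ np.linalg.solve(A, resid) / (n - q - 2)
    shape = (n - q) / 2.0
    scale = (n - q - 2) * sigma2_hat / 2.0
    return beta_hat, sigma2_hat, shape, scale

# Hypothetical data: six observations, linear basis (q = 2), and an identity
# correlation matrix, in which case beta_hat reduces to ordinary least squares.
x = np.arange(6.0)
H = np.column_stack([np.ones_like(x), x])
A = np.eye(6)
y = np.array([1.0, 3.1, 4.9, 7.2, 8.8, 11.0])

beta_hat, sigma2_hat, shape, scale = gls_estimates(H, A, y)
print(beta_hat, sigma2_hat, shape, scale)
```

Solving linear systems with `np.linalg.solve` avoids forming the inverse of the correlation matrix explicitly, which matters because correlation matrices built from smooth correlation functions are often ill-conditioned.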
Now to find the joint distribution of η(·) and the derivatives of η(·), conditional only on the training data, the next step is to take the product of (5.29) and (5.33) and then integrate out β. This results in:
˜
Finally, integrating out $\sigma^2$ leaves us with, conditional on $\Theta$, a t process with $\tilde{n} - q$ degrees of freedom. The posterior mean is $\tilde{m}^{**}(x, d)$, which can be used as a fast approximation to the derivative of $\eta(x)$ with respect to input $d$ if $d \neq 0$, and as a fast approximation to $\eta(x)$ if $d = 0$. The posterior covariance between the derivatives of $\eta(x_i)$ and $\eta(x_j)$ with respect to input $d$, or between the model responses $\eta(x_i)$ and $\eta(x_j)$, depending on the value of $d$, is $\hat{\sigma}^2\tilde{c}^{**}(x, d)$ where
The covariance function includes a matrix of roughness parameters, $\Theta$. As when emulating the response, we cannot analytically integrate out $\Theta$ from the posterior distribution. In this chapter, as throughout this thesis, we choose to fix $\Theta$, having estimated its value from the training data. Further discussion of methods to estimate $\Theta$ is given in Chapter 3.
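The whole procedure can be illustrated end to end for a single input. The sketch below assumes the posterior mean takes the usual form $\tilde{m}^{**}(x, d) = \tilde{h}(x, d)^T\hat{\beta} + \tilde{t}(x, d)^T\tilde{A}^{-1}(\tilde{y} - \tilde{H}\hat{\beta})$, a constant prior mean $h(x) = 1$, a Gaussian correlation function, and a fixed roughness value `THETA`. The toy simulator $\sin(x)$, the design, and all names are hypothetical.

```python
import numpy as np

THETA = 1.0  # assumed roughness parameter, treated as fixed

def c_tilde(xi, di, xj, dj, theta=THETA):
    """Gaussian correlation and its derivatives for one input (p = 1)."""
    r = xi - xj
    c = np.exp(-theta * r**2)
    if di == 0 and dj == 0:
        return c
    if di == 1 and dj == 0:
        return -2.0 * theta * r * c
    if di == 0 and dj == 1:
        return 2.0 * theta * r * c
    return (2.0 * theta - 4.0 * theta**2 * r**2) * c

def h_tilde(x, d):
    """Constant prior mean h(x) = 1, whose derivative is 0."""
    return np.array([1.0 if d == 0 else 0.0])

# Design: response at seven points, derivative at the two end points.
design = [(x, 0) for x in np.linspace(0.0, 3.0, 7)] + [(0.0, 1), (3.0, 1)]
y = np.array([np.sin(x) if d == 0 else np.cos(x) for x, d in design])

H = np.array([h_tilde(x, d) for x, d in design])
A = np.array([[c_tilde(xi, di, xj, dj) for xj, dj in design] for xi, di in design])

# Generalised least squares estimate of beta and the weighted residuals.
beta_hat = np.linalg.solve(H.T @ np.linalg.solve(A, H), H.T @ np.linalg.solve(A, y))
weights = np.linalg.solve(A, y - H @ beta_hat)

def posterior_mean(x, d):
    """Emulator prediction of the response (d = 0) or derivative (d = 1)."""
    t = np.array([c_tilde(x, d, xk, dk) for xk, dk in design])
    return float(h_tilde(x, d) @ beta_hat + t @ weights)

# Compare predictions at an untried input with the true values.
print(posterior_mean(1.2, 0), np.sin(1.2))
print(posterior_mean(1.2, 1), np.cos(1.2))
```

Because the same function handles $d = 0$ and $d \neq 0$, predicting a derivative costs no more than predicting the response, and the prediction reproduces every training value, whether response or derivative, at its design point.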