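The general case, which applies to the emulation of both the model response and its first derivatives, with or without first-derivative information in the training data, is presented in Section 5.2.2. This methodology can be extended if second or third derivatives are available, or if we want to emulate higher derivatives. The key concept is that derivatives of Gaussian processes remain Gaussian processes, with mean and covariance functions given by the corresponding derivatives of the mean and covariance functions of the original process. Consider our prior distribution for the first derivatives of η(·), (5.25) with d ≠ 0. If we wanted to emulate the second derivatives of η(·), then (5.25) becomes

$$\frac{\partial^2 \eta(\cdot)}{\partial x_d^2} \,\Big|\, \beta, \sigma^2 \sim GP\left(\frac{\partial^2 h(\cdot)^T}{\partial x_d^2}\beta,\; \sigma^2\,\frac{\partial^4 c(\cdot,\cdot)}{\partial x_{i,d}^2\,\partial x_{j,d}^2}\right),$$

where $x_{i,d}$ and $x_{j,d}$ denote the dth elements of the first and second arguments of c(·,·). This clearly requires a four times differentiable covariance function and a twice differentiable mean function. In general, higher derivatives of η(·) can be modelled by Gaussian processes with mean

$$\mathrm{E}\left[\frac{\partial^u}{\partial x^u}\eta(x)\right] = \frac{\partial^u}{\partial x^u}h(x)^T\beta$$

and covariance between two higher derivatives given by

$$\mathrm{Cov}\left(\frac{\partial^u}{\partial x_i^u}\eta(x_i),\ \frac{\partial^u}{\partial x_j^u}\eta(x_j)\right) = \sigma^2\,\frac{\partial^{2u}}{\partial x_i^u\,\partial x_j^u}c(x_i,x_j),$$

which requires the mean function to be u times differentiable and the covariance function to be 2u times differentiable. The posterior process is then derived similarly to Section 5.2.2.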
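As a rough illustration of the principle that the derivative of a Gaussian process is itself a Gaussian process, the Python sketch below emulates the first derivative of a one-dimensional function from function output alone. It assumes a zero prior mean (so the h(x)^T β term is dropped), a squared-exponential correlation function c(x, x') = exp(-((x - x')/δ)²) with fixed δ and σ², and a small nugget for numerical stability; the function names are hypothetical and this is not the implementation used in this thesis. The cross-covariances are simply the first and second derivatives of c, as the rule above prescribes; higher derivatives would follow by differentiating c further.

```python
import numpy as np

def se_cov(x1, x2, delta):
    """Squared-exponential correlation c(x, x') = exp(-((x - x') / delta)^2)."""
    r = x1[:, None] - x2[None, :]
    return np.exp(-(r / delta) ** 2)

def d1_cov(x1, x2, delta):
    """Cov(eta'(x1), eta(x2)) = dc/dx1 = -2 r / delta^2 * c, with r = x1 - x2."""
    r = x1[:, None] - x2[None, :]
    return -2.0 * r / delta**2 * np.exp(-(r / delta) ** 2)

def d2_cov(x1, x2, delta):
    """Cov(eta'(x1), eta'(x2)) = d^2 c / dx1 dx2 = (2 / delta^2)(1 - 2 r^2 / delta^2) c."""
    r = x1[:, None] - x2[None, :]
    return 2.0 / delta**2 * (1.0 - 2.0 * r**2 / delta**2) * np.exp(-(r / delta) ** 2)

def emulate_derivative(x_train, y_train, x_new, delta, sigma2=1.0, nugget=1e-10):
    """Posterior mean and variance of eta'(x_new) given eta(x_train) = y_train.

    Zero prior mean and fixed hyperparameters are simplifying assumptions.
    """
    K = sigma2 * se_cov(x_train, x_train, delta) + nugget * np.eye(len(x_train))
    k = sigma2 * d1_cov(x_new, x_train, delta)        # shape (n_new, n_train)
    mean = k @ np.linalg.solve(K, y_train)
    prior_var = sigma2 * np.diag(d2_cov(x_new, x_new, delta))
    # Subtract diag(k K^{-1} k^T) from the prior derivative variance
    var = prior_var - np.einsum("ij,ji->i", k, np.linalg.solve(K, k.T))
    return mean, var

# Usage: train on sin(x) only, then predict its derivative at two interior points
x_train = np.linspace(0.0, 2.0 * np.pi, 12)
mean, var = emulate_derivative(x_train, np.sin(x_train), np.array([1.0, 2.5]), delta=1.0)
print(mean, np.cos([1.0, 2.5]))
```

With these settings the posterior means should lie close to cos(x), and the posterior variances should be well below the prior derivative variance 2σ²/δ².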
The methodology can also be adapted to emulate integrals of functions, if a model user requires this. This is akin to the special case of emulating the model response when we only have derivatives in the training data; such an emulator could be built using the setup and methods described in Section 5.2.2. Suppose now that we are interested in the integral $I = \int_{\chi} \eta(x)\,dx$, where χ represents the input space. O’Hagan (1991) uses a Gaussian process to make inferences about I in Bayes-Hermite quadrature. The Gaussian process emulator methodology applied to I with only the model response in the training data results in:

$$I \mid y \sim N\left(\int_{\chi} m^*(x)\,dx,\ \int_{\chi}\int_{\chi} c^*(x,x')\,dx\,dx'\right),$$

where, conditional on the hyperparameters, m*(·) and c*(·,·) denote the posterior mean and covariance functions of η(·) given the training data y.
This could similarly be extended to include derivatives in the training data; Oakley (1999) briefly looks at the effect of observing derivatives in quadrature.
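To make Bayes-Hermite quadrature concrete, the sketch below computes the posterior mean and variance of I = ∫ η(x) dx over an interval [a, b] in one dimension. It assumes a zero prior mean, a squared-exponential correlation function c(x, x') = exp(-((x - x')/δ)²) with fixed δ and σ², and uniform weight on [a, b]. Under these assumptions the single and double integrals of c(·,·) have closed forms in terms of the error function. The function name and all of these choices are illustrative and are not taken from the thesis.

```python
import math
import numpy as np

def bayes_hermite_1d(x_train, y_train, a, b, delta, sigma2=1.0, nugget=1e-10):
    """Posterior mean and variance of I = int_a^b eta(x) dx.

    Assumes a zero-mean GP prior with covariance sigma2 * exp(-((x - x') / delta)^2)
    and fixed hyperparameters (an illustrative set-up).
    """
    r = x_train[:, None] - x_train[None, :]
    K = sigma2 * np.exp(-(r / delta) ** 2) + nugget * np.eye(len(x_train))
    # z_i = Cov(I, eta(x_i)) = sigma2 * int_a^b c(x, x_i) dx, via the error function
    z = sigma2 * delta * math.sqrt(math.pi) / 2.0 * np.array(
        [math.erf((b - xi) / delta) - math.erf((a - xi) / delta) for xi in x_train]
    )
    # Var(I) under the prior: sigma2 * double integral of c over [a, b]^2
    L = b - a
    prior_var = sigma2 * (L * delta * math.sqrt(math.pi) * math.erf(L / delta)
                          - delta**2 * (1.0 - math.exp(-(L / delta) ** 2)))
    mean = z @ np.linalg.solve(K, y_train)
    var = prior_var - z @ np.linalg.solve(K, z)
    return mean, var

# Usage: the exact integral of sin over [0, pi] is 2
x_train = np.linspace(0.0, np.pi, 8)
mean, var = bayes_hermite_1d(x_train, np.sin(x_train), 0.0, np.pi, delta=1.0)
print(mean, var)
```

Including derivative observations would add cross-covariance terms built from derivatives of c, in the same way as for the derivative emulator.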
5.5 Conclusions
Adjoint models are becoming increasingly common and valuable, but for complex models they remain difficult to write and expensive to run, in terms of both computing resources and time. In this chapter we have adapted existing methodology to emulate model derivatives and demonstrated the methodology in three examples.
The one-dimensional toy example in Section 5.3.1 shows the potential of the Gaussian process emulator approach: with derivative information in the emulator, we can emulate the model derivatives very well. In addition, given just a few extra simulator runs, we can accurately emulate the model derivatives from function output alone. The design problem for building an emulator of derivatives is briefly investigated with the one-dimensional toy model, and we find that, in some cases, an optimal space-filling design produces invalid emulators.
This is because all the points are exactly the same distance apart, which makes it very difficult to estimate the smoothness parameter in the emulator. This may not be a serious concern with real complex models, since in a high-dimensional input space we are very unlikely to obtain multiple points with exactly the same distances between them. If we did have enough training data in a higher-dimensional input space for this to happen, we would likely encounter numerical problems when building the emulator. Even if the emulator can be built without such difficulties, its resulting poor performance can likely be resolved by removing a small number of the observations, as we saw in Figure 5.6a. The effect of equidistant points on the emulation of derivatives does, however, motivate investigating designs with varying distances between points, to better estimate smoothness parameters. In support of this, we see the uncertainty reducing between design points, which also suggests that a space-filling design is not necessarily optimal when emulating derivatives.
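The equidistance problem can be seen by counting distinct pairwise distances, since each distinct distance gives a distinct correlation from which the smoothness parameter can be learned. The short Python check below is a hypothetical illustration, not part of the analysis in Section 5.3.1. It compares an equally spaced one-dimensional design of 10 points, whose 45 pairwise distances take only 9 distinct values (the multiples of 1/9), with 10 uniformly random points in 8 dimensions, used here as a simple stand-in for a higher-dimensional design, where all 45 distances differ with probability one.

```python
import numpy as np

def n_distinct_pairwise_distances(X, decimals=10):
    """Count distinct Euclidean distances among all pairs of rows of X (n x d)."""
    diffs = X[:, None, :] - X[None, :, :]
    D = np.sqrt((diffs**2).sum(axis=-1))
    i, j = np.triu_indices(len(X), k=1)          # each unordered pair once
    # Round so that floating-point noise does not create spurious distinct values
    return len(np.unique(np.round(D[i, j], decimals)))

grid_1d = np.linspace(0.0, 1.0, 10)[:, None]     # equally spaced 1-D design
rng = np.random.default_rng(0)
random_8d = rng.uniform(size=(10, 8))            # random 10-point design in 8-D

print(n_distinct_pairwise_distances(grid_1d))    # 9 distinct distances
print(n_distinct_pairwise_distances(random_8d))
```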
The second example, in Section 5.3.2, shows that in an 8-dimensional model we can emulate the derivatives with respect to 5 of the inputs very well. The performance of the emulator is not as good with respect to inputs 2, 3 and 5, but the emulator is appropriately confident about these predictions. In a one-dimensional setting, individual emulators for each of these inputs perform well, although with a real complex model such an experiment would be unlikely to be possible. The model response, however, is much less sensitive to inputs 2, 3 and 5, which possibly makes it less important that these derivatives are accurately emulated. The derivatives with respect to these inputs are very small in magnitude, so the information they provide is unlikely to have a large effect on any further analysis.
The final example, the emulation of the partial derivatives of C-GOLDSTEIN, has had mixed success. It should first be noted that the derivative function shown in Figure 5.14b is not one we would confidently claim to be able to emulate accurately. We require the simulator to be a smooth function of its inputs, and therefore also require its partial derivatives to be smooth functions of the inputs. This requirement is key, because smoothness is what makes an emulator efficient. As discussed in Chapter 3, if a model is smooth, then knowledge of the output at $x_i$ tells us something about the output at $x_j$ for $x_i$ close to $x_j$. Monte Carlo methods do not make use of this extra information, and so the emulator approach is more efficient. If there are areas of the input space where a model does not respond smoothly to changes in its inputs, i.e. the model is non-stationary, standard emulation is unlikely to be appropriate. There are, however, options that could be investigated in this situation. For example, Gramacy and Lee (2008) partition the input space into regions with a similar level of smoothness. Another option is to ‘warp’ the input space so that the function is stationary on the transformed space, as demonstrated in Section 5.3.2; a third is to select a non-stationary covariance function. In our C-GOLDSTEIN example, we split the input space into regions and emulate the derivatives in the ‘rough’ patch separately. With this method, and given enough training data, we can emulate the derivatives of this model quite well. This is relatively straightforward in one dimension because, even with limited observations of the function, it is clear in which areas further runs are needed to ‘zoom in’. In a higher-dimensional space, however, identifying the location and cause of such ‘rough’ regions is much harder.
Even if this can be done, as in the example illustrated in Section 5.3.3, many simulator runs will be required, and emulation is unlikely to be an efficient alternative to an adjoint model. Of course, an adjoint of the required model might not exist, in which case the question becomes whether emulation is more efficient than the many runs a finite differences experiment would entail.
The difference between the FD and adjoint-estimated derivatives, shown in Figure 5.17b, contrasts with the validation results in Chapter 2, where good agreement is apparent. It is clear from Figure 5.18, however, that the adjoint derivatives are accurate here, and it is likely that if the FD runs were performed with a more appropriate value of , the resulting derivatives would agree more closely with the adjoint. This highlights both the importance and the difficulty of selecting a suitable value of when undertaking FD experiments.
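The difficulty of choosing the finite-difference step can be shown with a standard textbook example. The Python sketch below is a generic illustration, not the FD set-up used with C-GOLDSTEIN, and it calls the step `h`, a name chosen here. It applies a forward difference to exp at x = 1 for steps h = 10^-1, ..., 10^-15. Large steps suffer from truncation error, which shrinks roughly in proportion to h, while very small steps suffer from floating-point cancellation, which grows roughly like machine epsilon divided by h. In double precision the smallest error is typically found near h ≈ 10^-8.

```python
import numpy as np

def forward_difference(func, x, h):
    """One-sided finite-difference estimate of func'(x) with step h."""
    return (func(x + h) - func(x)) / h

x0 = 1.0
exact = np.exp(x0)                          # d/dx exp(x) = exp(x)
steps = 10.0 ** -np.arange(1, 16)           # h = 1e-1, 1e-2, ..., 1e-15
errors = np.array([abs(forward_difference(np.exp, x0, h) - exact) for h in steps])
best_h = steps[np.argmin(errors)]

for h, err in zip(steps, errors):
    print(f"h = {h:.0e}  |error| = {err:.2e}")
print(f"smallest error at h = {best_h:.0e}")
```

For an expensive simulator, each value of h costs extra runs, which is part of why a poorly chosen step is easy to end up with in practice.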
We have just described how the derivatives from the adjoint model, at and around the points of discontinuity, appear to be accurate. If the purpose is to investigate whether the model ‘misbehaves’ at any input points, then relatively large derivative values are very informative. This assumes, of course, that the model is expected to be smooth. It could be argued, though, that the adjoint-generated derivatives are actually misleading for other types of analysis.
Suppose, for example, we wish to run an optimisation algorithm that makes use of gradient information. Inputting partial derivatives which, while accurate at a local level, are unrepresentative of the overall trend, as seen for example at point 8 in Figure 5.18b, will cause problems for the optimisation algorithm. For example, the algorithm will be less efficient because it will spend time searching in the wrong part of the input space. In addition, there is a greater chance that a local maximum or minimum is returned, rather than the global optimum in which we are interested. In this situation, emulated derivatives could provide more meaningful estimates of the gradient, which would in turn assist the optimisation algorithm.
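The distinction between a locally accurate derivative and a gradient that represents the trend can be illustrated with a toy function, f(x) = x² + 10^-3 sin(1000x), chosen here purely for illustration and not a model from this chapter. Its exact derivative contains the term cos(1000x), which comes from a wiggle of negligible amplitude, whereas a central difference with a wide step of 0.1 returns a value close to the trend gradient 2x. The wide-step difference only stands in for a smoothed gradient estimate in this sketch; it is not the emulator.

```python
import numpy as np

def f(x):
    # Smooth quadratic trend plus a tiny, high-frequency wiggle
    return x**2 + 1e-3 * np.sin(1000.0 * x)

def exact_derivative(x):
    # Analytic derivative: trend part 2x plus the wiggle's derivative cos(1000x)
    return 2.0 * x + np.cos(1000.0 * x)

def central_difference(func, x, h):
    """Symmetric finite-difference estimate of func'(x) with step h."""
    return (func(x + h) - func(x - h)) / (2.0 * h)

x0 = 1.0
local = exact_derivative(x0)               # roughly 2.56: dominated by the wiggle
trend = central_difference(f, x0, h=0.1)   # roughly 2.0: close to the trend gradient
print(f"exact derivative: {local:.3f}, wide-step estimate: {trend:.3f}")
```

A gradient-based optimiser fed the exact value would be pushed by a feature that has almost no effect on f itself, which is the situation described above.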
In summary, we have suggested an alternative approach to the efficient evaluation of model derivatives. While this is encouraging, validation derivatives will still be required, and if an adjoint model does not exist or is too expensive to execute, then other techniques for producing validation derivatives will have to be employed. Derivatives generated by the finite differences approach are one option, though this requires multiple simulator runs, and numerical and approximation errors can make the resulting derivatives inaccurate. This complicates the validation process, as we need to be confident that the validation derivatives are accurate enough to be used as a diagnostic: a conflict between emulated derivatives and those generated by finite differences may be partly due to an inappropriate choice of .