The next logical step after the generation of an accurate statistical surrogate for the SR model is to look at space-time variations of the tsunami waves, instead of looking at the wave motion over time at fixed locations. This step requires computationally intensive analyses and strong statistical expertise. The consideration of the water waves behaviour in space additionally to the behaviour in time might improve the predictions of the statistical emulator that considers only time variations and is restricted to a single locations in space. This possible improvement can be attributed to the fact that a spatial-temporal emulator can borrow strength across both space and time. A huge number of simulator outputs have to be obtained in order to have a sufficient knowledge of the waves’ behaviour both in time and space and they have to be used for the emulator’s generation. For this reason, Outer Produce Emulators are the most appropriate, compared to standard emulators, since they can handle efficiently both the large number of
Chapter 3. Statistical Emulation of a landslide-generated tsunami model input parameter and also the extremely large number of evaluations.
The input parameter space is the same with the fixed-location application; similarly the prior choices for the input regressors and residuals covariance functions are the same. However, the output regressors and residuals covariance functions need to be modified. These functions are required to describe with the best way possible the behaviour of the output along the space coordinates. This again highlights the need of a dense enough computational mesh in order to have a large enough number of evaluations, as it would be impossible to find representative functions in the case of insufficient resolution.
3.4.1 Separable and Non-separable covariance functions for time and space
In order to make the analysis simpler and computational feasible, the residual covariance func- tion for the outputs can be assumed to be separable in time and space. This means that it can be written as the product of a purely spatial and a purely temporal covariance functions. Sep- arable covariance functions indicates that the space and time variations are independent. This assumption has the significant convenient advantage that the covariance matrix can be expressed as the Kronecker product of two smaller matrices that are purely temporal and purely spatial. Therefore, the determinant and the inverse of this matrix can be easily obtained. Nevertheless, this assumption is not realistic, as it imposes a strong constraint by ignoring the interactions between time and space. Physically there is a clear association between the spatial and the temporal dimensions and a realistic statistical representation should take this aspect into ac- count. Non-separable covariance functions model space-time interactions and in many cases non-separable models are physically more realistic.
Genton [2007] developed techniques for finding approximate separable covariance matri- ces for cases where the space-time covariance matrices are non-separable. In this case, two matrices, one purely temporal and one purely spatial, that give the ‘best’ separable (Kronecker product) approximation have to be determined. The solution to this problem is to describe the non-separable matrix by the nearest Kronecker product (NKP) approximation. The NKP for space-time covariance matrix (NKPST) is described as follows: LetΣ be the space-time co- variance matrix of a stochastic process Z(s, t), where s represents the spatial location and t the time. The NKPST problem is to find two matrices, S and T that minimize the Frobenious norm ||Σ − S ⊗ T ||F , where for a matrix A= (aij), ||A||F = (PiPjσ2ij)
1 2.
The use of separable covariance functions for time and space is supported by Rougier [2011], where he highlights some significant advantages of using separable covariance func- tions, such as the fact that the variance matrix has a Kronecker product form which makes the inversion, that is required in the emulation process, much more easier. Rougier [2011] con-
cluded that the use of separable covariance functions for the prior emulator is safe. This is because by updating the prior emulator using computer model evaluations, the separability of the covariance function is erased.
On the other hand, Fricker et al. [2013] contradicts the opinion of Rougier [2011] that using separable covariance structures for the outputs is sufficient and results in accurate enough predictions. The authors state that wrong representations of the joint uncertainty can result in considering the simulator outputs as independent. At their analysis, they generate non-separable covariance structures for GP emulators and they compare the performance of the emulator re- sulting with this non-separable covariance with the performance of one that uses separable covariance structures and also with emulators that assume independence between the outputs. The conclusion of Fricker et al. [2013] is that only the non-separable emulators are able to give accurate predictions and represent appropriately the joint uncertainty about the model outputs.
To investigate this further, an analysis examining whether or not the use of a separable covariance structure for time and space actually improves the emulator’s predictions was per- formed. For this analysis, the predictions of three different emulators were compared (results not shown here). The difference between the three emulators was only the selection of the out- put covariance function, with all the other functions and parameters to be exactly the same. The statistical emulation is applied to data generated from a non-separable covariance function and also to the Irish wind speed data set [Haslett and Raftery, 1989]. This data set is extensively used for covariance investigations due to its non-separable structure in time and space.
For the first emulator, a non-separable output covariance function, proposed by Gneiting [2002], was used. For the second emulator, the technique of Genton [2007] was employed for approximating the non-separable covariance function of the first emulator with a separable one. Finally, for the last emulator, a separable covariance function for time and space was used. The conclusion from this investigation was that a non-separable output covariance function does not improve the emulator’s predictions. On the contrary, the best predictions for both the data sets resulted from the separable output covariance emulator. Similar conclusions were obtained by Bowman and Woods [2012].
3.4.2 Subemulators
Spiller et al. [2014] highlighted the fact that in some cases, the use of a single emulator for predicting variations in space as well as variations in time may be unnecessary challenging and damage the analysis. Some of the reasons for this is the complicated spatial correlations from topography and also the fact that in the case of tsunami events, some waves do not reach most of the locations and hence the simulated output is zero for many input scenarios. Another
Chapter 3. Statistical Emulation of a landslide-generated tsunami model
reason is the need for large matrices inversions which is the computational problem of using GP emulators. Therefore, the authors support that the generation of a single emulator at each location is more than enough, since there are not going to be any significant advantages by generating a single emulator that takes into account spatial-temporal correlations. Spiller et al. [2014] present an approach of building one emulator for each location using automating meth- ods. Specifically, one location is considered at a time and an emulator is fitted specifically to the model output at the particular locations. The authors called these emulators “subemulators”. This process is automated and it is repeated in parallel for all the locations. Consequently, it can be concluded that the generation of a spatial-temporal emulator is not necessary in case that an efficient automated process of simultaneously building emulators at each of the locations of interest is introduced.