Markus Ferber - ENMIENDAS ES Unida en la diversidad ES. Parlamento Europeo Proyecto de informe

In extreme events, a cluster consists of a number of observations that represent a partial series of exceedances of a given threshold (Coles, 2001), occurring in a short time span which may affect the same geographical region (Vitolo et al., 2009). There is a relation in time and space between these events. If a clustering of storm magnitudes existed, the number of smaller storms with low return periods (RPs) should be higher for years with an event with a high RP (Raschke, 2015).

Clusters of extreme events are characterised by the dispersion statistic, or dispersion index, 𝜓, which Mailier et al. (2006) and Vitolo et al. (2009) calculate as in Equation 4.7.

$$\psi = \frac{\mathbb{V}ar(Y)}{\mathbb{E}(Y)} - 1$$

Equation 4.7. Dispersion statistic.

The dispersion statistic 𝜓 is built on the ratio of the variance 𝕍𝑎𝑟(𝑌) to the mean 𝔼(𝑌). In the general linear model, the variance is assumed to remain constant (Figure 4.3a); however, in the Binomial and Poisson distributions, this is not necessarily the case. For the Binomial distribution (Figure 4.3b), the variance has a convex-shaped relationship with the mean, while in the Poisson distribution (Figure 4.3c), where the variance is equal to the mean, there is a 1:1 linear relationship: as the expected value increases, the variability increases.
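As a brief worked illustration of the three variance-mean relationships in Figure 4.3, the standard parameterisations can be written out. The symbols σ², n, p and λ are introduced here for illustration and are assumed, not confirmed, to match the figure's axis labels.

```latex
\begin{aligned}
\text{Normal:}   \quad & \mathbb{E}(Y) = \mu,            \quad \mathbb{V}ar(Y) = \sigma^2 \\
\text{Binomial:} \quad & \mathbb{E}(Y) = np = \mu,       \quad \mathbb{V}ar(Y) = np(1-p) = \mu\left(1 - \frac{\mu}{n}\right) \\
\text{Poisson:}  \quad & \mathbb{E}(Y) = \lambda = \mu,  \quad \mathbb{V}ar(Y) = \lambda = \mu
\end{aligned}
```

For a fixed number of trials n, the Binomial variance is therefore a quadratic function of the mean, whereas the Poisson variance increases one-for-one with it and the Normal variance does not depend on it.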

The dispersion statistic can show:

• Equidispersion, characterised by equality of mean and variance, 𝕍ar(Y) = 𝔼(Y) = μ. Near-zero values of ψ are consistent with a purely random process with a constant rate.

• Overdispersion, indicated by positive values of ψ, 𝕍ar(Y) > 𝔼(Y), which is consistent with a process that is more clustered than a purely random process with a constant rate. The variance-to-mean ratio is then greater than 1.

• Underdispersion, characterised by negative values of ψ, 𝕍ar(Y) < 𝔼(Y), which is consistent with a process that is more regular than a purely random process with a constant rate. The variance-to-mean ratio is then less than 1.

Overdispersion is often found in practice, whereas underdispersion is less common. A clustered pattern gives ψ > 0 (overdispersion), whereas a regular pattern gives ψ < 0 (underdispersion).
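The sign convention above can be illustrated with a minimal Python sketch. It assumes that ψ is taken as the variance-to-mean ratio minus one, so that zero corresponds to equidispersion, and that the sample variance is used. The function name and the count series are hypothetical and purely illustrative; this is not the Cameron and Trivedi (1990) test.

```python
from statistics import mean, variance


def dispersion_statistic(counts):
    """Return psi = Var(Y) / E(Y) - 1 for a sequence of event counts.

    Assumption: the sample variance (n - 1 denominator) is used.
    """
    mu = mean(counts)
    if mu == 0:
        raise ValueError("mean count is zero; the dispersion statistic is undefined")
    return variance(counts) / mu - 1.0


# Hypothetical yearly counts of threshold exceedances (not real data).
regular = [3, 3, 3, 3, 3, 3]      # identical counts: more regular than random
clustered = [0, 0, 9, 0, 0, 9]    # events bunched into a few years

print(dispersion_statistic(regular))    # negative: underdispersion
print(dispersion_statistic(clustered))  # positive: overdispersion
```

Both series have the same mean count, so the difference in ψ comes entirely from how unevenly the events are spread across years.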

The dispersion index is estimated using the method of Cameron and Trivedi (1990) with the R package AER (Kleiber and Zeileis, 2008). Once the dispersion statistic has been calculated, the next step is to decluster. Declustering filters the dependent observations to obtain a set of threshold excesses that are approximately independent (Coles, 2001). Declustering works by:

1. Using an empirical rule to define clusters of exceedances;

2. Identifying the maximum excess within each cluster;

3. Assuming cluster maxima to be independent, with conditional excess distribution given by the GPD;

4. Fitting the GPD to the cluster maxima.
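Steps 1 and 2 can be sketched in Python as a simple runs-declustering rule. This is an assumption about the form of the empirical rule: a cluster is taken to end once `min_gap` consecutive observations fall at or below the threshold. The function name, the parameter names and the series are hypothetical, and the GPD fit in steps 3 and 4 is not shown.

```python
def decluster_runs(series, threshold, min_gap):
    """Return the (index, value) of the maximum of each cluster of exceedances.

    A cluster is closed after `min_gap` consecutive values at or below
    `threshold` (runs declustering, assumed here as the empirical rule).
    """
    maxima = []
    current = None   # (index, value) of the running cluster maximum
    below = 0        # consecutive non-exceedances since the last exceedance
    for i, x in enumerate(series):
        if x > threshold:
            if current is None or x > current[1]:
                current = (i, x)
            below = 0
        elif current is not None:
            below += 1
            if below >= min_gap:
                maxima.append(current)
                current = None
                below = 0
    if current is not None:
        maxima.append(current)   # close a cluster still open at the end
    return maxima


# Hypothetical daily levels (not real data).
levels = [1, 5, 6, 1, 7, 1, 1, 1, 4, 1]
threshold = 3
cluster_maxima = decluster_runs(levels, threshold, min_gap=3)
excesses = [value - threshold for _, value in cluster_maxima]
print(cluster_maxima, excesses)
```

With a gap of three observations, the exceedances at indices 1, 2 and 4 are merged into one cluster, so only its largest value and the later isolated exceedance are kept as approximately independent maxima.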

For example, in Figure 4.4, a time series of flood levels was constructed with events extracted using a 3-day separation criterion and a threshold that retained the highest 20% of events (Gouldby et al., 2014). The event set comprised a total of 1918 records, approximately 31 per year.

Given a threshold, the extremal index is estimated by using the method of Ferro and Segers (2003). The automatic declustering identifies independent clusters and estimates the GPD for cluster maxima with the R package called texmex (Southworth and Heffernan, 2015).
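As a rough illustration of the extremal index, the sketch below implements the intervals estimator in the form commonly attributed to Ferro and Segers (2003), computed from the times between successive exceedances. It is a hedged, standalone sketch and not the texmex implementation; the function name and the example times are hypothetical, and exceedance times are assumed to be distinct integers.

```python
def intervals_extremal_index(exceedance_times):
    """Intervals estimator of the extremal index from exceedance times.

    Values near 1 suggest nearly independent exceedances; smaller values
    suggest clustering. Assumes distinct integer observation times.
    """
    times = sorted(exceedance_times)
    gaps = [b - a for a, b in zip(times, times[1:])]  # inter-exceedance times
    if not gaps:
        raise ValueError("at least two exceedances are needed")
    n = len(gaps)
    if max(gaps) <= 2:
        numerator = 2.0 * sum(gaps) ** 2
        denominator = n * sum(g * g for g in gaps)
    else:
        numerator = 2.0 * sum(g - 1 for g in gaps) ** 2
        denominator = n * sum((g - 1) * (g - 2) for g in gaps)
    return min(1.0, numerator / denominator)


# Hypothetical exceedance times (not real data).
bunched = [1, 2, 3, 11, 12, 13, 21, 22, 23]   # three tight groups
isolated = [1, 11, 21, 31]                    # evenly spaced exceedances

print(intervals_extremal_index(bunched))
print(intervals_extremal_index(isolated))
```

The bunched series gives an estimate well below 1, while the isolated series is capped at 1, which matches the interpretation of the extremal index as a measure of clustering among exceedances.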

Figure 4.3. Theoretical expectations of the relationship between the variance and the mean in the Normal (a), Binomial (b) and Poisson (c) distributions. Axis labels also include the theoretical mean and variance, represented in terms of distribution parameters.

4.8 Chapter Summary

R provides a range of functions, in packages such as ismev, texmex and evmix, for the statistical modelling of extreme values.

The reason for fitting a statistical model is to analyse the data and to simulate future events. There are two steps: first, univariate modelling, which fits the GPD to the upper tail of each marginal distribution of the data; and second, multivariate modelling, which estimates the dependence between variables and then simulates from the fitted conditional models.

As mentioned, the first step is to understand the currently available threshold selection methods: graphical methods (e.g. Coles, 2001); parametric methods (e.g. Rosbjerg et al., 1992; Grabemann and Weisse, 2008; McMillan et al., 2011; Arns et al., 2013); mixture models (e.g. Frigessi et al., 2002; Behrens et al., 2004; Mendes and Lopes, 2004; Tancredi et al., 2006; Carreau and Bengio, 2009) or other methods based on the Root Mean Square Error (RMSE) (Li et al., 2014).
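One widely used graphical method is the mean residual life (mean excess) plot, in which a threshold is sought above which the mean excess is approximately linear in the threshold. The sketch below computes the points of such a plot in plain Python. The function name and the data are hypothetical, and plotting and confidence intervals are omitted.

```python
def mean_excess(data, thresholds):
    """Return (threshold, mean excess, number of exceedances) per threshold.

    Thresholds with no exceedances are skipped.
    """
    rows = []
    for u in thresholds:
        excesses = [x - u for x in data if x > u]
        if excesses:
            rows.append((u, sum(excesses) / len(excesses), len(excesses)))
    return rows


# Hypothetical observations (not real data).
observations = [1, 2, 3, 4, 10]
for u, e_u, n_u in mean_excess(observations, thresholds=[0, 2, 3]):
    print(f"u={u}  mean excess={e_u:.3f}  exceedances={n_u}")
```

Reporting the number of exceedances alongside each point is a design choice: it makes visible how quickly the sample shrinks as the candidate threshold rises.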

While univariate modelling is done after selecting a suitable threshold, multivariate modelling is far more complex and imposes more requirements, such as complete time series. The conditional multivariate approach of Heffernan and Tawn (2004) is used to model the dependence between variables. However, the Heffernan and Tawn (2004) method considers only complete records and does not deal with missing values.

There is no established acceptable percentage of missing data within a dataset, and there is no single best approach to handling missing data. All missing data methods come with assumptions, and the impact of the chosen method on the result depends strongly on the amount of missing data. One of the key steps in time series analysis is to identify and fill in any missing observations, enabling comprehensive analysis and forecasting.

Missing data techniques are classified into two approaches: univariate and multivariate. Missing values can be handled by deleting the cases with missing observations, or estimated by imputation, interpolation or correlation with other series. This can sometimes be achieved using simple methods, such as substituting an appropriate mean value. However, more complex methods may be needed, and they may also require a deeper understanding of the time series data. Multivariate approaches can be implemented to handle missing values on their own or combined with various mathematical approaches.
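Two of the simple univariate techniques mentioned above, mean imputation and linear interpolation, can be sketched as follows. Missing values are represented by `None`; the function names and the series are hypothetical, and the interpolation deliberately leaves leading and trailing gaps unfilled.

```python
def impute_mean(series):
    """Replace every missing value (None) with the mean of the observed values."""
    observed = [x for x in series if x is not None]
    if not observed:
        raise ValueError("no observed values to compute a mean from")
    fill = sum(observed) / len(observed)
    return [fill if x is None else x for x in series]


def interpolate_linear(series):
    """Fill interior gaps linearly between the nearest observed neighbours.

    Gaps before the first or after the last observation are left as None.
    """
    out = list(series)
    known = [i for i, x in enumerate(out) if x is not None]
    for left, right in zip(known, known[1:]):
        span = right - left
        for i in range(left + 1, right):
            weight = (i - left) / span
            out[i] = out[left] + weight * (out[right] - out[left])
    return out


# Hypothetical series with gaps (not real data).
levels = [1.0, None, 3.0, None, None, 6.0, None]
print(impute_mean(levels))
print(interpolate_linear(levels))
```

Mean imputation flattens the series towards its average, whereas interpolation preserves local trends, which illustrates why the choice of method matters more as the share of missing data grows.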

Finally, after selecting a threshold and handling missing values, the next step is to undertake the multivariate extreme value modelling in order to predict regional extreme events. The multivariate extreme model is applied to evaluate results against return level estimates and climate change scenarios, and to analyse storm surge clusters.

Chapter 5. Multivariate Extreme Storm Surge
