Kriging is a Best Linear Unbiased Estimator (BLUE); the words of the acronym have the following meanings:
• Best : minimum estimation error variance (efficiency condition).
• Linear : linear estimator (the estimate is obtained as a linear combination of the available measurements).
• Unbiased : unbiased estimator (accuracy condition).
The accuracy condition is expressed by the constraint on the coefficients $\sum_{i=1}^{N}\lambda_i = 1$, which is obtained from the following equation:

$$E\left[\hat{Z}(x_0) - Z(x_0)\right] = 0$$
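The step from this equation to the constraint can be made explicit. Assuming the linear estimator $\hat{Z}(x_0) = \sum_{i=1}^{N}\lambda_i Z(x_i)$ and, as in ordinary kriging, a spatially constant but unknown mean $m = E[Z(x)]$ (both taken here as assumptions for the sketch):

$$E\left[\hat{Z}(x_0) - Z(x_0)\right] = \sum_{i=1}^{N}\lambda_i\,E\left[Z(x_i)\right] - E\left[Z(x_0)\right] = m\left(\sum_{i=1}^{N}\lambda_i - 1\right) = 0$$

Since this must hold whatever the value of the unknown mean $m$, the weights must satisfy $\sum_{i=1}^{N}\lambda_i = 1$.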
The efficiency condition is imposed by minimizing the estimation error variance:

$$\min \sigma_E^2 = \min E\left\{\left[\hat{Z}(x_0) - Z(x_0)\right]^2\right\}$$
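A short sketch of how the efficiency condition leads to the ordinary kriging system (5.45) and to the variance (5.46), assuming the intrinsic hypothesis (the variogram depends only on the separation distance) and writing $\gamma_{ij} = \gamma(\|x_i - x_j\|)$, $\gamma_{i0} = \gamma(\|x_i - x_0\|)$. Under the constraint $\sum_i \lambda_i = 1$ the error variance is

$$\sigma_E^2 = 2\sum_{i=1}^{N}\lambda_i\gamma_{i0} - \sum_{i=1}^{N}\sum_{j=1}^{N}\lambda_i\lambda_j\gamma_{ij}$$

and minimizing the Lagrangian $\mathcal{L} = \sigma_E^2 - 2\nu\left(\sum_{i=1}^{N}\lambda_i - 1\right)$ with respect to each $\lambda_k$ gives

$$\frac{\partial\mathcal{L}}{\partial\lambda_k} = 2\gamma_{k0} - 2\sum_{j=1}^{N}\lambda_j\gamma_{kj} - 2\nu = 0 \quad\Longrightarrow\quad \sum_{j=1}^{N}\lambda_j\gamma_{kj} + \nu = \gamma_{k0}, \qquad k = 1,\dots,N$$

Multiplying each of these equations by $\lambda_k$, summing over $k$ and substituting into $\sigma_E^2$ gives $\sigma_E^2 = \sum_{i=1}^{N}\lambda_i\gamma_{i0} + \nu$. The factor $-2$ in front of $\nu$ is only a sign convention, chosen so that $\nu$ matches the multiplier appearing in (5.45) and (5.46).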
Analytical derivations can be found in the numerous textbooks on the topic (De Marsily, 1986; Kitanidis, 1997). The following paragraphs report the systems of linear equations that give the values of $\lambda_i$ for ordinary kriging and for kriging for uncertain data.
Ordinary kriging (OK)
Ordinary kriging (OK) is the most commonly used version of kriging. The mean of the process is assumed to be spatially constant but unknown. In OK, the accuracy and efficiency conditions lead to the following system of $N + 1$ linear equations, here supplemented by non-negativity constraints on the weights, in the unknowns $\lambda_1, \lambda_2, \dots, \lambda_N, \nu$:
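$$
\begin{cases}
\displaystyle\sum_{j=1}^{N}\lambda_j\,\gamma(\|x_k - x_j\|) + \nu = \gamma(\|x_k - x_0\|), & k = 1,\dots,N \\[6pt]
\displaystyle\sum_{i=1}^{N}\lambda_i = 1 \\[6pt]
\lambda_j \ge 0, & j = 1,\dots,N
\end{cases}
\tag{5.45}
$$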
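which can be written in matrix form as $A\mathbf{y} = \mathbf{b}$, where:

$$
A = \begin{pmatrix}
0 & \gamma(\|x_1 - x_2\|) & \cdots & \gamma(\|x_1 - x_N\|) & 1 \\
\gamma(\|x_2 - x_1\|) & 0 & \cdots & \gamma(\|x_2 - x_N\|) & 1 \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
\gamma(\|x_N - x_1\|) & \gamma(\|x_N - x_2\|) & \cdots & 0 & 1 \\
1 & 1 & \cdots & 1 & 0
\end{pmatrix},
\quad
\mathbf{y} = \begin{pmatrix}\lambda_1 \\ \lambda_2 \\ \vdots \\ \lambda_N \\ \nu\end{pmatrix},
\quad
\mathbf{b} = \begin{pmatrix}\gamma(\|x_1 - x_0\|) \\ \gamma(\|x_2 - x_0\|) \\ \vdots \\ \gamma(\|x_N - x_0\|) \\ 1\end{pmatrix}
$$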
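The assembly of $A$ and $\mathbf{b}$ can be sketched with NumPy as follows. This is a minimal illustration, not the implementation used in this work: the exponential variogram, its sill and range (`sill`, `range_`), the gauge coordinates and the measured values are all hypothetical, and the non-negativity constraints are not enforced here.

```python
import numpy as np


def exponential_variogram(h, sill=1.0, range_=10.0):
    # Hypothetical variogram model chosen only for illustration
    return sill * (1.0 - np.exp(-h / range_))


def ok_system(coords, x0, variogram=exponential_variogram):
    """Assemble A and b of the ordinary kriging system A y = b."""
    coords = np.asarray(coords, dtype=float)
    x0 = np.asarray(x0, dtype=float)
    n = len(coords)
    # Pairwise gauge distances ||x_k - x_j||, shape (n, n)
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = variogram(d)   # gamma(0) = 0 keeps the diagonal at zero
    A[n, n] = 0.0              # bottom-right entry of the bordered matrix
    # b = [gamma(||x_1 - x0||), ..., gamma(||x_N - x0||), 1]
    b = np.append(variogram(np.linalg.norm(coords - x0, axis=1)), 1.0)
    return A, b


def ok_solve(coords, x0, variogram=exponential_variogram):
    A, b = ok_system(coords, x0, variogram)
    y = np.linalg.solve(A, b)
    return y[:-1], y[-1]       # weights lambda_1..lambda_N and multiplier nu


# Usage: four gauges at the corners of a 10 x 10 square, target at the centre
gauges = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0)]
z = np.array([12.0, 15.0, 11.0, 14.0])     # hypothetical measured values
lam, nu = ok_solve(gauges, (5.0, 5.0))
estimate = lam @ z                          # linear combination of measures
```

By symmetry all four weights equal 0.25 at the centre of the square, so the estimate is the plain mean of the measurements, 13.0. Solving at a gauge location instead returns weight 1 for that gauge and 0 for the others, which is the exact-interpolation property discussed for KUD.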
Solving this system gives the weights $\lambda_i$ used to estimate the value at the point $x_0$ through equation (5.22). The Lagrange multiplier $\nu$, introduced to enforce the accuracy condition, also gives the estimation error variance at the same point:
$$\sigma_E^2 = \sum_{i=1}^{N}\lambda_i\,\gamma(\|x_i - x_0\|) + \nu \tag{5.46}$$
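Equation (5.46) can be checked numerically against the direct definition of the error variance under the intrinsic hypothesis, $2\sum_i\lambda_i\gamma_{i0} - \sum_i\sum_j\lambda_i\lambda_j\gamma_{ij}$. The sketch below assumes a hypothetical exponential variogram and hypothetical gauge positions; it only illustrates that the two expressions agree and that the variance vanishes at a gauge.

```python
import numpy as np


def gamma(h, sill=1.0, range_=10.0):
    # Hypothetical exponential variogram, for illustration only
    return sill * (1.0 - np.exp(-h / range_))


def ok_with_variance(coords, x0):
    coords = np.asarray(coords, dtype=float)
    n = len(coords)
    G = gamma(np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1))
    g0 = gamma(np.linalg.norm(coords - np.asarray(x0, dtype=float), axis=1))
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = G
    A[n, n] = 0.0
    y = np.linalg.solve(A, np.append(g0, 1.0))
    lam, nu = y[:n], y[n]
    # Eq. (5.46): error variance from the weights and the Lagrange multiplier
    var_546 = lam @ g0 + nu
    # Same variance from its definition under the intrinsic hypothesis
    var_direct = 2.0 * lam @ g0 - lam @ G @ lam
    return lam, var_546, var_direct


gauges = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0)]
lam, v1, v2 = ok_with_variance(gauges, (3.0, 7.0))
```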
Here, the kriging problem is formulated by adding a non-negativity constraint on each weight on top of the classical constraint that the weights sum to one. The solution is first found without the inequality constraints. If all the weights are non-negative, the solution is accepted. Otherwise, the negative weights are set to zero, the corresponding gauges are removed from the computation, and a new solution is computed from the reduced set of gauges.
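The removal procedure can be sketched as below. The text describes a single re-solve; this sketch assumes the step is repeated until no negative weight remains, which is an assumption of the example rather than a statement about the original method. The exponential variogram is hypothetical.

```python
import numpy as np


def gamma(h, sill=1.0, range_=10.0):
    # Hypothetical exponential variogram, for illustration only
    return sill * (1.0 - np.exp(-h / range_))


def ok_weights(coords, x0):
    n = len(coords)
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = gamma(np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1))
    A[n, n] = 0.0
    b = np.append(gamma(np.linalg.norm(coords - x0, axis=1)), 1.0)
    y = np.linalg.solve(A, b)
    return y[:n], y[n]


def nonnegative_ok_weights(coords, x0):
    """Set negative weights to zero, drop those gauges and re-solve.

    Assumption: the re-solve is repeated until all weights are non-negative.
    """
    coords = np.asarray(coords, dtype=float)
    x0 = np.asarray(x0, dtype=float)
    active = np.arange(len(coords))        # gauges still in the computation
    while True:
        lam, nu = ok_weights(coords[active], x0)
        if np.all(lam >= 0.0):
            break
        active = active[lam >= 0.0]        # remove gauges with negative weight
    weights = np.zeros(len(coords))        # removed gauges keep weight zero
    weights[active] = lam
    return weights, nu
```

The loop always terminates: the weights of each reduced system still sum to one, so at least one of them is positive and the active set shrinks strictly whenever a negative weight appears.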
Kriging for uncertain data (KUD)
It can be verified that ordinary kriging applied at the points where measurements are taken returns exactly the measured values at those points. When the available measurements are affected by uncertainty, for instance measurement or sampling errors, this property becomes a limitation. To overcome it, De Marsily (1986) proposed a variant called kriging for uncertain data (KUD). Mazzetti and Todini (2009) found that the method proposed by De Marsily was incorrect, or valid only for a homoscedastic field (one in which the errors at all sites have the same variance), and they modified and tested the methodology accordingly. The resulting linear system is:
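$$
\begin{cases}
\displaystyle\sum_{j=1}^{N}\lambda_j\,\gamma^*(\|x_k - x_j\|) + \nu = \gamma^*(\|x_k - x_0\|), & k = 1,\dots,N \\[6pt]
\displaystyle\sum_{i=1}^{N}\lambda_i = 1 \\[6pt]
\lambda_j \ge 0, & j = 1,\dots,N
\end{cases}
\tag{5.47}
$$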
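where $N$ is the number of gauges and, writing $\gamma_{k,j} = \gamma(\|x_k - x_j\|)$ and $\gamma_{k,0} = \gamma(\|x_k - x_0\|)$:

$$
\begin{cases}
\gamma^*_{k,j} = \gamma_{k,j} + \dfrac{\sigma_k^2 + \sigma_j^2}{2} & \forall\, k, j = 1,\dots,N,\; k \neq j \\[8pt]
\gamma^*_{k,0} = \gamma_{k,0} + \dfrac{\sigma_k^2}{2} & \forall\, k = 1,\dots,N
\end{cases}
\tag{5.48}
$$

where $\sigma_i^2$ is the variance of the measurement error at the $i$-th observation point. The diagonal terms of the kriging matrix remain zero.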
Gauge errors can therefore be accounted for by adding half the sum of the two gauges' error variances to the off-diagonal terms of the kriging matrix and, at the same time, adding half of each gauge's error variance to the variogram between that gauge and the point to be estimated.
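The modification (5.48) only changes the entries of $A$ and $\mathbf{b}$, so a KUD solver is a small variation of the ordinary kriging one. The sketch below is illustrative: the exponential variogram, the gauge layout and the error variances (`err_var`) are hypothetical values. With zero error variances it reduces to ordinary kriging and is exact at a gauge; with positive variances the estimate at a gauge is no longer forced to reproduce that gauge's measurement.

```python
import numpy as np


def gamma(h, sill=1.0, range_=10.0):
    # Hypothetical exponential variogram, for illustration only
    return sill * (1.0 - np.exp(-h / range_))


def kud_weights(coords, x0, err_var):
    """Kriging for uncertain data: system (5.47) with gamma* from (5.48)."""
    coords = np.asarray(coords, dtype=float)
    s2 = np.asarray(err_var, dtype=float)    # sigma_i^2 for each gauge
    n = len(coords)
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    # gamma*_kj = gamma_kj + (sigma_k^2 + sigma_j^2) / 2 for k != j
    G = gamma(d) + 0.5 * (s2[:, None] + s2[None, :])
    np.fill_diagonal(G, 0.0)                 # diagonal terms stay zero
    # gamma*_k0 = gamma_k0 + sigma_k^2 / 2
    g0 = gamma(np.linalg.norm(coords - np.asarray(x0, dtype=float), axis=1)) + 0.5 * s2
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = G
    A[n, n] = 0.0
    y = np.linalg.solve(A, np.append(g0, 1.0))
    return y[:n], y[n]


gauges = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0)]
exact, _ = kud_weights(gauges, gauges[0], err_var=[0.0] * 4)   # reduces to OK
smooth, _ = kud_weights(gauges, gauges[0], err_var=[0.3] * 4)  # uncertain data
```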
Kriging for uncertain data is particularly appropriate when some level of uncertainty is attached to the data to be interpolated, as in the case of the statistical parameters obtained by fitting the GEV distribution. KUD has the advantage of making the interpolation less sensitive to local sampling effects: these effects are incorporated in the variance of the parameter estimation error, and the interpolation is no longer required to be exact at the measurement points. Nevertheless, it remains unbiased and still minimizes the interpolation error variance.
Chapter 6
Error metrics
The error metrics reported in the following sections were used to compare how accurately the different methods reproduce the observed extremes. Two families of metrics are considered: metrics based on square statistics and metrics based on quantiles, computed on the highest observed values. The first family measures the distance between the estimated frequency distribution and the empirical distribution of the sample. The second family considers only the highest-intensity events observed at each station, for instance the 5 largest maxima, in order to evaluate the goodness of fit in the right tail of the distribution, which is critical for characterizing extreme values. Some of these metrics require a plotting position rule; these rules were discussed in section 3.7. The metrics described in the next sections use Hazen's plotting position.
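As a small illustration of the plotting position step, the sketch below uses the standard form of Hazen's rule, $F_n(x_{(i)}) = (i - 0.5)/n$ for the $i$-th smallest of $n$ values, and then keeps the 5 largest maxima as the tail metrics would. The annual maxima are hypothetical numbers.

```python
import numpy as np


def hazen_positions(sample):
    """Return the ascending sample and its Hazen plotting positions (i - 0.5) / n."""
    x = np.sort(np.asarray(sample, dtype=float))
    n = x.size
    i = np.arange(1, n + 1)                 # rank of each order statistic
    return x, (i - 0.5) / n


annual_maxima = [31.2, 45.0, 27.8, 52.3, 38.9, 61.5, 33.4, 40.1]   # hypothetical
x, p = hazen_positions(annual_maxima)
top5_x, top5_p = x[-5:], p[-5:]   # five largest maxima used by the tail metrics
```

The positions are computed on the full sample before the largest values are selected, so the highest observation keeps its rank-based frequency $(n - 0.5)/n$.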
Figure 6.1 illustrates the difference between the two families of error metrics: the red lines denote the distances between the theoretical frequency distribution and the empirical one, while the green lines denote the distances between the observed values and the theoretical quantiles.
6.1 Square statistics of Cramer-von Mises' family
The square statistics of the Cramer-von Mises family measure the discrepancy between the empirical cumulative distribution function, denoted $F_n(x)$, and the theoretical distribution under test, $F(x)$. $F_n(x)$ can be calculated with one of the plotting position rules discussed in section 3.7. The parameters of $F(x)$ may be known or unknown. This family of square statistics is defined as:
$$Q = n\int_{-\infty}^{\infty}\left[F_n(x) - F(x)\right]^2\psi(x)\,dF(x) \tag{6.1}$$
where $\psi(x)$ is a weight function.
The Cramer-von Mises statistic $W^2$ is obtained from equation (6.1) with $\psi(x) = 1$, whereas the Anderson-Darling statistic $A^2$ is obtained with $\psi(x) = \{F(x)[1 - F(x)]\}^{-1}$. Hence $A^2$ gives greater weight to both tails of the distribution than to its central part, whereas $W^2$ weights all parts of the distribution equally. Finally, using Hazen's plotting position (equation 3.34) for $F_n(x)$, the integral in equation (6.1) yields the following expressions for the statistics $W^2$ and $A^2$:

$$W^2 = \frac{1}{12n} + \sum_{i=1}^{n}\left[F(x_i) - \frac{2i-1}{2n}\right]^2$$
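The $W^2$ expression translates directly into code once the sample is sorted in ascending order, so that $x_i$ is the $i$-th smallest value and $(2i-1)/(2n)$ is its Hazen plotting position. In this sketch the fitted distribution is a hypothetical Gumbel CDF (the GEV with zero shape parameter) with made-up location and scale, and the sample is invented.

```python
import numpy as np


def cramer_von_mises_w2(sample, cdf):
    """W^2 = 1/(12n) + sum_i [F(x_(i)) - (2i - 1)/(2n)]^2 on the ascending sample."""
    x = np.sort(np.asarray(sample, dtype=float))
    n = x.size
    i = np.arange(1, n + 1)
    hazen = (2 * i - 1) / (2 * n)            # Hazen plotting positions
    return 1.0 / (12 * n) + float(np.sum((cdf(x) - hazen) ** 2))


def gumbel_cdf(x, loc=30.0, scale=8.0):
    # Hypothetical Gumbel (GEV with zero shape) fitted distribution
    return np.exp(-np.exp(-(np.asarray(x) - loc) / scale))


annual_maxima = [31.2, 45.0, 27.8, 52.3, 38.9, 61.5, 33.4, 40.1]   # hypothetical
w2 = cramer_von_mises_w2(annual_maxima, gumbel_cdf)
```

The term $1/(12n)$ is the smallest value $W^2$ can take: it is reached only when the fitted CDF evaluated at every observation equals its Hazen plotting position.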