
From the review given in Section 6.1, it is apparent that environmental design optimality objective functions are most commonly stated as the minimisation of some scalar function, φ, of the sampling locations or model parameters such as the prediction covariance matrix; see Mateu and Muller [2013].

Returning to design in a single dimension, the scalar objective function to be minimised is frequently applied to the Fisher information matrix,

$$M(Y, s, \alpha) = E\left[\left(\frac{\partial}{\partial \alpha} \log p(Y \mid \alpha)\right)^{2}\right] \qquad (6.1)$$

where log p(Y | α) is the log-likelihood function, s are the locations of the observations and α are the model parameters. Through the Cramér-Rao inequality, it can be shown that the inverse of M is a lower bound for the conditional covariance matrix of the model parameters, α. In the case of normality, M is also the precision matrix, i.e. the inverse of the covariance matrix of the model parameters conditional on the data; see Nowak [2010]. Therefore, while maximising a function of the information matrix is a popular choice, minimisation of a function of the covariance matrix associated with parameter accuracy can also be used.
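The link between the information matrix and the parameter covariance can be illustrated with a minimal sketch. It assumes a Gaussian linear model y = Xα + e with independent errors of known variance, for which the Fisher information is XᵀX/σ². The model, the design matrix `X`, the values of `sigma2` and `alpha_true`, and the function name are illustrative assumptions, not taken from the text.

```python
import numpy as np

def fisher_information_linear(X, sigma2):
    """Fisher information for y = X @ alpha + e, e ~ N(0, sigma2 * I) (assumed model)."""
    return X.T @ X / sigma2

rng = np.random.default_rng(0)
s = np.linspace(0.0, 1.0, 20)                # observation locations (hypothetical design)
X = np.column_stack([np.ones_like(s), s])    # intercept and slope
sigma2 = 0.25                                # assumed known error variance
alpha_true = np.array([1.0, 2.0])            # hypothetical true parameters

M = fisher_information_linear(X, sigma2)
cov_bound = np.linalg.inv(M)                 # Cramér-Rao lower bound on cov(alpha_hat)

# Monte Carlo check: the least-squares estimator attains the bound in this model.
n_rep = 20000
noise = rng.normal(scale=np.sqrt(sigma2), size=(n_rep, s.size))
Y = X @ alpha_true + noise                   # each row is one simulated data set
alpha_hat = np.linalg.lstsq(X, Y.T, rcond=None)[0].T
emp_cov = np.cov(alpha_hat, rowvar=False)    # empirical 2 x 2 covariance of the estimates
```

In this Gaussian setting `emp_cov` should agree with `cov_bound` up to simulation error, which is why a criterion may be posed either on M or on its inverse.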

Geostatistical design is primarily focused on accurate estimation at unmeasured locations rather than on model parameter accuracy. Thus, most commonly the covariance matrix associated with prediction accuracy, namely the Kriging variance matrix, $C_{s_0 \mid y}$ (see Chapter 2), is used; see Zimmerman and Li [2013].
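As a rough illustration of how such a prediction covariance depends on the design, the sketch below computes a simple-kriging conditional covariance in one dimension. The exponential covariance function, the nugget, the range and the locations are all assumptions made for the example; the definition actually used in this work is the one given in Chapter 2.

```python
import numpy as np

def exp_cov(a, b, sill=1.0, corr_range=0.3):
    """Exponential covariance between 1-D location vectors a and b (assumed model)."""
    d = np.abs(a[:, None] - b[None, :])
    return sill * np.exp(-d / corr_range)

def kriging_covariance(s, s0, nugget=0.05):
    """Simple-kriging covariance of the field at s0 given noisy observations at s."""
    C_ss = exp_cov(s, s) + nugget * np.eye(s.size)   # data covariance incl. nugget
    C_0s = exp_cov(s0, s)                            # cross-covariance
    C_00 = exp_cov(s0, s0)                           # prior covariance at s0
    return C_00 - C_0s @ np.linalg.solve(C_ss, C_0s.T)

s = np.array([0.1, 0.4, 0.9])       # sampled locations (hypothetical design)
s0 = np.linspace(0.0, 1.0, 11)      # unmeasured prediction locations
C = kriging_covariance(s, s0)       # 11 x 11 prediction covariance
pred_var = np.diag(C)               # prediction variance at each s0
```

The diagonal `pred_var` is small near the sampled locations and grows away from them, so moving the design points changes C and hence any criterion computed from it.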

Building on the modelling methodology and notation for spatial and spatio-temporal data for spline-based models (see Chapter 2), the covariance matrix of the basis coefficients, $C_{\hat{\alpha} \mid y}$, can be used. Nowak [2010] discusses the relationship between design optimalities for classical regression design problems and geostatistical design problems. In the majority of cases, the parameter covariance matrix can be replaced by the prediction covariance matrix. Thus, for notational simplicity, the covariance matrix of interest will be referred to collectively as C hereafter; its dimensions are m × m.

Outlined below is a summary of some of the more commonly used design criteria, extended from the traditional regression-like context into spatial and spatio-temporal domains, based on the work of Nowak [2010].

Design Objective Functions based on the Covariance Matrix

Kiefer and Wolfowitz [1959] presented the concept of alphabetic optimalities with the introduction of D- and E-optimal designs for regression estimation problems. Their work has since been extended considerably, with optimalities such as A-, C- and T-optimality now widely used within the statistical literature. Traditional regression-based designs assume observations are collected with independent errors. This assumption is clearly violated for spatial and spatio-temporal data and thus adjustments need to be made; see Mateu and Muller [2013]. Classical regression-based designs focus primarily on minimising the uncertainty around the parameters being estimated in the models. In contrast, designs for spatial and spatio-temporal data mainly focus on minimising the uncertainty associated with predictions at unmeasured locations; see Le and Zidek [2006].

• A - Optimality aims to minimise the quadratic penalty function:

$$(\hat{\alpha} - \alpha_{\text{true}})^{\top} A \, (\hat{\alpha} - \alpha_{\text{true}})$$

which is equivalent to minimising the average parameter estimation variance. The resulting function to be minimised is:

$$\phi_A = \frac{1}{m} \, \mathrm{tr}[AC] \qquad (6.2)$$

where A is a non-negative definite matrix and m is the dimension of C.

• D - Optimality aims to minimise:

$$\phi_D = \det[C]^{1/m} = \prod_{j=1}^{m} \mathrm{eig}_j(C)^{1/m} \qquad (6.3)$$

The logarithm of this function is also widely used. This is the most common objective function for classical regression design problems; however, in dense sampling networks it is computationally expensive. This measure is also very sensitive to a single highly informative observation; see Nowak [2010].

• E - Optimality minimises:

φE = max(eig(C)) (6.4)

The primary role of the E-criterion is to assess whether large-scale variability has been removed. Its computational cost is lower than that of the D-criterion, because computing only the largest eigenvalue takes significantly less time than computing the full set.

• G - Optimality aims to minimise the maximum prediction variance over the design region. The criterion to be minimised is:

φG = max[C] (6.5)

• P - Optimality provides a generalisation of the A-, D- and E- optimalities, with the criterion to be minimised being:

$$\phi_P = \left[ \sum_{j=1}^{m} \mathrm{eig}_j(C)^{P} \right]^{1/P} \qquad (6.6)$$

When P = 1, 0 and ∞, P-optimality becomes the A-, D- and E- optimalities respectively. However, for large datasets this measure is extremely expensive to compute because it requires all eigenvalues.

• Average estimation variance exploits the fact that estimation variance is the value of the conditional covariance when the distance between locations is 0. This corresponds to the diagonal elements of C. The average estimation variance which is to be minimised is then given as:

$$\sigma_E^{2} = \frac{1}{m} \, \mathrm{tr}(C). \qquad (6.7)$$

This is equivalent to the A- criterion when the matrix A is the identity matrix (A = I). This is also known as the AI optimality, denoted as φAI.
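The criteria listed above can be evaluated for a given covariance matrix with a few lines of NumPy. The following is a minimal sketch, not a reference implementation: the function name `design_criteria` and the example matrix are hypothetical, (6.2) and (6.7) are read as traces scaled by 1/m, and (6.6) is read as the sum of the eigenvalues of C raised to the power P, all raised to the power 1/P. C is assumed symmetric and positive definite.

```python
import numpy as np

def design_criteria(C, A=None, P=2.0):
    """Evaluate the covariance-based criteria for a symmetric positive definite C."""
    C = np.asarray(C, dtype=float)
    m = C.shape[0]
    A = np.eye(m) if A is None else np.asarray(A, dtype=float)
    eig = np.linalg.eigvalsh(C)     # all eigenvalues of symmetric C, ascending order
    return {
        "A": np.trace(A @ C) / m,                # (6.2)
        "D": np.exp(np.mean(np.log(eig))),       # (6.3): det(C)^(1/m) via log-eigenvalues
        "E": eig[-1],                            # (6.4): largest eigenvalue
        # (6.5): for a covariance matrix the largest entry lies on the diagonal,
        # so this is the maximum variance.
        "G": np.max(C),
        "P": np.sum(eig ** P) ** (1.0 / P),      # (6.6) as read in the lead-in
        "AI": np.trace(C) / m,                   # (6.7): A-criterion with A = I
    }

C = np.diag([1.0, 4.0])             # toy covariance matrix (illustrative only)
crit = design_criteria(C)
```

Working with log-eigenvalues for the D- criterion avoids overflow or underflow of the determinant when m is large. In this sketch every criterion reuses one full eigendecomposition; an E-only evaluation could instead use an iterative solver for the largest eigenvalue, which is the cost advantage noted above.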

Other Design Objective Functions

In the measures for optimisation discussed earlier, the main aim was to achieve the best predictive accuracy possible. Studies whose aims are primarily exploratory may instead want to choose an objective function that gives good coverage of the study region and is space-filling in nature; see Mateu and Muller [2013].

• Minimax Distance Design aims to minimise the maximum distance between a given unsampled point s0 and its closest point in a given design S = {s1, . . . , sn}. The criterion to be minimised over the study region, D, is therefore:

$$\phi_{mM} = \max_{s_0 \in D} \, \min_{s_i \in S} \, \lVert s_0 - s_i \rVert. \qquad (6.8)$$
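In practice the maximum over the study region can be approximated on a finite set of candidate points. The sketch below assumes the unit square as the study region, a regular 21 × 21 grid as its discretisation, and Euclidean distance; the function name and both example designs are hypothetical.

```python
import numpy as np

def minimax_distance(design, candidates):
    """Largest distance from any candidate point to its nearest design point."""
    diff = candidates[:, None, :] - design[None, :, :]
    dist = np.linalg.norm(diff, axis=2)     # shape (n_candidates, n_design)
    return dist.min(axis=1).max()           # min over the design, max over the region

# Regular grid on the unit square standing in for the study region D.
g = np.linspace(0.0, 1.0, 21)
grid = np.array([(x, y) for x in g for y in g])

centre = np.array([[0.5, 0.5]])
corners = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
five_point = np.vstack([corners, centre])

phi_centre = minimax_distance(centre, grid)      # single site at the centre
phi_five = minimax_distance(five_point, grid)    # corners plus centre
```

The five-point design gives the smaller criterion value, so it would be preferred under minimax distance. The result is only as fine as the candidate grid, so a coarse grid can understate the true maximum.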
