While standard interpolation techniques are capable of fitting continuous models to discretely sampled data, it is highly desirable in remote sensing applications to compensate for the noise terms in Equation 2.22. One interpolation method that achieves this is kriging, which interpolates discrete samples by using stochastic information. To perform kriging, one must first generate an experimental semivariogram and then fit a theoretical semivariogram model to it. Deriving a theoretical semivariogram model from an experimental semivariogram relies on two major assumptions, which are discussed below.
The first assumption that must be met concerns the first and second order moments of the random spatial process. By definition, these moments must be invariant under spatial translation by an absolute distance h:
$$E[Z(s)] = \mu \tag{2.23}$$
$$E[(Z(s) - \mu)(Z(s + h) - \mu)] = C(h) \tag{2.24}$$
Any function that satisfies these properties is defined as a second order stationary function. [39] Note that the function's first and second order moments are independent of absolute spatial position and depend only on the separation, h, between points in a discretely sampled set of elevations. When the data cannot meet these assumptions, a milder hypothesis is often adopted. Chilès and Delfiner define this as the "intrinsic hypothesis." [39] A process that satisfies the intrinsic hypothesis obeys the following equations:
$$E[Z(s + h) - Z(s)] = \langle a, h \rangle \tag{2.25}$$
$$\mathrm{var}(Z(s + h) - Z(s)) = 2\Upsilon(h) \tag{2.26}$$
where 2Υ(h) is the variogram of the spatial process and ⟨a, h⟩ is the linear drift of the spatial process. The quantity Υ(h) is commonly referred to as the semivariogram of the random spatial process, although some authors use the terms variogram and semivariogram interchangeably.
The second assumption that must be met in order to perform kriging is that the data set must exhibit ergodicity. By definition, a second order stationary random function Z(s) is ergodic in the mean if the spatial average over the sampled region converges to the expected spatial mean, µ, when the sampled region tends to infinity. [39]
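Written out as a formula, ergodicity in the mean can be sketched as follows. The symbols V (the sampled region) and |V| (its area) are notation introduced here for illustration only, and the sense in which the limit holds (commonly mean square convergence) is an assumption rather than something the definition above specifies:

```latex
\lim_{|V| \to \infty} \frac{1}{|V|} \int_{V} Z(s)\, ds = \mu
```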
In other words, the ergodic property allows the mean of the spatial process to be estimated from the spatial average of a single realization. This is an important concept for sampling elevation measurements of randomly distributed surfaces of sand and soil, where multiple measurements of the random spatial process that led to the orientation of individual grains can be difficult to obtain due to time constraints in the field.
According to Chilès and Delfiner, if an elevation data set of a surface (i) is unique, (ii) is defined by a 2-dimensional spatial coordinate system, and (iii) can be characterized by evenly distributed sample points, then structural analysis via a semivariogram method can be performed. Hengl defines the semivariogram as follows:
$$\Upsilon(h) = \frac{1}{2} E\left[(Z(s) - Z(s + h))^2\right] \tag{2.27}$$
Equation 2.27 is meant to represent the true autocorrelation structure of the underlying spatial process. [38] By using this equation, along with the definition of the covariance function in Equation 2.24, it can be shown that the semivariogram is directly related to the covariance of the spatial process:
$$\Upsilon(h) = C(0) - C(h) \tag{2.28}$$
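One way to see where Equation 2.28 comes from, assuming second order stationarity (Equations 2.23 and 2.24), is to expand the square in Equation 2.27 around the mean µ. The steps below are a sketch of that standard expansion, not a reproduction of the cited derivation:

```latex
\begin{align*}
\Upsilon(h) &= \tfrac{1}{2} E\left[(Z(s) - Z(s + h))^2\right] \\
 &= \tfrac{1}{2} E\left[\big((Z(s) - \mu) - (Z(s + h) - \mu)\big)^2\right] \\
 &= \tfrac{1}{2} \left( E[(Z(s) - \mu)^2] + E[(Z(s + h) - \mu)^2] \right)
    - E[(Z(s) - \mu)(Z(s + h) - \mu)] \\
 &= \tfrac{1}{2} \left( C(0) + C(0) \right) - C(h) \\
 &= C(0) - C(h)
\end{align*}
```

The two variance terms both equal C(0) because setting h = 0 in Equation 2.24 gives the variance, and stationarity makes it the same at s and at s + h.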
As a result of this derived relationship, it is clear that a second order stationary random function is also an intrinsic random function. [38][39] Equation 2.28 also shows that the semivariogram vanishes at zero lag distance, Υ(0) = C(0) − C(0) = 0, and that it approaches the variance of the spatial process, C(0), at lag distances large enough for C(h) to decay to zero. [40] A critical tool in examining the statistical properties of a spatial process is the empirical semivariogram (or experimental semivariogram), which is defined as follows [40]:
$$\hat{\Upsilon}(h) = \frac{1}{2N(h)} \sum_{i,j \in N(h)} |Z_i - Z_j|^2 \tag{2.29}$$
where N(h) denotes the total number of pairs of elevation measurements Z_i, Z_j whose spatial coordinates s_i, s_j are separated by the lag distance h. Models known as theoretical semivariograms (described in the next subsection) can be fitted to the empirical semivariogram, and the fitted model can then be used in the interpolation process known as kriging (also described in the following subsections).
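The empirical semivariogram of Equation 2.29 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure used in this work: the function name `empirical_semivariogram`, the parameters `lag_width` and `n_lags`, and the rule of grouping pairs into fixed-width lag bins are all hypothetical choices made here, since real data rarely contain pairs separated by exactly h.

```python
import math
from itertools import combinations


def empirical_semivariogram(coords, values, lag_width, n_lags):
    """Estimate the semivariogram in n_lags bins of width lag_width.

    Returns a list of (mean pair distance, semivariance, pair count)
    tuples, one per non-empty lag bin.
    """
    sq_diff = [0.0] * n_lags   # running sum of |Z_i - Z_j|^2 per bin
    dist_sum = [0.0] * n_lags  # running sum of pair distances per bin
    n_pairs = [0] * n_lags     # N(h): number of pairs per bin

    # Each unordered pair of sample points is visited exactly once.
    for i, j in combinations(range(len(values)), 2):
        d = math.dist(coords[i], coords[j])
        k = int(d // lag_width)  # assumed binning rule: [k*w, (k+1)*w)
        if k < n_lags:
            sq_diff[k] += (values[i] - values[j]) ** 2
            dist_sum[k] += d
            n_pairs[k] += 1

    return [
        (dist_sum[k] / n_pairs[k], sq_diff[k] / (2 * n_pairs[k]), n_pairs[k])
        for k in range(n_lags)
        if n_pairs[k] > 0
    ]


# Four samples along a line whose elevation rises linearly with position.
transect = [(0, 0), (1, 0), (2, 0), (3, 0)]
elevations = [0.0, 1.0, 2.0, 3.0]
result = empirical_semivariogram(transect, elevations, lag_width=1.0, n_lags=3)
```

For this purely linear transect the estimate grows with the square of the lag, so the toy data never level off at a sill.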
The physical interpretation of a semivariogram provides insight into the underlying spatial process and should be considered before discussing the modeling of theoretical semivariograms. Three major features define the physical description of a semivariogram: the nugget, the sill, and the range. The value of the semivariogram at h=0 is defined as the nugget of the semivariogram, which often appears as a discontinuity at the origin. According to Equation 2.29, for an idealized case of no underlying noise, Υ(0)=0, meaning that there is no mean residual error for the spatial process. However, Chilès and Delfiner note that the nugget effect arises from a combination of microstructures below the scale of the sampling grid, which manifest themselves as white noise, and measurement errors caused by both instrument signal-to-noise issues and mechanical positioning errors. [39]
The sill of the semivariogram provides insight into the total variance of the spatial process. Formally, the sill is defined as the limit of the semivariogram as the spatial separation between points tends to infinity. [40] The range of the sill is defined as the distance at which the difference of the semivariogram from the sill value becomes negligible. [39] These concepts can provide information about the characteristics of the underlying spatial process. For example, if the experimental semivariogram increases indefinitely as lag distance increases (meaning that a fixed sill cannot be reliably fitted to the theoretical semivariogram), it is indicative of an underlying spatial trend in the surface structure. [38] The concepts of sill, sill range, and nugget are illustrated in Figure 2.9 for both the case where the semivariogram increases indefinitely as a function of lag distance (a) and the case where the sill value is achieved at a lag distance equal to the sill range (b).
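To make the roles of the nugget, sill, and sill range concrete, the sketch below evaluates the spherical model, one commonly used bounded semivariogram form. The choice of this model and the names `spherical_semivariogram`, `nugget`, `sill`, and `sill_range` are assumptions made here for illustration; the text does not state which theoretical model is fitted.

```python
def spherical_semivariogram(h, nugget, sill, sill_range):
    """Spherical model: 0 at h = 0, jumps to `nugget`, levels off at `sill`.

    `sill` is the total plateau value, so the structured part of the
    variance is (sill - nugget).
    """
    if h < 0:
        raise ValueError("lag distance must be non-negative")
    if h == 0:
        return 0.0          # the semivariogram is exactly zero at zero lag
    if h >= sill_range:
        return sill         # plateau reached at the sill range
    r = h / sill_range
    return nugget + (sill - nugget) * (1.5 * r - 0.5 * r ** 3)


# Hypothetical parameter values, chosen only to show the shape.
lags = [0.0, 0.001, 5.0, 10.0, 25.0]
curve = [spherical_semivariogram(h, nugget=0.1, sill=1.0, sill_range=10.0)
         for h in lags]
```

The jump from zero at h = 0 to roughly the nugget value at a very small positive lag is the discontinuity at the origin described above, and every lag at or beyond the sill range returns the sill.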
When collecting data points to build up an empirical semivariogram of surface structure, the design of a sampling plan can be critical for achieving an accurate estimate of the underlying theoretical semivariogram. According to Oliver and Webster, there are three major considerations for the design of an elevation sampling grid [6]:
1. The maximum lag distance to which you compute the semivariogram should exceed the range of the sill.
2. The step size of the discrete separation lag should be small enough and the number of lags large enough for the empirical estimates to distinctly reveal the functional form of the semivariogram.
3. The number of sampling points should be large enough to place the estimates of the semivariances within acceptable confidence limits.
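These three considerations can be turned into a rough pre-survey check on a candidate layout. The sketch below is only illustrative: the function `check_sampling_plan`, the anticipated range `expected_range`, and the pair-count threshold `min_pairs` are hypothetical names and inputs introduced here, and a minimum pair count is a simple stand-in for a proper confidence-limit calculation.

```python
import math
from itertools import combinations


def check_sampling_plan(coords, lag_width, n_lags, expected_range, min_pairs):
    """Report whether a candidate layout meets three simple design checks."""
    counts = [0] * n_lags
    for p, q in combinations(coords, 2):
        k = int(math.dist(p, q) // lag_width)  # assumed fixed-width lag bins
        if k < n_lags:
            counts[k] += 1
    max_lag = lag_width * n_lags
    return {
        # 1. the largest lag examined should exceed the anticipated range
        "max_lag_exceeds_range": max_lag > expected_range,
        # 2. number of lag bins available to resolve the curve below the range
        "lags_within_range": int(expected_range // lag_width),
        # 3. pair counts per bin, compared against the chosen threshold
        "pairs_per_lag": counts,
        "all_lags_well_populated": all(c >= min_pairs for c in counts),
    }


# A 5 x 5 grid with unit spacing as the candidate layout.
grid = [(x, y) for x in range(5) for y in range(5)]
report = check_sampling_plan(grid, lag_width=1, n_lags=4,
                             expected_range=3, min_pairs=10)
```

On this grid the first lag bin is empty because no two points are closer than the grid spacing, so the plan fails the population check even though the maximum lag exceeds the anticipated range.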
Figure 2.9: An illustration of a semivariogram for the cases of (a) a surface that possesses an underlying spatial trend causing the semivariogram to increase indefinitely, and (b) a case where the semivariogram achieves a sill value at the sill range. Credit: [6]
These samples need not be random, because the sampled values of the underlying spatial process are assumed to be the outcomes of a second order stationary random process. Therefore, these points can be either sampled in a grid-like fashion or randomly distributed, so long as they cover a wide range of separation distances. [6] The two sampling approaches involve tradeoffs. If a regular sampling approach is employed, the area of interest will be systematically covered, but distances smaller than the grid size will be misrepresented. On the other hand, if a randomized sampling approach is used, all distances between points will be represented, but the points will be spread less evenly across geographic space. [38]
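The tradeoff between regular and randomized sampling can be illustrated with a small numerical sketch. Everything here is an assumption made for illustration: the 10 by 10 layout, the unit spacing, the fixed random seed, and the use of empty unit cells as a crude measure of how evenly points cover the area.

```python
import math
import random
from itertools import combinations


def min_separation(coords):
    """Smallest distance between any two sample locations."""
    return min(math.dist(p, q) for p, q in combinations(coords, 2))


def empty_unit_cells(coords, n_cells=9):
    """Count unit cells of an n_cells x n_cells area holding no sample."""
    occupied = {(min(int(x), n_cells - 1), min(int(y), n_cells - 1))
                for x, y in coords}
    return n_cells * n_cells - len(occupied)


spacing = 1.0
grid = [(i * spacing, j * spacing) for i in range(10) for j in range(10)]

rng = random.Random(0)  # fixed seed so the scattered layout is reproducible
scattered = [(rng.uniform(0, 9), rng.uniform(0, 9)) for _ in range(len(grid))]

grid_min = min_separation(grid)            # equals the grid spacing
scattered_min = min_separation(scattered)  # short lags are sampled
grid_gaps = empty_unit_cells(grid)         # regular layout leaves no gaps
scattered_gaps = empty_unit_cells(scattered)
```

With the same number of points, the grid contains no pairs closer than its spacing but covers every cell, while the scattered layout samples short separations at the cost of leaving some cells unsampled.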