2.7.1 Rationale
In the event of hydrocarbon operations beginning in Kirby Misperton, our measurements of water-quality parameters made during the baseline phase will be compared to measurements during the operational phase to determine whether any detectable changes have occurred. Robust and objective protocols are required to determine whether any differences observed between measurements from each phase are larger than those that could have resulted wholly due to the underlying variability of the measurements.
However, as demonstrated, the baseline measurements indicate complex patterns of variation for each analyte:
the mean and standard deviation for a given parameter vary greatly from site to site.; the time series from individual sites include large fluctuations and spikes where single large
values occur during a period of generally small measurements;
the large measurement fluctuations that are evident do not appear to follow regular seasonal or temporal trends nor do they occur at the same time at different sites.
Such behaviour is inconsistent with many statistical tests of differences between sets of measurements.
Previous statistical analyses of baseline datasets have centred upon the methods presented in a report by the Environment Agency (EA, 2019) which suggests that:
1. graphical summaries (time series and box-whisker plots) of a subset of key analytes be produced for each site and inspected;
3. summary statistics of the central tendency and spread of the measurements be produced for each site and for each analyte, both including and omitting outliers;
4. tests be conducted to determine whether the measurements of each analyte at each site are consistent with data drawn independently from a Normal distribution with a constant mean;
5. change indicator values (CIVs) be established for each analyte at each site. These values are defined as the sum of the mean of the data with outliers omitted plus the magnitude of the detectable change, the detectable change being the degree of change that would be expected (with a probability of 80%) to be recognised as significant at the p=0.05 level. These CIVs should prompt further investigation rather than being seen as strict thresholds or limits.
A number of questions remain regarding the precise manner in which the CIVs will be applied. First, over what time period will measurements from the operational phase be compared to the CIVs? The method used for deriving them appears to imply that the comparison will be made once the number of sampling rounds in the operational phase matches the number of rounds in the baseline phase. If this were the case, changes in the analytes on the commencement of hydraulic fracturing activities will remain uninvestigated for months or years.
The handling of outliers amongst the data is also of concern. Since outliers are omitted from the mean values used in the derivation of the CIVs, they would presumably be omitted from the operational phase data which are compared to the CIVs to ensure a fair comparison. However, in the event that activities did lead to large measurement values during the operational phase, these could be dismissed as outliers and not considered.
The approach outlined in a recent Environment Agency report (EA. 2019) indicates that where the baseline data are inconsistent with independent samples from a Normal distribution, the comparisons to the CIV should be treated with caution. It is not clear what this would mean in practice.
In addition, conducting comparisons to the CIVs on a site-by-site basis could lead to a rather confused picture if measurements exceed the CIV at a subset of locations.
Early plans for our statistical evaluation of the baseline and operational water-quality data concerned use of an approach based on the calculation of the space-time mean of each analyte according to the method of Brus (2014). The space-time mean is the average of the analyte value across a specified window in space and time as predicted by a statistical model. The model used by Brus (2014) assumes much more regular variation of the analytes than was observed during the baseline period.
We propose to modify the model of Brus (2014) so that it is consistent with the baseline data and then use it to predict the space-time mean across portions or windows of the operational phase. We define the spatial extent of the window of interest to be the set of API (Area of Potential Impact; Ward et al., 2017) measurement locations10. The temporal extent is flexible
but in this report we consider it to be each individual round of sampling. We determine the space-time mean for each round of sampling during the operational phase and test whether this quantity is consistent with the baseline data. Thus, this approach could reveal a change in the
10 Whilst API – Area of Potential Interest – suggest as an area, it should recognised that for groundwater monitoring sites within the API area are monitoring a three-dimensional system or volume, i.e. part of an aquifer. The selection of monitoring sites to be included in the API (or control area) should be informed by the conceptual model and be monitoring at locations with the same hydrogeological characteristics. Therefore they are necessarily aquifer- and, by association, depth-specific.
analytes from just one round of sampling. It does not automatically omit outliers from the analysis and does not require the assumption of the measurements being consistent with an independently sampled Normal distribution. Also, it reduces the statistical analyses to a single test or each analyte.
2.7.2 Overview of the approach
We require a statistical model of the baseline variation of each analyte. We will then use this model to determine the degree of variation that could be expected in the data from the operational period if no underlying change has occurred. Finally, we will compare the actual variation in the operational phase to this expected variation to determine if they are consistent. The baseline model must account for any patterns in the model due to spatial or temporal correlation amongst the data or a tendency for measurements from the same round of sampling to be similar.
Before estimating a statistical model, we standardise the data so that the measurements from each site are comparable. The majority of the analytes have a highly skewed (i.e. asymmetric) distribution. Where this is the case we apply a log-transform to reduce this skew so that the data are more consistent with a Normal distribution. We then standardise the data from each site by subtracting the mean for that site and then dividing by the standard distribution.
Our most basic model assumes that the standardised data are drawn from a Normal distribution with mean of zero and unit variance. We calculate and inspect variograms of the standardised data to see if any spatial and/or temporal correlation is evident. Where it is, an appropriate correlation term is added to the model. Similarly, we explore whether the mean of the baseline data varies with time and again add an appropriate term to the model if required. A statistical test is used to confirm that any additional term does improve the model before it is finally accepted.
We use our baseline model to determine the expected mean standardised measurement value across the API for a future round of sampling and the specified confidence limits for this mean. The model can account for sites being missed in a future sampling round. If the observed mean of the standardised measurements falls outside the confidence limits then this is seen as an indication that further investigation is required.
One concern about the approach is whether sufficient baseline data exist to estimate such a model accurately. We test this by using the models to simulate long time series of data that could be expected for each analyte. We then follow the above procedure, estimating models for baselines of different length and treating the remaining simulated data as the operational phase. In these simulated examples there is no underlying difference between the baseline and operational phases. Therefore, a substantial number of cases where further investigation is required will be an indication that more baseline data are required.
2.7.3 Technical details of the modelling methodology
Statistical analysis of Vale of Pickering data is presented of data of four analytes (CH4, Cl, Na
and NH4) from 20 API Superficial groundwater sites (Figure 29). The approach can be applied
to any analyte, but only four are presented here to demonstrate the development/application of the statistical methodology. However, the four were selected as being representative change indicators. For example, demonstrating variations in salinity (Na, Cl), the hydrocarbon source gas (CH4) and a redox indicator (NH4). The selection of analytes also considered the frequency
of measurement quantification (LOQ). Automation of the process and production of a ‘plug- in’ programme for rapid analysis of data wull be produced in a further phase of the project.
The data considered were collected irregularly from September 2015 to August 2018. The intervals between two adjacent sampling rounds range from 1 month to 3 months.
For the purpose of modelling, the sampling dates are rearranged into 29 ‘representative’ sampling dates, by shifting the actual dates to their nearest ‘representative’ sampling dates. Log transformation was applied to the CH4, Cl and Na data as the distributions for these
variables are severely right skewed. No transformation was applied to the NH4 data. The
(transformed) data were first standardised site by site, as: 𝑌𝑡𝑗 =𝑌𝑡𝑗
∗ − 𝜇̂ 𝑗
𝜎̂𝑗 , 𝑗 = 1, … , 𝐽; 𝑡 = 1, … , 𝑇,
where 𝑌𝑡𝑗∗ is the observation taken at site 𝑗 time 𝑡, 𝜇̂𝑗 and 𝜎̂𝑗 are the sample mean and standard deviation of the data at site 𝑗. Then spatial/temporal correlation in the data were investigated through empirical spatial and temporal variograms, computed as:
𝛾(ℎ) = 1
2|𝑁(ℎ)| ∑ (𝑌𝑖 − 𝑌𝑗)
2 𝑖,𝑗 ∈𝑁(ℎ)
,
where 𝑁(ℎ) is the collection of pairs of standardised observations 𝑌𝑖 , 𝑌𝑗 that are distance
ℎ apart (distance can be in space or in time) and |𝑁(ℎ)| is the number of such pairs in the collection. The empirical variograms can be computed using the ‘gstat’ package in R (Gräler et al., 2016). Provided that there are no evident spatial or temporal correlations, the baseline distribution can be established by the sample means and standard deviations.
Figure 29. Boxplots of log transformed CH4 (top left), log transformed Cl (top right), log
transformed Na (bottom left) and NH4 (bottom right) from 20 API sites investigated in
In the presence of spatial/temporal correlations, they will need to be appropriately accounted for in the model to quantify better the uncertainty of the space-time mean. Different models with residual correlation structure reflecting the spatial/temporal correlations may be considered, such as:
𝑌 = 𝑋𝛽 + 𝜀, 𝜀 ~𝑁(0, 𝜎𝜀2𝑉),
where 𝑉 is a valid correlation matrix describing the spatial/temporal structure in the data. Furthermore, a random effect component may be introduced to account for additional unexplained variations, e.g. temporal random variation not explained by the covariates in the design matrix 𝑋. This can be modelled as:
𝑌 = 𝑋𝛽 + 𝑍𝜂 + 𝜀, 𝜂 ~ 𝑁(0, 𝜎𝜂2𝐼), 𝜀 ~ 𝑁(0, 𝜎 𝜀2𝑉)
where 𝑍 is the temporal random effect design matrix, 𝜂 is the corresponding random effect coefficient vector and 𝐼 is the identity matrix. Parameters in the above models can be estimated by maximizing the (residual) log-likelihood, using package ‘nlme’ in R (Pinhero et al., 2018). The sample standard deviations may be adjusted using the estimated covariance matrix and/or the random effect variance. Confidence interval of the mean standardised observations (i.e. standardised observations averaged over sites) at time point 𝑡 can be obtained from the estimated model, as:
± 𝑧𝛼√𝜎̂𝜂2 + 𝜎̂𝜀 2
𝑛𝑡 ,
where 𝑧𝛼 is the upper 𝛼 quantile of the standard Normal distribution (e.g. 1.96 for the 95%
confidence interval) and 𝑛𝑡 is the number of observations recorded at the sampling round 𝑡. The pointwise confidence intervals form a confidence band, which can be extended to future time points to decide if the measurements taken at a future date should raise any concern. The above approach is applied to the observed data to establish the baseline model. When new data are acquired, they are first standardised using the empirical site means and standard deviations from the baseline model. Then the mean of these standardised data across the API are calculated for each round of sampling. The corresponding confidence bands are constructed using the estimated baseline model parameters (including the parameters associated with the random effect or the correlation matrix). Finally, the observed standardised mean is compared to these confidence limits to assess whether this should trigger further investigation, i.e. if it was outside the confidence limit range.
The resulting baseline models for the CH4, Cl, Na and NH4 data from the 20 API sites are
presented in Table 3 - Table 10.
There appear to be no distinctive spatial or temporal correlation in the data for any of the four analytes. However, there are random temporal fluctuations in the standardised data that cannot be explained fully by a simple intercept model. Therefore, models with temporal random effect were fitted.
(1) Baseline model for the log-transformed CH4 data
Table 3. Means and standard deviations of log(CH4)
Figure 30. (left): histogram of the standardised log(CH4) data. (middle) Empirical
spatial variogram (cut off at 2000 m); and (right) empirical temporal variogram (cut off at 6 sampling rounds) of the standardised data
Table 3 shows the sample mean, standard deviation of the 20 API sites. A histogram of the standardised data is shown in the left panel of Figure 30. The empirical spatial and temporal variograms (middle/right panel of Figure 30) show no distinctive patterns and hence no distinctive correlations.
A linear mixed effect model with temporal random effect was fitted. The model was tested against a simple intercept model and the likelihood ratio test suggests that the temporal random effect is significant (Table 3). The mean standardised time series and the adjusted confidence bands are shown in Figure 31.
log(CH4) mean std.dev Site 1 2.1313 1.3495 Site 14 6.7410 2.1199 Site 15 9.4393 0.7974 Site 3 8.2055 0.5189 Site 35 7.4375 0.5034 Site 36 1.4336 0.9981 Site 38 10.1261 0.2998 Site 4 6.0893 0.8776 Site 40 8.0146 0.2636 Site BGS 41 6.6386 1.2599 Site BGS 42 10.3406 0.2239 Site BGS 43 4.2012 1.8683 Site BGS 44 4.2803 2.0305 Site BGS 45 4.9707 1.4722 Site BGS 46 2.9145 2.2709 Site BGS 54 2.8170 1.3402 Site TE 48 1.7266 1.9625 Site TE 49 1.6137 2.2599 Site TE 50 2.4442 2.2160 Site TE 52 2.4195 2.2945
Table 4. The fitted temporal random effect model
(2) Baseline model for the log transformed Cl data
Table 5. Means and standard deviations of log(Cl)
The estimated linear mixed effect model
Value 95% interval Random std 0.3065 (0.1929, 0.4871)
Residual std 0.9304 (0.8664, 0.9992)
Likelihood ratio (LR) test
DF LogLik LR p-value Simple 2 -566.52
Mixed 3 -559.53 13.97 0.0002
Figure 31. The mean standardised time series and the adjusted 95% confidence band
Table 5 shows the sample mean, standard deviation of 20 API sites. A histogram of the standardised data is shown in the left panel of Figure 32.
The empirical variograms (middle/right panel of Figure 32) show no distinctive spatial or temporal correlation.
A linear mixed effect model with temporal random effect was fitted. The model was tested against a simple intercept model and the likelihood ratio test suggests that the temporal random effect is significant (Table 5). The mean standardised time series and the adjusted confidence bands are shown in Figure 33.
log(Cl) mean std.dev
Site 1 3.1958 0.0793 Site 14 4.6386 0.0510 Site 15 5.5996 0.4334 Site 3 4.8976 0.0879 Site 35 3.7910 0.0757 Site 36 3.3019 0.0400 Site 38 5.8389 0.0472 Site 4 4.2831 0.0261 Site 40 4.7641 0.1321 Site BGS 41 4.7254 0.1401 Site BGS 42 4.3543 0.1340 Site BGS 43 5.2502 0.1564 Site BGS 44 5.2573 0.3975 Site BGS 45 5.0552 0.2302 Site BGS 46 3.5344 0.0845 Site BGS 54 4.2917 0.1682 Site TE 48 4.7244 0.0922 Site TE 49 3.9930 0.0375 Site TE 50 3.5437 0.0655 Site TE 52 3.8327 0.1604
Figure 32. (left) Histogram of the standardised log (Cl) data. (middle) Empirical spatial variogram (cut off at 2000m) and (right) empirical temporal variogram (cut off at 6 sampling rounds) of the standardised data
Table 6. The fitted temporal random effect model
The estimated linear mixed effect model
Value 95% interval Random std 0.3134 (0.2020, 0.4860)
Residual std 0.9280 (0.8651, 0.9954)
Likelihood ratio (LR) test
DF LogLik LR p-value Simple 2 -586.41
Mixed 3 -578.18 16.46 <.0001
Figure 33. The averaged (over sites) time series and the adjusted 95% confidence band
(3) Baseline model for the log transformed Na data
Table 7. Means and standard deviations of log(Na)
Figure 34. (left) Histogram of the standardised log (Na) data. (middle) Empirical spatial variogram (cut off at 2000m) and (right) empirical temporal variogram (cut off at 6 sampling rounds) of the standardised data
Table 7 shows the sample mean and standard deviation of 20 API sites for Na. A histogram of the standardised data is shown in the left panel of Figure 34. The empirical variograms (middle/right panel of Figure 34) show no distinctive spatial or temporal correlation.
A linear mixed-effect model with temporal random effect was fitted. The model was tested against a simple intercept model and the likelihood ratio test suggests that the temporal random effect is significant (Table 8). The mean standardised time series and the adjusted confidence bands are shown in Figure 35.
log(Na) mean std.dev
Site 1 5.1414 0.0354 Site 14 6.4211 0.0512 Site 15 6.1216 0.1692 Site 3 5.4266 0.0374 Site 35 5.8552 0.0530 Site 36 5.1615 0.0354 Site 38 5.7823 0.0369 Site 4 4.9864 0.0349 Site 40 6.4208 0.0413 Site BGS 41 5.9700 0.1994 Site BGS 42 6.1841 0.0839 Site BGS 43 6.4713 0.0883 Site BGS 44 3.6084 0.1827 Site BGS 45 5.3307 0.1744 Site BGS 46 5.6686 0.2642 Site BGS 54 5.4771 0.0311 Site TE 48 3.5399 0.0858 Site TE 49 3.5100 0.1309 Site TE 50 3.1995 0.0815 Site TE 52 5.9516 0.0452
Table 8. The fitted temporal random effect model
(4) Baseline model for the NH4 data
Table 9. Means and standard deviations of NH4
The estimated linear mixed effect model
Value 95% interval Random std 0.5432 (0.3938, 0.7492)
Residual std 0.8434 (0.7861, 0.9048)
Likelihood ratio (LR) test
DF LogLik LR p-value Simple 2 -586.41
Mixed 3 -551.46 69.89 <.0001
Figure 35. The averaged (over sites) time series and the adjusted 95% confidence band
Table 9 shows the sample mean, standard deviation of 20 API sites. A histogram of the standardised data is shown in the left panel of Figure 36.
The empirical variograms (middle/right panel of Figure 36) show no distinctive spatial or temporal correlation.
A linear mixed effect model with temporal random effect was fitted. The model was tested against a simple intercept model and the likelihood ratio test suggests that the temporal random effect is significant at the level of 0.05 (Table 10). The mean standardised time series and the adjusted confidence bands are shown in Figure 37.
NH4 mean std.dev Site 1 1.0068 0.0789 Site 14 2.7772 0.3352 Site 15 1.3326 0.1671 Site 3 0.4540 0.1121 Site 35 1.4509 0.0794 Site 36 0.9227 0.0832 Site 38 0.6324 0.2112 Site 4 0.4646 0.0591 Site 40 0.7912 0.0441 Site BGS 41 2.0101 0.3747 Site BGS 42 2.1443 0.1438 Site BGS 43 1.0454 0.1074 Site BGS 44 0.0783 0.0434 Site BGS 45 1.4473 0.3012 Site BGS 46 0.5918 0.3020 Site BGS 54 0.7407 0.0540 Site TE 48 0.1434 0.0736 Site TE 49 0.2034 0.0414 Site TE 50 0.3635 0.1013 Site TE 52 1.7615 0.4066
Figure 36. (left) Histogram of the standardised NH4 data. (middle) Empirical spatial variogram (cut off at 2000m) and (right) empirical temporal variogram (cut off at 6 sampling rounds) of the standardised data
Table 10. The fitted temporal random effect model
2.7.4 Application of the modelling approach
The statistical approach assumes that the model reflects accurately the baseline variation and it does not account for any uncertainty in estimating this model. We therefore need to confirm that each model has been estimated sufficiently accurately such that this missing component of uncertainty does not lead to observations from the operational period being erroneously identified as inconsistent with the baseline period.
For demonstration purposes, we do this by splitting the available data into two parts. One, consisting of 21 rounds of sampling up to July 2017, is treated as the ‘baseline’ period, whereas the remaining eight rounds of sampling are treated as if they were the ‘operational’ phase. The modelling approach was applied to these ‘baseline’ data and then the observed means from the ‘operational’ phase were compared to the estimated baseline model. If the fitted baseline model is able to capture the expected variation (i.e. the natural, background variation) in the data