3.3.1 Dependence Modelling
A model was produced to calculate the dependence χ between pairs of hydrological variables in the Ouse catchment from daily maxima records X and Y, based on equation 2.7, derived from Svensson and Jones (2000). The variables were observed independently but were paired through time (in this case by 24-hour water-day). For example, the daily maximum X1 was observed on the same water-day as the daily maximum Y1, X2 on the same water-day as Y2, and so forth. These pairings were retained throughout, so that dependence was calculated from the pairs X1 and Y1, X2 and Y2, … Xn and Yn. As per Svensson and Jones (2000, 2002), the marginal distributions were assumed to be similar and were not transformed.
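The paired calculation described above can be illustrated with a short sketch. Equation 2.7 is not reproduced in this section, so the counting estimator below is an assumption: it follows the general form χ = 2 − ln P(X ≤ x*, Y ≤ y*) / ln P(U ≤ u), with ln P(U ≤ u) taken as half the log of the product of the two marginal non-exceedance proportions. The function name chi and its arguments are hypothetical.

```python
import math

def chi(x, y, x_thr, y_thr):
    """Estimate chi from paired observations at thresholds (x_thr, y_thr).

    Assumed estimator (not taken from equation 2.7 of the source):
        chi = 2 - ln P(X <= x*, Y <= y*) / ln P(U <= u)
    where ln P(U <= u) = 0.5 * ln(P(X <= x*) * P(Y <= y*)).
    """
    if len(x) != len(y):
        raise ValueError("x and y must be paired, equal-length series")
    n = len(x)
    # Proportion of observations at or below each threshold.
    p_x = sum(xi <= x_thr for xi in x) / n
    p_y = sum(yi <= y_thr for yi in y) / n
    # Proportion of pairs at or below both thresholds simultaneously.
    p_xy = sum(xi <= x_thr and yi <= y_thr for xi, yi in zip(x, y)) / n
    if p_x >= 1.0 or p_y >= 1.0:
        # No exceedances at all: reported as zero dependence
        # (a convention assumed here; the ratio itself is undefined).
        return 0.0
    if p_xy == 0.0:
        raise ValueError("no pairs lie at or below both thresholds")
    ln_p_u = 0.5 * math.log(p_x * p_y)
    return 2.0 - math.log(p_xy) / ln_p_u
```

Under this estimator, two identical series give χ = 1, and a pair of series whose joint non-exceedance proportion equals the product of the marginal proportions gives χ = 0.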
Because the observed pairs were taken at a daily resolution, however, the dependence results may only indicate where extreme values occur within the same temporal period (e.g. one water-day). The daily maxima in each pair of X and Y may therefore have occurred up to 24 hours apart. For quickly responding catchments such as the Ouse (Environment Agency, 2002), a 24-hour period may be too long to ascertain whether the peaks of the two variables can actually occur simultaneously, which is important for the estimation of water levels in a joint probability analysis. Svensson and Jones (2003), however, stated that if no dependence was found in data at a daily resolution, dependence would not exist at a higher resolution, such as hourly observations. To explore this problem, the dependence model was extended to include the complete recorded 15-minute datasets of the two variables of interest, allowing a dependence value to be calculated from simultaneously recorded real-time pairs of X and Y and compared with the dependence value χ calculated from the daily maxima. As with the daily maxima datasets, the pairs of 15-minute values X1 and Y1, X2 and Y2, … Xn and Yn were observed simultaneously and were kept intact as pairs throughout the dependence calculation.
It was understood that the topography and hydrodynamics of a tidal river system may affect the temporal relationship (and therefore the dependence) between sea level and river flow. Even if the sea level and river flow peaks were to occur at the same point in time at the two boundary sites (in this case Newhaven and Barcombe), the physical time-lag between the sites would mean that the peaks would not necessarily arrive at the point of interest (in this case Lewes) at the same time. For example, it takes 55 minutes for the tidal peak to propagate upriver from Newhaven to Lewes, and approximately 1 hour for the river flow peak to travel downriver from Barcombe to Lewes. Therefore, when using the real-time (e.g. 15-minute) datasets, river flow and sea level values recorded at the same instant at the two boundary sites could not simply be paired.
A time-lag algorithm was therefore incorporated into the dependence model, which inserted a lag between the river flow and sea level observations rather than relying on a fixed time period to calculate a dependence value. The process initially selected the daily maxima from the first dataset, together with the actual times at which they occurred. The model then automatically selected the corresponding value from the second dataset recorded at the same time. For example, for a variable pair of sea level X and river flow Y, if a tide were to peak at 07:15 on any given day (e.g. X1), the model selected the corresponding flow value (e.g. Y1), which was also recorded at 07:15, and calculated a value of dependence. The process was then repeated with negative and positive time-lags (in ±15-minute increments) introduced to recreate the hydrodynamic lag between the two variables. A dependence value was then calculated for each lag increment, up to ±1 day.
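The lagging procedure can be sketched as follows. This is a minimal illustration, assuming both records are gap-free lists sampled at the same 15-minute steps and starting at the beginning of a water-day; the function names and the handling of lags that fall outside the record (the pair is simply dropped) are assumptions rather than details taken from the model.

```python
STEPS_PER_DAY = 96  # 24 hours of 15-minute observations

def daily_peak_indices(series, steps_per_day=STEPS_PER_DAY):
    """Index of the maximum within each complete day of a regular series."""
    peaks = []
    for start in range(0, len(series) - steps_per_day + 1, steps_per_day):
        day = series[start:start + steps_per_day]
        peaks.append(start + day.index(max(day)))
    return peaks

def lagged_pairs(x, y, lag_steps, steps_per_day=STEPS_PER_DAY):
    """Pair each daily peak of x with the y value recorded lag_steps later.

    A negative lag_steps selects the y value recorded before the x peak.
    Pairs whose lagged index falls outside the y record are dropped.
    """
    pairs = []
    for i in daily_peak_indices(x, steps_per_day):
        j = i + lag_steps
        if 0 <= j < len(y):
            pairs.append((x[i], y[j]))
    return pairs

def pairs_for_all_lags(x, y, max_lag_steps=STEPS_PER_DAY,
                       steps_per_day=STEPS_PER_DAY):
    """Build the paired sample for every lag from -max to +max steps."""
    return {
        lag: lagged_pairs(x, y, lag, steps_per_day)
        for lag in range(-max_lag_steps, max_lag_steps + 1)
    }
```

With the default settings this yields 193 paired samples (lags of −96 to +96 steps, i.e. ±1 day in 15-minute increments), each of which would then be passed to the dependence calculation.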
3.3.2 Threshold Selection
The dependence measure χ can be estimated for any threshold level. The selection of x* and y* for this analysis was determined by two requirements: firstly, to have enough data points above the threshold to be able to determine dependence, and secondly, for the threshold to be high enough for the values to be regarded as extreme (Svensson and Jones, 2002). For example, setting the threshold above the maximum value in the series would produce a zero dependence value, whereas setting the threshold so as to select only the extreme values would still provide enough points to calculate a value of dependence. The threshold values were selected for each variable independently of the other.
To calculate a value of dependence, the threshold values were selected using a peaks-over-threshold (POT) approach, which selected extreme values from each dataset independently on the basis of a series of percentile threshold levels (e.g. 95%, 98%). The independence criterion was that any two POTs must not occur on consecutive days, but must be separated by at least one day (e.g. Svensson and Jones, 2000). The process eliminated the non-extreme peaks (i.e. the everyday maximum values) and produced sets of the most extreme peaks.
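A minimal sketch of this selection is given below. The nearest-rank percentile definition, the rule of keeping the larger of two peaks that fall too close together, and the function names are all assumptions made for illustration; the source specifies only the percentile thresholds and the requirement that peaks be separated by at least one day.

```python
import math

def percentile_threshold(values, q):
    """Nearest-rank percentile of values, for q in (0, 1]."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered)))
    return ordered[rank - 1]

def select_pot(daily_max, q=0.95, min_gap_days=2):
    """Select independent peaks over a percentile threshold.

    daily_max is a list of daily maxima indexed by day number.
    min_gap_days=2 means two retained peaks cannot fall on consecutive
    days: at least one day must lie between them.
    Returns the threshold and the sorted day indices of retained peaks.
    """
    thr = percentile_threshold(daily_max, q)
    candidates = [(v, d) for d, v in enumerate(daily_max) if v > thr]
    kept = []
    # Work from the largest candidate downwards so that, where two
    # exceedances are too close, the larger one is retained (assumed rule).
    for v, d in sorted(candidates, reverse=True):
        if all(abs(d - k) >= min_gap_days for k in kept):
            kept.append(d)
    return thr, sorted(kept)
```

Each variable would be passed through select_pot separately, in keeping with the thresholds being chosen independently for each dataset.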
3.3.3 Significance Testing
Significance testing of the χ values was carried out using a permutation method (e.g. as described by Svensson and Jones, 2003), which used generated datasets to establish the values of χ expected where independence holds (i.e. under a null hypothesis of no dependence). The process estimated the value of χ corresponding to the 5% significance level.
Permutation is a random resampling method which generates the range of χ values that would be expected if the two records were independent. If the χ value calculated from the original dataset lies above the 5% limit derived from the generated values, the result is significant and the hypothesis of independence is rejected. However, if the original χ lies below the 5% limit, the result is not significant and the calculated dependence cannot be distinguished from that arising by chance. In other words, if the χ calculated from the original dataset is significantly different from the χ calculated from the generated values, it may be concluded that the original records are not independent, and the dependence value can therefore be accepted.
The method took the complete daily maxima data series for the two variables. Each series was then divided into blocks of complete years (using the water-year, September to August); the daily maxima within each year block were not altered, so as to preserve the seasonality. The year blocks were labelled 1, 2, ..., n in order of occurrence (i.e. 1 = 1982, 2 = 1983, etc.) for each variable. The first series was kept unchanged and in sequence, whilst the second record was permuted by randomly shuffling the complete year blocks (e.g. 4, 7, ..., n). This created a random resample of observations from the two records containing the same number of years as the original dataset, allowing a new χ value to be calculated. For each resample, the full dataset was used, but each water-year block was used only once.
The permutation was repeated 199 times, each time keeping the first dataset in sequence and reshuffling the second dataset. A new χ value was calculated for each resample. The 199 calculated values of χ were ranked in descending order, and the 10th largest value was taken as corresponding to the 5% significance level. The original χ value was then compared with this resampled χ; if it was found to be above the resampled χ, the dependence between the variables could be considered genuine and the original χ value accepted.
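The year-block permutation can be sketched as below. This is an illustrative outline only: the function names, the stat argument (any callable returning a dependence value from two paired lists, standing in for the χ calculation) and the fixed random seed are assumptions. Year blocks are paired day by day after shuffling, so blocks of unequal length are truncated to the shorter of the two, which is also an assumption.

```python
import random

def permutation_threshold(x_blocks, y_blocks, stat,
                          n_perm=199, rank=10, seed=0):
    """Value of stat at the chosen rank under shuffled year blocks.

    x_blocks, y_blocks: lists of year blocks, each a list of daily maxima.
    The x blocks stay in sequence; the y blocks are shuffled as whole
    years, so seasonality within each year is preserved.
    Returns the rank-th largest of the n_perm resampled values
    (10th of 199 corresponds to the 5% level described in the text).
    """
    if rank > n_perm:
        raise ValueError("rank cannot exceed the number of permutations")
    rng = random.Random(seed)
    values = []
    for _ in range(n_perm):
        shuffled = list(y_blocks)  # copy, so the input order is untouched
        rng.shuffle(shuffled)      # each year block is used exactly once
        xs, ys = [], []
        for x_year, y_year in zip(x_blocks, shuffled):
            for xv, yv in zip(x_year, y_year):
                xs.append(xv)
                ys.append(yv)
        values.append(stat(xs, ys))
    values.sort(reverse=True)
    return values[rank - 1]

def is_significant(observed, threshold):
    """True if the observed value lies above the permutation threshold."""
    return observed > threshold
```

In use, stat would be the χ calculation at the selected thresholds, and the value returned by permutation_threshold would be compared with the χ obtained from the original, unshuffled records.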
3.3.4 Confidence Intervals
Confidence intervals may be calculated to provide an indication of the range where the true dependence value would be expected to lie. The process used a resampling method called bootstrapping, which was based on the generation of new datasets. Unlike the test for significance, the estimation of the confidence intervals looked for dependence rather than independence by generating data with the same level of dependence found between the original data series.
As with the significance test, to calculate the confidence intervals both daily maxima series were kept intact within year-long blocks throughout the recalculation of χ. The year blocks (containing simultaneously recorded observations from both datasets) were then chosen randomly with replacement, meaning that each year block could be selected more than once within each recalculation of χ. The generated resample dataset was kept to the same size as the original dataset and a new value of χ calculated.
The process was then repeated 199 times, each time resampling the year blocks of the variables at random, generating a large number of χ values. Each simulation produced either a higher or a lower dependence value than the original, as some years contained higher levels of dependence than others. For example, if a year which produced a high level of dependence was randomly selected several times (e.g. 3 times) within a resampled dataset, the resultant χ value would be expected to be high. Similarly, a randomly resampled dataset containing only years which displayed low levels of dependence would produce a low resampled value of χ.
The 199 calculated values of χ were then ranked in descending order, and the 10th and 190th largest values were taken as representing the 95% and 5% confidence limits respectively. The confidence intervals are displayed beside the calculated values of χ for each variable pairing.
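The block bootstrap described in this section can be sketched in the same way as the permutation test. Again this is only an outline under stated assumptions: stat is any callable returning a dependence value from two paired lists (standing in for the χ calculation), the function name and fixed seed are hypothetical, and the x and y blocks for a given year are assumed to be of equal length so that the simultaneous pairs stay aligned.

```python
import random

def bootstrap_interval(x_blocks, y_blocks, stat, n_boot=199,
                       upper_rank=10, lower_rank=190, seed=0):
    """Confidence limits for stat from resampled paired year blocks.

    Year blocks are drawn with replacement, and the same year is drawn
    for both variables so that simultaneously recorded pairs stay intact.
    Returns (lower, upper): the lower_rank-th and upper_rank-th largest
    of the n_boot resampled values.
    """
    if len(x_blocks) != len(y_blocks):
        raise ValueError("both series need the same number of year blocks")
    if not 1 <= upper_rank <= lower_rank <= n_boot:
        raise ValueError("ranks must satisfy 1 <= upper <= lower <= n_boot")
    rng = random.Random(seed)
    n_years = len(x_blocks)
    values = []
    for _ in range(n_boot):
        # Draw n_years block indices with replacement; a year may repeat.
        picks = [rng.randrange(n_years) for _ in range(n_years)]
        xs = [v for i in picks for v in x_blocks[i]]
        ys = [v for i in picks for v in y_blocks[i]]
        values.append(stat(xs, ys))
    values.sort(reverse=True)
    return values[lower_rank - 1], values[upper_rank - 1]
```

The key difference from the permutation sketch is that the pairing between the two variables is never broken: whole years of paired observations are resampled, so the generated datasets retain the dependence present in the original records.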