
All statistical analyses were performed using Microsoft Excel 2003 and the open-source R software (R Development Core Team, 2007). In all statistical tests, differences were regarded as significant at probability values below 5 % (*) or 1 % (**), and all t-tests used corrected degrees of freedom. A number of key questions and hypotheses were posed:

• My first key question was whether the selected quadrats, when combined, were generally above or below the threshold for self-sustainability. This was tested by ranking the LFA SSCIs from lowest to highest and comparing the distribution to the calculated theoretical threshold, which allows conclusions to be drawn about the level of degradation and the self-sustainability of the biogeochemical processes at the soil surface (Tongway and Hindley, 2004). The threshold is calculated as the central value in the range reflecting the inflection point between the two curves generated by the ranking of the obtained LFA index values. This threshold is not an absolute value, as it depends on how well the range end-points reflect the extremes available in the environment under study;

• I hypothesised that there was no difference in stability, infiltration or nutrient cycling between quadrats in the two mining regions, and tested this with a Welch Two Sample t-test by comparing the non-rocky grassland quadrats from each mining region, with disturbance levels combined;

• Similarly, I hypothesised that there would be no difference in LFA indices between disturbance levels, when combining vegetation and mines for each disturbance level, and tested this with a Welch Two Sample t-test;

• To test my hypothesis of no difference between vegetation types, a one-way ANOVA was applied to the LFA indices for the four vegetation types, after combining disturbance levels and mining regions within each vegetation type;

• My final hypothesis, for the LFA data, was that stability, infiltration and nutrient cycling indices would not differ between low disturbance sites and high disturbance sites within each vegetation type. For this analysis I regarded non-rocky grasslands from the Vaal River and West Wits regions as separate entities. This hypothesis was tested with a number of Welch Two Sample t-tests.
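The Welch test with corrected degrees of freedom used in the hypotheses above can be sketched as follows. The analyses in this study were run in R; the Python below is only an illustrative sketch, and the function name `welch_t` and the sample values are hypothetical. It computes the t statistic and the Welch-Satterthwaite degrees of freedom, but not the p-value.

```python
import math
import statistics

def welch_t(a, b):
    """Return (t, df) for a Welch two-sample t-test with unequal variances."""
    na, nb = len(a), len(b)
    # Sample variances (n - 1 denominator) divided by the group sizes
    va = statistics.variance(a) / na
    vb = statistics.variance(b) / nb
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation for the corrected degrees of freedom
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df

# Hypothetical index values for two groups of quadrats (illustration only)
group_a = [1, 2, 3, 4, 5]
group_b = [2, 4, 6, 8, 10]
t_stat, dof = welch_t(group_a, group_b)
```

The corrected degrees of freedom are generally non-integer and never exceed the pooled value of n1 + n2 - 2, which is what makes the test robust to unequal variances between groups.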

The VI values were subjected to the same basic hypotheses and statistical approach described above for the LFA indices, without ranking the results, as threshold values for a VI are irrelevant.

• A further key question was how accurately the VIs measured the plant characteristic they were designed for (i.e. chlorophyll or plant water content) when applied to the spectral reflectance of winter-senesced vegetation, in the absence of empirical data about these plant characteristics. To this end, correlation analysis was performed between the VIs to test for relationships among them. The underlying assumption was that VIs measuring a similar plant characteristic, but using widely separated wavelengths, should give highly correlated results.

• To test the hypothesis that VIs can predict LFA indices, simple linear regression (Galpin, 2007) was applied with the LFA indices as response variables and the VIs as predictor variables. In all analyses, outliers were identified as extreme values inconsistent with the general trend in the data, and the tests were repeated with the outliers removed. As this did not improve the statistical results, all results in this report are shown with no quadrats removed from any analysis.
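The correlation and simple linear regression steps described above can be illustrated with a minimal sketch. It is written in Python for illustration only (the study itself used R), and the function name `simple_regression` and the data values are hypothetical, not measurements from this study.

```python
import math

def simple_regression(x, y):
    """Least-squares fit of y = intercept + slope * x, plus Pearson's r."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squares and cross-products about the means
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r

# Hypothetical predictor (a VI) and response (an LFA index), illustration only
vi_values = [1, 2, 3, 4, 5]
lfa_values = [2, 4, 5, 4, 5]
slope, intercept, r = simple_regression(vi_values, lfa_values)
r_squared = r ** 2  # proportion of variance in the response explained by the VI
```

For a single predictor, the squared correlation coefficient equals the coefficient of determination of the regression, so the same quantities serve both the VI-to-VI correlation analysis and the VI-to-LFA regressions.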

2.8 Partial Least Squares Regression Modelling

PLSR is a data compression method that uses matrix algebra techniques to extract “latent variables”, or components, from two data sets by maximising the explained X/Y covariance (Frank and Friedman, 1993; Martens, 2001). PLSR modelling was performed with R open source software (R Development Core Team, 2007) and the pls package (Mevik and Wehrens, 2007; Wehrens and Mevik, 2007).

The LFA indices and spectral data were paired. Thereafter, the pairs were randomly separated into two datasets by allocating each fourth spectrum to the validation data (n = 26), with the remaining data (n = 79) used to calibrate suitable PLSR models. The validation data were used as new data to test the predictive accuracy of the models selected in the calibration phase. The LFA data were used as calculated, with no transformations or scaling. The spectral data first had the wavelengths affected by sensor steps or atmospheric water interference removed (Ong et al., 2004, 2008; Figure 11) and were then centred by the pls algorithm (Mevik and Wehrens, 2007), but not scaled, as all the spectral measurements have the same units (Geladi and Kowalski, 1986).

The Root Mean Square Error of Prediction (RMSEP) was calculated using Leave-One-Out (LOO) Cross-Validation (CV) (Mevik and Wehrens, 2007; Mevik and Cederkvist, 2004). The RMSEP was used to select the best fitting models while avoiding over-fitting. Over-fitting occurs when a model with many parameters gives a strong fit to the data, often by modelling both the features and the “noise” in the data. Such models, in which some parameters model the “noise”, have poor predictive ability with new data. The best fitting models were identified as those with the lowest RMSEP, either overall or at a local minimum. Interpretation of the main environmental features influencing the models was performed by examining the plots of the loading factors for the components constituting the model parameters (Mevik and Wehrens, 2007).
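To make the calibration workflow more concrete, the following is a minimal sketch of a single-response PLS regression (the PLS1 form of the NIPALS algorithm) with centred but unscaled predictors, scored by leave-one-out RMSEP. It is not the pls package used in this study and is not guaranteed to reproduce its output; the function names, the reading of "each fourth spectrum" as positions 4, 8, 12 and so on, and the small demonstration data are all assumptions made for illustration.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def split_every_fourth(n):
    """0-based indices: every fourth item goes to validation (assumed scheme)."""
    validation = [i for i in range(n) if (i + 1) % 4 == 0]
    calibration = [i for i in range(n) if (i + 1) % 4 != 0]
    return calibration, validation

def pls1_fit(X, y, n_comp):
    """Fit PLS1 by NIPALS on centred (not scaled) X and centred y."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(p)] for row in X]  # X residuals
    f = [v - y_mean for v in y]                                # y residuals
    comps = []
    for _ in range(n_comp):
        # Weight vector: direction of maximum covariance with y
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(p)]
        norm = math.sqrt(dot(w, w))
        if norm < 1e-12:
            break  # nothing left to explain
        w = [v / norm for v in w]
        t = [dot(row, w) for row in E]  # scores
        tt = dot(t, t)
        if tt < 1e-12:
            break
        load = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(p)]
        q = dot(f, t) / tt
        # Deflate X and y before extracting the next component
        E = [[E[i][j] - t[i] * load[j] for j in range(p)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        comps.append((w, load, q))
    return x_mean, y_mean, comps

def pls1_predict(model, x):
    """Predict one response by repeating the deflation on a new spectrum."""
    x_mean, y_mean, comps = model
    e = [a - b for a, b in zip(x, x_mean)]
    y_hat = y_mean
    for w, load, q in comps:
        t = dot(e, w)
        y_hat += q * t
        e = [a - t * b for a, b in zip(e, load)]
    return y_hat

def loo_rmsep(X, y, n_comp):
    """Leave-one-out cross-validated RMSEP for a given number of components."""
    sq_err = 0.0
    for i in range(len(X)):
        model = pls1_fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:], n_comp)
        sq_err += (y[i] - pls1_predict(model, X[i])) ** 2
    return math.sqrt(sq_err / len(X))

# Hypothetical demonstration data: y is an exact linear function of X
X_demo = [[1, 2, 0], [2, 1, 1], [3, 5, 2], [4, 3, 5],
          [5, 8, 3], [6, 4, 7], [7, 9, 4], [8, 6, 9]]
y_demo = [1 + 2 * a - b + 0.5 * c for a, b, c in X_demo]
rmsep_by_comp = {a: loo_rmsep(X_demo, y_demo, a) for a in (1, 2, 3)}
best_n_comp = min(rmsep_by_comp, key=rmsep_by_comp.get)
```

With real, noisy spectra the cross-validated RMSEP typically stops improving, or rises again, once additional components begin to model noise, which is why the number of components is chosen at the lowest or a local minimum of the RMSEP curve rather than by the fit to the calibration data.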

[Figure 11 plot. x-axis: Wavelength (nm), 350 to 2450; y-axis: Reflectance, 0 to 0.7. Legend: Plant cellulose and lignin; Plant-water content; Plant chlorophyll and carotenoids; Plant carotenoids and anthocyanins; Plant chlorophyll; Mean spectrum.]

Figure 11 The full spectrum for each quadrat (n = 105), showing the position of the selected bands for the different categories of Vegetation Index (VI). The spectra are shown as prepared for PLSR modelling, after removal of the initial UV / visible region (350 – 399 nm), the step between sensors (990 – 1010 nm), atmospheric water noise (1350 – 1450 and 1800 – 1950 nm) and sensor / source noise in the SWIR (2400 – 2500 nm).
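The band removal described in the caption can be sketched as a simple wavelength mask. This is an illustrative Python sketch only: the 1 nm sampling interval from 350 to 2500 nm, the constant placeholder reflectance and the names `REMOVED_RANGES_NM`, `keep_band` and `mask_spectrum` are assumptions, while the removed ranges are those listed in the caption.

```python
# Wavelength ranges (nm, inclusive) removed before PLSR modelling
REMOVED_RANGES_NM = [
    (350, 399),    # initial UV / visible region
    (990, 1010),   # step between sensors
    (1350, 1450),  # atmospheric water noise
    (1800, 1950),  # atmospheric water noise
    (2400, 2500),  # sensor / source noise in the SWIR
]

def keep_band(wavelength_nm):
    """True if the wavelength lies outside every removed range."""
    return not any(lo <= wavelength_nm <= hi for lo, hi in REMOVED_RANGES_NM)

def mask_spectrum(wavelengths, reflectance):
    """Return (wavelength, reflectance) pairs for the retained bands only."""
    return [(w, r) for w, r in zip(wavelengths, reflectance) if keep_band(w)]

# Assumed 1 nm sampling and a flat placeholder spectrum (illustration only)
wavelengths = list(range(350, 2501))
reflectance = [0.3] * len(wavelengths)
kept = mask_spectrum(wavelengths, reflectance)
kept_wavelengths = [w for w, _ in kept]
```

Under the assumed 1 nm sampling, 424 of the 2151 bands fall inside the removed ranges, leaving 1727 bands as predictors.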