

Three different calibration techniques were used, namely PCR, PLSR and BPNN. PCR and PLSR were adopted to relate the variation in one response variable (OC, K, Mg, Na or P) to the variation in several predictors (wavelengths), using TOMCAT, a MATLAB multivariate calibration toolbox (Daszykowski et al., 2007). Both the PCR and the PLSR models were validated with the leave-one-out cross-validation method.
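The leave-one-out procedure can be illustrated with a minimal NumPy sketch of PCR. This is not the TOMCAT implementation; the function name `loo_residual_variance` is hypothetical. For each left-out sample, the PCA loadings and the regression on the scores are recomputed from the remaining samples only, so a left-out spectrum never influences its own prediction.

```python
import numpy as np

def loo_residual_variance(X, y, max_pc):
    """Leave-one-out residual variance of PCR for 1..max_pc principal components.

    X: (n_samples, n_wavelengths) pre-processed spectra; y: (n_samples,) property.
    max_pc must not exceed min(n_samples - 1, n_wavelengths).
    """
    n = X.shape[0]
    press = np.zeros(max_pc)                    # prediction error sum of squares
    for i in range(n):
        train = np.arange(n) != i               # all samples except the i-th
        x_mean = X[train].mean(axis=0)
        y_mean = y[train].mean()
        Xc = X[train] - x_mean                  # centre spectra on the training mean
        _, _, vt = np.linalg.svd(Xc, full_matrices=False)
        x_out = X[i] - x_mean                   # left-out spectrum, same centring
        for k in range(1, max_pc + 1):
            P = vt[:k].T                        # first k PCA loadings
            T = Xc @ P                          # training scores
            b, *_ = np.linalg.lstsq(T, y[train] - y_mean, rcond=None)
            y_hat = x_out @ P @ b + y_mean      # prediction for the left-out sample
            press[k - 1] += (y[i] - y_hat) ** 2
    return press / n                            # mean squared LOO residual per k
```

The square root of each entry is the root mean square error of cross-validation for that number of components; a PLSR version would differ only in how the latent variables are extracted.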

The number of PCs for PCR and of LVs for PLSR was determined by examining a plot of the leave-one-out cross-validation residual variance against the number of PCs or LVs obtained from PCR and PLSR, respectively (Martens and Naes, 1989). Specifically, the number of components at the first minimum of the residual variance was selected (Brown et al., 2005). Outliers were detected by subjecting the pre-processed spectra to PCA. The scores plot of the first two PCs provides a two-dimensional map showing the relationships among the samples. Data points lying outside the 95% confidence ellipse (Hotelling’s T²) were considered strong outliers (Figure 3-1) and were eliminated from the data matrix (Constantinou et al., 2004).
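Both rules in this paragraph can be sketched in NumPy under stated assumptions. The hypothetical `first_local_minimum` returns the component count at the first local minimum of a residual-variance curve, and the hypothetical `hotelling_t2_outliers` flags samples outside the 95% Hotelling’s T² ellipse of the first two PC scores. The control limit uses one common form, T²_lim = 2(n − 1)/(n − 2) · F₀.₉₅(2, n − 2); other texts use slightly different finite-sample constants, so this limit is an assumption rather than the one used by Constantinou et al. (2004).

```python
import numpy as np

def first_local_minimum(residual_variance):
    """Number of components (1-based) at the first local minimum of the curve."""
    rv = np.asarray(residual_variance, dtype=float)
    for k in range(len(rv) - 1):
        if rv[k + 1] >= rv[k]:          # curve stops decreasing after k + 1 components
            return k + 1
    return len(rv)                      # monotonically decreasing: take the last one

def hotelling_t2_outliers(X, alpha=0.05):
    """Flag samples outside the (1 - alpha) Hotelling T^2 ellipse of PC1/PC2 scores."""
    n = X.shape[0]
    Xc = X - X.mean(axis=0)
    u, s, _ = np.linalg.svd(Xc, full_matrices=False)
    scores = u[:, :2] * s[:2]           # PC1 and PC2 scores of every sample
    score_var = s[:2] ** 2 / (n - 1)    # sample variance of each score vector
    t2 = np.sum(scores ** 2 / score_var, axis=1)
    # With 2 numerator degrees of freedom the F quantile has a closed form:
    # F_{1-alpha}(2, d2) = d2 / 2 * (alpha ** (-2 / d2) - 1)
    d2 = n - 2
    f_crit = d2 / 2 * (alpha ** (-2 / d2) - 1)
    t2_limit = 2 * (n - 1) / (n - 2) * f_crit
    return t2 > t2_limit, t2, t2_limit
```

The closed-form F quantile avoids a statistics-library dependency; with more than two PCs a general F quantile function would be needed instead.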

The most widely used neural network is the BPNN (Bishop, 1995), which has been applied in many fields. Owing to its supervised learning ability, it can be used as a calibration method and provides good results (Liu et al., 2008). Back propagation is the generalization of the Widrow-Hoff learning rule to multiple-layer networks and nonlinear differentiable transfer functions (MATLAB Neural Network Toolbox™ 6 User’s Guide).
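The Widrow-Hoff (least-mean-squares) rule that back propagation generalizes can be shown for a single linear neuron. This is a minimal sketch, assuming online (sample-by-sample) updates and a fixed learning rate; `widrow_hoff` is a hypothetical name.

```python
import numpy as np

def widrow_hoff(X, t, lr=0.05, epochs=200):
    """Train one linear neuron y = w.x + b with the Widrow-Hoff (LMS) rule."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for x_i, t_i in zip(X, t):
            e = t_i - (w @ x_i + b)     # error of the linear output for this sample
            w += lr * e * x_i           # gradient step on the squared error e**2 / 2
            b += lr * e
    return w, b
```

Back propagation applies the same error-driven correction to every layer, using the chain rule to pass the output error back through the nonlinear hidden-layer transfer functions.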

Input vectors and the corresponding target vectors are used to train a network until it can approximate a function, associate input vectors with specific output vectors, or classify input vectors in an appropriate way. Networks with biases, a sigmoid layer and a linear output layer are capable of approximating any function with a finite number of discontinuities (MATLAB Neural Network Toolbox™ 6 User’s Guide). Figure 3-2 illustrates the architecture of the network most commonly used with the back propagation algorithm: the multilayer feed-forward network.

Extremely long training times and over-fitting are the two main difficulties of ANN calibration when raw infrared spectral data points are used as inputs. The detailed network training procedure can be found in the MATLAB Neural Network Toolbox™ 6 User’s Guide.

The inputs to the BPNN can be either the PCs obtained from PCA or the LVs obtained from PLSR (PLSR vectors), and the output is the measured chemical value of the property; the input matrix therefore has dimensions of number of samples × number of PCs or LVs. Adopting PCs or LVs as BPNN inputs is an effective way of reducing computational resources and improving the robustness of ANN calibration (Janik et al., 2007). In this study both possibilities were tested, namely BPNN-PCs and BPNN-LVs. The number of PCs used as BPNN inputs was based on the cumulative percentage of explained data variance. The first five PCs were used in this study, since experience shows that they explain nearly 100% of the variance. The number of LVs used as BPNN inputs was the optimal number obtained at the first minimum of the residual variance, as explained by Brown et al. (2005). Different numbers of PCs and LVs were selected for PCA and PLS, respectively, because this procedure provided the best results; the selection was also guided by experience and a review of the literature.
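A minimal sketch of how PC scores could be prepared as BPNN inputs, assuming the PCA is fitted on the calibration spectra only and the prediction spectra are projected onto the same loadings; `fit_pc_inputs` and `project_pc_inputs` are hypothetical names. The cumulative explained variance returned alongside the scores is the quantity against which the choice of five PCs would be checked.

```python
import numpy as np

def fit_pc_inputs(X_cal, n_pc=5):
    """PCA of calibration spectra: scores, loadings, mean and cumulative explained variance."""
    mean = X_cal.mean(axis=0)
    u, s, vt = np.linalg.svd(X_cal - mean, full_matrices=False)
    cumulative = np.cumsum(s ** 2) / np.sum(s ** 2)   # fraction of variance explained
    scores = u[:, :n_pc] * s[:n_pc]                   # samples x n_pc input matrix
    loadings = vt[:n_pc].T                            # wavelengths x n_pc
    return scores, loadings, mean, cumulative[:n_pc]

def project_pc_inputs(X_new, loadings, mean):
    """Project new spectra onto the calibration loadings to obtain BPNN inputs."""
    return (X_new - mean) @ loadings
```

Projecting the prediction set with the calibration mean and loadings keeps the validation samples out of the PCA, which matters for an honest independent validation.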

A standard three-layer feed-forward network composed of one input layer (PCs or LVs), one hidden layer (initially ten nodes) and one output layer (one node) was used.

Each node in the ANN, which represents a “neuron”, computes the weighted sum of its inputs plus a bias, passes the result through a transfer (activation) function and sends the output to the next layer of the network. The tan-sigmoid function and a linear function were adopted in the hidden and output layers, respectively, since this combination was found to give the best prediction accuracy. The momentum was set to 0.6, the minimum learning rate to 0.2, the residual error threshold to 0.5 × 10⁻⁵ and the number of training epochs to 2000, as suggested by He et al. (2008). During training, two node-configuration methods were tested. After training, the number of hidden-layer nodes was adjusted to 12 to achieve the best results. To avoid over-fitting, the cross-validation option was adopted. More details on the mathematical background of the ANN approach can be found in the literature (e.g. Cheng and Titterington, 1994; Basheer and Hajmeer, 2000).
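The training set-up described above can be sketched in NumPy as batch back propagation with momentum, using the stated momentum (0.6), learning rate (0.2), error threshold (0.5 × 10⁻⁵), 2000 epochs and 12 hidden nodes. This is a simplified stand-in, not the MATLAB toolbox routine: the fixed learning rate (the source mentions a minimum learning rate, which suggests an adaptive scheme), the random weight initialisation, the mean-squared-error loss and the assumption that inputs and targets are scaled to roughly [-1, 1] are all assumptions, and `train_bpnn` and `predict_bpnn` are hypothetical names. Cross-validation stopping and the two node-configuration methods are not reproduced.

```python
import numpy as np

def train_bpnn(X, y, n_hidden=12, lr=0.2, momentum=0.6, goal=0.5e-5,
               epochs=2000, seed=0):
    """Batch back propagation with momentum: tan-sigmoid hidden layer, one linear output node."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    params = {
        "W1": rng.normal(scale=0.5, size=(d, n_hidden)),
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(scale=0.5, size=n_hidden),
        "b2": np.zeros(1),
    }
    velocity = {k: np.zeros_like(v) for k, v in params.items()}
    history = []
    for _ in range(epochs):
        H = np.tanh(X @ params["W1"] + params["b1"])   # hidden-layer activations
        err = H @ params["W2"] + params["b2"] - y       # linear output minus target
        mse = float(np.mean(err ** 2))
        history.append(mse)
        if mse < goal:                                  # stop at the error threshold
            break
        g_out = 2.0 * err / n                           # dMSE / d(output)
        g_hid = np.outer(g_out, params["W2"]) * (1.0 - H ** 2)  # tanh' = 1 - tanh^2
        grads = {
            "W1": X.T @ g_hid,
            "b1": g_hid.sum(axis=0),
            "W2": H.T @ g_out,
            "b2": np.array([g_out.sum()]),
        }
        for k in params:                                # gradient step with momentum
            velocity[k] = momentum * velocity[k] - lr * grads[k]
            params[k] += velocity[k]
    return params, history

def predict_bpnn(params, X):
    """Forward pass of the trained network."""
    H = np.tanh(X @ params["W1"] + params["b1"])
    return H @ params["W2"] + params["b2"]
```

The tan-sigmoid function of the MATLAB toolbox, 2/(1 + e^(−2z)) − 1, is algebraically identical to tanh(z), which is why `np.tanh` is used for the hidden layer.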

Figure 3-2 Illustration of a feed-forward back propagation neural network (BPNN) containing input, hidden (with “n” neurons) and output layers

Due to the relatively small number of soil spectra covering a large geographical … stable model performance; we therefore divided the dataset in a 90%:10% ratio. The former was the calibration (cross-validation) set and was used to establish the calibration models with the leave-one-out cross-validation technique, which selects the best model on the basis of the root mean square error of prediction. The latter was the validation (prediction) set and was used for independent validation of the established models. This division into cross-validation (90%) and prediction (10%) sets was replicated three times, and all four analyses (PCR, PLSR, BPNN-PCs and BPNN-LVs) were carried out on each of the three replicates in order to examine the robustness of the calibration models developed for predicting the five soil properties investigated. The sample statistics of the cross-validation and prediction sets for the three replicates are given in Table 3-1.
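The replicated 90%:10% division could be sketched as follows, assuming random assignment of samples (the source does not state how samples were allocated to the two sets). `replicate_splits` and `sample_statistics` are hypothetical helpers; the latter computes the columns reported in Table 3-1.

```python
import numpy as np

def replicate_splits(n_samples, n_replicates=3, cal_fraction=0.9, seed=0):
    """Random calibration/validation index splits, repeated n_replicates times."""
    rng = np.random.default_rng(seed)
    n_cal = int(round(cal_fraction * n_samples))
    splits = []
    for _ in range(n_replicates):
        order = rng.permutation(n_samples)          # fresh shuffle for each replicate
        splits.append((np.sort(order[:n_cal]), np.sort(order[n_cal:])))
    return splits

def sample_statistics(values):
    """Number of samples, minimum, maximum, mean and standard deviation of one set."""
    values = np.asarray(values, dtype=float)
    return {"n": values.size, "min": values.min(), "max": values.max(),
            "mean": values.mean(), "sd": values.std(ddof=1)}
```

Using a different seed per study (or stratifying by property value) would change which samples fall in each set; the sketch only illustrates the replicated-split bookkeeping.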

Table 3-1 Sample statistics of the cross-validation and prediction data sets for the three divisions (replicates) of the sample set

Property | Number of samples | Minimum | Maximum | Mean | Standard deviation

Cross-validation set

3.2.6 Statistical indicators used to assess the accuracy of calibration models