General approach

Before beginning the regression analysis, it was necessary to consider how the metamodels should be derived, and how many there should be for each metric. A common approach is to derive two or more different metamodels for the same metric, using independent subsets of the sample data, and check their predictive abilities against each other. Each metamodel is then used as a predictor, and differences between the models’ predictions allow a confidence interval to be estimated. Alternatively, the regressions are repeated until the two models conform to the same basic structure, as represented by their regressor terms and any mathematical functions thereof. This procedure, called ‘cross-validation’, is well established in engineering and operational research and is served by a large, standard literature (e.g. Davison and Hinkley, 1997; Good, 1999; Marriott and Krzanowski, 1995; Webster and Oliver, 2001; and numerous others). An altogether different method of cross-validation could be used instead, however, in which one or more data points at a time are omitted from the whole set, and a different metamodel is derived from the remaining data (e.g. Davison and Hinkley, 1997). In the latter technique, the aim is to arrive at one ‘best’ metamodel form, which is used as the sole predictor for the metric.
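As an illustration of the second technique (omitting one data point at a time), the following is a minimal Python sketch of leave-one-out cross-validation for an ordinary least squares fit. It is not taken from the study: the function name `loo_cv_rmse` and the demonstration data are hypothetical, and NumPy is assumed to be available.

```python
import numpy as np

def loo_cv_rmse(X, y):
    """Leave-one-out cross-validation RMSE for an ordinary least squares fit.

    X is an (n, p) design matrix that already includes a column of ones
    for the constant term; y is an (n,) response vector.
    """
    n = len(y)
    errors = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i                      # omit point i
        beta, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
        errors[i] = y[i] - X[i] @ beta                # error at the omitted point
    return float(np.sqrt(np.mean(errors ** 2)))

# Hypothetical illustration data: an exact straight line, so a straight-line
# model predicts every omitted point almost perfectly.
x = np.linspace(0.0, 1.0, 8)
X_demo = np.column_stack([np.ones_like(x), x])
y_demo = 2.0 + 3.0 * x
loo_rmse_demo = loo_cv_rmse(X_demo, y_demo)
```

A large leave-one-out error relative to the ordinary residual error would suggest that the fitted form depends heavily on individual sample points.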

For the purposes of this research, although cross-validation has some attractions, it was thought that it would be difficult to take subsets of the central composite design data without introducing aliasing, particularly between the two-factor interaction effects. This would compromise the analysis, defeating the object of conducting the initial experiment using a regular sampling design. Also, cross-validation did not appear to simplify derivation of the final metamodel forms, nor did it eliminate the need for further simulations to sample unexplored areas of the parameter space. For these reasons, cross-validation was not adopted here.
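The aliasing concern can be illustrated with a small Python sketch. It uses only the factorial portion of a design with four factors in coded units, an assumption made for brevity (the study itself has ten parameters, and the axial and centre points of a central composite design contribute zeros to every interaction column). The particular subset chosen, a regular half fraction, is likewise only one hypothetical way of splitting the data.

```python
import itertools
import numpy as np

# Full 2^4 factorial in coded units (-1, +1); four factors are an assumption
# chosen for brevity, not the ten parameters of the study.
full = np.array(list(itertools.product([-1.0, 1.0], repeat=4)))

def interaction(design, a, b):
    """Column for the two-factor interaction between factors a and b."""
    return design[:, a] * design[:, b]

# In the full design the x1x2 and x3x4 interaction columns are orthogonal.
full_dot = float(interaction(full, 0, 1) @ interaction(full, 2, 3))

# Keep only the runs where x1*x2*x3*x4 = +1 (a regular half fraction).
half = full[np.prod(full, axis=1) > 0]

# In the subset the two interaction columns are identical, so a regression
# cannot separate the x1x2 effect from the x3x4 effect.
aliased = bool(np.array_equal(interaction(half, 0, 1), interaction(half, 2, 3)))
```

In the full design the two interaction effects can be estimated independently; in the half fraction only their sum is estimable.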

Having excluded cross-validation, the approach to metamodel derivation follows the manner of model development and improvement outlined in the standard texts (e.g. Box et al., 1978; Box and Draper, 1987). In particular, the aim initially was to derive preliminary linear regression models from the central composite design sample, test predictions made using these models by running further simulations, and then run additional simulations and improve the fit of the metamodels if required. A disadvantage of this approach is that each set of test simulations needs to be quite large (tens of sample points) to test predictions across a 10-dimensional parameter space. Moreover, it would not be appropriate to use data from the tests in deriving the final metamodel for each metric, which implies some wastage in the sampling. These points aside, the approach seemed to be suitable for the purposes here, and further comments on how it was implemented now follow.

Simplicity of preliminary metamodels and general form

The advantage of using metamodels over full simulations has been explained in Chapter 3, from which it is implicit that, for the metamodels to be representative of a LEM’s behaviour, they have to predict GOLEM’s output to within an acceptable degree of error. When planning this phase of the research, although it was not expected that the error could be reduced to zero, it was thought important that it should be reduced so far as practical, and a number of principles were followed in order to achieve this.

Following recommendations in the standard texts (e.g. Box and Draper, 1987; Box et al., 1978; Draper and Smith, 1981), these principles were, in the preliminary metamodelling analysis, to achieve high R² and adjusted R² scores (as evidence of a good fit with small errors), and balanced residuals plots (as evidence of correct model formulation and low bias). Furthermore, so far as possible, the metamodels should not include terms appearing to be only marginally significant in the regression. These principles were also followed in deriving the final metamodels, and an additional principle, that the constant term should be generally stable between regressions using different subsets of the data but the same model structure, was also included at that stage (Box et al., 1978)³.
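For reference, the two goodness-of-fit scores named above can be computed as in the following minimal Python sketch. It uses the usual textbook definitions rather than anything specific to the study; the function name `r2_scores` and the demonstration data are hypothetical.

```python
import numpy as np

def r2_scores(X, y):
    """Return (R2, adjusted R2) for a least squares fit of y on X.

    X must already contain a column of ones for the constant term, so the
    number of columns p counts the constant as well as the regressors.
    """
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ beta
    ss_res = float(residuals @ residuals)             # residual sum of squares
    ss_tot = float(((y - y.mean()) ** 2).sum())       # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - p)     # penalise extra terms
    return r2, adj_r2

# Hypothetical data: close to y = 1 + 2x, with small deviations.
x = np.arange(6, dtype=float)
y_demo = np.array([1.1, 2.9, 5.2, 6.8, 9.1, 10.9])
X_demo = np.column_stack([np.ones_like(x), x])
r2_demo, adj_r2_demo = r2_scores(X_demo, y_demo)
```

Because adjusted R² penalises each additional regressor, it falls when a term that adds little explanatory power is included, which is why it is reported alongside R² when comparing candidate models with different numbers of terms.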

The initial analysis, therefore, was aimed at deriving metamodels employing a low number of regressor terms, each with high or very high significance⁴, and using only simple transformations of the predictor variables. With regard to the latter point, the functions to be used should not produce points of inflexion within the range of interest, as recommended for response surface investigations during early phases of analysis (e.g. Box and Draper, 1987; Wu and Hamada, 2000).

³ This is not essential, as high R² scores and balanced residuals plots are generally good evidence of a sound model. However, the stability of the constant term also provides some additional assurance that the basic model structure, i.e. its mathematical formulation, is correct, and this may be useful where fitting of a more complex model form is being attempted (e.g. Box et al., 1978).

As an additional aim to the above, the author hoped to include as many of the parameters as possible in the models, so that each metamodel would be sensitive to changes in the values of most of the parameters rather than to changes in just a few of them. Finally, the preliminary metamodels were restricted to a linear (i.e. additive) form, with the possibility that they could be changed to non-linear forms after further sampling and analysis. In this respect, drawing on examples from standard works (e.g. Box and Draper, 1987; Draper and Smith, 1981), and taking into account the foregoing, each preliminary metamodel, ignoring the error term, was expected to have a structure conforming to the equation below:

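\[
\begin{aligned}
\hat{y} = {} & \beta_0 + \beta_1 x_1 + \gamma_1 f(x_1) + \beta_2 x_2 + \gamma_2 f(x_2) + \dots + \beta_j x_j + \gamma_j f(x_j) \\
& + \beta_{12} x_1 x_2 + \gamma_{12} f(x_1 x_2) + \beta_{13} x_1 x_3 + \gamma_{13} f(x_1 x_3) + \dots + \beta_{ij} x_i x_j + \gamma_{ij} f(x_i x_j),
\end{aligned}
\tag{4.02}
\]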

where ŷ is the predicted value of the metric at a particular sample point positioned by the parameter values {x₁, x₂, …, xⱼ}, assuming j parameters in the model, β₀ is a constant, and the other terms are understood as follows:

• β₁ to βⱼ are coefficients for the linear main effects terms of x₁ to xⱼ;

• β₁₂ to βᵢⱼ are coefficients for the linear two-factor interaction effects terms of x₁x₂ to xᵢxⱼ;

• γ₁ to γⱼ are coefficients for the curvature terms of the main effects, expressed by the functions f(x₁) to f(xⱼ);

• γ₁₂ to γᵢⱼ are coefficients for the curvature terms of the two-factor interaction effects, expressed by the functions f(x₁x₂) to f(xᵢxⱼ);

• the functions f(), whether applied to a single parameter or a two-way interaction, may be quadratic, power, log, hyperbolic, inverse or similar functions, without points of inflexion within the parameter value ranges of interest.
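The structure of Equation 4.02 can be sketched in Python as a design matrix with one column per term, fitted by least squares. This is an illustration only: the function name `build_design_matrix`, the choice of a single quadratic f() for every term, the three-parameter sample and the coefficient values are all assumptions, whereas in the study f() may differ from term to term and not every term need be retained.

```python
import itertools
import numpy as np

def build_design_matrix(X, f=np.square):
    """Build the columns of Equation 4.02 from an (n, j) array of parameters.

    Column order: constant; x_i and f(x_i) for each parameter; then
    x_i*x_j and f(x_i*x_j) for each pair of parameters.
    """
    n, j = X.shape
    cols, names = [np.ones(n)], ["const"]
    for i in range(j):
        cols += [X[:, i], f(X[:, i])]
        names += [f"x{i + 1}", f"f(x{i + 1})"]
    for a, b in itertools.combinations(range(j), 2):
        prod = X[:, a] * X[:, b]
        cols += [prod, f(prod)]
        names += [f"x{a + 1}x{b + 1}", f"f(x{a + 1}x{b + 1})"]
    return np.column_stack(cols), names

# Hypothetical sample: 40 points, 3 parameters, values in [0.5, 2.0].
rng = np.random.default_rng(0)
X_demo = rng.uniform(0.5, 2.0, size=(40, 3))
D_demo, names_demo = build_design_matrix(X_demo)

# Arbitrary illustrative coefficients; y is generated without an error term.
true_coef = np.linspace(1.0, -1.0, D_demo.shape[1])
y_demo = D_demo @ true_coef

# Least squares estimate of the coefficients and the fitted values.
coef_hat, *_ = np.linalg.lstsq(D_demo, y_demo, rcond=None)
y_hat = D_demo @ coef_hat
```

With j parameters the full form has 1 + j + j² terms (13 for j = 3, 111 for j = 10), which shows why only a small number of significant terms would be kept in practice.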

With these points in mind, the preliminary metamodels were derived for sediment yield, drainage density and sediment delivery ratio, using least squares multiple regression, and working many times through forward, backward and best subsets procedures. In deriving the metamodels, the analyses were all performed using a standard statistics software package (MINITAB 14®). Main effects were also ‘tuned’ to some extent, through trial of different functions and polynomials in the regressions, in order to obtain more balanced residuals plots. The preliminary regression work was successfully completed on these three metrics, and the resulting metamodels and analysis details are now reported and commented on.
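To indicate what a forward procedure involves, the following is a simplified Python sketch of forward selection. It is not the procedure implemented in MINITAB, which is not reproduced here: as a stated simplification, terms are added while adjusted R² improves by more than a threshold, rather than by a significance test. The function names, the `min_gain` threshold and the demonstration data are hypothetical.

```python
import itertools
import numpy as np

def adjusted_r2(X, y):
    """Adjusted R2 of a least squares fit; X includes the constant column."""
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    res = y - X @ beta
    r2 = 1.0 - (res @ res) / ((y - y.mean()) ** 2).sum()
    return float(1.0 - (1.0 - r2) * (n - 1) / (n - p))

def forward_select(candidates, y, min_gain=1e-3):
    """Add one candidate term at a time while adjusted R2 improves.

    candidates maps a term name to its column of values. The stopping rule
    (gain below min_gain) is an assumption made for this sketch.
    """
    X = np.ones((len(y), 1))                  # start from the constant only
    best = adjusted_r2(X, y)
    remaining = dict(candidates)
    selected = []
    while remaining:
        scores = {name: adjusted_r2(np.column_stack([X, col]), y)
                  for name, col in remaining.items()}
        name = max(scores, key=scores.get)    # best single addition
        if scores[name] - best < min_gain:
            break
        X = np.column_stack([X, remaining.pop(name)])
        selected.append(name)
        best = scores[name]
    return selected, best

# Hypothetical data: a 2^3 factorial in coded units, with x3 having no effect.
design = np.array(list(itertools.product([-1.0, 1.0], repeat=3)))
terms = {"x1": design[:, 0], "x2": design[:, 1], "x3": design[:, 2]}
y_demo = 1.0 + 2.0 * design[:, 0] - 3.0 * design[:, 1]
selected_demo, best_demo = forward_select(terms, y_demo)
```

In this example the term with the largest effect (x2) enters first, x1 second, and the inert term x3 is never added. Backward elimination works in the opposite direction, starting from all candidate terms and removing the least useful one at each step.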