Model selection is the choice of a final model from a set of competing models. This is subtly different from structure detection, in which the terms that constitute a model are chosen. Structure detection algorithms often provide a number of competing models from which to choose, for example when the ERR threshold in the FRO algorithm is varied. The model selection task is then to select the best of the models generated by varying the threshold. The objective of the system identification task as a whole is to identify a parsimonious description of a system that avoids overfitting to the data. The criteria for assessing model quality should therefore penalise complexity while promoting a good model fit. In Section 3.8.3 it will be shown how, by taking a Bayesian perspective, the penalisation of model complexity is naturally incorporated and hence the problem of overfitting can be avoided. For now, the trade-off between model complexity and accuracy will be discussed in another way, as the bias-variance trade-off.

3.7.1 The bias-variance trade-off.

Different sources of error in the modelling process lead to both bias and variance in the model predictions. The error due to bias is the difference between the expected model prediction and the true value. The error due to variance is the variability in the model prediction. In a perfect scenario both the bias and the variance would be minimised to zero; however, given a finite amount of data, it can be shown that the best choice of model involves a trade-off between the two, as reducing one leads to an increase in the other.

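To define the concept mathematically, consider the general system given by Equation (3.4), where the function $\hat{f}(x)$ approximates the true function $f(x)$.

The expected value of the squared error is given by

$$\mathrm{E}\left[\left(y - \hat{f}(x)\right)^2\right] \tag{3.57}$$

which can be decomposed into

$$\left(\mathrm{E}\left[f(x) - \hat{f}(x)\right]\right)^2 + \mathrm{Var}\left[\hat{f}(x)\right] + \mathrm{Var}[y] \tag{3.58}$$

$$= \mathrm{Bias}^2 + \mathrm{Variance} + \mathrm{Irreducible\ Error} \tag{3.59}$$

See Appendix A.3 for the derivation.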

The squared error has been decomposed into three terms: a squared bias term, a variance term and an irreducible error term. The third term represents the noise in the true system, which cannot be reduced. The task of minimising the squared error is therefore the task of minimising the sum of the squared bias and the variance.
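The decomposition can be checked numerically. The following is a minimal sketch under assumptions that are not part of the text: a hypothetical true function f(x) = x², additive Gaussian noise, and a deliberately too-simple straight-line model fitted by least squares. The function names and all numerical settings are illustrative only.

```python
import random
import statistics


def true_f(x):
    """Hypothetical true system (an assumption for illustration)."""
    return x * x


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    return y_bar - b * x_bar, b


def decompose(x0=1.0, sigma=0.5, n_train=10, n_trials=20000, seed=0):
    """Monte Carlo estimate of the squared error and its three components at x0."""
    rng = random.Random(seed)
    xs = [i / (n_train - 1) for i in range(n_train)]
    preds, sq_errors = [], []
    for _ in range(n_trials):
        # A fresh noisy training set for every trial.
        ys = [true_f(x) + rng.gauss(0.0, sigma) for x in xs]
        a, b = fit_line(xs, ys)
        pred = a + b * x0
        # A fresh noisy observation of the true system at x0.
        y_new = true_f(x0) + rng.gauss(0.0, sigma)
        preds.append(pred)
        sq_errors.append((y_new - pred) ** 2)
    mse = sum(sq_errors) / n_trials
    bias_sq = (sum(preds) / n_trials - true_f(x0)) ** 2
    variance = statistics.pvariance(preds)
    noise = sigma ** 2
    return mse, bias_sq, variance, noise


mse, bias_sq, variance, noise = decompose()
print(f"expected squared error : {mse:.4f}")
print(f"bias^2 + variance + noise: {bias_sq + variance + noise:.4f}")
```

Up to Monte Carlo error, the two printed values should agree, with the noise term setting a floor that no choice of model can remove.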

In the context of the basis function regression model with parameters estimated by least squares, the bias-variance trade-off can be directly linked to the complexity of the model through the number of parameters. Increasing the number of model terms increases the flexibility of the model, resulting in a better fit to the training data and hence a lower bias. However, this can result in overfitting (fitting to the noise), which increases the variance. Decreasing the number of model terms leads to a poorer fit and hence a higher bias, but may produce a smaller variance. The quality of the model as measured over a validation set will be poor in both the overly simple and the overly complex case: the overly simple model cannot capture the true system behaviour, and the overly complex model fits a random noise sequence. It is natural to conclude that the best model structure lies at a trade-off between the bias and the variance, which is controlled through the final choice of model structure.
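The effect of the number of model terms can be illustrated with a small experiment. This is a sketch only, assuming a polynomial basis, a hypothetical true system sin(πx) with Gaussian noise, and least squares solved through the normal equations; none of these choices come from the text.

```python
import math
import random


def solve(a, b):
    """Solve a square linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_poly(xs, ys, degree):
    """Least-squares polynomial coefficients (lowest power first)."""
    powers = range(degree + 1)
    ata = [[sum(x ** (i + j) for x in xs) for j in powers] for i in powers]
    aty = [sum((x ** i) * y for x, y in zip(xs, ys)) for i in powers]
    return solve(ata, aty)


def mean_sq_error(coeffs, xs, ys):
    preds = [sum(c * x ** k for k, c in enumerate(coeffs)) for x in xs]
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(xs)


def make_data(rng, n, sigma):
    """Noisy samples of the hypothetical true system sin(pi * x)."""
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    ys = [math.sin(math.pi * x) + rng.gauss(0.0, sigma) for x in xs]
    return xs, ys


rng = random.Random(1)
x_train, y_train = make_data(rng, 30, 0.2)
x_val, y_val = make_data(rng, 200, 0.2)

degrees = list(range(0, 9))
train_mse, val_mse = [], []
for d in degrees:
    coeffs = fit_poly(x_train, y_train, d)
    train_mse.append(mean_sq_error(coeffs, x_train, y_train))
    val_mse.append(mean_sq_error(coeffs, x_val, y_val))

best_degree = degrees[val_mse.index(min(val_mse))]
for d, tr, va in zip(degrees, train_mse, val_mse):
    print(f"degree {d}: train MSE {tr:.4f}, validation MSE {va:.4f}")
print("degree with lowest validation MSE:", best_degree)
```

The training error can only fall as terms are added, because the models are nested, whereas the validation error is expected to be large for the simplest models and to stop improving once the extra terms begin to fit noise.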

Introducing a regularisation term into the estimator, as in Section 3.4.1, is another way to control the model complexity. Consider fitting the parameters of a fixed model structure to different training data sets. Highly regularised parameters (a large value of the regularisation constant) lead to a model with consistent outputs over the data sets (low variance) but a poor fit to the true system (high bias). A small regularisation constant leads to a model whose outputs fit the training data well and so perform well on average (low bias) but are inconsistent (high variance) because of overfitting to the data. The bias-variance trade-off is therefore affected both by the number of terms in the model and by regularisation.
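The thought experiment of refitting a fixed structure to many training sets can be sketched directly. The example below assumes a ridge (squared-norm) penalty and a hypothetical one-parameter model y = θx with θ = 2; the regulariser of Section 3.4.1 may differ, and all names and values here are illustrative.

```python
import random


def ridge_slope(xs, ys, lam):
    """Ridge estimate of theta for the one-parameter model y = theta * x."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)


def bias_variance(lam, theta=2.0, sigma=0.5, x0=1.0,
                  n_train=10, n_sets=5000, seed=0):
    """Squared bias and variance of the prediction at x0 over many training sets."""
    rng = random.Random(seed)
    xs = [i / (n_train - 1) for i in range(n_train)]
    preds = []
    for _ in range(n_sets):
        # Each pass simulates one independent training data set.
        ys = [theta * x + rng.gauss(0.0, sigma) for x in xs]
        preds.append(ridge_slope(xs, ys, lam) * x0)
    mean_pred = sum(preds) / n_sets
    bias_sq = (mean_pred - theta * x0) ** 2
    variance = sum((p - mean_pred) ** 2 for p in preds) / n_sets
    return bias_sq, variance


for lam in (0.0, 1.0, 10.0):
    b2, var = bias_variance(lam)
    print(f"lambda = {lam:5.1f}: bias^2 = {b2:.4f}, variance = {var:.4f}")
```

As the regularisation constant grows, the estimate is shrunk towards zero, so the spread of the predictions falls while their average moves away from the true value.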

3.7.2 Metrics for model selection.

A number of metrics are available for comparing different models in order to select a final model structure. From the previous section it is clear that any measure of model quality must include some trade-off between model fit and model complexity. Metrics for performing this task are called information criteria. Popular information criteria are:

Akaike’s Information Criterion (AIC) [5],

$$\mathrm{AIC} = 2M - 2\ln(L) \tag{3.60}$$

Bayesian (Schwarz) Information Criterion (BIC) [110],

$$\mathrm{BIC} = M\ln(N) - 2\ln(L) \tag{3.61}$$

and Final Prediction Error (FPE) [4],

$$\mathrm{FPE} = L\,\frac{N + M}{N - M} \tag{3.62}$$

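where L is the maximised value of the likelihood function, M is the number of model terms and N is the number of data points over which the model was estimated. Note that in the FPE of Equation (3.62) as it is usually stated, the quantity multiplying the ratio is the value of the loss function, such as the mean squared residual, rather than the likelihood. The first two criteria are similar in form: the first term penalises the number of model terms, while the second term decreases as model complexity increases. The model with the smallest value is selected as the best.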
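The three criteria can be evaluated for least-squares models as in the following sketch. It rests on assumptions not stated in the text: Gaussian noise of unknown variance, so that the maximised log-likelihood follows from the residual sum of squares (RSS), and the mean squared residual RSS/N used as the loss in the FPE. The data set and the two candidate models (a constant and a straight line) are hypothetical.

```python
import math
import random


def gaussian_log_likelihood(rss, n):
    """Maximised log-likelihood ln(L), assuming Gaussian noise of unknown variance."""
    return -0.5 * n * (math.log(2.0 * math.pi) + math.log(rss / n) + 1.0)


def aic(log_l, m):
    return 2.0 * m - 2.0 * log_l


def bic(log_l, m, n):
    return m * math.log(n) - 2.0 * log_l


def fpe(loss, m, n):
    """FPE with the mean squared residual as the loss (an assumption)."""
    return loss * (n + m) / (n - m)


def rss_constant(ys):
    y_bar = sum(ys) / len(ys)
    return sum((y - y_bar) ** 2 for y in ys)


def rss_line(xs, ys):
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    a = y_bar - b * x_bar
    return sum((y - a - b * x) ** 2 for x, y in zip(xs, ys))


# Hypothetical data: a straight line observed in Gaussian noise.
rng = random.Random(0)
n = 50
xs = [i / (n - 1) for i in range(n)]
ys = [1.0 + 2.0 * x + rng.gauss(0.0, 0.5) for x in xs]

# Candidate models as (number of model terms M, residual sum of squares).
candidates = {"constant": (1, rss_constant(ys)), "line": (2, rss_line(xs, ys))}

scores = {}
for name, (m, rss) in candidates.items():
    log_l = gaussian_log_likelihood(rss, n)
    scores[name] = {"AIC": aic(log_l, m),
                    "BIC": bic(log_l, m, n),
                    "FPE": fpe(rss / n, m, n)}

selected = {crit: min(scores, key=lambda name: scores[name][crit])
            for crit in ("AIC", "BIC", "FPE")}
print(scores)
print(selected)
```

For N larger than about 7, ln(N) exceeds 2, so the BIC penalises each additional term more heavily than the AIC and tends to favour smaller models.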