Generalisation is the ability of the trained network to make predictions on data it has not seen before. Generalisation is influenced by three factors (Haykin, 1994):

i. The size and accuracy of the training data.

ii. The architecture of the network.

iii. The nature of the problem and the complexity of the function the network is to learn.

In their review, Hush and Horne (1993) regard generalisation in the context of the (more controllable) first two factors from two perspectives:

a. Fixed architecture: What is the size of the training set that would achieve good generalisation? (Baum and Haussler, 1989; Smith, 1993)

b. Fixed training set: What is the suitable architecture to fit the given data and result in good generalisation? (Hinton, 1989; Weigend et al., 1991; Sietsma and Dow, 1991)

Chapter 4: Progress in MLR training 78

Although Haykin (1994) would argue that the first of these viewpoints is the one most commonly encountered in practice^, our experience with MRS data analysis is, in fact, the opposite. The choice of data set is often limited. In practice, acquiring large amounts of data may be expensive and/or lengthy, as is the case with tumour spectra (whether in vivo or in vitro). The question becomes: which network best fits the available data, without overfitting, and gives good generalisation?

Good generalisation implies that the objective of network training is not to learn a representation of the training data, but rather to model the underlying process which generated the data. This is often depicted as a form of statistical parameter estimation (Bishop, 1995a) or as curve fitting in a multi-dimensional space (Haykin, 1994). In both cases, new predictions can be seen as non-linear interpolations of the input data, and generalisation as a measure of the smoothness of the input-output mapping encoded by the network.

4.5.1 Estimation of output error (confidence measures)

It is pointless to talk of generalisation capabilities if we cannot obtain an objective assessment of the quality of network predictions. Since the outputs of a trained network are interpolations of the function encoded in the network connections, it is desirable to know how well the network can approximate this function and what the chances are of making an error. Barron (1993) tackles this issue by studying the approximation properties of MLPs with one hidden layer. Barron provided an estimate of the total risk (effectively the mean squared error between the true function values and the network predictions). Total risk measures the average accuracy of the network estimates in terms of the complexity of the learnt function, the number of hidden nodes and the training data (both the number of variables and the number of samples). The bound on total risk provided by Barron is a balance between the accuracy of the data fitting (which favours larger networks) and the accuracy of the interpolation on test data (which favours simpler networks). In this respect it is similar to the complexity regularisation approaches adopted by Vapnik (e.g. Vapnik, 1992). However, Barron's formula, like the work of Vapnik, provides only average measures of the error of the model as a whole. Neither provides performance estimates for individual new patterns.

Cross validation is a standard statistical tool which iteratively uses different partitions of the available data to estimate the model parameters and evaluate the model's performance. Partitioning the data in different ways allows various models to be validated while making more efficient use of the data. Although cross validation can help in choosing the model and adjusting its parameters, it does not, any more than Barron's error estimates, provide a performance assessment for individual new patterns.
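The partitioning idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure used in this thesis: the names k_fold_splits and cross_validate, the choice of k = 5, the shuffling seed and the use of mean squared error are all hypothetical, and fit and predict stand for any user-supplied training and prediction routines.

```python
import random

def k_fold_splits(n_samples, k, seed=0):
    """Return k (train_indices, test_indices) pairs; each sample is tested once.

    Assumes n_samples >= k so that no fold is empty.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index, starting at position i.
    folds = [indices[i::k] for i in range(k)]
    splits = []
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, test))
    return splits

def cross_validate(xs, ys, fit, predict, k=5):
    """Average held-out mean squared error of a model over k folds."""
    fold_errors = []
    for train, test in k_fold_splits(len(xs), k):
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        errors = [(predict(model, xs[i]) - ys[i]) ** 2 for i in test]
        fold_errors.append(sum(errors) / len(errors))
    return sum(fold_errors) / k
```

The returned figure is an average over held-out folds, which is why it helps to compare candidate models but says nothing about the reliability of any single new prediction.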

^ Haykin's view is probably true in telecommunications problems, where large amounts of data can be collected easily and cheaply.

Another approach to assigning confidence measures to predictions made on new data is to use ensemble training (Krogh and Vedelsby, 1995). As with cross validation, an ensemble of models is trained using different training and testing combinations. Predictions are made by averaging the network output across all models. This smooths out the effects of overfitting, since the averaging is effectively performed in function space rather than in parameter space.
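The averaging step, and the way disagreement between ensemble members can be quantified, may be illustrated as follows. This is a sketch only: it assumes a uniformly weighted ensemble and squared error, the function name ensemble_predict and the three output values are hypothetical, and the spread it returns is offered as an illustrative indicator of disagreement rather than as the confidence measure developed later in this thesis.

```python
def ensemble_predict(member_outputs):
    """Return the ensemble mean and the spread (variance) of member outputs."""
    n = len(member_outputs)
    mean = sum(member_outputs) / n
    # Spread of the members around the ensemble mean ("ambiguity").
    ambiguity = sum((f - mean) ** 2 for f in member_outputs) / n
    return mean, ambiguity

# Hypothetical outputs of three trained networks for one input pattern.
outputs = [0.9, 1.1, 1.3]
target = 1.0

mean, ambiguity = ensemble_predict(outputs)
avg_member_error = sum((f - target) ** 2 for f in outputs) / len(outputs)
ensemble_error = (mean - target) ** 2

# For a uniform average under squared error, the ensemble error equals the
# average member error minus the ambiguity, so it can never exceed the former.
print(ensemble_error, avg_member_error - ambiguity)
```

Because the ambiguity term is never negative, the averaged prediction is at least as accurate as the members are on average, and a large spread flags inputs on which the members disagree.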

In Chapter 8, we introduce a practical confidence measure for classification based on the assumption that the network outputs are approximations of class membership probabilities.

4.5.2 Complexity regularisation

The key to good generalisation is to keep the network structure simple (with fewer parameters to determine) to prevent overfitting (Poggio and Girosi, 1990). Complexity regularisation adds a penalty term to the error function which penalises complex solutions (Reed, 1993; Bishop, 1995a). Weight decay (Hinton, 1989), weight elimination (Weigend et al., 1991), weight sharing (Nowlan and Hinton, 1992) and pruning (Karnin, 1990; LeCun et al., 1990; Mozer and Smolensky, 1989) are forms of complexity control which either favour small weight values or delete unnecessary weights or nodes altogether.
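Two of these penalty terms can be written down compactly. The sketch below uses the forms in which weight decay and weight elimination are commonly stated in the literature; the function names, the penalty strength lam, the scale parameter w0 and the learning rate lr are assumptions introduced for illustration and are not taken from this thesis.

```python
def weight_decay_penalty(weights, lam):
    """Quadratic penalty (lam / 2) * sum(w^2); grows without bound with |w|."""
    return 0.5 * lam * sum(w * w for w in weights)

def weight_elimination_penalty(weights, lam, w0=1.0):
    """Saturating penalty: each weight contributes less than lam, so large
    weights are tolerated while small ones are pushed towards zero."""
    return lam * sum((w / w0) ** 2 / (1.0 + (w / w0) ** 2) for w in weights)

def decayed_update(weights, grads, lr, lam):
    """One gradient-descent step on E(w) + (lam / 2) * sum(w^2).

    grads holds dE/dw for the unpenalised error; the extra lam * w term
    shrinks every weight towards zero at each step.
    """
    return [w - lr * (g + lam * w) for w, g in zip(weights, grads)]
```

With a zero error gradient, decayed_update simply multiplies each weight by (1 - lr * lam), which is where the name weight decay comes from.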

Another form of regularisation is obtained by injecting random noise into the input patterns of the training data. Matsuoka (1992) and Bishop (1995b) have proved that this form of training, which has been shown to improve training speed and generalisation performance, is equivalent to adding a penalty term to the error function. This is an attractive form of regularisation since it does not involve any explicit change to the objective function and hence is simpler to implement. We use this approach in chapters 8 and 9 to generate more patterns from the data described in chapter 6.
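Generating additional patterns by noise injection might look like the following. This is a minimal sketch assuming zero-mean Gaussian noise added independently to each input variable; the function name augment_with_noise, the number of copies and the noise level sigma are hypothetical choices, not the settings used in chapters 8 and 9.

```python
import random

def augment_with_noise(patterns, labels, copies=3, sigma=0.05, seed=0):
    """Create noisy copies of each input pattern, keeping its label.

    patterns: list of input vectors (lists of floats).
    copies:   number of noisy versions generated per original pattern.
    sigma:    standard deviation of the added zero-mean Gaussian noise.
    """
    rng = random.Random(seed)
    new_patterns, new_labels = [], []
    for x, y in zip(patterns, labels):
        for _ in range(copies):
            new_patterns.append([xi + rng.gauss(0.0, sigma) for xi in x])
            new_labels.append(y)  # the target is left unchanged
    return new_patterns, new_labels

# Two hypothetical two-point "spectra" with class labels 0 and 1.
noisy_x, noisy_y = augment_with_noise([[1.0, 2.0], [3.0, 4.0]], [0, 1])
print(len(noisy_x), noisy_y)
```

Only the inputs are perturbed; the error function itself is untouched, which is the practical attraction noted above.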

In chapter 3, we argued that in order to utilise the potential of MRS it might be beneficial to use all the information available in the spectra. The large dimensionality of MRS data, however, will result in over-complex networks which will overfit the data and generalise poorly. Reduction of dimensionality therefore becomes an important issue for achieving accurate and robust MRS/pattern recognition applications. The reduction of dimensionality in MRS is often subjective and is usually influenced by the presence of a few large peaks. In this thesis we argue that by developing and using complexity regularisation tools, an objective and automatic reduction of dimensionality can be achieved. This reduction has the added benefit of highlighting the important regions of the spectra in biochemical terms. The next chapter, together with chapter 10, is dedicated to exploring approaches for utilising network structure optimisation for the selection of relevant inputs.
