
4.4.2 Pattern Mapping Network

Earlier we have seen that a multilayer feedforward neural network with at least two intermediate layers in addition to the input and output layers can perform any pattern classification task. Such a network can also perform a pattern mapping task. The additional layers are called hidden layers, and the number of units in the hidden layers depends on the nature of the mapping problem. For any

Analysis of Pattern Mapping Networks 115

arbitrary problem, generalization may be difficult. With a sufficiently large size of the network, it is possible to (virtually) store all the input-output pattern pairs given in the training set. Then the network will not be performing the desired mapping, because it will not be capturing the implied functional relationship between the given input-output pattern pairs.

Except in the input layer, the units in the other layers must be nonlinear in order to provide generalization capability for the network. In fact it can be shown that, if all the units are linear, then a network can be reduced to an equivalent two-layer network with a set of N x M weights.

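Let $\mathbf{W}_1$, $\mathbf{W}_2$ and $\mathbf{W}_3$ be the weight matrices of appropriate sizes between the input layer and the first hidden layer, the first hidden layer and the second hidden layer, and the second hidden layer and the output layer, respectively. Then if all the units are linear, the output and input patterns are related by a single weight matrix $\mathbf{W}$ containing $N \times M$ weight elements. That is, taking each matrix to act on a column vector of the preceding layer's outputs,

$$\mathbf{W}_{N \times M} = \mathbf{W}_3 \mathbf{W}_2 \mathbf{W}_1 \qquad (4.76)$$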
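The collapse of a purely linear multilayer network into a single weight matrix can be checked numerically. The following is a minimal sketch, assuming NumPy, a column-vector convention, and hypothetical layer sizes and names (`W1`, `W2`, `W3`, `linear_network`) chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical layer sizes: M inputs, two hidden layers, N outputs.
M, H1, H2, N = 4, 6, 5, 3

# Column-vector convention: each matrix maps one layer's outputs to the next layer.
W1 = rng.standard_normal((H1, M))    # input layer -> first hidden layer
W2 = rng.standard_normal((H2, H1))   # first hidden layer -> second hidden layer
W3 = rng.standard_normal((N, H2))    # second hidden layer -> output layer

def linear_network(a):
    """Pass an input pattern through three purely linear layers."""
    return W3 @ (W2 @ (W1 @ a))

# Equivalent single weight matrix with N x M elements.
W = W3 @ W2 @ W1

a = rng.standard_normal(M)
collapsed_matches = np.allclose(linear_network(a), W @ a)
print(W.shape, collapsed_matches)
```

However many linear hidden units are used, the input-output behaviour is that of one $N \times M$ matrix, so the hidden layers add no representational power in this case.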

As can be seen easily, such a network reduces to a linear associative network. But if only the units in the output layer are nonlinear, then the network is limited by the linear separability constraint on the function relating the input-output pattern pairs. If the units in the hidden layers and in the output layer are nonlinear, then the number of unknown weights depends on the number of units in the hidden layers, besides the number of units in the input and output layers. The pattern mapping problem involves determining these weights, given a training set consisting of input-output pattern pairs.

We need a systematic way of updating these weights when each input-output pattern pair is presented to the network. To update the weights in a supervisory mode, it is necessary to know the desired output for each unit in the hidden and output layers. Once the desired output is known, the error, that is, the difference between the desired and actual outputs of each unit, may be used to guide the updating of the weights leading to that unit from the units in the previous layer. We know the desired output only for the units in the final output layer, and not for the units in the hidden layers. Therefore a straightforward application of a learning rule that depends on the difference between the desired and the actual outputs is not feasible in this case. The problem of updating the weights in this case is called a hard learning problem.
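The linear separability constraint mentioned above, for a network whose only nonlinear units are in the output layer, can be illustrated with a small experiment. This is a sketch under stated assumptions: a single hard-limiting output unit trained with the perceptron learning law, NumPy, and a hypothetical helper named `train_threshold_unit`. It compares a linearly separable mapping (AND) with one that is not (XOR).

```python
import numpy as np

def train_threshold_unit(X, d, epochs=100, lr=1.0):
    """Perceptron learning for one hard-limiting output unit (no hidden layer)."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])  # append a constant bias input
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        for x, target in zip(Xb, d):
            y = 1 if w @ x > 0 else 0
            # The desired output is known here, so the error can drive the update directly.
            w += lr * (target - y) * x
    predictions = (Xb @ w > 0).astype(int)
    return w, predictions

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
d_and = np.array([0, 0, 0, 1])   # linearly separable mapping
d_xor = np.array([0, 1, 1, 0])   # not linearly separable

_, pred_and = train_threshold_unit(X, d_and)
_, pred_xor = train_threshold_unit(X, d_xor)

and_correct = int((pred_and == d_and).sum())
xor_correct = int((pred_xor == d_xor).sum())
print("AND patterns correct:", and_correct, "of 4")
print("XOR patterns correct:", xor_correct, "of 4")
```

The unit learns AND exactly, but no setting of its weights can reproduce all four XOR patterns. Handling such mappings requires nonlinear hidden units, which is where the hard learning problem arises.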

The hard learning problem is solved by using a differentiable nonlinear output function for each unit in the hidden and output layers. The corresponding learning law is based on propagating the error from the output layer to the hidden layers for updating the weights. This is an error correcting learning law, also called the

116 Feedforward Neural Networks

generalized delta rule. It is based on the principle of gradient descent along the error surface.
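The idea of propagating the output error back to the hidden layer can be sketched in a few lines. The example below is illustrative only and is not the derivation given later in the text. It assumes one hidden layer, sigmoid output functions, a squared error measure, no bias terms, and hypothetical names (`forward`, `backprop_gradients`, `eta`).

```python
import numpy as np

def sigmoid(x):
    """Differentiable nonlinear output function (an assumed choice)."""
    return 1.0 / (1.0 + np.exp(-x))

def forward(W1, W2, a):
    """One hidden layer with sigmoid hidden and output units (biases omitted)."""
    h = sigmoid(W1 @ a)
    y = sigmoid(W2 @ h)
    return h, y

def squared_error(W1, W2, a, d):
    _, y = forward(W1, W2, a)
    return 0.5 * np.sum((d - y) ** 2)

def backprop_gradients(W1, W2, a, d):
    """Gradients of the squared error, found by propagating the output error backward."""
    h, y = forward(W1, W2, a)
    delta_out = (y - d) * y * (1.0 - y)              # error term at the output units
    delta_hid = (W2.T @ delta_out) * h * (1.0 - h)   # error passed back to the hidden units
    return np.outer(delta_hid, a), np.outer(delta_out, h)

rng = np.random.default_rng(1)
W1 = rng.standard_normal((3, 2))   # input -> hidden weights
W2 = rng.standard_normal((1, 3))   # hidden -> output weights
a = np.array([0.0, 1.0])           # one input pattern
d = np.array([1.0])                # its desired output

eta = 0.5                          # hypothetical learning rate
error_before = squared_error(W1, W2, a, d)
for _ in range(200):
    g1, g2 = backprop_gradients(W1, W2, a, d)
    W1 -= eta * g1                 # gradient descent along the error surface
    W2 -= eta * g2
error_after = squared_error(W1, W2, a, d)
print(error_before, error_after)
```

The hidden units never receive a desired output. Their error terms are computed from the output error through the weights `W2`, which is possible only because the output function is differentiable.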

Appendix C gives the background information needed for understanding the gradient descent methods. Table 4.5 gives a summary of the gradient search methods discussed in Appendix C. In the following section we derive the generalized delta rule applicable for a feedforward network with nonlinear units.

Table 4.5 Summary of Basic Gradient Search Methods

1. Objective

Determine the optimal set of weights for which the expected error $E$ between the desired and actual outputs is minimum.

For a linear network the error surface is a quadratic function of the weights:

$$E(\mathbf{w}) = E_{\min} + (\mathbf{w} - \mathbf{w}^*)^T \mathbf{R} (\mathbf{w} - \mathbf{w}^*)$$

The optimum weight vector $\mathbf{w}^*$ is given by

$$\mathbf{w}^* = \mathbf{w} - \frac{1}{2} \mathbf{R}^{-1} \nabla, \quad \text{where } \nabla = \frac{\partial E}{\partial \mathbf{w}}$$

and $\mathbf{R}$ is the autocorrelation matrix of the input data.

2. Gradient Search Methods

We can write the equation for adjustment of weights as

$$\mathbf{w}(m+1) = \mathbf{w}(m) - \frac{1}{2} \mathbf{R}^{-1} \nabla_m$$

If $\mathbf{R}$ and $\nabla_m$ are known exactly, then the above adjustment gives $\mathbf{w}^*$ in one step starting from any initial weights $\mathbf{w}(m)$.

If $\mathbf{R}$ and $\nabla_m$ are known only approximately, then the optimum weight vector can be obtained in an iterative manner by writing

$$\mathbf{w}(m+1) = \mathbf{w}(m) - \eta \mathbf{R}^{-1} \nabla_m$$

where $0 < \eta < 1$ for convergence. This is Newton's method. The error moves approximately along the path from $\mathbf{w}(m)$ to $\mathbf{w}^*$. Here $\eta$ is a dimensionless quantity.

If the weights are adjusted in the direction of the negative gradient at each step, it becomes the method of steepest descent:

$$\mathbf{w}(m+1) = \mathbf{w}(m) + \mu(-\nabla_m)$$

where $0 < \mu < 1/\lambda_{\max}$ for convergence and $\lambda_{\max}$ is the largest eigenvalue of $\mathbf{R}$. The learning rate parameter $\mu$ has the dimensions of inverse of signal power. Here convergence is slower than in Newton's method.

In general, the gradient cannot be computed, but can only be estimated. Hence convergence of the gradient descent methods is not guaranteed. The estimate depends on our knowledge of the error surface.
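The two gradient search methods summarized above can be compared on a small quadratic error surface. The sketch below assumes the form $E(\mathbf{w}) = E_{\min} + (\mathbf{w} - \mathbf{w}^*)^T \mathbf{R} (\mathbf{w} - \mathbf{w}^*)$, for which the exact gradient is $2\mathbf{R}(\mathbf{w} - \mathbf{w}^*)$. The matrix `R`, the optimum `w_opt`, the starting point `w0` and the step sizes are hypothetical values chosen only for illustration.

```python
import numpy as np

# Assumed autocorrelation matrix (symmetric, positive definite) and optimum weights.
R = np.array([[2.0, 0.5],
              [0.5, 1.0]])
w_opt = np.array([1.0, -2.0])

def gradient(w):
    """Exact gradient of the assumed quadratic error surface: 2 R (w - w_opt)."""
    return 2.0 * R @ (w - w_opt)

w0 = np.array([5.0, 5.0])   # arbitrary initial weights

# Adjustment with R and the gradient known exactly: one step reaches the optimum.
w_newton = w0 - 0.5 * np.linalg.solve(R, gradient(w0))

# Steepest descent: move against the gradient with a rate below 1 / lambda_max.
lambda_max = np.linalg.eigvalsh(R).max()
mu = 0.5 / lambda_max
w_sd = w0.copy()
steps = 200
for _ in range(steps):
    w_sd = w_sd - mu * gradient(w_sd)

print("one-step result:      ", w_newton)
print("steepest descent after", steps, "steps:", w_sd)
```

With exact knowledge of `R` and the gradient, a single adjustment lands on the optimum, whereas steepest descent approaches it geometrically at a rate set by the eigenvalues of `R`. In practice the gradient is only estimated, so neither behaviour is guaranteed.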


Table