

This section describes the architecture (or topology) and properties of some prominent Artificial Neural Networks. Firstly, the Single Layer Perceptron is presented, as it is the simplest ANN architecture and because it is beneficial to understanding the Multi-Layer Perceptron. The Multi-Layer Perceptron, which was used in the current research, is discussed after the Single Layer Perceptron. Following the discussion of these two architectures, some other architectures of interest are described.

The Single Layer Perceptron

The Single Layer Perceptron (SLP) is a feed forward network, which has the ability to learn to recognise simple patterns (Lippmann, 1987; Beale and Jackson, 1990; Haykin, 1999). It classifies input data sets into one of two classes (such as class A or class B).

The architecture for the SLP consists of an input layer (footnote 25), an output layer, and the connecting weights (refer to Figure 2.12). This has a close analogy to the simplified nonlinear neuron model discussed in section 2.4.2.1. In fact, the SLP demonstrates a similar architecture and functionality to the simplified nonlinear neuron model, except that there may be multiple output layer nodes.

The simplified illustration of the SLP, in Figure 2.12, consists of only two output layer nodes; there may be many output layer neurons in an SLP, each of which will be connected to each of the input layer nodes by an associated connecting weight.

Inputs are supplied to the SLP (via input layer nodes); they and their associated weights are applied to the summation function in the output nodes. The summed value has a threshold value subtracted from it, and the result is applied to the activation function (e.g. a step-wise function). For example, the input may be designated class A if the output y is +1, or class B if it is −1.

The SLP forms two regions separated by a hyperplane (called the decision boundary), such that inputs which classify to class A are located on one side of the linear boundary; class B outputs are located on the opposite side. The equation of the boundary line is dependent on the weights and the threshold value.
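For the two-input case the boundary can be written out explicitly. The following worked instance uses the notation of this section (weights $w_1, w_2$ and threshold $\theta$); the numerical values are illustrative assumptions, not taken from the source.

$$w_1 x_1 + w_2 x_2 - \theta = 0 \quad\Longrightarrow\quad x_2 = -\frac{w_1}{w_2}\,x_1 + \frac{\theta}{w_2}, \qquad w_2 \neq 0$$

With $w_1 = w_2 = 1$ and $\theta = 1.5$, the boundary is the line $x_2 = -x_1 + 1.5$. The input $(1, 1)$ gives $1 + 1 - 1.5 = 0.5 > 0$ and so falls on one side of the line, while $(0, 1)$ gives $0 + 1 - 1.5 = -0.5 < 0$ and falls on the other.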

To demonstrate the summation and activation functions of the SLP output node, the following description is presented.

Let $w_i$ be the weight corresponding to input $i$, at time $t$, for $0 \le i \le n$. Set $w_0 = -\theta$, and $x_0$ to always remain equal to 1. Provide inputs $x_1, x_2, \ldots, x_n$, and the desired output (or target) $d(t)$. Initialise all other $w_i(0)$ to small random values.

Footnote 25: By convention, the input nodes are not counted as a layer (even though they are presented as

Figure 2.12: The Single Layer Perceptron

The output $y(t)$ is calculated according to Equation 2.11:

$$y(t) = f_h\left[\sum_{i=0}^{n} x_i(t)\, w_i(t)\right] \tag{2.11}$$

where $n$ is the number of input layer nodes, $\theta$ is the internal threshold or bias, and $f_h$ is the activation function (e.g. a step-wise function) used to produce an output.
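As a minimal sketch of Equation 2.11, the following Python computes the output of one SLP output node. The function names `step` and `slp_output`, and the weights chosen to mimic a logical AND, are illustrative assumptions; the step function here returns +1 or −1, following the example outcome given earlier in this section.

```python
def step(value):
    """Hard-limiting step-wise function: +1 for a positive sum, -1 otherwise."""
    return 1 if value > 0 else -1


def slp_output(inputs, weights, theta):
    """Compute y = f_h(sum_i x_i * w_i) with x_0 = 1 and w_0 = -theta."""
    x = [1.0] + list(inputs)        # prepend the constant input x_0 = 1
    w = [-theta] + list(weights)    # prepend the bias weight w_0 = -theta
    total = sum(xi * wi for xi, wi in zip(x, w))
    return step(total)


# Illustrative weights that make the node behave like a logical AND.
y_a = slp_output([1, 1], weights=[1.0, 1.0], theta=1.5)   # weighted sum is 0.5
y_b = slp_output([0, 1], weights=[1.0, 1.0], theta=1.5)   # weighted sum is -0.5
```

Folding the threshold into the weight vector as $w_0 = -\theta$ lets the threshold be handled like any other weight.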

In order to facilitate learning, the SLP repeats Equation 2.11, adjusting all weight values after each repetition. This process continues until the network converges to the minimum error.

Adjusting the weights is accomplished according to Equation 2.12:

$$w_i(t+1) = w_i(t) + \eta\,[d(t) - y(t)]\,x_i(t) \tag{2.12}$$

where $\eta$ is a gain term, for $0 \le \eta \le 1$. The gain term controls the rate of weight change (the adaption rate), ensuring the network modifies the weights by a suitable magnitude (footnote 26).

The target output $d(t)$ is designated according to Equation 2.13:

$$d(t) = \begin{cases} +1, & \text{if input from class A} \\ 0, & \text{if input from class B} \end{cases} \tag{2.13}$$
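The learning procedure of Equations 2.11 to 2.13 can be sketched as a short training loop. This is an illustration under stated assumptions rather than the implementation used in the research: the names `train_slp`, `predict` and `step01`, the gain value, the stopping rule (stop after one error-free pass) and the logical OR data set are all hypothetical choices.

```python
import random


def step01(value):
    """Step-wise function returning 1 or 0, matching the targets of Equation 2.13."""
    return 1 if value > 0 else 0


def predict(w, inputs):
    """Equation 2.11 with x_0 = 1; w[0] plays the role of -theta."""
    x = [1.0] + list(inputs)
    return step01(sum(xi * wi for xi, wi in zip(x, w)))


def train_slp(samples, eta=0.2, max_epochs=100, seed=0):
    """Train one output node; samples is a list of (inputs, target) pairs."""
    rng = random.Random(seed)
    n = len(samples[0][0])
    # Initialise all weights to small random values.
    w = [rng.uniform(-0.05, 0.05) for _ in range(n + 1)]
    for _ in range(max_epochs):
        errors = 0
        for inputs, d in samples:
            x = [1.0] + list(inputs)
            y = predict(w, inputs)
            if y != d:
                errors += 1
            # Equation 2.12: w_i(t+1) = w_i(t) + eta * [d(t) - y(t)] * x_i(t)
            w = [wi + eta * (d - y) * xi for wi, xi in zip(w, x)]
        if errors == 0:     # a full pass with no misclassification
            break
    return w


# Logical OR is linearly separable, so the rule is expected to converge.
or_samples = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]
weights = train_slp(or_samples)
predictions = [predict(weights, inputs) for inputs, _ in or_samples]
```

When the output already matches the target, the factor $d(t) - y(t)$ is zero and the weights are left unchanged, so only misclassified inputs move the boundary.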

A major benefit of this architecture is that, when the two classes are linearly separable, the network will resolve to the best possible output (that is, it will find the global minimum error). However, as demonstrated by Equation 2.13, this architecture is suitable only when there are two classes of possible outcomes (that is, it can only solve linearly separable problems of two outcome classes).

The Multi-Layer Perceptron

The Multi-Layer Perceptron (MLP) is able to classify along n-dimensional decision boundaries in the problem space, by utilising a non-linear threshold function and by incorporating extra layers of nodes in its configuration (Lippmann, 1987; Beale and Jackson, 1990; Haykin, 1999). These modifications overcome the limitation of the SLP (i.e. of only being able to solve two-class problems) and allow the MLP to classify complex data (i.e. data demonstrating multiple classes in multi-dimensional problem space).

Like the SLP, the MLP has a feed forward operation. However, as illustrated in Figure 2.13, the architecture is quite different (footnote 27). There are at least two layers of nodes in an MLP (footnote 28): the input layer, the output layer and one or more hidden layers in between. Just as the output nodes in the SLP function as individual perceptron units (i.e. simple neuron models), so the hidden and output layer nodes of the MLP function as perceptron units. That is, nodes in all layers (excluding input nodes) accept input, apply the input to the summation and activation functions, and produce an output.

to oscillate between extreme weight values as the network trains toward minimum error.

Footnote 27: Note that, unlike in Figure 2.12, Figure 2.13 does not include labels for the weight connections between nodes. This was done to keep the illustration less noisy, and thus make the configuration differences clearer. The weight labelling convention used in Figure 2.12 would be the same for Figure 2.13.

Figure 2.13: The Multi-Layer Perceptron

A notable difference between the MLP and the SLP is that the activation function applied in the MLP reflects the complex nature of the problem space. Because more than two classes are being classified, the step-wise function is no longer appropriate. Instead, an exponential or logarithmic model is more appropriate; the most commonly applied function is the logistic sigmoid function. Therefore, node outputs are in the continuous domain rather than the binary domain.
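The contrast between the two activation functions can be seen in a few lines of Python. The logistic form $1/(1 + e^{-kx})$ with a steepness constant `k` is an assumption here, chosen because it is the usual definition of the logistic sigmoid; the names `logistic` and `step` are illustrative.

```python
import math


def step(x):
    """Step-wise function: output jumps between 0 and 1."""
    return 1 if x > 0 else 0


def logistic(x, k=1.0):
    """Logistic sigmoid: output varies smoothly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-k * x))


samples = [-2.0, 0.0, 2.0]
step_values = [step(x) for x in samples]
logistic_values = [logistic(x) for x in samples]
```

The step function reports only which side of the threshold the summed input lies on, whereas the sigmoid also reports how far from the threshold it is, which is what allows the error to be reduced in small increments.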

As previously discussed, the number of input layer nodes is determined by the input pattern, and the number of output layer nodes must match the number of different classes in the problem space.

However, the number of middle layer nodes has no specific method of determination. A rule of thumb is to assign half the number of input layer nodes as a temporary value for the number of middle layer nodes. Then increment or decrement the number of middle layer nodes and test the error at each adjustment, until the lowest error rate is attained.
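This rule of thumb amounts to a simple neighbourhood search. The sketch below assumes a caller-supplied function `error_for(h)` that trains a network with `h` middle layer nodes and returns its error; that function, the name `search_hidden_nodes`, and the stand-in error curve are all hypothetical and used only to make the example runnable.

```python
def search_hidden_nodes(n_inputs, error_for, max_nodes=50):
    """Start at half the input count, then step up or down while the error improves."""
    best = max(1, n_inputs // 2)
    best_error = error_for(best)
    improved = True
    while improved:
        improved = False
        for candidate in (best - 1, best + 1):
            if 1 <= candidate <= max_nodes:
                err = error_for(candidate)
                if err < best_error:
                    best, best_error = candidate, err
                    improved = True
                    break
    return best, best_error


def toy_error(h):
    """Stand-in error curve with its minimum at 7 nodes (purely illustrative)."""
    return (h - 7) ** 2 / 100 + 0.05


chosen, lowest_error = search_hidden_nodes(10, toy_error)
```

In practice each call to `error_for` involves a full training run, so the search is only as reliable as the repeatability of that training.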

Because the MLP exhibits a modified configuration to the SLP, and utilises a different activation function (i.e. sigmoid compared to step-wise), the learning rule of the SLP requires modification for the MLP (Beale and Jackson, 1990; Haykin, 1999). Input is presented to the network; comparison is made between the network output and the desired target; the error is determined and can then be used to update the weights (and produce successively more accurate output).

The type of activation function used in MLPs allows for continual reduction of error values by small increments, and thus the network output gradually approaches the desired target. This is achieved by using the `generalised delta rule' to calculate the error values for that input (at the output layer), and adjusting weights by back-propagating the error through the network to the previous layers. This functionality is responsible for the MLP being often termed the back-propagation neural network. Nodes in the hidden layers are adjusted in proportion to the error in the nodes (of the output layer) to which they are connected. So if an output node has a larger error value, the connected hidden layer nodes use a value proportionate to the output layer node (rather than the same error value). This allows the network to learn, as the method of error reduction facilitates correct adjustment of weights between the layers.

To demonstrate the operations of the MLP, the following description is presented (Lippmann, 1987; Beale and Jackson, 1990). Provide the input pattern $X_p = x_0, x_1, \ldots, x_{n-1}$; the desired output (or target) $T_p = t_0, t_1, \ldots, t_{m-1}$; and $w_i$, the weight corresponding to input $i$ for $0 \le i \le n$, where $n$ is the number of input layer nodes and $m$ is the number of output layer nodes.

Set $w_0 = -\theta$, and $x_0$ to always remain equal to 1. Initialise all other $w_i$ to small random values.

Let $y_{pj}$ be the actual output value for pattern $p$ on node $j$, calculated for each node in each layer according to Equation 2.14:

$$y_{pj} = f\left[\sum_{i=0}^{n-1} w_i x_i\right] \tag{2.14}$$

Note that $o_{pj}$ denotes the output layer values for pattern $p$ on node $j$.
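Applying Equation 2.14 layer by layer gives the feed forward pass. The sketch below is illustrative: the logistic sigmoid is assumed for $f$, each node's weights are stored as one row with the threshold weight in position 0, and the names `layer_output` and `feed_forward` are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def layer_output(inputs, weight_rows):
    """Apply Equation 2.14 to every node in one layer.

    weight_rows[j] holds the weights into node j; weight_rows[j][0] is the
    threshold weight paired with the constant input x_0 = 1.
    """
    x = [1.0] + list(inputs)
    return [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in weight_rows]


def feed_forward(inputs, layers):
    """Pass the outputs of each layer on as the inputs of the next."""
    values = list(inputs)
    for weight_rows in layers:
        values = layer_output(values, weight_rows)
    return values


# Two inputs -> two hidden nodes -> one output node, with all-zero weights
# so that every summed input is 0 and every node outputs sigmoid(0).
layers = [
    [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]],
    [[0.0, 0.0, 0.0]],
]
out = feed_forward([0.3, 0.9], layers)
```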

Adjusting the weights is accomplished according to Equation 2.15, starting from the output layer nodes (and progressively working backward through each layer of the network):

$$w_{ij}(t+1) = w_{ij}(t) + \eta\,\delta_{pj}\,o_{pj} \tag{2.15}$$

where $w_{ij}(t)$ represents the weights from node $i$ to node $j$ at time $t$, $\eta$ is the gain term, and $\delta_{pj}$ is the error term calculated for pattern $p$ on node $j$.

Equation 2.16 defines the error term for the output layer nodes:

$$\delta_{pj} = k\,o_{pj}(1 - o_{pj})(t_{pj} - o_{pj}) \tag{2.16}$$

Equation 2.17 defines the error term for the hidden layer nodes:

$$\delta_{pj} = k\,o_{pj}(1 - o_{pj}) \sum_{k} \delta_{pk}\,w_{jk} \tag{2.17}$$

Note that the first expression, $k\,o_{pj}(1 - o_{pj})$, in Equations 2.16 and 2.17 is the derivative of the sigmoid function (footnote 29). The latter expression of Equation 2.16, $(t_{pj} - o_{pj})$, incorporates the desired output or target $t_{pj}$, whereas the latter expression of Equation 2.17, $\sum_k \delta_{pk} w_{jk}$, incorporates the sum of error terms $\delta_{pk}$. Here the summation index $k$ (not to be confused with the constant $k$ in the first expression) designates all nodes in the layer that receives the output of node $j$, that is, the layer processed immediately before node $j$'s layer during back-propagation. Therefore, the sum of $\delta_{pk}$ is calculated over the connections from node $j$ to all nodes $k$. Thus the error is back-propagated through the network proportionately.
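The form of the first expression can be checked directly. The short derivation below assumes the sigmoid is written with a steepness constant $k$ as $f(x) = 1/(1 + e^{-kx})$; this form is an assumption, chosen because it is consistent with the factor $k$ appearing in Equations 2.16 and 2.17.

$$f(x) = \frac{1}{1 + e^{-kx}}$$

$$f'(x) = \frac{k\,e^{-kx}}{(1 + e^{-kx})^2} = k \cdot \frac{1}{1 + e^{-kx}} \cdot \frac{e^{-kx}}{1 + e^{-kx}} = k\,f(x)\,\bigl(1 - f(x)\bigr)$$

Writing $o_{pj} = f(x)$ for the node output gives $f'(x) = k\,o_{pj}(1 - o_{pj})$, so the derivative can be computed from the node output alone.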

As discussed in the previous section, the error or energy surface can be determined as a function of the input and weights. As the MLP has more layers of nodes than the SLP, the energy surface becomes more complex and can be populated with numerous local minima; but still only one global minimum.

Footnote 29: For the mathematical proof of this derivation, refer to Beale and Jackson (1990), Chapter 4.

In order to assist training, so that the minimum error is attained, there are two parameters that may be used in calculations when updating weight values. One, the gain term (discussed previously), is used to speed or slow progress toward a minimum (local or global) error rate. The other parameter is the `momentum' term, which is used to stimulate the network during training, so that it can jump out of local minima and eventually (hopefully) settle in the global minimum.
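The source does not give a formula for the momentum term. A common form, assumed here purely for illustration, adds a fraction `alpha` of the previous weight change to the current one; the name `momentum_step` and all parameter values are hypothetical.

```python
def momentum_step(weight, gradient_term, previous_change, eta=0.5, alpha=0.9):
    """Return the updated weight and the change applied.

    gradient_term stands for delta_pj times the relevant node output, so
    eta * gradient_term is the plain update of Equation 2.15; alpha scales
    the contribution of the previous change (the assumed momentum term).
    """
    change = eta * gradient_term + alpha * previous_change
    return weight + change, change


# One update: 0.5 * 0.2 from the gain term plus 0.5 * 0.1 from momentum.
new_weight, change = momentum_step(1.0, 0.2, 0.1, eta=0.5, alpha=0.5)
```

Because part of each change carries over to the next update, a run of changes in the same direction builds up speed, which is what can carry the weights past a shallow local minimum.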

The summation function (Equation 2.14) is applied to the inputs and weights (between the input and the first hidden layer of nodes); the node outputs from each successive hidden layer are calculated, and these values become the inputs for the next layer of hidden nodes; this continues until the last layer of hidden nodes pass their outputs as inputs for the output layer nodes.

The error values (which are calculated to incorporate targets and the activation function) are then propagated backward through the network and are used to successively update all weights (according to Equations 2.15, 2.16, and 2.17). Once the weights between the input layer and the first layer of hidden nodes have been updated, the feedforward process begins again with the network status altered by the newly updated weights (and possible adjustment to the gain and momentum terms). This process continues until the error rate is minimised.
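The full cycle of feeding forward and propagating error backward can be sketched for a network with one hidden layer. This is an illustration under several assumptions, not the implementation used in the research: the constant $k$ is taken as 1, no momentum term is used, the XOR data set, the layer sizes and the number of passes are arbitrary, and each weight change multiplies the error term of the receiving node by the output of the sending node, which is the usual reading of the generalised delta rule.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def forward(x, w_hidden, w_out):
    """Feed forward; index 0 of each weight row pairs with the constant input 1."""
    xb = [1.0] + list(x)
    hidden = [sigmoid(sum(w * v for w, v in zip(row, xb))) for row in w_hidden]
    hb = [1.0] + hidden
    output = [sigmoid(sum(w * v for w, v in zip(row, hb))) for row in w_out]
    return xb, hb, output


def train_step(x, target, w_hidden, w_out, eta):
    """Present one pattern: feed forward, then update all weights in place."""
    xb, hb, output = forward(x, w_hidden, w_out)
    # Equation 2.16 with k = 1: error term for each output node.
    delta_out = [o * (1 - o) * (t - o) for o, t in zip(output, target)]
    # Equation 2.17: error term for hidden node j, summed over output nodes k.
    delta_hidden = []
    for j in range(len(w_hidden)):
        o_j = hb[j + 1]
        back = sum(delta_out[k] * w_out[k][j + 1] for k in range(len(w_out)))
        delta_hidden.append(o_j * (1 - o_j) * back)
    # Weight updates, output layer first and then the hidden layer.
    for k, row in enumerate(w_out):
        for j in range(len(row)):
            row[j] += eta * delta_out[k] * hb[j]
    for j, row in enumerate(w_hidden):
        for i in range(len(row)):
            row[i] += eta * delta_hidden[j] * xb[i]


def total_error(patterns, w_hidden, w_out):
    """Sum of squared differences between targets and outputs."""
    err = 0.0
    for x, target in patterns:
        _, _, output = forward(x, w_hidden, w_out)
        err += sum((t - o) ** 2 for t, o in zip(target, output))
    return err


rng = random.Random(1)
n_in, n_hidden, n_out = 2, 3, 1
w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_hidden)]
w_out = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)] for _ in range(n_out)]
xor = [([0, 0], [0]), ([0, 1], [1]), ([1, 0], [1]), ([1, 1], [0])]

error_before = total_error(xor, w_hidden, w_out)
for _ in range(10000):
    for x, target in xor:
        train_step(x, target, w_hidden, w_out, eta=0.5)
error_after = total_error(xor, w_hidden, w_out)
```

The hidden layer error terms are computed before any weight is changed, so that the error is propagated back through the same weights that produced the output.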

One negative feature of the MLP is that, during the training phase, the error rate will not always easily (if at all) resolve to the global minimum. Quite often the error rate gets ensnared in one of many local minima on the error surface, and requires stimulation by the momentum and gain terms to escape it. This means that training an MLP requires quite a bit of trial and error testing, by manually manipulating the number of middle layer nodes and the momentum and gain terms, in order to reach the optimum efficiency and accuracy.

The next section (2.4.2.2) discusses the architecture and properties of the Hopfield Neural Network.

Hopfield Neural Network (HNN)

The Hopfield neural network is an auto-associative network. The auto-association of patterns means that presentation of corrupt or incomplete input will result in the reproduction (as output) of the original pattern (Lippmann, 1987; Beale and Jackson, 1990). The network thus works as a content addressable memory (CAM).

As illustrated in Figure 2.14, the architecture of the HNN consists of a number of nodes (visualised as one layer), each of which is connected to every other node (but not itself). Therefore, the HNN is a fully connected network, with binary inputs and outputs (that is, values of (0, 1) or (−1, 1)). Also, the network is symmetrically weighted (that is, any weight $w_{ij} = w_{ji}$).

The difference in architecture between the HNN and perceptrons means the HNN operates in a different way. The network is left to cycle through a succession of states until it converges on a stable solution; this occurs when node values no longer change. The final network output is taken to be the value of all nodes in this final stable state. Because of the fully inter-connective property, the value of one node affects the value of all nodes.

In their initial state the nodes hold different values (received as inputs), with each node trying to affect the others; thus the network is initially in an unstable state. During operation, some nodes may attempt to turn other nodes `on', while some other nodes may attempt to turn other nodes `off'.

As the network progresses through successive states, it works toward a state (by a system of `compromise') where all nodes settle into a stable state (representing the `best compromise'). At this point there are as many inputs attempting to turn a node on as there are inputs attempting to turn a node off.

Training involves one iteration per pattern presented to the network. Only the weights are adjusted, by calculating the outer product of each input vector with itself. Each successive input vector updates the weight matrix. The values on the top-left to bottom-right diagonal are set to 0.

Testing involves many iterations. Input is presented to all nodes simultaneously, and the network is left to stabilise. Updating of nodes occurs via weighted sum and a hard limiting step-wise function. Output of each node becomes the input fed back to the other nodes (but not itself). Outputs from the nodes in the stable state form the output of the network. So when presented with an input pattern, the network outputs a stored pattern nearest to that presented input pattern.

Figure 2.14: The Hopfield Neural Network

The Hopfield network has no learning algorithm as such. Patterns (or facts) are simply stored by setting weights to lower the network energy (or error).

The training stage occurs according to Equation 2.18:

$$w_{ij} = \begin{cases} \displaystyle\sum_{s=0}^{M-1} x_i^s\,x_j^s, & i \neq j \\ 0, & i = j \end{cases} \qquad 0 \le i, j \le M-1 \tag{2.18}$$

where $w_{ij}$ is the connection weight between node $i$ and node $j$, $x_i^s$ is the $i$th element of the exemplar pattern for class $s$, and $M$ is the number of pattern classes. Note that for each weight, the product of the input $i$ and input $j$ is added to the existing weight. Therefore, the result of the training stage is the association of a pattern with itself.
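The storage rule of Equation 2.18 can be sketched in a few lines. The function name `hopfield_weights` and the two four-element exemplar patterns are illustrative assumptions; the patterns use the (−1, 1) value convention.

```python
def hopfield_weights(patterns):
    """Build the weight matrix of Equation 2.18 from +1/-1 exemplar patterns."""
    n = len(patterns[0])
    w = [[0] * n for _ in range(n)]
    for pattern in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:          # the diagonal stays at 0
                    w[i][j] += pattern[i] * pattern[j]
    return w


exemplars = [[1, -1, 1, -1], [1, 1, -1, -1]]
w = hopfield_weights(exemplars)
# w == [[0, 0, 0, -2], [0, 0, -2, 0], [0, -2, 0, 0], [-2, 0, 0, 0]]
```

Since the product of elements $i$ and $j$ does not depend on their order, the resulting matrix is symmetric, matching the property $w_{ij} = w_{ji}$ stated earlier.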

Initialisation for the testing stage occurs according to Equation 2.19:

$$\mu_i(0) = x_i, \qquad 0 \le i \le N-1 \tag{2.19}$$

Nodes are updated according to Equation 2.20:

$$x_i = x_i\,w_{ij}\,x_j \tag{2.20}$$

where input $x_i$ represents the node being updated, input $x_j$ represents the input into that node, and $w_{ij}$ is the weight connection.

The network is allowed to iterate freely (in discrete time steps) until it converges. Note that the output of the network is forced to match that of the imposed unknown pattern.

Convergence during the testing stage occurs according to Equation 2.21:

$$\mu_j(t+1) = f_h\left[\sum_{i=0}^{N-1} w_{ij}\,\mu_i(t)\right], \qquad 0 \le j \le N-1 \tag{2.21}$$

where $f_h$ is the step-wise function. If the input to $f_h$ is greater than 0, its output is 1; if the input is less than 0, its output is −1 (or 0); otherwise, the value is left as it is. These values are then fed back into the network as input into the other network nodes.
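The testing stage can be sketched as follows, using Equation 2.19 for initialisation and Equation 2.21 for the repeated update. This is an illustration under assumptions: all nodes are updated together in each time step, values are in the (−1, 1) convention, and the names `hopfield_weights`, `hard_limit` and `recall`, the two stored patterns and the iteration cap are hypothetical.

```python
def hopfield_weights(patterns):
    """Equation 2.18: sum of products of pattern elements, with a zero diagonal."""
    n = len(patterns[0])
    return [
        [0 if i == j else sum(p[i] * p[j] for p in patterns) for j in range(n)]
        for i in range(n)
    ]


def hard_limit(total, current):
    """Step-wise function: 1 above zero, -1 below zero, unchanged at zero."""
    if total > 0:
        return 1
    if total < 0:
        return -1
    return current


def recall(w, probe, max_iters=20):
    """Iterate Equation 2.21 from the probe until the node values stop changing."""
    state = list(probe)                 # Equation 2.19: initialise with the input
    n = len(state)
    for _ in range(max_iters):
        new_state = [
            hard_limit(sum(w[i][j] * state[i] for i in range(n)), state[j])
            for j in range(n)
        ]
        if new_state == state:          # stable state reached
            break
        state = new_state
    return state


stored = [
    [1, 1, 1, 1, -1, -1, -1, -1],
    [1, 1, -1, -1, 1, 1, -1, -1],
]
w = hopfield_weights(stored)
corrupt = [-1, 1, 1, 1, -1, -1, -1, -1]     # first stored pattern, one element flipped
restored = recall(w, corrupt)
```

Here the corrupt input differs from the first stored pattern in a single element, and the network settles on that stored pattern.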

The advantage of the HNN, as an auto-associative network, is that data can be retrieved (via its CAM functionality) even when incomplete or corrupt information is presented to it.

However, only a limited number of patterns can be stored and recalled in an HNN; this has an impact on its applicability for many pattern recognition tasks (Beale and Jackson, 1990). Also, as mentioned previously, the HNN has no learning algorithm as such. As the current experiment required an ANN to possess the ability to learn patterns in data, the HNN was not chosen.

The next section (2.4.2.2) discusses the architecture and properties of the Self-Organised Map.

Self-Organised Map (SOM)

The type of learning utilised in a multi-layer perceptron requires the correct response (target) to be provided during training (Lippmann, 1987; Beale and Jackson, 1990; Haykin, 1999). This approach is known as supervised learning. Though biological systems display this type of learning, they are also capable of learning by themselves. Learning without the assistance of targets is known as unsupervised learning.

A system with the capability to learn unsupervised requires self-organisation. During training, such a system learns appropriate associations without any targets (or prior knowledge) being provided.

An ANN model of this type is the Self-Organising Map (SOM), also known as the Kohonen Network (after its founder, Dr. Teuvo Kohonen). The SOM is a competitive neural network; such networks represent a type of ANN model where nodes in the output layer compete with each other to determine a `winner'. The winner indicates which prototype pattern is most representative of the input pattern.

As illustrated in Figure 2.15, the SOM has only one layer of nodes (the competitive layer, sometimes referred to as the Kohonen feature map). This layer is two dimensional, with lateral interconnections forming a grid-like topology. Note that the architecture (with only one layer of nodes) is different to the hierarchical structure of layers in perceptrons.

All inputs are connected to every node in the competitive layer. There is no designated output layer; each node in the competitive layer is an output node. As there is only the one layer of nodes, error cannot be fed backwards through the network. Instead, feedback is facilitated via the lateral interconnections of neighbouring nodes in the competitive layer.
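The competition for a `winner' can be sketched as a search over the competitive layer. The measure of closeness is not specified in this section; squared Euclidean distance between the input and each node's weight vector is assumed here, and the name `find_winner` and the 2 by 2 grid of weights are purely illustrative.

```python
def find_winner(input_vector, grid_weights):
    """Return the (row, column) of the competitive layer node closest to the input."""
    best_pos, best_dist = None, float("inf")
    for r, row in enumerate(grid_weights):
        for c, node_weights in enumerate(row):
            # Squared Euclidean distance between the input and this node's weights.
            dist = sum((x - w) ** 2 for x, w in zip(input_vector, node_weights))
            if dist < best_dist:
                best_pos, best_dist = (r, c), dist
    return best_pos


# A 2 x 2 competitive layer; every node holds one weight per input.
grid = [
    [[0.0, 0.0], [0.0, 1.0]],
    [[1.0, 0.0], [1.0, 1.0]],
]
winner = find_winner([0.9, 0.2], grid)
```

Every node sees the same input because all inputs are connected to every node, so the comparison is made across the whole layer at once.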

When presented with the training input data, the learning algorithm organises the competitive layer nodes into local neighbourhoods that act as feature classifiers. The topology of nodes is configured by the cyclic process of comparing input patterns
