3. PROPOSED SOLUTION AND IMPLEMENTATION
3.1 ATTRACTIVENESS OF THE INDUSTRY FOR THE MANUFACTURE OF
An ANN attempts to imitate the computational functionality and memory capacity of the human brain (Beale and Jackson, 1990). However, an ANN is not biological. An ANN is a massively parallel distributed processor made up of simple processing units, which have a propensity for storing experiential knowledge and making it available for use (Aleksander and Morton, 1990).
Thus an ANN imitates the brain's capacity to (Aleksander and Morton, 1990):
1. Acquire knowledge from its environment through a learning process.
2. Store the acquired knowledge in synaptic weights, via inter-neuron connection strengths.
The human brain contains approximately $10^{10}$ (i.e. ten thousand million) basic analogue processing units, called neurons (Beale and Jackson, 1990). Each neuron is connected to about $10^{4}$ (i.e. ten thousand) other neurons. Figure 2.9 illustrates the components of a biological neuron.
Figure 2.9: Components of a Biological Neuron. The image was sourced from Jian-Kang, 1994.
The basic operation of a neuron is to accept many inputs (in the form of nerve impulses) from dendrites, which are filaments attached to the cell body; these nerve impulses are received from other neurons (Beale and Jackson, 1990). The neuron accepts all inputs, and if the accumulated signal (called the `resting potential') surpasses a critical threshold, the neuron activates; otherwise it remains inactive.
The axon, a component of the cell body that serves as the output channel, acts as a non-linear threshold device. That is, if a neuron activates (or fires), the axon produces a series of rapid pulses called `action potentials'. These pulses are propelled along the axon, which terminates at the junction of dendrites from other neurons; such a junction is called a synaptic junction.
There is no actual connection at this junction; rather, a chemical reaction occurs when the synapse's potential is raised sufficiently by the action potential received via the axon. Neurotransmitters released by the synapse allow for the flow of ions across the junction, which alters the dendritic potential and results in a nerve impulse that is conducted along the dendrite to its cell body. Each dendrite may have many synapses acting on it, which facilitates massive inter-connectivity.
Synaptic junctions alter the effectiveness of the transmitted signal. Thus a synapse may `excite' or `inhibit' a dendrite. If a dendrite is excited by a synapse, a large nerve impulse traverses the junction and is conveyed by the dendrite to the cell body; if a dendrite is inhibited, a small nerve impulse passes to the cell body. The release of more neurotransmitters increases the coupling at the synaptic junction, which increases the connection strength.
These modifications to the normal connection strength are thought to facilitate learning. Because the neuron receives all inputs, and the signal strength affects the accumulated resting potential, the firing of a neuron (i.e. exceeding the threshold value) is influenced by the strength of incoming signals.
The parallel nature of the brain's functionality means that the processing workload is distributed across many neurons (Lippmann, 1987). If one neuron malfunctions, its effect is unlikely to have a significant impact on the overall result. That is, the biological brain is generally fault tolerant.
ANNs attempt to model the operations of the brain, and consist of (Lippmann, 1987; Beale and Jackson, 1990; Haykin, 1999):
1. Simple processing units or elements called nodes (because of their correspondence to the biological neuron, they are often referred to as neurons).
2. An inter-connection topology in the form of weights.
3. A learning scheme.
Thus, ANNs do not attempt to represent each component of the biological neuron, only the functionality of the biological neuron. As the interaction between biological neurons occurs at the synaptic junctions, it is the operations of the synaptic junction that ANNs attempt to model.
The model of a simple neuron consists of three basic components (Haykin, 1999):
1. A set of weights expressing the connection strength between the input signals and the neuron; this is analogous to the synapses of a biological neuron. For an input signal $x_i$ connected to a neuron $k$, $i$ is the connecting synapse and the weights are denoted $w_{ki}$ [16].
2. A summation function which accumulates the input signals, multiplying each input signal by the weight connecting it to the neuron and summing the products.
3. An activation function for limiting the amplitude of the neuron's output. For two-class problems, binary output is typically 0 or 1. For multiple-class problems, the output range is typically in the closed continuous interval [0,1] or [−1,1].
Figure 2.10 illustrates the components of a simplified non-linear model of a neuron $k$. Here, the value for each input node is multiplied by its associated weight [17].
[16] Note that this common naming convention has the variable designating the neuron listed prior to the variable designating the connecting weights.
[17] Note that because weights are analogous to the signal strength received by dendrites from synapses in the biological model neuron, larger weight values correspond to excitatory synapses (which transmit larger pulse signals across the synaptic junction); smaller weight values correspond to inhibitory synapses.
The sum of this accumulated resting potential, plus a bias component [18] $b_k$, is then applied to an internal threshold in the activation function. When the resting potential exceeds the threshold, the neuron activates; otherwise, it remains dormant.
Figure 2.10: Simple Non-linear Model of a Neuron. The image was sourced from Haykin, 1999.
Equations 2.8 and 2.9 describe the mathematical operations of the simplified non-linear neuron model illustrated in Figure 2.10:
$$\upsilon_k = \sum_{i=1}^{n} w_{ki} x_i + b_k \tag{2.8}$$
$$y_k = \varphi(\upsilon_k) \tag{2.9}$$
where $x_1, x_2, \ldots, x_n$ are the input signals, $w_{k1}, w_{k2}, \ldots, w_{kn}$ are the synaptic weights of neuron $k$, $b_k$ is the bias, $\upsilon_k$ is the induced activation potential, $\varphi$ is the activation function, and $y_k$ is the output signal of neuron $k$.
Alternatively, the bias can be incorporated into Equation 2.8 by initialising $x_0 = 1$ (and always retaining this value), setting $w_{k0} = b_k$, and then defining the mathematically equivalent Equation 2.10 (Beale and Jackson, 1990; Haykin, 1999):
$$\upsilon_k = \sum_{i=0}^{n} w_{ki} x_i \tag{2.10}$$
[18] The bias effectively adds an offset to the accumulated resting potential, and is intended to
Note that because the process flow of the model neuron is one way, presenting the inputs and producing the output, it is called a feedforward system.
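As a minimal sketch of Equations 2.8 to 2.10, the following Python fragment computes the induced potential both with an explicit bias and with the bias folded in as the weight on a constant input of 1, and then applies a step-wise threshold. The function names, the numeric values, and the threshold of 0 are illustrative assumptions and are not taken from the cited sources.

```python
def induced_potential(weights, inputs, bias):
    """Equation 2.8: weighted sum of the inputs plus the bias b_k."""
    return sum(w * x for w, x in zip(weights, inputs)) + bias


def induced_potential_folded(weights, inputs, bias):
    """Equation 2.10: the bias becomes weight w_k0 on a constant input x_0 = 1."""
    w = [bias] + list(weights)   # w_k0 = b_k
    x = [1.0] + list(inputs)     # x_0 = 1, never changed
    return sum(wi * xi for wi, xi in zip(w, x))


def step(v, threshold=0.0):
    """Step-wise activation: 1 if the potential exceeds the threshold, else 0."""
    return 1 if v > threshold else 0


# Hypothetical values, chosen only for illustration.
weights = [0.5, -0.25, 1.0]
inputs = [1.0, 2.0, 0.5]
bias = -0.25

v = induced_potential(weights, inputs, bias)   # Equation 2.8
y = step(v)                                    # Equation 2.9 with a step activation
```

Both functions return the same potential for the same weights, inputs and bias, which is the equivalence that Equation 2.10 expresses.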
The activation function $\varphi(\upsilon_k)$ determines the eventual output of neuron $k$, and is dependent on the summation function output $\upsilon_k$. Numerous activation or threshold functions may be applied in determining the eventual output. Thus the choice of activation function plays an important role.
Figure 2.11 provides three examples of typical threshold functions that are applied in the activation function (Lippmann, 1987). These are: the step-wise function (a), the piece-wise linear function (b), and the sigmoid function (c). More complex nodes may include temporal integration and other types of time dependencies [19].
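The three threshold functions named above might be written as follows. This is only a sketch: the step threshold of 0, the break points of the piece-wise linear function (−0.5 and 0.5), and the unit slope of the sigmoid are assumptions chosen for illustration, since the text does not fix them.

```python
import math


def step_wise(v, threshold=0.0):
    """Binary output: 1 above the threshold, 0 otherwise."""
    return 1.0 if v > threshold else 0.0


def piece_wise_linear(v):
    """Linear between the assumed break points -0.5 and 0.5, clipped to [0, 1]."""
    return min(1.0, max(0.0, v + 0.5))


def sigmoid(v):
    """Logistic sigmoid: smooth output in the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-v))


# Evaluate each function on a few sample potentials.
samples = [-2.0, -0.25, 0.0, 0.25, 2.0]
table = [(v, step_wise(v), piece_wise_linear(v), sigmoid(v)) for v in samples]
```

The step-wise function produces only the two values 0 and 1, whereas the other two produce intermediate values near the threshold, which is why they suit outputs in the continuous domain.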
Figure 2.11: Illustrations of Step-wise (a), Piece-wise (b) and Sigmoid (c) Functions
The Learning Process
Human beings (particularly when young) very often learn from positive or negative reinforcement (Beale and Jackson, 1990). That is, a positive outcome results from `good' behaviour, while a negative outcome results from `bad' behaviour. Of course, the interpretation of what is positive and good or negative and bad is subjective to the situation under observation, but in general this process of reinforcement is often helpful in the learning process.
[19] It should be noted that the step-wise function (Figure 2.11a) is the usual threshold function applied to the simple model of a neuron discussed thus far because the model produces binary output (i.e. 0 or 1).
As with human beings, the simple model neuron can be taught to `learn' from its mistakes; that is, to reduce the chance of an incorrect or unwanted outcome occurring.
To demonstrate, assume two classes A and B. The learning process requires the following steps (Beale and Jackson, 1990):
1. Assign random values to all weights from the input nodes to the output node. This corresponds to the state of the neuron knowing nothing.
2. Present an instance of class A input.
3. Perform the actions assigned to the summation and activation functions. If the resting potential exceeds the internal threshold, output 1; otherwise output 0.
4. For inputs of class A, assuming an output of 1 (the correct answer), do nothing; assuming an output of 0 (the incorrect answer), increase the resting potential (by increasing the weight values) so that the threshold is exceeded and the correct output is produced.
5. For inputs of class B, the neuron should be expected to produce an output of 0. When an instance of class B is input, decrease the weight values to keep the resting potential below the threshold.
By adjusting the weight values according to steps 4 and 5, the neuron learns to recognise instances of class A input, and that instances of class B input are not instances of class A input. So, the ability to learn is directly attributable to the storage and adjustment of weight values.
Thus the following rule [20] can be defined to facilitate learning, by adjusting weight values (Beale and Jackson, 1990):
1. Increase the weights (on active inputs), when active output (i.e. the value 1) is required. This can be achieved by adding the inputs to the existing weight values.
2. Decrease the weights (on active inputs), when inactive output (i.e. the value 0) is required. This can be achieved by subtracting the inputs from the existing weight values.
This rule presupposes knowledge of the correct class. That is, knowing that the input is an instance of class A, and that class A is the intended class. As the rule guides learning using this knowledge, it is known as `supervised learning'.
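The rule can be sketched in a few lines of Python. The sketch assumes a threshold of 0, fixed starting weights standing in for the random values of step 1, and that weights change only when the output is wrong (as step 4 states for class A); the function names and the two sample patterns are hypothetical.

```python
def fires(weights, inputs, threshold=0.0):
    """Return 1 if the weighted sum of the inputs exceeds the threshold, else 0."""
    return 1 if sum(w * x for w, x in zip(weights, inputs)) > threshold else 0


def update(weights, inputs, target, threshold=0.0):
    """Apply the add/subtract rule once; the target is supplied by the supervisor."""
    if fires(weights, inputs, threshold) == target:
        return list(weights)                               # correct output: do nothing
    if target == 1:
        return [w + x for w, x in zip(weights, inputs)]    # add the inputs
    return [w - x for w, x in zip(weights, inputs)]        # subtract the inputs


class_a = [1, 0]   # hypothetical instance of class A (target 1)
class_b = [0, 1]   # hypothetical instance of class B (target 0)

weights = [0.0, 0.5]                       # stand-in for random starting weights
weights = update(weights, class_a, 1)      # output was 0, so the inputs are added
weights = update(weights, class_b, 0)      # output was 1, so the inputs are subtracted
```

Because an inactive input contributes 0 to the addition or subtraction, only the weights on active inputs change, as the rule requires.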
When adopting a supervised learning approach, instances of the correct class are presented to a neuron along with their expected output (i.e. the value 1) (Lippmann, 1987). Also, instances of the incorrect class are presented to the neuron along with their expected output (i.e. either 0 or -1). The expected output is commonly referred to as the `target'.
By presenting a neuron with instances of both classes and their respective target outputs, the neuron learns to correctly classify the intended class. The presentation of incorrect class instances helps the neuron to learn (by adjusting weights accordingly) not to attribute correct classification to incorrect class instances. That is, the neuron learns what to classify correctly according to examples of what is correct, and what is not correct (positive and negative reinforcement).
In 1962 Frank Rosenblatt coined the term `perceptron' to describe a feedforward ANN composed of one or more simple output neurons that function as discussed thus far (Beale and Jackson, 1990). This perceptron is the simplest kind of ANN, and can be considered a binary classifier, because each neuron outputs either a 1 or a 0; it is also often referred to as a linear classifier, because it classifies two classes according to the hyperplane that separates the two classes (called the decision boundary).
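Under the neuron model of Equation 2.8, and assuming that the step-wise activation fires when the induced potential exceeds zero (an assumption, with any other threshold value absorbed into the bias), the decision boundary can be written as the set of inputs for which the potential is exactly zero:

```latex
\upsilon_k = \sum_{i=1}^{n} w_{ki} x_i + b_k = 0
```

Inputs with a positive potential fall on the side classified as 1, and the remaining inputs on the side classified as 0. For two inputs this boundary is a straight line, and for three inputs a plane.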
A perceptron may consist of a single layer; that is, one or more simple output neurons or nodes making up the output layer [21]. A perceptron may also consist of multiple layers; that is, one or more simple output nodes making up the output layer, with one or more intermediate layers (consisting of one or more nodes) between the input neurons and the output layer. In such a case, all nodes from all layers have full inter-connectivity.
ANN Training
In the training process, inputs are presented to a processing element (node), which applies a summation function to the inter-connections (weights) that act upon the inputs. As inter-connections can be excitatory or inhibitory, the weights can be of larger or smaller magnitude (which add to or subtract from the accumulated value). When a specific threshold is reached, the node `fires' according to an activation function, which produces the resultant output.
The result of an activation function can be in the binary domain (that is, discrete values (0,1)) [22] or the continuous domain (that is, over the continuous interval [0,1]) [23]. Discrete value results are typically classified using the sum-and-threshold model, for example a step-wise function (refer Figure 2.11a) or a piece-wise linear function (refer Figure 2.11b) (Bishop, 1995). This seems a natural choice because the optimal discriminant function needs to distinguish two classes. Continuous value results are typically classified using an exponential or logarithmic model (for example, the logistic sigmoid function; refer Figure 2.11c).
The output of the activation function is compared with the expected or target output. The difference between the activation function output and the target is calculated; this difference is commonly referred to as the error. The error is added to each input and the corresponding weights are adjusted (refer Equation 2.12). This training process continues until error is minimised. The weights are then stored and used in the testing process.
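The training cycle described above can be sketched as a loop. Equation 2.12 is not reproduced in this section, so the sketch assumes the common perceptron update, in which each weight changes by the learning rate times the error times its input; that update, the learning rate of 1, the zero starting weights, and the logical AND data are all illustrative assumptions rather than the study's own configuration.

```python
def predict(weights, pattern):
    """Step-wise output; weights[0] is the bias, applied to a constant input of 1."""
    x = [1.0] + list(pattern)
    v = sum(w * xi for w, xi in zip(weights, x))
    return 1 if v > 0 else 0


def train(samples, learning_rate=1.0, max_epochs=100):
    """Repeat error-driven weight updates until an epoch produces no errors."""
    n_inputs = len(samples[0][0])
    weights = [0.0] * (n_inputs + 1)      # assumed start; the text suggests random values
    errors_per_epoch = []
    for _ in range(max_epochs):
        n_errors = 0
        for pattern, target in samples:
            error = target - predict(weights, pattern)    # target minus actual output
            if error != 0:
                n_errors += 1
                x = [1.0] + list(pattern)
                weights = [w + learning_rate * error * xi
                           for w, xi in zip(weights, x)]
        errors_per_epoch.append(n_errors)
        if n_errors == 0:                 # error minimised: stop and keep the weights
            break
    return weights, errors_per_epoch


# Hypothetical two-class data: logical AND (target 1 only for the input (1, 1)).
and_samples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
stored_weights, history = train(and_samples)
```

The list `history` records the number of misclassified samples in each pass over the data; training stops at the first pass with none, and `stored_weights` plays the role of the weights kept for the testing process.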
The number of different outcome classes defines the problem space. If there are two possible outcome classes, there exists a linearly separable problem space (Masters, 1993). That is, the outcome can only be one class or the other. The boundary between outcome classes may be linear or non-linear. This boundary is called the decision boundary, and distinguishes between classes in the problem space. It also determines the number of output layer nodes the network requires [24].
[22] Note, the discrete values (−1,1) are sometimes used.
[23] Note, the continuous interval [−1,1] is also commonly used.
The behaviour of a neural network, as it attempts to arrive at a solution, can be thought of in terms of the error or energy function. The energy is a function of the inputs and the weights. For a given input pattern, the energy function can be plotted against the weights to determine the energy surface in a three dimensional space. This can be envisaged as a landscape of hills and valleys, with points of minimum energy (known as wells) and maximum energy (known as peaks).
If the problem space has more than two possible outcome classes, the error can be minimised by adjusting weights so they correspond to points of lowest energy (that is, minimised error). The wells in the error surface may be many, but there will be one that is deeper than any other. This is the global minimum, and corresponds to the lowest possible error that can be attained for that input pattern. The other wells are local minima.
The objective in training the network is for it to reach the global minimum error; this means that it has trained to most accurately classify that class of pattern. The weights of the network in this state (that is, when the training objective has been achieved) are stored for the testing procedure.
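The difference between a local and the global minimum can be illustrated with a toy energy function of a single weight. The function below is hypothetical and has two wells of different depth, and plain gradient descent is used only as a convenient stand-in for a weight-adjustment procedure; neither is taken from the study.

```python
def energy(w):
    """Hypothetical one-weight energy function with two wells."""
    return w**4 - 3 * w**2 + w


def gradient(w):
    """Derivative of the energy with respect to the weight."""
    return 4 * w**3 - 6 * w + 1


def descend(w, learning_rate=0.01, steps=2000):
    """Plain gradient descent: repeatedly step downhill on the energy surface."""
    for _ in range(steps):
        w -= learning_rate * gradient(w)
    return w


w_from_right = descend(2.0)    # settles in the shallower well (a local minimum)
w_from_left = descend(-2.0)    # settles in the deeper well (the global minimum)
```

Both runs end at a point where the slope is essentially zero, but only the run started from the left reaches the lower energy. Which well is reached depends on the starting weight, which is why a training run can finish at a local minimum rather than the global one.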
ANN Testing
The testing procedure involves constructing an ANN of the same configuration as that used during training, and loading the stored weights (resulting from the training process) into this ANN. Test input patterns are then presented to the ANN. It is preferable (so that the trained ANN can generalise) that the input patterns presented for testing have not been used in the training process (that is, different samples of the same class of input patterns). For the testing procedure, no target outcomes are provided (Lippmann, 1987).
The summation and activation functions are applied (as previously explained) and the outputs are produced by the ANN. The type of output is dependent on the number of decision boundaries in the problem space, which determines the type of activation function used and therefore the nature of the output data (that is, binary or continuous).
It is important to remember that the ANN output resulting from the testing procedure may or may not be correct. That is, ANNs can, and do, attribute correct and incorrect class membership. Output in the continuous domain is typically subject to a final classification scheme, where a decision threshold is applied (refer Chapter 6 section 6.2).
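A minimal sketch of the testing step for a single sigmoid output node follows. The stored weights, the two query patterns, and the decision threshold of 0.5 are hypothetical values chosen for illustration; the classification scheme actually used is the one described in Chapter 6 section 6.2.

```python
import math


def sigmoid(v):
    """Logistic sigmoid: output in the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-v))


def classify(stored_weights, pattern, decision_threshold=0.5):
    """Return the continuous output and the final accept/reject decision.

    stored_weights[0] is the bias, applied to a constant input of 1.
    No target is supplied: the network only produces an output.
    """
    x = [1.0] + list(pattern)
    v = sum(w * xi for w, xi in zip(stored_weights, x))
    score = sigmoid(v)
    return score, score >= decision_threshold


stored_weights = [-2.0, 2.0, 1.0]                          # hypothetical trained weights
score_a, accepted_a = classify(stored_weights, [1.0, 1.0])
score_b, accepted_b = classify(stored_weights, [0.0, 1.0])
```

The first query yields a score above the threshold and is accepted; the second falls below it and is rejected. Raising or lowering the threshold trades one kind of misclassification for the other.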
If the nature of the training and testing data exhibits consistency and the ANN has been well trained, classification should be accurate. However, if the nature of the training and testing data exhibits variability or the ANN has not been well trained, the classification may not be as accurate.
For the current study, ANNs were used to classify query inputs as belonging to (or not) the training group members' classes, according to the training group members' registered templates. The registered template for a training group member consisted of the weights of the ANN that had been trained to recognise the pattern of their training data. During testing, query data sets were applied to the ANN (using the stored weights for registered templates) to determine correct classification.
ANNs are generally effective in solving classification and pattern recognition problems (Beale and Jackson, 1990), and as shown in the next section, there are a number of architectural models designed for these types of problems. The architectural model used in the current research was the Multi-Layer Perceptron (with error back propagation), because it was well suited to the complexity of the pattern recognition task of this experiment. An explanation of the operations of the Multi-Layer Perceptron (MLP) is provided in the next section.