Artificial Neural Networks are highly scalable learning methods which attempt to replicate the biological structures present in the human brain (Rosenblatt, 1958). This scalability allows them to produce state-of-the-art performance on many modern tasks, at the cost of requiring substantial amounts of training data and computational resources. Neural Network algorithms use layers of neurons. Each neuron is a nexus of connections from the previous layer: the outputs of the previous layer are weighted and summed, and a non-linear activation function is then applied to the sum to introduce non-linearity into the system (Vora and Yagnik, 2014; Che et al., 2011). The networks train best when the variables are on comparable scales and roughly normally distributed, and therefore the data should be rescaled prior to use. Transforming some of the variables using logarithms may also be recommended.
Artificial Neural Networks combine multiple neurons into a connective learning network which can adopt multiple topologies. The two most common topologies are the Feedforward Neural Network and the Recurrent Neural Network (Fine, 1999; Jain and Medsker, 1999). Feedforward Neural Networks propagate information through the layers in one direction, from the early hidden layers to the later hidden layers, until it reaches the output layer. Recurrent Neural Networks additionally implement links which allow information from later layers to be transmitted to earlier layers or even back to the same layer. These recurrent links make the networks much more difficult to train but allow them to store information in a sort of neural memory. For this reason they are attractive for data containing a temporal structure. Recurrent Neural Networks have seen recent successes in the field of variable star classification (Naul et al., 2018).
Feedforward Neural Networks contain an input layer composed of the input features plus a bias unit, i.e. the input layer has $N + 1$ neurons where $N$ is the number of features. There are then one or more hidden layers, sets of neurons which learn representations of the input features that map well to the output task. Finally, the output layer is either a single linear neuron for regression tasks or, for classification, a set of $m$ neurons where $m$ is the number of classes. This classification output layer can either feed into another classification method such as the Support Vector Machine or be implemented using the softmax function, a multinomial logistic regression classification function. Feedforward networks can assume a number of different topologies based on the number of neurons present in the hidden layers and the number of hidden layers. Figure 4.5 demonstrates the topology of a simple Feedforward Neural Network with three input features and three hidden layer neurons.
Figure 4.5: A single hidden layer Artificial Feedforward Neural Network. The inputs from the previous layer are weighted by learned parameters (the matrix $\Theta^{(j)}$ for the $j$th layer) and summed. This final summation has a non-linear activation function applied to it and the result is sent to the next layer.
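The softmax output layer mentioned above can be illustrated with a minimal sketch. It assumes NumPy is available; the function name `softmax` and the three example scores are hypothetical and chosen only for illustration, with $m = 3$ classes.

```python
import numpy as np

def softmax(z):
    """Map a vector of m output-layer scores to m class probabilities."""
    z = np.asarray(z, dtype=float)
    shifted = z - np.max(z)  # subtracting the maximum avoids overflow in exp
    exp_z = np.exp(shifted)
    return exp_z / np.sum(exp_z)

# Hypothetical scores from an output layer with m = 3 classes.
scores = np.array([2.0, 1.0, 0.1])
probabilities = softmax(scores)

# The predicted class is the one with the largest probability.
predicted_class = int(np.argmax(probabilities))
```

The probabilities are positive and sum to one, so the largest entry can be read directly as the predicted class.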
The Feedforward Artificial Neural Network can be defined using a set of neuron activations $a_i^{(j)}$, the activation of unit $i$ in layer $j$. These activations are weighted by a matrix of weights $\Theta^{(j)}$ controlling the mapping from layer $j$ to layer $j + 1$. Equations 4.12 and 4.13 give the activation of the hidden layer neurons by the inputs.
$$a^{(2)} = g\left(\Theta^{(1)\top} X\right) \quad \text{for the first hidden layer} \tag{4.12}$$
$$a^{(j)} = g\left(\Theta^{(j-1)\top} a^{(j-1)}\right) \quad \text{for the remaining layers} \tag{4.13}$$
where $a^{(j)}$ is the output of layer $j$, $\Theta^{(j-1)}$ is the weight matrix of layer $j - 1$, $X = (x_1, x_2, \ldots, x_n)$ is the input feature vector and $g(x)$ is the non-linear activation function of the network. There are a number of popular activation functions including the sigmoid function, shown in equation 4.14, the tanh function $g(x) = \tanh(x)$ and the rectified linear unit (ReLU) function $g(x) = \max(0, x)$.
$$g(x) = \frac{1}{1 + \exp(-x)} \tag{4.14}$$
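The layer-by-layer activation rule and the activation functions above can be sketched in a few lines of NumPy. This is an illustrative sketch under stated assumptions, not a reference implementation: each weight matrix is assumed to be stored with shape (units in layer $j$, units in layer $j + 1$) so that the transpose in the forward rule applies directly, bias units are omitted for brevity, the weights are random, and the 3-3-1 layer sizes and all function names are hypothetical.

```python
import numpy as np

def sigmoid(x):
    """Sigmoid activation, g(x) = 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    """Rectified linear unit, g(x) = max(0, x)."""
    return np.maximum(0.0, x)

def forward(X, thetas, g=sigmoid):
    """Return the activations of every layer, starting with the input."""
    activations = [np.asarray(X, dtype=float)]
    for theta in thetas:
        # Weight and sum the previous layer, then apply the activation.
        activations.append(g(theta.T @ activations[-1]))
    return activations

# Assumed toy network: 3 inputs, 3 hidden neurons, 1 output neuron.
rng = np.random.default_rng(0)
thetas = [rng.normal(size=(3, 3)), rng.normal(size=(3, 1))]

X = np.array([0.5, -1.2, 3.0])   # hypothetical input feature vector
layers = forward(X, thetas)      # sigmoid activations by default
output = layers[-1]

# The same pass with a different activation function.
tanh_output = forward(X, thetas, g=np.tanh)[-1]
```

Passing `g=relu` or `g=np.tanh` swaps the activation function without changing the rest of the forward pass.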
This forward pass of information is named Forward Propagation and it allows the inputs to be mapped to the outputs of the Neural Network. However, Forward Propagation alone does not allow the weight matrices $\Theta^{(j)}$ to be trained on a training dataset. The Back-Propagation algorithm was developed to accomplish this task through an iterative forward then backward pass structure. The forward pass computes the outputs for a set of input training data $X$. The output layer is then compared to the desired output through the computation of a cost or error function. The errors are then determined as a function of the neurons in the network and corrected through the calculation of an 'error derivative' for the output layer neurons. This correction then propagates backwards through the network until the weight matrices of each layer have been adjusted. A forward pass is then computed to determine the change in error from this corrective backward pass, and the process is repeated iteratively until the output errors on the training set have been minimised.
Consider $\delta_j^{(l)}$ as the error of neuron $j$ in layer $l$. We can compute the error of the output layer as $\delta^{(o)} = a^{(o)} - y$, where $a^{(o)}$ is the activation of the output layer after a forward pass, $y$ is the ground-truth target output and $\delta^{(o)}$ is the resulting error in the output layer. Equation 4.15 defines the computation of these error derivatives as they propagate backwards through the network.
$$\delta^{(j)} = \Theta^{(j)\top} \delta^{(j+1)} \odot g'\left(\Theta^{(j-1)\top} a^{(j-1)}\right) \tag{4.15}$$
where $\delta^{(j)}$ is the error vector for the $j$th layer, $\Theta^{(j)}$ is the weight matrix of the $j$th layer, $a^{(j)}$ is the activation of the $j$th layer and $g'(x)$ is the first derivative of the activation function $g(x)$. $\odot$ represents the elementwise multiplication operation. In this backpropagation method there is no $\delta^{(1)}$, as the input layer has no error associated with it. Backpropagation is not the only way to train a Neural Network and the Levenberg-Marquardt method has also been used to produce trained models (Basterrech et al., 2011). Genetic algorithms are also a potential training method due to their ability to locate global minima in complex functions (Che et al., 2011).
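The iterative forward and backward passes of Back-Propagation can be sketched as follows. This is a minimal illustration under several assumptions that are not taken from the text: a sigmoid activation in every layer, a cross-entropy cost (for which the output error is $a^{(o)} - y$), plain full-batch gradient descent with a hypothetical learning rate of 0.5, and a toy logical-OR dataset whose first input column is a constant bias unit. Weight matrices are assumed to be stored with shape (units in layer $j$, units in layer $j + 1$), so with this storage the backward product is written `theta @ delta`. All names are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_prime(x):
    s = sigmoid(x)
    return s * (1.0 - s)

def forward(x, thetas):
    """Forward pass; returns weighted sums zs and activations per layer."""
    zs, activations = [], [x]
    for theta in thetas:
        zs.append(theta.T @ activations[-1])
        activations.append(sigmoid(zs[-1]))
    return zs, activations

def backward(y, thetas, zs, activations):
    """Backward pass; returns one gradient matrix per weight matrix."""
    delta = activations[-1] - y  # output-layer error, a(o) - y
    grads = [None] * len(thetas)
    for l in range(len(thetas) - 1, -1, -1):
        grads[l] = np.outer(activations[l], delta)
        if l > 0:  # the input layer has no error term
            delta = (thetas[l] @ delta) * sigmoid_prime(zs[l - 1])
    return grads

def cost(X, Y, thetas):
    """Mean cross-entropy between the network outputs and the targets."""
    total = 0.0
    for x, y in zip(X, Y):
        a_out = np.clip(forward(x, thetas)[1][-1], 1e-12, 1.0 - 1e-12)
        total -= np.sum(y * np.log(a_out) + (1 - y) * np.log(1 - a_out))
    return total / len(X)

# Toy logical-OR data; the first column is a constant bias unit.
X = np.array([[1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1]], dtype=float)
Y = np.array([[0], [1], [1], [1]], dtype=float)

rng = np.random.default_rng(0)
thetas = [rng.normal(size=(3, 3)), rng.normal(size=(3, 1))]
learning_rate = 0.5  # assumed value, not tuned

initial_cost = cost(X, Y, thetas)
for _ in range(2000):
    sums = [np.zeros_like(t) for t in thetas]
    for x, y in zip(X, Y):
        zs, activations = forward(x, thetas)
        for total, grad in zip(sums, backward(y, thetas, zs, activations)):
            total += grad  # in-place accumulation over the training set
    thetas = [t - learning_rate * s / len(X) for t, s in zip(thetas, sums)]
final_cost = cost(X, Y, thetas)
```

Comparing `final_cost` with `initial_cost` shows whether the repeated corrective passes have reduced the error on the training set.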