2.3.1 Basics

Back-propagation networks are multi-layered networks that use back-propagation learning. Back-propagation learning is a supervised learning method that extends the Widrow-Hoff delta rule to multiple layers; it was proposed in 1986 [26]. The name of this learning mechanism stems from the way the error of a network under training propagates backwards to adjust the connections’ weights. The networks built around this learning technique are very robust, error tolerant, and adaptable.

The power of back-propagation neural networks is rooted in their structure and learning mechanism. The basic structure is shown in Figure 1.4. A network can have as few as three layers or as many as a computer can handle, and the same holds for the number of neurons. The basic neuron of the network is similar to a perceptron and is shown in Figure 1.3. The typical structure of a back-propagation neural network consists of three kinds of layers: an input layer, at least one hidden layer, and an output layer.

Figure 1.4 shows a template of neural networks. The input layer contains the input nodes, which represent and take in the values of the variables of a problem domain. Each input node connects to all the neurons in the next layer via weighted connections. Input nodes are the representation of the environment that the network is trying to learn. The neurons of the hidden layer(s) are named hidden neurons. A neural network can have many hidden layers with various numbers of hidden neurons. The number of hidden layers and neurons can be determined using several techniques, as will be discussed in Section 2.6. Each neuron in a hidden layer is connected to all neurons in the succeeding layer via weighted connections.

Figure 2.1: The sigmoid function.

The final layer in a back-propagation neural network is named the output layer. This layer contains a number of output neurons, which determine the class of each input instance. The output neurons, like the hidden neurons, are more intricate than the input nodes, as demonstrated in Figure 1.3. They are computational units that sum up the weighted inputs and pass the sum through a thresholding function. The value of the thresholding function is weighted and propagated to the next layer, provided that the summation value exceeds the threshold limit, as formulated in Equation 1.3. The most common thresholding function used in back-propagation neural networks is the sigmoid function, formulated in Equation 2.1 and illustrated in Figure 2.1.

$$f(x) = \frac{1}{1 + e^{-x}} \tag{2.1}$$

The sigmoid is a function that resembles an “S” shape and produces values strictly between 0 and 1. It is popular because it is differentiable, which eases the weight adjustment process [50].
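As a rough illustration of Equation 2.1 and of how a fully connected layer uses it, the following Python sketch may help. The helper names (`sigmoid`, `sigmoid_derivative`, `layer_output`) and all weight values are assumptions made for this example, and the sketch omits any bias or threshold term for brevity. The derivative uses the standard identity f'(x) = f(x)(1 - f(x)), which is one reason differentiability makes the weight adjustment convenient.

```python
import math

def sigmoid(x):
    """Equation 2.1: f(x) = 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_derivative(x):
    """Derivative of the sigmoid via the identity f'(x) = f(x) * (1 - f(x))."""
    fx = sigmoid(x)
    return fx * (1.0 - fx)

def layer_output(inputs, weight_rows):
    """Outputs of one fully connected layer (hypothetical helper).

    weight_rows[k][j] is the weight linking input j to neuron k.
    No bias or threshold term is modeled here (an assumption).
    """
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)))
            for row in weight_rows]

# Illustrative, made-up weights: two inputs, two hidden neurons, one output neuron.
hidden = layer_output([1.0, 0.5], [[0.4, -0.8], [0.0, 0.0]])
output = layer_output(hidden, [[1.0, -1.0]])

print(sigmoid(0.0))             # 0.5
print(sigmoid_derivative(0.0))  # 0.25
print(hidden)                   # [0.5, 0.5]
print(output)                   # [0.5]
```

Note that `math.exp` raises `OverflowError` for very large arguments, so a more careful version would guard against strongly negative inputs.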

2.3.2 Learning

Back-propagation learning, also named the (generalized) delta rule, error-correction learning, or Widrow-Hoff learning, is a form of supervised learning that trains a given network based on expected outputs. That is, the network is trained to make predictions by repeatedly adjusting its weights so that its predicted output matches the expected output or comes as close to it as possible. This process can be formalized as follows. For a given untrained network, the weights w of the connections are initially assigned through some mechanism, such as random assignment or constant value assignment. The back-propagation learning algorithm then adjusts the weights with each training iteration i in proportion to the error e(i) of the prediction produced by the network, that is, the difference between the desired output d(i) and the network’s output y(i) at that iteration, as in Equation 2.2 [9, 26, 28].

$$e(i) = d(i) - y(i) \tag{2.2}$$

Equation 2.2 constitutes the stimulus that triggers the weight adjustment process. To make the weight adjustment proportional to the error, we need to define a cost function based on the error produced at each training iteration i. This cost function must not exhibit the oscillating behavior of the error function e(i), whose sign can change from one iteration to the next; otherwise, automating the error reduction would be a much more difficult task. The learning algorithm observes and attempts to reduce the values of a cost function based on the error of Equation 2.2. This cost function is defined in Equation 2.3.

$$\varepsilon(i) = \frac{1}{2} e^{2}(i) \tag{2.3}$$
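To see why the squared cost is easier to reduce than the raw error, the following minimal Python sketch evaluates Equations 2.2 and 2.3 on a few made-up outputs. The function names and the sample values are assumptions chosen for illustration only.

```python
def error(desired, output):
    """Equation 2.2: e(i) = d(i) - y(i)."""
    return desired - output

def cost(desired, output):
    """Equation 2.3: eps(i) = 0.5 * e(i)^2."""
    e = error(desired, output)
    return 0.5 * e * e

# Hypothetical outputs that alternately overshoot and undershoot d = 1.0.
desired = 1.0
outputs = [1.5, 0.5, 1.25, 0.75]

errors = [error(desired, y) for y in outputs]
costs = [cost(desired, y) for y in outputs]

print(errors)  # [-0.5, 0.5, -0.25, 0.25]      sign flips at every step
print(costs)   # [0.125, 0.125, 0.03125, 0.03125]  never negative
```

The error alternates in sign while the cost stays non-negative and shrinks as the output approaches the desired value, which gives the learning algorithm a single quantity to minimize.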

The task of learning thus becomes an optimization task of minimizing the cost function ε(i), which can be solved using methods such as gradient descent. The mechanism by which the weights are adjusted is the core of the learning. In this learning mechanism, the adjustment of each connection’s weight $w_{kj}$, connecting neurons k and j, for a specific training iteration i is defined in Equation 2.4:

$$\begin{aligned}
\Delta w_{kj}(i) &= \eta\,\varepsilon'(i)\,x_j(i) + \beta\bigl(w_{kj}(i-1) - w_{kj}(i-2)\bigr) \\
&= \eta\,e_k(i)\,x_j(i) + \beta\bigl(w_{kj}(i-1) - w_{kj}(i-2)\bigr)
\end{aligned} \tag{2.4}$$

where η is the learning rate. The learning rate is a control, or tuning, parameter of a neural network. It is used by some learning algorithms to control the size of the weight adjustments. Typically, the value of the learning rate lies in the range [0, 1]. A high learning rate causes the network to make radical changes to the weights, so that the previously learned weights are almost overruled at each iteration. This may cause the network to diverge. On the other hand, a low learning rate allows only small weight changes, which causes the network to take longer to converge. Another control parameter is the momentum, denoted β. It is a term associated with each connection that varies within the range [0, 1]. Its main purpose is to help reduce the oscillation of the weight changes and to prevent the system from converging to a local minimum or saddle point. It therefore helps the objective function converge faster and ultimately speeds up the learning process. Setting the momentum too high creates a risk of overshooting the minimum, while a momentum that is too low cannot reliably avoid local minima [53].
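The roles of the two control parameters can be illustrated with a short Python sketch of Equation 2.4. The function name `weight_delta`, its argument names, and the numeric values are assumptions for this example; the momentum term is written exactly as in Equation 2.4, using the two previous weight values.

```python
def weight_delta(eta, beta, error_k, x_j, w_prev1, w_prev2):
    """Equation 2.4 for one connection (hypothetical helper).

    eta      learning rate
    beta     momentum
    error_k  e_k(i), error at neuron k
    x_j      x_j(i), input arriving from neuron j
    w_prev1  w_kj(i - 1)
    w_prev2  w_kj(i - 2)
    """
    error_term = eta * error_k * x_j
    momentum_term = beta * (w_prev1 - w_prev2)
    return error_term + momentum_term

# Same error and input, different control parameters (made-up values).
print(weight_delta(0.5, 0.0, 0.5, 1.0, 0.25, 0.0))    # 0.25    no momentum
print(weight_delta(0.5, 0.5, 0.5, 1.0, 0.25, 0.0))    # 0.375   momentum adds to the step
print(weight_delta(0.125, 0.0, 0.5, 1.0, 0.25, 0.0))  # 0.0625  smaller learning rate
```

With the error and input held fixed, a larger learning rate scales the step directly, while the momentum term adds a fraction of the most recent weight movement to it.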

Now that we have determined the weight adjustment value $\Delta w_{kj}(i)$, we can formulate the updated value of a specific weight at the next iteration, $w_{kj}(i+1)$, as shown in Equation 2.5.

$$w_{kj}(i+1) = w_{kj}(i) + \Delta w_{kj}(i) \tag{2.5}$$
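Putting Equations 2.2, 2.4, and 2.5 together, a training loop might look like the following Python sketch. It is deliberately simplified: it trains a single output unit with a linear activation (an assumption, so that the error term of Equation 2.4 applies directly) rather than a full multi-layer network, and the function name, the data, and the parameter values are all made up for illustration.

```python
def train_linear_neuron(samples, eta=0.1, beta=0.3, epochs=500):
    """Train one linear unit using Equations 2.2, 2.4, and 2.5.

    samples is a list of (inputs, desired_output) pairs.
    """
    n_inputs = len(samples[0][0])
    w = [0.0] * n_inputs   # w(i), constant initial assignment
    w_prev1 = list(w)      # w(i - 1)
    w_prev2 = list(w)      # w(i - 2)
    for _ in range(epochs):
        for x, d in samples:
            y = sum(wj * xj for wj, xj in zip(w, x))  # network output y(i)
            e = d - y                                 # Eq. 2.2
            delta = [eta * e * x[j] + beta * (w_prev1[j] - w_prev2[j])
                     for j in range(n_inputs)]        # Eq. 2.4
            w_next = [w[j] + delta[j]
                      for j in range(n_inputs)]       # Eq. 2.5
            # Shift the weight history by one iteration.
            w_prev2, w_prev1, w = w_prev1, w, w_next
    return w

# Made-up data generated by d = 2 * x1 - 1 * x2.
samples = [((1.0, 0.0), 2.0), ((0.0, 1.0), -1.0), ((1.0, 1.0), 1.0)]
weights = train_linear_neuron(samples)
print(weights)  # expected to end up close to [2.0, -1.0]
```

Each pass computes the error, derives the adjustment from the current error and the two previous weight values, and then applies it, which mirrors the order in which the equations are introduced above.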