
This section introduces terminology and governing principles used later on in this work. As NNs are a re-emerging technology in computer science, there is a mixture of well-established methods and terms and recently introduced, less familiar ones. Figure 6.1 gives an overview of how the terms artificial intelligence (AI), machine learning (ML) and DL relate to each other.

NNs and, in particular, deep neural networks (DNNs) are nowadays associated with DL. Both terms, NN and DNN, are used in this work and are often interchangeable. However, whereas NN is a rather general term, DNN describes NNs with multiple (more than two) layers, which have recently gained considerable interest.

Figure 6.1: Deep learning embedded in the (historical) context of artificial intelligence and machine learning from the early 1950s until today. Starting from a theoretical framework, i.e. Hebbian theory, ideas of brain-like machines and general artificial intelligence have arisen over the decades of digitalization. Neural networks, which are inspired by the biology of the brain, have been discussed since the early days of AI, but have recently gained attention because of increasing computational power, the availability of significant amounts of data and improved methods for efficient learning.

An NN is built up of individual components, denoted as nodes or neurons. Each of them forms a simple functional unit with one or more inputs $x_i$ and one output $y$, as depicted in Figure 6.2. The sum of the weighted inputs $x_i$ connected to a neuron $j$,

$$X = \sum_{i} x_i \, w_{ji} , \tag{6.1}$$

where $w_{ji}$ denotes the weight of the connection from input $i$ to neuron $j$, forms a linear input that is passed to an activation function $f$:

$$y = f(X) . \tag{6.2}$$
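Eqs. (6.1) and (6.2) can be illustrated with a minimal pure-Python sketch of a single neuron. The function name `neuron_output` and all numbers are hypothetical and serve only as an example; in line with Eq. (6.1), no bias term is included.

```python
import math
from typing import Callable, Sequence


def neuron_output(
    inputs: Sequence[float],
    weights: Sequence[float],
    activation: Callable[[float], float],
) -> float:
    """Hypothetical helper: y = f(X) with X = sum_i x_i * w_ji for one neuron j."""
    if len(inputs) != len(weights):
        raise ValueError("each input x_i needs exactly one weight w_ji")
    # Eq. (6.1): linear input X as the weighted sum of all inputs.
    x_sum = sum(x_i * w_ji for x_i, w_ji in zip(inputs, weights))
    # Eq. (6.2): the activation function maps X to the output y.
    return activation(x_sum)


# Example values (illustration only): three inputs and their weights.
x = [1.0, 2.0, -1.0]
w = [0.5, -0.25, 1.0]
print(neuron_output(x, w, lambda v: v))   # → -1.0  (identity activation)
print(neuron_output(x, w, math.tanh))     # tanh applied to X = -1.0
```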

6.1 Deep Learning

Figure 6.2: The input vector $x_1, \dots, x_n$ (left) is multiplied (×) by the corresponding weights $w_1, \dots, w_n$. The sum (Σ) of the weighted inputs is fed into the activation function $f$ of the neuron.

If the neuron’s output $y$ is not the final output of the NN, it is connected to one or more succeeding neurons. The interconnected neurons in an NN are typically organized in a layer-like structure, where a layer represents a collection of neurons that are not connected to each other but are connected to the neurons of the preceding and succeeding layers, as shown in Figure 6.3. NNs in which layers receive inputs from previous layers and pass outputs to succeeding layers are called feedforward neural networks, as they have no cyclic connections between neurons. DNNs have at least one hidden layer that is not directly connected to the input or output. At the network’s output, the output layer yields a prediction $y$, which is compared to the target. Prediction and target can be multidimensional data; however, the NNs in this work are limited to 2D outputs. Data that pass through an NN are organized as tensors, which the literature often describes as multidimensional generalizations of vectors to N-dimensional data.

Figure 6.3: If all neurons of adjacent layers are connected to each other by a weighted connection (weights not shown), the network is called fully connected. There are no connections between neurons of the same layer. The source data are fed into the input layer and passed forward through the network to produce the predicted output $y$; the correct output, e.g. a target image, is compared to $y$ by the loss function.
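The fully connected feedforward structure of Figure 6.3 can be sketched by applying Eqs. (6.1) and (6.2) layer by layer. This is a hedged, illustrative sketch rather than the implementation used in this work: the helper names, the 2-3-1 layer sizes and the weights are hypothetical, biases are omitted to stay with Eq. (6.1) (Keras `Dense` layers add a bias by default), and the mean squared error is only assumed as an example loss function, since no specific loss is named here.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]  # weights[j][i] = w_ji: input i -> neuron j
Layer = Tuple[Matrix, Callable[[float], float]]


def relu(v: float) -> float:
    return max(0.0, v)


def identity(v: float) -> float:
    return v


def dense_layer(inputs: Sequence[float], weights: Matrix,
                activation: Callable[[float], float]) -> Vector:
    """Hypothetical fully connected layer: every neuron j receives every input i."""
    # Eq. (6.1) per neuron, followed by Eq. (6.2); no bias term.
    return [activation(sum(w_ji * x_i for w_ji, x_i in zip(row, inputs)))
            for row in weights]


def feedforward(inputs: Sequence[float], layers: Sequence[Layer]) -> Vector:
    """Pass data strictly forward, layer by layer (no cyclic connections)."""
    values = list(inputs)
    for weights, activation in layers:
        values = dense_layer(values, weights, activation)
    return values


def mse_loss(prediction: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error, assumed here as one example of a loss function."""
    return sum((p - t) ** 2 for p, t in zip(prediction, target)) / len(target)


# Hypothetical 2-3-1 network: 2 inputs, one hidden layer of 3 neurons, 1 output.
hidden: Layer = ([[1.0, -1.0], [0.5, 0.5], [-1.0, 0.5]], relu)
output: Layer = ([[1.0, 1.0, 1.0]], identity)
y = feedforward([2.0, 1.0], [hidden, output])
print(y)                    # → [2.5]  (hidden activations: [1.0, 1.5, 0.0])
print(mse_loss(y, [3.0]))   # → 0.25
```

The third hidden neuron receives a negative weighted sum, so the ReLU sets its output to zero before the identity output layer forms the prediction.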

6 Neural Networks and Deep Learning

If all neurons of adjacent layers are connected to each other, the layers are referred to as fully connected, as illustrated in Figure 6.3. Convolutional layers use another type of connection: they apply a convolution operation to their input. The size of the convolution kernel defines the size of the receptive field and the number of weighted connections. Further details on the principles of these connection operations will be given in Section 7.1.
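The role of the kernel size can be made concrete with a small single-channel example. The sketch below assumes stride 1, no padding and no bias, and its function names are hypothetical; like the convolutional layers of common deep learning libraries, it does not flip the kernel, so strictly speaking it computes a cross-correlation. Each output value depends only on a kernel-sized patch of the input (its receptive field), and the same few weights are shared across all positions.

```python
from typing import List

Matrix = List[List[float]]


def conv2d_valid(image: Matrix, kernel: Matrix) -> Matrix:
    """Hypothetical helper: slide the kernel over the image without padding.

    The kernel is not flipped (cross-correlation), stride is 1, no bias.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = [[0.0] * out_w for _ in range(out_h)]
    for r in range(out_h):
        for c in range(out_w):
            # Each output value only sees a kh x kw patch: its receptive field.
            output[r][c] = sum(
                kernel[i][j] * image[r + i][c + j]
                for i in range(kh) for j in range(kw)
            )
    return output


def num_kernel_weights(kernel: Matrix) -> int:
    """Weighted connections per output value, shared across all positions."""
    return len(kernel) * len(kernel[0])


image = [[1.0, 2.0, 3.0, 0.0],
         [4.0, 5.0, 6.0, 1.0],
         [7.0, 8.0, 9.0, 2.0]]
kernel = [[1.0, 0.0],
          [0.0, -1.0]]                  # hypothetical 2x2 kernel
print(conv2d_valid(image, kernel))     # → [[-4.0, -4.0, 2.0], [-4.0, -4.0, 4.0]]
print(num_kernel_weights(kernel))      # → 4
```

A 3×4 input and a 2×2 kernel yield a 2×3 output; a larger kernel would enlarge the receptive field of every output value and the number of weights alike.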

Independent of their connection type, the neurons map their inputs to their output as defined by their activation function. In order to approximate arbitrary functions, non-linear activations are needed after the linear operations, i.e. the weighting and summation of the inputs [HSW89]. A common choice for the activation function in the hidden layers of a DNN is the rectified linear unit (ReLU) function [LYH15]:

$$f_{\mathrm{ReLU}}(X) = \max(0, X) = \begin{cases} 0 , & \text{for } X < 0 \\ X , & \text{for } X \geq 0 \end{cases} \tag{6.3}$$
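The need for non-linear activations [HSW89] can be checked numerically: without an activation, two stacked linear layers reduce to a single linear map, since $W_2 (W_1 x) = (W_2 W_1) x$. The following pure-Python sketch uses hypothetical weights $W_1$, $W_2$ and a hypothetical input $x$ to show this equality and how a ReLU between the layers changes the result.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Weighted sums of Eq. (6.1) for every neuron of a layer."""
    return [sum(w_ji * x_i for w_ji, x_i in zip(row, x)) for row in w]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Matrix product a @ b, used to merge two linear layers into one."""
    return [[sum(a[r][k] * b[k][c] for k in range(len(b)))
             for c in range(len(b[0]))] for r in range(len(a))]


def relu_vec(x: Vector) -> Vector:
    """Eq. (6.3) applied element-wise."""
    return [max(0.0, v) for v in x]


w1 = [[1.0, -1.0], [0.5, 1.0]]   # hypothetical hidden-layer weights
w2 = [[1.0, 2.0]]                # hypothetical output-layer weights
x = [1.0, 3.0]                   # hypothetical input

# Without activation: two stacked linear layers equal one linear layer W2 W1.
two_linear = matvec(w2, matvec(w1, x))
one_linear = matvec(matmul(w2, w1), x)
print(two_linear, one_linear)                # → [5.0] [5.0]

# With ReLU in between, the negative hidden value is clipped to zero,
# so the result no longer equals (W2 W1) x.
print(matvec(w2, relu_vec(matvec(w1, x))))   # → [7.0]
```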

The strictly monotonic hyperbolic tangent

$$f_{\tanh}(X) = \frac{e^{2X} - 1}{e^{2X} + 1} \tag{6.4}$$

can be used to produce both negative and positive values while limiting their range to $]-1, +1[$. In contrast, the linear identity function

$$y = f(X) = X \tag{6.5}$$

is a common choice for the activations in the last layer(s), i.e. the output layer. Such linear output units are common in regression models, where the prediction has to match continuous values [Hin+12]. Many other activation functions exist; in this work, only the above-mentioned activations are used.
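The three activation functions (6.3) to (6.5) can be written down directly. In this hypothetical sketch, `tanh_from_exp` evaluates Eq. (6.4) literally; the guard for large positive inputs is an implementation assumption that avoids an overflow of $e^{2X}$, and in practice a library routine such as Python's `math.tanh` would be preferred.

```python
import math


def relu(x: float) -> float:
    """Eq. (6.3): 0 for x < 0, x for x >= 0."""
    return max(0.0, x)


def tanh_from_exp(x: float) -> float:
    """Eq. (6.4): (e^{2x} - 1) / (e^{2x} + 1), mathematically within ]-1, +1[."""
    if x > 20.0:
        # Assumption: e^{2x} overflows a double for x above roughly 355, and
        # tanh(x) already rounds to 1.0 in double precision for x > 20.
        return 1.0
    e2x = math.exp(2.0 * x)
    return (e2x - 1.0) / (e2x + 1.0)


def identity(x: float) -> float:
    """Eq. (6.5): linear output unit, typical for regression outputs."""
    return x


# Usage: compare the three activations on a few inputs.
for v in (-2.0, 0.0, 2.0):
    print(v, relu(v), tanh_from_exp(v), identity(v))
```

ReLU discards negative values, tanh keeps the sign but saturates towards $\pm 1$, and the identity passes continuous values through unchanged, which is why it suits regression outputs.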

The shape and dimensionality of the training data define the shapes of the input and output layers, whereas the design and architecture of the hidden layers are developed depending on the problem or class of tasks at hand. Often, heuristic and more complex designs combine different types of layers to obtain a desired behavior of the network [Nie15]. The convention for the following sections is a bottom-up or left-to-right design (Fig. 6.3), in which the bottom/left-hand layers of a given network serve as the visible input layers, followed by the desired number of hidden layers, with the visible output layers on the top/right-hand side.
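The split between data-defined and freely designed layers can be expressed as a list of layer sizes ordered from input (bottom/left) to output (top/right). The helper below is hypothetical: it assumes fully connected layers without biases and an example in which 8×8 input and target images are flattened to 64 values, and it derives the resulting weight-matrix shapes.

```python
from typing import List, Sequence, Tuple


def layer_weight_shapes(n_inputs: int, hidden_sizes: Sequence[int],
                        n_outputs: int) -> List[Tuple[int, int]]:
    """Hypothetical helper: (n_out, n_in) weight shapes from input to output.

    n_inputs and n_outputs are fixed by the training data, whereas
    hidden_sizes is a free design choice for the problem at hand.
    """
    sizes = [n_inputs, *hidden_sizes, n_outputs]   # bottom/left to top/right
    return [(n_out, n_in) for n_in, n_out in zip(sizes[:-1], sizes[1:])]


# Assumed example: flattened 8x8 input, two hidden layers, flattened 8x8 output.
shapes = layer_weight_shapes(64, [32, 32], 64)
print(shapes)                           # → [(32, 64), (32, 32), (64, 32)]
print(sum(r * c for r, c in shapes))    # → 5120 (weights only, no biases)
```

Changing `hidden_sizes` alters only the inner shapes, while the first and last shapes stay tied to the data.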

The networks implemented for this work are set up with Keras [Cho+15] using a TensorFlow backend [Aba+16]. TensorFlow organizes data in tensors as its common data structure and interchange format for computation on GPUs; therefore, the term tensor will be used for the data objects processed by a network. In contrast to the chapters

6.2 Training of a Neural Network
