

Figure 2.5: Visualization of the essential parts of ANNs, the neurons. This nonlinear model image is taken from [24].

2.2.2 Convolution layer

Convolution is a mathematical term frequently used in computer vision methods, defined as applying a function repeatedly over some array of values. From a computer vision standpoint, it means applying a filter over an image at all possible offsets. Given a small 2D image patch as input, the filter, which consists of a layer of connection weights, produces a single output unit. A convolution layer is parametrized by:

• number of maps M,

• size of the maps (Mx, My),

• kernel sizes (Kx, Ky),

• skipping factors Sx and Sy that define how many pixels the kernel skips in both x and y directions between subsequent convolutions.

Within a layer, all M maps have the same size (Mx, My). A kernel of size (Kx, Ky) is shifted over the image just like in the sliding window technique. The kernel must lie completely inside the image for the convolution to be performed. The output map size is then defined as

$$ M_x^n = \frac{M_x^{n-1} - K_x^n}{S_x^n + 1} + 1; \qquad M_y^n = \frac{M_y^{n-1} - K_y^n}{S_y^n + 1} + 1, \tag{2.15} $$

where $n$ is the index of the layer, $K_x^n$ and $K_y^n$ are the kernel sizes along the x and y directions in the $n$th layer, and $S_x^n$ and $S_y^n$ are the skipping factors in layer $n$ along the x and y axes. Each map in layer $L^n$ is connected to at most $M^{n-1}$ maps in the layer $L^{n-1}$. The neurons of a particular map share their weights but have different input fields. The function of the convolution operators is to extract various features of the input. The first convolution layers obtain the lowest-level features such as lines, edges and corners. In contrast, the deeper layers obtain higher-level features.
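The output size formula (2.15) and the sliding kernel can be illustrated with a minimal pure-Python sketch. The function names `output_map_size` and `convolve_valid` are hypothetical and chosen only for this illustration. The sketch assumes a single input map and a single kernel, computes the weighted sum without flipping the kernel (as is common in CNN implementations), and rounds down when the kernel positions do not tile the map evenly.

```python
def output_map_size(prev_size, kernel_size, skip):
    """Output map size along one axis, following Eq. (2.15)."""
    if kernel_size > prev_size:
        raise ValueError("the kernel must fit completely inside the map")
    # Assumption: round down if (prev_size - kernel_size) is not a
    # multiple of (skip + 1); Eq. (2.15) itself assumes it divides evenly.
    return (prev_size - kernel_size) // (skip + 1) + 1


def convolve_valid(image, kernel, skip_x=0, skip_y=0):
    """Slide `kernel` over `image` (both lists of rows), fully inside it."""
    k_rows, k_cols = len(kernel), len(kernel[0])
    out_rows = output_map_size(len(image), k_rows, skip_y)
    out_cols = output_map_size(len(image[0]), k_cols, skip_x)
    output = []
    for r in range(out_rows):
        top = r * (skip_y + 1)  # skip_y pixels are skipped between positions
        row = []
        for c in range(out_cols):
            left = c * (skip_x + 1)
            # Weighted sum of the patch under the kernel: one output unit.
            total = sum(
                kernel[i][j] * image[top + i][left + j]
                for i in range(k_rows)
                for j in range(k_cols)
            )
            row.append(total)
        output.append(row)
    return output


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
kernel = [[1, 0],
          [0, -1]]
print(output_map_size(5, 3, 0))       # 3
print(output_map_size(5, 3, 1))       # 2
print(convolve_valid(image, kernel))  # [[-4, -4], [-4, -4]]
```

Because the same `kernel` weights are reused at every position, the sketch also shows the weight sharing described above: each output unit sees a different input field but applies identical weights.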

2.2.3 Subsampling layer

The main role of the subsampling layer (or pooling layer) is to reduce the variance of the outputs from the convolution layers. Subsampling (or down-sampling) in general means reducing the overall size of a signal. Subsampling has a different meaning in different domains, and in the domain of 2D filter outputs it can be understood as a method of increasing the position invariance of the filters (weights). This layer computes the maximum (as the popular LeNets in Caffe [28] do) or the average value of a particular feature throughout a region of the image. A very good characteristic of these computations for object detection and classification is that they are invariant to small translations.

To be more precise, the matrix of filter outputs is divided into small overlapping grids, and the maximum or average value in each grid becomes the corresponding value in the reduced matrix. Intuitively, the larger the grid, the greater the signal reduction. Often, skipping factors are used to skip nearby pixels prior to convolution. Consequently, using subsampling layers in between convolution layers increases the feature abstractness, resulting in an increase of local spatial abstractness as well. Therefore, CNNs take into account local information as well.
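The grid-wise reduction described above can be sketched in a few lines of pure Python. The function `pool2d` and its parameters `grid`, `stride` and `mode` are hypothetical names used only for illustration. By default the sketch assumes non-overlapping grids (`stride` equal to `grid`); choosing a smaller `stride` yields overlapping grids.

```python
def pool2d(feature_map, grid=2, stride=None, mode="max"):
    """Reduce a 2D map (list of rows) by taking the max or mean per grid."""
    if mode not in ("max", "mean"):
        raise ValueError("mode must be 'max' or 'mean'")
    if stride is None:
        stride = grid  # assumption: non-overlapping grids by default
    rows, cols = len(feature_map), len(feature_map[0])
    pooled = []
    for top in range(0, rows - grid + 1, stride):
        pooled_row = []
        for left in range(0, cols - grid + 1, stride):
            # Collect all values that fall inside the current grid.
            window = [
                feature_map[top + i][left + j]
                for i in range(grid)
                for j in range(grid)
            ]
            if mode == "max":
                pooled_row.append(max(window))
            else:
                pooled_row.append(sum(window) / len(window))
        pooled.append(pooled_row)
    return pooled


feature_map = [[1, 2, 0, 0],
               [3, 4, 0, 1],
               [0, 0, 5, 6],
               [0, 0, 7, 8]]
print(pool2d(feature_map, mode="max"))   # [[4, 1], [0, 8]]
print(pool2d(feature_map, mode="mean"))  # [[2.5, 0.25], [0.0, 6.5]]

# A response shifted by one pixel inside the same grid pools to the same value.
peak_a = [[0, 9], [0, 0]]
peak_b = [[9, 0], [0, 0]]
print(pool2d(peak_a) == pool2d(peak_b))  # True
```

The last comparison is a small instance of the invariance to small translations: as long as the shifted response stays within one grid, the pooled output does not change.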

2.2. CONVOLUTIONAL NEURAL NETWORKS 15

2.2.4 ReLU layer

ReLU (Rectified Linear Units) is a layer of neurons that use a non-saturating activation function

$$ f(x) = \max(0, x). \tag{2.16} $$

This layer increases the nonlinear properties of the decision function and the whole network, surprisingly without negatively affecting the receptive fields of the convolution layer.

Besides the max activation function, there are many other functions used to increase the nonlinearity, such as the ones mentioned in [30]:

• saturating hyperbolic tangent $f(x) = \tanh(x)$,

• absolute saturating hyperbolic tangent $f(x) = |\tanh(x)|$,

• sigmoid function $f(x) = (1 + e^{-x})^{-1}$.

Compared to the aforementioned nonlinear functions, the ReLU activation function is easier to compute, which greatly reduces the neural network's training time. For example, in [30] two equivalent networks were trained, one with the tanh function and one with the ReLU function, and the ReLU network reached the results six times faster. The authors also suggest that fast learning has a great influence on the performance of large models trained on large datasets. ReLUs have the very desirable characteristic that they do not require input normalization to prevent them from saturating. Even if only a small portion of the training examples produce a positive input to a ReLU, learning will happen in that particular neuron. This only consolidates this layer's usefulness in CNNs.
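The activation functions listed above, together with the difference between saturating and non-saturating behaviour, can be illustrated with a small sketch that uses only Python's standard `math` module. The helper names (`relu_grad`, `tanh_grad` and so on) are hypothetical and serve only this illustration; the sketch is not the setup used in [30].

```python
import math


def relu(x):
    """Non-saturating activation from Eq. (2.16)."""
    return max(0.0, x)


def abs_tanh(x):
    """Absolute value of the hyperbolic tangent."""
    return abs(math.tanh(x))


def sigmoid(x):
    """Logistic sigmoid (1 + e^-x)^-1; fine for moderate |x|."""
    return 1.0 / (1.0 + math.exp(-x))


def relu_grad(x):
    """Derivative of ReLU: 1 for positive inputs, 0 otherwise."""
    return 1.0 if x > 0 else 0.0


def tanh_grad(x):
    """Derivative of tanh: 1 - tanh(x)^2, which vanishes for large |x|."""
    return 1.0 - math.tanh(x) ** 2


for x in (0.5, 3.0, 10.0):
    # For growing x the tanh gradient shrinks towards zero (saturation),
    # while the ReLU gradient stays at 1 for every positive input.
    print(x, relu_grad(x), tanh_grad(x))
```

A gradient close to zero means the weights feeding that neuron barely change during training, which is one way to read the claim that ReLUs keep learning whenever their input is positive.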

2.2.5 Fully connected layer

We have reached the high-level reasoning part of the neural network. After several convolution layers and max pooling layers, the neurons from the previous layer need to be connected to the fully connected layer's neurons, which is the main purpose of this layer. These layers can be visualized as one-dimensional since they are not spatially located anywhere. That means that there cannot be any further convolution layers after a fully connected layer. One good characteristic of these layers is that they work with multiple-dimensional neural networks. To understand the aforementioned layers better, Figure 2.6 shows how they are connected.

Figure 2.6: A visualization of the architecture of a CNN. Image taken from [8].
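The transition from spatial maps to a one-dimensional fully connected layer can be sketched as follows. The functions `flatten` and `fully_connected`, as well as the example weights and biases, are hypothetical and chosen only for illustration; no activation function is applied in this sketch.

```python
def flatten(maps):
    """Concatenate a list of 2D maps into one 1D list of activations."""
    return [value for fmap in maps for row in fmap for value in row]


def fully_connected(inputs, weights, biases):
    """Each output neuron is a weighted sum of all inputs plus a bias."""
    if len(weights) != len(biases):
        raise ValueError("one bias is needed per output neuron")
    if any(len(row) != len(inputs) for row in weights):
        raise ValueError("each neuron needs one weight per input")
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, biases)
    ]


# Two 2x2 maps from a previous (pooling) layer, values made up.
maps = [[[1, 2], [3, 4]],
        [[0, 1], [1, 0]]]
inputs = flatten(maps)
weights = [[1, 1, 1, 1, 1, 1, 1, 1],   # neuron 0 sums every input
           [1, 0, 0, 0, 0, 0, 0, 0]]   # neuron 1 looks at the first input only
biases = [0.5, -1.0]
print(inputs)                                    # [1, 2, 3, 4, 0, 1, 1, 0]
print(fully_connected(inputs, weights, biases))  # [12.5, 0.0]
```

After `flatten`, the positions of the values within their original maps are no longer represented, which matches the remark that these layers are not spatially located anywhere.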

2.2.6 Error propagation in a fully-connected network

Having described the types of layers a CNN is made of, it is time to look at the way the network learns and adjusts the neuron weights. To understand this, one needs to know the basic types of neurons in a standard fully-connected neural network with L layers:

• An input layer (consisting of units $u_i^0$) whose values are fixed by the input data.

• Hidden layers (consisting of units $u_i^\ell$) whose values are derived from their

