Formas de sociedad, acción comunicativa y Derecho

The attributes of power and ease of use have lead to the widespread usage of neural networks in such diverse areas as finance, medicine, pharmaceuticals, engineering, geology, chemistry, and physics. While linear techniques often fail to model complex functions adequately, the nonlinear nature of NNs provides a powerful and sophisticated technique capable of modeling complex functions. Additionally, NNs may be utilized readily with a lower depth of knowledge than would be required with more traditional nonlinear statistical techniques. This section will provide an overview of neural network construction, example architecture, and general architectural examples (more detailed presentations are available in a number of textbooks (e.g. [227-232]).

Neural Network Construction

A neural network attempts to replicate a biological network of neurons. As applied to an artificial neuron, the nature of the biological neuron is described as follows:

1. The neuron is the recipient of a number of inputs.

2. Inputs are conducted to the neuron via a connection, which has an associated weight (or strength). In a biological neuron these weights would correspond to a synaptic efficiency.

3. The neuron has a single threshold value, which is subtracted from the weighted summed inputs to provide an activation value of the neuron.

4. This activation value is processed with an activation function to produce the neuronal output.

A collection of interconnected artificial neurons composes a NN, where the neurons are arranged in a minimum of two layers, an input and output layer. Figure 10-1 presents a schematic diagram of a feed-forward two layer NN, which consists of a nonlinear hidden layer and a linear output layer.

Inputs Hidden Layer of S Neurons

a=f(Wp+b)

Figure 10-1. Schematic of a Neural Network

In this schematic, the NN is provided an input set (P1, P2…PR) and produces an output based on the relationships between the weights (wi,j) and biases (bi,j).

Since the notation used in Figure 10-1 quickly becomes cumbersome with a large number of neurons, an abbreviated notation has been developed [233] to simplify the representation of the neuronal architecture, as shown in Figure 10-2.

f

Inputs

w

b 1

+

Layer of S Neurons

a=f(Wp+b)

S x 1 a

S x 1 n S x R

S x 1 R x 1

Figure 10-2. Abbreviated Notation for a Neural Network

For the example in Figure 10-2, the NN is composed of an input vector p with a number of R values, and S hidden layer neurons. The activation function is represented by f, which can be any continuous function, including typical examples such as linear, hard limit, soft limit, and sigmoidal. When representing nonlinear functions, most NNs will employ a tan-sigmoidal hidden layer function and a linear output layer function, which are shown graphically in Figure 10-3.

Figure 10-3. Example of a Linear (left) and a Tan-Sigmoidal Activation Function (right)

-1 0 1

-3 -2 -1 0 1 2 3

-1 0 1

Mathematically, the output (ai) of each neuron in Figure 10-2 may be expressed as

a_i = f

(

wp+b

)

_(10-1)

where w is a matrix of the weights, b is the bias, and f is the activation function.

During training of the NN, the values for the weights and biases are determined as a part of a nonlinear optimization problem, where an objective function is minimized while values of the weights and biases are varied. A sum of squared errors is typically utilized as an objective function and is expressed as:

∑ ( )

−

= ⁿ

i i

D t a

2 (10-2)

where ai is the network response (output) and ti is the target value. Many different methods have been employed for the optimization of the objective function, including gradient descent, line searches, conjugate gradient, quasi-Newton, and Levenberg-Marquardt.

Example Neural Networks

While a large number of structurally different NNs exist, back propagation (BP) is one of the most popular choices. A variant of BP is a radial basis function (RBF) network. These two networks will be described briefly.

Back Propagation

The back propagation network is a feed forward multi-layered network employing supervised learning for adjustment of the weights. Once calculation results are passed from each layer until an output is generated, the error between the output and target values is calculated. This error is returned to the network, which then uses the

information to adjust the weights. After sufficient iteration of this process, a predefined tolerance is eventually met.

While the BP is capable of very good accuracy, there are disadvantages to this technique.

1. The exact number of hidden layers and neurons in the hidden layers is unknown, and several repetitions using different BP architectures are used to determine an optimum network structure.

2. Since optimization of the objective function involves the calculation of a gradient vector to move along the error surface, there are difficulties in knowing what size step to employ for each move of the gradient vector. Small steps will drastically increase the time until a solution is reached, and large steps may over step the solution.

3. Most importantly, BP networks are prone to over-learning, where the error associated with the input set is minimized, and predictions are poor using a new data set with the trained network.

Generalization is the ability of a network to accurately predict the values of a new data set, and an analogy can be drawn between polynomial curve fitting and generalization. When using a polynomial equation to fit data, a low order polynomial may not be flexible enough to fit the data accurately, but a higher order polynomial may become highly convoluted in order to fit the data at the expense of representing the underlying function. Similar to this, NNs with more weights are able to model complex functions with low error, but they are prone to over-learning. If fewer weights are used, the resulting model may have improved generalization capabilities, but relation to the

underlying function present in the input data is decreased. While using different BP network structures can result in better generalizations, regularization strategies, such as Bayesian methods, are available to alleviate over-learning.

Radial Basis Function

An alternative to the BP network is the use of a RBF network, which shares the same general architecture as a BP with a similar flow of information. The two networks differ in the choice of a hidden layer activation function, which is a fixed Gaussian function in a RBF and a general nonlinear function in a BP. While the speed of the network is improved from that of BP, the RBF can be complicated to train, and the network will learn incorrect patterns as quickly as correct patterns.

Neural Network Architecture

Appearing frequently in the literature, many types of networks, which include both newly developed networks and variants of known networks, have been applied to a variety of research interests. Taxonomic categories, such as learning algorithm, network topology, and data type, may be applied to organize these networks.

Learning Algorithm

Supervised and unsupervised learning are the two main types of learning algorithms utilized in NNs. Frequently, the type of learning algorithm applied may be difficult to classify.

With supervised learning, the NN is given the expected results or target values.

During training, the NN weights are adjusted in an attempt to match the output values with the target values. After training, the NN is validated by providing a new set of input

values and observing the difference between the resulting output values and the correct target values. Supervised learning algorithms are further classified as auto-associative, where the input values and target values are equal, and hetero-associative, where the target values differ from the input values.

In unsupervised learning, the NN is not provided with any target values during training; however, input and target values are often equal; thus, these networks perform as auto-associative networks [234, 235]. Normally, these types of algorithms are employed for data compression, and two widely used algorithms are vector quantization (or "Kohonen network”) [236, 237], and Hebbian learning [231]. Another example of unsupervised learning is Kohonen's self-organizing (feature) map [238], which combines competitive learning with dimensionality reduction by separation of clusters on an a priori grid.

Network Topology

Feedforward and feedback comprise the two types of network topology. The connections between neural units in a feedforward NN do not form cycles. This enables the NN to quickly respond to an input. Feedforward networks are trained with a wide variety of conventional numerical methods such as conjugate descent gradients, and Levenberg-Marquardt, but these methods do not guarantee a global optimum solution.

To avoid local minima, conventional methods are employed with a variety of random starting points, or more complicated methods, simulated annealing and genetic algorithms, may be utilized to find a global optimum directly.

In a feedback NN, cycling occurs in the connections between neural units, which may produce slow responses to inputs. The number of cycles may be large, and these networks tend to be more difficult to train than feedforward NNs.

Data Type

Neural networks differ in the type of input data used, and the two types are categorical variables and quantitative variables. Categorical variables take on a number of possible values or symbolic values (classifications such as male, or female) with several members of each category present in the network. When supervised learning with categorical target values, or unsupervised learning with categorical outputs is used, these types of NNs are known as classification networks.

Quantitative variables involve the measurement of some attribute of an object, such as boiling points of compounds. These measurements should reflect analogous relationships among the input values of the objects. When supervised learning with quantitative target values is employed, these types of NNs are known as regression networks. Some variables can be treated as either categorical or quantitative, such as number of children or any binary variable. Organized tables of taxonomically classified example NNs, which were adapted from Sarle [234], are provided in Tables 10-1 - 10-3.

Table 10-1. Supervised Neural Networks Supervised Neural Networks Feedforward

Linear

Hebbian - [162, 228]

Perceptron - [164, 166, 228, 239]

Adaline - [165, 228]

Higher Order - [227]

Functional Link - [240]

MLP: Multilayer perceptron - [227, 228, 241]

Backprop - [242]

Cascade Correlation - [228, 243]

Quickprop - [244]

RPROP - [245]

RBF networks - [227, 246, 247]

OLS: Orthogonal Least Squares - [248]

CMAC: Cerebellar Model Articulation Controller - [249, 250]

Classification only

LVQ: Learning Vector Quantization - [228, 251]

PNN: Probabilistic Neural Network - [228, 252-254]

Regression only

GNN: General Regression Neural Network - [255-257]

Feedback - [231, 258]

BAM: Bidirectional Associative Memory - [228, 237]

Boltzman Machine - [185, 228]

Recurrent time series

Backpropagation through time - [259]

Elman - [260]

FIR: Finite Impulse Response - [261]

Jordan - [262]

Real-time recurrent network - [263]

Recurrent backpropagation - [228, 264]

TDNN: Time Delay NN - [265]

Competitive

ARTMAP - [266]

Fuzzy ARTMAP - [267, 268]

Gaussian ARTMAP - [269]

Counterpropagation - [228, 236, 270, 271]

Neocognition - [228, 272]

Table 10-2. Unsupervised Neural Networks Unsupervised Neural Networks- [231]

Competitive

Vector Quantization

Grossberg - [273]

Kohonen - [274]

Conscience - [275]

Self-Organizing Map

Kohonen - [228, 238]

GTM: - [276]

Local Linear - [277]

Adaptive Resonance Theory ART 1 - [228, 278, 279]

ART 2 - [228, 280]

ART 2-A - [266]

ART 3 - [281]

Fuzzy ART - [282]

DCL: Differential Competitive Learning - [237]

Dimension Reduction - [283]

Hebbian - [162, 228]

Oja - [284]

Sanger - [285]

Differential Hebbian - [237]

Autoassociation

Linear autoassociator - [179, 228]

BSB: Brain State in a Box - [179, 228]

Hopfield - [171, 228]

Table 10-3. Nonlearning Neural Networks Nonlearning Neural Networks Hopfield - [231]

Various networks for optimization - [286]

Chapter 11. Quantitative Structure-Property Relationship Models

In document LA DE - (página 190-200)