2.3 What do we need?
2.4 Getting started: basic elements [WqsCon]
Artificial neural networks (ANNs) are noted for the robustness of their learning and for their ability to generalize. However, designing an ANN requires choosing parameters such as the transfer functions, the topology of the network (i.e. feedforward or recurrent), and the number of layers and nodes, which is a delicate task. These choices define the computational capacity of the network, which in turn directly affects its ability to generalize. If the network is designed with less complexity than the problem requires, it may underfit the problem; if it is considerably more complex than the problem, it may lose its ability to generalize to unseen data. Since there is still no established method for designing neural networks, and given the role that architecture plays in applying a network successfully to a given task, approaches to optimizing neural network architectures are essential.
Matching the complexity of a neural network's architecture to the complexity of the problem is one way to achieve optimal generalization [17], and various methods have been suggested for doing so. According to Jankowski [55], these methods fall into three categories: (i) regularization, (ii) ontogenic (growing/shrinking) networks, and (iii) the choice of appropriate transfer functions (i.e. transfer function optimization). This section explains only the first two; the choice of transfer functions is discussed in more detail in later chapters.
In regularization methods, a penalty term is added to the objective function; this term typically takes the size of the neural network into account and penalizes it accordingly. Various forms of penalization have been used in the literature [55]. One approach is weight decay, which drives connection weights towards zero. As a result, weaker connections, which are assumed to be less important, are effectively pruned during training. However, as Engelbrecht [40] explains, this ignores the fact that stronger connection weights are penalized in the same way as weaker ones, which makes the approach counterproductive. A variant by Hanson, highlighted in [40], uses hyperbolic and exponential functions to determine the amount of penalization for each weight, so that weaker connection weights are penalized more than relatively stronger ones. Other approaches similar to regularization, also highlighted in [40], include penalizing based on the number of weights and a regularization constant, proposed by Weigend et al.; minimizing the network's energy, measured as the sum of the squared activations of the hidden units, as proposed by Chauvin; and regulating the sharing of hidden units between output nodes, studied by Yasui.
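To make the contrast between plain weight decay and a size-dependent penalty concrete, the sketch below compares the quadratic weight-decay penalty lam * w^2 with a saturating penalty lam * r / (1 + r), where r = (w / w0)^2. The saturating form is only an illustrative stand-in for the hyperbolic-type penalties mentioned above, not a claim about Hanson's exact formulation; lam, w0 and all function names are hypothetical. The quantity compared is the fraction of a weight removed by one gradient step on the penalty alone.

```python
from typing import Callable, Iterable

# Illustrative penalty terms. lam (regularization constant) and w0 (scale of a
# "strong" weight) are hypothetical values chosen for this demo only.

def quadratic_penalty(w: float, lam: float) -> float:
    """Plain weight decay: lam * w^2."""
    return lam * w ** 2

def quadratic_grad(w: float, lam: float) -> float:
    """Derivative of lam * w^2 with respect to w."""
    return 2.0 * lam * w

def saturating_penalty(w: float, lam: float, w0: float = 1.0) -> float:
    """lam * r / (1 + r) with r = (w / w0)^2; levels off at lam for large |w|."""
    r = (w / w0) ** 2
    return lam * r / (1.0 + r)

def saturating_grad(w: float, lam: float, w0: float = 1.0) -> float:
    """Derivative of the saturating penalty: lam * (2w / w0^2) / (1 + r)^2."""
    r = (w / w0) ** 2
    return lam * (2.0 * w / w0 ** 2) / (1.0 + r) ** 2

def penalized_objective(data_error: float, weights: Iterable[float], lam: float,
                        penalty: Callable[[float, float], float]) -> float:
    """Objective = data error + sum of per-weight penalties."""
    return data_error + sum(penalty(w, lam) for w in weights)

def relative_shrink(w: float, grad: Callable[[float], float], lr: float = 0.1) -> float:
    """Fraction of w removed by one gradient-descent step on the penalty alone (w != 0)."""
    return lr * grad(w) / w

if __name__ == "__main__":
    lam = 0.01
    for w in (0.1, 3.0):
        q = relative_shrink(w, lambda v: quadratic_grad(v, lam))
        s = relative_shrink(w, lambda v: saturating_grad(v, lam))
        print(f"w = {w}: weight decay removes {q:.4%}, saturating penalty removes {s:.4%}")
```

Weight decay removes the same fraction (0.2% per step here) from the weak and the strong weight, while the saturating penalty removes almost that much from the weak weight but only about a hundredth of it from the strong one, which is the behavior the variant above aims for.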
The second approach to complexity control uses grow/prune methods in an attempt to find the appropriate network size for a given problem. In constructive (growing) methods, operations that add nodes to the network are included in the training process, usually together with rules that govern this growth. Training starts with a small network architecture, which is grown as long as its complexity is still insufficient to learn the problem to a desired estimated generalization error. Engelbrecht [40] explains that the rule governing when to increase the network size and when to stop is crucial for obtaining optimal network architectures, because it directly controls the network's complexity and, consequently, its generalization ability. The counterparts of growing methods are pruning methods, which remove nodes and connections to shrink a network of predetermined size. The decision of which node or connection to prune is usually based on a rule that measures how useful the node or connection is to network performance; it can also involve statistical methods such as saliency tests. This approach to complexity regularization could be expected to converge faster. Also, although Engelbrecht argues that pruning is guaranteed to learn the underlying mapping function of the data, this holds only if the network's initial size is sufficiently complex. If the initial complexity of the network is lower than that of the problem, the training process may be unable to learn the problem's underlying function. Bishop's analogy of polynomial curve fitting illustrates how delicate it is to find the balance between overfitting and underfitting [17].
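The growth-and-stop rule described above can be sketched as a loop that adds one hidden unit at a time. This is a minimal sketch under stated assumptions: grow_network and the evaluate callback (assumed to train a network with n hidden units and return a validation estimate of its generalization error) are hypothetical names, and "stop after patience steps without sufficient improvement" is just one simple stopping criterion, not the specific rule discussed in [40].

```python
from typing import Callable, List, Tuple

def grow_network(
    evaluate: Callable[[int], float],
    start_size: int = 1,
    max_size: int = 50,
    target_error: float = 0.0,
    min_improvement: float = 1e-3,
    patience: int = 2,
) -> Tuple[int, List[float]]:
    """Grow a network one hidden unit at a time until a stopping rule fires.

    evaluate(n) is assumed to train a network with n hidden units and return
    its estimated generalization (validation) error. Growth stops when the
    target error is reached, when the size limit max_size has been evaluated,
    or when the error has not improved by at least min_improvement for
    `patience` consecutive sizes. Returns the best size seen and the error
    recorded at each size.
    """
    history: List[float] = []
    best_size, best_err = start_size, float("inf")
    stale = 0  # consecutive growth steps without sufficient improvement
    size = start_size
    while size <= max_size:
        err = evaluate(size)
        history.append(err)
        if err < best_err - min_improvement:
            best_size, best_err, stale = size, err, 0
        else:
            stale += 1
        if best_err <= target_error or stale >= patience:
            break  # stopping rule: good enough, or further growth is not paying off
        size += 1  # growth rule: add one more hidden unit
    return best_size, history

if __name__ == "__main__":
    # Hypothetical U-shaped validation curve standing in for real training runs:
    # too few units underfit, too many overfit, and n = 4 is best.
    def toy_val_error(n: int) -> float:
        return (n - 4) ** 2 / 10 + 0.05

    best, history = grow_network(toy_val_error)
    print(best, len(history))  # 4 6
```

Returning the best size seen, rather than the last size tried, keeps the extra units added while the stopping rule was still waiting for an improvement from inflating the final network.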
Various approaches to pruning artificial neural networks have also been studied. Besides intuitive pruning, they can be classified into pruning by evolutionary algorithms [12, 34, 98, 71] and pruning by statistical tests [32]. Intuitive pruning methods assume that nodes that are frequently activated and have larger connection weights are more important, and that nodes lacking these properties are less important. However, this assumption is flawed: weak weights between hidden and output nodes can still be important [40]. In evolutionary pruning, the evolutionary algorithm itself carries out the pruning, either through pruning operations added in the form of mutation operators [12, 34, 98], or by encoding binary values on the gene string that switch individual connections on or off [71]. However, neural networks are sensitive to the removal of a node or weight, which motivates the use of sensitivity analysis in other approaches [32].
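The difference between the intuitive magnitude rule and a sensitivity-based rule can be sketched on a single linear unit. Everything here is hypothetical and chosen only for illustration: the weights, the three training inputs, and the function names. The mask list plays the role of the binary on/off connection status that [71] encodes on the gene string, and sensitivity is measured by brute force as the increase in training error when one connection is switched off; this illustrates the idea of sensitivity analysis, not the specific method of [32]. Because the first input takes large values, its small weight carries most of the output, so the magnitude rule picks the wrong connection.

```python
from typing import List

Vector = List[float]

def predict(weights: Vector, mask: List[int], x: Vector) -> float:
    """Linear unit; mask[i] == 0 switches connection i off (binary on/off status)."""
    return sum(m * w * xi for m, w, xi in zip(mask, weights, x))

def mse(weights: Vector, mask: List[int], X: List[Vector], y: Vector) -> float:
    """Mean squared error of the masked unit over the training set."""
    return sum((predict(weights, mask, x) - t) ** 2 for x, t in zip(X, y)) / len(y)

def magnitude_candidate(weights: Vector, mask: List[int]) -> int:
    """Intuitive rule: prune the active connection with the smallest |w|."""
    active = [i for i, m in enumerate(mask) if m]
    return min(active, key=lambda i: abs(weights[i]))

def sensitivity_candidate(weights: Vector, mask: List[int],
                          X: List[Vector], y: Vector) -> int:
    """Sensitivity rule: prune the active connection whose removal raises the error least."""
    base = mse(weights, mask, X, y)

    def error_increase(i: int) -> float:
        trial = list(mask)
        trial[i] = 0  # switch connection i off
        return mse(weights, trial, X, y) - base

    active = [i for i, m in enumerate(mask) if m]
    return min(active, key=error_increase)

if __name__ == "__main__":
    weights = [0.05, 0.5, 0.4]  # the first weight is small but multiplies large inputs
    mask = [1, 1, 1]
    X = [[10.0, 1.0, 0.0], [20.0, 0.0, 0.1], [15.0, 1.0, 0.0]]
    y = [predict(weights, mask, x) for x in X]  # targets reproduced exactly by the full unit
    print(magnitude_candidate(weights, mask))          # 0
    print(sensitivity_candidate(weights, mask, X, y))  # 2
```

Removing connection 0, as the magnitude rule suggests, would raise the training error far more than removing connection 2, which the sensitivity rule correctly identifies as nearly unused.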
In summary, model complexity plays a significant role in determining the generalization ability of neural networks. This relationship is explained further in a later chapter in terms of the bias-variance decomposition. In the following sections, we highlight some hybrids of evolutionary algorithms and artificial neural networks that evolve the architectural components of the networks and, as a result, control their complexity.