In most neural network applications, the data presented as the training set varies. Some applications require grouping data into clusters that may or may not be clearly definable. In such cases the network must identify groupings that are as close to optimal as possible. Networks of this kind are called data conceptualization networks.
4.4.1 Adaptive Resonance Network
The last unsupervised learning network we discuss differs from the previous networks in that it is recurrent; as with networks in the next chapter, the data is not only fed forward but also back from output to input units.
Background
In 1976, Grossberg introduced a model for explaining biological phenomena. The model has three crucial properties:
1. a normalization of the total network activity. Biological systems are usually very adaptive to large changes in their environment. For example, the human eye can adapt itself to large variations in light intensities.
2. contrast enhancement of input patterns. The awareness of subtle differences in input patterns can mean a lot in terms of survival. Distinguishing a hiding panther from a resting one makes all the difference in the world. The mechanism used here is contrast enhancement.
3. short-term memory (STM) storage of the contrast-enhanced pattern. Before the input pattern can be decoded, it must be stored in the short-term memory. The long-term memory (LTM) implements an arousal mechanism (i.e., the classification), whereas the STM is used to cause gradual changes in the LTM.
© 2010 by Taylor and Francis Group, LLC
The system consists of two layers, F1 and F2, which are connected to each other via the LTM (Figure 4.13). The input pattern is received at F1, whereas classification takes place in F2. As mentioned before, the input is not directly classified. First a characterization takes place by means of extracting features, giving rise to activation in the feature representation field.
FIGURE 4.13: The ART Architecture
The expectations, residing in the LTM connections, translate the input pattern to a categorization in the category representation field. The classification is compared to the expectation of the network, which resides in the LTM weights from F2 to F1. If there is a match, the expectations are strengthened; otherwise the classification is rejected.
ART1: The Simplified Neural Network Model
The architecture of ART1 is a very simplified model and consists of two layers of binary neurons (with values 1 and 0), called the comparison layer, denoted F1, and the recognition layer, denoted F2 (Figure 4.14). Every individual neuron in the comparison layer is connected to all neurons in the recognition layer through the continuous-valued forward long-term memory (LTM) Wf, and vice versa via the binary-valued backward LTM Wb. Gain units and a reset unit are also present.
148 Computational Intelligence Paradigms
There are two gain units, denoted G1 and G2. Every neuron in the F1 layer receives three inputs: a component of the input pattern, a component of the feedback pattern, and the gain G1. The neuron fires if and only if at least two of these three inputs are high (the two-thirds rule). The neurons in the recognition layer each compute the inner product of their incoming (continuous-valued) weights and the pattern sent over these connections. The winning neuron then inhibits all the other neurons via lateral inhibition. Gain 2 is the logical “or” of all the elements in the input pattern x. Gain 1 equals gain 2, except when the feedback pattern from F2 contains any 1; then it is forced to zero. Finally, the reset signal is sent to the active neuron in F2 if the input vector x and the output of F1 differ by more than some vigilance level.
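The gain signals and the firing rule of the F1 neurons can be illustrated with a few lines of Python. This is only a sketch of the description above: the function names are hypothetical, and a neuron is taken to fire when at least two of its three inputs (input component, feedback component, G1) are 1.

```python
def gain2(s):
    """G2 is the logical OR of all components of the input pattern."""
    return int(any(s))

def gain1(s, feedback):
    """G1 equals G2 unless the F2 feedback contains a 1; then it is 0."""
    return 0 if any(feedback) else gain2(s)

def f1_output(s, feedback):
    """An F1 neuron fires when at least two of its three inputs are 1."""
    g1 = gain1(s, feedback)
    return [int(si + fi + g1 >= 2) for si, fi in zip(s, feedback)]

s = [1, 0, 1, 1]                       # illustrative input pattern
print(f1_output(s, [0, 0, 0, 0]))      # no feedback yet: F1 copies the input
print(f1_output(s, [1, 0, 0, 1]))      # with feedback: componentwise AND
```

With no feedback, G1 supplies the second active input, so F1 reproduces the pattern; once F2 feeds a pattern back, G1 is switched off and only components present in both patterns survive.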
Architecture
The ART1 network has computational units and supplemental units. Its architecture is shown in Figure 4.13.
Computational Units
The computational units comprise the F1 and F2 units and the reset unit. The F1(a) input unit is connected to the F1(b) interface unit. The input and the interface units are connected to the reset mechanism unit. The interface layer units are connected to the cluster units by bottom-up weights, and the cluster units are connected back to the interface units by top-down weights.
Supplemental Units
There are a few limitations of the computational units. All of them are expected to respond many times during the learning process. Moreover, the F2 units are inhibited under certain conditions and must become available again when required. To overcome these limitations, two gain control units, G1 and G2, act as supplemental units. These special units receive signals from, and send signals to, all the units in the computational structure. In Figure 4.14, the excitatory signals are indicated by “+” and inhibitory signals by “−”. Each unit in the interface or cluster layer has three sources from which it can receive a signal, and it must receive two excitatory signals in order to be “on”. This requirement is called the two-thirds rule. This rule plays a role in the choice of parameters and initial weights. The reset unit R also controls the vigilance matching.
FIGURE 4.14: The ART1 Neural Network
Operation
The network starts by clamping the input at F1. Because the output of F2 is zero, G1 and G2 are both on and the output of F1 matches its input. The pattern is sent to F2, and in F2 one neuron becomes active.
This signal is then sent back over the backward LTM, which reproduces a binary pattern at F1. Gain 1 is inhibited, and only the neurons in F1 which receive a ’one’ from both x and F2 remain active. If there is a substantial mismatch between the two patterns, the reset signal will inhibit the neuron in F2 and the process is repeated.
Training Algorithm
The parameters used in the training algorithm are:
n: number of components in the input vector
m: maximum number of clusters that can be formed
bij: bottom-up weights (from F1(b) to F2 units)
tji: top-down weights (from F2 to F1(b) units)
ρ: vigilance parameter
s: binary input vector
x: activation vector for the interface layer (F1(b) layer, binary)
||x||: norm of vector x (sum of the components xi)
The binary input vector is presented to the F1(a) input layer and is then received by F1(b), the interface layer. The F1(b) layer sends the activation signal to the F2 layer over weighted interconnection paths. Each F2 unit calculates its net input. The unit with the largest net input is the winner and has activation d = 1; all the other units have activation zero. The winning unit alone learns the current input pattern. The signal sent back from F2 to F1(b) travels through weighted interconnections, the top-down weights. The X units remain “on” only if they receive a nonzero signal from both the F1(a) and the F2 units.
The norm ||x|| gives the number of components in which the top-down weight vector of the winning unit, tJi, and the input vector s are both 1. The ratio of the norm of x to the norm of s, ||x||/||s||, is called the match ratio. If the match ratio is at least the vigilance parameter ρ, the weights of the winning cluster unit are adjusted; otherwise the winning unit is inhibited and another unit is selected. The whole process may be repeated until either a match is found or all neurons are inhibited.
At the end of each presentation of a pattern, all cluster units are returned to inactive states but are available for further participation.
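As a small illustration of the match ratio, the following Python sketch computes x and ||x||/||s|| for one candidate unit and compares the ratio with the vigilance parameter. The function name and the sample vectors are assumptions made for this example only; s is assumed to contain at least one 1.

```python
def match_ratio(s, t_J):
    """Return (x, ||x|| / ||s||) for binary input s and top-down weights t_J."""
    x = [si * ti for si, ti in zip(s, t_J)]   # x_i = s_i * t_Ji
    return x, sum(x) / sum(s)                 # norms are sums of components

s = [1, 0, 1, 1]      # illustrative binary input vector
t_J = [1, 0, 0, 1]    # illustrative top-down weights of the winning unit
rho = 0.9             # vigilance parameter

x, ratio = match_ratio(s, t_J)
print(x, round(ratio, 3), "learn" if ratio >= rho else "reset")
```

Here only two of the three active input components are matched, so the ratio falls below a vigilance of 0.9 and the unit would be reset.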
The pseudocode of the training algorithm of ART 1 network is as follows.
Initialize parameters: L > 1 and 0 < ρ ≤ 1
Initialize weights: $0 < b_{ij}(0) < \frac{L}{L-1+n}$, $t_{ji}(0) = 1$
While not stopping condition do
    For each training input
        Assign activations of all F2 units to zero
        Assign activations of F1(a) units to input vector s
        Compute the norm of s: $\|s\| = \sum_i s_i$
        Send input signal from F1(a) to F1(b) layer: $x_i = s_i$
        For each F2 node j that is not inhibited
            If $y_j \neq -1$, compute the net input $y_j = \sum_i b_{ij} x_i$
        End For
        While reset do
            Find J such that $y_J \geq y_j$ for all nodes j
            Recompute activation x of F1(b): $x_i = s_i t_{Ji}$
            Compute the norm of x: $\|x\| = \sum_i x_i$
            If $\|x\| / \|s\| < \rho$
                $y_J = -1$ (inhibit node J)
            End If
        End While
        Update the weights for node J: $b_{iJ}(\text{new}) = \frac{L x_i}{L - 1 + \|x\|}$, $t_{Ji}(\text{new}) = x_i$
    End For
    Test for stopping condition
End While
The stopping condition may be no weight changes, no units reset or maximum number of epochs searched.
In winner selection, if there is a tie, take J to be the smallest such index. Also, tji is either 0 or 1, and once it is set to 0 during learning, it can never be set back to 1 because of the stable learning method.
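The presentation of a single input vector can be sketched in Python as follows. This is an illustrative reading of the pseudocode, not a reference implementation: the function name `art1_present`, the list-of-lists weight layout, and the sample vectors and vigilance value in the usage lines are assumptions made for the example.

```python
def art1_present(s, b, t, rho=0.9, L=2.0):
    """Present one binary vector s and update b and t in place.

    b[i][j] are bottom-up weights (n x m); t[j][i] are top-down weights (m x n).
    Returns the index of the cluster unit that learned s, or None if every
    cluster unit was inhibited.
    """
    n, m = len(s), len(t)
    norm_s = sum(s)
    if norm_s == 0:
        raise ValueError("s must contain at least one 1")
    # Net input of each F2 unit: y_j = sum_i b_ij * x_i, with x = s.
    y = [sum(b[i][j] * s[i] for i in range(n)) for j in range(m)]
    while any(v != -1 for v in y):
        J = y.index(max(y))            # index() breaks ties toward the smallest J
        x = [s[i] * t[J][i] for i in range(n)]
        norm_x = sum(x)
        if norm_x / norm_s >= rho:     # match: unit J learns the pattern
            for i in range(n):
                b[i][J] = L * x[i] / (L - 1 + norm_x)
                t[J][i] = x[i]
            return J
        y[J] = -1                      # reset: inhibit unit J and search again
    return None

# Usage on a fresh network with n = 4 inputs and m = 3 cluster units.
n, m = 4, 3
b = [[1 / (1 + n)] * m for _ in range(n)]
t = [[1] * n for _ in range(m)]
for s in ([1, 1, 0, 0], [0, 0, 0, 1], [1, 0, 0, 0], [0, 0, 1, 1]):
    print(s, "->", art1_present(s, b, t, rho=0.4))
```

Inhibited units are marked with -1, as in the pseudocode; this works because the net inputs of active units are never negative when the bottom-up weights are nonnegative.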
The parameters used have the typical values as shown below.
Parameter                   Range                      Typical value
L                           L > 1                      2
ρ                           0 < ρ ≤ 1                  0.9
bij (bottom-up weights)     0 < bij(0) < L/(L-1+n)     1/(1+n)
tji (top-down weights)      tji(0) = 1                 1
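The typical initial values can be generated and checked against the admissible range with a short Python sketch. The helper name `init_art1_weights` and the list-of-lists layout are assumptions of this example.

```python
def init_art1_weights(n, m, L=2.0):
    """Return (b, t) initialised with the typical values from the table."""
    if L <= 1:
        raise ValueError("L must be greater than 1")
    b0 = 1.0 / (1.0 + n)
    # The typical value must lie inside the range 0 < b_ij(0) < L / (L - 1 + n).
    assert 0 < b0 < L / (L - 1 + n)
    b = [[b0] * m for _ in range(n)]   # n x m bottom-up weights
    t = [[1] * n for _ in range(m)]    # m x n top-down weights
    return b, t

b, t = init_art1_weights(n=4, m=3)
print(b[0], t[0])
```

For n = 4 and L = 2 the typical value 1/(1+n) = 0.2 lies below the upper bound L/(L-1+n) = 0.4.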
4.4.2 Implementation of ART Algorithm in MATLAB
The top-down weights for an ART network after a few iterations are given as tji=[1 1 0 0;1 0 0 1;1 1 1 1] and the bottom-up weights are bij=[.57 0 .3;0 0 .3;0 .57 .3;0 .47 .3]. The following MATLAB code illustrates the steps of the ART algorithm to find the new weights after the vector [1 0 1 1] is presented.
clc;
clear all;
% Step 1: Initialization
% The bottom-up weights (n x m)
b=[.57 0 .3;0 0 .3;0 .57 .3;0 .47 .3];
% The top-down weights (m x n)
t=[1 1 0 0;1 0 0 1;1 1 1 1];
% Vigilance parameter (0 < rho <= 1) set to 0.9
p=0.9;
% Initialize L (L > 1) to 2
L=2;
% Step 2: Start training
% Step 3: Present the new vector
x=[1 0 1 1];
% Step 4: Set the activations to the input vector
s=x;
% Step 5: Compute the norm of s, ||s|| = sum_i s_i
norm_s=sum(s);
% Step 6: Send input signal from F1(a) to F1(b) layer, x_i = s_i
% Step 7: Calculate the net input y_j = sum_i b_ij x_i
y=x*b;
stop=1;
% Step 8: While reset do
while stop
    % Step 9: Find J such that y_J >= y_j for all nodes j
    % (max returns the smallest index when there is a tie)
    [~,J]=max(y);
    % Step 10: Recompute activation x of F1(b), x_i = s_i t_Ji
    x=s.*t(J,:);
    % Step 11: Compute the norm of vector x, ||x|| = sum_i x_i
    norm_x=sum(x);
    % Step 12: Test for reset
    if norm_x/norm_s >= p
        % Step 13: Update the weights of the winning node J
        b(:,J)=L*x(:)/(L-1+norm_x);
        t(J,:)=x(1,:);
        stop=0;
    else
        % Inhibit node J and continue the search
        y(J)=-1;
        stop=1;
    end
    % Stop if every node has been inhibited
    if y+1 == 0
        stop=0;
    end
end
disp('Top down weights')
disp(t);
disp('Bottom up weights')
disp(b);
Output
The updated weights are:
Top down weights
1 1 0 0
1 0 0 1
1 0 1 1
Bottom up weights
0.5700 0 0.5000
0 0 0
0 0.5700 0.5000
0 0.4700 0.5000
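The printed weights can be checked by hand. The short trace below follows the steps of the algorithm for the presented vector s = [1 0 1 1], with ρ = 0.9 and L = 2; the symbol $\odot$ (componentwise product) is notation introduced here for convenience, and $t_J$ denotes row J of the top-down weight matrix.

```latex
\begin{align*}
y &= s\,b = (0.57,\; 1.04,\; 0.90) && \Rightarrow\ J = 2 \\
x &= s \odot t_{2} = (1,0,0,1), \quad \frac{\|x\|}{\|s\|} = \frac{2}{3} < 0.9 && \Rightarrow\ \text{reset: } y_2 = -1 \\
y &= (0.57,\; -1,\; 0.90) && \Rightarrow\ J = 3 \\
x &= s \odot t_{3} = (1,0,1,1), \quad \frac{\|x\|}{\|s\|} = \frac{3}{3} \geq 0.9 && \Rightarrow\ \text{node 3 learns} \\
b_{i3} &= \frac{L\,x_i}{L - 1 + \|x\|} = \frac{2\,x_i}{4} = 0.5\,x_i, \qquad t_{3i} = x_i
\end{align*}
```

Only the third column of the bottom-up weights and the third row of the top-down weights change, which agrees with the output shown above.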
4.4.3 Self-Organizing Map
The Kohonen network was developed by Teuvo Kohonen in the early 1980’s, based on clustering data. In this network if two input vectors are close, they will be mapped to processing elements that are close together
in the two-dimensional Kohonen layer that represents the features or clusters of the input data. Here, the processing elements constitute a two-dimensional map of the input data.
The basic use of the self-organizing map is to visualize topologies and hierarchical structures of multidimensional input spaces. The self-organizing network has been used to create area-filled curves in the two-dimensional space created by the Kohonen layer. The Kohonen layer can also be used for optimization problems by allowing the connection weights to settle into a minimum-energy pattern.
The major advantage of this network is that it learns without supervision. When the topology is combined with other neural layers for prediction or categorization, the network first learns in an unsupervised manner and then switches to a supervised mode for the trained network to which it is attached.
The basic architectural model of a self-organizing map network is shown in Figure 4.15. The self-organizing map typically has two layers: the input layer and the Kohonen layer. The input layer is fully connected to a two-dimensional Kohonen layer. The output layer shown here is used in a categorization problem and represents three classes to which the input vector can belong. This output layer typically learns using the delta rule and is similar in operation to the counter-propagation paradigm.
FIGURE 4.15: An Example Self-Organizing Map Network
The processing elements in the Kohonen layer measure the Euclidean distance of their weights from the presented input pattern. During recall, the Kohonen element with the minimum distance is the winner and outputs a one to the output layer. Since this is a competitive network, once the winning unit is chosen all the other processing elements are forced to zero. Hence the winning element is the nearest element to the input value, and it represents the input value in the two-dimensional map.
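The recall step described above can be sketched in Python. The function name and the sample weight vectors are hypothetical; the squared distance is used because it has the same minimizer as the Euclidean distance.

```python
def best_matching_unit(weights, x):
    """Return (J, D), where D[j] is the squared Euclidean distance of unit j to x."""
    D = [sum((w_i - x_i) ** 2 for w_i, x_i in zip(w, x)) for w in weights]
    J = D.index(min(D))                # winner: unit with the minimum distance
    return J, D

weights = [[0.2, 0.8], [0.6, 0.4], [0.9, 0.1]]   # hypothetical weight vectors
x = [0.5, 0.5]                                   # hypothetical input pattern
J, D = best_matching_unit(weights, x)
# Competitive output: the winner emits 1, all other elements are forced to 0.
outputs = [1 if j == J else 0 for j in range(len(weights))]
print(J, outputs)
```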
During the training process, the Kohonen processing element with the smallest distance adjusts its weight to be closer to the values of the input data. The neighbors of the winning element also adjust their weights to be closer to the same input data vector.
The processing elements naturally represent approximately equal information about the input data set. Where the input space has sparse data, the representation is compacted in the Kohonen space, or map. Where the input space has high density, the representative Kohonen elements spread out to allow finer discrimination. In this way the Kohonen layer is thought to mimic the knowledge representation of biological systems.
Self-organizing maps allow a network to develop a feature map. Self-organized learning can be characterized as displaying “global order emerging from local interactions”. One example of self-organized learning in a neural network is the SOM algorithm. There are three important principles from which the SOM algorithm is derived:
1. Self-amplification
2. Competition
3. Co-operation
These principles are defined as follows:
1. Self-amplification: units, which are on together, tend to become more strongly connected. Thus, positive connections tend to be self-amplifying. This is the Hebbian learning principle.
2. Competition: Units enter into a competition according to which one responds “best” to the input. The definition of “best” is typically according to either (i) the Euclidean distance between the unit’s weight vector and the input, or (ii) the size of the dot product between the unit’s weight vector and the input. Provided the vectors are normalized, a minimum Euclidean distance is equivalent to a maximum dot product, so it does not matter which is chosen. The best-matching unit is deemed to be the winner of the competition.
3. Co-operation: In the SOFM, each unit in the “competing layer” is fully connected to the input layer. Further, each competing unit is given a location on the map. Most often, a two-dimensional map is used, so the units are assigned locations on a 2-D lattice (maps of one dimension or more than two dimensions are also possible). Whenever a given unit wins the competition, its neighbors are also given a chance to learn. The rule for deciding which units are neighbors may be the “nearest neighbor” rule, i.e., only the four nearest units in the lattice are considered to be in the neighborhood, or it could be “two nearest neighbors”, or the neighborhood could be defined as a decreasing function of the distance between each unit and the winner. Whatever the basis for determining neighborhood membership, the winner and all its neighbors do some Hebbian learning, while units not in the neighborhood do not learn for a given pattern.
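The equivalence of the two competition criteria for normalized vectors, mentioned under the competition principle, follows from expanding the squared distance. In the short derivation below, $w_j$ denotes the weight vector of unit j and x the input; both are assumed to have unit length.

```latex
\begin{align*}
\|w_j - x\|^2 &= \|w_j\|^2 - 2\, w_j \cdot x + \|x\|^2 \\
              &= 2 - 2\, w_j \cdot x \qquad \text{when } \|w_j\| = \|x\| = 1 .
\end{align*}
```

Since the distance decreases exactly when the dot product increases, the unit with the minimum Euclidean distance is also the unit with the maximum dot product.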
Architecture
The architecture of the Kohonen SOM is shown in Figure 4.16. All the units in the neighborhood that receive positive feedback from the winning unit participate in the learning process. Even if a neighboring unit’s weight is orthogonal to the input vector, its weight vector will still change in response to the input vector. This simple addition to the competitive process is sufficient to account for the order mapping.
FIGURE 4.16: Architecture of Kohonen SOM
Training Algorithm
The weights and the learning rate are set initially. The input vectors to be clustered are presented to the network. When the input vectors are presented, the winner unit is calculated either by Euclidean distance method or sum of products method based on the initial weights. Based on the winner unit selection, the weights are updated for that particular winner unit using competitive learning rule as discussed earlier. An epoch is said to be completed once all the input vectors are presented to the network. By updating the learning rate, several epochs of training may be performed.
The pseudocode of the training algorithm of the SOM network is shown below:
Initialize topological neighborhood parameters
Initialize learning rate
Initialize weights
While not stopping condition do
    For each input vector x
        For each j, compute the squared Euclidean distance
            $D(j) = \sum_{i} (w_{ij} - x_i)^2$, i = 1 to n and j = 1 to m
        Find index J such that D(J) is minimum
        For all units j within the specified neighborhood of J, and for all i, update the weights as
            $w_{ij}(\text{new}) = w_{ij}(\text{old}) + \alpha\,[x_i - w_{ij}(\text{old})]$
    End For
    Update the learning rate
    Reduce the radius of topological neighborhood at specified times
    Test the stopping condition
End While
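The pseudocode can be turned into a compact Python sketch. Several choices here are assumptions made for illustration and are not prescribed by the text: the units lie on a 1-D lattice, the stopping condition is a fixed number of epochs, the learning rate is halved after every epoch, and the neighborhood radius is reduced once, halfway through training.

```python
import random

def train_som(data, m, epochs=20, alpha=0.5, radius=1, seed=0):
    """Train a SOM whose m units lie on a 1-D lattice (an assumption of this sketch)."""
    rng = random.Random(seed)
    n = len(data[0])
    # Random initial weights, one weight vector per unit.
    w = [[rng.random() for _ in range(n)] for _ in range(m)]
    for epoch in range(epochs):
        for x in data:
            # Squared Euclidean distance D(j) of every unit to the input x.
            D = [sum((w[j][i] - x[i]) ** 2 for i in range(n)) for j in range(m)]
            J = D.index(min(D))
            # Update the winner and its lattice neighbours within the radius.
            for j in range(max(0, J - radius), min(m, J + radius + 1)):
                for i in range(n):
                    w[j][i] += alpha * (x[i] - w[j][i])
        alpha *= 0.5                     # update the learning rate
        if epoch == epochs // 2:         # hypothetical radius schedule
            radius = max(0, radius - 1)
    return w

data = [[0.1, 0.1], [0.2, 0.1], [0.9, 0.8], [0.8, 0.9]]   # illustrative inputs
w = train_som(data, m=4)
for unit in w:
    print([round(v, 2) for v in unit])
```

A 2-D lattice would only change how the neighborhood of J is enumerated; the distance computation and the update rule stay the same.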
The formation of the map occurs in two phases:
• Initial formation of perfect (correct) order
• Final convergence
The second phase takes longer than the first phase and requires a small learning rate. The learning rate is a slowly decreasing function of time, and the radius of the neighborhood around a cluster unit also decreases as the clustering process goes on. The initial weights are set to random values. The learning rate is updated by α(t+1) = 0.5 α(t).
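Applying the update rule repeatedly gives the closed form α(t) = α(0) · 0.5^t, which the following Python sketch evaluates. The function name and the starting value 0.6 are assumptions chosen for illustration.

```python
def learning_rate(alpha0, t):
    """Learning rate after applying alpha(t+1) = 0.5 * alpha(t) t times."""
    return alpha0 * 0.5 ** t

# Learning rate over the first four updates for a hypothetical alpha(0) = 0.6.
print([learning_rate(0.6, t) for t in range(4)])
```

The rate halves at every update, so it becomes small after only a few updates, which suits the final convergence phase.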