The second class of applications is classification. A network that can classify could be used in the medical industry to process both lab results and doctor-recorded patient symptoms to determine the most likely disease. Some of the network architectures used for data classification are Learning Vector Quantization (LVQ), Counter-Propagation Network (CPN), and Probabilistic Neural Network (PRNN). This section discusses the architecture, algorithm, and implementation of these networks.
© 2010 by Taylor and Francis Group, LLC
Computational Intelligence Paradigms
4.2.1 Learning Vector Quantization
The vector quantization technique was originally introduced by Teuvo Kohonen in the mid-1980s. Both the vector quantization network and self-organizing maps are based on the Kohonen layer, which is capable of sorting items into appropriate categories of similar objects. Such networks are applied to classification and segmentation problems.
Topologically, the network contains an input layer, a single Kohonen layer, and an output layer. An example network is shown in Figure 4.1.
The output layer has as many processing elements as there are distinct categories, or classes. The Kohonen layer consists of a number of processing elements grouped by class. The number of processing elements in each class depends upon the complexity of the input-output relationship. Usually, every class has the same number of elements throughout the layer. It is the Kohonen layer that learns and performs relational classifications with the aid of a training set. However, the rules used to classify differ significantly from the back-propagation rules. To optimize the learning and recall functions, the input layer should contain only one processing element for each separable input parameter. Higher-order input structures could also be used.
Learning Vector Quantization sorts its input data into groups that it determines. Fundamentally, it maps an n-dimensional space into an m-dimensional space; that is, the network takes n inputs and produces m outputs. During training the inputs are classified without disturbing the inherent topology of the training set. Generally, topology-preserving maps preserve nearest-neighbor relationships in the training set, so that input patterns which have not been previously learned are categorized by their nearest neighbors in the training data.
During training, the distance from the training vector to each processing element is computed, and the processing element with the shortest distance is declared the winner. There is always exactly one winner for the entire layer. This winner fires only one output processing element, which determines the class or category the input vector belongs to. If the winning element is in the expected class of the training vector, it is reinforced toward the training vector. If the winning element is not in the class of the training vector, the connection weights entering the processing element are moved away from the training vector. This latter operation is referred to as repulsion. During this training process, the individual processing elements allotted to a particular class migrate to the region associated with their specific class.
FIGURE 4.1: An Example Learning Vector Quantization Network
During the recall mode, the distance of an input vector to each processing element is computed and again the nearest element is declared the winner. That in turn generates one output, signifying a particular class found by the network.
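The recall step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `squared_distance` and `recall`, and the example layer with two processing elements per class, are hypothetical and do not come from the text.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def recall(x, elements, classes):
    """Return the class of the processing element nearest to x."""
    winner = min(range(len(elements)),
                 key=lambda j: squared_distance(x, elements[j]))
    return classes[winner]


# Hypothetical Kohonen layer: two processing elements per class.
elements = [[-2.0, 0.0], [2.0, 0.0], [0.0, 1.5], [0.0, -1.5]]
classes = [1, 1, 2, 2]
print(recall([0.2, 1.0], elements, classes))
```

Because only the ordering of the distances matters, the square root is skipped when the winner is selected.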
There are some limitations to the Learning Vector Quantization architecture. For complex classification problems with similar objects or input vectors, the network requires a large Kohonen layer with many processing elements per class. This can be overcome with better choices of, or higher-order representations for, the input parameters.
The learning mechanisms have some disadvantages which are addressed by variants of the paradigm. Usually these variants are applied at different stages of the learning process. They introduce a conscience mechanism, a boundary adaptation algorithm, and an attraction function at different points while training the network.
In the basic form of the Learning Vector Quantization network, a few processing elements tend to win too often while others do nothing. Processing elements that are close to the training vectors tend to win, and those that are far away never become involved. To overcome this defect, a conscience mechanism is added so that a processing element which wins too often develops a guilty conscience and is penalized. The actual conscience mechanism is a distance bias which is added to each processing element. This distance bias is proportional to the difference between the win frequency of an element and the average processing element win frequency. As the network progresses along its learning curve, this bias proportionality factor needs to be decreased.
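One possible reading of the conscience mechanism is sketched below in Python. The text only states that the bias is proportional to the difference between an element's win frequency and the average win frequency, so the exact form used here, as well as the names `biased_winner`, `win_counts`, and `bias_factor`, are assumptions made for illustration.

```python
def biased_winner(distances, win_counts, bias_factor):
    """Pick the winner after adding a conscience bias to each distance.

    distances:   distance from the training vector to each element
    win_counts:  how often each element has won so far (updated in place)
    bias_factor: proportionality factor; assumed to be reduced over time
    """
    n = len(distances)
    total = sum(win_counts)
    # Win frequency of each element; all zero before any win is recorded.
    freqs = [c / total if total else 0.0 for c in win_counts]
    mean_freq = sum(freqs) / n
    # Frequent winners are pushed away, rare winners are pulled closer.
    biased = [d + bias_factor * (f - mean_freq)
              for d, f in zip(distances, freqs)]
    winner = min(range(n), key=lambda j: biased[j])
    win_counts[winner] += 1
    return winner


counts = [9, 1]
print(biased_winner([1.0, 1.2], counts, bias_factor=1.0), counts)
```

With `bias_factor` set to zero the function reduces to the plain nearest-element rule, which matches the requirement that the bias fade out as training progresses.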
A boundary modification algorithm is used to refine a solution once a relatively good solution has been found. This algorithm affects the cases when the winning processing element is in the wrong class and the second best processing element is in the right class. A further condition is that the training vector must be near the midpoint of the line joining these two processing elements. The winning processing element is moved away from the training vector and the second place element is moved toward the training vector. This procedure refines the boundary between regions where poor classifications commonly occur.
In the early training of the Learning Vector Quantization network, it is sometimes desirable to turn off the repulsion. In that case the winning processing element is moved toward the training vector only if the training vector and the winning processing element are in the same class, and is otherwise left unchanged. This option is particularly helpful when a processing element must move across a region having a different class in order to reach the region where it is needed.
Architecture
The architecture of an LVQ neural net is shown in Figure 4.2.
FIGURE 4.2: Architecture of LVQ
The architecture is similar to that of a Kohonen self-organizing neural net, but without a topological structure assumed for the output units. In an LVQ net, each output unit has a known class, since it uses supervised learning, thus differing from the Kohonen SOM, which uses unsupervised learning. The architecture may resemble a competitive network architecture, but this is a competitive net where the output is known; hence it is a supervised learning network.
Methods of initialization of reference vectors
i. Take the first m training vectors and use them as weight vectors; the remaining vectors are used for training.
ii. Initialize the reference vectors randomly and assign the initial weights and classes randomly.
iii. Adapt the K-means clustering method.
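The first of these methods is simple enough to sketch in Python. The helper name `init_from_first_m` is hypothetical, and the sketch assumes that the first m training vectors happen to cover every class, which the method itself does not guarantee.

```python
def init_from_first_m(samples, targets, m):
    """Use the first m training vectors as reference vectors.

    Returns (weights, weight_classes, remaining_samples, remaining_targets).
    The reference vectors inherit the classes of the vectors they copy.
    """
    weights = [list(x) for x in samples[:m]]     # copy, so training can edit
    weight_classes = list(targets[:m])
    return weights, weight_classes, samples[m:], targets[m:]


# Hypothetical toy data: four vectors, two classes.
samples = [[0.0, 0.0], [1.0, 1.0], [0.1, 0.2], [0.9, 1.1]]
targets = [1, 2, 1, 2]
print(init_from_first_m(samples, targets, 2))
```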
Training Algorithm
The LVQ algorithm finds the output unit whose weight vector best matches the input vector. If x and w belong to the same class, the weights are moved toward the input vector; if they belong to different classes, the weights are moved away from the input vector. As in the Kohonen self-organizing feature map, the winner unit is identified. The class of the winner unit is compared with the target, and based upon the comparison result, the weight update is performed as shown in the algorithm given below. The iterations are continued while reducing the learning rate.
Parameters Used in the Pseudocode
The parameters used in the training of the LVQ network are given below.
x: Training vector $(x_1, \ldots, x_i, \ldots, x_n)$
T: Category or class for the training vector
$w_j$: Weight vector for the jth output unit $(w_{1j}, \ldots, w_{ij}, \ldots, w_{nj})$
$C_j$: Category or class represented by the jth output unit
$\|x - w_j\|$: Euclidean distance between the input vector and the weight vector for the jth output unit
The pseudocode of the LVQ algorithm is as follows:
Initialize weight (reference) vectors.
Initialize learning rate α.
While not stopping condition do
  For each training input vector x
    Compute the squared Euclidean distance $D(j) = \sum_i (w_{ij} - x_i)^2$ for every output unit j
    Find the index J for which D(j) is minimum
    Update $w_J$ as follows:
    If $T = C_J$, then
      $w_J(\text{new}) = w_J(\text{old}) + \alpha\,(x - w_J(\text{old}))$
    If $T \neq C_J$, then
      $w_J(\text{new}) = w_J(\text{old}) - \alpha\,(x - w_J(\text{old}))$
  End For
  Reduce the learning rate.
  Test for the stopping condition.
End While
The stopping condition may be a fixed number of iterations or the learning rate reaching a sufficiently small value.
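The training loop above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function name `train_lvq`, the geometric decay of the learning rate, the default parameter values, and the toy data are all assumptions. The winner is attracted to the training vector when the classes match and repelled otherwise.

```python
def train_lvq(samples, targets, weights, weight_classes,
              alpha=0.1, decay=0.9, min_alpha=1e-3, max_epochs=100):
    """Train the reference vectors in place with the basic LVQ rule."""
    epochs = 0
    # Stop when the learning rate is small or the epoch budget is used up.
    while alpha > min_alpha and epochs < max_epochs:
        for x, t in zip(samples, targets):
            # D(j) = sum_i (w_ij - x_i)^2; J is the index of the minimum.
            dists = [sum((w_i - x_i) ** 2 for w_i, x_i in zip(w, x))
                     for w in weights]
            J = dists.index(min(dists))
            # Attract on a class match, repel on a mismatch.
            sign = 1.0 if weight_classes[J] == t else -1.0
            weights[J] = [w_i + sign * alpha * (x_i - w_i)
                          for w_i, x_i in zip(weights[J], x)]
        alpha *= decay            # reduce the learning rate (assumed schedule)
        epochs += 1
    return weights


# Hypothetical toy problem: class 1 at the sides, class 2 in the middle.
samples = [[-3.0, 0.0], [-2.0, 1.0], [0.0, 2.0],
           [0.0, -1.0], [2.0, -1.0], [3.0, 0.0]]
targets = [1, 1, 2, 2, 1, 1]
weights = [[-1.0, 0.0], [1.0, 0.0], [0.0, 0.5], [0.0, -0.5]]
weight_classes = [1, 1, 2, 2]
print(train_lvq(samples, targets, weights, weight_classes))
```

Both stopping conditions mentioned in the text appear in the loop guard: a fixed number of iterations (`max_epochs`) and a sufficiently small learning rate (`min_alpha`).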
Variants of LVQ
Kohonen developed two variant techniques, LVQ2 and LVQ3, which are more complex than the initial LVQ but allow for improved classification performance. In the LVQ algorithm, only the reference vector that is closest to the input vector is updated. In the LVQ2 and LVQ3 algorithms, two vectors learn if several conditions are satisfied. These vectors are the winner and the runner-up. The idea is that if the input is approximately the same distance from the winner and the runner-up, then each should learn.
LVQ2
In this case, the winner and the runner-up represent different classes, and the class of the runner-up is the same as that of the input vector. The distances from the input vector to the winner and from the input vector to the runner-up are approximately equal. The fundamental condition of LVQ2 is formed by a window function. Here, x is the current input vector, $y_c$ is the reference vector that is closest to x, $y_r$ is the reference vector that is next closest to x, $d_c$ is the distance from x to $y_c$, and $d_r$ is the distance from x to $y_r$. The input vector x falls in the window if
$$\frac{d_c}{d_r} > 1 - \epsilon \quad \text{and} \quad \frac{d_c}{d_r} < 1 + \epsilon,$$
where the value of $\epsilon$ depends on the number of training samples (for example, $\epsilon = 0.35$).
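The update formulas move the winner, which is in the wrong class, away from x and the runner-up, which is in the right class, toward x:
$$y_c(t+1) = y_c(t) - \alpha(t)\,[x(t) - y_c(t)]$$
$$y_r(t+1) = y_r(t) + \alpha(t)\,[x(t) - y_r(t)]$$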
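A single LVQ2 step might look as follows in Python. The sketch assumes, in line with the boundary-refinement description earlier in this section, that the wrong-class winner is repelled while the right-class runner-up is attracted. The names `in_window` and `lvq2_step` are hypothetical, and `math.dist` requires Python 3.8 or later.

```python
import math


def in_window(d_c, d_r, eps=0.35):
    """Window test on the winner distance d_c and runner-up distance d_r."""
    if d_r == 0:
        return False          # x coincides with both vectors; nothing to learn
    return (1 - eps) < d_c / d_r < (1 + eps)


def lvq2_step(x, x_class, y_c, c_class, y_r, r_class, alpha, eps=0.35):
    """Return updated (y_c, y_r); unchanged unless all LVQ2 conditions hold."""
    d_c, d_r = math.dist(x, y_c), math.dist(x, y_r)
    if c_class != x_class and r_class == x_class and in_window(d_c, d_r, eps):
        y_c = [c - alpha * (xi - c) for xi, c in zip(x, y_c)]   # repel winner
        y_r = [r + alpha * (xi - r) for xi, r in zip(x, y_r)]   # attract runner-up
    return y_c, y_r


# Hypothetical example: winner in class 2, runner-up and input in class 1.
print(lvq2_step([0.0, 0.0], 1, [1.0, 0.0], 2, [-1.2, 0.0], 1, alpha=0.1))
```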
LVQ3
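Considering the two closest vectors $y_{c1}$ and $y_{c2}$, at distances $d_{c1}$ and $d_{c2}$ from x, the window is defined as
$$\min\left(\frac{d_{c1}}{d_{c2}}, \frac{d_{c2}}{d_{c1}}\right) > (1 - \epsilon)(1 + \epsilon)$$
with a typical value of $\epsilon = 0.2$.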
LVQ3 extends the training algorithm to provide for training if x, $y_{c1}$, and $y_{c2}$ belong to the same class. The updates are given as
$$y_{c1}(t+1) = y_{c1}(t) + \beta(t)\,[x(t) - y_{c1}(t)]$$
$$y_{c2}(t+1) = y_{c2}(t) + \beta(t)\,[x(t) - y_{c2}(t)]$$
The value of $\beta$ is a multiple of the learning rate $\alpha(t)$ that is used if $y_{c1}$ and $y_{c2}$ belong to different classes, i.e., $\beta = m\,\alpha(t)$, for $0.1 < m < 0.5$.
This choice of β ensures that the weights continue to approximate the class distributions and prevents the codebook vectors from moving away from their placement if learning continues.
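The same-class branch of LVQ3 can be sketched in Python as shown below. The function names are hypothetical, the default window width of 0.2 and the default multiplier m = 0.3 are assumptions chosen for illustration, and both distances are assumed to be strictly positive.

```python
def lvq3_window(d_c1, d_c2, eps=0.2):
    """LVQ3 window test on the distances to the two closest vectors."""
    return min(d_c1 / d_c2, d_c2 / d_c1) > (1 - eps) * (1 + eps)


def lvq3_same_class_step(x, y_c1, y_c2, alpha, m=0.3):
    """Move both closest vectors toward x with the reduced rate beta = m * alpha.

    Intended for the case where x, y_c1, and y_c2 share one class.
    """
    beta = m * alpha
    new_c1 = [y + beta * (xi - y) for xi, y in zip(x, y_c1)]
    new_c2 = [y + beta * (xi - y) for xi, y in zip(x, y_c2)]
    return new_c1, new_c2


print(lvq3_window(1.0, 1.02), lvq3_window(1.0, 2.0))
print(lvq3_same_class_step([1.0, 1.0], [0.0, 0.0], [2.0, 2.0], alpha=0.1, m=0.5))
```

Because beta is only a fraction of alpha, vectors that already sit inside their own class region are nudged rather than pulled, which is what keeps them from drifting as training continues.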
4.2.2 Implementation of LVQ in MATLAB
An LVQ network can be created with the function newlvq available in the MATLAB Neural Network Toolbox as follows:
net = newlvq(PR,S1,PC,LR,LF)
where:
PR is an R-by-2 matrix of minimum and maximum values for R input elements.
S1 is the number of first-layer hidden neurons.
PC is an S2-element vector of typical class percentages.
LR is the learning rate (default 0.01).
LF is the learning function (default is learnlv1).
Example
Enter the input and the target vectors
clear all;
close all;
The input vectors inp and target classes target_class below define a classification problem to be solved by an LVQ network.
inp = [-3 -2 -2 0 0 0 0 +2 +2 +3;0 +1 -1 +2 +1 -1 -2 +1 -1 0];
target_class = [1 1 1 2 2 2 2 1 1 1];
The target classes are converted to target vectors T. Then, an LVQ network is created (with input ranges obtained from inp, four hidden neurons, and class percentages of 0.6 and 0.4) and is trained.
T = ind2vec(target_class);
The first-layer weights are initialized to the center of the input ranges with the function midpoint. The second-layer weights have 60% (6 of the 10 in target_class above) of their columns with a 1 in the first row (corresponding to class 1), and 40% of their columns with a 1 in the second row (corresponding to class 2).
network = newlvq(minmax(inp),4,[.6 .4]);
network = train(network,inp,T);
To view the weight matrices
network.IW{1,1} % first layer weight matrix
The resulting network can be tested.
Y = sim(network,inp)
Yc = vec2ind(Y)
Output: The network has classified the inputs into two basic classes, 1 and 2.
Y =
1 1 1 0 0 0 0 1 1 1 1
0 0 0 1 1 1 1 0 0 0 0
Yc =
1 1 1 2 2 2 2 1 1 1 1
Weight matrix of layer 1
ans =
2.6384 -0.2459
-2.5838 0.5796
-0.0198 -0.3065
0.1439 0.4845
Thus the above code implements the learning vector quantization algorithm to classify a given set of inputs into two classes.
4.2.3 Counter-Propagation Network
The Counter-Propagation Network was proposed and developed by Robert Hecht-Nielsen as a means of synthesizing complex classification problems while trying to minimize the number of processing elements and the training time.
The learning process of counter-propagation is similar to that of LVQ, with the difference that the middle Kohonen layer plays the role of a look-up table. This look-up table finds the closest fit to a given input pattern and outputs its equivalent mapping.
The first counter-propagation network consisted of a bi-directional mapping between the input and output layers. Essentially, while data is presented to the input layer to generate a classification pattern on the output layer, the output layer in turn would accept an additional input vector and generate an output classification on the network's input layer. The network got its name from this counter-posing flow of information through its structure. Most developers use a uni-directional variant of this formal representation of counter-propagation, in which there is only one feedforward path from the input layer to the output layer.
An example network is shown in Figure 4.3. The uni-directional counter-propagation network has three layers. If the inputs are not already normalized before they enter the network, a fourth layer is sometimes required. The main layers include an input buffer layer, a self-organizing Kohonen layer, and an output layer, also known as the Grossberg outstar layer, which uses the delta rule to modify its incoming connection weights.
FIGURE 4.3: An Example Counter-Propagation Network
The size of the input layer depends upon the parameters that define the problem. If the input layer has too few processing elements, the network may not generalize; if it has too many, the processing time is very high.
For a network to operate well, the input vector must generally be normalized. Normalization means that, for every combination of input values, the total “length” of the input vector must add up to one. The normalization can be done with a preprocessor before presenting the data to the network. In specific applications, a normalization layer is added between the input and Kohonen layers. The normalization layer requires one processing element for each input, plus one more for a balancing element. This normalization layer ensures that all input sets sum up to the same total.
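As a preprocessing step, normalization might be sketched in Python as follows. The sketch assumes that “length” means the Euclidean norm; the function name `normalize` is hypothetical, and the balancing element of a dedicated normalization layer is not modeled here.

```python
import math


def normalize(x):
    """Scale x so that its Euclidean length equals one."""
    length = math.sqrt(sum(v * v for v in x))
    if length == 0:
        raise ValueError("cannot normalize the zero vector")
    return [v / length for v in x]


print(normalize([3.0, 4.0]))
```

After this step, vectors that differ only in magnitude become identical, so the competition in the Kohonen layer depends on the direction of the input rather than on its size.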
Normalization of the inputs is necessary to ensure that the Kohonen layer finds the correct class for the problem. Without normalization, larger input vectors bias many of the Kohonen processing elements such that weaker-valued input sets cannot be properly classified. Due to the competitive nature of the Kohonen layer, the larger-valued input vectors overpower the smaller vectors. Counter-propagation uses a standard Kohonen layer which self-organizes the input sets into classification zones. It follows the classical Kohonen learning law. This layer acts as a nearest-neighbor classifier: the processing elements in the competitive layer autonomously update their connection weights to divide up the input vector space in approximate correspondence to the frequency with which the inputs occur. There should be at least as many processing elements in the Kohonen layer as there are output classes. The Kohonen layer generally has many more elements than classes, simply because additional processing elements provide a finer resolution between similar objects.
The output layer for counter-propagation consists of processing elements which learn to produce an output when a specific input is applied. Because the Kohonen layer is a competitive layer, only a single winning output is produced for a given input vector. The output layer provides a way of decoding that input to a meaningful output class. The delta rule is used to back-propagate the error between the desired output class and the actual output generated with the training set. Only the weights entering the output layer are updated; the Kohonen layer is unaffected.
As only one output from the competitive Kohonen layer is active at a time and all other elements are zero, the only weights adjusted for the output processing elements are the ones connected to the winning element in the competitive layer. In this way the output layer learns to reproduce a definite pattern for each active processing element in the competitive layer. If several competitive elements belong to the same class, then the output processing element for that class will acquire weights in response to those competitive processing elements and zero for all others.
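The interplay of the two layers can be sketched in Python as follows. This is a simplified illustration of the uni-directional network with an already trained Kohonen layer; the names `winner`, `train_output`, and `cpn_recall` are hypothetical. Only the output weights attached to the winning Kohonen element are changed, using the delta rule.

```python
def winner(x, kohonen_weights):
    """Index of the Kohonen element whose weight vector is nearest to x."""
    return min(range(len(kohonen_weights)),
               key=lambda j: sum((xi - wi) ** 2
                                 for xi, wi in zip(x, kohonen_weights[j])))


def train_output(x, target, kohonen_weights, output_weights, rate):
    """Delta-rule update of the output weights fed by the winning element."""
    j = winner(x, kohonen_weights)
    output_weights[j] = [w + rate * (t - w)
                         for w, t in zip(output_weights[j], target)]
    return j


def cpn_recall(x, kohonen_weights, output_weights):
    """Look up the output pattern stored for the winning Kohonen element."""
    return output_weights[winner(x, kohonen_weights)]


# Hypothetical network: two Kohonen elements, two output elements.
kohonen = [[0.0, 0.0], [1.0, 1.0]]
outputs = [[0.0, 0.0], [0.0, 0.0]]
train_output([0.9, 1.0], [0.0, 1.0], kohonen, outputs, rate=0.5)
print(cpn_recall([1.1, 0.9], kohonen, outputs))
```

Since the winner has activation one and every other element has activation zero, the output of the network is simply the row of output weights attached to the winner, which is why the Kohonen layer behaves like a look-up table.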
The major limitation of this architecture is that the competitive Kohonen layer learns without any supervision. Therefore it does not know which class it is responding to. This means that it is possible for a processing element in the Kohonen layer to learn two or more training inputs which belong to different classes. When this happens, the output of the network will be ambiguous for those inputs. To overcome this difficulty, the processing elements in the Kohonen layer can be pre-conditioned to learn only about a specific class.
Counter-propagation networks are classified into two types:
1. Full counter-propagation network
2. Forward-only counter-propagation network
In this section, the training algorithm and application procedure of the full CPN are described.
Training Phases of Full CPN
The full CPN is trained in two phases.
The first phase of training is called instar modeled training. The active units here are the units in the x-input ($x = x_1, \ldots, x_i, \ldots, x_n$), z-cluster ($z = z_1, \ldots, z_j, \ldots, z_p$), and y-input ($y = y_1, \ldots, y_k, \ldots, y_m$) layers.
Generally in CPN, the cluster units do not assume any topology, but the winning unit is allowed to learn. This winning unit uses the standard Kohonen learning rule for its weight update. The rule is given by
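$$v_{ij}(\text{new}) = v_{ij}(\text{old}) + \alpha\,(x_i - v_{ij}(\text{old})) = (1 - \alpha)\,v_{ij}(\text{old}) + \alpha\,x_i; \quad i = 1 \text{ to } n$$
$$w_{jk}(\text{new}) = w_{jk}(\text{old}) + \beta\,(y_k - w_{jk}(\text{old})) = (1 - \beta)\,w_{jk}(\text{old}) + \beta\,y_k; \quad k = 1 \text{ to } m$$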
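One step of this first phase can be sketched in Python. The sketch makes two assumptions that are not spelled out at this point in the text: the winner is chosen by the squared Euclidean distance over the x and y inputs taken together, and the weights are stored as nested lists with `v[i][j]` linking $x_i$ to cluster j and `w[j][k]` linking $y_k$ to cluster j. The function name `kohonen_phase_step` is hypothetical.

```python
def kohonen_phase_step(x, y, v, w, alpha, beta):
    """Phase-1 update of the winning cluster unit; returns its index J."""
    n, m, p = len(x), len(y), len(w)

    def distance(j):
        # Squared distance from the pair (x, y) to the weights of cluster j.
        return (sum((x[i] - v[i][j]) ** 2 for i in range(n)) +
                sum((y[k] - w[j][k]) ** 2 for k in range(m)))

    J = min(range(p), key=distance)
    for i in range(n):                       # v_iJ = (1 - alpha) v_iJ + alpha x_i
        v[i][J] = (1 - alpha) * v[i][J] + alpha * x[i]
    for k in range(m):                       # w_Jk = (1 - beta) w_Jk + beta y_k
        w[J][k] = (1 - beta) * w[J][k] + beta * y[k]
    return J


# Hypothetical sizes: n = 2 inputs, m = 1 target, p = 2 cluster units.
v = [[0.0, 1.0], [0.0, 1.0]]
w = [[0.0], [1.0]]
print(kohonen_phase_step([0.8, 1.0], [1.0], v, w, alpha=0.5, beta=0.5), v, w)
```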
In the second phase, only the winning unit J remains active in the cluster layer. The weights from the winning cluster unit J to the output units are adjusted so that the vector of activations of the units in the y output layer, y*, is an approximation of the input vector y, and x* is an approximation of the input vector x. This phase may be called outstar modeled training. The weight update is done by the Grossberg learning rule, which is used only for outstar learning. In outstar learning, no competition is assumed among the units, and the learning occurs for all units in a particular layer. The weight update rule is given as
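$$u_{jk}(\text{new}) = u_{jk}(\text{old}) + a\,(y_k - u_{jk}(\text{old})) = (1 - a)\,u_{jk}(\text{old}) + a\,y_k; \quad k = 1 \text{ to } m$$
$$t_{ji}(\text{new}) = t_{ji}(\text{old}) + b\,(x_i - t_{ji}(\text{old})) = (1 - b)\,t_{ji}(\text{old}) + b\,x_i; \quad i = 1 \text{ to } n$$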
The weight change indicated is the learning rate times the error.
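The second-phase update can be sketched in Python as follows, assuming that the winning cluster index J has already been determined and that the weights are stored as nested lists with `u[j][k]` linking cluster j to the y output unit k and `t[j][i]` linking cluster j to the x output unit i. The function name `grossberg_phase_step` is hypothetical.

```python
def grossberg_phase_step(x, y, J, u, t, a, b):
    """Phase-2 (outstar) update of the weights leaving cluster unit J.

    Returns (x_star, y_star), the approximations currently stored for J.
    """
    # u_Jk = (1 - a) u_Jk + a y_k, for every y output unit k
    u[J] = [(1 - a) * u_jk + a * y_k for u_jk, y_k in zip(u[J], y)]
    # t_Ji = (1 - b) t_Ji + b x_i, for every x output unit i
    t[J] = [(1 - b) * t_ji + b * x_i for t_ji, x_i in zip(t[J], x)]
    return t[J], u[J]


# Hypothetical sizes: p = 2 cluster units, m = 2, n = 1.
u = [[0.0, 0.0], [5.0, 5.0]]
t = [[0.0], [5.0]]
print(grossberg_phase_step([2.0], [1.0, -1.0], 0, u, t, a=0.5, b=0.25))
```

Each update moves a weight by the learning rate times the error, so repeated presentation of the same pair drives the stored approximations toward x and y.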
Training Algorithm
The parameters used are
x: input training vector, $x = (x_1, \ldots, x_i, \ldots, x_n)$
y: target output vector, $y = (y_1, \ldots, y_k, \ldots, y_m)$
$z_j$: activation of cluster unit $Z_j$
x*: approximation to vector x
y*: approximation to vector y
$v_{ij}$: weight from the X input layer to the Z cluster layer
$w_{jk}$: weight from the Y input layer to the Z cluster layer
$t_{ji}$: weight from the Z cluster layer to the X output layer
$u_{jk}$: weight from the Z cluster layer to the Y output layer
α, β: learning rates during Kohonen learning
a, b: learning rates during Grossberg learning
The algorithm uses the Euclidean distance method or the dot product method to determine the winner unit. The winner unit is calculated during both the first and second phases of training. In the first phase of training the Kohonen learning rule is used for the weight update, and for