
This section is not intended to give a complete theoretical foundation of Self-Organizing Maps (SOMs) [Kohonen(1995)]. To support a better understanding of SOMs, it presents some background and the relevant parts of SOM theory. (More details can be found in books on SOMs such as [Kohonen(1982), Kohonen(1988), Kohonen(1995), Hinton and Sejnowski(1999), Haykin(1999)].)

The SOM [Kohonen(1995)] is an unsupervised neural network: it does not require the user to specify desired outputs. A supervised neural network, in contrast, requires that one or more outputs be specified in conjunction with one or more inputs in order to find patterns or relations in the data [Haykin(1999)]. The SOM is also a feed-forward neural network that uses an unsupervised training algorithm and, through a process called self-organization, configures the output units into a topological representation of the original data [Kohonen(1982)].

The SOM algorithm is based on competitive learning. The SOM reduces multi-dimensional data to a lower-dimensional map or grid of neurons [Hinton and Sejnowski(1999)]. It provides a topology-preserving mapping from the high-dimensional input space to the map units [Kohonen(1988)]. The map units, or neurons, usually form a two-dimensional grid, so the mapping goes from a high-dimensional space onto a simple topology, e.g. a rectangular or hexagonal lattice. Topology preservation means that the SOM groups similar input data vectors on neighbouring neurons: points that are near each other in the input space are mapped to nearby map units in the SOM. The SOM can thus serve as a clustering tool as well as a tool for visualizing high-dimensional data.

The SOM consists of two layers of processing units [Kohonen(1995)]: the first is an input layer containing one processing unit for each element of the input vector; the second is an output layer, or grid, of processing units that is fully connected to the input layer. The number of processing units in the output layer is chosen by the user, based on the desired initial shape and size of the map. Unlike many other neural networks, the SOM has no hidden layer or hidden processing units [Haykin(1999)].
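The two-layer structure can be pictured as a grid of weight vectors, one per output unit, each holding one weight per input element. The following is a minimal sketch under stated assumptions: the function name `init_som`, the map size, and the initial weight range are illustrative choices, not taken from the cited sources.

```python
import random

def init_som(rows, cols, dim, seed=0):
    """Return a rows x cols grid of weight vectors, each of length dim.

    Full connectivity means every output unit stores one weight per
    input element. The range [-0.1, 0.1] is an assumed choice.
    """
    rng = random.Random(seed)
    return [[[rng.uniform(-0.1, 0.1) for _ in range(dim)]
             for _ in range(cols)]
            for _ in range(rows)]

# Hypothetical example: a 4 x 5 output grid for 3-dimensional inputs.
som = init_som(rows=4, cols=5, dim=3)
print(len(som), len(som[0]), len(som[0][0]))  # 4 5 3
```

There is no intermediate structure between the input vector and the grid, which reflects the absence of a hidden layer.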

6.1.1 SOM Algorithm

The principal goal of the SOM algorithm developed by Kohonen [Kohonen(1982)] is to transform an incoming signal pattern of arbitrary dimension into a one- or two-dimensional discrete map, and to perform this transformation adaptively in a topologically ordered fashion. When an input pattern is presented to a SOM network, each output unit competes to match it, and the winner is the unit whose incoming connection weights are closest to the input pattern in terms of Euclidean distance [Kohonen(1995)]. Often starting from randomised weight values, the output units slowly align themselves such that, when an input pattern is presented, a neighbourhood of units responds to it. The connection weights of the winning unit are then adjusted, i.e. moved in the direction of the input pattern by a factor determined by the learning rate.
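The competition and the adjustment of the winner can be illustrated with a small sketch. The helper names `find_winner` and `move_towards`, the three weight vectors, and the learning rate of 0.5 are hypothetical values chosen for illustration only.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def find_winner(weights, x):
    """Index of the unit whose weight vector is closest to x."""
    return min(range(len(weights)), key=lambda j: euclidean(weights[j], x))

def move_towards(w, x, lr):
    """Move weight vector w towards x by a fraction lr of the difference."""
    return [wi + lr * (xi - wi) for wi, xi in zip(w, x)]

# Hypothetical weights of three output units and one input pattern.
weights = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
x = [0.9, 0.8]

winner = find_winner(weights, x)                      # unit 1 is closest
updated = move_towards(weights[winner], x, lr=0.5)    # approx. [0.95, 0.9]
```

With a learning rate of 0.5 the winner moves halfway towards the input; smaller rates give correspondingly smaller steps.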

As training progresses, the size of the neighbourhood around the winning unit and the learning rate both decrease [Kohonen(1995)]. Initially a large number of output units is updated, but as training proceeds, fewer and fewer units are updated, until at the end of training only the winning unit is adjusted. The SOM creates a topological mapping by adjusting not only the winner's weights but also the weights of the adjacent output units in the neighbourhood of the winner. Thus the whole neighbourhood of output units, not just the winner, is moved closer to the input pattern.
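The effect of a shrinking neighbourhood can be sketched as follows. This sketch assumes a rectangular grid and measures grid distance with the Chebyshev (maximum-coordinate) metric; both are illustrative assumptions, and other lattice shapes and distance measures are possible.

```python
def neighbourhood(winner, rows, cols, radius):
    """Grid positions within `radius` of the winner.

    Assumes a rectangular grid and Chebyshev distance on grid
    coordinates; this is one possible choice, not the only one.
    """
    wr, wc = winner
    return [(r, c) for r in range(rows) for c in range(cols)
            if max(abs(r - wr), abs(c - wc)) <= radius]

# Number of units updated around the centre of a 5 x 5 map
# as the radius shrinks from 2 to 0.
sizes = [len(neighbourhood((2, 2), 5, 5, radius)) for radius in (2, 1, 0)]
print(sizes)  # [25, 9, 1]
```

At radius 0 only the winning unit remains in the neighbourhood, which corresponds to the final phase of training described above.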

There are three basic steps involved in the application of the algorithm after initialisation, namely, sampling, similarity matching, and updating. These three steps are repeated until the map formation is completed. The algorithm is summarized as follows based on Kohonen’s book [Kohonen(1988)]:

1. Initialisation. Choose random values for the initial weight vectors w_j(0). The only restriction here is that the w_j(0) be different for j = 1, 2, ..., N, where N is the number of neurons in the lattice. It may be desirable to keep the magnitude of the weights small.

2. Sampling. Draw a sample x from the input distribution with a certain probability; the vector x represents the sensory signal.

3. Similarity Matching. Find the best-matching (winning) neuron i(x) at time n, using the minimum-distance Euclidean criterion: i(x) = arg min_j ||x(n) − w_j(n)||, j = 1, 2, ..., N

4. Updating. Adjust the synaptic weight vectors of all neurons, using the update formula

\[
w_j(n+1) =
\begin{cases}
w_j(n) + \eta(n)\,[x(n) - w_j(n)], & j \in \Lambda_{i(x)}(n) \\
w_j(n), & \text{otherwise}
\end{cases}
\]

where η(n) is the learning-rate parameter, and Λ_i(x)(n) is the neighbourhood function centred around the winning neuron i(x); both η(n) and Λ_i(x)(n) are varied dynamically during learning for best results.

5. Continuation. Continue with step 2 until no noticeable changes in the feature map are observed.
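The five steps above can be sketched in a few lines of Python. This is a minimal illustration under several assumptions that are not part of the cited description: a rectangular grid with Chebyshev grid distance, linear decay of both the learning rate and the neighbourhood radius, and a fixed iteration count `n_iter` standing in for the "no noticeable changes" stopping criterion. The function name `train_som` and all parameter values are hypothetical.

```python
import math
import random

def train_som(data, rows, cols, n_iter=2000, lr0=0.5, seed=0):
    rng = random.Random(seed)
    dim = len(data[0])
    # Step 1: small random initial weights, one vector per grid unit
    # (random floats are distinct with overwhelming probability).
    w = {(r, c): [rng.uniform(-0.1, 0.1) for _ in range(dim)]
         for r in range(rows) for c in range(cols)}
    radius0 = max(rows, cols) / 2.0
    for n in range(n_iter):
        remaining = 1.0 - n / n_iter
        lr = lr0 * remaining          # assumed linear decay of eta(n)
        radius = radius0 * remaining  # assumed linear shrinking of Lambda
        # Step 2: sampling.
        x = rng.choice(data)
        # Step 3: similarity matching (minimum Euclidean distance).
        win = min(w, key=lambda j: math.dist(x, w[j]))
        # Step 4: update every unit inside the current neighbourhood.
        for j in list(w):
            if max(abs(j[0] - win[0]), abs(j[1] - win[1])) <= radius:
                w[j] = [wi + lr * (xi - wi) for wi, xi in zip(w[j], x)]
        # Step 5: continuation is approximated by the fixed loop length.
    return w

# Hypothetical toy data: two small groups of 2-dimensional points.
data = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [0.9, 1.0]]
som = train_som(data, rows=3, cols=3)
```

Because each update is a convex combination of the old weight and the sample, the trained weights stay within the range spanned by the initial weights and the data.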

The learning process involved in the computation of a feature map is stochastic in nature, which means that the accuracy of the map depends on the number of iterations of the SOM algorithm. Moreover, the success of map formation depends critically on how the main parameters of the algorithm, namely the learning-rate parameter η and the neighbourhood function Λ_i(x), are selected. Unfortunately, there is no theoretical basis for the selection of these parameters.
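Since these parameters are chosen empirically, one common practical choice is to let both decay smoothly with the iteration number. The sketch below assumes exponential decay; the initial values and the time constant `tau` are hypothetical and would normally be tuned for the data at hand.

```python
import math

def learning_rate(n, eta0=0.5, tau=1000.0):
    """Assumed exponential decay of the learning rate eta(n)."""
    return eta0 * math.exp(-n / tau)

def neighbourhood_radius(n, radius0=5.0, tau=1000.0):
    """Assumed exponential shrinking of the neighbourhood radius."""
    return radius0 * math.exp(-n / tau)

# Values at a few hypothetical iteration numbers.
schedule = [(n, learning_rate(n), neighbourhood_radius(n))
            for n in (0, 500, 1000, 3000)]
for n, eta, radius in schedule:
    print(n, round(eta, 4), round(radius, 4))
```

Other schedules, such as linear or inverse-time decay, fit the same interface; the choice among them is a matter of experimentation.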

6.1.2 SOM’s Properties

The SOM has properties of both vector quantization and vector projection algorithms.

6.1.2.1 Quantization

The quantization from the N training samples to M prototypes reduces the original data set to a smaller, but still representative, set to work with. Further analysis is performed primarily, or at least initially, using the prototype vectors instead of all of the data.

Using the reduced data set is only valid if it really is representative of the original data.

Theoretical results on the density of the prototypes apply in the limit where the number of prototypes approaches infinity and the neighbourhood width is very large; numerical experiments have shown that these results are relatively accurate even for a small number of prototypes [Kohonen(1999)]. While the connection between the density of the SOM prototypes and that of the input data has not been derived in the general case, it can be assumed that the SOM roughly follows the density of the training data. The primary benefit of using a reduced data set is that the computational complexity of subsequent steps is reduced. Another benefit of vector quantization is that it usually involves averaging of data samples, thus removing zero-mean noise and reducing the effect of outliers.
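How well a set of prototypes represents the data can be checked with a simple measure. The sketch below assigns each sample to its nearest prototype and computes the mean distance between samples and their prototypes, a commonly used quantization-error measure that is assumed here rather than taken from the text. The data, the prototypes, and the function names are hypothetical.

```python
import math

def nearest(prototypes, x):
    """Index of the prototype closest to x (Euclidean distance)."""
    return min(range(len(prototypes)),
               key=lambda j: math.dist(x, prototypes[j]))

def quantize(data, prototypes):
    """Replace each sample by the index of its nearest prototype."""
    return [nearest(prototypes, x) for x in data]

def mean_quantization_error(data, prototypes):
    """Average distance between each sample and its nearest prototype."""
    total = sum(math.dist(x, prototypes[nearest(prototypes, x)])
                for x in data)
    return total / len(data)

# Hypothetical example: N = 4 samples reduced to M = 2 prototypes.
data = [[0.0, 0.1], [0.1, 0.0], [1.0, 0.9], [0.9, 1.0]]
prototypes = [[0.05, 0.05], [0.95, 0.95]]

codes = quantize(data, prototypes)               # [0, 0, 1, 1]
error = mean_quantization_error(data, prototypes)
```

Here each prototype is the average of the two samples assigned to it, which illustrates the averaging effect mentioned above; a small error suggests the reduced set is representative of the original data.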

6.1.2.2 Projection

Since the prototype vectors of the SOM have well-defined positions on the low-dimensional map grid, the SOM is a kind of vector projection algorithm. The projection of a data sample can be defined as the index b, or the location r_b, of its best-matching unit (BMU) on the map grid. The projection is discrete, as it can take only as many values as there are map units. Therefore, different vectors may be projected to the same point. Also, since the shape of the map is defined beforehand, information about the global shape of the data manifold is lost. The topological ordering of map units depends primarily on the local neighbourhood, which is defined on the map grid. Since there are more map units where data density is high, the neighbourhood in these areas becomes smaller as measured in the input space. Thus, the projection tunes to local data density.
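The projection of a sample onto the map can be sketched as a lookup of its BMU. The example assumes that prototypes are stored in row-major order on a rectangular grid, so the grid location r_b follows from the index b by integer division; this layout, the function name `project`, and the prototype values are illustrative assumptions.

```python
import math

def project(x, prototypes, cols):
    """Return (b, r_b): the BMU index and its (row, col) grid location.

    Assumes prototypes are listed in row-major order on a
    rectangular grid with `cols` columns.
    """
    b = min(range(len(prototypes)),
            key=lambda j: math.dist(x, prototypes[j]))
    return b, divmod(b, cols)

# Hypothetical 2 x 2 map with 2-dimensional prototypes.
prototypes = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]

p1 = project([0.1, 0.2], prototypes, cols=2)  # (0, (0, 0))
p2 = project([0.2, 0.1], prototypes, cols=2)  # (0, (0, 0))
p3 = project([0.9, 0.8], prototypes, cols=2)  # (3, (1, 1))
```

The first two samples differ but share the same BMU, which shows the discreteness of the projection: distinct vectors can land on the same map unit.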
