
To demonstrate the effectiveness of the proposed algorithm, numerical experiments were carried out on a number of real-world data sets. Algorithm 13 was coded in the NetBeans IDE on the Java platform and tested on a Mac running OS X with a 2.7 GHz Core i7 CPU and 10 GB of RAM. Eight data sets were used in the experiments: two small (Iris and Wine), two medium-sized (TSPLIB1060 and Image Segmentation), two large (D15112 and Gamma Telescope) and two very large (NE and Pla85900). A brief description of the data sets is presented in Table 10. More details can be found in [14,119,135].

The results obtained by the Split and Merge algorithm, which is Step 1 of Algorithm 13, are presented in Table 7.2. In this table, $f_{\min}$ is the value of the problem (7.1), $\beta^*$ is the value of the parameter $\beta$ corresponding to $f_{\min}$, $|\Psi|$ is the number of initial neurons and $t$ is the CPU time.

Table 7.2: The results of the Split and Merge algorithm

Data set              β*     |Ψ|   fmin          t
Fisher's Iris Plant   0.05   23    4.95×10^1     0.02
Wine                  0.05   31    8.62×10^5     0.02
TSPLIB1060            0.05   30    6.43×10^9     0.05
Image Segmentation    0.10   62    3.17×10^7     0.14
D15112                0.10   72    2.53×10^11    7.13
Gamma Telescope       0.05   87    2.01×10^8     6.61
NE                    0.20   61    3.66×10^2     4.57
Pla85900              0.05   75    3.07×10^15    1.09

Let $E_{SOM}$ and $E$ be the values of the quantization error obtained by the SOM and the Modified SOM, respectively. Then the improvement $P$ achieved by the Modified SOM in comparison with the result of the SOM is defined as

$$P = \frac{E_{SOM} - E}{E_{SOM}} \cdot 100\%. \tag{7.27}$$

The values of the quantization error, computed using equation (1.6), for different iterations and different data sets are presented in Tables 7.3-7.4. These results show that the Modified SOM outperforms the SOM on all data sets. The maximum improvement, 42.7%, is obtained on the Wine data set. The improvement P on the Image Segmentation and Iris data sets is 38.3% and 24.8%, respectively. The minimum improvement, 4.4%, is obtained on the Pla85900 data set. On the other data sets the improvement P is between 6.9% and 14.1%. Note that the Modified SOM starts with a smaller value of E than the SOM algorithm. This is due to the special initialization procedure used in the Modified SOM algorithm.
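As an illustration of how the improvement P in (7.27) relates to the tabulated errors, the following minimal Python sketch recomputes P from the final-iteration values of E listed in Tables 7.3-7.4. The function name `improvement_percent` and the dictionary `final_errors` are hypothetical names chosen for this example; they are not part of the original Java implementation.

```python
def improvement_percent(e_som: float, e_msom: float) -> float:
    """Improvement P (in percent) of the Modified SOM over the SOM, as in (7.27)."""
    if e_som <= 0:
        raise ValueError("e_som must be positive")
    return (e_som - e_msom) / e_som * 100.0


# Final-iteration quantization errors read from Tables 7.3-7.4: (E_SOM, E).
final_errors = {
    "Wine": (1.85e1, 1.06e1),
    "Image Seg.": (2.69e1, 1.66e1),
    "Iris": (2.86e-1, 2.15e-1),
    "Pla85900": (2.73e4, 2.61e4),
}

for name, (e_som, e_msom) in final_errors.items():
    print(f"{name}: P = {improvement_percent(e_som, e_msom):.1f}%")
```

Because the inputs are the rounded table entries, the recomputed percentages can only be expected to agree with the reported ones up to rounding.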

Table 7.3: Results for small and medium size data sets

Iris
iter   SOM E      SOM t   Modified SOM E   Modified SOM t
2      3.17E+00   0.06    2.96E-01         0.02
4      2.05E+00   0.08    2.76E-01         0.03
6      1.95E+00   0.09    2.29E-01         0.03
8      9.70E-01   0.09    2.23E-01         0.05
10     5.56E-01   0.11    2.22E-01         0.05
12     3.51E-01   0.12    2.22E-01         0.05
14     2.88E-01   0.12    2.22E-01         0.06
16     2.86E-01   0.14    2.22E-01         0.06
20     2.86E-01   0.16    2.15E-01         0.08

TSPLIB1060
iter   SOM E      SOM t   Modified SOM E   Modified SOM t
2      4.84E+08   0.23    2.47E+03         0.08
4      2.03E+05   0.39    5.60E+03         0.11
6      7.32E+03   0.52    2.03E+03         0.12
10     5.57E+03   0.70    3.66E+02         0.17
14     3.82E+03   0.90    3.19E+02         0.22
18     1.29E+03   1.08    3.17E+02         0.25
22     3.26E+02   1.23    3.17E+02         0.30
25     3.21E+02   1.36    4.67E+02         0.33
30     3.21E+02   1.56    2.99E+02         0.37

Wine
iter   SOM E      SOM t   Modified SOM E   Modified SOM t
2      6.65E+02   0.08    3.45E+01         0.03
4      2.21E+02   0.11    1.33E+01         0.06
6      2.00E+02   0.12    1.19E+01         0.08
8      1.48E+02   0.14    1.11E+01         0.09
10     6.18E+01   0.17    1.12E+01         0.09
12     2.93E+01   0.19    1.12E+01         0.11
14     1.87E+01   0.20    1.12E+01         0.12
16     1.85E+01   0.20    1.54E+01         0.14
20     1.85E+01   0.23    1.06E+01         0.16

Image Seg.
iter   SOM E      SOM t   Modified SOM E   Modified SOM t
2      1.82E+07   0.73    1.01E+02         0.40
4      2.41E+03   1.28    8.01E+01         0.62
6      1.84E+02   1.75    2.90E+01         0.84
10     1.42E+02   2.74    1.89E+01         1.26
14     1.02E+02   3.65    1.75E+01         1.68
18     4.55E+01   4.54    1.74E+01         2.09
22     2.69E+01   5.40    1.74E+01         2.51
25     2.69E+01   6.04    1.75E+01         2.84
30     2.69E+01   7.00    1.66E+01         3.35

The Modified SOM requires less CPU time than the SOM in all data sets except Pla85900. The Split and Merge algorithm that initializes the Modified SOM is very efficient and is not time consuming. The new initialization algorithm, which is based on the Split and Merge algorithm, speeds up the convergence of the Modified SOM and makes it less time consuming than the SOM. The maximum time reduction by the Modified SOM, compared with the SOM, was achieved on the D15112 and Gamma Telescope data sets. On the other hand, the minimum reduction in computational time occurs on the two very large data sets, Pla85900 and NE.

The dependence of the CPU time on the number of iterations for the SOM and Modified SOM algorithms on the D15112 and Gamma Telescope data sets is given in Figure 7.8. Note that the Modified SOM requires more CPU time at the early iterations due to the use of the Split and Merge algorithm for initialization. Once the Modified SOM is initialized, it converges much faster than the SOM, which is initialized randomly.

Table 7.4: Results for large and very large data sets

D15112
iter   SOM E      SOM t    Modified SOM E   Modified SOM t
2      3.65E+04   7.36     1.56E+04         7.77
4      2.52E+08   14.35    5.15E+06         8.32
6      1.60E+04   21.28    1.71E+09         8.86
10     9.59E+03   34.99    1.37E+05         9.97
18     1.57E+03   61.85    4.09E+02         12.20
22     7.26E+02   75.30    3.89E+02         13.29
26     4.28E+02   88.73    3.85E+02         14.37
30     3.80E+02   102.23   3.85E+02         15.46
40     3.80E+02   135.96   3.51E+02         18.14

NE
iter   SOM E      SOM t    Modified SOM E   Modified SOM t
2      2.02E+07   5.18     3.28E+07         6.47
4      3.32E-01   9.86     8.56E+32         8.18
6      3.01E-01   14.56    3.88E-02         9.89
8      2.70E-01   19.25    2.15E-02         11.54
10     2.56E-01   23.93    1.23E-02         13.20
12     1.94E-01   28.52    1.12E-02         14.88
14     5.41E-02   32.99    1.12E-02         16.55
16     1.16E-02   37.46    1.12E-02         18.24
20     1.12E-02   46.30    1.03E-02         21.51

Gamma Telescope
iter   SOM E      SOM t    Modified SOM E   Modified SOM t
2      3.93E+20   10.97    8.28E+01         8.39
4      5.11E+12   21.32    4.08E+01         10.00
6      2.22E+03   31.73    3.46E+01         11.62
10     2.21E+02   53.17    3.15E+01         14.84
18     1.30E+02   95.26    3.02E+01         21.23
22     7.56E+01   116.35   3.00E+01         24.43
26     4.53E+01   137.56   2.99E+01         27.63
30     3.34E+01   158.70   2.99E+01         30.81
40     3.33E+01   210.52   2.86E+01         38.44

Pla85900
iter   SOM E       SOM t   Modified SOM E   Modified SOM t
2      2.17E+113   4.29    5.81E+112        6.49
4      4.41E+05    8.14    1.43E+05         11.23
6      4.32E+05    11.72   1.99E+88         15.19
8      4.09E+05    16.99   3.09E+04         20.84
10     3.87E+05    18.69   2.74E+04         22.73
12     3.71E+05    22.23   2.63E+04         26.46
14     1.87E+05    25.91   2.61E+04         30.19
16     5.00E+04    29.44   2.61E+04         33.96
20     2.73E+04    36.41   2.61E+04         41.31

[Figure 7.8: SOM vs Modified SOM using CPU time. Panels: (a) D15112, (b) Gamma Telescope. Horizontal axis: iteration; vertical axis: t (sec.). Curves: SOM, Modified SOM.]

Note that the error E shows the quantization quality of the network. However, there is a distortion measurement which can be used to calculate the overall quality of the map. Unlike the quantization error, the distortion measure $\xi$ considers both vector quantization and topology preservation of the SOM. The distortion measure is defined as follows [8,13]:

$$\xi = \sum_{x_i \in A} \; \sum_{w_j \in \Psi,\, w_j \neq w_c} h_{cj} \, \| x_i - w_j \|^2, \tag{7.28}$$

where $c$ is the BMU of $x_i$ and $h_{cj}$ is the value of the neighborhood function $h$, defined by (7.23), for neurons $c$ and $j$.
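The double sum in (7.28) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the neighborhood function defined by (7.23) is not reproduced in this section, so the sketch takes it as a caller-supplied function `h(c, j)`, and the Gaussian `gaussian_chain_h` over neuron indices used in the example is a hypothetical stand-in, not the definition from (7.23). All function names are chosen for this example.

```python
import math
from typing import Callable, Sequence

Vector = Sequence[float]


def squared_distance(x: Vector, w: Vector) -> float:
    """Squared Euclidean distance ||x - w||^2."""
    return sum((a - b) ** 2 for a, b in zip(x, w))


def distortion(
    data: Sequence[Vector],
    weights: Sequence[Vector],
    h: Callable[[int, int], float],
) -> float:
    """Distortion measure (7.28): for each x_i, sum h(c, j) * ||x_i - w_j||^2 over j != c."""
    total = 0.0
    for x in data:
        dists = [squared_distance(x, w) for w in weights]
        # BMU c: index of the neuron whose weight vector is closest to x.
        c = min(range(len(weights)), key=dists.__getitem__)
        for j, d in enumerate(dists):
            if j != c:  # the BMU itself is excluded from the inner sum
                total += h(c, j) * d
    return total


def gaussian_chain_h(c: int, j: int, sigma: float = 1.0) -> float:
    """Hypothetical neighborhood: Gaussian in the index distance on a 1-D chain."""
    return math.exp(-((c - j) ** 2) / (2.0 * sigma ** 2))


data = [(0.0, 0.0), (1.0, 1.0)]
weights = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
print(distortion(data, weights, gaussian_chain_h))
```

Passing `h` as a parameter keeps the sketch independent of the map topology; for the Modified SOM, `h(c, j)` would presumably be zero for neurons in unconnected dense areas, which is consistent with the lower distortion values reported for it.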

Table 7.5 presents the distortion measure (7.28) and the number of active neurons $n_{act}$ for all data sets. One can see that the distortion measure $\xi$ obtained by the Modified SOM is smaller than that obtained by the SOM on all data sets. This is due to the topology of the Modified SOM, in which neurons from different dense areas are not connected. This prevents the network from deteriorating from its optimal values of $\xi$ and $E$ simultaneously.

Table 7.5: Results of distortion measure on all data sets

Data set              SOM ξ        SOM nact   Modified SOM ξ   Modified SOM nact
Fisher's Iris Plant   1.25×10^-6   69         3.62×10^-7       92
Wine                  1.68×10^-2   72         6.23×10^-3       104
TSPLIB1060            1.63×10^-1   204        2.11×10^-2       258
Image Seg.            3.26×10^-4   210        8.73×10^-5       490
D15112                6.19×10^-2   397        2.69×10^-2       710
Gamma Telescope       2.35×10^-5   400        1.39×10^-5       759
NE                    1.66×10^-1   375        3.11×10^-2       610
Pla85900              7.66×10^-6   400        4.97×10^-6       636

7.6.1 Comparison with other algorithms

In this subsection the Modified SOM (MSOM) is compared, using computational results, with well-known high-dimensional visualization algorithms: Growing Grid (GG) [67], Growing Neural Gas (GNG) [66] and Growing Hierarchical SOM (GHSOM) [117]. In the Growing Grid algorithm the number of iterations is set to 10 times that of the Modified SOM. In the other algorithms, parameters are set to the same values as the corresponding parameters in the Modified SOM. The CPU time limit is set to 6 hours. The results for the quantization error E, defined by (1.6), are presented in Table 7.6. In order to compare the results of the different algorithms, the best known value $E_{best}$ of the quantization error and the relative errors $R_e$ of the algorithms are included in this table. The relative error is computed as

$$R_e = \frac{\bar{E} - E_{best}}{E_{best}} \cdot 100,$$

where $\bar{E}$ is the value of the quantization error (1.6) obtained by an algorithm.
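The relative error is straightforward to evaluate once the best known value is fixed. The short Python sketch below illustrates the computation; the function name `relative_error` is hypothetical, and the input values are the best known value for the Iris data set from Table 7.6 together with the final SOM error from Table 7.3, which is rounded to three significant digits.

```python
def relative_error(e_alg: float, e_best: float) -> float:
    """Relative error R_e (in percent) of an algorithm's quantization error."""
    if e_best <= 0:
        raise ValueError("e_best must be positive")
    return (e_alg - e_best) / e_best * 100.0


# Iris data set: E_best from Table 7.6, final SOM error from Table 7.3.
e_best_iris = 2.148e-1
e_som_iris = 2.86e-1

print(f"R_e(SOM, Iris) = {relative_error(e_som_iris, e_best_iris):.2f}")
# The algorithm that attains E_best has relative error zero by construction.
print(f"R_e(best, Iris) = {relative_error(e_best_iris, e_best_iris):.2f}")
```

Since the tabulated error is rounded, the value recomputed here is close to, but need not coincide exactly with, the corresponding entry of Table 7.6.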

Results presented in Table 7.6 demonstrate that the Modified SOM algorithm outperforms all other algorithms on all data sets used in this research. A dash indicates that the GHSOM algorithm failed to produce results on the large data sets NE and Pla85900. Results also demonstrate that the SOM is quite efficient on data sets with a small number of features (2 or 3). On small data sets (Iris Plant and Wine) GHSOM produced good results; however, it fails as the number of data points increases. Although the GG and GNG algorithms are not computationally expensive, their results are not satisfactory in comparison with the Modified SOM and, on some data sets, also in comparison with the classical SOM.

Table 7.6: Comparison of different algorithms.

                                    Re
Data set              Ebest         GG       GNG      SOM     GHSOM    MSOM
Fisher's Iris Plant   2.148×10^-1   27.27    18.58    33.33   14.04    0.00
Wine                  1.061×10^1    69.65    37.43    74.28   48.51    0.00
TSPLIB1060            2.991×10^2    7.27     25.68    7.36    53.88    0.00
Image Seg.            1.658×10^1    60.34    27.36    62.22   218.59   0.00
D15112                3.510×10^2    39.30    10.09    8.31    135.18   0.00
Gamma Telescope       2.855×10^1    16.67    10.14    16.76   130.99   0.00
NE                    1.032×10^-2   121.63   66.88    8.23    -        0.00
Pla85900              2.610×10^4    938.81   134.04   4.65    -        0.00

Note that the quantization error, E, is similar to the notion of the compactness error, which is used in [56] to express the quality of the clusters obtained.

In Figure 7.9, the values of E obtained by the Modified SOM are compared with those obtained by other algorithms on Iris and Wine data sets. On both data sets the Modified SOM starts with a value of E close to the value of E at the global solution and converges to the optimal value within the given number of iterations. Since the SOM is initialized randomly, it takes more time to converge. The initial grown neurons of the GHSOM are closer to the optimal solution than those generated by GG and GNG algorithms.

[Figure 7.9: Comparison of algorithms using E values. Panels: (a) Fisher's Iris Plant, (b) Wine. Horizontal axis: iteration; vertical axis: E. Curves: GG, GNG, SOM, GHSOM, MSOM.]

The notion of distinctness error is introduced in [56] and can be formulated as:

$$D = \sum_{w_i, w_j \in \Psi,\, i \neq j} \| w_i - w_j \|. \tag{7.29}$$

A larger value of $D$ means a better distribution of neurons. Results for the distinctness error on all data sets are presented in Table 7.7. This table includes the best value $D_{best}$ of $D$ obtained by the five algorithms and the relative error $R_D$ of the results obtained by these algorithms. The relative error $R_D$ is computed as follows:

$$R_D = \frac{D_{best} - \bar{D}}{\bar{D}} \cdot 100.$$

Here $\bar{D}$ is the value of the distinctness error obtained by an algorithm.
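The distinctness error (7.29) and its relative error can be sketched in Python as follows. This is a minimal illustration, not the original implementation: the function names are hypothetical, and the sum is taken over ordered pairs (i, j) with i ≠ j, which is one reading of the index set in (7.29). Summing over unordered pairs instead would halve D but leave $R_D$ unchanged, because the factor cancels in the ratio.

```python
import math
from itertools import permutations
from typing import Sequence

Vector = Sequence[float]


def distinctness(weights: Sequence[Vector]) -> float:
    """Distinctness error (7.29): sum of Euclidean distances over ordered neuron pairs, i != j."""
    return sum(math.dist(wi, wj) for wi, wj in permutations(weights, 2))


def relative_distinctness_error(d_alg: float, d_best: float) -> float:
    """Relative error R_D (in percent); zero for the algorithm that attains D_best."""
    if d_alg <= 0:
        raise ValueError("d_alg must be positive")
    return (d_best - d_alg) / d_alg * 100.0


# Toy example with three neurons on a line (illustrative values only).
spread_out = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
clumped = [(0.0, 0.0), (0.3, 0.4), (0.6, 0.8)]

d_best = max(distinctness(spread_out), distinctness(clumped))
print(distinctness(spread_out), distinctness(clumped))
print(relative_distinctness_error(distinctness(clumped), d_best))
```

Note that $R_D$ is normalized by $\bar{D}$ rather than by $D_{best}$, so it is not bounded by 100; this matches the large entries that appear in Table 7.7.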

Results from Table 7.7 demonstrate that the Modified SOM outperforms the other algorithms on all data sets except the Iris Plant data set, where GHSOM reached the maximum value of D. However, the GHSOM algorithm fails on the two large data sets NE and Pla85900 and performs poorly on the two-dimensional data sets TSPLIB1060 and D15112. On the other data sets this algorithm performs better than GG, GNG and SOM.

Table 7.7: Results for the distinctness error

                                    RD
Data set              Dbest         GG        GNG       SOM      GHSOM     MSOM
Fisher's Iris Plant   1.460×10^4    1479.67   857.19    179.23   0.00      43.44
Wine                  1.633×10^6    1436.65   120.54    128.91   35.64     0.00
TSPLIB1060            1.989×10^8    533.91    259.51    65.88    1082.92   0.00
Image Seg.            2.085×10^7    2764.31   1132.94   454.36   323.44    0.00
D15112                2.167×10^9    2216.38   384.15    221.13   1341.98   0.00
Gamma Telescope       6.050×10^7    2330.17   368.15    400.61   65.37     0.00
NE                    4.924×10^4    7676.32   242.36    113.17   -         0.00
Pla85900              6.144×10^10   1499.12   768.24    160.30   -         0.00

In order to demonstrate the time efficiency of the proposed algorithm in comparison with the other algorithms, the CPU time t required by each algorithm is reported in Table 7.8. The GHSOM is not efficient on large data sets. The Modified SOM converges faster than the other algorithms on three data sets: Iris, Wine and TSPLIB1060. On all other data sets the GG and GNG are faster than the Modified SOM. The only exception is the Pla85900 data set, where the Modified SOM is faster than the GG.

Figure 7.10 displays the visualization of the D15112 data set and the clusters in it obtained by the algorithms. Clusters are visualized using Voronoi diagrams. One can see that the Modified SOM identifies dense areas and generates more neurons in such areas more efficiently than all other algorithms. The GHSOM algorithm performs better than the other algorithms (except the Modified SOM) in identifying dense areas. However, it fails to generate more neurons in such areas in the given number of iterations. Notice that the SOM algorithm tries to distribute neurons uniformly over the data set, which is one of the drawbacks of the SOM [117].

Table 7.8: CPU time required by algorithms

                      t
Data set              GG      GNG     SOM      GHSOM      MSOM
Fisher's Iris Plant   0.44    0.31    0.25     2.06       0.08
Wine                  0.43    0.44    0.26     2.87       0.15
TSPLIB1060            1.06    0.62    2.48     62.88      0.48
Image Seg.            3.07    1.39    7.40     303.68     3.57
D15112                16.36   2.68    277.61   10848.36   20.40
Gamma Telescope       29.76   10.21   385.67   18363.03   37.13
NE                    16.69   9.77    74.00    -          21.05
Pla85900              62.27   11.58   42.18    -          38.07

7.6.2 Topology preservation

The comparison of topology preservation of the Modified SOM and the other algorithms on the TSPLIB1060 data set is presented in Figure 7.11. It can be observed that the Modified SOM spreads the neurons more efficiently than the other algorithms. This is supported by the error values reported in Tables 7.6-7.7. If one considers the white areas (where there is no input data) in Figure 7.11, one can see that the Modified SOM forces neurons to map the data accurately, whereas many neurons of the other algorithms are located in white areas. The topology of the Modified SOM is defined in order to prevent any attraction between neurons from different dense areas. This decreases the value of the quantization error E.

7.7 Summary

In this chapter, the Modified SOM (MSOM) algorithm is developed to solve large data visualization problems. The MSOM is novel in its initialization algorithm and its topology. The proposed algorithm is tested on 8 data sets ranging from small to very large. Furthermore, the MSOM is compared with SOM-based data visualization algorithms in terms of computational time and topology preservation.

[Figure 7.10 panels: (a) D15112, (b) MSOM, (c) GHSOM, (d) SOM, (e) GNG, (f) GG.]

[Figure 7.11: Topology preservation of algorithms in TSPLIB1060 data set (data points are in blue and neurons are in red color). Panels: (a) MSOM, (b) GHSOM, (c) SOM, (d) GNG, (e) GG.]

Chapter 8

Convolutional recursive modified SOM for handwritten digits recognition

8.1 Introduction

In this chapter, we present a semi-supervised tool for handwritten digit recognition using a Convolutional Structure of Recursive Modified SOM. The Modified SOM is presented in Chapter 7.