
Figure 7.1.1 shows a population of 100 points comprising four bivariate normal distributions having means (±3, ±3) and unit variances (for both variables within each group); since the variances on both axes were equal, the data were effectively standardised. Two origins of coordinates (O1 and O2 in figure 7.1.1) were chosen to demonstrate those similarity measures which are origin dependent.

Figure 7.1.1. A 100-point 4-cluster population generated from 4 bivariate normal distributions. Two origins of coordinates were used, these being sited at O1 and O2 respectively. The starting classifications chosen for iterative relocation are indicated by: START 1 (four bad random points); START 2 (four good points); START 3 (the four optimum clusters, partitioned by the coordinate axes through O1).

In both cases, three initial classifications were used to start the iterative relocation procedure. These were for four clusters, described as follows:

START 1: 4 bad points, shown by * in figure 7.1.1, which were selected from the same bivariate normal distribution.

START 2: 4 good points, shown by © in figure 7.1.1, each selected from a different distribution. This can be described as a part-optimum initial solution.

START 3: The four optimum clusters, partitioned by the coordinate axes through O1 in figure 7.1.1. The intention of supplying the expected final result as starting solution is to expose unstable or badly defined similarity criteria.
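The three starting classifications can be illustrated with a minimal Python sketch. It assumes the means are (±3, ±3), that the four distributions contribute 25 points each, and that the central origin lies at (0, 0); the function names are hypothetical and the random seed is arbitrary, so the sample will not reproduce the population of figure 7.1.1.

```python
import random

def make_population(n_per_cluster=25, seed=0):
    """Sample four unit-variance bivariate normal clusters centred at (+/-3, +/-3)."""
    rng = random.Random(seed)
    means = [(-3.0, 3.0), (3.0, 3.0), (-3.0, -3.0), (3.0, -3.0)]
    points, labels = [], []
    for label, (mx, my) in enumerate(means):
        for _ in range(n_per_cluster):
            points.append((rng.gauss(mx, 1.0), rng.gauss(my, 1.0)))
            labels.append(label)
    return points, labels

def start_bad_points(labels, rng):
    """START 1: four seed points all drawn from the same distribution."""
    same = [i for i, lab in enumerate(labels) if lab == 0]
    return rng.sample(same, 4)

def start_good_points(labels, rng):
    """START 2: one seed point drawn from each distribution."""
    return [rng.choice([i for i, lab in enumerate(labels) if lab == k])
            for k in range(4)]

def start_optimum(points):
    """START 3: partition by the coordinate axes through the central origin."""
    return [(0 if y > 0 else 2) + (0 if x < 0 else 1) for x, y in points]
```

START 1 and START 2 return indices of seed points, whereas START 3 returns a complete classification of the population.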

Each of the twelve similarity criteria shown in Table 7.1.1 was submitted to iterative relocation using these six combinations of origin and starting solution; naturally, some final classifications were duplicated, and of the total of 72 tests 17 unique results were obtained. Twelve of these are shown in figure 7.1.2 using partition lines to demarcate cluster boundaries; four of the other five results were sufficiently random to preclude the drawing of partition lines, and the fifth comprised one cluster being the entire population. The 72 results are identified in Table 7.1.1, which also shows the number of iterations required before stability was reached. In two tests the maximum of 15 iterations was completed, so that the procedure terminated without reaching stability.

Figure 7.1.2. 12 different partitions of the data of figure 7.1.1 obtained during 6 tests to optimise 12 similarity criteria by iterative relocation. The number against each square is the frequency of the result in Table 7.1.1.

SIMILARITY              FINAL PARTITION                 ITERATIONS
CRITERION               O1 START      O2 START          O1 START      O2 START
                        1   2   3     1   2   3         1   2   3     1   2   3

Distance                a   a   a     a   a   a         6   1   0     6   1   0
Average Distance        a   a   a     a   a   a         4   1   0     4   1   0
Similarity Ratio        a   a   a     i   a   a         4   1   0     3   1   0
Error Sum of Squares    a   a   a     a   a   a         5   2   0     5   2   0
Variance                a   a   a     a   a   a        11   6   0    12   6   0
Cosine                  a   a   a     d   h   h         4   1   0     3   2   2
Nonmetric               R   R   R     a   a   a         U   8   U     6   1   0
Size Difference         b   b   b     b   b   b         2   3   2     2   3   2
Shape Difference        c   e   e     c   e   e         4   3   3     4   3   3
Dispersion              g   g   g     g   g   g         2   1   1     2   1   1
Correlation             g   g   R     g   g   f         4   4   4     2   2   5
Dot Product             l   a   a     -   k   j         3   1   0     2   3

Table 7.1.1. Summary of the results obtained using iterative relocation to optimise twelve different similarity criteria with the data of figure 7.1.1. The two columns of letters correspond to the twelve final partitions shown in figure 7.1.2, and on the right are the numbers of iterations required for convergence in each test. The data were clustered using three different starting solutions with reference to two origins of coordinates O1 and O2 (figure 7.1.1), making a total of six tests against each criterion. R denotes a grouping which was sufficiently random to preclude the drawing of partition lines; - denotes the dot product test in which only one final cluster was obtained; U specifies the two tests for which the nonmetric coefficient did not converge within the maximum of 15 iterations.

The conclusions are, to a certain extent, self-evident; they may be summarized as follows:

1) Distance, Average Distance, Similarity Ratio, Error Sum of Squares and Variance all perform satisfactorily, although Variance exhibits a certain instability.

2) Cosine and Nonmetric are origin dependent. It is worth noting that the Nonmetric coefficient is often used with unstandardised all-positive scores (e.g. binary 1/0 data, or species frequencies in stands); the satisfactory performance of this coefficient using origin O2 therefore accounts for its successful usage (Lance and Williams, 1966b).

3) Dispersion, Correlation and Dot Product are very unsatisfactory, and their further use is not recommended.

4) Size Difference and Shape Difference produce interesting results, although their value can probably be questioned. Both coefficients are origin independent, and the resulting elongated clusters could be regarded as symptomatic of the need to eliminate such internal factors of variation as shape and size, respectively.

5) As expected, the part-optimum starting solution (START 2) yields faster convergence than the random initial classification (START 1), especially with the first five coefficients in Table 7.1.1.
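The dependence of the iteration count on the starting classification can be illustrated with a small sketch. It is not the relocation program used for these tests: it assumes the distance criterion only, reassigns all points in one batch per pass (a Lloyd-style k-means stand-in), and counts a pass as an iteration only if at least one point changed cluster. The function names `centroids` and `relocate` are hypothetical.

```python
def centroids(points, labels, k):
    """Mean position of each cluster; None for an empty cluster."""
    sums = [[0.0, 0.0, 0] for _ in range(k)]
    for (x, y), lab in zip(points, labels):
        sums[lab][0] += x
        sums[lab][1] += y
        sums[lab][2] += 1
    return [(sx / n, sy / n) if n else None for sx, sy, n in sums]

def relocate(points, labels, k, max_iter=15):
    """Return (final labels, relocating passes, converged flag)."""
    labels = list(labels)
    for passes in range(max_iter):
        cents = centroids(points, labels, k)
        occupied = [c for c in range(k) if cents[c] is not None]
        new = [min(occupied,
                   key=lambda c: (x - cents[c][0]) ** 2 + (y - cents[c][1]) ** 2)
               for x, y in points]
        if new == labels:          # no point moved: the partition is stable
            return labels, passes, True
        labels = new
    return labels, max_iter, False  # corresponds to an entry U in the tables

pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
print(relocate(pts, [0, 0, 1, 1], 2))  # optimum start: stable after 0 passes
print(relocate(pts, [0, 1, 1, 1], 2))  # poor start: one relocating pass
```

Supplying the optimum partition as the start returns zero passes, which mirrors the zeros in the START 3 columns for the well-behaved criteria.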

Large Populations

then they must also be duplicated with large populations. Figure 7.1.3 shows a population of 800 points generated from the same model as figure 7.1.1; thus each cluster increases in size by a factor of 8. Figure 7.1.3 also shows the four points which constitute the random starting solution (START 1) with these data. The part-optimum starting classification (START 2) was replaced in this case with the worst population-partition which could be devised (shown in figure 7.1.4); every fourth point was allocated to the same cluster, and since the distributions were generated in blocks of 200 points numbered sequentially, each starting cluster contained 1/4 of each final cluster. Table 7.1.2 shows the results of iterative relocation using the three starting solutions with the origin at the intersection of the coordinate axes.
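The composition of this worst-case starting partition can be checked with a few lines of Python. The sketch assumes only what is stated above: 800 points numbered sequentially in four blocks of 200, with every fourth point allocated to the same starting cluster. The variable names are illustrative.

```python
from collections import Counter

n_clusters, block = 4, 200
n_points = n_clusters * block

# Generating distribution of each point: sequential blocks of 200.
true_cluster = [i // block for i in range(n_points)]
# Starting cluster of each point: every fourth point shares a cluster.
start_cluster = [i % n_clusters for i in range(n_points)]

# Count the points shared by each (starting cluster, final cluster) pair.
overlap = Counter(zip(start_cluster, true_cluster))
print(sorted(set(overlap.values())))  # every pair shares 50 points
```

Each starting cluster of 200 points therefore draws 50 points, one quarter, from each of the four generating distributions.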

These tests confirm that Variance is unreliable (failing to converge within 15 iterations, except with the optimum result as starting solution). Distance, Average Distance, Similarity Ratio and the Error Sum of Squares all perform satisfactorily, although the Error Sum of Squares finds an unstable local optimum result (i) with the random starting solution. A rather unexpected finding is that the 'worst possible' population-partition of figure 7.1.4 yields a much faster convergence than the four random points (figure 7.1.3). It seems, therefore, that in the absence of a suitable part-optimum starting solution, a random population-partition is probably a better initial classification than k random points.

Figure 7.1.3. An 800-point 4-cluster distribution generated from the same distribution as figure 7.1.1. ○ denotes the 4 bad random points used as starting solution with iterative relocation to optimise the 5 well-conditioned similarity criteria listed in Table 7.1.2. The origin of coordinates is at the intersection of the axes, which serve to partition the population into the optimum 4-cluster solution.

Figure 7.1.4. A bad random population-partition of the data of figure 7.1.3 into four clusters. Each digit is the code of the cluster which has been allocated the corresponding point of figure 7.1.3.

SIMILARITY              STARTING CLASSIFICATION
CRITERION               4 BAD POINTS    4 BAD CLUSTERS    OPTIMUM RESULT
                        (7.1.3)         (7.1.4)           (7.1.3)

Distance                a   7           a   2             a   0
Average Distance        a   5           a   2             a   0
Similarity Ratio        a   5           a   2             a   0
Error Sum of Squares    i   4           a   2             a   0
Variance                a   U           a   U             a   1

Table 7.1.2. Summary of the results for iterative relocation using the data of figure 7.1.3 to optimise 5 similarity criteria for 4 clusters. Each similarity measure was tested using three starting conditions: 4 BAD RANDOM POINTS (shown in figure 7.1.3); 4 BAD CLUSTERS (shown in figure 7.1.4); and the OPTIMUM RESULT (indicated by the partition of the coordinate axes in figure 7.1.3). The type (a or i) of result is illustrated in figure 7.1.2; note the single convergence of the error sum of squares to a sub-optimum solution (type i). The figures are the numbers of iterations required for convergence (U denotes no convergence after 15 iterations).

Consistency of Results

One criticism of the tests so far used is that the populations contain four very distinct clusters, a situation seldom found in 'real' data. It is quite possible that if these clusters had been overlapping to a greater extent, then the results for the four successful similarity measures would not have been so good. To check the consistency of the similarity measures in finding an obscure classification, the iterative relocation procedure was used to partition a unimodal bivariate normal distribution, shown in figure 7.1.5. The starting solution of two points (ringed in figure 7.1.5) generated the final partition shown by the partition line of figure 7.1.5, for which the error sum of squares had the value 140.83. Figure 7.1.6 shows a random population-partition used as starting solution, together with its final 2-cluster partition; in the latter case, the value of the error sum of squares was 137.18, a slight improvement on the previous result.
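The comparison of the two final partitions rests on the error sum of squares, taken here in its usual sense as the total squared Euclidean distance of the points from their own cluster centroids. A minimal sketch follows; the function name and the four-point data set are illustrative only and are unrelated to the distributions of figures 7.1.5 and 7.1.6.

```python
def error_sum_of_squares(points, labels):
    """Total squared distance of each point from the centroid of its cluster."""
    groups = {}
    for point, lab in zip(points, labels):
        groups.setdefault(lab, []).append(point)
    total = 0.0
    for members in groups.values():
        cx = sum(x for x, _ in members) / len(members)
        cy = sum(y for _, y in members) / len(members)
        total += sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in members)
    return total

square = [(0.0, 0.0), (0.0, 2.0), (4.0, 0.0), (4.0, 2.0)]
print(error_sum_of_squares(square, [0, 0, 1, 1]))  # 4.0: split across the long side
print(error_sum_of_squares(square, [0, 1, 0, 1]))  # 16.0: split across the short side
```

The partition with the smaller value is preferred, which is the sense in which 137.18 improves on 140.83.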

These two very different results were duplicated in each test of all four similarity criteria, shown below. The difference in the partitions is very distinct, and it is probable that other unique results could have been obtained by carefully choosing pairs of points to act as initial cluster centres. The number of iterations required for convergence of the starting solution of figure 7.1.5 (two random points), and the population-partition (figure 7.1.6) with each test were as follows:

                        7.1.5 (2 points)    7.1.6 (population-partition)

Distance                3                   7
Similarity Ratio        3                   3
Average Distance        4                   8
Error Sum of Squares    3                   3

It is noticeable that the population-partition takes longer to converge than the 2 random points. This finding is exactly the reverse of the 4 cluster test (previous paragraph), but since the population-partition appears to yield a slightly better final classification the conclusion and preference still hold. The tests also point to a certain lack of consistency in the iterative relocation method under different initial classifications.

Figure 7.1.5. A standardised unimodal bivariate normal distribution showing two points (indicated by #) which were used as initial centres for iterative relocation to optimise four similarity criteria (the first four in Table 7.1.1) at the two cluster level. The partition line indicates the final classification obtained with all four measures, and the error sum of squares for this grouping was 140.83.

Figure 7.1.6. The random population-partition of the distribution in figure 7.1.5 used as starting classification with iterative relocation at the 2 cluster level. The same final partition was obtained with all four tested similarity measures and the error sum of squares for this grouping was 137.18, suggesting that it improved the previous result shown in figure 7.1.5.

Elongated Clusters

One general demand that can reasonably be made of a classification method is that if there exists a data set having, say, 4 well-defined clusters (such as those in figure 7.1.3) which the method finds, and if we remove two of these clusters then the method should also successfully find the two remaining clusters. To test the iterative relocation procedure in this way, two 100-point bivariate normal distributions were generated with means (±3, 3) and unit variances; that is, from the same model as that used to generate the two upper clusters in figure 7.1.3. After standardisation, the clusters are seen to be elongated parallel to the y-axis (figure 7.1.7). At this stage it was thought sufficient to test the distance criterion with iterative relocation (i.e. the k-mean system), since the other three satisfactory similarity measures have previously behaved almost identically. Ten 2-cluster starting solutions were devised, as follows: five variants of two initial bad points were obtained by selecting two random individuals from the same elongated cluster; four variants of two good points were defined by two individuals, one from each of the elongated clusters; the tenth starting solution was the random population-partition derived by allocating every other point in both of the elongated clusters to the same starting group.

Figure 7.1.7. An example of two parallel clusters elongated through standardisation, and the two final partitions obtained using iterative relocation to optimise distance (the k-mean system). Lines join pairs of points chosen as starting solutions for which the two stable partitions A and B were derived. The error sum of squares in each case was: A (113.9) and B (…).
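The elongation produced by standardisation can be reproduced numerically. The sketch below assumes cluster means of (±3, 3), unit variances, and standardisation of each variable to zero mean and unit standard deviation over the whole 200-point population; the seed and the function name are arbitrary. Under these assumptions the population variance of x is about 1 + 9 = 10 while that of y stays near 1, so the within-cluster spread along x shrinks to roughly 1/sqrt(10) of the spread along y.

```python
import random
import statistics

def elongation_after_standardising(n=100, seed=1):
    """Within-cluster standard deviations (x, y) of one cluster after standardising."""
    rng = random.Random(seed)
    xs, ys = [], []
    for mean_x in (-3.0, 3.0):           # two clusters side by side
        for _ in range(n):
            xs.append(rng.gauss(mean_x, 1.0))
            ys.append(rng.gauss(3.0, 1.0))

    def standardise(values):
        mean = statistics.mean(values)
        sd = statistics.pstdev(values)   # spread of the whole population
        return [(v - mean) / sd for v in values]

    zx, zy = standardise(xs), standardise(ys)
    # The first n points form the left-hand cluster.
    return statistics.pstdev(zx[:n]), statistics.pstdev(zy[:n])

spread_x, spread_y = elongation_after_standardising()
print(round(spread_x, 2), round(spread_y, 2))
```

The much smaller spread along x is what makes each cluster appear elongated parallel to the y-axis.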

Every test converged to one of two stable solutions, shown by the partition lines of figures 7.1.7A and 7.1.7B, respectively. The random population-partition produced the preferred result (figure 7.1.7A), and the final solutions are shown with their corresponding starting solutions indicated on each diagram of figure 7.1.7 by joining the two initial cluster centres with lines.

These results suggest that the iterative relocation procedure does not reliably find elongated clusters, and they support the previous criticisms (Sect. 6.1) that 'minimum-variance' techniques are inclined to force spherical clusters and may derive partitions which cut across dense swarms of points (viz. figure 7.1.7B). Once again, the random population-partition appears to be preferable to the choice of random points or individuals for the initial classification.

7.2 HIERARCHICAL MODE ANALYSIS

When the distribution of figure 7.1.7 was subjected to hierarchical Mode analysis it was found that the preferred partition (A) was obtained during each of 8 trials. However, this encouraging result was slightly offset by the generation of other lower level classifications. Using the average distance as density estimate (Sect. 6.6), values of k (the density parameter) up to 10 were tried, and it was found that the method recognised six basic clusters (shown in figure 7.2.1). Generally speaking, the higher the value of k the fewer were the number of clusters (indeed, for k > 8 only the 3 and 2 cluster levels were obtained). All of the unique results are shown by the dendrograms of figure 7.2.2, where the cluster codes (1-6) correspond to the partitions in figure 7.2.1. It is noticeable that a very large jump occurs prior to the 2-cluster grouping in each case, indicating stable separation of the two elongated clusters.

Figure 7.2.1. The six-cluster partition of the data of figure 7.1.7 obtained using hierarchical Mode analysis. The optimum 2-cluster solution (1+2+3) and (4+5+6) was found eventually during all of the tests, and different combinations of these six sectional groups were obtained by varying the density parameter k (see also figure 7.2.2).

The question now arises: should hierarchical Mode analysis only recognise the two elongated clusters, or is it reasonable to