4.5.1 Sensitivity Metric
To better evaluate the results attained, a Sensitivity Index (SI) is used to determine the sensitivity of each parameter. The Sensitivity Index introduced by Hoffman and Gardner calculates the output difference when varying an input parameter from its minimum value to its maximum value [Hoffman and Gardner, 1983]. Here, the SI is calculated by averaging the absolute differences in result between the optimal parameter value and its minimum and maximum values, as per table (4.6). With Best, Low and High denoting the F-measures obtained at the optimal, lowest and highest tested values of a parameter, SI is calculated as per equation (4.1):
$$ SI = \frac{|\mathit{Best} - \mathit{Low}| + |\mathit{Best} - \mathit{High}|}{2} \tag{4.1} $$
The higher the SI value, the greater the impact the parameter has on the overall model. The experiments that follow combine the results attained from these baseline experiments with results from 50 execution runs for each tested parameter.
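As a minimal sketch of equation (4.1), the function below computes the SI from three F-measures. The function name `sensitivity_index` and its argument names are illustrative assumptions rather than part of the MPACA implementation. The worked example uses the Iris "Edge" row of table (4.6), which gives approximately 0.245, reported as 0.25 in the table.

```python
def sensitivity_index(best: float, low: float, high: float) -> float:
    """Mean absolute F-measure change between the best setting and each extreme.

    best, low, high are the F-measures obtained at the optimal, lowest and
    highest tested values of one parameter (equation 4.1).
    """
    return (abs(best - low) + abs(best - high)) / 2


# Iris "Edge" row of table (4.6): F-measures at the best, low and high settings.
si_edge = sensitivity_index(best=0.84, low=0.42, high=0.77)
print(f"Edge SI (Iris): {si_edge:.3f}")
```

Because the published F-measures are already rounded to two decimals, an SI recomputed this way may differ slightly in the last digit from one computed on the unrounded results.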
Two important issues for clustering algorithms are the number of parameters and how to find the best-fitting values for different datasets. Too many parameters create an unnecessarily large search space for optimal settings and can lead to over-fitting of the training data. It is therefore useful to know how important the parameters are for the MPACA and how to limit the search space when applying it to different datasets. Tables (4.2) and (4.3) show that the best-fitting values for each parameter across a wide variety of datasets have low standard deviations, suggesting that they affect results consistently across several runs. Furthermore, the ranges are comparable across the datasets, which means the starting points in the parameter search space can be confidently assumed to be in the region of the eventual optimal settings. This helps ensure that the optimal settings can be obtained accurately and within an acceptable time-frame.
Table (4.6) explores the problem in a little more detail, using two of the datasets: Iris and Wine, with dimensionality of four and thirteen respectively. As the earlier analysis showed, many of the parameters are very tightly constrained, such as the edge length and step size. In fact, table (4.6) shows that most of them have optimal settings across all datasets that lie within a narrow range, making it easy to initialise them and learn the best fits. This includes those that have a high impact on performance, as indicated by the sensitivity values. This is helpful because it ensures the parameter search does not have to traverse a large range, which would increase the chances of suboptimal settings. The sensitivity analysis, combined with an understanding of the consistency of optimal settings, suggests that the number of parameters within the MPACA will not undermine its usefulness for clustering different types of datasets.

Iris dataset
Parameter     Best value     Low (F-Measure)   High (F-Measure)   Sensitivity Index (%)
Edge          8 (0.84)       5 (0.42)          14 (0.77)          0.25
Step-size     0.1 (0.83)     0.1 (0.83)        0.4 (0.53)         0.15
Complement    2 (0.83)       1 (0.83)          5 (0.76)           0.04
Range         2 (0.83)       1 (0.83)          5 (0.70)           0.07
Ph. Qty.      250 (0.84)     25 (0.74)         500 (0.55)         0.20
Evaporation   0.07 (0.83)    0.01 (0.69)       0.15 (0.71)        0.13
Coefficient   2 (0.84)       1 (0.83)          5 (0.62)           0.12
Residual      2 (0.84)       0 (0.76)          10 (0.66)          0.13
Feature       4 (0.84)       2 (0.71)          6 (0.71)           0.13
Colony        4 (0.84)       2 (0.66)          6 (0.68)           0.17
Visibility    4 (0.84)       1 (0.59)          8 (0.68)           0.21
Time-window   75 (0.83)      25 (0.43)         500 (0.59)         0.32

Wine dataset
Parameter     Best value     Low (F-Measure)   High (F-Measure)   Sensitivity Index (%)
Edge          8 (0.86)       5 (0.66)          11 (0.67)          0.20
Step-size     0.1 (0.86)     0.1 (0.86)        0.4 (0.50)         0.18
Complement    2 (0.87)       1 (0.86)          4 (0.84)           0.02
Range         2 (0.87)       1 (0.86)          5 (0.64)           0.12
Ph. Qty.      175 (0.86)     25 (0.65)         350 (0.63)         0.22
Evaporation   0.07 (0.86)    0.01 (0.66)       0.15 (0.64)        0.21
Coefficient   2 (0.86)       1 (0.84)          5 (0.60)           0.14
Residual      2 (0.87)       0 (0.79)          10 (0.61)          0.17
Feature       4 (0.86)       2 (0.75)          6 (0.72)           0.13
Colony        4 (0.91)       2 (0.74)          6 (0.71)           0.19
Visibility    4 (0.86)       1 (0.48)          8 (0.56)           0.34
Time-window   60 (0.86)      25 (0.52)         500 (0.69)         0.26

TABLE 4.6: Parameter sensitivity analysis as applied to the Iris and Wine datasets. Columns give the parameter, its best value, the starting (low) value, the highest value, and the sensitivity measure. Each parameter value is accompanied by the F-measure of the overall model in brackets. The best-performing or optimal values for each parameter are extracted from the baseline experiments presented in table (4.2). Low and high values are executed over a set of fixed parameters for 50 instances each.
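To see at a glance which parameters matter most, the sensitivity values reported in table (4.6) can be ranked. The sketch below copies the SI column of the table into a dictionary and sorts it; the names `SI_TABLE` and `rank_by_sensitivity` are illustrative assumptions, and the snippet is not part of the MPACA itself.

```python
# Sensitivity Index values copied from table (4.6).
SI_TABLE = {
    "Iris": {
        "Edge": 0.25, "Step-size": 0.15, "Complement": 0.04, "Range": 0.07,
        "Ph. Qty.": 0.20, "Evaporation": 0.13, "Coefficient": 0.12,
        "Residual": 0.13, "Feature": 0.13, "Colony": 0.17,
        "Visibility": 0.21, "Time-window": 0.32,
    },
    "Wine": {
        "Edge": 0.20, "Step-size": 0.18, "Complement": 0.02, "Range": 0.12,
        "Ph. Qty.": 0.22, "Evaporation": 0.21, "Coefficient": 0.14,
        "Residual": 0.17, "Feature": 0.13, "Colony": 0.19,
        "Visibility": 0.34, "Time-window": 0.26,
    },
}


def rank_by_sensitivity(si_by_parameter):
    """Return (parameter, SI) pairs sorted from most to least sensitive."""
    return sorted(si_by_parameter.items(), key=lambda item: item[1], reverse=True)


for dataset, si_values in SI_TABLE.items():
    ranking = rank_by_sensitivity(si_values)
    top_three = [name for name, _ in ranking[:3]]
    least = ranking[-1][0]
    print(f"{dataset}: most sensitive {top_three}, least sensitive {least}")
```

On these figures, Time-window and Visibility rank among the three most sensitive parameters for both datasets, while Complement is the least sensitive in each case.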
4.5.2 Pheromone Driven versus a Random Model
A key question is to what extent pheromone is important, and whether a pheromone-driven search is more powerful than a random search. Unfortunately, ant algorithms in the literature are usually not compared against a randomly driven approach. By analysing the influence of the pheromone factor, it is possible to determine the direct effect, or lack of it, that this has on final clustering results.

Dataset   Parameter   Value 1 (F-Measure)   Value 2 (F-Measure)   Average (F-Measure)
Iris      Ph. Qty.    0 (0.59)              25 (0.74)             250 (0.84)
Wine      Ph. Qty.    0 (0.41)              25 (0.65)             175 (0.86)

TABLE 4.7: Pheromone Driven versus a Random Model applied to the Iris and Wine datasets. Columns indicate the dataset, the parameter under discussion, value 1 (no pheromone), value 2 (pheromone set to a higher value), and the average amount of pheromone. Each value is accompanied by its F-Measure in brackets. The best-performing or optimal values for this parameter are the average best fits extracted from the baseline experiments presented in table (4.3).
A value of zero indicates purely random ant interaction. This is compared with the actual minimum value, derived from the earlier theoretical analysis, which established that the minimum quantity to be deposited is 25 units. Empirical analysis presented in table (4.2) shows that the average parameter value for the Iris and Wine datasets is 250 and 175 units respectively. The optimal amount of pheromone deposited varies with dataset size.
Results in table (4.7) demonstrate that when no pheromone is present, cluster quality is very low, with F-measures of 0.59 for Iris and 0.41 for Wine. As the amount of pheromone used is increased, the cluster quality increases. Even a small amount, in this case 25 units, immediately impacts the clustering process, raising the F-measures to 0.74 and 0.65 respectively. This demonstrates that clustering results are superior when pheromone is introduced, and underlines the importance of the stigmergic effect of pheromone.
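The size of this effect can be read directly from table (4.7) by subtracting the zero-pheromone F-measure from each pheromone-driven result. The sketch below does only that arithmetic on the published figures; the names `PHEROMONE_RESULTS` and `gain_over_random` are illustrative assumptions.

```python
# (pheromone quantity, F-measure) pairs copied from table (4.7).
# The first entry of each list is the zero-pheromone (random) baseline.
PHEROMONE_RESULTS = {
    "Iris": [(0, 0.59), (25, 0.74), (250, 0.84)],
    "Wine": [(0, 0.41), (25, 0.65), (175, 0.86)],
}


def gain_over_random(rows):
    """Return (quantity, F-measure gain) relative to the zero-pheromone row."""
    baseline = rows[0][1]
    return [(quantity, round(f_measure - baseline, 2)) for quantity, f_measure in rows[1:]]


for dataset, rows in PHEROMONE_RESULTS.items():
    for quantity, gain in gain_over_random(rows):
        print(f"{dataset}: {quantity} units -> F-measure gain of {gain:.2f}")
```

On these figures, the minimum deposit of 25 units already accounts for more than half of the total gain over the random model in both datasets.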