To date there has been some work in the literature on applying MOEAs to the clustering problem. A number of algorithms have been developed and implemented, but so far no study has established the best implementation of an MOEA for clustering.
Early attempts at applying genetic algorithms showed promise [11, 115]. The results of the genetic algorithm were better than those generated by the k-means algorithm on a single data set. A later attempt found similar results on larger data sets but was not practical for real-world use, as the algorithm took too long to execute [43]. Other studies performed in the same year used evolutionary algorithms as an initialisation procedure, both to determine an initial set of prototype centroids for classical clustering algorithms [92] and to cluster regions of images [131]. Later algorithms [90, 100, 133] used hybrid strategies that combined k-means with genetic algorithms to aid the search and found some promising results. An early comparative study [23] focussed on small data sets and a limited number of representations. Its findings indicated that these algorithms performed poorly because they took a long time to execute. The study showed that the choice of representation was important and that the objective function was a main contributing factor to the performance of the algorithm. A representation based on cluster labels appeared to work well. This representation has also been used in other algorithms since the study was published [96]. We described this representation in Section 4.4.2.
A number of experimental studies and literature reviews have focussed on applying MOEAs to the crisp clustering problem [63, 70, 123, 110]. Other studies have applied MOEAs to the fuzzy clustering problem with some success [105].
In 2004 Handl designed and implemented the Voronoi Initialised Evolutionary Nearest-Neighbour Algorithm (VIENNA) [56], which used an LBIE encoding in conjunction with the Pareto Envelope-based Selection Algorithm II (PESA-II) [27, 26]. The algorithm used objectives based upon Connectivity, which we described in Section 2.3.7. It did not have a crossover operator but used a mutation operator that moved several objects from one cluster to another.
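A mutation of this kind, acting on a label-based encoding, can be sketched as follows. This is only an illustration under stated assumptions and not VIENNA's actual operator: the function name `move_objects_mutation`, the parameter `n_move`, and the uniformly random choice of source cluster, target cluster and objects are all hypothetical.

```python
import random


def move_objects_mutation(labels, n_move, rng=None):
    """Return a mutated copy of a label-based clustering.

    labels[i] is the cluster label of object i. Up to `n_move` objects
    are moved from one randomly chosen cluster to a different one.
    The random selection strategy is an assumption made for illustration.
    """
    rng = rng or random.Random()
    clusters = sorted(set(labels))
    if len(clusters) < 2:
        return list(labels)  # only one cluster, so nothing can be moved

    # Pick two distinct clusters: objects leave `source` and join `target`.
    source, target = rng.sample(clusters, 2)
    members = [i for i, c in enumerate(labels) if c == source]
    moved = rng.sample(members, min(n_move, len(members)))

    child = list(labels)  # leave the parent solution untouched
    for i in moved:
        child[i] = target
    return child
```

Because whole groups of objects change cluster in one step, an operator of this shape can make larger moves through the search space than a mutation that relabels a single object.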
Handl later improved on VIENNA with MOCK (Multi-Objective Clustering with automatic determination of the number of clusters) [57, 58, 59, 62, 61]. MOCK introduced a novel adjacency-graph-based representation of a clustering solution that works well with Connectivity, and also introduced the uniform crossover operator. MOCK uses the Gap statistic [141] to select the solution that occurs at a ‘knee’ in the Pareto front as the final clustering solution. We have not implemented MOCK here as it has a very specific implementation of its representation. In 2005 Handl described MOCK-am [60]. This version of the algorithm was based upon MOCK but used an MBBE representation instead of the graph-based representation. Although this implementation ran faster, their later experiments continued to use the graph-based representation as it produced better clustering solutions.
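To make the graph-based representation concrete, the sketch below decodes an adjacency genotype into cluster labels and applies uniform crossover. It assumes the common locus-based adjacency convention, in which gene i holds the index of an object that object i is linked to and clusters are the connected components of the resulting graph. The function names `decode_adjacency` and `uniform_crossover` are hypothetical, and MOCK's own implementation may differ in detail.

```python
import random


def decode_adjacency(genotype):
    """Decode an adjacency genotype into cluster labels.

    genotype[i] = j is read as an undirected link between objects i and j
    (an assumed convention). Clusters are the connected components,
    found here with a small union-find structure.
    """
    n = len(genotype)
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in enumerate(genotype):
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j

    # Relabel components 0, 1, 2, ... in order of first appearance.
    component_label = {}
    return [component_label.setdefault(find(i), len(component_label))
            for i in range(n)]


def uniform_crossover(parent_a, parent_b, rng=None):
    """Build a child by taking each gene from either parent with equal chance."""
    rng = rng or random.Random()
    return [a if rng.random() < 0.5 else b
            for a, b in zip(parent_a, parent_b)]
```

For example, the genotype `[1, 0, 3, 2, 4]` decodes to the labels `[0, 0, 1, 1, 2]`: objects 0 and 1 link to each other, as do objects 2 and 3, and object 4 links only to itself. The number of clusters is therefore not fixed in advance but follows from the links. Uniform crossover suits this encoding because every gene of the child is still a valid link.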
MOCK has also inspired a number of other clustering algorithms. The graph-based representation is now being used for applications within social networking to identify groups of users [82, 37]. A number of other algorithms have also been devised that either extend or slightly modify MOCK. Chen introduced a variation of MOCK called MOEAD [20] that used NSGA-II instead of PESA-II and used the CBRE to represent the clustering solutions. Shirakawa [134] introduced another variant of MOCK that specialised in identifying regions in images. It was based on SPEA2 and introduced a modified version of Connectivity, called Edge, designed to identify the boundaries between regions of colour. Qian implemented MECEA [119], which is the same as MOCK except that it uses a novel technique for merging the Pareto solutions together to find edges in images.
In 2000 Maulik and Bandyopadhyay [102] introduced a genetic algorithm for clustering that represented clusters with a version of the Centroid Based Real Encoding described in Section 4.4.3. This implementation used a fixed number of clusters. Solutions were recombined using single-point crossover and mutated by multiplying individual centroid values by positive or negative factors drawn at random from a uniform distribution on [0, 1]. In 2007 they expanded upon this with a novel multi-objective algorithm for clustering, MOGA [5]. MOGA uses NSGA-II and is designed for detecting regions in satellite imagery. It differs from some of the other algorithms in that it identifies fuzzy clusters. The algorithm used two objectives, XB [149] and Jm. XB is a cluster quality measure for fuzzy clustering that uses the ratio between the total variation and the minimum separation of the clusters; it is similar to the measures for crisp clustering defined later by Halkidi, which we described in Section 2.3.4. The clustering solution with the highest value of the I index [103] was chosen as the final clustering solution. They later extended this algorithm with MOGA-SVM [111], which improved upon the previous version by using a novel technique in which an SVM [143] is combined with the results of NSGA-II to select the final clustering solution. MOGA-SVM has also been applied to bioinformatics problems [104]. The most recent version of this algorithm, MOVGA [112, 105], allows the number of clusters to vary. MOVGA uses updated versions of the objective functions and reverts to using the I index to select the final clustering solution.
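The two fuzzy objectives can be illustrated with a short sketch. It assumes that XB denotes the Xie-Beni index in its usual form, that Jm is the fuzzy c-means objective, and that both use squared Euclidean distance with a fuzzifier m = 2. The function names and the membership layout `u[i][k]` (membership of object k in cluster i) are choices made here for illustration; the exact definitions used by MOGA should be checked against [5] and [149].

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def j_m(data, centroids, u, m=2.0):
    """Fuzzy c-means objective: membership-weighted within-cluster variation."""
    return sum(
        (u[i][k] ** m) * sq_dist(x, c)
        for i, c in enumerate(centroids)
        for k, x in enumerate(data)
    )


def xie_beni(data, centroids, u, m=2.0):
    """Total variation divided by (n times the minimum centroid separation).

    Lower values indicate compact, well separated clusters.
    Requires at least two centroids at distinct positions.
    """
    min_separation = min(
        sq_dist(centroids[i], centroids[j])
        for i in range(len(centroids))
        for j in range(i + 1, len(centroids))
    )
    return j_m(data, centroids, u, m) / (len(data) * min_separation)
```

Under these assumptions the two objectives share the same numerator, so they are related but not equivalent. Jm can always be reduced by adding centroids, whereas XB penalises centroids that lie close together, which is one reason the pair can yield a useful trade-off.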
4.6 Summary
In this chapter we have introduced the concept of multi-criteria decision making, dominance and the Pareto front. We then reviewed Genetic Algorithms and then Multi-Objective Evolutionary Algorithms. We also introduced a range of techniques that allow us to assess the quality of Pareto fronts: the volume of the dominated space, coverage, generational distance, inverted generational distance and a measure of entropy.
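As a brief reminder of the central definitions, the sketch below tests dominance between two objective vectors and filters a set of solutions down to its non-dominated subset. It assumes that every objective is to be minimised, and the names `dominates` and `pareto_front` are hypothetical. The quadratic filter is adequate for illustration, but it is not the non-dominated sorting procedure used by NSGA-II.

```python
def dominates(a, b):
    """True if objective vector `a` dominates `b` (all objectives minimised).

    `a` must be no worse than `b` in every objective and strictly
    better in at least one.
    """
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(points):
    """Return the points that no other point in the set dominates."""
    return [p for p in points
            if not any(dominates(q, p) for q in points)]
```

For instance, `pareto_front([(1, 5), (2, 2), (5, 1), (3, 3)])` returns `[(1, 5), (2, 2), (5, 1)]`, because `(3, 3)` is dominated by `(2, 2)` while the remaining three points are mutually non-dominated.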
We then detailed some of the other work that has used MOEAs to attempt to solve clustering problems. In particular we detailed three possible representations of the clustering problem, together with a range of mutation and crossover operators that work with these representations. Later we will perform experimental evaluations of various combinations of these operators and representations.
A Novel Multi-Objective Evolutionary Clustering Algorithm
5.1 Introduction
Multi-Objective Evolutionary Algorithms (MOEAs) have good potential for cluster analysis. Clustering algorithms optimise specific measures of cluster quality, such as compactness and separation. Many clustering algorithms have been defined in the literature [76], and they generally aim to optimise a single objective. Unfortunately, defining what constitutes a good clustering solution remains a difficult problem, and no individual measure of clustering quality has emerged as the overall winner. In this context, MOEAs give us the opportunity to optimise several of these quality measures at once. Furthermore, they deliver a number of clustering solutions representing trade-offs between the different quality measures.
Previous research into evolutionary algorithms for clustering has been conducted by Cole [23], who explored various techniques for representing clustering solutions and various objectives to be optimised. Handl and Knowles [56] and Chen and Wang [20] have developed their own multi-objective clustering algorithms that operate with new cluster quality measures. These previous works have used different methods, such as a graph-based technique, to assign objects to clusters. Here we will use a new centroid-based technique to establish cluster membership.
Broadly speaking, an MOEA consists of the following components: a selection method; a strategy to manage the Pareto front; a fitness function; one or more crossover operators; and one or more mutation operators. This research re-uses the selection method and the Pareto front management strategy of NSGA-II, but the other three components (the fitness function and the crossover and mutation operators) are either new or new variations of existing operators introduced in this thesis.
In this chapter, we propose a new Multi-Objective Clustering Algorithm (MOCA) and evaluate its performance against the well-known k-means algorithm as an initial benchmark. In Section 5.2 we propose the new algorithm; in Section 5.3 we propose a method of assessing its quality; finally, we report our results in Section 5.4 and give our conclusions in Section 5.5.