The need to define what is meant by performance, and to understand how algorithms perform, arises from the desire to design algorithms and to gain insight into how well they work in practice, which in turn raises the question of how their performance is to be measured. Performance metrics enable the comparison of algorithms and provide input into the design process, which may also include producing estimates of time and computational complexity and deciding how the algorithm should terminate.
Fundamentally, performance comprises both the quality of the solutions produced and the effort (CPU time) or elapsed time required to obtain them (Fonseca and Fleming, 1996). Effort is naturally a reflection of the time complexity of the algorithm running on the problem itself, usually expressed in ‘Big O’ notation, in which the time needed to run the algorithm is a function of the size of its input; O(n), for example, indicates a linear relationship (Garey and Johnson, 1979). Other considerations include computational complexity (a measure of the theoretical difficulty of the problem) and resource usage, such as memory and possibly disk space.
The quality of the solutions achieved essentially means their objective function values in the final generation or iteration. If the global optima are known a priori, quality can be expressed as the closeness of the solutions to those optima.
A necessary step in defining performance is to distinguish between the utility or cost function, more generally termed the objective function (OF), and the concept of the fitness of a solution. The optimisation problem is defined by the specification of the OF, whereas fitness is a measure of the desirability of a candidate solution at a given point in the execution of the algorithm. While the two values may be equal in some single-objective algorithms, the distinction is necessary when considering algorithms in which the assessment of a given solution depends upon a vector of objective values (Fonseca and Fleming, 1997), and in MOOPs this is the case by definition.
For MOOPs, then, ‘performance’ encompasses a number of considerations, foremost of which are those related directly to the Pareto optimal (PFopt) front, the set of globally optimal solutions. Convergence to the PFopt front, in terms of both the ability to reach it and the rate of progress towards it, is therefore a basic concern. The other primary concern is the diversity and spread of solutions along the PFopt front, or along the locally optimal approximation sets prior to convergence, since it is desirable, in the absence of the knowledge about the possible solution set that a domain expert might have, to obtain results from across the width of the (hyperplane of the) front (Fleming and Purshouse, 2001). The subsidiary concern of maintaining a degree of lateral diversity of solutions across objective space, in which solutions are spread perpendicular to the PFopt front, can be seen as a putative mechanism of performance improvement rather than as a direct measure of performance itself. Both kinds of diversity are important to the performance of a GA, as potentially good solutions should not be lost where this can be avoided (Laumanns et al., 2001). Diversity of solutions helps to prevent premature convergence in the earlier stages of the process, and ameliorates the tendency to search in non-productive directions later.
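The dependence of a solution's assessment on a vector of objective values can be illustrated with a minimal sketch of Pareto dominance and of filtering a set of points down to its non-dominated subset. The sketch assumes that every objective is to be minimised; the function names dominates and non_dominated are illustrative only and are not taken from any of the cited works.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if `a` Pareto-dominates `b`.

    Assumption: all objectives are minimised. `a` dominates `b` when it is
    no worse in every objective and strictly better in at least one.
    """
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def non_dominated(points: Sequence[Point]) -> List[Point]:
    """Keep the points that no other point in the set dominates.

    This is a simple O(n^2) filter, adequate for illustration only.
    """
    return [p for p in points if not any(dominates(q, p) for q in points)]


if __name__ == "__main__":
    candidates = [(1, 3), (2, 2), (3, 1), (3, 3), (2, 4)]
    # (3, 3) is dominated by (2, 2) and (2, 4) by (1, 3).
    print(non_dominated(candidates))
```

The resulting set is only an approximation front for the points supplied; whether it coincides with the PFopt front cannot be determined from the filter alone.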
The PFopt front may not be known, so whether maximal convergence has been achieved is not always knowable. In that case, performance in this regard can only be measured against an alternative known reference, such as an existing approximation front or a pre-defined set, typically of low-quality points in objective space.
There are a variety of quantitative metrics for determining the spread or density of solutions in objective space. These are required both as a quality measure of the algorithm's final solution set (the wider and more even the spread, the better) and to help the algorithm assign a fitness to a solution while it is running, since it is usually desirable for an algorithm to remove one or more solutions from a densely clustered location and to allow more solutions at sparsely populated locations.
A density/crowding/sparsity metric may require finding the distance between two or more given solutions in objective space, and there are a number of ways in which this can be calculated, for example the Manhattan, Euclidean, Chebyshev and Minkowski distances, among others. Normalisation of the objective function values may be necessary where OFs are scaled differently, so that the distances computed for one dimension do not swamp those of another. These values may then be used to define a region in which other solutions are to be counted or, for example, summed for a given solution over the distances of all other solutions to it (Deb et al., 2000). Diversity of solutions, however, comes at the price of increased computational expense (Deb et al., 2003). The fitness of a solution can then be derived as a function of both its quality and its proximity to other solutions, either through a sharing scheme (Fonseca and Fleming, 1995a), in which fitness is degraded as a function of higher population density, or through the secondary application of density as a criterion when ranking would otherwise be the same.
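The normalisation and distance calculations described above can be sketched as follows. This is a minimal illustration, assuming min-max scaling of each objective to [0, 1] and a sparsity score formed by summing the distances from a solution to all others; the function names normalise, minkowski and sparsity, and the choice of min-max scaling, are assumptions made for the example rather than the method of any cited work.

```python
from typing import List, Sequence


def normalise(objs: Sequence[Sequence[float]]) -> List[List[float]]:
    """Min-max scale each objective (column) to the range [0, 1].

    Assumption: min-max scaling is used; other schemes are equally possible.
    """
    cols = list(zip(*objs))
    lows = [min(c) for c in cols]
    # A constant objective has zero range; use 1.0 to avoid division by zero.
    spans = [(max(c) - min(c)) or 1.0 for c in cols]
    return [[(v - lo) / s for v, lo, s in zip(row, lows, spans)] for row in objs]


def minkowski(a: Sequence[float], b: Sequence[float], p: float = 2.0) -> float:
    """Minkowski distance: p=1 is Manhattan, p=2 Euclidean, p=inf Chebyshev."""
    diffs = [abs(x - y) for x, y in zip(a, b)]
    if p == float("inf"):
        return max(diffs)
    return sum(d ** p for d in diffs) ** (1.0 / p)


def sparsity(objs: Sequence[Sequence[float]], p: float = 2.0) -> List[float]:
    """Sum of distances from each solution to every other solution.

    Distances are taken in normalised objective space, so that a widely
    scaled objective does not swamp the others. Larger values indicate a
    more isolated solution.
    """
    pts = normalise(objs)
    return [
        sum(minkowski(a, b, p) for j, b in enumerate(pts) if j != i)
        for i, a in enumerate(pts)
    ]


if __name__ == "__main__":
    # The second objective is scaled roughly 1000 times larger than the first.
    population = [[0.0, 1000.0], [1.0, 0.0], [0.5, 500.0]]
    print(sparsity(population, p=1))
```

In this example the middle solution receives the smallest score, marking it as the most crowded and therefore the likeliest candidate for removal or for fitness degradation under a density-based scheme.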
Measures of convergence are calculated for the approximation front(s), in which diversity of solutions may be taken into account. Such metrics include:
Hypervolume (Zitzler and Thiele, 1998), originally described as “the size of the space covered”, is a measure of how much space (area or hypervolume, depending on the dimensionality of the objective space) is dominated by an approximation front; it is thus an indicator not only of convergence but also of the breadth of the front.
(Unary) ε-indicator (Zitzler et al., 2003) is a measure of the minimum distance of translation needed to move every solution in the discovered front so that the front weakly dominates the most converged front found; it is thus an intuitive measure of Pareto dominance.
R indicators (Hansen and Jaszkiewicz, 1998) are three different but related metrics that provide an assessment of the difference between approximation fronts, but do not work in all cases. R1 estimates the number of times the solutions of one front are better than those of the other. R2 is a graded estimate of the superiority of one front over the other, and R3 indicates the proportion by which one front is better than the other. They require a set of utility functions together with assumed probabilities of their occurrence, and a numerical integration technique to evaluate the integral. Deb and Jain (2002) suggest a weighted set of Tchebycheff metrics for the utility functions.
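Two of the metrics listed above can be sketched briefly for the two-objective case. The sketch assumes that both objectives are minimised, that the hypervolume is bounded by a user-chosen reference point, and that the ε-indicator is taken in its additive form against a reference front; the function names hypervolume_2d and additive_epsilon are illustrative and do not come from the cited papers.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]


def hypervolume_2d(front: Sequence[Point], ref: Point) -> float:
    """Area dominated by `front` and bounded by the reference point `ref`.

    Assumptions: two objectives, both minimised; `ref` is supplied by the
    user and should be worse than every point of interest.
    """
    # Points that are not strictly better than `ref` contribute no area.
    pts = sorted(p for p in front if p[0] < ref[0] and p[1] < ref[1])
    area = 0.0
    best_f2 = ref[1]  # lowest second-objective value seen so far
    for f1, f2 in pts:  # sweep in order of increasing first objective
        if f2 < best_f2:  # dominated points add nothing and are skipped
            area += (ref[0] - f1) * (best_f2 - f2)
            best_f2 = f2
    return area


def additive_epsilon(front: Sequence[Point], reference: Sequence[Point]) -> float:
    """Smallest shift eps such that `front`, with eps subtracted from every
    objective, weakly dominates every point of `reference` (minimisation).
    """
    return max(
        min(max(a_i - r_i for a_i, r_i in zip(a, r)) for a in front)
        for r in reference
    )


if __name__ == "__main__":
    reference_front = [(1.0, 3.0), (2.0, 2.0), (3.0, 1.0)]
    found_front = [(2.0, 4.0), (3.0, 3.0), (4.0, 2.0)]
    print(hypervolume_2d(reference_front, ref=(5.0, 5.0)))
    print(hypervolume_2d(found_front, ref=(5.0, 5.0)))
    print(additive_epsilon(found_front, reference_front))
```

A larger hypervolume and a smaller ε value both indicate a better front under these assumptions. The hypervolume depends on the reference point chosen, so values are only comparable between fronts measured against the same point.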
It is important to remember that each convergence metric is in some way deficient by itself, in that it may not always be applicable, depending on the distribution of points in the approximation or reference fronts being compared, or may not provide all the information required. Where fronts are comparable, different metrics may indicate opposite conclusions, again depending on the nature of the fronts, for example where a wide spread of solutions belongs to a front which is only partially better converged. It is therefore advisable to use more than one metric in such comparisons, with both qualitative (better) and quantitative (better by how much) metrics being available (Zitzler et al., 2003). Zitzler et al. (2002) showed that at least M metrics are needed to compare two or more fronts of an M-objective problem.