of the same family as the Particle Swarm Optimiser. The generalisedGFAis specialised to the search space of programs, and a parallel implementation of it is presented and discussed.
4.6
Geometric Particle Swarm Optimiser
Moraglio, Chio and Poli formally derived theGeometric Particle Swarm Optimiser Algorithm (GPSO)(though without the inertia term) by applying concepts of geometric crossover and geometric mutation using convex set theory [190]. TheGPSOis generalised to arbitrary search spaces. From the originalPSO, the first observation made is that it must be possible to compute “linear” combinations of more than two particles, in whatever search space is necessary. Moraglio et al. used the concept of a convex combination to ensure that combinations of points in a convex hull in some metric space satisfy four conditions involving theweightof the points.
These requirements were given by Moraglio in detail [190], and are loosely reminiscent of the same concepts in continuous space. For example, supposeghas a fitness (orweight) ofwg= 0.4, and a particlexin the space has a
fitness (weight) ofwx= 0.1, and a linear combination between these are made without randomisation. It follows
that the new pointnlies somewhere betweenxandg, and specifically, more closely togdue to having a larger weight. The conditions essentially ensure that the weights satisfywx+wg= 1, that the pointnlies betweenxand
g, that the distance betweenxandgare coherent with their weights (the mid-point or equilibrium has a weight of zero), and that ifwg =wxthen the distance to the midpoint is the same. More detail on this is available in
Moraglio’s thesis [189], as well as the work of Moraglio et al. [190].
In order to accomplish the linear combination of several points (ie. multi-parent crossover) in arbitrary space, Moraglio et al. show that it can be done using several separate geometric crossovers [190]. The original PSO, for example, requires the linear combination of a particle with its personal best position found, as well as a global best (ie. a multi-parent recombination). Moraglio et al. also specialised theGPSOto Manhattan and Hamming metric spaces, as well as Euclidean space. Using these ideas, Togelius et al. created the so-called Particle Swarm Programming algorithm, which is a proof of concept optimiser based on theGPSO, and specialised to the search space of genetic programs [280] (ofGP). Togelius et al. proposed several possible operators for crossover, and the authors concede that significant research still remains in finding the most appropriate operators. Some effective ones presented by them include weighted subtree swap, weighted homologous crossover and weighted one-point crossover. Homologous crossover ensures that the common region between two candidates are kept intact [226]. Togelius and colleagues reported that common regions can sometimes be too small for this operator to be constructive [280]. The other two operators are more self-explanatory.
4.6.1
GPSO for Karva-expressions
As the authors of Particle Swarm Programming (PSP) have noted, more research would be beneficial for geometric optimisers especially in the search space of programs [280]. By combining the Karva language of Ferreira [64] with theGPSO[190], a different specialisation of the GPSO was obtained. A different search space has a considerable impact on an the ability of an optimiser to explore and converge. This specialisation of the GPSO is presented, and then some performance data is given. The algorithm is parallelised across graphics hardware to improve wall-clock performance. Henceforth this algorithm will be referred to as K-GPSO-GPU, and is described and investigated in this section.
The method used for implementing the K-GPSO-GPU algorithm described above is summarised in Algorithm
84 4. COMBINATORIAL OPTIMISATION
argument maps were pre-computed fromk-expressions to allow a separate CUDA kernel to execute programs without recursion.
The weighted crossover of the PSP algorithm of Togelius and colleagues [280] was modified to operate on k-expressions using an adapted multi-parent geometric crossover proposed by the authors of that article, shown in Equation4.3[280]. ∆GX((a, wa),(b, wb),(c, wc)) = GX((GX((a, wa wa+wb ),(b, wb wa+wb )), wa+wb),(c, wc)) (4.3)
allocate and initialise enough space forncandidate programs
allocate space for random deviates whiletermination criteria not metdo
call CURAND to fill the random number array with uniform deviates in the range [0,1)
copycandidates and candidate bests to device CUDA: compute argument maps()
CUDA: interpret/execute programs CUDA: update food locations/fitness copyback to host
ifend-of-generation thenthen CUDA: update candidate bests
CUDA: recombine and mutate programs replace old programs with new ones end if
Determine best candidate visualise the result end while
ALGORITHM9: The parallel implementation of the K-GPSO-GPU algorithm.
In Equation4.3,GXis the crossover operator, and∆GXis the multi-parent crossover operator. It is assumed thatwa,wb andwc are all positive and sum to1. Essentially this equation defines the weighted, multi-parent
crossover as two crossovers, the first being betweenaandb, where weights are re-normalised to sum to1, and the second is a crossover withc. Togelius, De Nardi and Moraglio provide more details on this using convex set theory [280]. More detail on the rationale and rigorous mathematical proofs of this procedure are available [192,189,190,280].
As can be seen in Algorithm9the majority of computations are parallelised as with the K-GP-GPU algorithm, in anticipation of large candidate populations. In order to execute the programs generated, arguments are again reordered in separate memory space as described in Section4.4.
4.6. GEOMETRIC PARTICLE SWARM OPTIMISER 85
At this point, what remains to be determined is precisely how the crossover operation given in Equation4.3
will be done on lineark-expressions. Ferreira defines one-point crossover as choosing a crossover site or “pivot”, and then exchanging symbols about this point to obtain two new candidates [63]. In order to ensure that this crossover is geometric in the sense that multi-parent one-point crossover can be computed and still be able to bias the result towards one parent candidate or the other, it must be ensured that it isweighted. This is however not enough to define the crossover as geometric. In order to do this, one must prove that this crossover satisfies the four conditions for a convex combination outlined by Moraglio et al. [190]. According to Togelius et al. the following is a reasonable approximation [280].
The method used for accomplishing this recombination is by using theω,φgandφpparameters as the weights
(wa,wbandwc) in Equation4.3. The candidateais further defined as the current candidate under consideration,b
as the corresponding personal best ofa, andcas the global best candidate discovered so far. The fitness values of these are not used in the crossover process. Note also that unlikeGP, selection is not required, other than simply a value forP(crossover), a probability defined by the user, as inGP. GEP crossover defines a “donor” and a “recipient” tree, which are chosen randomly [280].
The weights mentioned apply to the crossover operation between twok-expressions as an indication of how closely to the root of the tree the pivot (crossover site) is. This idea was obtained from the article of Togelius et al. where it was shown that this method roughly satisfies the four conditions of a convex combination [280].
Point mutation proceeds as with the K-GP-GPU algorithm, per Section4.4.
As the algorithm has been described, focus is now given to an evaluation of effectiveness. The K-GPSO-GPU algorithm described above was compared against the K-GP-GPU algorithm described in detail in Section4.4. The optimisation problem under consideration was the modified Santa Fe Ant Trail problem in three dimensions described in Section4.1.2. Analysis involved two aspects of the algorithms, firstly the ability of the algorithms to converge upon a good solution, and secondly the speed with which it was achieved.
The parameters used for the K-GP-GPU algorithm were: P(Crossover) = 0.8, P(Mutate) = 0.1. The same crossover and mutation rates were used for the K-GPSO-GPU algorithm, and for the PSO-specific settings, these parameters were used:ω= 0.1,φp= 0.6,φg= 0.3. As for the simulation itself, angular velocities are restricted
to0.1units, and initial velocities are initialised to between−0.16and0.16. This essentially serves to increase the difficulty in finding a good program. In order to use a higher mutation rate, Togelius et al. recommend using elitism [280], whereby the best candidate is replicated verbatim into the new population following the genetic operators. This is a common technique used in EAs to bias the population in a particular direction. Elitism is used in the GP and the K-GPSO-GPU algorithms.
4.6.2
Experimental Results
Figure4.13shows the convergence results for the K-GPSO-GPU and the K-GP-GPU algorithms. Each data point in all the plots shown have been averaged100times in independent runs. The figure shows apparent evidence that the K-GPSO-GPU algorithm is indeed more able to find a good solution faster, but is quickly superseded by the GP-based algorithm if computing fitness for more than about25timesteps. It could be that the GPSO-based algorithm is more appropriate if it is not viable to compute more than 25 generations.
Figures4.11and4.12show the convergence results for the two algorithms respectively. Elitism was experi- mented with in order to determine precisely how much of an effect it has on the space ofk-expressions. Figure4.13
86 4. COMBINATORIAL OPTIMISATION
FIGURE4.11: Modified 3D Santa Fe Ant Trail Problem convergence results for the K-GP-GPU algorithm, with elitism. The graph shows the average mean value of each generation, from 100 independent runs. The error bars represent the average standard deviation of the 100 runs in each generation.
FIGURE4.12: Modified 3D Santa Fe Ant Trail problem convergence results for the K-GPSO-GPU with elitism. Each data point has been averaged 100 times in independent runs, and the error bars represent the average standard deviation of each generation.
4.6. GEOMETRIC PARTICLE SWARM OPTIMISER 87
Each data point of the averaged mean was averaged across 100 separate runs. From these plots, it is clear that the K-GPSO-GPU has more spread per generation than the GP-based algorithm, which is not very desirable. Essentially this indicates that the algorithm is somewhat less reliable and not as consistent. The minimum and maximum values are also shown to indicate the full spectrum.
Finally, Figure4.14shows the average compute time, by generation, for both the K-GP-GPU and the K-GPSO- GPU. The fitness evaluation consisted of computing300frames of the candidate programs and gathering scores from this. Therefore, each data point represents the average frame compute time across each of the300frames, and then averaged100times by independent runs. The generation compute times are also shown, although they are somewhat hidden. While the first observation seems that the K-GPSO-GPU is faster than the K-GP-GPU algorithm, this is somewhat misleading. Essentially, the plots in Figure4.14would be completely linear, if all the terminal and function symbols were of the same complexity. As will be shown later, this is likely due to the lack of diversity in the populations of the K-GPSO-GPU algorithm and the overly quick convergence properties of it.
The average new-generation population parallel compute time for the K-GP-GPU was420µsec, and for the K- GPSO-GPU it was440µsec. Even though this is not remotely comparable to the fitness evaluation (340,000µsec), it was still worth the effort, as this must happen in serial following the fitness evaluation phase. In other words, any improvement on generation compute time will improve wall-clock performance since it cannot be done in parallel with the fitness evaluation phase. This allows to generate larger populations of candidates, but it should be noted that the fitness evaluation phase will also increase greatly in computing time. Therefore, this improved population generation algorithm is likely to be more useful for problems with low-complexity in fitness evaluation.
The function symbolIfFoodAheadhas a rough complexity ofO(f), wheref is the number of food particles, which would approachO(f N), should all candidates have one of these symbols in its program. Of course, the worst case here is that every candidate consists only of these functions and enough terminals to satisfy thek-expression’s head and tail sections. Hypothetically, given a maximum expression lengthl= 8, and a head lengthh= 3(hence a tail length of5), then the maximum number ofIfFoodAheadfunctions would be3. Extrapolating from this, assume allNparticles were formed like this, then evaluation could reach complexityO(3f N), which could also very well exceedO(N2), a highly undesirable complexity. Of course, this would depend on the value off, which was arbitrarily chosen in this instance.
Following from this argument, it is possible to make the conjecture that at generation20, the K-GP-GPU algorithm generally increased its use of theIfFoodAheadfunction, while the K-GPSO-GPU had reached a steady equilibrium of a certain number of these functions. This would seem to agree with the suspicion that the K-GP-GPU is in fact better in preserving population diversity.
Attempts to improve the K-GPSO-GPU beyond the results seen in Figure4.13was met with disappointment. Parameter tuning efforts forphig,phipandωincluded normalised combination of respective scores of particles and
also normalised weighted scores, but the best parameters observed were simplyphig= 0.3, phip= 0.6, ω= 0.1.
Fine-tuning crossover and mutation probabilities had varying effects on convergence. Removing the crossover phase with the global best solution reduced mean scores to around0.2, and similar results were obtained from removing the crossover with the personal best. Randomising the crossover point slightly with hand-tuned parameters to aid in diversity did not improve scores at all.
Results indicate that, at the very least, the K-GPSO-GPU operating overk-expressions is appropriate for when the fitness evaluation is extremely computationally expensive. Given enough time and compute power, however, the K-GP-GPU algorithm operating onk-expressions is more suited to this problem.
88 4. COMBINATORIAL OPTIMISATION
FIGURE4.13: Convergence results for the K-GP-GPU and K-GPSO-GPU, as well as the use of elitism for both in the modified 3D Santa Fe Ant Trail problem. Each data point has been averaged across 100 independent runs to obtain meaningful statistical data.
As for the performance data presented, it would be unwise to favour the K-GPSO-GPU from the observation in Figure4.14that the timestep compute time is lower. The slightly higher compute time does, after all, seem to translate into a higher success rate as shown in Figures4.11and4.13.
Having successfully integrated Karva into parallel modified versions of theGPand theGPSO, a next step is to examine the use of a slightly more sophisticated optimiser, also from the realm of continuous global optimisation.