
In this experiment, the optimal settings are predicted from a model of algorithm efficiency.

Preparation The model is a second-order (quadratic) model, linear in its coefficients, of the form:

$$\log N = \beta_0 + \sum_{i=1}^{10} \beta_i z_i + \sum_{i=1}^{10} \sum_{j=i+1}^{10} \beta_{ij} z_i z_j + \sum_{i=1}^{10} \beta_{ii} z_i^2 \tag{47}$$

where N is the efficiency (and log N the chosen response measure, as discussed in section 3.2.5), z_i are coded algorithm parameters, and all β are model coefficients to be estimated. Experiments V, VI, and VII provide strong evidence of a rugged response surface with multiple optima, but equation (47) is a model of a smooth response surface which, since it is quadratic, has at most a single optimum. This simplification of the surface by the model is intentional. We speculate that it captures large-scale features of the response surface by ‘smoothing out’ smaller-scale features such as local optima and so may produce a more reliable estimate of the optimum than a technique such as RSM.

Given the relatively large number of parameters and the apparent complexity of the response surface, a linear model is likely to be a good fit over only a relatively small region of the parameter space. We therefore use the interquartile parameter ranges derived in the preceding experiment (listed in table 25) to define the boundaries of a suitably small region in which we estimate optimum parameter settings to lie. A minor adjustment is made: for integer-valued parameters, the lower boundary of the range is rounded down to the nearest integer, and the upper boundary rounded up. We refer to this region as the ‘region of interest’.
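The construction of the region of interest and its mapping to coded values can be sketched as follows. This is a minimal illustration, not the code used in the experiments: the linear coding (lower bound to −1, upper bound to +1) is the usual convention and is assumed here, and the function names and quartile values are hypothetical.

```python
import math

def region_bounds(q1, q3, integer_valued=False):
    """Bounds of the region of interest from an interquartile range.

    For integer-valued parameters the lower bound is rounded down and
    the upper bound rounded up, as described in the text.
    """
    if integer_valued:
        return math.floor(q1), math.ceil(q3)
    return q1, q3

def to_coded(x, low, high):
    """Assumed linear coding: low maps to -1, high maps to +1."""
    centre = (low + high) / 2.0
    half_range = (high - low) / 2.0
    return (x - centre) / half_range

def to_natural(z, low, high):
    """Inverse of to_coded."""
    centre = (low + high) / 2.0
    half_range = (high - low) / 2.0
    return centre + z * half_range

# Hypothetical quartiles for an integer-valued parameter.
low, high = region_bounds(3.4, 8.2, integer_valued=True)
```

With these hypothetical quartiles the region widens from [3.4, 8.2] to [3, 9], so the rounding can only enlarge the region, never shrink it.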

(The requirement for a small region of interest is the reason why a linear model could not be used as an alternative to RSM for tuning parameters in Experiments V, VI and VII: it was not possible to confidently and objectively identify a small region of interest prior to experimentation. In those experiments, RSM is advantageous because it ‘moves’ the region of interest around the parameter space until an optimum is found.)

Method To estimate the model coefficients, β, algorithm trials are performed at selected points in the region of interest and the efficiency is measured at each. A central composite design is used for this purpose as it provides accurate estimates of the coefficients using relatively few design points.

A central composite design is a combination of a fractional factorial design in which factors have coded values ±1; repetitions of the centre point at which all factors have coded value 0; and ‘star’ points at which one factor has the coded values ±α (where α is normally greater than 1 and depends on the number of factors) and all other factors are 0.

For ten parameters, the value of α would normally be 3.36 and so, apart from the star points, the design points would be clustered in the centre of the region of interest. Such a design would provide little information about parameter interactions nearer the edges of the region. We therefore choose instead a face-centered central composite design in which the value of α is 1, and equate the boundaries of the region of interest with coded values ±1 in the design.
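The effect of the choice of α on the star points can be illustrated with a short sketch. The text does not state how the value 3.36 is obtained; the rotatability rule α = F^(1/4), with F the number of factorial points, is one common convention, and F = 128 is an assumption made here only because it reproduces the quoted value. The function names are hypothetical.

```python
def star_points(k, alpha):
    """Star points of a central composite design for k factors:
    one factor at -alpha or +alpha, all other factors at 0."""
    points = []
    for i in range(k):
        for sign in (-1.0, 1.0):
            point = [0.0] * k
            point[i] = sign * alpha
            points.append(point)
    return points

def rotatable_alpha(n_factorial_points):
    """Assumed convention: alpha = F ** (1/4) for a rotatable design."""
    return n_factorial_points ** 0.25

# F = 128 is an assumption, not a figure taken from the text.
alpha_rot = rotatable_alpha(128)

# Face-centred design: alpha = 1, so star points lie on the faces
# of the region of interest rather than far outside the +/-1 cube.
face = star_points(10, 1.0)
```

With α = 1 every design point has coded values within [−1, 1], so the factorial points sit at the corners of the region of interest instead of being clustered near its centre.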

A central composite design for the 10 algorithm parameters has 178 design points. At each design point, 16 algorithm trials are run (each with a different seed) for each of the four SUTs in order to accommodate the noise in algorithm response; a total of 11,392 algorithm trials.
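The trial budget follows directly from the three figures given above; the short check below reproduces it. The constant and function names are illustrative only.

```python
DESIGN_POINTS = 178            # points in the central composite design
TRIALS_PER_POINT_PER_SUT = 16  # one seed per trial
SUTS = 4

total_trials = DESIGN_POINTS * TRIALS_PER_POINT_PER_SUT * SUTS

def trials_for(repetitions):
    """Total trials if the whole design were run `repetitions` times."""
    return repetitions * total_trials
```

Each complete run of the design therefore costs the same 11,392 trials, so the cost grows linearly with the number of times the design is repeated.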

For consistency with the calculation of efficiency used in Experiments V, VI, and VII, at each design point the median of the 16 observed responses is calculated for each SUT and the ‘average’ efficiency calculated from the medians as discussed in section 3.2.5. The model is fitted to the average efficiency observed at each design point using standard linear regression to estimate the model coefficients.
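The two steps described above, aggregating the trials at a design point and expanding a coded point into the regressors of equation (47), can be sketched as follows. This is an illustration under stated assumptions: the way the per-SUT medians are combined is defined in section 3.2.5, which is not reproduced here, so the arithmetic mean used below is only a placeholder, and the function names are hypothetical.

```python
from itertools import combinations
from statistics import mean, median

def quadratic_terms(z):
    """Regressors of the full quadratic model for one coded point z,
    in the order of equation (47): intercept, linear terms,
    two-factor interactions (i < j), squared terms."""
    k = len(z)
    row = [1.0]
    row += list(z)
    row += [z[i] * z[j] for i, j in combinations(range(k), 2)]
    row += [zi * zi for zi in z]
    return row

def point_response(trials_by_sut, combine=mean):
    """Median of the trials for each SUT, then a combination of the
    medians. combine=mean is a placeholder assumption."""
    return combine([median(trials) for trials in trials_by_sut])

n_coefficients = len(quadratic_terms([0.0] * 10))
```

For ten parameters the model has 1 + 10 + 45 + 10 = 66 coefficients, so the 178 design points give an overdetermined system; the rows produced by `quadratic_terms` can be passed to any ordinary least-squares routine.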

The optimal parameter settings are derived by optimising the fitted model using the MATLAB Optimization Toolbox. Since it would be unreliable to use the model to estimate the efficiency outside the region of interest, the optimisation process is constrained to this region.

Results In figure 25, parallel coordinates are used to plot the optimal common parameter settings predicted by the model as coded values (solid line), and the boundaries of the region of interest (dotted lines).

[Figure 25: parallel-coordinates plot. Horizontal axis: Coded Parameter (z1 to z10). Vertical axis: Coded Value (−3 to 2).]

Figure 25 – The optimal common parameter settings as coded values (solid line), and the boundaries of the region of interest (dotted lines) for comparison.

Table 26 lists the optimal common parameter settings in their natural forms rather than the transformed forms, xi, used for the experiments in this chapter; the column ‘Source’ indicates from which xi the natural parameter values are derived.

For completeness, the table includes four parameters that were temporarily considered configuration options for the purpose of the empirical work: the maximum number of parents and bins per node, the proportion of the maximum number of bins that are initially created in the representation, and the maximum number of fitness evaluations for each candidate profile (which limits the number of re-evaluations). The maximum number of bins is calculated from the number of coverage elements in the SUT according to equation (43).

Discussion and Conclusions Figure 25 shows that the predicted optimum settings are at a ‘corner’ of the region of interest: for five of the parameters the optimal value is at one of the bounds defined by the interquartile range. This suggests that the ‘true’ optimum of the fitted model is at a point outside the chosen region of interest, and that the predicted optimum using constrained optimisation is simply the best parameter settings inside the region.
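Why a constrained optimum lands on the boundary when the model's unconstrained optimum lies outside the region can be seen in a two-parameter toy example. The sketch below uses projected gradient descent, which is a stand-in chosen for brevity and not the MATLAB routine used in the experiment; the toy quadratic and its coefficients are hypothetical.

```python
def clip(value, low=-1.0, high=1.0):
    """Project a coded value back into the region of interest."""
    return max(low, min(high, value))

def gradient(z):
    """Gradient of the toy model f(z) = (z1 - 2)^2 + (z2 - 0.3)^2,
    whose unconstrained minimum (2, 0.3) lies outside [-1, 1]^2."""
    return [2.0 * (z[0] - 2.0), 2.0 * (z[1] - 0.3)]

def projected_gradient(z, step=0.1, iterations=200):
    """Gradient step followed by projection onto the box [-1, 1]^k."""
    for _ in range(iterations):
        g = gradient(z)
        z = [clip(zi - step * gi) for zi, gi in zip(z, g)]
    return z

z_opt = projected_gradient([0.0, 0.0])
on_bound = [abs(zi) == 1.0 for zi in z_opt]
```

The first coordinate is pinned at the bound +1 because the model would prefer a value beyond it, while the second settles at its interior optimum. A parameter sitting exactly on a bound is therefore a sign that the model's own optimum for that parameter lies further out.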

Possible explanations for such a result are: (a) the noise in the algorithm response reduces the fit between the model and the response surface, leading to some inaccuracy in the prediction; and (b) there is no reason why the optimum should necessarily be in the chosen region of interest. It would be possible to ‘move’ the region of interest in the direction of the model’s optimum and then repeat the experiment using the new region of interest; such a process is essentially an iteration of RSM. Each repetition requires 11,392 algorithm trials and so would require substantial computing resources, especially if more than one such ‘move’ is required. Instead, we regard the optimal parameter settings listed in table 26 as the ‘best effort’ prediction given practical resources.

| Parameter | Effect | Source | Optimal Setting |
|---|---|---|---|
| K | evaluation sample size | x1 | 302 |
| λ | neighbourhood sample size | x2 | 9 |
| ρprb | bin probability mutation factor | x3 | 54.002 |
| ρlen | bin length mutation factor | x4 | 2.755 |
| Wbins | Gbins mutation group weight | | 1 000 |
| Wedge | Gedge mutation group weight | x5 | 144 |
| Wdrct | Gdrct mutation group weight | x6 | 10 584 |
| wrem | Mrem mutation weight | | 1 000 |
| wadd | Madd mutation weight | x7 | 689 |
| wprb | Mprb mutation weight | | 1 000 |
| wjoi | Mjoi mutation weight | x8 | 886 |
| wspl | Mspl mutation weight | x8, x9 | 1 159 |
| wlen | Mlen mutation weight | x10 | 1 028 |
| µprnt | Maximum no. parents per node | | 1 |
| µbins | Maximum no. bins per node | | =2.0) |
| ζ | Initial no. bins (as proportion of µbins) | | 1.0 |
| µeval | Maximum no. evaluations per profile | | 10 |

Table 26 – The optimal common parameter settings for configuration Ainib(1.0).

We briefly comment on some individual parameter values in the optimal settings:

• The value of ρprb, the multiplicative factor by which a probability value is increased or decreased during mutation, is surprisingly high. It was expected that this mutation operator would make small refinements to the probability distributions, but (especially when the number of bins is small) one atomic mutation of this operator using such a large factor will be a substantial change to the distribution.

• The value of ρlen, which serves a similar purpose to ρprb for bin lengths, is relatively small in comparison to ρprb.

• The weight, Wdrct, for the mutation group implementing directed mutation is more than 10 times higher than the other groups, confirming that such mutations are important to the efficiency of the algorithm.

• The ratio of wadd to wrem is less than 1 and so parsimony in the edges of the Bayesian network is encouraged.

• The ratio of wspl to wjoi is greater than 1 and so parsimony in the number of bins is not encouraged. However, in this algorithm configuration, Ainib(1.0), a limit is placed on the number of bins, and the number of bins is initialised to this limit: therefore such parsimony may no longer be required.
