
5.2. MODIFIED CROSSOVER IN GP

The ability of an individual to solve all the training cases is represented by a single parameter, the value of the fitness function. As pointed out before, representing the ability of an individual to solve a full training dataset as a single value can be limiting. A typical fitness function is the sum of the errors made by an individual over the different training cases. Imagine two individuals, one performing very well on certain training cases and the other performing averagely on most of them: they might get the same overall fitness value. In another case, one individual performing well on half of the training examples and another performing well on the other half will get the same fitness value although they are behaviourally different. If a crossover takes place between two individuals sharing the same weaknesses, their offspring are likely to have the same weaknesses because of inheritance. In this way the behavioural diversity of the population will reduce, and so will the ability of individuals to solve all training cases. As the population evolves, the behavioural diversity needed to solve all the training cases will continue to decrease across the population, and in the process the population might lose the genetic material needed to solve some training cases.

In order to identify the strengths and weaknesses of individuals on particular training cases, in addition to the overall fitness value, Binary String Fitness Characterisation (BSFC) is introduced. It was shown in [64] that, in problems that converged sub-optimally, the population performed well on only particular training cases and lacked the genetic material to solve the others. An ideal solution should perform well on all the training cases in addition to having a good overall fitness value, and BSFC is introduced for this purpose.
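The half-and-half case described above can be made concrete with a small numerical sketch. The error values below are invented purely for illustration, and the names `errors_a`, `errors_b` and `fitness` are hypothetical; only the sum-of-errors fitness is taken from the text.

```python
# Hypothetical per-case errors for two individuals on four training cases.
# These numbers are made up for illustration only.
errors_a = [0.0, 0.0, 1.0, 1.0]  # solves the first half of the cases
errors_b = [1.0, 1.0, 0.0, 0.0]  # solves the second half of the cases


def fitness(errors):
    """Sum-of-errors fitness as described in the text (lower is better)."""
    return sum(errors)


# Both individuals receive the same scalar fitness ...
same_fitness = fitness(errors_a) == fitness(errors_b)
# ... although their per-case behaviour is entirely different.
same_behaviour = errors_a == errors_b

print(same_fitness, same_behaviour)  # prints: True False
```

The scalar fitness cannot tell these two individuals apart, which is the limitation a per-case characterisation is meant to address.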

In BSFC, a training case on which an individual performs well is considered a strength, and a case the individual is unable to solve is considered a weakness. Consider a logical problem where the output is either true or false: a correct output by an individual is a strength and a wrong output is a weakness. A binary string is attached to each individual, and its length is equal to the number of training cases in the dataset. Each bit corresponds to one training case: a one is placed in the bit if the corresponding training case is solved, and a zero if the individual fails to solve it. Assigning strengths and weaknesses in the form of a binary string to quantify an individual's ability is named Binary String Fitness Characterisation. If all the training cases are solved, the binary string will have ones in all its bits.
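For the logical (true/false) case, the assignment of bits could be sketched as follows. This is a minimal illustration rather than the implementation used in this work; the function name `bsfc_boolean` and the example data are assumptions.

```python
def bsfc_boolean(outputs, targets):
    """Build a BSFC string for a Boolean problem (hypothetical helper).

    A bit is '1' where the individual's output matches the target
    (a strength) and '0' where it does not (a weakness).
    """
    if len(outputs) != len(targets):
        raise ValueError("outputs and targets must have the same length")
    return "".join("1" if o == t else "0" for o, t in zip(outputs, targets))


# Illustrative data: four training cases, two of them solved.
targets = [True, False, True, True]
outputs = [True, True, True, False]
print(bsfc_boolean(outputs, targets))  # prints: 1010
```

An individual that solves every training case would receive a string consisting of ones only.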

Let us consider another problem where the output is not binary but there is still a target value for each training case. The fitness function is the sum of the errors made by an individual over the training cases, with the error on a training case represented as e(t). The binary string can be assigned during fitness evaluation. While it is possible to assign a one to a bit if the individual's output is within a small margin of the target output value, this requires a predefined threshold, which means that the binary strings of the initial population might be dominated by zeros. A more general approach is to relate strengths and weaknesses to the fitness value. If the output error for a training case does not exceed the mean fitness value (the fitness value being the sum of the errors over all training cases), it is considered a strength, otherwise a weakness. Figure 5.2 shows a graphical representation of the process. The calculation of the individual bits of the binary string can be written in the form of an equation as

[Plot: Error (vertical axis) against Training cases (horizontal axis); legend: Error for each case, Mean error]

Figure 5.2: Each circle is the error made by an individual on one training case. The solid line represents the mean fitness, or mean error. All the circles below this line get a one in the binary string and circles above the line get zeros. The resulting binary string is 01000111011011000110.

$$
b_i =
\begin{cases}
1 & \text{if } e(t_i) \le \mathrm{mean}(\mathit{fitness}) \\
0 & \text{otherwise}
\end{cases}
\tag{5.1}
$$

where $b_i$ is the bit of the binary string for training case $i$, $e(t_i)$ is the error for training case $i$, and $\mathit{fitness}$ represents the overall error for all the cases, so that $\mathrm{mean}(\mathit{fitness})$ is the mean error per training case. An ideal solution will have a binary string consisting of ones only. The above equation gives deeper detail about the behaviour of a GP individual than the fitness value alone.
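Equation (5.1) can be sketched in a few lines of code. This is an illustrative reading of the equation, not the original implementation; the function name `bsfc_regression` and the error values are assumptions.

```python
def bsfc_regression(errors):
    """Build a BSFC string from per-case errors, following Equation (5.1).

    `errors` holds e(t_i) for each training case. The fitness is their sum
    and the threshold is the mean error (fitness divided by case count).
    """
    if not errors:
        raise ValueError("at least one training case is required")
    fitness = sum(errors)               # overall error over all cases
    mean_error = fitness / len(errors)  # mean(fitness) in Equation (5.1)
    return "".join("1" if e <= mean_error else "0" for e in errors)


# Illustrative errors for four training cases; the mean error is 0.5.
print(bsfc_regression([0.1, 0.9, 0.2, 0.8]))  # prints: 1010
```

Because the comparison is non-strict, an individual with zero error on every case gets a string of ones only, which matches the description of an ideal solution. Unlike a fixed margin, the threshold is relative to the individual's own mean error, so no predefined value is needed.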

For classification problems the assignment of binary string bits is not straightforward, as there are no target values. The aim in such a problem is to separate the two classes, and the output points that prove more effective in terms of classification should be rewarded. The binary string for diabetes detection is updated as follows: an output point close to the mean of the output distribution of the relevant class is considered a strength and gets a one in the binary string. Similarly, a point far from the mean of the distribution is considered a weakness and gets a zero. The question arises of how to decide which points are close and which are far. In this study the mean and standard deviation of the output distribution are used to define close and far points.

The calculation of the binary string bits for a classification problem can be written as

$$
b_i =
\begin{cases}
1 & \text{if } O(t_i) \lessgtr \left(\mathrm{mean}(O(t)) \pm (k * \mathrm{std}(O(t)))\right) \\
0 & \text{otherwise}
\end{cases}
\tag{5.2}
$$

where $b_i$ is the binary string bit for training case $i$, $O(t_i)$ is the output of an individual for training case $i$, $k$ is a constant that determines the distance within which an output point is considered a strength, $*$ is the multiplication sign, $\mathrm{std}$ represents the standard deviation, $O(t)$ is the output vector for a particular class, and $\mathrm{mean}$ is the function that takes the mean of a vector. According to this equation, if the output point is within $k$ standard deviations of the mean output, it is considered a strength, otherwise a weakness. The value of $k$ is an important parameter, as it determines the maximum distance a point can be from the mean and still be classified as a strength. The value of $k$ is problem dependent, and we will use the value that gives the best results for a particular training dataset after trying a few values.
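One possible reading of Equation (5.2) is sketched below. Several details are assumptions rather than statements from the text: the function name `bsfc_classification`, the use of the population standard deviation (`statistics.pstdev`), the strict inequality, and the example outputs and labels.

```python
import statistics


def bsfc_classification(outputs, labels, k=1.0):
    """Build a BSFC string for a classification problem (hypothetical helper).

    A bit is '1' when the output lies strictly within k standard deviations
    of the mean output of its own class, and '0' otherwise.
    """
    if len(outputs) != len(labels):
        raise ValueError("outputs and labels must have the same length")

    # Group the individual's outputs by the class of each training case.
    by_class = {}
    for out, label in zip(outputs, labels):
        by_class.setdefault(label, []).append(out)

    # Mean and standard deviation of the output distribution per class.
    # Assumption: population standard deviation; the text does not specify.
    stats = {
        label: (statistics.mean(vals), statistics.pstdev(vals))
        for label, vals in by_class.items()
    }

    bits = []
    for out, label in zip(outputs, labels):
        mean, std = stats[label]
        bits.append("1" if abs(out - mean) < k * std else "0")
    return "".join(bits)


# Illustrative outputs for two classes of four training cases each.
outputs = [1.0, 2.0, 3.0, 10.0, -1.0, -2.0, -3.0, -10.0]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
print(bsfc_classification(outputs, labels, k=1.0))  # prints: 11101110
```

In this example the outlying point of each class falls outside one standard deviation of its class mean and receives a zero. Increasing `k` widens the band, so more points count as strengths.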