

Rules induced from raw training data are used to classify unseen testing data. The classification system of LERS is a modification of the bucket brigade algorithm [1, 12]. The decision as to which concept a case belongs is made on the basis of three factors: strength, specificity, and support. They are defined as follows. Strength is the total number of cases correctly classified by the rule during training. Specificity is the total number of attribute-value pairs on the left-hand side of the rule; matching rules with a larger number of attribute-value pairs are considered more specific. The third factor, support, is defined for a concept C as the sum

$$\sum_{\text{matching rules } R \text{ describing } C} \text{Strength\_factor}(R) \ast \text{Specificity\_factor}(R).$$

The concept C for which the support is the largest is the winner, and the case is classified as being a member of C.
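The voting scheme for complete matching can be sketched as follows. This is a minimal illustration under stated assumptions, not the LERS implementation: the `Rule` tuple, the functions `matches` and `classify`, and the two toy rules with their strength values are hypothetical and chosen only for the example.

```python
from collections import defaultdict
from typing import NamedTuple


class Rule(NamedTuple):
    conditions: dict  # attribute -> required value (left-hand side)
    concept: str      # decision value on the right-hand side
    strength: int     # training cases correctly classified by the rule


def matches(rule, case):
    """Complete matching: every condition of the rule holds for the case."""
    return all(case.get(a) == v for a, v in rule.conditions.items())


def classify(rules, case):
    """Return the concept with the largest support, or None if nothing matches."""
    support = defaultdict(int)
    for rule in rules:
        if matches(rule, case):
            specificity = len(rule.conditions)
            support[rule.concept] += rule.strength * specificity
    return max(support, key=support.get) if support else None


# Hypothetical rules and strengths, for illustration only.
rules = [
    Rule({"Gender": "female"}, "shooting", 2),
    Rule({"Gender": "female", "Age": "young"}, "fishing", 3),
]
case = {"Gender": "female", "Age": "young"}

# Support: shooting = 2 * 1 = 2, fishing = 3 * 2 = 6.
winner = classify(rules, case)
```

In this toy setting the more specific rule outweighs the more general one, so `winner` is `"fishing"`.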

In the classification system of LERS, if complete matching is impossible, all partially matching rules are identified. These are rules with at least one attribute-value pair matching the corresponding attribute-value pair of a case. For any partially matching rule R, an additional factor, called Matching_factor(R), is computed. Matching_factor(R) is defined as the ratio of the number of attribute-value pairs of R matched with a case to the total number of attribute-value pairs of R. In partial matching, the concept C for which the following expression is the largest

$$\sum_{\text{partially matching rules } R \text{ describing } C} \text{Matching\_factor}(R) \ast \text{Strength\_factor}(R) \ast \text{Specificity\_factor}(R)$$

is the winner and the case is classified as being a member of C.

Table 1. An example of the data set

          Attributes            Decision
Case      Age       Gender      Hobby
1         32        Female      Shooting
2         27        Male        Fishing
3         45        Male        Shooting
4         63        Female      Shooting
5         35        Male        Fishing
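The partial-matching score defined above can be sketched in the same spirit. The function names and the two rules with their strengths are hypothetical, and rules are represented here simply as (conditions, concept, strength) triples.

```python
def matching_factor(conditions, case):
    """Fraction of the rule's attribute-value pairs that the case matches."""
    matched = sum(1 for a, v in conditions.items() if case.get(a) == v)
    return matched / len(conditions)


def classify_partial(rules, case):
    """rules: list of (conditions, concept, strength) triples.

    Sums Matching_factor * Strength_factor * Specificity_factor over all
    partially matching rules of each concept and returns the best concept.
    """
    score = {}
    for conditions, concept, strength in rules:
        mf = matching_factor(conditions, case)
        if mf > 0:  # at least one attribute-value pair matches
            specificity = len(conditions)
            score[concept] = score.get(concept, 0.0) + mf * strength * specificity
    return max(score, key=score.get) if score else None


# Hypothetical rules; neither matches the case completely.
rules = [
    ({"Gender": "female", "Age": "old"}, "shooting", 2),
    ({"Gender": "male", "Age": "young"}, "fishing", 4),
]
case = {"Gender": "female", "Age": "young"}

# shooting: 0.5 * 2 * 2 = 2.0, fishing: 0.5 * 4 * 2 = 4.0
winner = classify_partial(rules, case)
```

Each rule matches one of its two pairs, so the stronger rule decides the vote and `winner` is `"fishing"`.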

Every rule induced by LERS is preceded by three numbers: specificity, strength, and the rule domain size (the total number of training cases matching the left-hand side of the rule).
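As a small illustration of these three numbers, the sketch below computes them for rules over the data of Table 1. The helper names `covers` and `rule_numbers`, the encoding of a numeric condition as a closed (low, high) tuple, and the example rules are assumptions made for this sketch.

```python
CASES = {
    1: {"Age": 32, "Gender": "female", "Hobby": "shooting"},
    2: {"Age": 27, "Gender": "male", "Hobby": "fishing"},
    3: {"Age": 45, "Gender": "male", "Hobby": "shooting"},
    4: {"Age": 63, "Gender": "female", "Hobby": "shooting"},
    5: {"Age": 35, "Gender": "male", "Hobby": "fishing"},
}


def covers(conditions, case):
    """True if the case matches every condition on the left-hand side."""
    for attr, cond in conditions.items():
        value = case[attr]
        if isinstance(cond, tuple):  # numeric interval, assumed closed
            low, high = cond
            if not (low <= value <= high):
                return False
        elif value != cond:
            return False
    return True


def rule_numbers(conditions, decision):
    """Return (specificity, strength, rule domain size) for one rule."""
    attr, target = decision
    matched = [c for c in CASES.values() if covers(conditions, c)]
    specificity = len(conditions)
    strength = sum(1 for c in matched if c[attr] == target)
    domain_size = len(matched)
    return specificity, strength, domain_size


# A consistent rule: both matched cases (1 and 4) have Hobby = shooting.
print(rule_numbers({"Gender": "female"}, ("Hobby", "shooting")))  # (1, 2, 2)
# A hypothetical inconsistent rule: cases 2, 3, 5 match, only 2 and 5 fish.
print(rule_numbers({"Gender": "male"}, ("Hobby", "fishing")))     # (1, 2, 3)
```

When strength is smaller than the rule domain size, as in the second call, the rule also matches training cases from another concept.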

We will illustrate the MLEM2 algorithm with rule set induction from Table 1. The data set from Table 1 contains one numerical attribute, Age, and one symbolic attribute, Gender. Additionally, there are two concepts: [(Hobby, shooting)] = {1, 3, 4} and [(Hobby, fishing)] = {2, 5}.

First we need to sort the numerical attribute Age. The sorted list is: 27, 32, 35, 45, 63. The corresponding cutpoints determined by MLEM2 are: 29.5, 33.5, 40 and 54. Thus the set of all attribute-value pair blocks (the search space for MLEM2) is:

[(Age, 27..29.5)] = {2}
[(Age, 29.5..63)] = {1, 3, 4, 5}
[(Age, 27..33.5)] = {1, 2}
[(Age, 33.5..63)] = {3, 4, 5}
[(Age, 27..40)] = {1, 2, 5}
[(Age, 40..63)] = {3, 4}
[(Age, 27..54)] = {1, 2, 3, 5}
[(Age, 54..63)] = {4}
[(Gender, female)] = {1, 4}
[(Gender, male)] = {2, 3, 5}
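The cutpoints and blocks listed above can be recomputed with a short sketch. It assumes that each cutpoint is the average of two consecutive distinct sorted values, which agrees with the numbers given in the text; the function names and the interval labels built with string formatting are choices made for this illustration.

```python
def cutpoints(values):
    """Averages of consecutive distinct values of the sorted list."""
    distinct = sorted(set(values))
    return [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]


def numeric_blocks(attr, column):
    """column: dict case -> numeric value. Two intervals per cutpoint."""
    lo, hi = min(column.values()), max(column.values())
    blocks = {}
    for c in cutpoints(column.values()):
        blocks[(attr, f"{lo}..{c:g}")] = {k for k, v in column.items() if v < c}
        blocks[(attr, f"{c:g}..{hi}")] = {k for k, v in column.items() if v > c}
    return blocks


def symbolic_blocks(attr, column):
    """column: dict case -> symbolic value. One block per value."""
    blocks = {}
    for case, value in column.items():
        blocks.setdefault((attr, value), set()).add(case)
    return blocks


age = {1: 32, 2: 27, 3: 45, 4: 63, 5: 35}
gender = {1: "female", 2: "male", 3: "male", 4: "female", 5: "male"}

print(cutpoints(age.values()))  # [29.5, 33.5, 40.0, 54.0]

blocks = {**numeric_blocks("Age", age), **symbolic_blocks("Gender", gender)}
for pair, cases in blocks.items():
    print(pair, sorted(cases))
```

The resulting dictionary holds the same ten attribute-value pair blocks, in the same order, as the list in the text.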

Let us start from the set B equal to the concept [(Hobby, shooting)]. Thus B = G = {1, 3, 4}. The set T(G) of all attribute-value pairs relevant to G consists of (Age, 29.5..63), (Age, 27..33.5), (Age, 33.5..63), (Age, 27..40), (Age, 40..63), (Age, 27..54), (Age, 54..63), (Gender, female) and (Gender, male). The most relevant attribute-value pair is (Age, 29.5..63), since among all attribute-value pairs from T(G) the value of

$$|[\text{(Attribute, value)}] \cap G|$$

is the largest for (Age, 29.5..63). However,

$$[\text{(Age, 29.5..63)}] \not\subseteq [\text{(Hobby, shooting)}].$$

Thus we have to start the second iteration of the inner while loop of the MLEM2 algorithm. This time T(G) is equal to the initially computed set T(G) without (Age, 29.5..63). Four attribute-value pairs, (Age, 33.5..63), (Age, 40..63), (Age, 27..54) and (Gender, female), are the most relevant. Since there is a tie, we have to use the second criterion to break it: the attribute-value pair with the smallest block cardinality. There are two candidates, (Age, 40..63) and (Gender, female), each with a block of two cases. We therefore have to use the third criterion and select the first candidate, i.e., (Age, 40..63). Moreover, for the set T consisting of the two attribute-value pairs computed so far, (Age, 29.5..63) and (Age, 40..63),

$$[T] \subseteq [\text{(Hobby, shooting)}].$$
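The three selection criteria used in these two iterations can be sketched as one comparison key. The block table is copied from the text; `relevant_pairs` and `select_pair` are hypothetical helper names. The sketch relies on two Python behaviours: dictionaries keep insertion order, and `max` returns the first maximal element, which stands in for the third criterion.

```python
# Blocks in the order in which the text lists them.
BLOCKS = {
    ("Age", "27..29.5"): {2},
    ("Age", "29.5..63"): {1, 3, 4, 5},
    ("Age", "27..33.5"): {1, 2},
    ("Age", "33.5..63"): {3, 4, 5},
    ("Age", "27..40"): {1, 2, 5},
    ("Age", "40..63"): {3, 4},
    ("Age", "27..54"): {1, 2, 3, 5},
    ("Age", "54..63"): {4},
    ("Gender", "female"): {1, 4},
    ("Gender", "male"): {2, 3, 5},
}


def relevant_pairs(G, exclude=()):
    """T(G): pairs whose block intersects G, minus those already chosen."""
    return [t for t in BLOCKS if BLOCKS[t] & G and t not in exclude]


def select_pair(candidates, G):
    """Largest |[t] & G| first, then smallest |[t]|, then first listed."""
    return max(candidates, key=lambda t: (len(BLOCKS[t] & G), -len(BLOCKS[t])))


G = {1, 3, 4}  # the concept [(Hobby, shooting)]
first = select_pair(relevant_pairs(G), G)
G = BLOCKS[first] & G  # still {1, 3, 4}
second = select_pair(relevant_pairs(G, exclude=[first]), G)

print(first)   # ('Age', '29.5..63')
print(second)  # ('Age', '40..63')
```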

The next step is to go through the for loop that follows the inner while loop. In other words, we will try to minimize the set T. The first test is whether

$$[\text{(Age, 40..63)}] \subseteq [\text{(Hobby, shooting)}].$$

Since this is true, our final minimal complex is (Age, 40..63). Note that we used rule minimization, a part of the LEM2 algorithm that is common to both versions of MLEM2, with and without merging intervals, so both versions of the MLEM2 algorithm will induce the same rule. However,

$$[\text{(Age, 40..63)}] \neq [\text{(Hobby, shooting)}],$$

or, $G = B - [T] \neq \emptyset$, so we have to run the MLEM2 algorithm through the next iteration of its outer while loop. This time G = {1} and T(G) =

{(Age, 29.5..63), (Age, 27..33.5), (Age, 27..40), (Age, 27..54), (Gender, female)}. Obviously, every member of T(G) is most relevant. The second criterion, the minimum of |[(Attribute, value)]|, returns two candidates: (Age, 27..33.5) and (Gender, female). The third criterion returns (Age, 27..33.5). Additionally,

$$[\text{(Age, 27..33.5)}] \not\subseteq [\text{(Hobby, shooting)}],$$

so we have to go through the second iteration of the inner while loop of the MLEM2 algorithm. T(G) is equal to {(Age, 29.5..63), (Age, 27..40), (Age, 27..54), (Gender, female)}. The second criterion indicates that (Gender, female) is the best candidate. Thus T = {(Age, 27..33.5), (Gender, female)}. Furthermore,

$$[T] \subseteq [\text{(Hobby, shooting)}].$$

We will execute the for loop to minimize T. After the first attempt we have

$$[\text{(Gender, female)}] \subseteq [\text{(Hobby, shooting)}],$$

hence (Gender, female) is our second minimal complex. Additionally, for $\mathcal{T}$ = {{(Age, 40..63)}, {(Gender, female)}}, we have

$$[\text{(Age, 40..63)}] \cup [\text{(Gender, female)}] = [\text{(Hobby, shooting)}],$$

so $\mathcal{T}$ is a local covering of [(Hobby, shooting)].
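The whole walk-through for a concept can be replayed with a simplified LEM2-style covering loop over the block table. This is a sketch under stated assumptions, not the MLEM2 implementation: it works on the precomputed blocks, it omits interval merging (which, as noted above, does not change the result here), and it assumes consistent data so that a relevant pair always exists. The names `block_of` and `lem2` are chosen for this illustration.

```python
# Blocks in the order in which the text lists them.
BLOCKS = {
    ("Age", "27..29.5"): {2},
    ("Age", "29.5..63"): {1, 3, 4, 5},
    ("Age", "27..33.5"): {1, 2},
    ("Age", "33.5..63"): {3, 4, 5},
    ("Age", "27..40"): {1, 2, 5},
    ("Age", "40..63"): {3, 4},
    ("Age", "27..54"): {1, 2, 3, 5},
    ("Age", "54..63"): {4},
    ("Gender", "female"): {1, 4},
    ("Gender", "male"): {2, 3, 5},
}


def block_of(T):
    """[T]: intersection of the blocks of all pairs in T."""
    return set.intersection(*(BLOCKS[t] for t in T))


def lem2(concept):
    """Return a local covering of the concept as a list of minimal complexes."""
    B = set(concept)
    G, covering = set(B), []
    while G:  # outer while loop
        T = []
        while not T or not block_of(T) <= B:  # inner while loop
            relevant = [t for t in BLOCKS if BLOCKS[t] & G and t not in T]
            # largest |[t] & G|, then smallest |[t]|, then first listed
            t = max(relevant, key=lambda p: (len(BLOCKS[p] & G), -len(BLOCKS[p])))
            T.append(t)
            G = BLOCKS[t] & G
        for t in list(T):  # for loop: drop pairs that are not needed
            rest = [p for p in T if p != t]
            if rest and block_of(rest) <= B:
                T = rest
        covering.append(T)
        G = B - set().union(*(block_of(U) for U in covering))
    for T in list(covering):  # drop complexes made redundant by the others
        others = [U for U in covering if U is not T]
        if others and set().union(*(block_of(U) for U in others)) == B:
            covering = others
    return covering


shooting = lem2({1, 3, 4})
print(shooting)  # [[('Age', '40..63')], [('Gender', 'female')]]
```

Calling `lem2` on another concept, for example `lem2({2, 5})`, runs the same loops with that set as the input B.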

Our new input set to the MLEM2 algorithm is the other concept, i.e., the set {2, 5}. The set T(G) of all attribute-value pairs relevant to G is {(Age, 27..29.5), (Age, 29.5..63), (Age, 27..33.5), (Age, 33.5..63), (Age, 27..40), (Age, 27..54), (Gender, male)}. The most relevant attribute-value pairs are (Age, 27..40), (Age, 27..54), and (Gender, male). The second criterion, the minimum of |[(Attribute, value)]|, does not break the tie since |[(Age, 27..40)]| = |[(Gender, male)]| = 3. The last resort is to select the first pair, i.e., (Age, 27..40). However,

$$[\text{(Age, 27..40)}] \not\subseteq [\text{(Hobby, fishing)}],$$

therefore we have to run the MLEM2 algorithm through the second iteration of the inner while loop. This time the attribute-value pair (Age, 27..40) is excluded from T(G), and the selected pair is (Gender, male): it ties with (Age, 27..54) on relevance but has the smaller block. Moreover, for T = {(Age, 27..40), (Gender, male)} we have

$$[T] \subseteq [\text{(Hobby, fishing)}].$$

Furthermore, this set is already minimal and

$$[T] = [\text{(Hobby, fishing)}],$$

so our local covering for [(Hobby, fishing)] is the set containing T = {(Age, 27..40), (Gender, male)} as its only element. The rule set determined by the MLEM2 algorithm, in the LERS format, is

1, 2, 2
(Age, 40..63) -> (Hobby, shooting)

1, 2, 2
(Gender, female) -> (Hobby, shooting)

2, 2, 2
