We have proposed fuzzy set covering as a new paradigm for fuzzy classification rule learning. However, the field is still wide open for further research. Some ideas include further extensions and improvements to FCF, the development of more algorithms, and the extension of fuzzy set covering in general. In the remainder of the chapter we give a brief overview of some of the open problems and also propose some possible strategies.

12.2.1 Neural Network Encoding of Fuzzy Rules

The encoding of an extracted fuzzy rule set in a neural network will provide a link between the symbolic and sub-symbolic connectionist approaches to concept learning. In this area we have already taken preliminary steps, and developed a method for encoding FuzzyAL rules [van Zyl and Cloete, 2004e]. The network can represent a fuzzy rule set with internal disjunction accurately under the right conditions: the knowledge encoding strength (bias) should be large enough, and the slope parameter of the sigmoidal activation functions should also be sufficiently large. We also showed empirically that the network is capable of correcting incorrectly encoded knowledge, and of improving given further training data. However, the final step of taking the trained neural network and again extracting FuzzyAL rules has still not been taken. Rule extraction would allow seamless migration between the two knowledge representations. Since the encoding method is related to, but not central to, the theme of this dissertation, we provide a summary of the method in Appendix D.

12.2.2 Extending the Description Language

FuzzyAL and FuzzyCAL are both powerful description languages, as can be seen from the good performance of algorithms using them. However, as discussed in Section 4.10.4, these description languages do not allow for the description of relations between different attributes. One possible way of extending the description language is to add more operators, such as relational operators. A further extension is the addition of fuzzy hedges, such as “very,” “little,” “at most,” etc.
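As a purely illustrative sketch of what a hedge could look like, the snippet below uses the classical power-based modifiers from the general fuzzy logic literature, in which "very" squares a membership degree and "more or less" takes its square root. These definitions are an assumption and are not part of FuzzyAL or FuzzyCAL; the function names and the example value are hypothetical.

```python
import math


def very(mu: float) -> float:
    """Concentration hedge (assumed, Zadeh-style): sharpens a membership degree."""
    return mu ** 2


def more_or_less(mu: float) -> float:
    """Dilation hedge (assumed, Zadeh-style): relaxes a membership degree."""
    return math.sqrt(mu)


# Hypothetical membership of one instance in the fuzzy set "tall".
tall = 0.8
very_tall = very(tall)             # smaller than tall for degrees in (0, 1)
roughly_tall = more_or_less(tall)  # larger than tall for degrees in (0, 1)
```

Under these assumed definitions a hedge only rescales an existing membership degree, so it could in principle be added to a description language without changing the underlying membership functions.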

12.2.3 Predicting Concept Membership

We discussed the semantic interpretation of the rules induced by FCF in Section 4.3.5. The membership of an instance to a rule antecedent is no prediction of the membership of the instance to the rule consequent. In some cases it may be desirable to know the membership to the concept. One approach may be to learn a non-linear mapping between instances’ membership to a rule’s antecedent and its consequent for all instances matched by the rule. Another approach may be to adapt the induction process to learn a hierarchy of rules, such that rules on higher levels have higher membership strengths.

12.2.4 Rule Post-Pruning

FCF includes many efficiency criteria, and also includes the prepruning of rules. However, in this dissertation we did not address the question of rule post-pruning—i.e. pruning after the induction of the complete rule set. In the crisp case, rule post-pruning often increases the generalization performance of the rule set [Mitchell, 1997, p. 71]. We have already undertaken some preliminary steps to address rule post-pruning [Robbel, van Zyl and Cloete, 2004], but much more remains to be done.

12.2.5 Computing the Complete Most General Consistent Rule Set

FUZZYBEXA searches the lattice of conjunctions from top to bottom in a consistent manner, and it is guaranteed to find members of CM, the set of most general consistent conjunctions, during each iteration when using an infinite beam width. In fact, using an infinite beam width, FUZZYBEXA will find all members of CM during the first iteration of FindBestConjunction. However, presently FindBestConjunction returns only a single conjunction. A further possible extension to FCF is to keep track of the set of “best conjunctions.” This can be implemented by maintaining a set of best conjunctions in FindBestConjunction. This set is cleared each time the best conjunction is replaced by a conjunction that has a better evaluation. Each time a conjunction is found with the same evaluation as the best conjunction, this conjunction is added to the set of best conjunctions. The set of best conjunctions is then returned. To prevent the addition of many similar rules, CoverConcepts could add only those rules from the set of best conjunctions that have no instances in common with other rules from the set. Using this method, the set of all disjoint but equally good rules is found during each iteration of FindBestConjunction, which could be renamed to FindBestConjunctions. A larger beam width may prove helpful in this case.
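The bookkeeping described above can be sketched as follows. This is a minimal illustration rather than the FCF implementation: the function names, the `evaluate` and `covered` callables, and the toy data are all hypothetical, and the greedy, order-dependent disjointness filter is only one possible reading of "no instances in common."

```python
from typing import Callable, Hashable, Iterable, List, Set


def find_best_conjunctions(
    candidates: Iterable[Hashable],
    evaluate: Callable[[Hashable], float],
) -> List[Hashable]:
    """Return every candidate that ties for the best evaluation."""
    best: List[Hashable] = []
    best_score = None
    for conj in candidates:
        score = evaluate(conj)
        if best_score is None or score > best_score:
            best = [conj]          # strictly better: clear the set and restart it
            best_score = score
        elif score == best_score:  # exact tie; real scores may need a tolerance
            best.append(conj)
    return best


def select_disjoint(
    best: List[Hashable],
    covered: Callable[[Hashable], Set[int]],
) -> List[Hashable]:
    """Greedily keep tied conjunctions whose covered instances do not overlap."""
    chosen: List[Hashable] = []
    used: Set[int] = set()
    for conj in best:
        instances = covered(conj)
        if used.isdisjoint(instances):
            chosen.append(conj)
            used |= instances
    return chosen


# Toy data: conjunction names mapped to scores and to covered instance ids.
scores = {"a": 0.9, "b": 0.9, "c": 0.9, "d": 0.5}
coverage = {"a": {1, 2}, "b": {2, 3}, "c": {4, 5}, "d": {6}}

best = find_best_conjunctions(["d", "a", "b", "c"], scores.get)
rules = select_disjoint(best, coverage.get)
```

In this toy run the three top-scoring conjunctions tie, and the filter drops the one that shares an instance with a conjunction already chosen.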

12.2.6 Automatic Selection of αa

FCF requires the antecedent threshold αa to be specified by the user. Often the user (domain expert) may have a good feeling for a suitable value of αa, but this may also not be the case. Another extension to the framework is thus to allow the framework to select αa automatically, and even select different values of αa for different rules. One concern, however, is that too many individually tuned values for αa may reduce the comprehensibility of the rule set.

12.2.7 Evaluation Function Sensitivity to αa

We have not investigated the sensitivity of each rule evaluation method to αa. This may be an interesting experiment, and we expect different evaluation functions to have different levels of sensitivity to αa. We expect the Laplace function to be very sensitive, but the Accuracy function to be relatively insensitive to αa. Depending on the problem domain, one may opt to use a more insensitive function if αa cannot be

12.2.8 Using Genetic Algorithms for Adapting Membership Functions

We have spent very little time exploring the influence of different membership functions on the induction process. The rationale was that the membership functions are determined externally to the induction process, and the induction algorithm should make do with what it has. However, the induction process is certainly influenced by the membership functions, and better membership functions should allow the induction of more accurate and also more comprehensible rule sets. FCF would allow the genetic optimisation of membership functions by providing an objective function in the form of a rule set. The process may function roughly as follows. FCF is used to bootstrap the process by the induction of a rule set. The rule set is then used as the objective function for membership function optimisation. After optimisation, the rule set is discarded, but the membership functions are kept for the next iteration of rule induction. The process can then be iterated until the classification performance no longer improves from one iteration to the next.
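The alternation outlined above might be organised as in the following sketch. It is a hypothetical skeleton only: `induce_rules` stands in for FCF, `optimise_memberships` for a genetic optimiser, and `evaluate` for a classification performance measure. None of these names comes from the dissertation, and the toy callables at the end exist only to make the loop runnable.

```python
from typing import Any, Callable, Tuple


def alternate_induction_and_tuning(
    induce_rules: Callable[[Any], Any],
    optimise_memberships: Callable[[Any, Any], Any],
    evaluate: Callable[[Any, Any], float],
    memberships: Any,
    max_rounds: int = 20,
) -> Tuple[Any, Any, float]:
    """Alternate rule induction and membership tuning until no improvement."""
    best_score = float("-inf")
    best_rules, best_memberships = None, memberships
    for _ in range(max_rounds):
        rules = induce_rules(memberships)      # bootstrap or re-induce the rule set
        score = evaluate(rules, memberships)
        if score <= best_score:                # performance stopped improving
            break
        best_score = score
        best_rules, best_memberships = rules, memberships
        # The rule set acts as the objective; only the memberships carry over.
        memberships = optimise_memberships(rules, memberships)
    return best_rules, best_memberships, best_score


# Toy stand-ins: "memberships" is one integer and performance peaks at 3.
rules, tuned, score = alternate_induction_and_tuning(
    induce_rules=lambda m: ("rules for", m),
    optimise_memberships=lambda r, m: m + 1,
    evaluate=lambda r, m: -((m - 3) ** 2),
    memberships=0,
)
```

The loop returns the last rule set and membership functions that improved the score, so a final optimisation step that makes matters worse is discarded.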

12.2.9 Incremental Learning and Prior Knowledge

It may be desirable to keep an old tried-and-tested rule set even when new information (training data) becomes available. In this case an incremental learning approach exploiting the prior information may be used. Prior information may also be presented in the form of knowledge extracted from domain experts. A first approach is to add the prior knowledge in the form of rules to the rule set prior to rule induction, and to continue rule induction as usual. Rule antecedents may also be pruned using the extra training data. Another approach may be to adapt rules that classify the new data incorrectly either by specializing or generalizing them.

12.2.10 Information from Knowledge Discovery

The last aspect which we address is the application of FCF to real-world domains. FCF presents a new methodology for knowledge discovery which may prove very useful in many different domains. We have shown some preliminary results for two such applications, and FCF performed very well. However, we did not customize or adapt the data in any way. We expect a custom solution involving FCF and data adapted to fit the algorithm to yield very satisfactory results.

12.3 Conclusion

This dissertation advanced the state of the art in fuzzy classification rule induction by establishing fuzzy set covering as a new fuzzy rule induction paradigm. Fuzzy set covering algorithms are capable of inducing very comprehensible but also highly accurate rule sets. Thus, we hope the work presented in this dissertation makes the use of fuzzy classification rules more acceptable to both the crisp rule set and numerical concept learning communities.

Appendix A Classical Set Covering Algorithms

In this dissertation we established set covering as a methodology for rule induction in the fuzzy case. Thus, it is appropriate to provide a brief review of classical (crisp) set covering algorithms. For a definition of set covering please see Section 3.2. Section A.1 reviews the AQR family of algorithms, Section A.2 reviews PRISM, Section A.3 reviews CN2, and Section A.4 reviews RIPPER.

A.1 AQR

The AQR family of inductive learning algorithms, of which AQ15 is an example, generates rules from training instances by following the principles first introduced by Michalski in 1969 [Michalski et al., 1986b; Michalski, 1969]. AQR builds decision rules that account for all positive and no negative instances by performing a heuristic search of the space of legal logical expressions. AQR rules are represented in VL1, which is a multiple-valued logic propositional calculus with typed variables [Michalski, 1974a].

As an example of AQR, Table A.1 shows the basic AQ15 algorithm [Michalski et al., 1986a,b]. The algorithm is initialized with a partial cover of the positive examples. This initial partial cover may simply have the value true, or be a user defined hypothesis, providing AQ15 with an incremental learning facility. The procedure getStar obtains all maximally general complexes, or hypotheses, that cover a positive seed and not a negative seed. These are obtained by generating all maximally general complexes covering the positive seed, and removing those that also cover the negative seed. The maximally general complexes are then intersected with the current partial cover. This results in a new partial cover that still covers the positive seed, while not covering the negative seed.

This process is iterated and the results combined until no negative examples are covered. The best complex from the result of getStar is then added to the current rule set. AQ15 iterates the whole process until all positive examples are covered. The most recent incarnation of the AQ algorithm is AQ20 [Cervone et al., 2001]. Important new features include an object-oriented implementation, handling of continuous variables without prior discretization, and the selection of multiple rules from star.

Table A.1: The AQ15 algorithm.

PROCEDURE AQ15(partialcover)
    WHILE partialcover does not cover all positive examples
        seed = any uncovered positive example
        star = getStar(seed)
        best = best complex from star according to evaluation function
        partialcover = partialcover ∨ best
    RETURN partialcover
END PROCEDURE

PROCEDURE getStar(positiveseed)
    partialstar = TRUE
    WHILE partialstar covers some negative examples
        negativeseed = any negative example covered by partialstar
        negativestar = {c | c is maximally general, c covers positiveseed,
                        and c does not cover negativeseed}
        partialstar = partialstar ∩ negativestar
        retain maxstar best disjoint complexes in partialstar
    RETURN partialstar
END PROCEDURE
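To make the control flow of Table A.1 concrete, the following is a simplified, runnable sketch for nominal attributes. It is not the AQ15 implementation. It assumes that a complex is a tuple of allowed-value sets (one per attribute), that the evaluation function is the number of positive examples covered, and that the star is trimmed to max_star complexes inside the loop without subsumption or disjointness checks. All names and the toy data are hypothetical.

```python
from typing import Callable, FrozenSet, List, Sequence, Set, Tuple

Instance = Tuple[str, ...]
Complex = Tuple[FrozenSet[str], ...]  # allowed values per attribute


def covers(c: Complex, x: Instance) -> bool:
    """A complex covers an instance if every attribute value is allowed."""
    return all(value in allowed for value, allowed in zip(x, c))


def get_star(seed: Instance, negatives: Sequence[Instance],
             domains: Sequence[Set[str]], max_star: int,
             quality: Callable[[Complex], int]) -> List[Complex]:
    partial_star = [tuple(frozenset(d) for d in domains)]  # TRUE: no constraints
    while True:
        covered = [n for n in negatives
                   if any(covers(c, n) for c in partial_star)]
        if not covered:
            return partial_star
        neg_seed = covered[0]
        # Maximally general complexes covering seed but not neg_seed:
        # forbid the negative seed's value on one differing attribute.
        negative_star = [
            tuple(frozenset(d) - {neg_seed[i]} if i == j else frozenset(d)
                  for j, d in enumerate(domains))
            for i in range(len(domains)) if seed[i] != neg_seed[i]
        ]
        if not negative_star:
            raise ValueError("seed is identical to a negative example")
        # Intersect every current complex with every new one, then trim.
        merged = {tuple(a & b for a, b in zip(c, n))
                  for c in partial_star for n in negative_star}
        partial_star = sorted(merged, key=quality, reverse=True)[:max_star]


def aq15(positives: Sequence[Instance], negatives: Sequence[Instance],
         domains: Sequence[Set[str]], max_star: int = 3,
         initial_cover: Sequence[Complex] = ()) -> List[Complex]:
    def quality(c: Complex) -> int:  # assumed evaluation function
        return sum(covers(c, p) for p in positives)

    cover = list(initial_cover)
    uncovered = [p for p in positives
                 if not any(covers(c, p) for c in cover)]
    while uncovered:
        star = get_star(uncovered[0], negatives, domains, max_star, quality)
        best = max(star, key=quality)
        cover.append(best)  # disjunction of complexes
        uncovered = [p for p in uncovered if not covers(best, p)]
    return cover


# Toy data with two nominal attributes: colour and size.
domains = [{"red", "green", "blue"}, {"small", "large"}]
positives = [("red", "small"), ("red", "large"), ("green", "small")]
negatives = [("blue", "small"), ("blue", "large"), ("green", "large")]
cover = aq15(positives, negatives, domains)
```

Each complex in the returned cover excludes every negative example, and the cover as a whole accounts for every positive example. Which of several equally good complexes is chosen may vary between runs, because ties are broken arbitrarily in this sketch.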