13. ANNEXES
13.1 ANNEX I CONTENTS
A new approach, the ‘building block approach to genetic programming’ (BGP), was introduced to find good classifiers for classification tasks in data mining. It is an evolutionary search method based on genetic programming, but it differs in that the search starts with the smallest possible individuals in the population and gradually increases their complexity. The individuals in the population are decision trees that use relational functions in their internal nodes. Individuals are selected for recombination by tournament selection. Four different recombination operators were applied to the decision trees: crossover, pruning and two types of mutation. BGP was compared to two standard machine learning algorithms, CN2 and C4.5, on four benchmark tasks: Iris, Ionosphere, Monks and Pima-diabetes. The accuracies of BGP were similar to or better than those of CN2 and C4.5, except on the Ionosphere task. The main difference from C4.5, and especially from CN2, is that BGP achieved these accuracies consistently using fewer rules.
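As an illustration of the selection step mentioned above, the following is a minimal sketch of tournament selection in general, not the authors' implementation. The function name, the tournament size `k` and the `fitness` callable are assumptions made for this example; the text does not state which tournament size BGP uses.

```python
import random

def tournament_select(population, fitness, k=3, rng=random):
    """Pick k individuals uniformly at random (without replacement)
    and return the one with the highest fitness.

    k is a hypothetical tournament size; fitness is any callable that
    maps an individual to a number (higher is better).
    """
    contenders = rng.sample(population, k)
    return max(contenders, key=fitness)

# Usage: individuals are stand-in numbers, fitness is the value itself.
parent = tournament_select([0.61, 0.74, 0.58, 0.90], fitness=lambda x: x, k=2)
```

A larger `k` increases selection pressure, since the chance that a weak individual wins the tournament shrinks as more contenders are drawn.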
Two disadvantages of BGP are its time complexity and its difficulty with problems that have many continuous attributes. Developing scaling algorithms that make BGP, and GP for data mining in general, suitable for handling large databases is an interesting topic for future research. Continuous-valued attributes enlarge the search space substantially, since there are infinitely many threshold values to be tested. Currently the search for the best threshold is done through a mutation operator that adds a Gaussian-distributed value to the current threshold, which amounts to a random search. Future extensions of BGP will include a mutation operator on thresholds that performs a local search for the best value of that threshold.
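The current threshold mutation can be pictured roughly as follows. This is a sketch under stated assumptions: the function name and the standard deviation `sigma` are hypothetical, and the text does not give the parameters of the Gaussian used in BGP.

```python
import random

def mutate_threshold(threshold, sigma=0.1, rng=random):
    """Return the threshold perturbed by zero-mean Gaussian noise.

    sigma is a hypothetical step size. The perturbation ignores the
    fitness landscape, which is why it behaves like a random search.
    """
    return threshold + rng.gauss(0.0, sigma)

# Usage: perturb the threshold of a condition such as "attribute < 2.5".
new_threshold = mutate_threshold(2.5, sigma=0.2)
```

Because each perturbation is drawn independently of how well the previous threshold performed, a good threshold is found only by chance and selection.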
Other research directions to improve the performance of the current building block approach may include the following:
• adding a local search phase to optimize the threshold value of a condition,
• adding semantic rules to the grammar of the GP to prevent the comparison of incompatible attributes in the nodes of the decision tree,
• investigating new criteria that also depend on classification accuracy to test when new building blocks should be added, and
• implementing techniques to select the best building block to be added to individuals.
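The first item in the list above, a local search on a threshold, could take many forms. The sketch below uses simple hill climbing with a shrinking step; it is one possible reading and not a method described in the text. The function names, the step parameters and the toy data are all assumptions made for illustration.

```python
def local_search_threshold(threshold, score, step=1.0, min_step=1e-3, max_iters=1000):
    """Hill-climb a threshold: try threshold - step and threshold + step,
    keep any strict improvement, and halve the step when neither helps."""
    best, best_score = threshold, score(threshold)
    for _ in range(max_iters):
        if step < min_step:
            break
        improved = False
        for candidate in (best - step, best + step):
            candidate_score = score(candidate)
            if candidate_score > best_score:
                best, best_score = candidate, candidate_score
                improved = True
        if not improved:
            step /= 2.0
    return best

# Toy data (hypothetical): the condition "x < t" should be true for the
# first two values and false for the last two.
values = [1.0, 2.0, 3.0, 4.0]
labels = [True, True, False, False]

def accuracy(t):
    """Fraction of examples on which the condition 'x < t' matches the label."""
    return sum((x < t) == y for x, y in zip(values, labels)) / len(values)

tuned = local_search_threshold(1.5, accuracy)
```

Like any hill climber, this sketch can stall on a plateau where neighbouring thresholds score the same, so in practice it would complement the evolutionary search and not replace it.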
REFERENCES
Aarts, E.H.L., & Korst, J. (1989). Simulated Annealing and Boltzmann Machines. John Wiley & Sons.
Bäck, T., Fogel, D.B., & Michalewicz, Z. (Eds.). (2000a). Evolutionary Computation 1. Institute of Physics Publishers.
Bäck, T., Fogel, D.B., & Michalewicz, Z. (Eds.). (2000b). Evolutionary Computation 2. Institute of Physics Publishers.
Bojarczuk, C.C., Lopes, H.S., & Freitas, A.A. (1999). Discovering Comprehensible Classification Rules using Genetic Programming: A Case Study in a Medical Domain. Proceedings of the Genetic and Evolutionary Computation Conference (pp. 953-958). Morgan Kaufmann.
Bot, M. (1999). Application of Genetic Programming to Induction of Linear Classification Trees. Final Term Project Report. Faculty of Exact Sciences. Vrije Universiteit, Amsterdam.
Cherkauer, K.J., & Shavlik, J.W. (1996). Growing Simpler Decision Trees to Facilitate Knowledge Discovery. Proceedings of the 2nd International Conference on Knowledge Discovery and Data Mining.
Clark, P., & Niblett, T. (1989). The CN2 Induction Algorithm. Machine Learning. 3, 261-284.
Craven, M.W., & Shavlik, J.W. (1994). Using Sampling and Queries to Extract Rules from Trained Neural Networks. Proceedings of the 11th International Conference on Machine Learning.
De Jong, K.A., Spears, W.M., & Gordon, D.F. (1991). Using Genetic Algorithms for Concept Learning. Proceedings of International Joint Conference on Artificial Intelligence (pp. 651-656). IEEE Press.
Eggermont, J., Eiben, A.E., & Van Hemert, J.I. (1999). Adapting the Fitness Function in GP for Data Mining. Proceedings of the European Conference on Genetic Programming.
Flockhart, I.W., & Radcliffe, N.J. (1995). GA-MINER: Parallel Data Mining with Hierarchical Genetic Algorithms. Final Report. EPCC-AIKMS-GA-MINER-REPORT 1.0. University of Edinburgh.
Folino, G., Pizzuti, C., & Spezzano, G. (2000). Genetic Programming and Simulated Annealing: A Hybrid Method to Evolve Decision Trees. Proceedings of the European Conference on Genetic Programming.
Fu, L.M. (1994). Neural Networks in Computer Intelligence. McGraw Hill.
Giordana, A., Saitta, L., & Zini, F. (1994). Learning Disjunctive Concepts by Means of Genetic Algorithms. Proceedings of the 11th International Conference on Machine Learning (pp. 96-104).
Hoffmann, R., Minkin, V.I., & Carpenter, B.K. (1997). Ockham’s razor and Chemistry. International Journal for the Philosophy of Chemistry. 3, 3-28.
Marmelstein, R.E., & Lamont, G.B. (1998). Pattern Classification using a Hybrid Genetic Program – Decision Tree Approach. Proceedings of the Third Annual Conference of Genetic Programming (pp. 223-231). Morgan Kaufmann.
McGrade, A.S. (Ed.). (1992). William of Ockham – A short Discourse on Tyrannical Government. Cambridge University Press.
Michalski, R., Mozetic, I., Hong, J., & Lavrac, N. (1986). The AQ15 Inductive Learning System: an Overview and Experiments. Proceedings of IMAL.
Quinlan, R. (1992). Machine Learning and ID3. Morgan Kaufmann Publishers.
Quinlan, R. (1993). C4.5: Programs for Machine Learning. Morgan Kaufmann Publishers.
Quinlan, R. (1998). C5.0. Retrieved March 2001 from the World Wide Web: www.rulequest.com.
Thodberg, H.H. (1991). Improving Generalization of Neural Networks through Pruning. International Journal of Neural Systems. 1(4), 317-326.
Wong, M.L., & Leung, K.S. (2000). Data Mining using Grammar Based Genetic Programming and Applications. Kluwer Academic Publishers.