
Advances in Hybrid Evolutionary Computation for Continuous Optimization


Academic year: 2020


DEPARTAMENTO DE ARQUITECTURA Y TECNOLOGÍA DE SISTEMAS INFORMÁTICOS
Facultad de Informática, Universidad Politécnica de Madrid

Ph.D. Thesis

Advances in Hybrid Evolutionary Computation for Continuous Optimization

Author: Santiago Muelas Pascual, M.S. Computer Science
Ph.D. Director: José María Peña Sánchez, Ph.D. Computer Science

2011


Thesis Committee

Chairman: Pedro de Miguel Anasagasti
Member: Enrique Alba
Member: Manuel Lozano
Member: Alexander Mendiburu
Secretary: Víctor Robles


Acknowledgments

It is always difficult to thank all the people who have helped me in one way or another throughout this doctoral thesis. First of all, I want to thank my thesis director, José María Peña, the person most responsible for my having completed this work. Thank you for trusting me at every moment, for your closeness, for the advice you have given me throughout this time and for the good character you have always shown. For all of this, and for the projects that are emerging and those yet to come, thank you. I also thank Professor Enrique Alba for giving me the opportunity to visit his research group and for introducing me to all the exceptional people who form it. I cannot fail to mention my colleagues from the office, the laboratory, the department and the group. Thank you for your suggestions and for helping me whenever I needed it. I also want to express my deepest gratitude to my parents and sisters, who supported and helped me in every possible way from the first moment to the last. Without your help and understanding in the most difficult moments, I would not have been able to finish this thesis. Last, but not least, I would like to thank all my friends for putting up with me all these years and for giving me the chance to disconnect and recharge every day.

Santiago Muelas Pascual


Abstract

Evolutionary Algorithms (EAs) are a set of optimization techniques that have become highly popular in recent decades. One of the main reasons for this success is that they provide a general-purpose mechanism for solving a wide range of problems. Several approaches have been proposed, each with different search characteristics. Hybrid Evolutionary Algorithms (HEAs) are an effective alternative when approaching the optimization of a problem by means of EAs. The combination of several algorithms allows an HEA to exploit the strengths of each of the algorithms involved throughout the evolutionary process. Furthermore, it has been proven that, by means of a proper selection of algorithms and hybridization strategies, it is possible to obtain HEAs that outperform their constituent algorithms thanks to the synergistic relationships yielded by the hybridization. This characteristic has been the main motivation for the studies carried out in this thesis. Each study analyzes a different key factor in the combination of algorithms, with the aim of designing more efficient HEAs. These factors include, to name a few, the application of preliminary algorithms to initialize the solutions, control mechanisms to manage the exchange of information in distributed models, and adaptive hybridization strategies.

In the first study, a new initialization method is designed for distributed Evolutionary Algorithms (dEAs). This mechanism uses a topological tool to restrict the initial search space of each node of the algorithm. The proposal follows a systematic procedure guided by two criteria: (i) homogeneous coverage of the whole solution space, and (ii) no overlap between the regions explored by the islands.

In the second study, the behavior of distributed Estimation of Distribution Algorithms (dEDAs) for continuous optimization is thoroughly analyzed.
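The two initialization criteria above (disjoint initial regions that jointly cover the solution space) can be illustrated with a deliberately simplified, hypothetical sketch; the actual procedure of Chapter 4 relies on a Voronoi-based topological tool, whereas this toy version merely slices one axis of the domain into equal intervals, one per island. All names in the snippet are invented for this example.

```python
import random

def partition_domain(lower, upper, n_islands):
    """Split [lower, upper] into n_islands disjoint, equal-width slices."""
    width = (upper - lower) / n_islands
    return [(lower + i * width, lower + (i + 1) * width)
            for i in range(n_islands)]

def init_island(region, lower, upper, dim, pop_size, rng):
    """Uniformly sample an island's initial population: the first coordinate
    is confined to the island's slice, the rest span the whole domain, so the
    islands' slabs are disjoint and together cover the full search box."""
    lo, hi = region
    return [[rng.uniform(lo, hi)] +
            [rng.uniform(lower, upper) for _ in range(dim - 1)]
            for _ in range(pop_size)]

rng = random.Random(42)
regions = partition_domain(-5.0, 5.0, n_islands=4)
islands = [init_island(r, -5.0, 5.0, dim=10, pop_size=20, rng=rng)
           for r in regions]
```

The real method is considerably more general; this sketch only conveys the no-overlap and full-coverage constraints that the two criteria impose on the islands' starting populations.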
The dEDA study infers the parameter values that obtain the best performance in a selected competitive scenario, as well as the relationships between those parameters. Special emphasis is placed on comparing the available methods for exchanging information: individuals or probabilistic models.

In the third study, several competitive HEAs are defined and compared against the state-of-the-art algorithms in continuous optimization: a heterogeneous dEA, a memetic Differential Evolution (DE) algorithm and an adaptive High-level Relay Hybrid (HRH) algorithm. To design the adaptive algorithm, the Multiple Offspring Sampling (MOS) framework is extended to support the definition of HRH algorithms. All of the proposed algorithms achieve significant results, including some of the best results for the selected benchmarks.

The final objective of this thesis is to introduce a mechanism that learns how to control the combination in HEAs. This task is achieved by a new framework that automatically generates competitive hybridization strategies for HRH algorithms. The procedure uses information from several measures of the algorithms in past executions to infer a model that best characterizes the beneficial combination patterns.
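In the taxonomy adopted in Chapter 3, a High-level Relay Hybrid executes self-contained metaheuristics in sequence, each one starting from the population produced by the previous one. A minimal, hypothetical skeleton of that relay (not the MOS implementation; both stage functions are toy improvement searches on the sphere function, and every name is invented for illustration) could look like:

```python
import random

def hrh_relay(population, evaluate, stages, budget):
    """High-level Relay Hybrid skeleton: run each self-contained search in
    sequence, handing its resulting population to the next stage."""
    for stage in stages:
        population = stage(population, evaluate, budget)
    return min(population, key=evaluate)

def random_perturbation(pop, evaluate, budget, step=0.5):
    """Toy 'global' stage: Gaussian perturbation of the current best,
    accepting only improvements."""
    rng = random.Random(0)
    for _ in range(budget):
        parent = min(pop, key=evaluate)
        child = [x + rng.gauss(0.0, step) for x in parent]
        if evaluate(child) < evaluate(parent):
            pop[pop.index(parent)] = child
    return pop

def greedy_axis_search(pop, evaluate, budget, step=0.1):
    """Toy 'local' stage: coordinate-wise hill climbing from the best point."""
    best = min(pop, key=evaluate)
    for _ in range(budget):
        for i in range(len(best)):
            for delta in (step, -step):
                cand = list(best)
                cand[i] += delta
                if evaluate(cand) < evaluate(best):
                    best = cand
    pop[0] = best
    return pop

def sphere(x):
    return sum(v * v for v in x)

rng = random.Random(1)
pop = [[rng.uniform(-5.0, 5.0) for _ in range(5)] for _ in range(10)]
initial_best = min(sphere(x) for x in pop)
best = hrh_relay(pop, sphere, [random_perturbation, greedy_axis_search], budget=50)
```

In the MOS-based HRH algorithms studied later in the thesis, the handover between stages is governed by quality measures and participation adjustment rather than the fixed per-stage budget used in this sketch.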

To conclude, each of the proposals is tested on a set of well-known benchmarks for continuous optimization, and its results are compared with state-of-the-art algorithms by means of several statistical procedures.

Keywords: Evolutionary Computation, Hybrid Algorithms, Continuous Optimization, distributed Evolutionary Algorithms, Multiple Offspring Sampling.

Resumen

Evolutionary Algorithms (EAs) are a set of optimization techniques that have received great attention in recent decades. Being general-purpose algorithms, they have been applied to problems of very diverse nature, many of them in the context of scientific and industrial applications. Different approaches have been proposed, each with a different strategy for tackling the optimization process. Hybrid Evolutionary Algorithms (HEAs) are an effective alternative when facing an optimization problem by means of an EA. The combination of several algorithms allows them to exploit the strengths of each of the algorithms involved. It has been shown that, by means of an adequate selection of algorithms and hybridization strategies, it is possible to build an HEA that outperforms each of the individual algorithms thanks to the synergistic relationships generated in the combination process. This characteristic has been the main motivation for the studies carried out in the development of this thesis, each of them analyzing a key aspect of the combination in order to improve the design of HEAs. Some of the key aspects studied in this thesis are the following: the preliminary use of algorithms to carry out the initialization of the solutions, control mechanisms to manage the exchange of information in distributed models, adaptive hybridization strategies, and the automatic generation of these strategies.

The first work proposes a new initialization method for distributed evolutionary algorithms. This mechanism makes use of a topological tool with the aim of generating a set of independent regions that cover the global search space as uniformly as possible. Each of these regions constitutes the initial search space of one of the nodes of the distributed model.

The behavior of the distributed model of Estimation of Distribution Algorithms in continuous domains is extensively analyzed in the following study. This work has made it possible to infer the parameter values, as well as the relationships between them, that obtain the best performance. Special emphasis has been placed on comparing the different methods for exchanging information.

The third study proposes and analyzes several HEAs, each using a different approach to the combination: from a heterogeneous distributed evolutionary algorithm to a High-level Relay Hybrid (HRH) algorithm obtained by means of an extension of the MOS framework for generating HRH algorithms. Each of these algorithms has shown excellent performance, obtaining, in most cases, the best results on the benchmark suite used.

The final objective of this thesis is the development of a mechanism for learning how to control the combination of algorithms. This task has been carried out by means of a new framework that is able to automatically generate hybridization strategies for HRH algorithms. To do so, it analyzes the information from several measures collected over several executions in order to build a model that captures the best combination patterns.

To conclude, each of the proposals has been evaluated on a representative set of continuous optimization problems, comparing its results with state-of-the-art reference algorithms by means of different statistical procedures.

Keywords: Evolutionary Computation, Hybrid Algorithms, Continuous Optimization, distributed Evolutionary Algorithms, Multiple Offspring Sampling.

Table of Contents

Table of Contents
List of Figures
List of Tables
Acronyms and Definitions

I. INTRODUCTION

Chapter 1. Introduction
  1.1 Motivations, Objectives and Roadmap
    1.1.1 Initialization technique for distributed models
    1.1.2 Analysis of the behavior of dEDAs
    1.1.3 Designing efficient Hybrid Algorithms
    1.1.4 Automatic development of hybrid strategies
    1.1.5 Summary
  1.2 Document Organization

II. STATE OF THE ART

Chapter 2. Metaheuristics
  2.1 Evolutionary Algorithms
    2.1.1 Genetic Algorithms
    2.1.2 Evolution Strategies
  2.2 Estimation of Distribution Algorithms
    2.2.1 Learning Heuristics
  2.3 Differential Evolution
    2.3.1 Initialization
    2.3.2 Mutation
    2.3.3 Recombination
    2.3.4 Selection
  2.4 Multiple Trajectory Search
  2.5 Chapter Summary

Chapter 3. Adaptation and Hybridization
  3.1 Introduction
  3.2 Adaptation
  3.3 Hybridization
  3.4 Previous Work on Adaptive and Hybrid Evolutionary Algorithms
    3.4.1 Low-level relay hybrid (LRH) algorithms
    3.4.2 High-level Relay Hybrid (HRH) algorithms
    3.4.3 Low-level teamwork hybrid (LTH) algorithms
    3.4.4 High-level teamwork hybrid (HTH) algorithms
    3.4.5 Hybrid Algorithms with Collaborative Behavior
    3.4.6 Hybrid Algorithms with Competitive and Adaptive Behavior
    3.4.7 Hybrid Algorithms with Competitive and Self-Adaptive Behavior
    3.4.8 Hybrid Algorithms with a Shared Population
    3.4.9 Heterogeneous Hybrid Algorithms with Different Operators
    3.4.10 Heterogeneous Hybrid Algorithms with Different Values for Parameters
    3.4.11 Heterogeneous Hybrid Algorithms with Different Encodings
  3.5 Multiple Offspring Sampling algorithm
  3.6 Chapter Summary

III. PROPOSAL

Chapter 4. A New Initialization Procedure for the Distributed Evolutionary Algorithms
  4.1 Introduction
  4.2 Related Work
  4.3 Contribution
  4.4 Experimentation
  4.5 Analysis of the Results
  4.6 Conclusions

Chapter 5. Migrating individuals and probabilistic models on dEDAs: A comparison on continuous functions
  5.1 Introduction
  5.2 Related work
  5.3 Experimentation
  5.4 Analysis of the Results
    5.4.1 Overall analysis
    5.4.2 Comparing the best configurations
    5.4.3 Comparing the information exchange methods
    5.4.4 Characterization of the configurations
  5.5 Conclusions

Chapter 6. Heterogeneous Distributed Evolutionary Algorithms for Continuous Optimization Functions
  6.1 Introduction
  6.2 Related Work
  6.3 Contribution and Experimentation
  6.4 Analysis of the Results
  6.5 Conclusions

Chapter 7. A Memetic Differential Evolution Algorithm for Continuous Optimization
  7.1 Introduction
  7.2 Preliminaries
  7.3 Contribution
  7.4 Experimentation
  7.5 Analysis of the Results
    7.5.1 Scalability Analysis
  7.6 Conclusions

Chapter 8. A MOS-based Dynamic Memetic Differential Evolution Algorithm for Continuous Optimization
  8.1 Introduction
  8.2 Contribution
    8.2.1 Multiple Offspring Sampling for HRH algorithms
    8.2.2 Proposed Algorithm
  8.3 Experimentation
    8.3.1 Benchmark Suite
  8.4 Analysis of the Results
    8.4.1 Statistical Analysis
    8.4.2 Scalability Analysis
    8.4.3 Behavior Analysis
  8.5 Conclusions

Chapter 9. A New Methodology for the Automatic Creation of Adaptive Hybrid Algorithms
  9.1 Introduction
  9.2 Related Work
  9.3 Contribution
    9.3.1 Proposed Methodology
    9.3.2 baseHRH
    9.3.3 smartHRH
  9.4 Experimentation
  9.5 Analysis of the Results
  9.6 Conclusions

IV. CONCLUSIONS AND FUTURE WORK

Chapter 10. Conclusions
  10.1 Initialization technique for distributed models
  10.2 Analysis of the behavior of distributed Estimation of Distribution Algorithms (dEDAs)
  10.3 Designing efficient hybrid algorithms
  10.4 Automatic development of hybrid strategies
  10.5 Selection of the Benchmarks
  10.6 Computational Resources
  10.7 Selected publications

Chapter 11. Future Work
  11.1 Initialization technique for distributed models
  11.2 Analysis of the behavior of dEDAs
  11.3 Multiple Offspring Sampling (MOS) HRH framework
  11.4 Automatic development of hybrid strategies
  11.5 Application Domains
    11.5.1 Neuroscience Problems
    11.5.2 Dial-A-Ride Problem for a complex scenario

V. APPENDICES

Appendix A. Benchmarks
  A.1 CEC 2005 Benchmark
    A.1.1 Unimodal Functions
    A.1.2 Basic Multimodal Functions
    A.1.3 Expanded Functions
    A.1.4 Composition Functions
  A.2 MAEB 2009 Benchmark
  A.3 ISDA 2009 Benchmark
  A.4 Special Issue on Large Scale Continuous Optimization Problems

Appendix B. Validation Procedures
  B.1 The Wilcoxon matched-pairs signed ranks test
  B.2 nwins Procedure
  B.3 Holm's and Hochberg's Procedures

Bibliography


List of Figures

2.1 Example of a graphical model for x = (A, B, C, D)
2.2 Binomial Crossover
2.3 Exponential Crossover
3.1 Taxonomy of parameter setting proposed by Eiben
3.2 Taxonomy of hybrid algorithms by [LaT09]
4.1 Initialization Procedure
4.2 Initialization Comparison
4.3 Evolution of the average score for the f20 function
5.1 Parameter values of the best configurations on 50-D
5.2 Parameter values of the best configurations on 100-D
5.3 Parameter values of the best configurations on 200-D
5.4 Parameter values of the worst configurations on 50-D
5.5 Parameter values of the worst configurations on 100-D
5.6 Parameter values of the worst configurations on 200-D
5.7 Induction tree from the DBScan cluster
6.1 Evolution of the average score in the 10-D f6 function
6.2 Evolution of the average score in the 10-D f13 function
6.3 Evolution of the average score in the 10-D f17 function
6.4 Evolution of the average score of each island in the 10-D f17
6.5 Evolution of the average score of each island in the 10-D f15
7.1 Scalability plots for MDE-DC in logarithmic scale
8.1 Comparison of the average score for f2 in 200-D
8.2 Comparison of the average score for f13 in 200-D
8.3 Scalability plots for MOS in logarithmic scale
8.4 Participation adjustment for f2 and f9
8.5 Participation adjustment for f13 and f17
8.6 Active quality function graphs
9.1 Generation of the data records and learning procedure
9.2 Comparison of the average score for f2 in 200-D
9.3 Comparison of the average score for f8 in 200-D
9.4 Comparison of the average score for f13 in 200-D
9.5 Comparison of the average score for f17 in 200-D
9.6 Evolution of the score for f17 in 200-D
9.7 Scalability plots in logarithmic scale
10.1 Temporal development of the studies carried out in this thesis
11.1 P300 Problem
11.2 Example of an IBM-SEM image containing vesicles
11.3 Results obtained with the Circular Hough Transform method
A.1 3-D plots of the Sphere function and Schwefel's problem 1.2
A.2 3-D plots of the High Conditioned Elliptic function and Schwefel's problem 1.2
A.3 3-D plot of Schwefel's problem 2.6
A.4 3-D plots of Rosenbrock's and Griewank's functions
A.5 3-D plots of Ackley's and Rastrigin's functions
A.6 3-D plots of Rastrigin's and Weierstrass functions
A.7 3-D plots of Schwefel's problem 2.13
A.8 3-D plots of F13 and F14 functions
A.9 3-D plots of F15 and F16 functions
A.10 3-D plots of F17 and F18 functions
A.11 3-D plots of F19 and F20 functions
A.12 3-D plots of F21 and F22 functions
A.13 3-D plots of F23 and F24 functions
A.14 3-D plots of the Shifted Sphere function and Schwefel's problem 2.21
A.15 3-D plots of Shifted Rosenbrock's and Rastrigin's functions
A.16 3-D plots of Shifted Griewank's and Ackley's functions
A.17 3-D plots of Shifted Schwefel's Problem 2.22 and Schwefel's Problem 1.2 functions
A.18 3-D plots of Shifted Extended f10 and Shifted Bohachevsky functions
A.19 3-D plot of Shifted Schaffer function

List of Tables

3.1 Some of the parameters of EAs subject to adaptation
4.1 Parameters chosen for the Voronoi initialization experiments
4.2 #N of EDA configurations in 10-D with significant differences
4.3 #N of EDA configurations in 30-D with significant differences
4.4 #N of GA configurations in 10-D with significant differences
4.5 #N of GA configurations in 30-D with significant differences
4.6 Rules that characterize the behavior of both initialization methods
4.7 Average ranking and nwins of the best algorithms
4.8 p-values of the comparisons of the best configurations
4.9 Average number of individuals exchanged between the best configurations
5.1 Parameter values of the dEDA configurations
5.2 Average Rankings and nwins 50-D
5.3 Average Rankings and nwins 100-D
5.4 Average Rankings and nwins 200-D
5.5 Average error of the best configurations
5.6 Average Ranking and nwins of the best configurations
5.7 Statistical validation of the best configurations
5.8 Average ranking and nwins per function on 50-D functions
5.9 Average ranking and nwins per function on 100-D functions
5.10 Average ranking and nwins per function on 200-D functions
5.11 Average error of the best sequential EDA configurations with and without elitism
6.1 Heterogeneous dEAs parameters
6.2 Distributed Model Parameters
6.3 Average Error on 10-D Functions
6.4 Average Error on 30-D Functions
6.5 Average Ranking and nwins
6.6 Statistical validation
6.7 Average Ranking and nwins of the algorithms of the MAEB 2009 Session
7.1 Algorithm Parameters
7.2 Average Error on 50-D Functions
7.3 Average Error on 100-D Functions
7.4 Average Error on 200-D Functions
7.5 Average Error on 500-D Functions
7.6 Average Ranking and nwins for the first comparison
7.7 Statistical validation for the first comparison
7.8 Average Ranking and nwins for the second comparison
7.9 Statistical validation for the second comparison
8.1 Configuration of the MOS-based algorithm
8.2 Average Error on 50-D Functions
8.3 Average Error on 100-D Functions
8.4 Average Error on 200-D Functions
8.5 Average Error on 500-D Functions
8.6 Average Ranking and nwins for the first comparison
8.7 Statistical validation for the first comparison
8.8 Average Ranking and nwins for the second comparison
8.9 Statistical validation for the second comparison
9.1 Configuration of the algorithms
9.2 Average Error in 50-D
9.3 Average Error in 100-D
9.4 Average Error in 200-D
9.5 Average Error in 500-D
9.6 Functions in which, for each comparison, statistical differences were found
9.7 Average Ranking and nwins up to 500 dimensions
9.8 Statistical validation
9.9 Example of the rules obtained for different functions on 200-D
A.1 Hybrid Functions definition

List of Algorithms

1   Classic Genetic Algorithm . . . 18
2   Estimation of Distribution Algorithm . . . 19
3   Differential Evolution Algorithm . . . 23
4   Multiple Trajectory Search (MTS) Algorithm . . . 26
5   MTS-LS1(Xk, Improve, SR) . . . 27
6   MTS-LS2(Xk, Improve, SR) . . . 28
7   MTS-LS3(Xk, SR) . . . 29
8   MOS Algorithm . . . 43
9   D2 Method . . . 49
10  Island population initialization . . . 50
11  MDE-DC Algorithm . . . 97
12  HRH MOS Algorithm . . . 105
13  Example of a simple strategy for carrying out the combination of the algorithms . . . 118
14  Combination of functions (Functions f12 − f19) . . . 174


Acronyms and Definitions

ACO          Ant Colony Optimization
BCI          Brain Computer Interface
BMDA         Bivariate Marginal Distribution Algorithm
BSC          Bit-Based Simulated Crossover
CBBP         Cajal Blue Brain Project
CBR          Case Based Reasoning
CeSViMa      Madrid Supercomputing and Visualization Center
CMA-ES       Covariance Matrix Adaptation Evolution Strategy
DARP         Dial-A-Ride Problem
IPOP-CMA-ES  Incremental Population Covariance Matrix Adaptation Evolution Strategy
COMIT        Combining Optimizers with Mutual Information Trees
DE           Differential Evolution
dEA          distributed Evolutionary Algorithm
dEDA         distributed Estimation of Distribution Algorithm
pEDA         parallel Estimation of Distribution Algorithm
dGA          distributed Genetic Algorithm
EA           Evolutionary Algorithm
EBNA         Estimation of Bayesian Networks Algorithm
EDA          Estimation of Distribution Algorithm
EGNA         Estimation of Gaussian Networks Algorithm
EMNAglobal   Estimation of Multivariate Normal Algorithm
ES           Evolution Strategy
ESs          Evolution Strategies
FE           Fitness Evaluation
FWER         Family-Wise Error
GA           Genetic Algorithm
HA           Hybrid Algorithm
HEA          Hybrid Evolutionary Algorithm
HRH          High-level Relay Hybrid
HTH          High-level Teamwork Hybrid
ILS          Iterated Local Search
LRH          Low-level Relay Hybrid
LTH          Low-level Teamwork Hybrid
LS           Local Search
LSs          Local Searches
MA           Memetic Algorithm
ML           Machine Learning
MIBOA        Mixed-Integer Bayesian Optimization Algorithm
MIMIC        Mutual Information Maximization for Input Clustering
MOS          Multiple Offspring Sampling
MTS          Multiple Trajectory Search
NFL          No Free Lunch
NM           Nelder-Mead
PBIL         Population-Based Incremental Learning
PF           Participation Function
PR           Participation Ratio
PSO          Particle Swarm Optimization
PGM          Probabilistic Graphical Model
QF           Quality Function
SA           Simulated Annealing
TS           Tabu Search
TSP          Traveling Salesman Problem
UMDA         Univariate Marginal Distribution Algorithm
UMDAg        Univariate Marginal Distribution Algorithm for Gaussian Models
VNS          Variable Neighborhood Search


Part I. INTRODUCTION


Chapter 1. Introduction

In spite of the wide range of application fields and the good results that Evolutionary Algorithms (EAs) have obtained in complex optimization problems, their results are not always as good as one might expect. Most researchers usually test only a few different algorithms when trying to solve a particular optimization problem. Even if the most suitable algorithm for that problem has been selected, it is hard to find its best configuration (parameters and set of operators). Additionally, different algorithms work better on different problems. This is in accordance with the No Free Lunch (NFL) theorem [WM97], which states that for any algorithm with an outstanding performance in a given problem there is always another problem in which other algorithms perform better. Although the NFL theorem is based on certain extreme theoretical considerations [DJW02, Olt04], real-world problems also show differences in the comparative performance of several algorithms. In most cases, the selection of the most appropriate algorithm is carried out by executing several alternative algorithms (advised by the literature or through experience) and choosing the one that reports the best results.

Supported by these arguments, Hybrid Evolutionary Algorithms (HEAs) are a promising alternative for dealing with these situations. Over recent years, the best results for many practical or academic problems have been found using hybrid metaheuristics [Tal02, CNT09, MPLR09, LMP11]. By combining different heuristic optimization approaches, it is possible to profit from the benefits of the best approach while avoiding the limitations of the others. Furthermore, in the latest competitions on continuous optimization, most of the best-performing algorithms have been hybrid approaches. According to the studies conducted by Sinha and Goldberg [SG03], there are three main reasons for the hybridization of EAs:

1. An improvement in the performance of the EA (for example, the convergence speed).

2. An improvement in the quality of the solutions obtained by the EA.

3. To incorporate the EA as a part of a larger system.

The work in this thesis focuses on the first two reasons for hybridization in Evolutionary Algorithms, the improvement of the performance of the EAs and of the quality of the obtained solutions, with special emphasis on the latter.

The rest of the chapter is organized as follows: the main motivations and objectives for this research are detailed in Section 1.1, whereas Section 1.2 summarizes the organization and the contents of this document.

1.1 Motivations, Objectives and Roadmap

As previously mentioned, HEAs are an effective alternative when approaching the optimization of a problem by means of metaheuristic algorithms. The combination of several algorithms makes it possible to exploit the strengths of each of the algorithms involved throughout the evolutionary process. Furthermore, it has been proven that, by means of a proper selection of algorithms and hybridization strategies, i.e., the criteria and the parameter values that define the combination, it is possible to obtain Hybrid Evolutionary Algorithms that outperform their composing algorithms thanks to the synergy relationships yielded by the hybridization [HWC00, LPRM08]. This characteristic has been the main motivation for the studies carried out during the development of this thesis. Each of these studies has analyzed a different key factor in the combination of the algorithms, with the aim of designing more efficient HEAs. For the evaluation of the results, standard benchmarks have been used, some of them developed during the writing of this thesis, and state-of-the-art algorithms have been chosen for comparing the results. These works are briefly detailed below, along with the motivations that started them, the hypotheses that were proposed and the objectives that were planned to be accomplished.

1.1.1 Initialization technique for distributed models
The first study focused on the development of a new initialization technique for its combination with distributed Evolutionary Algorithms (dEAs). dEAs are an example of Hybrid Algorithms in which several nodes or islands execute an evolutionary algorithm over an independent subpopulation in parallel. The islands cooperate among themselves by exchanging their information according to a migration strategy (topology, migration frequency, migration rate, etc.). These models have been widely used in the past, improving both the numerical and runtime behavior of the sequential algorithm in many cases [AT02, RMAHL08].
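The island-model scheme just described can be sketched as follows. This is a minimal illustrative sketch, not the dEA studied in this thesis: the evolutionary step (binary-tournament selection, averaging crossover, Gaussian mutation), the ring topology and all parameter values are assumptions made only for the example.

```python
import random

random.seed(1)  # for reproducibility of this example

def evolve_step(population, fitness):
    """One illustrative evolutionary step: binary-tournament selection,
    arithmetic (averaging) crossover and Gaussian mutation."""
    new_pop = []
    for _ in range(len(population)):
        p1 = min(random.sample(population, 2), key=fitness)
        p2 = min(random.sample(population, 2), key=fitness)
        child = [(a + b) / 2 + random.gauss(0, 0.1) for a, b in zip(p1, p2)]
        new_pop.append(child)
    return new_pop

def run_islands(fitness, n_islands=4, pop_size=10, dim=2,
                generations=50, migration_freq=10, migration_rate=1):
    """Ring-topology island model: every migration_freq generations each
    island receives the migration_rate best individuals of its ring
    neighbor, which replace its own worst individuals."""
    islands = [[[random.uniform(-5, 5) for _ in range(dim)]
                for _ in range(pop_size)] for _ in range(n_islands)]
    for gen in range(1, generations + 1):
        islands = [evolve_step(pop, fitness) for pop in islands]
        if gen % migration_freq == 0:
            migrants = [sorted(pop, key=fitness)[:migration_rate]
                        for pop in islands]
            for i, pop in enumerate(islands):
                pop.sort(key=fitness)                     # worst at the end
                pop[-migration_rate:] = migrants[i - 1]   # ring neighbor
    return min((ind for pop in islands for ind in pop), key=fitness)

sphere = lambda x: sum(v * v for v in x)  # minimization benchmark
best = run_islands(sphere)
print(sphere(best))
```

Note how the migration strategy is fully described by the three parameters mentioned above: the topology (a ring), the migration frequency (`migration_freq`) and the migration rate (`migration_rate`).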

1.1.1.1 Motivation

The initialization of population-based Evolutionary Algorithms is hardly addressed in the literature [KS97]. Nevertheless, every expert in the field agrees that a bad initialization can make evolution converge prematurely to suboptimal solutions. In many cases, the initialization process depends on the application field, e.g., when an approximate solution to the problem is known. Otherwise, if the individuals of the population can be built through certain heuristic techniques, these could provide a good starting point to reach the optimum. Island models are especially sensitive to initialization because of the possible mutual dependence between the different populations of the islands. There is very little literature on this topic and, therefore, this is one of the aspects to be studied in depth.

The main characteristic of distributed or island models is the relative independence of the algorithms and the populations, which allows them to maintain, in general, a higher diversity than their sequential counterparts [ALN04]. This work proposes to extend this idea to the initialization process of a dEA by dividing the initial search space into independent regions and assigning each of the regions to a single node or island.

1.1.1.2 Hypothesis

This work proposes the hypothesis that, by carrying out an appropriate partition of the search space and assigning each partition to a different node, it is possible to generate an initial supply of individuals (not necessarily better than the individuals that could be generated by a traditional method) that contributes to improving the final performance of the distributed algorithm. Moreover, this kind of initialization could potentially obtain better results on highly multimodal problems due to the inherent reduction in the number of optima within each of the regions.
1.1.1.3 Objectives

The main objective of this line of work is to develop a new initialization mechanism that generates independent and uniform search regions for their assignment as the initial populations of the nodes of a distributed algorithm. This mechanism will be tested with the most studied distributed Evolutionary Algorithms in the literature: the Genetic Algorithms (GAs) and the Estimation of Distribution Algorithms (EDAs). An experimental scenario in continuous optimization will be used to validate the results.

1.1.2 Analysis of the behavior of dEDAs

Distributed Estimation of Distribution Algorithms (dEDAs) are a subtype of the aforementioned dEAs in which an Estimation of Distribution Algorithm (EDA) is used in the distributed or hybrid

algorithm. In general terms, EDAs are similar to GAs, but their main characteristic is the use of probabilistic models to extract information from the most promising individuals of the current population (instead of using crossover or mutation operators) in order to create a new and presumably better population. The complexity of the different EDAs is usually related to the probabilistic model used, and to the ability of that model to identify and represent the (in)dependencies between the variables.

As regards the information exchanged between the distributed Estimation of Distribution Algorithms (dEDAs), two alternatives are available: (i) the straightforward approach of selecting a pool of individuals that will later be sent to the consignees and (ii) the alternative of sending the main characteristic of EDAs: the probabilistic models. These probabilistic models will be (or should be) able to represent the (in)dependencies between the variables and, therefore, comprise more information than a group of individuals. In this approach, the method for combining the models must also be defined.

1.1.2.1 Motivation

The study of the state of the art carried out for the previous line of work revealed that very few studies have analyzed the influence of the migration parameters on the final performance of dEDAs and, in particular, in continuous optimization. Furthermore, only a small number of these studies have tried to compare the performance of the two methods for exchanging the information: individuals or models [dGP06, dGP05]. In these papers, the authors concluded that the migration of models obtains significantly better results than the migration of individuals.
This fact seems surprising since, from an intuitive point of view, in highly multimodal problems (very common in most of the modern benchmarks), the migration and later combination of models from different basins of attraction could add less precise information to the new model. However, it must be taken into account that, in these studies, the experimental scenario was restricted to (i) a limited number of problems with small dimensions and (ii) a small number of parameters.

1.1.2.2 Hypotheses

Based on the information previously presented, two new hypotheses were proposed for this study:

• It is possible to infer some relationships between the migration parameter values of dEDAs that could determine the performance of the distributed configurations.

• The migration of individuals in dEDAs on highly multimodal continuous functions obtains better results than the migration of models.

1.1.2.3 Objectives

For this study, it was planned to carry out a broad analysis of the influence of the migration parameters of dEDAs using a reference benchmark in continuous optimization. This study should be able to infer the values that obtain the best performance and the relationships that could arise between them. Furthermore, this study should place special emphasis on comparing both methods for exchanging information. In order to compare the results with the literature, the same EDA configurations and model combination methods would be used.

1.1.3 Designing efficient Hybrid Algorithms

The previous studies focused on different aspects of the homogeneous version of the distributed Evolutionary Algorithm, in which all the nodes execute the same Evolutionary Algorithm. The next step when dealing with hybrid EAs is to try to combine algorithms with different exploratory approaches in order to benefit from the different advantages of each approach. Therefore, a heterogeneous dEA was designed and presented at the competition on continuous optimization that was carried out at the MAEB 2009 conference [MPLR09]. Although this algorithm obtained one of the best results of the competition, this combination strategy is not appropriate when trying to combine an EA with a non population-based algorithm such as a local search. For this reason, it was decided to redirect the main focus to a different type of hybridization: the High-level Relay Hybrid (HRH) approach. In this kind of combination, the algorithms are self-contained and applied in sequence, one after another, iteratively. Intensive research has been carried out on these models in recent years. The most common approach, also called a Memetic Algorithm, takes a population-based EA and, after each iteration, carries out an exploitative local search, such as a hill-climbing search, on the best solution of the population.
Thus, the EA carries out the exploratory part of the search while the local search refines the best solutions found. Following this idea, a new HRH algorithm called MDE-DC was designed and presented at the workshop on continuous optimization held at the ISDA 2009 conference [MLP09]. This algorithm combined a Differential Evolution (DE) algorithm with one of the best linear local searches in the literature, the LS1 of the Multiple Trajectory Search algorithm [TC08], and obtained the best results of the workshop.

1.1.3.1 Motivation

The MDE-DC algorithm uses a static model for carrying out the combination of the algorithms, i.e., the combination scheme of the algorithms is fixed and does not change between executions. A more sophisticated approach is to use an adaptive model in which the participation of each algorithm

is dynamically adjusted. In general, these approaches tend to obtain better results than the static approaches, but they need to be carefully designed by the researcher in order to create successful combination strategies.

The research group the author belongs to has had experience in the past with the creation of adaptive hybrid frameworks. In particular, the Multiple Offspring Sampling (MOS) framework was defined for the combination of several EAs executed in parallel on the same global population [LaT09]. In this framework, the algorithms are rated based on a quality function that analyzes the quality of the new solutions generated by each algorithm. Depending on this value, the algorithms are assigned a number of individuals from the new population to be generated in the next iteration. This framework has obtained very competitive results in the past, although it had never been applied to HRH algorithms. Furthermore, to the best of the author's knowledge, this kind of approach had never been applied to memetic algorithms.

1.1.3.2 Hypothesis

Therefore, for the next line of work, the following hypothesis was proposed: the design of an adaptive combination strategy, based on the MOS framework for HRH algorithms, should be able to improve, or at least equal, the results obtained with a static combination strategy.

1.1.3.3 Objectives

Two main objectives were defined for this study: to adapt the MOS framework to the HRH hybridization model and to design, based on this new framework, an adaptive HRH algorithm that combines the same composing algorithms as the MDE-DC algorithm. The new algorithm should be compared with the previous MDE-DC algorithm, along with several other adaptive approaches from the state of the art. For the experiments, a standard (and recent) benchmark on continuous optimization should be used.

1.1.4 Automatic development of hybrid strategies
As previously mentioned, there are two types of combination strategies when dealing with HRH algorithms: (i) a static strategy, in which one of the algorithms is applied at the end of every (or some) step(s) of the main algorithm, or (ii) an adaptive strategy, in which the execution of each algorithm depends on a measure that varies throughout the evolution. The adaptive approach represents a great improvement over the static method, as the researcher does not need to establish the combination sequence of the algorithms before the execution, since it is computed dynamically. But even with this approach, the researcher still needs to design the heuristic that conducts the combination, as well as the measures that the heuristic is going to use, which, in most cases, is a hard task to carry out.

1.1.4.1 Motivation

After the great results of the developed adaptive HRH algorithm (the best in the competition in which it was presented), the next proposed line of work was to advance further in the automation of the generation of HRH algorithms, in order to generate a successful hybridization strategy without assistance. The idea was to develop a mechanism that could spare the researcher the burden of both the design of the heuristics and the selection of the measures. To that end, it was proposed to learn the best combination sequence of a HRH algorithm from previous executions, and to use this information to build a new adaptive strategy. Few works have tried to apply a learning method as a regulatory mechanism for metaheuristics [OK04, SLL08, LMP11] and, to the best of the author's knowledge, none of them have tried to apply a learning procedure to determine the best hybridization strategy of a HRH algorithm.

This approach could be particularly useful in those scenarios in which an optimization problem is solved multiple times. These multiple executions could include slightly different conditions that actually have an influence on the position of the optimal value, but do not change the main characteristics of the fitness landscape in which this optimization process searches. Many industrial problems have this characteristic: the fitness function is mainly the same, but the particular conditions or other input parameters change at each execution, e.g., the optimization of engineering structures evaluated under different stress conditions.

1.1.4.2 Hypothesis

This study proposes that, based on the analysis of the behavior of previous executions, it is possible to learn a hybridization strategy for the algorithms of a HRH algorithm by means of learning techniques, in order to select the most appropriate algorithm for each iteration of the execution.
1.1.4.3 Objectives

The first objective planned for this study was to design a new framework that, given a HRH algorithm, executes it and analyzes its behavior in order to generate, from its results, a model of the best hybridization patterns observed during the execution. In order to use this model, a new HRH algorithm that could automatically incorporate the model and apply it in further executions was also planned to be developed. The MOS-based adaptive HRH algorithm, designed in the previous work, was selected as the base algorithm against which to compare the new combination approach. The idea is that the new approach should, at least, mimic the behavior of the MOS algorithm, or even discover new beneficial combination patterns.

1.1.5 Summary

The work carried out for this thesis is motivated by the idea that it is possible to design Hybrid Evolutionary Algorithms that outperform their composing algorithms by means of a proper selection of algorithms and combination strategies. From this point of view, several studies on different aspects of Hybrid Evolutionary Algorithms are proposed. Although most of them have been proposed as an improvement on the results obtained in a previous study, each of the proposals solves a specific problem by itself and could be applied without the modifications proposed in the previous studies. A list that summarizes the main objectives planned for this thesis is detailed below:

1. Development of a new initialization mechanism that generates independent and uniform search regions for their assignment as the initial populations of the nodes of a distributed algorithm.

2. A broad study of the influence of the migration parameters of dEDAs, in order to infer a model of the best-performing configurations, with special emphasis on comparing both methods for exchanging information.

3. The adaptation of the MOS framework to the HRH hybridization model.

4. The design of an adaptive HRH algorithm, based on the HRH MOS framework, that combines the same composing algorithms as the MDE-DC algorithm.

5. Construction of a framework for the automatic generation of adaptive HRH strategies.

6. The design of a HRH algorithm that can automatically incorporate the model generated by the previous framework.

1.2 Document Organization

To conclude this introduction, the structure of the document is as follows: Chapters 2 and 3 briefly review the state of the art in Metaheuristics, Hybridization and Adaptation. Chapter 4 describes and analyzes the proposed initialization method. In Chapter 5, the migration parameters of the dEDAs are thoroughly analyzed.
Chapters 6 and 7 detail the design and results of two hybrid algorithms, each of which obtained some of the best results in the competition in which it was presented. In Chapter 8, the extension of the MOS framework for HRH algorithms is proposed and analyzed. The last work of this thesis is described in Chapter 9, where an automatic procedure for the creation of HRH combination strategies is proposed, analyzed on a standard benchmark and compared against several algorithms. In Chapter 10, the main conclusions of this study are presented, whereas Chapter 11 outlines the possible lines of research proposed for the continuation of this thesis. Finally, Appendices A and B detail all the benchmarks and the validation procedures used in this study.

Part II. STATE OF THE ART


Chapter 2. Metaheuristics

Optimization is a field of vital importance in our daily life. It deals with finding the best (or a good enough) solution to a given problem among many alternatives. Humans are constantly solving small optimization problems in their daily life, such as finding the shortest path to a given location or solving timetabling problems. In general, these problems are small enough for us to deal with them without additional help. However, as these problems grow in size, as in the optimization of complex engineering structures, there is no choice but to solve them with the aid of computers.

More formally, an optimization problem is a pair (S, f), where S ≠ ∅ represents the search space of solutions of the problem, and f is the fitness function, defined as follows:

f : S → R    (2.1)

The goal is then to find a solution s∗ ∈ S that, in the case of a minimization problem, satisfies the following inequality:

f(s∗) ≤ f(s), ∀s ∈ S    (2.2)

Depending on the domain of S, we can define three types of optimization problems: (i) those with discrete domains (i.e., the domain consists of a finite set of discrete values), (ii) those with continuous domains and (iii) those with both discrete and continuous domains.

Due to the importance of optimization problems, many algorithms to tackle them have been developed. They can be classified as either complete or approximate algorithms [BR03]. Complete algorithms guarantee to find an optimal solution for every instance of any problem in a bounded time. The problem with these methods is that they might need computation times too high for practical purposes. Therefore, the use of approximate methods has received more and more attention from the international community in the last decades. In these methods, the guarantee of finding an optimal solution is sacrificed for the sake of getting “good” solutions in a reasonable time. Among the approximate methods, it is usual to distinguish between constructive methods, local search methods and metaheuristics.

Constructive algorithms are typically the fastest methods, although they often return solutions of inferior quality when compared to the remaining approximate methods. They construct a solution from scratch by adding components to an initially partial solution until the solution is complete. Finding constructive methods that produce good solutions is a hard task since they are highly dependent on the problem and, for their design, an extensive knowledge of the problem must be acquired. For example, in problems with several constraints, most of the partial solutions could only lead to non-feasible solutions.

Local Search algorithms start from an initial complete solution and iteratively try to replace the current solution by a better solution from its neighborhood, which can be formally defined as follows:

Definition. A neighborhood structure is a function N : S → 2^S that assigns to every s ∈ S a set of neighbors N(s) ⊆ S. N(s) is called the neighborhood of s.

Normally, neighborhood structures are implicitly defined by specifying the set of solutions that can be reached from s by the application of a specific modification operator, commonly called a move. A local optimum is a solution that is better than, or at least as good as, any other solution in its neighborhood. In the most basic Local Search (LS) method, each move is only performed if the resulting solution is better than the current solution. In many cases, the complete exploration of the neighborhood is unviable, and different strategies that depart from this generic scheme must be followed. The performance of these methods is strongly correlated with the selection of the modification operator.
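The basic Local Search scheme just described can be sketched as follows. The neighborhood used here (changing a single coordinate by a fixed step) is an illustrative choice made for the example, not one prescribed by any particular method discussed in this thesis.

```python
def local_search(x, f, step=0.5, max_iters=1000):
    """Basic best-improvement Local Search. The move is illustrative:
    change exactly one coordinate by +step or -step, so N(x) contains
    2 * len(x) neighbors."""
    x = list(x)
    for _ in range(max_iters):
        neighbors = []
        for i in range(len(x)):
            for delta in (step, -step):
                y = list(x)
                y[i] += delta
                neighbors.append(y)
        best = min(neighbors, key=f)
        if f(best) < f(x):   # perform the move only if it improves
            x = best
        else:
            return x         # x is a local optimum with respect to N
    return x

sphere = lambda v: sum(t * t for t in v)
print(local_search([3.0, -2.0], sphere))  # converges to [0.0, 0.0]
```

Note how the quality of the final solution is entirely determined by the neighborhood: with this move, the method can only reach points on the grid of step-sized displacements from the starting point, which illustrates why the selection of the modification operator is so critical.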
In the last decades, a new kind of approximate algorithm has emerged with the basic idea of combining different heuristic methods in a higher-level framework with the purpose of exploring the search space efficiently and effectively. These methods are commonly called metaheuristics. This term was first introduced by Glover in [Glo86]. Before the term was completely accepted by the scientific community, these techniques were often called modern heuristics [Ree93]. This class of algorithms includes, but is not restricted to, Swarm Intelligence, Evolutionary Algorithms, Iterated Local Search, Simulated Annealing and Tabu Search. There is no commonly accepted definition for the term metaheuristic. From the different definitions and descriptions in the literature, certain properties that characterize these methods can be enumerated:

• Metaheuristics are strategies that guide the search process.

• The objective is to efficiently explore the search space to find (near-)optimal solutions.

• Metaheuristics are approximate and usually non-deterministic.

• They may incorporate mechanisms to avoid getting trapped in non-promising regions of the search space.

• Metaheuristics may make use of domain-specific knowledge in the form of heuristics that are controlled by the upper-level strategy.

To sum up, it could be said that a metaheuristic is a high-level strategy that uses different methods for exploring the search space. In other words, a metaheuristic is a non-deterministic general template that needs to be filled in with domain-specific data (encoding of the solutions, operators to modify them, etc.) in order to tackle problems of great size or complexity.

In these techniques, the dynamic balance between diversification and intensification is of great importance. Diversification refers to the evaluation of solutions in distant regions of the search space (according to a previously defined distance between solutions). It is also known as exploration of the search space. Intensification, also known as exploitation, refers to the evaluation of solutions in small bounded regions (with regard to the global search space) centered in the neighborhood of specific solutions. The balance between these two strategies is of great importance in order to quickly identify regions of the search space with high-quality solutions, while avoiding wasting too much time in regions that have already been explored or that do not contain high-quality solutions.

There are several ways to classify and describe metaheuristics. Common ways of classifying them are summarized below:

• Nature-inspired vs. non-nature-inspired. Perhaps the simplest approach for classifying metaheuristics is to look at the existence of a biological metaphor that inspired the metaheuristic. There are nature-inspired algorithms, like Evolutionary Algorithms and Ant Colony Optimization, and non-nature-inspired ones, such as Tabu Search and Iterated Local Search. This classification could be argued not to be very meaningful due to the difficulty of clearly assigning an algorithm to one of the classes.

• Population-based vs. single-point search. A commonly used classification considers the number of solutions used at the same time: algorithms working on single solutions are called trajectory methods and include Local Search based metaheuristics like Tabu Search, Iterated Local Search, Variable Neighborhood Search and Multiple Trajectory Search. In all these algorithms, the search process describes a trajectory in the search space. On the contrary, population-based metaheuristics conduct search processes that describe the evolution of a set of points. Examples of this kind of algorithm are Genetic Algorithms, Estimation of Distribution Algorithms, Differential Evolution and the Covariance Matrix Adaptation Evolution Strategy.

• One vs. multiple neighborhood structures. Most of the metaheuristics mentioned earlier use a single neighborhood structure, i.e., the fitness landscape topology does not change along the

(42) 16. CHAPTER 2. METAHEURISTICS. course of the algorithm. Other approaches, such as Variable Neighborhood Search, use a set of neighborhood structures that allows it to swap between different fitness landscapes. • Memory usage vs. memory-less methods. Another important characteristic used to classify metaheuristics is the use they make of the search history, i.e., whether they use memory or not. On the one hand, algorithms that do not register the search history do not use any past information for determining the next action in the current state of the search process. On the other hand, other algorithms keep track of visited solutions, performed moves or general decisions taken for carrying out the decision. Tabu Search is one of the best representatives of this kind of algorithms. Since there exists a great number of different metaheuristic approaches, this chapter is not intended to provide a detailed explanation of all of them. For this purpose, the reader is encouraged to check some external references, such as [Bäc95, LL01, BR03, Alb05, GP10]. In the next sections, the most important metaheuristics that have been used or mentioned in thesis will be briefly reviewed.. 2.1. Evolutionary Algorithms. Evolutionary Algorithms (EAs) are generic population-based metaheuristics for optimization that use some mechanisms inspired by Natural Evolution. In general, EAs make a population of candidate solutions to a problem evolve by means of some recombination operators. The suitability of these candidate solutions is measured by a fitness function that evaluates how good an individual for that particular problem is. The fittest individuals have more chances for being selected for the next recombination phase. Evolutionary Algorithms often provide good approximate solutions to complex problems of different fields. 
As they do not make any assumption about the underlying fitness landscape, EAs have been successfully applied in many disciplines such as engineering, physics, biology, genetics, etc. Different approaches have been proposed over the last decades. This section briefly reviews each of the methods that is relevant for the work presented in this thesis.

2.1.1. Genetic Algorithms

Genetic Algorithms (GAs) are the most usual Evolutionary Algorithms. Although the first works on this kind of algorithm date from the late fifties and early sixties [Bar54, Bar57, Fra57, FB70, Cro73], GAs were popularized by the work conducted by John H. Holland and his students at the University of Michigan and, in particular, by his book Adaptation in Natural and Artificial Systems [Hol75]. Since then, GAs have experienced a deep development and have been applied to solve complex problems in many different domains.

GAs are closely related to and inspired by Natural Evolution. In the real world, each individual of a given species in a population tries to transmit its genetic material to its offspring. In most cases, only the most suitable and adapted individuals are able to survive in their environment and breed new individuals. In the reproduction phase, the genetic material from both ancestors is combined in some way and transferred to one or more descendants. Additionally, the genetic information of the offspring is subject to small mutations, the result of environmental factors, which sometimes make these individuals more suitable for surviving. This way, they get a new chance to reproduce and transmit their genetic material.

A Genetic Algorithm implements a simplified version of this metaphor. Its objective is to find the most suitable solution to a given problem, combining candidate solutions to generate new ones and making them compete for a number of generations. The main aspects to be considered are the following:

• A representation for candidate solutions to the problem. Each individual in the population represents a candidate solution. This representation is also known as the genome or the chromosome of the individual. Many different encodings have been proposed for different problems, for example bit or real strings, as well as more complex ones, like trees or lists.

• A metric for the suitability of the individuals. This value is known as the fitness of the individual and is problem-specific. Mathematically, the fitness function is defined as fitness : D → R, D being the domain in which the genome representation is defined. For example, in the classic Traveling Salesman Problem [Rob49], the tour length is normally used as the fitness value to measure the suitability of a candidate solution.
• A crossover operator that, given two individuals, is able to combine the genetic information of both ancestors to generate one or more children. Usually, two parents are combined to generate two children, which is known as sexual crossover and mathematically defined as Crossover : D × D → D × D.

• A mutation procedure, in which the genetic information of an individual is modified in some way. Mathematically, Mutation : D → D.

• A selection scheme that, given the fitness of the individuals, decides which individuals in the population will take part in the reproduction process.

Once all these elements have been introduced, the general behavior of a Genetic Algorithm can be described as in Algorithm 1.

Algorithm 1: Classic Genetic Algorithm
1: Create initial population of candidate solutions P0
2: Evaluate initial population P0
3: while termination criterion not reached do
4:   while selecting individuals from current population Pi do
5:     Cross selected individuals with certain probability to generate new offspring
6:     Mutate descendants with some probability
7:     Evaluate new individuals
8:     Add new individuals to the auxiliary population Pi'
9:   end while
10:  Combine populations Pi and Pi' according to a pre-established criterion to generate Pi+1
11:  Evaluate population Pi+1
12: end while

This simple description of a Genetic Algorithm allows several configurations, depending on the selection scheme, the recombination operators (crossover and mutation) or the elitism mechanisms that are actually used.

2.1.2. Evolution Strategies

Evolution Strategies (ESs) were proposed by P. Bienert, Ingo Rechenberg and Hans P. Schwefel in the mid sixties [Rec71, Sch74]. These techniques are general optimizers that can be applied to problems of different domains [BS02]. For this purpose, a quality function F must be provided. This quality function operates on a set of decision variables y := (y1, . . . , yn):

F(y) → R, y ∈ Y

Y must be a set of data of finite but not necessarily fixed length such as, for example, the n-dimensional real, integer or combinatorial search space. Evolution Strategies work with populations of individuals. Each individual comprises not only the set of decision variables yk but also a set of endogenous strategy parameters sk. Therefore, an individual ak in a population Pi is defined as:

ak := (yk, sk, F(yk))

The set of endogenous parameters is typical of Evolution Strategies and may be adapted through the evolutionary process. There are other parameters, known as exogenous parameters, that remain constant throughout the search process.
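Algorithm 1 can be sketched in a few lines of Python. The sketch below is illustrative only, not an implementation used in this thesis: it assumes a real-valued encoding, binary tournament selection, arithmetic crossover, Gaussian mutation and elitist truncation replacement, minimizing the classic sphere function.

```python
import random

def evolve(fitness, dim, pop_size=20, generations=100, p_cross=0.9, p_mut=0.1):
    """Minimal generational GA following Algorithm 1 (minimization)."""
    # Steps 1-2: create and evaluate the initial population.
    pop = [[random.uniform(-5, 5) for _ in range(dim)] for _ in range(pop_size)]
    for _ in range(generations):              # Step 3: termination criterion.
        offspring = []
        while len(offspring) < pop_size:
            # Step 4: select individuals (binary tournament).
            p1 = min(random.sample(pop, 2), key=fitness)
            p2 = min(random.sample(pop, 2), key=fitness)
            # Step 5: crossover with probability p_cross (arithmetic crossover).
            if random.random() < p_cross:
                a = random.random()
                c1 = [a * x + (1 - a) * y for x, y in zip(p1, p2)]
                c2 = [(1 - a) * x + a * y for x, y in zip(p1, p2)]
            else:
                c1, c2 = p1[:], p2[:]
            # Step 6: mutate descendants with probability p_mut (Gaussian noise).
            for child in (c1, c2):
                for i in range(dim):
                    if random.random() < p_mut:
                        child[i] += random.gauss(0, 0.5)
                offspring.append(child)       # Step 8: auxiliary population.
        # Step 10: combine both populations (here, elitist truncation).
        pop = sorted(pop + offspring, key=fitness)[:pop_size]
    return min(pop, key=fitness)

random.seed(42)
sphere = lambda x: sum(v * v for v in x)
best = evolve(sphere, dim=5)
```

Swapping the selection, crossover, mutation or replacement functions yields the different GA configurations mentioned above without touching the main loop.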
These exogenous parameters give rise to the two canonical versions of ESs: (µ/ρ, λ)-ES and (µ/ρ + λ)-ES. In this nomenclature, µ denotes the number of parents in the current population, ρ ≤ µ the mixing number, i.e., the number of parents involved in the creation of a descendant, and λ the number of new individuals to be generated. The main difference between both types of ESs is how individuals

are selected for the next generation. In the first case, the ',' (comma) selection, offspring individuals completely replace those in the current population. Thus, µ < λ must hold so that convergence to an optimal solution is guaranteed. In the second case, the '+' (plus) selection, the current and offspring populations are combined and the best individuals from both of them are selected as parents for the next generation. Today, Evolution Strategies are among the most powerful techniques in real-number optimization, especially one of their variants, the Covariance Matrix Adaptation Evolution Strategy (CMA-ES), which has obtained remarkable success on complex optimization problems in recent years [AH05].

2.2. Estimation of Distribution Algorithms

Estimation of Distribution Algorithms (EDAs) show a behavior similar to that of the Genetic Algorithms presented earlier and, in fact, several authors consider EDAs a new type of EA. In EDAs, instead of using recombination operators to produce new offspring, a probabilistic model is learned from the explored solutions and new solutions are sampled from this model.

Algorithm 2: Estimation of Distribution Algorithm
1: Create initial population P0
2: Evaluate initial population P0
3: while termination criterion not reached do
4:   Select a subset of the current population P̂i ⊂ Pi
5:   Estimate the probability distribution of the subset P̂i: pi+1(x), where x ∈ P̂i
6:   Sample the probability distribution pi+1(x) to generate Pi+1
7: end while

The general scheme of an EDA can be observed in Algorithm 2. In step 5 of the algorithm, it is necessary to estimate the probability distribution pi+1(x), where x is an individual of the population. In general, the genome of an individual contains values for a set of variables. Therefore, x = (x1, x2, x3, . . .) and pi+1(x) = pi+1(x1, x2, x3, . . .).
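The steps of Algorithm 2 can be sketched with the simplest continuous model, in which each variable is fitted by an independent Gaussian estimated from the best individuals (a UMDA-style model; the function name and parameter values below are illustrative, not taken from this thesis).

```python
import random
import statistics

def umda_c(fitness, dim, pop_size=50, selected=15, generations=60):
    """Sketch of Algorithm 2 with a univariate Gaussian model (minimization)."""
    # Steps 1-2: create and evaluate the initial population.
    pop = [[random.uniform(-5, 5) for _ in range(dim)] for _ in range(pop_size)]
    for _ in range(generations):              # Step 3: termination criterion.
        # Step 4: select a subset of the current population (truncation selection).
        best = sorted(pop, key=fitness)[:selected]
        # Step 5: estimate the distribution, one mean/stdev per variable
        # (i.e., full independence among variables is assumed).
        mu = [statistics.mean(ind[i] for ind in best) for i in range(dim)]
        sd = [statistics.stdev(ind[i] for ind in best) + 1e-12 for i in range(dim)]
        # Step 6: sample the model to generate the next population.
        pop = [[random.gauss(mu[i], sd[i]) for i in range(dim)]
               for _ in range(pop_size)]
    return min(pop, key=fitness)

random.seed(7)
sphere = lambda x: sum(v * v for v in x)
best = umda_c(sphere, dim=5)
```

Replacing the per-variable Gaussians with a model that captures dependencies among variables is exactly what the Probabilistic Graphical Models discussed next provide.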
The complexity of computing the joint probability distribution p(x1, x2, x3, . . .) grows, in the worst case of dependency among all the variables, exponentially with the number of variables of x. To avoid such an expensive computational cost, Estimation of Distribution Algorithms use a Probabilistic Graphical Model (PGM). The use of a PGM reduces the computing time of the joint probability distribution in exchange for estimating that distribution by means of a conditional causal model among the variables, based on a dependency/causality graph. As a result of this assumption, a simplified distribution is actually computed as an approximation of the real joint distribution. Graphically, a PGM is a directed acyclic graph in which each node represents a variable and each arc a conditional dependency between variables. Figure 2.1 shows an example of a PGM.
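The saving can be made concrete by counting parameters. For binary variables, a factor p(X | parents) needs one free parameter per configuration of its parents, i.e., 2^|parents|, while the full joint over n variables needs 2^n − 1. The snippet below, a quick illustration rather than thesis code, checks the counts for the four-variable example x = (A, B, C, D) discussed around Figure 2.1.

```python
def params_full_joint(n):
    # A full joint table over n binary variables has 2^n entries, minus one
    # because the probabilities must sum to one.
    return 2 ** n - 1

def params_factored(parent_sets):
    # Each factor p(X | parents) contributes 2^|parents| free parameters.
    return sum(2 ** len(parents) for parents in parent_sets)

# Chain-rule factorization (no independence assumed): p(A|B,C,D) p(B|C,D) p(C|D) p(D)
chain_rule = [{"B", "C", "D"}, {"C", "D"}, {"D"}, set()]
# Factorization induced by the PGM of Figure 2.1: p(A|C,D) p(B|D) p(C) p(D)
pgm = [{"C", "D"}, {"D"}, set(), set()]

print(params_full_joint(4))         # 15
print(params_factored(chain_rule))  # 15: the chain rule is exact, no saving
print(params_factored(pgm))         # 8
```

These are the fifteen and eight parameters mentioned in the discussion of Equations 2.3 and 2.4 below.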

Figure 2.1: Example of a graphical model for x = (A, B, C, D)

In principle, to compute the joint probability distribution of x = (A, B, C, D), Equation 2.3 should be used.

p(A, B, C, D) = p(A|B, C, D) ∗ p(B|C, D) ∗ p(C|D) ∗ p(D)    (2.3)

The computation of this equation would involve calculations with fifteen parameters. However, the PGM in Figure 2.1 shows that conditional independence among certain variables can be assumed. Consequently, the computation of the joint probability distribution can be simplified, as shown in Equation 2.4, to just eight parameters.

p(A, B, C, D) = p(A|C, D) ∗ p(B|D) ∗ p(C) ∗ p(D)    (2.4)

Using a PGM is not always possible, as it deeply depends on the domain of the problem and, therefore, on the representation used to encode the variables in the individuals. If those variables are discrete, then Bayesian Networks should be used [Pea88]. On the other hand, if the variables are continuous, Gaussian Networks should be used instead [SK89]. Some approaches allow mixed variables, such as the Mixed-Integer Bayesian Optimization Algorithm (MIBOA) [ELZ+08].

2.2.1. Learning Heuristics

An important aspect of Estimation of Distribution Algorithms is how the structure of the PGM, i.e., the dependencies among variables, is generated. Without any specific knowledge of the problem, the only way of defining these dependencies is by means of statistical analysis. This process is known as structure learning and several different methods exist for this purpose. Some of the most usual heuristics for the structure learning phase are briefly reviewed in the next sections.

2.2.1.1. No Interdependencies Model

This is the simplest model, in which independence among the variables is assumed. From the point of view of the graphical model, it means that the graph will have no arcs. Therefore, the joint probability.
