Proposals for Efficient Management of FPGAs within Cloud Computing Environments


Julio Ricardo Proaño Orellana

Advisors: PhD. Blanca Caminero, PhD. Carmen Carrión

Department of Computing Systems, University of Castilla-La Mancha

This dissertation is submitted for the degree of Doctor of Philosophy. April 2017.


I would like to dedicate this thesis to my loving parents Marcia and Darwin and my brothers Ely and Fernando.


Acknowledgements

It is very hard for me to name everyone who helped me to complete this work. In summary, I would like to thank my supervisors Carmen and Blanca for their most sincere support, dedication, help, and advice throughout the whole PhD process. I thank them for all the tips, good wishes, suggestions, reviews and other kind thoughts, and also for all the moments we have shared together. I will never forget all the things they have done for me. I am truly grateful. This work would not have been possible without their support.

I would also like to acknowledge Fernando Rincón and Julio Dondo from the ARCOS research team at the University of Castilla-La Mancha for all the FPGA support they provided, and Javier Conejero for his help in the initial stages of this work. Thanks to my colleagues at the Research Institute of Informatics (I3A) and, in particular, to all the RAAP research team. Thanks also to all the UCLM lecturers of the ESII for the opportunity to share this wonderful experience. I would like to give special thanks to Peter Jackson for helping me with the writing of this document.

I must not forget to mention Erik Elmroth, Johan Tordsson, Luis Tomás and Selome Kontestinos, and all the people I met during my research stay at Umeå University in Sweden. Some of their ideas are included in this work.

Finally, I would like to say special thanks to Olalla and her family for all their help and patience, and for supporting me throughout this life adventure. Moreover, I would like to acknowledge my parents and my brothers for their trust, support and unconditional love.


Agradecimientos (Acknowledgements)

It is very hard for me to name all the people who helped me to complete this work. In summary, I want to thank my supervisors Carmen and Blanca for all their support, help, dedication and time throughout the whole research process. Thank you for all the advice, good wishes, suggestions, comments and other kind gestures, and also for all the moments we have shared. This work would not have been possible without their collaboration.

I also want to thank Fernando Rincón, Julio Dondo and everyone in the ARCOS research team at the University of Castilla-La Mancha (Ciudad Real) for all the technical support they have given us regarding the management and handling of the FPGA. Thanks to my colleagues at the Albacete Research Institute of Informatics (I3A), in particular to the RAAP team and to all the technical support staff. Thanks to all the lecturers of the ESII for the opportunity to share this wonderful experience. In particular, I thank Peter Jackson for his help with the writing of this document.

I must not forget to mention Erik Elmroth, Johan Tordsson, Luis Tomás, Selome Kontestinos and all the people I met during my research stay at Umeå University in Sweden. Some of their ideas are included in this work.

Finally, thanks to Olalla and all her family for their help, patience and support in this life adventure. I would also like to thank my parents and my brothers for their trust, support and unconditional love.


Abstract

The popularity of Cloud computing services has increased dramatically in the last few years. As a consequence, this demand has led to the development of large data centres that consume a great deal of energy. In order to tackle this problem, Cloud providers have become interested in introducing more power-efficient resources into their infrastructures. In this way, more and more diverse clients can be served without impacting power consumption negatively. One example is FPGAs (Field-Programmable Gate Arrays), which exhibit an excellent performance/energy consumption ratio. Moreover, they can provide significant business value in Cloud environments due to their enormous computing capacity with predictable latency and a higher level of parallelism. However, their introduction and management is not a trivial task.

The aim of this thesis is to use FPGA accelerators to improve QoS in Cloud platforms and reduce their energy cost. To this end, the thesis presents several proposals focused on the integration and efficient management of FPGAs as accelerators in Cloud platforms. The motivation is not only to enable their use by clients as another infrastructure resource, following an IaaS (Infrastructure as a Service) model. We also aim to get the best from them when they are used to support applications delivered as a service by the Cloud provider, that is, when they are used to support a SaaS (Software as a Service) model. In this case, the goals pursued are related to QoS (Quality of Service) support, with a positive side effect on energy consumption.

First, we designed a novel architecture in which Cloud clients decide whether they need to use FPGAs, and for how long, when purchasing access to computational resources. This idea is very similar in concept to Amazon's F1 instance family [10] (virtual machines which include an FPGA together with the software needed to program it), presented by Amazon in late November 2016 and currently in a preview state.

The next goal, providing efficient support for QoS in SaaS offerings, involves making the system responsible for matching the best resources to each client request. Allocation and scheduling algorithms based on classification and prediction models have been developed and integrated into our architecture to tackle this problem. Results show that the smart use of FPGAs integrated with conventional computational resources leads to a higher percentage of clients served with their QoS fulfilled. Furthermore, the proposal also contributes to the reduction of the data centres' energy footprint. These proposals have been implemented in a real proof-of-concept prototype, which includes two servers and only one FPGA. A use case based on an image processing service has been used for the evaluation.

In order to extrapolate the benefits of adding FPGAs to the system without increasing monetary costs, we have developed a simulation tool to study the impact on performance and energy consumption of scaling the number of FPGAs in the system. This simulation tool is based on statistical models of processing time and power consumption, and it was statistically validated against previous experiments. Results confirm a positive trend in the ratio of successfully served user requests while also improving energy consumption.

As a result of the work conducted during a research visit, we have established that additional benefits can be provided by deploying a time division multiplexing strategy among the client requests which need the FPGA as an accelerator. Moreover, a strategy has been developed that uses both the applications' performance and deadlines to control the assignment of FPGAs to the applications that would consume the most energy. Furthermore, an optimizer aimed at handling the remaining VMs in the server (those which run workloads that do not use the FPGA), through vertical scaling and CPU frequency adjustments, has been integrated into the proposed architecture. In this way, the power consumption of the entire Cloud system is further optimized. Results confirm that energy savings are possible while maintaining application performance.

Resumen (Summary)

The popularity of Cloud computing services has increased dramatically in recent years, which has led to the development of large data centres that consume a great deal of energy. To tackle this problem, Cloud providers have become interested in using more energy-efficient resources. In this way, more clients can be served without negatively affecting energy consumption. One example is the introduction of FPGA (Field-Programmable Gate Array) devices into data centres. These devices exhibit an excellent performance/energy consumption ratio and also provide significant business value thanks to their large computing capacity with predictable latency and their high level of parallelism. However, their integration and management is still a research challenge.

The aim of this thesis is to integrate and manage FPGA devices as code accelerators, thereby improving the quality of service in Cloud platforms and significantly reducing their energy consumption. In other words, the motivation is not only to allow clients to use FPGAs as just another infrastructure resource, but also to make the most of their features. To this end, this work presents several proposals.

First, we have designed a novel architecture that integrates FPGAs and offers them to clients as Cloud computing infrastructure, as an on-demand service. In this case, clients decide whether they need to use FPGAs and for how long, in a similar way to the F1 instances offered by Amazon [10].

Second, the infrastructure-as-a-service architecture is extended to support applications delivered as a service. In this extension, the system is responsible for assigning the best resources to each client request. The allocation criterion is based on scheduling algorithms that use predictive statistical models. This proposal has been implemented on a real prototype. The results obtained show that the smart use of FPGAs integrated with computational resources leads to a higher percentage of clients served while maintaining the quality of service. Furthermore, the proposal also contributes to reducing the energy footprint of the data centre.

The previous proposals only include the use of one FPGA. In order to extrapolate the benefits of scaling the number of FPGAs in the system without increasing the monetary costs inherent in acquiring infrastructure, we have developed a simulation tool to study the impact on performance and energy consumption. This tool is based on models of processing time and energy consumption. The results confirm a positive trend in the proportion of client requests served successfully, while energy consumption is improved.

Finally, as a result of a research stay, we have identified that using a time division multiplexing strategy among the client requests that need an FPGA as an accelerator can provide additional benefits. To complement the research work, we present the development and evaluation of an optimizer designed to manage the remaining virtual machines in the server (those that run workloads that do not use the FPGA). This system uses vertical scaling of CPUs and frequency adjustment according to the load applied to the system. In this way, energy consumption is optimized considering the entire Cloud system. The results confirm that it is possible to save energy while maintaining application performance.

Table of contents

List of figures
List of tables

1. Introduction
  1.1. Cloud Computing challenges
    1.1.1. The energy challenge
    1.1.2. Heterogeneity and its management challenges
  1.2. Motivations
    1.2.1. The increase of Cloud computing services
    1.2.2. Power consumption concerns
    1.2.3. The energy efficiency perspective
    1.2.4. The introduction of FPGAs into the Cloud
  1.3. Proposals: a summary
  1.4. Objectives
  1.5. Organization of this document

2. Background and Related Work
  2.1. Cloud computing and virtualization overview
    2.1.1. Cloud definition
    2.1.2. Cloud delivery models
    2.1.3. Cloud deployment models
    2.1.4. Cloud architecture
  2.2. QoS and resource management in Cloud
    2.2.1. Resource management
    2.2.2. Resource scheduling techniques
  2.3. Related Works
    2.3.1. Heterogeneous resource scheduling
    2.3.2. FPGAs within Cloud environments
  2.4. Conclusions

3. FPGAs in IaaS and SaaS
  3.1. Introduction
  3.2. Target Cloud environment
  3.3. The Management layer
    3.3.1. The Hardware Acceleration Manager (HAM)
    3.3.2. The Catalog
  3.4. Providing IaaS with HAM
  3.5. Providing SaaS with HAM
    3.5.1. HAM architecture for providing SaaS
    3.5.2. Implementation details
  3.6. Evaluation
    3.6.1. Testbed setup
    3.6.2. Cloud Services
    3.6.3. Workload
    3.6.4. Experimental Results
  3.7. Conclusions

4. FPGA Scalability Study
  4.1. Introduction
  4.2. Prototyping
    4.2.1. Data collection for modelling
    4.2.2. Processing time estimation model
    4.2.3. Energy estimation model
    4.2.4. Model validation
  4.3. Simulation tool
    4.3.1. HECCoSIM: Simulator Architecture
  4.4. Scalability Experiments
    4.4.1. Simulation setup
    4.4.2. Evaluation of the study
  4.5. Conclusions

5. Node Level Scheduling
  5.1. Introduction
  5.2. Target architecture and assumptions
  5.3. Node level scheduling algorithm
    5.3.1. Implementation Details
  5.4. Evaluation
    5.4.1. Workload
    5.4.2. Scheduling criteria comparison and metrics
    5.4.3. Evaluation of coarse-grained scheduling
    5.4.4. Evaluation of fine-grained scheduling
  5.5. Extended architecture with energy optimization
    5.5.1. Heterogeneous Power-aware Controller (HePaC)
    5.5.2. HePaC implementation details
    5.5.3. Evaluation
  5.6. Conclusions

6. Conclusions, Contributions and Future Work
  6.1. Conclusions and Contributions
    6.1.1. Conclusions
    6.1.2. Contributions
  6.2. Publications
    6.2.1. Journal papers
    6.2.2. International conferences
    6.2.3. National conferences
    6.2.4. Other contributions
  6.3. Research collaborations
  6.4. Funds
  6.5. Future work

Bibliography


List of figures

2.1. Cloud computing environment
2.2. Private Cloud. Source: http://whatiscloud.com/cloud_deployment_models/
2.3. Public Cloud. Source: http://whatiscloud.com/cloud_deployment_models/
2.4. Community Cloud. Source: http://whatiscloud.com/cloud_deployment_models/
2.5. Virtual environment
3.1. General architecture overview
3.2. Components of the HECCO component
3.3. HAM architecture details
3.4. The deployment of the virtual environment
3.5. Attaching an FPGA to a VM
3.6. Loading a bitstream to an FPGA
3.7. Detaching an FPGA and stopping a VM
3.8. The Job Mapper Controller components
3.9. Cloud Computing Environment
3.10. Percentage of accepted requests in comparison with Standard clients
3.11. Service level compliance comparison with Standard clients
3.12. Distribution of Requests Allocation for FFT service with "Standard" clients
3.13. Distribution of Requests Allocation for ICS service with "Standard" clients
3.14. Energy Consumption Comparison for Standard clients
3.15. Percentage of accepted requests comparison with Real-Time clients
3.16. Service level compliance comparison with Real-Time clients
3.17. Distribution of Requests Allocation for FFT service with "Real-time" clients
3.18. Distribution of Requests Allocation for ICS service with "Real-time" clients
3.19. Energy consumption comparison with Real-Time clients
4.1. Processing time for CPU
4.2. Processing time for FPGA
4.3. Heterogeneous Cloud Simulator blocks diagram
4.4. Time slots reservation example
4.5. Energy estimation example
4.6. Energy consumption for different arrival rates (request-per-minute) with only standard clients
4.7. Energy consumption for different arrival rates (request-per-minute) with 10% of real-time clients
4.8. Utilization of the system for standard clients
4.9. Utilization of the system for real-time clients
4.10. Percentage of accepted requests for different arrival rates (request-per-minute) with only standard clients
4.11. Total time for processing the total requests for real-time clients
4.12. Percentage of accepted requests for different arrival rates (request-per-minute) with 10% of real-time clients
4.13. Total time for processing the total requests for standard clients
5.1. Target Cloud architecture
5.2. Division of a task into chunks
5.3. Scheduler by chunks
5.4. Node Level Scheduler architecture details
5.5. FPGA Synchronization Service Example
5.6. Energy (KJoules) per number of fulfilled SLAs
5.7. Number of successfully processed frames per second
5.8. Energy (KJoules) per number of fulfilled SLAs (fine-grained)
5.9. Number of successfully processed frames per second (fine-grained)
5.10. Energy (KJoules) per number of SLAs fulfilled (optimized fine-grained)
5.11. Number of successfully processed frames per second (optimized fine-grained)
5.12. Number of processed frames belonging to unsuccessful SLAs (wasted)
5.13. Target Cloud environment with heterogeneous hardware architecture
5.14. General architecture
5.15. Example of resource allocation for video applications on runtime (frequency behaviour)
5.16. Example of resource allocation for video applications on runtime (cores assignation)
5.17. Example of resource allocation for video applications on runtime (FPGA assignation)


List of tables

3.1. VID input parameters example
3.2. Example of the parameters for IaaS requests (Template A)
3.3. Example of parameters for SaaS requests (template)
3.4. Notation and parameters used in the Classification algorithm
3.5. Cloud environment
3.6. Percentage of energy wasted for "Standard" clients
3.7. Percentage of energy wasted for "Real-time" clients
4.1. Input dataset (minutes)
4.2. Frames-per-second ratio values (N(µ_r, σ_r)) for ICS application
4.3. Power max and min values
4.4. Prediction accuracy of processing time for running ICS application in CPU and FPGA
5.1. Cloud environment
5.2. Datasets (minutes)
5.3. Energy (KJoules) for different scheduling criteria
5.4. Percentage of fulfilled SLAs for different scheduling criteria
5.5. Energy (KJoules) for different scheduling criteria (fine-grained)
5.6. Percentage of fulfilled SLAs for different FPGA assignation criteria (fine-grained)
5.7. Energy (KJoules) for different scheduling criteria (optimized fine-grained)
5.8. Percentage of SLAs fulfilled for different FPGA assignation criteria (optimized fine-grained)
5.9. Summary of Tests
5.10. Percentage of improvement

List of Algorithms

1. Classification algorithm
2. Scheduling algorithm
3. Node Level Scheduling Algorithm


Chapter 1

Introduction

Quality of Service (QoS) support is an open challenge in today's Cloud computing environments, together with energy consumption concerns. Heterogeneous resources can provide a solution to these challenges. This is the main motivation of the research work presented in this thesis. This chapter briefly details the context for this research. In addition, the specific objectives pursued are described. Finally, we outline the organization of this document.

1.1. Cloud Computing challenges

The Cloud computing paradigm has attracted the interest of both academia and industry. The Cloud provides industries with a feasible way to achieve greater flexibility and agility in their Information Technology (IT) infrastructure. However, there are still major obstacles for this technology, which are the focus of many new academic research projects. Efficient resource management [132], energy consumption [17], security and privacy concerns [28, 113, 200], standardization [7, 47], and heterogeneity [47, 149, 44] are just a few examples. In this work, we have focused on heterogeneity and energy consumption issues. In particular, we address the integration and management of devices that exhibit a better performance per watt ratio. The motivation is to harness the power-efficient computational capacity of these elements to provide QoS support in an efficient way. Before explaining our proposals for addressing these issues, we introduce the Cloud architecture considered in this work.

Cloud computing environments are composed of two main actors: providers and clients. Providers are owners of a pool of resources and offer them as a service to clients on a pay-as-you-go model, while clients seek computational resources elastically and without any operational costs. Generally speaking, the interaction between these two actors follows a simple protocol. Firstly, clients send requests to gain access to the providers' resources. Once the providers have received these requests, they select and assign the set of resources necessary to meet them. Next, clients can use the assigned resources to run their applications. In most cases, providers use Virtual Machines (VMs) to encapsulate the client requests and allocate them easily in the infrastructure [99]. Thus, clients only need access to these VMs to run their applications, develop their applications, or use applications offered by providers as a service. Finally, when clients have finished using the resources, the resources are released so that they can be reassigned to other clients.

Providers are responsible for ensuring the quality of the services offered, in terms such as service availability, application performance, or latency. In order to ensure that clients receive the expected QoS, clients and providers negotiate a Service Level Agreement (SLA). The SLA may feature different types of attributes, such as performance, availability, or utility, in order to describe a level of service for a client. Within an SLA, each of these parameters has an associated compliance level, known as a Service Level Objective (SLO).
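The relation between an SLA and its SLOs described above can be sketched in a few lines of Python. This is only an illustrative model, not the representation used in this thesis: the class names `SLO` and `SLA`, the metric names, and the target values are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class SLO:
    """One Service Level Objective: a metric and its agreed compliance level."""
    metric: str                    # hypothetical metric name, e.g. "availability"
    target: float                  # agreed level for this metric
    higher_is_better: bool = True  # False for metrics such as latency

    def is_met(self, measured: float) -> bool:
        if self.higher_is_better:
            return measured >= self.target
        return measured <= self.target


@dataclass(frozen=True)
class SLA:
    """A Service Level Agreement: a client plus the SLOs negotiated with it."""
    client: str
    objectives: Tuple[SLO, ...]

    def violations(self, measurements: Dict[str, float]) -> List[str]:
        """Return the metrics whose SLO is not met.

        A metric with no measurement is treated as violated (an assumption
        of this sketch, since unmonitored metrics cannot be shown compliant).
        """
        failed = []
        for slo in self.objectives:
            measured = measurements.get(slo.metric)
            if measured is None or not slo.is_met(measured):
                failed.append(slo.metric)
        return failed


# Hypothetical agreement: 99% availability and at most 200 ms latency.
sla = SLA(
    client="client-1",
    objectives=(
        SLO("availability", 0.99),
        SLO("latency_ms", 200.0, higher_is_better=False),
    ),
)
print(sla.violations({"availability": 0.995, "latency_ms": 250.0}))
```

In this example the availability objective is met while the latency objective is not, so only `latency_ms` is reported as a violation.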
To guarantee these agreements, service providers must measure and monitor the relevant metrics, because a violation of these levels may result in financial penalties for Cloud providers.

Clients and providers pursue different objectives. Clients are interested in completing their tasks as soon as possible at minimal cost. In contrast, providers seek to generate as much profit as possible with minimum investment while keeping an acceptable Quality of Service (QoS). However, serving too many clients can mean a significant increase in infrastructure provisioning and, consequently, more power consumption. On the other hand, due to the variability of Cloud environments, it is complicated to deliver a particular level of QoS for each client [194]. In this context, a number of challenges related to QoS have to be resolved by providers. We now present an overview of some of these challenges.

1.1.1. The energy challenge

While infrastructure investment is a major expense for providers, they also face operational expenses related to power consumption and cooling. For example, in [68] the authors show that energy-related costs account for 42% of the total budget of Amazon EC2 data centres, of which 19% is direct power consumption and 23% is cooling. Additionally, power consumption and cooling are directly related to the Cloud workload. This workload can show spikes and peaks throughout the day. Consequently, providers may incur SLA violations or performance degradation. As a possible solution, some providers tend to over-dimension their infrastructure. However, this can result in a waste of energy and money, because energy is used to keep physical resources in an idle state. In fact, according to the McKinsey & Company report published by the New York Times [64], on average 88% of the electrical power of data centres is used only to keep servers idling or ready in case of possible activity, and only 12% is used for computation. Thus, providers should be concerned not only about investment in new infrastructure but also about finding mechanisms for using their resources efficiently. Different strategies can contribute to addressing the energy issue, such as the use of more efficient hardware in the infrastructure [38] and efficient resource management [172].
In this work, we focus on the integration of Field-Programmable Gate Arrays (FPGAs) because these devices offer a set of features, such as a good performance per watt ratio, a higher level of parallelism, and elasticity, which fit well in a Cloud setting. However, it is not simple to manage FPGAs as a service while also getting the best from them.
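As a small worked example, the percentages quoted in this section (19% of the budget for direct power, 23% for cooling, and 88% of electricity spent on idle servers) can be applied to concrete amounts. The function names and the input amounts below are hypothetical and serve only to illustrate the arithmetic; they are not measurements from this thesis.

```python
def energy_budget(total_budget: float,
                  power_pct: float = 19.0,
                  cooling_pct: float = 23.0) -> dict:
    """Split a total data centre budget using the shares reported in [68].

    power_pct and cooling_pct are percentages of the total budget.
    """
    power = total_budget * power_pct / 100.0
    cooling = total_budget * cooling_pct / 100.0
    return {
        "power": power,
        "cooling": cooling,
        "energy_total": power + cooling,  # 19% + 23% = 42% of the budget
    }


def idle_split(total_energy: float, idle_pct: float = 88.0) -> dict:
    """Split consumed electricity into idle and computing parts,
    using the average idle share reported in [64]."""
    idle = total_energy * idle_pct / 100.0
    return {"idle": idle, "computing": total_energy - idle}


# Hypothetical inputs: a budget of 1,000,000 monetary units
# and 500 energy units consumed.
budget = energy_budget(1_000_000)
split = idle_split(500.0)
print(budget)
print(split)
```

With these assumed inputs, 420,000 units of the budget are energy related, and 440 of the 500 energy units keep servers idle, which illustrates why more efficient hardware and better resource management are both attractive.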

(30) 1.1.2. Heterogeneity and its management challenges
Cloud computing offers an attractive workspace for different users such as scientists, developers, students, and companies. An increasing number of users (clients) see the Cloud as a way to execute their applications more rapidly and to avoid operational costs. Frequently, applications assume a set of homogeneous Cloud resources. However, some applications have requirements, such as a higher level of parallelism, real-time performance or low latency, which cannot be completely satisfied by homogeneous resources. A possible strategy to address this issue is adding heterogeneity. This heterogeneity can be understood as the proper integration of different types of elements with different features to improve Cloud computing services. The advantage of heterogeneous resources is that they allow us to use specialized, even customizable, devices to optimize the use of resources for specific kinds of applications. Heterogeneous systems have more potential to address throughput and energy efficiency for different types of workloads by matching a certain type of resource to each particular need. Today's data centres include different types of devices in their infrastructures. For example, there are processors such as ARM [133] or ATOM [1], which exhibit a better energy-efficiency ratio than traditional server processors. Moreover, new components such as Solid State Drives (SSDs) or Phase Change Random Access Memory (PCRAM) are also used because of their lower latency. Newer network architectures and various accelerators, such as the Graphics Processing Unit (GPU) and the FPGA, are becoming more popular in servers to speed up the execution of certain types of applications [33]. Providers can use devices such as FPGAs to enhance their services by maximizing resource utilization and cost-performance.
FPGAs contain thousands of programmable logic blocks that can be interconnected in different configurations to perform complex computational functions. Examples are encryption [69], video processing [121], and genomics [127] applications, which generate large data sets that require a vast amount of computing resources to be processed [75]. These types of applications would greatly benefit from such massive parallelism. Another interesting use case would be applications that do not

(31) process such immense data sets but exhibit near real-time performance requirements. These applications can be found in finance, where traders would obtain a competitive advantage if they could compute the models for complex trading options faster [152]. Another example is medical image processing [5], where FPGAs could be used to offer quick response times in an emergency room scenario. However, using FPGAs efficiently, and furthermore in a Cloud environment, is still an open challenge. The allocation, scheduling, monitoring and control of heterogeneous resources in a Cloud is not a trivial task [17]. Thus, when a client wants to use a resource, the provider must not only be aware of the availability of resources to satisfy the request but should also carefully select the most suitable resource at the right time for the right application. In addition, some devices are not designed to be used in a virtual environment. Thus, integrating these types of devices in order to get the best from them is challenging, because the idea of using FPGAs in a virtualized environment is a relatively recent development. Moreover, the deployment of such complex applications is more difficult than that of traditional applications that run only on CPUs and GPUs. Thus, the management of these devices is key for Cloud computing environments.
1.2. Motivations
1.2.1. The increase of Cloud computing services
Cloud computing has grown significantly in recent decades. Many IT experts argue that the key factor that has enabled Cloud computing to evolve is universal Internet access. In fact, in June 2016, 49.2% of the world population were Internet users, according to Internet World Stats.¹ In particular, regions such as Africa, the Middle East, and Latin America/Caribbean have seen the highest growth rates (7415%, 3936% and 2029%, respectively) in Internet users during the period 2000-2016.
As a result, the popularity of Cloud computing services has also increased considerably in recent years. For example, Amazon EC2 had more than one million
¹ See http://www.internetworldstats.com/stats.htm

(32) people using its Cloud services in 2014, according to a Bloomberg report.² In the near future, according to the Statista website,³ it is expected that approximately 3.6 billion Internet users will be Cloud computing clients in 2018.
1.2.2. Power consumption concerns
The high demand for Cloud computing services has led to the development of enormous data centres that consume a great amount of energy. In fact, according to the Natural Resources Defense Council (NRDC),⁴ in 2013 U.S. data centres consumed an estimated 91 billion kilowatt-hours of electricity, and by 2020 this is expected to increase to 140 billion kilowatt-hours annually. Thus, concerns about global warming and how data centres contribute to it have made energy efficiency a subject of intense debate, not only for economic reasons (i.e., lowering the electricity bill) but also for ecological reasons (the Green IT concept). Fortunately, a report released in June 2016 by the U.S. Department of Energy's Lawrence Berkeley National Laboratory [157] shows that efficient computing strategies could cut data centre energy use by 45% by 2020. This report mentions that efficiency improvements in storage, network, and infrastructure also influence power consumption. Additionally, the report highlights that the most significant infrastructure impact is observed in hyperscale data centres, which are designed to maximize infrastructure efficiency. Finally, as a future challenge, the report mentions that a management system that distributes workloads across various types of servers with specialized associated hardware will facilitate the energy savings associated with optimizing hardware for specific workloads. Given this context, finding a strategy for energy-efficient computing is crucial for making Cloud computing sustainable.
² See http://www.bloomberg.com/news/2014-11-14/5-numbers-that-illustrate-the-mind-bending-size-ofamazon-s-cloud.html
³ See https://www.statista.com/statistics/321215/global-consumer-cloud-computing-users/
⁴ See https://www.nrdc.org/resources/americas-data-centers-consuming-and-wasting-growing-amounts-energy

(33) 1.2.3. The energy efficiency perspective
Power efficiency in data centres is usually measured using the Power Usage Effectiveness (PUE) metric [118]. PUE is the ratio of the power drawn by the whole facility (that is, including cooling) to the power drawn by the computing resources alone, so a value close to 1 means that almost all of the power reaches the IT equipment. Thus, much effort is focused on minimizing the power used to cool data centres [146, 115], by organizing racks into hot-cold aisles, using free cooling techniques, or fine-tuning the speed of fans, among other methods [137]. Moreover, other strategies are aimed at lowering the power consumption of the IT equipment itself, such as adjusting the voltage supplied to servers according to their workload [66] or even creating specific hardware designs [61]. In particular, devices such as FPGAs can help Cloud providers to achieve a good trade-off between performance and cost [9]. Intel CEO Brian Krzanich said that in 2020 up to a third of Cloud computing providers could be using a hybrid CPU-FPGA architecture within their server nodes. In fact, Intel plans to integrate its Xeon processor with an Altera FPGA in the same chip package and ramp up its production through 2017 [123]. In early 2016, at the GPU Technology Conference, Mile Strickland, who directs the compute and storage group at Intel/Altera, said that although FPGAs have the reputation of being expensive, at high volume they are on a par with other accelerators. However, FPGAs exhibit an advantage in terms of performance per watt. In particular, FPGAs offer additional special features that can be exploited by Cloud providers. Some of them are listed below:
• FPGAs are an order of magnitude faster than state-of-the-art CPUs for non-floating-point operations. Thus, they can contribute to meeting stringent Quality of Service (QoS) requirements in an SLA.
• FPGAs provide an intrinsic degree of parallelism that certain applications can exploit, leading to lower and more predictable response times than those achievable with conventional CPUs.

(34) • FPGAs exhibit a high computation/power consumption ratio, which makes them an interesting option to lower operational expenses as well as to minimize the carbon footprint of the data centre.
• FPGAs can be customized on demand, which allows most of their resources to be used to support computations directly.
1.2.4. The introduction of FPGAs into the Cloud
From the industry perspective, several attempts have been made to enable FPGAs in the Cloud. The recently launched Amazon EC2 F1 instance [10] offers clients an Amazon FPGA Image (AFI), which they can use to deploy FPGAs as a service. The Amazon F1 nodes can have up to eight 16nm UltraScale+ FPGAs by Xilinx, connected through a dedicated PCIe interface to a 2.3 GHz base-speed Broadwell E5-2686 CPU and four DDR4 channels. However, at this moment F1 instances are only oriented to expert users (clients), because their applications must be written in either VHDL or Verilog. Microsoft released the Catapult project [134] in 2015. In this project, FPGAs are used to accelerate the Bing web search engine. The Catapult infrastructure is composed not only of FPGAs but also of GPUs, or of a mix of them. Another major project that integrates FPGAs within a Cloud environment is Nimbix [124]. This company is focused on including FPGAs as part of its supercomputing Cloud to be used by clients for HPC applications, but there has not been such a large public push as with Amazon EC2 F1. Academia has also produced several works on integrating and managing FPGAs in the Cloud. In [180] the authors use an FPGA-based acceleration architecture with the MapReduce [42] framework to solve a genome sequencing problem. However, this approach focuses only on the acceleration of some specific applications, without considering power consumption. There are also other proposed frameworks [33, 26, 181] to include virtual FPGAs in the OpenStack Cloud management platform [138].
Despite these approaches attempting to

(35) integrate FPGAs into the Cloud, the problem of scheduling to find a balance between performance and power consumption has not been fully addressed. In light of the above, and even though it may sound contradictory because more hardware can mean more power consumption, a question arises: What is the best way to achieve a balance between performance acceleration and power consumption by integrating FPGAs into Cloud environments? This thesis attempts to provide some answers.
1.3. Proposals: a summary
In this thesis, we present different proposals to address both heterogeneity and energy-efficiency issues from a hardware perspective. Firstly, we focus on the integration of FPGA devices and their management for IaaS support. In this case, we propose an architecture that allows FPGAs to be used as a service, like Amazon EC2 F1 instances. In summary, providers offer clients a list of VMs with FPGA support. Thus, clients can access FPGAs to run their applications in a transparent way during a period of time. Secondly, we extend the previous IaaS framework, adding more control and management of FPGA devices as accelerators in a Software as a Service (SaaS) Cloud delivery model. In this case, we tackle the problem of FPGA management, in which the system has to manage FPGAs automatically through allocation and scheduling strategies. These strategies are based on prediction models and classification decisions that foresee the behaviour of the system and allow choosing the most suitable set of resources to meet the client request. Furthermore, we carry out a study of FPGA scalability for the SaaS delivery model through a simulation tool to address the energy-efficiency issue. This tool is developed using statistical models, which are based on previous experiments and profiling techniques.
The aim of this study is to measure the effect of the addition of FPGAs into the system in terms of performance and energy consumption.

(36) Finally, we propose an FPGA fine-grained strategy for further optimizing the use of FPGAs within a node of a Cloud environment. This proposal consists of discretizing the use of the FPGAs located in the same node through a time division multiplexing (TDM) technique. This technique is widely used in telecommunication environments for improving the use of resources. As a complement to this fine-grained proposal, we use a feedback controller that combines the use of FPGAs at fine-grained level with the CPU vertical scaling and dynamic voltage frequency scaling (DVFS) [66] previously proposed in [88], for optimizing the resources that are not using FPGAs.
1.4. Objectives
In this thesis, we present several proposals for the integration and management of heterogeneous devices such as FPGAs in the Cloud environment. These proposals consider FPGAs not only to offer them as a new type of resource but also to attempt to get the best from them. Thus, providers can maintain a QoS level while saving energy, and clients can use heterogeneous Cloud capability as a service with more hardware control. This general objective is achieved through the following goals:
• Reviewing approaches in the literature focusing on the integration and management of FPGAs in virtualized environments.
• Developing an architecture able to provide access to a heterogeneous Cloud IaaS scenario.
• Adding more functionality to the Infrastructure as a Service (IaaS) framework to support the SaaS service model with QoS.
• Studying the effect of different scheduling algorithms in terms of energy consumption and performance behaviour where FPGAs are included within a virtualized environment, in comparison to conventional Cloud scheduling algorithms.

(37) • Studying the effect of adding more FPGAs in terms of performance and energy consumption through the development of a simulation tool.
• Developing an FPGA-aware fine-grained scheduling strategy on top of the hypervisor to support dynamic control of FPGA devices as Cloud computational resources.
• Evaluating different FPGA-aware scheduling techniques on a real testbed, comparing them to some popular scheduling techniques.
• Developing a system that enables the use of both fine-grained FPGA scheduling and a CPU optimizer to improve the energy consumption of the entire Cloud system.
1.5. Organization of this document
This thesis is organized in six chapters, which are briefly described as follows.
Chapter 1. Introduces the most relevant issues that have motivated this research work. In particular, it presents the motivations and the objectives we wish to accomplish.
Chapter 2. Presents fundamental concepts required to understand subsequent chapters. Additionally, it reviews the most relevant approaches related to the integration and management of FPGAs in Cloud environments. It begins by discussing general topics regarding the management of resources and Quality of Service, and continues with more specific aspects such as FPGAs in IaaS and SaaS usage models.
Chapter 3. Once we have defined the frame of reference of this work, this chapter presents the general architecture of our proposed Heterogeneous Cloud Computing Controller (HECCO) framework, which allows the integration and management of FPGAs in a Cloud environment at cluster level. Firstly, an IaaS proposal is presented. Secondly, the IaaS architecture is extended in order to also support the SaaS usage model. Finally, an evaluation of the SaaS proposal on a real testbed is presented.

(38) Chapter 4. This chapter presents a study focused on FPGA scalability. It is based on statistical models that were created through an empirical prototyping strategy. In particular, it shows the process of developing a simulation tool, based on processing time and energy models, which is used to study the impact of increasing the number of FPGAs in terms of performance and energy consumption.
Chapter 5. Presents an architecture and an FPGA-aware scheduling strategy to manage FPGAs at fine-grained level. In particular, this strategy allows FPGAs to be assigned and reassigned to different VMs located at the same local node by using a Time Division Multiplexing mechanism. Finally, a combination of the adapted FPGA-aware architecture and a dynamic vertical CPU scaling and frequency controller is evaluated.
Chapter 6. Concludes this thesis. It also summarizes the contributions and the publications achieved. Finally, we discuss future directions of research.
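The time division multiplexing idea mentioned for Chapter 5 can be illustrated with a minimal Python sketch, in which a single FPGA is handed to the VMs of one node in fixed-length time slots, in round-robin order. This is only an illustration under simplifying assumptions (equal slots, no reconfiguration overhead, hypothetical VM names and demands); it is not the scheduling strategy developed in this thesis.

```python
from collections import deque
from typing import Dict, List, Tuple


def tdm_schedule(demands: Dict[str, int], slot_ms: int) -> List[Tuple[int, str]]:
    """Share one FPGA among VMs using fixed time slots in round-robin order.

    demands maps a (hypothetical) VM name to the FPGA time it still needs, in ms.
    Returns a list of (slot_start_ms, vm_name) assignments.
    """
    if slot_ms <= 0:
        raise ValueError("slot_ms must be positive")
    queue = deque((vm, need) for vm, need in demands.items() if need > 0)
    timeline: List[Tuple[int, str]] = []
    now = 0
    while queue:
        vm, need = queue.popleft()
        timeline.append((now, vm))   # the FPGA is assigned to this VM for one slot
        now += slot_ms
        remaining = need - slot_ms
        if remaining > 0:            # the VM still has work: reassign it later
            queue.append((vm, remaining))
    return timeline


# Hypothetical demands: vm-a needs 30 ms of FPGA time, vm-b 10 ms, vm-c 20 ms.
schedule = tdm_schedule({"vm-a": 30, "vm-b": 10, "vm-c": 20}, slot_ms=10)
for start, vm in schedule:
    print(f"t={start:3d} ms -> {vm}")
```

With these assumed demands, short requests such as vm-b finish after a single slot instead of waiting for vm-a to complete, which is the kind of sharing that slot-based multiplexing is meant to provide.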

(39) Chapter 2
Background and Related Work
In this chapter, we describe a number of concepts and definitions related to Cloud computing and its architecture. We also present an overview of resource allocation and scheduling techniques in heterogeneous environments. All these concepts are necessary to fully understand the proposals presented in the following chapters. Additionally, to put our proposals in context, we discuss relevant related works on the integration and management of FPGAs within IaaS and SaaS Cloud delivery models. The last section presents conclusions.
2.1. Cloud computing and virtualization overview
Despite the fact that Cloud computing has grown significantly in recent years, the idea is not new. In fact, in 1969, Leonard Kleinrock, who worked on the Advanced Research Projects Agency Network (ARPANET), predicted: “Computer networks are still in their infancy, but as they grow up and become sophisticated, we will probably see the spread of ‘computer utilities’ which, like present electric and telephone utilities, will service individual homes and offices across the country”.¹ In the early 90s, thanks to the change in communication networks from offering only dedicated point-to-point data circuits to providing virtual private network services, Cloud computing began to grow.
¹ http://www.lk.cs.ucla.edu/LK/Bib/REPORT/press.html

(40) In the late 90s, Professor Ramnath Chellappa of Emory University and the University of Southern California defined Cloud computing as “the new computing paradigm where the boundaries of computing will be determined by economic rationale rather than technical limits alone.”² In general, two main factors have allowed the development of Cloud computing, namely, the evolution of communication networks and the development of virtualization. Enterprises such as Intel and AMD are currently working at hardware level to improve virtualization performance through the introduction of virtualization extensions into their processors (Intel VT-x, AMD Pacifica) [2, 162]. The first company to use the concept of virtualization for delivering applications as a service via the Internet was Salesforce [20]. Later, in the early 2000s, enterprises such as Amazon began to exploit their infrastructure as a service and on demand. Thus, Amazon found a new business opportunity in selling its infrastructure capacity. Cloud computing is embraced by a vast number of companies such as Google, Amazon, Apple, Microsoft and Facebook, which use economies of scale and the Cloud to exploit large data centres profitably.
2.1.1. Cloud definition
According to the National Institute of Standards and Technology (NIST) [110], Cloud computing is "a model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction." Cloud computing requires a particular set of features to enable the remote provisioning of scalable resources in an efficient manner. These characteristics are:
• On-demand: Cloud providers allow clients to self-provision resources without human intervention.
² http://www.eci.com/cloudforum/cloud-computing-history.htm

(41) • Ubiquitous access: capacity of Cloud applications to be accessed through a network such as the Internet.
• Multi-tenancy: capacity of Cloud systems to let different clients (tenants) use the same computational resource transparently. This feature enables the dynamic assignment and reassignment of resources according to Cloud demands.
• Elasticity and scalability: capacity of dynamically adding and releasing resources according to clients' requests.
• Measured usage: monitoring of the Cloud system, not only for billing purposes but also for controlling and optimizing tasks.
2.1.2. Cloud delivery models
As depicted in Figure 2.1, two main actors can be distinguished in a Cloud computing setting: service consumers (clients) and service providers. On the one hand, providers are responsible for making Cloud services available to clients. On the other hand, clients send requests to use the provider's infrastructure. These requests are dynamically attended by providers using independent and isolated virtual environments (VMs). Depending on the service, there are three types of Cloud delivery models (also depicted in Figure 2.1): Software-as-a-Service (SaaS), Platform-as-a-Service (PaaS) and Infrastructure-as-a-Service (IaaS). SaaS applications are hosted in the Cloud and can be offered over a network. Some examples of SaaS applications are Twitter [95], Facebook [53], Flickr [114] and Google Docs [177]. PaaS is focused on developers, and its aim is to offer a configured and customized environment for building applications. Some examples of PaaS are Amazon Web Service Elastic Beanstalk [178], Windows Azure [148] and Google App Engine [193].

(42) [Fig. 2.1 Cloud computing environment. Diagram labels: Consumer (Client-1, Client-2, ..., Client-n), Requests, Provider, Delivery models (SaaS, PaaS, IaaS), Resources abstraction, Resources.]
IaaS provides access to scalable resources (processing elements, storage capacity, network), allowing clients to deploy the infrastructure needed for running their applications. Examples of IaaS are Amazon AWS [6], Microsoft Azure [148] and Rackspace [173].
2.1.3. Cloud deployment models
Cloud services can be deployed in Public, Private, Community, and Hybrid scenarios. Figure 2.2 shows an infrastructure that is available only to one organization. This architecture corresponds to a Private Cloud. This type of Cloud allows clients the highest degree of control over resources. In contrast, in a Public Cloud the infrastructure is available to general users and companies, as depicted in Figure 2.3. Thus, a Public Cloud requires no initial capital investment in infrastructure from clients. However, this type of Cloud does not allow fine-grained control over physical resources. Figure 2.4 shows the infrastructure of a Community Cloud. It is exclusively shared among members of a particular organization. Finally, a Hybrid Cloud is a combination of the public and private/community models mentioned above, meaning that Hybrid Clouds can offer more flexibility. However, the management of a

(43) Hybrid Cloud is complex because it is challenging to find the best partition between public and private components.
[Fig. 2.2 Private Cloud. Source: http://whatiscloud.com/cloud_deployment_models/]
[Fig. 2.3 Public Cloud. Source: http://whatiscloud.com/cloud_deployment_models/]

(44) [Fig. 2.4 Community Cloud. Source: http://whatiscloud.com/cloud_deployment_models/]
[Fig. 2.5 Virtual environment. Diagram labels: The Platform Layer, Virtual Infrastructure Manager (VIM), VM-1 to VM-3 (each with an Application and a Guest OS), The Virtualization Layer, Hypervisor-1 to Hypervisor-n, The Hardware Layer, Hardware.]

(45) 2.1.4. Cloud architecture
From a distributed computing point of view, Cloud settings follow a layered model composed of three main layers: the hardware layer, the virtualization layer, and the platform layer [196]. An overview of each is provided below.
The hardware layer
The hardware layer consists of pools of physical resources such as physical servers, interconnection elements (routers, switches, etc.), power, and cooling systems. The hardware can be homogeneous or heterogeneous. Physical resources with similar capabilities, such as CPUs, memory and network interfaces, are considered homogeneous hardware. Heterogeneous hardware refers to resources with different capabilities or features, such as accelerators, specific processing elements, and various network architectures. Nowadays, the most widely used devices of this kind in Clouds are GPUs and, more recently, FPGAs. In particular, FPGAs are programmable devices that contain thousands of logic blocks that can be interconnected to implement complex computational functions such as encryption [51], video processing [185], and genomics classification [127]. Nowadays, FPGAs offer a significant improvement to many applications as co-processor elements within standard computing environments [49, 167, 30]. However, the complexity of developing FPGA-based applications in comparison to more traditional software engineering (for CPUs or GPUs) poses a large technical barrier. Although FPGAs can be programmed using high-level languages such as C/C++ and OpenCL [161, 120, 11, 153], this is still an open challenge. To help developers use FPGAs easily, the industry provides solutions such as Pico Computing [90], Convey [119] and Xillybus [140]. These products connect software to FPGAs via a proprietary interface, with their own languages and development environments.
Furthermore, there are open source approaches such as the Reusable Integration Framework for FPGA Accelerators (RIFFA) proposed by M. Jacobsen et al. in [78]. This framework allows efficient

(46) communication and synchronization elements for FPGA-accelerated applications. It also uses simple interfaces for hardware and software to speed up the development of applications. In RIFFA, the PCI Express (PCIe) bus is used for connecting FPGAs to CPUs. However, there are still some barriers to using FPGAs in conventional computing systems, such as the large initial investment required by FPGA solutions, or their underutilization in comparison to CPUs because FPGAs are less suitable for general-purpose applications. Cloud computing offers an appropriate scenario for more efficient exploitation of FPGA capabilities, since it reduces the initial investment, and resources can be shared between tenants through scheduling strategies that can enhance efficiency.
The virtualization layer
The virtualization layer uses virtualization technology as the key for partitioning physical resources in order to manage them effectively. Virtualization technology allows the creation of a secure and isolated environment for running different applications without any interference between them. Because physical resources are shared by clients, who usually work independently of each other, it is necessary to protect and isolate them. Providers need to find a strategy to help resource management while maintaining scalability. Virtualization technology is widely used to deliver resources to clients following the on-demand model. The objective of virtualization is to convert physical resources into virtual ones. Although the term virtualization is frequently associated with hardware emulation, this technology provides an efficient way of sharing storage, processing, and networking resources. Moreover, virtualization is fundamental for simplifying resource management, addressing the resource underutilization issue and allowing server consolidation [196, 18]. In Cloud environments, clients may require different operating systems (OSs) for their VMs.
As a solution, IBM introduced a management level called Virtual Machine Monitor (VMM) (also called hypervisor) [65]. There are different VMMs; the most widely used on Linux systems are Kernel-based Virtual Machine (KVM) [94] and XEN [14], and on Windows, Hyper-V [176] and VMware

(47) vSphere [179]. In particular, the KVM hypervisor uses a loadable kernel module that provides the core virtualization infrastructure, while XEN uses a microkernel that allows multiple OSs to run on the same computer hardware concurrently. Despite the fact that virtualization gives the system excellent elasticity, there is a performance penalty, because the hypervisor multiplexes operating systems and the overhead of context switching is higher than in a conventional system without virtualization. There are software solutions to reduce this overhead, such as binary translation [3] and paravirtualization [192]. Additionally, companies such as Intel and AMD have developed virtualization extensions for x86 architecture processors as a hardware solution, not only to address performance issues but also for security reasons. In particular, Intel VT-x [74] and AMD Secure Virtual Machine (SVM) [162] were developed. Recently, Intel and AMD have complemented their architectures by adding support for an I/O Memory Management Unit (I/O MMU), which allows VMs to access I/O hardware, such as the PCIe bus, directly. It consists of mapping device-visible virtual addresses to physical addresses, in a similar way to a conventional Memory Management Unit (MMU) [1].
The platform layer
The virtualization layer operates on a per-node basis. However, a Cloud platform often includes a large number of computing nodes. Thus, an orchestration and management layer is needed at data centre level (cluster level). The platform layer is composed of operating systems and application frameworks built on top of the virtualization layer. The aim of this layer is to offer clients tuned environments for the development of applications, which are encapsulated into virtual machines (VMs). It also provides management and orchestration of resources within a Cloud environment.
The platform layer consists of a pool of on-demand Cloud applications that can be easily scaled to achieve better capabilities such as throughput, performance, energy efficiency and

(48) lower operating cost. In a distributed computing environment, the Virtual Infrastructure Manager (VIM) is a platform responsible for building and managing computing infrastructure for public, private and hybrid Clouds. The VIM allows Cloud clients to deploy VM instances that handle different tasks for managing a Cloud environment [60]. In summary, the VIM platform orchestrates storage, network, virtualization, monitoring, and security strategies to deploy multi-tier services as VMs, combining both data centre resources and remote Cloud resources, according to allocation and scheduling policies. The VIMs most widely studied in the literature are OpenNebula [112], OpenStack [128] and Eucalyptus [126]. An overview of these VIMs and their different capabilities is presented in [159, 126]. There is also a comparative study of Eucalyptus and OpenNebula in [155], while in [45] the authors outline the differences between Eucalyptus, OpenNebula, and OpenStack.
2.2. QoS and resource management in Cloud
Cloud environments offer their clients unlimited computing resources available on demand following a pay-as-you-go model [60, 188]. In this model, clients only pay for the resources used, thus avoiding unnecessary expense. From the provider's point of view, Cloud services need to be reliable, elastic, scalable, and accessible. However, it is impossible to meet all clients' requirements because of Cloud complexity and unexpected behaviour. Thus, to ensure a service level, both clients and providers establish a contract, known as a Service Level Agreement (SLA). A service level is negotiated through QoS parameters such as performance, latency of development, tracking, and monitoring [136]. SLAs also include legal issues such as legal compliance, resolution of conflicts, or responsibilities. SLAs specify financial penalties in the case of non-compliance.
In Cloud systems, service parameters associated with an SLA, such as response time or throughput, change regularly. Thus, it is necessary to define measurable goals for a given period. These metrics or service levels are known as Service Level Objectives (SLOs). Depending on the Cloud service, an SLA can contain several SLOs, each of which represents a quantitative value associated with the agreement.
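The relationship between an SLA and its SLOs can be illustrated with a minimal Python sketch. The class names, metric names, and threshold values below are hypothetical and chosen only for illustration; they are not taken from any of the cited works. The sketch assumes that each SLO compares one measured value against one threshold for a given monitoring period.

```python
import operator
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class SLO:
    """One measurable objective, e.g. response time <= 200 ms (assumed values)."""
    metric: str
    threshold: float
    # Comparison applied as compare(measured, threshold).
    compare: Callable[[float, float], bool]

    def is_met(self, measured: float) -> bool:
        return self.compare(measured, self.threshold)


@dataclass
class SLA:
    """An agreement modelled, for illustration, as a named list of SLOs."""
    name: str
    slos: List[SLO]

    def violations(self, measurements: Dict[str, float]) -> List[str]:
        """Return the metrics whose SLO is violated or was not reported."""
        failed = []
        for slo in self.slos:
            value = measurements.get(slo.metric)
            if value is None or not slo.is_met(value):
                failed.append(slo.metric)
        return failed


# Hypothetical agreement and measurements for one monitoring period.
sla = SLA("web-tier", [
    SLO("response_time_ms", 200.0, operator.le),
    SLO("throughput_rps", 500.0, operator.ge),
])
period = {"response_time_ms": 180.0, "throughput_rps": 450.0}
violated = sla.violations(period)
```

In a real agreement, each violated SLO would typically be mapped to the financial penalty that the SLA specifies; that mapping is omitted here.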

QoS in Cloud is closely related to the resource management of the system. Resource management helps providers to take actions based on criteria such as performance, functionality, and cost. The Cloud environment is a complex system with resources shared through virtualization technology and subject to unpredictable requests. Thus, the resource allocation problem is related to different factors such as the monitoring of resource availability, mapping and tracking QoS requirements, and monitoring client requests. The key elements for the management of resources are the VIMs and hypervisors. As previously outlined, each of these elements acts at a different level. The VIM has a global view of the whole system (entire Cloud view), while the hypervisor interacts directly with the physical pool of resources at the node level (local view).

2.2.1. Resource management

The VIM component needs to take management decisions automatically based on different criteria or policies. In Cloud computing, the most widely explored strategies for resource management include admission control, energy efficiency criteria, modelling strategies, load balancing, SLA-aware mechanisms, and the use of specialized hardware.

Admission control is a preventive policy to avoid overloading of resources and SLA violations. This policy is based on previous knowledge about the behaviour of the system under load. Admission control can be complicated to implement in dynamic scenarios, and the resources may become under-utilized. However, approaches that use overbooking policies [169, 170, 116] offer a way to minimize both SLA violations and wasted resources.

Energy-efficient modelling strategies use statistical models to balance an acceptable number of fulfilled SLAs against power consumption.
The TANGO project focuses on exploiting parallelism to deliver higher performance and improve energy efficiency [48]. However, it is only oriented to distributed Cluster environments.

Concerning Cloud settings, other works use energy-aware algorithms [21, 17, 16] to reduce the power consumption of Cloud systems. Berl et al. [21] review methods and technologies used for energy-efficient operation based on hardware integration, while Beloglazov et al. [17, 16] propose energy-aware allocation and scheduling policies based on heuristic resource provisioning and evaluated using the CloudSim toolkit [27]. Although heuristic mechanisms provide a feasible way to avoid wasting resources, these proposals do not consider the introduction of more energy-efficient hardware such as FPGAs.

The aim of load balancing is to distribute the clients’ requests throughout the system by migrating VMs according to a particular criterion [71]. For example, if the system detects that one server is overloaded and another is underutilized, the requests directed to the overloaded server are migrated to the underutilized one. The load balancing policy can also be used for saving energy [12]. However, a significant number of migrations can degrade the network and affect the remainder of the system. There are more sophisticated approaches to load balancing inspired by Honeybee Foraging Behaviour [145], Biased Random Sampling [144], and Active Clustering [125]. The first algorithm is nature-inspired and uses a self-organization mechanism to achieve global load balancing through local server actions. The second algorithm is also self-organizing, but is based on random sampling of the whole system domain to achieve a balanced load across all the nodes. The last one relies on restructuring servers to optimize job assignment.

In SLA-aware strategies [174, 25, 24, 55], the objective is to map the client requirements so as to ensure SLA fulfilment with some variability. Workload predictors are used to forecast future workload requirements, and decisions are made based on the resulting predictions.
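The basic load balancing rule described above (migrate work from an overloaded server to an underutilized one) can be sketched as follows. This is a minimal illustration under stated assumptions, not the method of any cited work: the function name `plan_migrations`, the utilization thresholds `high` and `low`, and the `max_moves` cap are all hypothetical. The cap is included only to reflect the observation that too many migrations can degrade the network.

```python
from typing import Dict, List, Tuple


def plan_migrations(
    load: Dict[str, List[float]],
    high: float = 0.8,      # assumed overload threshold
    low: float = 0.3,       # assumed underutilization threshold
    max_moves: int = 10,    # assumed cap to limit network impact
) -> List[Tuple[float, str, str]]:
    """Plan VM moves from overloaded servers to underutilized ones.

    `load` maps a server name to the utilization share of each VM it hosts.
    Returns (vm_share, source, target) tuples; the input is not modified.
    """
    servers = {name: list(vms) for name, vms in load.items()}
    moves: List[Tuple[float, str, str]] = []
    for src in sorted(servers):
        # Move the smallest VMs first while the source stays overloaded.
        while sum(servers[src]) > high and len(moves) < max_moves:
            vm = min(servers[src])
            targets = [
                name for name in sorted(servers)
                if name != src
                and sum(servers[name]) < low
                and sum(servers[name]) + vm <= high
            ]
            if not targets:
                break  # no underutilized server can take this VM
            dst = min(targets, key=lambda name: sum(servers[name]))
            servers[src].remove(vm)
            servers[dst].append(vm)
            moves.append((vm, src, dst))
    return moves


# Hypothetical cluster: server "a" is overloaded, "b" is underutilized.
cluster = {"a": [0.5, 0.25, 0.25], "b": [0.125], "c": [0.5]}
plan = plan_migrations(cluster)
```

The sketch is purely threshold-based; the nature-inspired and sampling-based approaches mentioned above replace this central decision with local or randomized actions.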
In a specialized hardware-aware strategy, client requests are classified into different categories depending on their requirements, and then submitted to specific resources. These resources offer unique features such as the best performance-to-power-consumption ratio or higher reliability. For example, if the system is aware that some applications perform better when they run on an accelerator such as a GPU or an FPGA, it uses this resource for the request.

Requests can be allocated in two ways: on demand and by advance reservation. Both scenarios apply techniques such as predictive resource allocation analysis, neural networks [50, 92], genetic algorithms [199, 111, 101], and reinforcement learning [15, 73], which can be used to anticipate the relationship between QoS targets and current hardware resource allocation in order to match them appropriately. These approaches use past performance history to make predictions about workload patterns. However, these techniques can fail for two reasons. The first is the lack of sufficient historical data to train the system. The second is workloads that do not follow any specific distribution. Both factors can affect the accuracy of the prediction.

In advance reservation, pre-configured actions are defined following criteria such as utilization thresholds, hardware availability, and energy consumption. Unlike the on-demand case, management actions are taken before any resource allocation occurs, and predictive techniques can also be used to reserve resources in advance. However, in long-term reservation, it is difficult to avoid some resources remaining idle. Thus, the system tends to over-provision resources, increasing execution cost and energy consumption. To tackle this problem, Cloud providers use scheduling strategies to balance the trade-offs among SLA fulfilment, resource allocation, and energy efficiency.

2.2.2. Resource scheduling techniques

Scheduling techniques are designed to find a feasible way to maximize providers’ profit and Return On Investment (ROI) [7].
Thus, provisioning and sharing resources appropriately is necessary for the Cloud environment, because underestimating or overestimating the resources needed would result in more SLA violations or in resource underutilization, respectively [46]. The constant expansion of Cloud demand has caused an increase in energy consumption. As a result, Cloud providers are aware of the need to improve not only system performance and QoS but also energy consumption expenses and environmental issues. In [109] the author suggests that a scheduling strategy should be efficient, fair enough, smart, and offer minimal starvation.

Cloud scheduling algorithms can use queues (sets of tasks awaiting scheduling) to determine the order of resource allocation. The queued tasks are ordered according to priorities related to client requirements. The scheduling algorithms select a task from the schedule list by following a policy [50], and then select the resource on which to allocate it. The schedule list can be constructed either statically or dynamically, and the priorities are defined through criteria (metrics). For dynamic lists, the priority criteria are recomputed on-line (at run time). In contrast, for static lists, the priorities are established before task allocation (off-line) [130].

Scheduling techniques can be applied at several layers, such as the application layer, the virtualization layer, and the deployment layer [195]. In the application layer, the aim is to schedule physical or virtual resources to support client applications, tasks, or jobs with a given QoS. The scheduler in the virtualization layer is responsible for mapping virtual resources onto physical resources to achieve efficient load balancing and cost effectiveness. In the deployment layer, the scheduler targets multi-cloud environments to support optimal and strategic infrastructure placement [171].

The resource scheduling issue can be expressed as one of two optimization problems: minimizing cost under a deadline constraint or reducing schedule length (makespan) under a budget constraint. In [168] the authors classify scheduling strategies as performance-related and cost-based. The scheduling strategies that consider performance as the optimization target are subdivided into First-Come, First-Served (FCFS), load balancers, and reliability improvement.
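The difference between static and dynamic schedule lists described above can be sketched as follows. This is only an illustration under stated assumptions: the `Task` fields, the `feasible_first` priority criterion, and the single-resource execution model are hypothetical and are not taken from the cited works. With a static list, priorities are computed once (off-line); with a dynamic list, they are recomputed after every allocation (on-line).

```python
import heapq
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Task:
    name: str
    deadline: float  # assumed absolute deadline
    length: float    # assumed execution time on the single resource


Priority = Callable[[Task, float], Tuple[bool, float]]


def feasible_first(task: Task, now: float) -> Tuple[bool, float]:
    """Lower tuples run first: tasks that can still meet their deadline
    come before tasks that cannot, then earliest deadline first."""
    return (now + task.length > task.deadline, task.deadline)


def static_schedule(tasks: List[Task], priority: Priority) -> List[str]:
    """Static list: priorities are computed once, before any allocation."""
    heap = [(priority(t, 0.0), i, t) for i, t in enumerate(tasks)]
    heapq.heapify(heap)
    order = []
    while heap:
        _, _, task = heapq.heappop(heap)
        order.append(task.name)
    return order


def dynamic_schedule(tasks: List[Task], priority: Priority) -> List[str]:
    """Dynamic list: priorities are recomputed after each allocation."""
    pending = list(tasks)
    clock = 0.0
    order = []
    while pending:
        task = min(pending, key=lambda t: priority(t, clock))
        pending.remove(task)
        order.append(task.name)
        clock += task.length  # the resource is busy until the task ends
    return order


# Hypothetical workload.
tasks = [Task("A", 4.0, 3.0), Task("B", 5.0, 3.0), Task("C", 20.0, 1.0)]
static_order = static_schedule(tasks, feasible_first)
dynamic_order = dynamic_schedule(tasks, feasible_first)
```

In this workload the two lists diverge after the first allocation: once task A has run, task B can no longer meet its deadline, so the dynamic list demotes it, whereas the static list keeps the order fixed at submission time.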
Strategies related to cost, in turn, are divided into improving overall utilization, maximum profit, minimum operation costs, and combined strategies. In FCFS [4], also known as First-In-First-Out (FIFO), the system automatically processes queued requests in order of arrival without considering resource utilization [23]. In contrast, load balancing strategies are based on resource use, and their aim is to balance the use of physical resources. To this end, when a request arrives at the system,