Many techniques have been used in the literature, for example queuing theory [BFF90], control theory [Åst12], reinforcement learning [Bar98], time-series analysis [Ham94] and rule-based systems [HR85]. This section first introduces these techniques; the literature itself is discussed in the next part.

The rule-based method is the simplest, yet the most widely used, runtime scaling method. Well-known cloud providers such as Amazon AWS and RightScale offer rule-based runtime scaling. A typical rule-based method defines a pair of rules on a set of target metrics for a group of VMs, so that certain actions are triggered when the rule conditions are met: one rule describes when and how to scale up, and the other when and how to scale down. For example, when the average RAM usage of the application server group is above 80%, the scaling action "add a new VM" should be performed for that group.
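A minimal sketch of such a pair of rules, evaluated against the group-average metric. The `ScalingRule` and `evaluate` names, the 30% scale-down threshold and the 1 to 10 VM bounds are illustrative assumptions, not part of any provider's API.

```python
from dataclasses import dataclass

@dataclass
class ScalingRule:
    """Hypothetical threshold rule on a group-average metric."""
    metric: str       # e.g. "ram" (average RAM usage in %)
    threshold: float  # threshold in the metric's unit
    above: bool       # True: fire when average > threshold; False: when average < threshold
    delta: int        # change in VM count: +1 adds a VM, -1 removes one

def evaluate(rules, group_samples, vm_count, min_vms=1, max_vms=10):
    """Apply the first rule whose condition holds and return the new VM count."""
    for rule in rules:
        values = [sample[rule.metric] for sample in group_samples]
        average = sum(values) / len(values)
        fired = average > rule.threshold if rule.above else average < rule.threshold
        if fired:
            # Clamp so the group never shrinks below or grows beyond its limits.
            return max(min_vms, min(max_vms, vm_count + rule.delta))
    return vm_count  # no rule fired: keep the current size

# Scale up above 80% average RAM usage (from the text); scale down below 30% (assumed).
rules = [
    ScalingRule("ram", 80.0, above=True, delta=+1),
    ScalingRule("ram", 30.0, above=False, delta=-1),
]
```

For instance, `evaluate(rules, [{"ram": 85.0}, {"ram": 90.0}], vm_count=3)` sees an average of 87.5% and returns 4.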

The threshold-based runtime scaling method relies on a set of rules, each expressed in the form "if ... then ...".

The "then" part defines the scaling actions. In theory, a scaling action can be either horizontal (adding or removing VMs) or vertical (resizing a VM); in practice, most IaaS providers do not support vertical scaling. Each scaling action has a cooldown time, which specifies how long after the action is executed its preconditions are prevented from triggering it again. The cooldown keeps scaling actions from being triggered repeatedly before the previous action has had an effect on the performance metrics, which takes a while because booting a VM takes several minutes.
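The cooldown can be sketched as a small wrapper that remembers when its action last ran. The class name and the 300-second value are assumptions for illustration; timestamps are passed in explicitly so the behaviour is easy to follow.

```python
class CooldownAction:
    """Hypothetical wrapper: a scaling action that ignores triggers during its cooldown."""

    def __init__(self, name, cooldown_s):
        self.name = name
        self.cooldown_s = cooldown_s  # seconds to wait after each execution
        self._last_run = None         # time of the last execution, None if never run

    def try_execute(self, now):
        """Return True (and record `now`) if the action may run at time `now`."""
        if self._last_run is not None and now - self._last_run < self.cooldown_s:
            return False  # still cooling down: the new VM may not have booted yet
        self._last_run = now
        return True

add_vm = CooldownAction("add a new VM", cooldown_s=300)      # 300 s is an assumed value
results = [add_vm.try_execute(t) for t in (0, 120, 299, 300)]  # [True, False, False, True]
```

Triggers at 120 s and 299 s are ignored because the VM added at 0 s is still considered to be booting.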

The "if" part defines the preconditions for executing the corresponding scaling action. A precondition is usually a logical expression composed of one or more comparison expressions, each of which defines a threshold on a metric. For each metric, at least two thresholds should be defined: an upper bound and a lower bound of the acceptable range of the metric value. Each precondition also has a duration, which specifies how long the logical expression must hold before the corresponding scaling action is triggered. This duration keeps the threshold-based runtime control system from being overly sensitive to short-lived threshold crossings.
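A sketch of a precondition with such a duration, expressed in monitoring periods. The `Precondition` class, the 80%/20% CPU bounds and the three-period duration are hypothetical values chosen for illustration.

```python
class Precondition:
    """Hypothetical precondition: a logical expression over the latest metric
    sample that must hold for `periods` consecutive monitoring periods."""

    def __init__(self, expression, periods):
        self.expression = expression  # callable: metric value -> bool
        self.periods = periods        # consecutive periods required before firing
        self._streak = 0              # consecutive periods the expression has held so far

    def update(self, value):
        """Feed one monitoring sample; return True once the expression has held long enough."""
        self._streak = self._streak + 1 if self.expression(value) else 0
        return self._streak >= self.periods

# Upper and lower bounds of the acceptable CPU range (values assumed).
scale_up = Precondition(lambda cpu: cpu > 80.0, periods=3)
scale_down = Precondition(lambda cpu: cpu < 20.0, periods=3)

cpu_samples = [85.0, 90.0, 70.0, 85.0, 88.0, 92.0]
up_fired = [scale_up.update(v) for v in cpu_samples]
down_fired = [scale_down.update(v) for v in cpu_samples]
```

The first two high samples are interrupted by a dip to 70% and do not fire the rule; only the third consecutive sample above 80% (92%) does.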

The metrics used in the comparison expressions can differ between applications: the key performance metrics vary from one application to another, and between the different servers of the same application. A key metric is a resource utilization indicator that signals whether the application needs more allocated resources to guarantee its QoS, or fewer to eliminate unnecessary cost. Candidate metrics include storage capacity, server processing power, RAM capacity, network bandwidth, database transactions per second and storage input/output operations per second [GoG10].

These metrics can be collected and computed periodically at runtime by a monitoring system. For example, the AWS Auto Scaling service makes its scale-up and scale-down decisions based on the performance metrics provided by CloudWatch, the AWS monitoring service. CloudWatch collects performance data, such as CPU utilization, from AWS resources such as instances every 5 minutes (or every minute, at extra charge). These data can be used to define the Auto Scaling strategy, for example: if the average CPU utilization is above 70%, then start one more instance.

Reinforcement learning (RL) can help a system automate understanding, learning and decision making without any prior knowledge. RL can be applied to runtime scaling by capturing performance information about the application and the infrastructure from the monitoring system. Typically, an RL agent interacts with its environment and learns a decision-making policy from the rewards or punishments the environment returns; the agent always tries to maximize its rewards and minimize its punishments. In the context of runtime scaling, the scaling engine acts as the agent, deciding whether to increase or decrease the amount of allocated resources, and the environment is the application deployed on the cloud infrastructure. Metrics collected by the monitoring system, or the incoming workload of the application, are provided to the scaling engine as its state, which serves as the engine's input. The application environment also gives the scaling engine a reward or punishment, which the engine uses to optimize its decisions so as to maximize reward and minimize punishment.
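As an illustration, the agent/environment loop can be sketched with tabular Q-learning, one common RL algorithm (the text does not prescribe a specific one). The environment is a toy assumption: the state is simply the current number of VMs (1 to 5), the workload is fixed and needs 3 VMs, each VM costs 1 per period, and a QoS violation costs a further 10.

```python
import random

ACTIONS = (-1, 0, 1)       # remove a VM, do nothing, add a VM
MIN_VMS, MAX_VMS = 1, 5    # assumed range of cluster sizes (the states)
NEEDED_VMS = 3             # assumed VMs needed by a fixed toy workload

def environment_step(vms, action):
    """Toy environment (assumption): apply the scaling action and return the
    new state plus a reward (a punishment of 1 per VM and 10 for a QoS violation)."""
    new_vms = min(MAX_VMS, max(MIN_VMS, vms + action))
    reward = -new_vms - (10 if new_vms < NEEDED_VMS else 0)
    return new_vms, reward

def train(episodes=3000, steps=20, alpha=0.1, gamma=0.9, epsilon=0.3, seed=0):
    """Tabular Q-learning: the scaling engine learns the value of each (state, action)."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(MIN_VMS, MAX_VMS + 1) for a in ACTIONS}
    for _ in range(episodes):
        vms = rng.randint(MIN_VMS, MAX_VMS)          # random initial deployment
        for _ in range(steps):
            if rng.random() < epsilon:               # explore: try a random action
                action = rng.choice(ACTIONS)
            else:                                    # exploit: best action so far
                action = max(ACTIONS, key=lambda a: q[(vms, a)])
            new_vms, reward = environment_step(vms, action)
            best_next = max(q[(new_vms, a)] for a in ACTIONS)
            # Move the estimate toward reward + discounted value of the next state.
            q[(vms, action)] += alpha * (reward + gamma * best_next - q[(vms, action)])
            vms = new_vms
    return q

def greedy_policy(q):
    """The scaling decision the agent would take in each state after learning."""
    return {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(MIN_VMS, MAX_VMS + 1)}

policy = greedy_policy(train())
```

With this toy setup the learned greedy policy adds a VM below 3 VMs, keeps 3, and removes a VM above 3. A real scaling engine would use a richer state (monitored metrics, incoming workload) and often function approximation instead of a table.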

Queuing theory has traditionally been widely used to model applications and analyse their performance, for example the inter-arrival time, the average number of requests and the average waiting time. Typically, a request that arrives at the system joins a queue and waits until it is processed by one of the servers. A queuing model can have one or several servers, and these do not necessarily correspond to the servers of the real application system.
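To make these measures concrete, the following sketch computes them for an M/M/c queue (Poisson arrivals, exponential service times, c identical servers) using the Erlang C formula. The M/M/c choice and the function name `mmc_metrics` are assumptions for illustration; the text does not fix a particular queuing model.

```python
from math import factorial

def mmc_metrics(arrival_rate, service_rate, servers):
    """Steady-state metrics of an M/M/c queue via the Erlang C formula."""
    a = arrival_rate / service_rate  # offered load in Erlangs
    if a >= servers:
        raise ValueError("unstable: arrival rate must be < servers * service rate")
    tail = (a ** servers / factorial(servers)) * (servers / (servers - a))
    p_wait = tail / (sum(a ** k / factorial(k) for k in range(servers)) + tail)
    wq = p_wait / (servers * service_rate - arrival_rate)  # mean time waiting in queue
    w = wq + 1 / service_rate                              # mean response time
    return {
        "p_wait": p_wait,                          # probability a request must wait
        "mean_interarrival": 1 / arrival_rate,
        "wait_in_queue": wq,
        "response_time": w,
        "requests_in_system": arrival_rate * w,    # Little's law: L = lambda * W
    }
```

For a single server with arrival rate 0.5 and service rate 1 (an M/M/1 queue), this gives a mean queueing delay of 1, a mean response time of 2 and on average 1 request in the system.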

The typical three-tier application mentioned earlier can be modelled with queuing theory in two different ways. The first uses a single queue for the load balancer, with several VMs as the servers that process requests. The second combines several queues into one more complex queuing model, which uses one queue to represent each tier of the application and represents one or more VMs together as a single server.
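The second model can be sketched as a tandem of queues, one per tier. Assuming Poisson arrivals, exponential service times and that every request passes through each tier once, each tier can be treated as an M/M/1 queue whose single server stands for all of the tier's VMs, and the mean end-to-end response time is the sum of the per-tier mean response times. The tier names, VM counts and per-VM rates are hypothetical, and lumping n VMs into one server n times as fast is itself an approximation.

```python
def tandem_response_time(arrival_rate, tier_capacities):
    """Mean end-to-end response time of M/M/1 queues in series (one per tier).

    Each tier's mean response time is 1 / (mu - lambda); the means add up
    because every request visits every tier once."""
    total = 0.0
    for mu in tier_capacities:
        if arrival_rate >= mu:
            raise ValueError("a tier is saturated: arrival rate >= its service rate")
        total += 1.0 / (mu - arrival_rate)
    return total

# Hypothetical three-tier deployment; all numbers are made up for illustration.
vm_counts = {"web": 2, "app": 3, "db": 1}
per_vm_rate = {"web": 50.0, "app": 20.0, "db": 80.0}  # requests/s one VM can serve
capacities = [vm_counts[t] * per_vm_rate[t] for t in ("web", "app", "db")]
response = tandem_response_time(40.0, capacities)     # 1/60 + 1/20 + 1/40, about 0.092 s
```

The application tier, with the least spare capacity (60 versus 40 requests/s), contributes most of the delay.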

The fluctuating incoming requests and the changing capacity of the servers are the main obstacles to applying queuing theory to our problem. For example, the resources allocated to one tier, or to the whole application, can change dynamically at runtime, which changes the number of servers in the queuing model. The model therefore has to be solved in advance for every possible deployment under every workload condition, or recalculated periodically at runtime, which makes queuing theory difficult to use in practice for the runtime scaling problem.
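The cost of that precomputation shows up in a brute-force sketch built on a tandem M/M/1 model of the three tiers (one lumped server per tier, as an assumption): for a single workload level it evaluates every combination of per-tier VM counts and keeps the cheapest one that meets a response-time target. The function names, the SLA value, the per-VM rates and the 10-VM cap are hypothetical.

```python
from itertools import product

def response_time(arrival_rate, vm_counts, per_vm_rates):
    """Tandem M/M/1 model: each tier's VMs are lumped into one server (assumption)."""
    total = 0.0
    for n, rate in zip(vm_counts, per_vm_rates):
        capacity = n * rate
        if arrival_rate >= capacity:
            return float("inf")  # tier saturated: the SLA cannot be met
        total += 1.0 / (capacity - arrival_rate)
    return total

def cheapest_deployment(arrival_rate, per_vm_rates, sla_seconds, max_vms=10):
    """Evaluate the model for every deployment and return the one with the fewest VMs
    that meets the SLA (None if none does). The search covers max_vms ** tiers
    deployments and must be redone for every workload level."""
    best = None
    for counts in product(range(1, max_vms + 1), repeat=len(per_vm_rates)):
        if response_time(arrival_rate, counts, per_vm_rates) <= sla_seconds:
            if best is None or sum(counts) < sum(best):
                best = counts
    return best

# 40 requests/s, per-VM rates for (web, app, db), 0.1 s target: all assumed values.
choice = cheapest_deployment(40.0, (50.0, 20.0, 80.0), sla_seconds=0.1)  # (2, 3, 1)
```

Even this small three-tier example needs 1,000 model evaluations per workload level, which illustrates why precomputing or periodically re-solving the model is burdensome at runtime.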

Control theory has also been used to solve runtime scaling problems. The simplest control system is the open-loop control system. An open-loop controller computes the input of the target system from the system's current state using a model of the system, but it never observes the outputs of the target system, let alone corrects the model when the output is undesirable, as a feedback controller does. A more advanced design is the feed-forward controller: building on feedback control, it predicts the potential error in the target system's output and adjusts the controller's output, which is also the target system's input, to avoid errors in advance. A fuzzy controller, i.e. a control system based on a fuzzy model, can also be used for runtime scaling. Fuzzy models help determine the resources a workload requires according to predefined rules expressed over fuzzy sets, each rule consisting of a precondition and a corresponding consequence. Because the fuzzy model must be defined before runtime, fuzzy-controller-based runtime scaling does not adapt well to dynamic workloads.
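The difference between open-loop and feedback control can be illustrated with a toy plant in which the controller's model overestimates per-VM capacity. The open-loop controller sizes the group once from the model; the feedback controller uses the measured utilization and applies the ratio rule desired = ceil(current × measured / target) with a tolerance band, the same form used by the Kubernetes Horizontal Pod Autoscaler. All function names, capacities, the workload and the 0.6 target are assumptions.

```python
import math

def open_loop_vms(workload, model_capacity, target_util):
    """Open loop: size the group from the model only; the result is never checked."""
    return math.ceil(workload / (model_capacity * target_util))

def feedback_vms(current_vms, measured_util, target_util, tolerance=0.1):
    """Feedback: correct the VM count using the measured output (utilization)."""
    ratio = measured_util / target_util
    if abs(ratio - 1.0) <= tolerance:  # close enough to target: avoid oscillation
        return current_vms
    return max(1, math.ceil(current_vms * ratio))

def measured_utilization(workload, vms, true_capacity):
    """Toy plant (assumption): utilization of the VM group, capped at 100%."""
    return min(1.0, workload / (vms * true_capacity))

WORKLOAD, TARGET = 480.0, 0.6         # requests/s and target utilization
MODEL_CAP, TRUE_CAP = 100.0, 80.0     # the model overestimates per-VM capacity

open_loop = open_loop_vms(WORKLOAD, MODEL_CAP, TARGET)               # 8 VMs
open_loop_util = measured_utilization(WORKLOAD, open_loop, TRUE_CAP)  # 0.75, off target

vms = open_loop
for _ in range(3):                    # a few monitoring periods of feedback control
    util = measured_utilization(WORKLOAD, vms, TRUE_CAP)
    vms = feedback_vms(vms, util, TARGET)
final_util = measured_utilization(WORKLOAD, vms, TRUE_CAP)            # 10 VMs at 0.6
```

The open-loop controller stays at 8 VMs and 75% utilization because it never sees its model error; the feedback controller observes the output and settles at 10 VMs on target.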

A time series is a sequence of data points recorded at a fixed time interval, for example the average response time of the whole application computed every minute. Time-series analysis comprises methods, used intensively, for representing and analysing how a measurement changes over time. For the runtime scaling problem, time series such as the average memory usage or the average request arrival rate, collected periodically by the monitoring system, can be analysed to predict future values. Proactive runtime scaling methods then use these predictions to make scaling decisions: repeating patterns hidden in the time series can be mined from past observations and used to predict the future workload or metrics.
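One simple way to exploit such repeated patterns is a seasonal-average forecast: predict the next value as the mean of past observations at the same position in the cycle, then size the group before the load arrives. This particular predictor, the function names, the season length of 4 samples, the sample data and the per-VM capacity of 50 requests/s are illustrative assumptions; richer time-series models are common in practice.

```python
import math

def seasonal_average_forecast(history, season_length):
    """Predict the next value as the mean of the observations exactly one, two, ...
    seasons before it, i.e. at the same phase of the repeating pattern."""
    n = len(history)
    same_phase = [history[i] for i in range(n - season_length, -1, -season_length)]
    if not same_phase:
        raise ValueError("need at least one full season of history")
    return sum(same_phase) / len(same_phase)

def proactive_vm_count(predicted_rate, per_vm_capacity):
    """Provision enough VMs for the predicted arrival rate before it arrives."""
    return max(1, math.ceil(predicted_rate / per_vm_capacity))

# Hypothetical request arrival rates (requests/s), one sample per minute,
# following a pattern that repeats every 4 samples.
arrivals = [100.0, 200.0, 300.0, 200.0, 110.0, 210.0, 310.0, 210.0]
forecast = seasonal_average_forecast(arrivals, season_length=4)   # (110 + 100) / 2 = 105.0
vms_needed = proactive_vm_count(forecast, per_vm_capacity=50.0)  # ceil(2.1) = 3
```

Because the forecast is made one period ahead, the extra VM can be booting while the current period is still running, which is the point of proactive scaling.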
