Extremum seeking for evolutionary game theory


UNDERGRADUATE THESIS

Presented to
UNIVERSIDAD DE LOS ANDES
FACULTY OF ENGINEERING
DEPARTMENT OF ELECTRICAL AND ELECTRONICS ENGINEERING

In Partial Fulfillment of the Requirements for the Degree of
BACHELOR IN ELECTRONICS ENGINEERING

by Jorge Ivan Poveda Fonseca

EXTREMUM SEEKING FOR EVOLUTIONARY GAME THEORY

Presented on December 7, 2011 to the examining committee members:

Advisor: Nicanor Quijano, PhD. Associate Professor, Universidad de los Andes.
Jury: Carlos Rodriguez, Dr. Associate Professor, Universidad de los Andes.

Contents

1 Introduction
2 Objectives
   2.1 General Objective
   2.2 Specific Objectives
3 Project Justification
4 Theoretical Framework
   4.1 Game Theory: Basic Concepts
      4.1.1 Non-Cooperative Games
      4.1.2 Evolutionary Game Theory and the Replicator Dynamics Equation
   4.2 Asymptotic Methods for Analyzing Nonlinear Systems
      4.2.1 Averaging Techniques
      4.2.2 Singular Perturbation Theory
   4.3 Nash Equilibrium Seeking
5 Methodology
6 Results
   6.1 On-line Optimization of Distributed Generation Using Extremum Seeking Control
      6.1.1 Abstract
      6.1.2 Introduction
      6.1.3 Problem Framework
      6.1.4 Dispatch Strategies
      6.1.5 Simulation and Discussion
      6.1.6 Conclusions
   6.2 A Shahshahani gradient based modified Extremum Seeking Control for non-model online optimization of multivariable problems with constraints
      6.2.1 Introduction

      6.2.2 Shahshahani gradient based Scheme
7 Discussion of Results
8 Conclusions
9 Acknowledgements
Bibliography

List of Figures

4.1 Scheme of extremum seeking control [28]
6.1 Scheme of extremum seeking control for 4 DGs playing a Nash game
6.2 Dispatch for 2 DGs. Pn1 = 70 kW, Pn2 = 50 kW, c1 = 0.9, c2 = 0.8. a) Pd = 105 kW, b) Pd = 200 kW
6.3 Dispatch for 3 DGs. Pn1 = 20 kW, Pn2 = 30 kW, Pn3 = 50 kW, c1 = 1, c2 = 0.8, c3 = 0.6. Pd = 50 kW for 0 < t < 5 h, Pd = 20 kW for 5 < t < 10 h, Pd = 70 kW for 10 < t < 20 h
6.4 a) Dispatch for 4 DGs. Pn1 = 172 kW, Pn2 = 47 kW, Pn3 = 66 kW, Pn4 = 106 kW, c1 = 0.2, c2 = 0.1, c3 = 0.8, c4 = 1. b) % error in the power dispatched
6.5 Dispatch for 3 DGs. Pn1 = 70 kW, Pn2 = 50 kW, Pn3 = 66 kW, c1 = 1, c2 = 0.8, c3 = 0.6
6.6 Dispatch for 4 DGs. Pn1 = 172 kW, Pn2 = 47 kW, Pn3 = 66 kW, Pn4 = 106 kW, c1 = 1, c2 = 0.7, c3 = 0.4, c4 = 0.8
6.7 Dispatch for 4 DGs. Pn1 = 172 kW, Pn2 = 47 kW, Pn3 = 66 kW, Pn4 = 106 kW, c1 = 1, c2 = 0.7, c3 = 0.4, c4 = 0.8
6.8 Dispatch for 4 DGs. Pn1 = 172 kW, Pn2 = 47 kW, Pn3 = 66 kW, Pn4 = 106 kW, c1 = 1, c2 = 0.7, c3 = 0.4, c4 = 0.8
6.9 A1
6.10 Dispatch for 3 DGs with dynamic response. Pn1 = 70 kW, Pn2 = 80 kW, Pn3 = 66 kW, c1 = 1, c2 = 0.9, c3 = 0.6
6.11 Distribution of the total demanded power among 2 microgrids, for Pt = 60 kW. a) Microgrid 1: Pn11 = 40 kW, Pn12 = 27 kW, c11 = 1, c12 = 0.9. b) Microgrid 2: Pn21 = 30 kW, Pn22 = 27 kW, Pn23 = 25 kW, c21 = 0.9, c22 = 0.7, c23 = 0.8. The production cost of microgrid 1 is c1 = 0.8 and of microgrid 2 is c2 = 0.9
6.12 Scheme for Shahshahani gradient based ESC
6.13 Trajectory of the vehicle converging to the maximum of the function estimated inside the simplex

6.14 Trajectory of the vehicle converging to the maximum (x = 7, y = 6)
6.15 Trajectory as a function of time of a vehicle converging to the maximum (x = 7, y = 6)
6.16 ESC based on the Shahshahani gradient for a vehicle tracking a static source emitting an unknown signal
6.17 ESC based on the Shahshahani gradient for a vehicle tracking a dynamic source emitting an unknown signal

Chapter 1
Introduction

Extremum seeking control (ESC) is a type of local controller that aims to maximize (or minimize), in closed loop, a specific payoff function, which is generally defined as the output of a dynamical system. Using a small sinusoidal perturbation, ESC finds the input set point that drives the output signal to its extremum. Because this type of controller is not model based, it is considered a type of adaptive controller, and it can be used in complex problems where a model of the plant is not available. ESC was first introduced in the 1920s in railroad applications. However, it was not until the year 2000 that the first proof of stability of ESC, based on singular perturbation theory and averaging methods, was presented in [16]. Some applications of ESC mentioned in [1] are ABS brake control, flight formation control, bioreactor control, and control of combustion instabilities. Later, a simplified ESC was proposed in [28]; in this scheme the high-pass filter is not included and semi-global practical asymptotic stability is shown. The classical ESC can be seen as a gradient-descent-based system. In [26] an application of ESC for distributed sensors aiming to attain a Nash equilibrium is introduced. In [15] ESC was shown to solve static non-cooperative games in real time with quadratic payoff functions (which imposes the existence of a single Nash equilibrium). This scheme has the advantage that the players only need to measure their own payoff functions, without any mathematical knowledge of these functions or of the game structure, to converge to the Nash equilibrium. In [7], this approach was extended to static games with non-quadratic payoff functions, which allows the existence of multiple Nash equilibria. In this case, the actions of the players who implement a strategy based on ESC converge to a neighborhood biased away from the Nash equilibrium in proportion to the third derivatives of the payoffs and the amplitude of the perturbation signals. In [8], dynamics were introduced into the actions of the players. This scheme is analyzed via classical singular perturbation theory and averaging methods, showing convergence to the Nash equilibrium of the non-cooperative game (or at least to a neighborhood biased away from it).

Many engineering applications of ESC have been proposed to date. In [10], a biochemical reactor is controlled by ESC. In [25] ESC is applied to electromechanical valves. In [32] mobile robots are controlled by ESC, and in [33] a human exercise machine is also analyzed with ESC. In [27] a complete survey of ESC up to the year 2010 is presented. The fact that ESC can be used to achieve a Nash equilibrium in non-cooperative games has opened the door to new research relating ESC to game theory. Many engineering problems can be formulated or modeled in terms of games. From non-cooperative games, where the players compete to maximize their individual profit, to cooperative games and evolutionary games, several problems can be accurately modeled and solved with game-theoretic ideas. Evolutionary game theory is the area of game theory that deals with games where the concept of evolution, mostly based on natural selection, is applied. Population dynamics fit in this area, and the replicator dynamics equation is the best-known mathematical model for analyzing the behaviour of a population living in a certain habitat. These dynamics can successfully solve resource allocation problems, where a fixed amount of resources must be allocated among a fixed number of agents following a certain criterion. The replicator dynamics equation can also be used for multivariable maximization under constraints. The replicator dynamics are based on the Shahshahani gradient, which is closely related (at least in purpose) to the gradient descent method on which ESC is based. Other characteristics, such as adaptability and stability, are common to both algorithms. Following these ideas, this work analyzes how ESC and population dynamics can be related and used to design optimal solutions to engineering problems.

The rest of this document is organized as follows. Chapter 2 presents the main objectives of the project. Chapter 3 introduces the project justification. Chapter 4 presents the general ideas behind game theory and ESC. Chapter 5 presents the methodology associated with this project, and finally Chapter 6 presents the main results.

Chapter 2
Objectives

2.1 General Objective

Analyze the implementation of extremum seeking control in problems modeled as non-cooperative games with population dynamics.

2.2 Specific Objectives

The three specific objectives associated with this project are:

1. Analyze static problems modeled as non-cooperative games, based on the extremum seeking approach.
2. Analyze dynamic problems modeled as non-cooperative games, based on the extremum seeking approach.
3. Analyze and verify the validity of extremum seeking control in the solution of problems modeled as non-cooperative games with population dynamics.

Chapter 3
Project Justification

Most engineering systems are characterized by the presence of optimal operating points. These operating points are described as 'optimal' in the sense that they generate the 'best' response of the system with respect to a given cost function, generally an efficiency or economic profit function. Operating the system at this optimal point is therefore a critical factor for industry. Classical control algorithms have been used to keep a system at a given set point, previously found (or calculated) by characterizing the plant. This scheme has the disadvantage that the intrinsic operating point of the plant can change over time, leading the system to operate at a non-optimal point. Optimal control algorithms can be used to find, in closed loop, the input that extremizes a given cost function. These algorithms have been applied successfully in many cases where the plant has linear dynamics; for complex nonlinear dynamics, however, optimal control algorithms are difficult to apply. The problem becomes even more pronounced when no model of the plant is available and the controller can only act based on measurements of the plant output. Extremum seeking control can be used to carry out real-time optimization of complex nonlinear dynamical systems. The basic idea behind extremum seeking control is the calculation of the optimal input value of the system (the 'optimal operating point') using a small perturbation signal and the derivatives of the measured cost function (generally defined as the output of the system). The optimization is carried out in closed loop by classical gradient-based methods. The advantage of this algorithm is that no model of the plant to be controlled is required. This relaxation allows extremum seeking control to solve many complex dynamic optimization problems in engineering. One of the areas that fits this description, and that has recently received increased attention, is the control (and solution) of problems modeled as Nash games. Coordination of entities and real-time negotiations are some examples in this

area, with strong applications in communications, power systems, smart grids, and defense. The advantage of using extremum seeking for solving Nash games is that the players who participate in the game do not need any knowledge of the model of the game: just by measuring their own cost function (which they aim to extremize) they decide the value of their respective actions. Given these ideas, it is of interest to analyze the possibility of using extremum seeking control to solve another type of game closely related to Nash games: evolutionary games. The concept of evolutionary games can be used to analyze population dynamics. These dynamics describe how the frequencies of the strategies implemented by the players of a population change in time according to a given payoff function. Different types of controllers for engineering problems have been developed based on these concepts, and the possibility of solving such engineering problems using extremum seeking control opens a new area of research.

Chapter 4
Theoretical Framework

4.1 Game Theory: Basic Concepts

4.1.1 Non-Cooperative Games

Games are models that describe conflicts of interest between the agents that participate in them [30]. These agents are called players, and their actions generally influence the behaviour of the other players. Game theory (GT) studies ways of analyzing these problems and in some cases suggests a "solution" to the game, that is, the best way of playing the game for each person involved. Formally, the concept of GT was introduced from the mathematical perspective in [31]. After this, great interest emerged, especially among mathematicians and biologists, thanks to the fact that many behaviours in nature follow these ideas. For the present study, let $P_1$ and $P_2$ be two players who can take decisions. These decisions are denoted $\theta_1$ and $\theta_2$ and belong to sets with a maximum number of possible decisions $D_1$ and $D_2$,

$$\theta_1 \in \{1, 2, \ldots, D_1\}, \qquad \theta_2 \in \{1, 2, \ldots, D_2\} \qquad (4.1)$$

A strategy for each player is defined as the plan, determined at the start of the game, that describes what the player would do in every possible situation [3]. So, if a player $P_i$ always takes the same decision in a game, we say that $P_i$ always implements a pure strategy. A decision taken by players $P_1$ and $P_2$ is reflected in a profit for each of them, given by

$$J_1 = J_1(\theta_1, \theta_2), \qquad J_2 = J_2(\theta_1, \theta_2) \qquad (4.2)$$

So, each player's profit is affected not only by his own actions but also by the actions of the other players. If whatever player $P_1$ wins, player $P_2$ loses, then we are

talking about a zero-sum game. In this type of game the payoff associated with the players is generally defined in terms of payoff matrices of dimension $D_1 \times D_2$: if we denote by $i$ and $j$ the elements of the decision sets $\theta_1$ and $\theta_2$ respectively, the payoff of player $P_1$ when he implements an action (strategy) $i$ and $P_2$ implements an action (strategy) $j$ is given by the entry of the payoff matrix in row $i$ and column $j$.

Nash Equilibrium

Assuming that the players are rational, the fact that the profit of each player also depends on the actions of the other players makes it difficult for each player to maximize his individual profit. However, in some cases an "equilibrium" can be reached where, if each player takes a determined decision, no player regrets his decision: had he taken another one, he would have gained less or, at best, the same profit. We will refer to such a set of decisions as a "Nash equilibrium". A pair of decisions $(\theta_1^*, \theta_2^*)$ is called a Nash equilibrium of a two-player game if the following inequalities hold,

$$J_1(\theta_1^*, \theta_2^*) \geq J_1(\theta_1, \theta_2^*) \quad \forall \theta_1 \in \{1, \ldots, D_1\}$$
$$J_2(\theta_1^*, \theta_2^*) \geq J_2(\theta_1^*, \theta_2) \quad \forall \theta_2 \in \{1, \ldots, D_2\} \qquad (4.3)$$

A Nash equilibrium can be unique, multiple, or may not exist at all. It indicates that no player can improve his payoff by unilaterally deviating from the equilibrium [4]; each component of a Nash equilibrium is a best response to the other.

Mixed strategies

In many games no equilibrium exists in terms of pure strategies. Players may change their decisions during the game, aiming to maximize their individual profit and arrive at an equilibrium. In this case we no longer speak of a pure strategy as described above, but of a mixed strategy. In a mixed strategy the players choose among their decisions by assigning a certain probability to each one of them. Nash equilibria can also be defined in terms of mixed strategies. For a two-player game with a finite maximum number of decisions $D_1$ and $D_2$, the strategy of player $P_1$ is denoted $\pi_1$ and the strategy of player $P_2$ is denoted $\pi_2$. Each of these strategies is a vector whose elements $\pi_j^i$, for $j \in \{1, 2\}$ and $i \in \{1, \ldots, D_j\}$, give the probability of taking each of the actions in the sets $\theta_j$. We therefore have $0 \leq \pi_j^i \leq 1$ and $\sum_{i=1}^{D_j} \pi_j^i = 1$. The expected payoff associated with player

$P_1$ is given by

$$w_1(\pi_1, \pi_2) = \sum_{i=1}^{D_1}\sum_{j=1}^{D_2} \pi_1^i J_1^{i,j} \pi_2^j = \pi_1^T J_1 \pi_2 \qquad (4.4)$$

and for player $P_2$,

$$w_2(\pi_1, \pi_2) = \pi_1^T J_2 \pi_2 \qquad (4.5)$$

The pair $(\pi_1^*, \pi_2^*)$ is a mixed Nash equilibrium if

$$w_1(\pi_1^*, \pi_2^*) \geq w_1(\pi_1, \pi_2^*) \quad \forall \pi_1 \in \Pi_1$$
$$w_2(\pi_1^*, \pi_2^*) \geq w_2(\pi_1^*, \pi_2) \quad \forall \pi_2 \in \Pi_2 \qquad (4.6)$$

where $\Pi_j$ is the simplex that contains all the possible strategies of $P_j$.

4.1.2 Evolutionary Game Theory and the Replicator Dynamics Equation

The concept of evolutionary game theory was introduced in [19], based on the idea that if we look at interactions among players as a game, then the better strategies will eventually dominate the population. The central concept of evolutionary game theory is the "evolutionarily stable strategy" (ESS), a strategy that cannot be invaded by any mutant, so that bad mutations do not overtake the population [4]. A strategy $x^*$ is an ESS against strategies $x_1, \ldots, x_s$ if either of the following conditions holds:

1. $u(x^*, x^*) > u(x_k, x^*)$ for each $k = 1, 2, \ldots, s$;
2. for any $x_k$ such that $u(x^*, x^*) = u(x_k, x^*)$, we must have $u(x^*, x_j) > u(x_k, x_j)$ for all $j = 1, 2, \ldots, s$.

Properties of an ESS

1. If $x^*$ is an ESS then $(x^*, x^*)$ is a Nash equilibrium, and only the symmetric Nash equilibria of a game are candidates for ESSs.
2. If $(x^*, x^*)$ is a strict Nash equilibrium, then $x^*$ is an ESS.
3. A symmetric Nash equilibrium $x^*$ is an ESS of a symmetric game if and only if $u(x^*, y) > u(y, y)$ for every strategy $y \neq x^*$ that is a best response to $x^*$.

The proof of these properties can be found in [4]. One of the important ideas related to the ESS is that, eventually, players will choose strategies that produce a better-than-average payoff. This idea is modeled by a first-order equation called the frequency dynamics or replicator dynamics. It is based on the fact that the growth rate of the population share using a strategy $i$ is measured by how much greater (or smaller) the expected payoff of the $i$th strategy is, compared with the expected payoff over all strategies in the population [4]. These dynamics were introduced in [29] and, thanks to their simplicity, they have been used in many engineering applications [24], [20], [23].

For the mathematical definition of the replicator dynamics as an evolutionary game, $H = \{1, 2, \ldots, N\}$ is defined as the set of possible pure strategies and $x_i(t) \geq 0$ as the total number of individuals playing strategy $i \in H$ at time $t$. The population state is defined as $x(t) = [x_1(t), \ldots, x_N(t)]^T$, where $x_i$ is the number of individuals genetically programmed to use pure strategy $i$. The size of the population is given by

$$P = \sum_{i=1}^{N} x_i(t) \qquad (4.7)$$

The state of the population is given by the proportions

$$p_i = \frac{x_i}{P} \qquad (4.8)$$

so the population state satisfies $p(t) \in \Delta$, where

$$\Delta = \Big\{ p \in \mathbb{R}^N_+ : \sum_{i=1}^{N} p_i = 1 \Big\} \qquad (4.9)$$

$\Delta$ is defined as the simplex or constraint set where the population states are confined. Note that the definition of the population state has the same form as a mixed strategy in this game. If we define $\bar{\pi}_i$ as a vector whose elements are all zero except for the $i$th element (the vector of a pure strategy), then the expected payoff obtained for selecting the $i$th strategy is given by

$$w(\bar{\pi}_i, p) = \sum_{j=1}^{N} J_1^{ij} p_j \qquad (4.10)$$

where $J_1$ is the payoff matrix of the first player. The population average payoff is defined as

$$\bar{w}(p, p) = \sum_{i=1}^{N}\sum_{j=1}^{N} p_i J_1^{ij} p_j \qquad (4.11)$$

If it is assumed that the payoff $w(\bar{\pi}_i, p)$ represents an increase in the fitness associated with each strategy, then, with $\kappa$ and $\lambda$ the per capita birth and death rates, the rate of change of the number of individuals who use strategy $i$ is given by

$$\frac{dx_i}{dt} = (\kappa + f(i) - \lambda)x_i \qquad (4.12)$$

for all $i \in H$. The rate of change of the total population is given by

$$\frac{dP}{dt} = (\kappa - \lambda + \bar{f})P \qquad (4.13)$$

where $\bar{f} = \sum_{j=1}^{N} f(j)p_j$. From the game theory perspective it is of interest to analyze how the proportions $p_i(t)$ change in time. Differentiating $x_i = P p_i$,

$$\frac{dx_i}{dt} = \frac{d(P p_i)}{dt} \qquad (4.14)$$

which yields

$$\frac{dp_i}{dt} = p_i(f(i) - \bar{f}) \qquad (4.15)$$

so the proportion of individuals using strategy $i$ grows if their payoff is bigger than the average payoff of the population.

Properties of the Replicator Dynamics Equation

Some of the most interesting properties of the replicator dynamics equation are:

1. If some initial condition is zero ($p_i(0) = 0$), that population share will not evolve.
2. If $p(0) \in \Delta$, all pure strategies initially present in the population will be present at any finite time.
3. If $p(0) \in \operatorname{int}(\Delta)$ is not an equilibrium point of the replicator dynamics equation, convergence of the trajectories to a stationary state can only be achieved asymptotically as $t \to \infty$.
4. If $p(0) \in \operatorname{int}(\Delta)$, some population states can tend to zero as $t \to \infty$. If a pure strategy becomes extinct, the replicator dynamics prevent any population share from becoming negative; this situation is called truncation of a population state.

The selection of the fitness function plays an important role in the behaviour of the solutions of the replicator dynamics equation. Accordingly, for most applications $f_i$ is defined as a Lipschitz continuous mapping on $\Delta$, $f_i : \Delta \to \mathbb{R}$, strictly decreasing in $p_i$. By defining the fitness function in this way it is ensured that the trajectories $p(t)$ that solve the replicator equation are unique and continuous through every initial condition $p(0) \in \Delta$ and for all $t \in \mathbb{R}$.
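To make the replicator equation (4.15) concrete, the following is a minimal numerical sketch (not part of the original thesis): it integrates the dynamics with a forward Euler step for an assumed strictly decreasing affine fitness $f_i(p_i) = B_i - c_i p_i$; all parameter values are illustrative.

```python
import numpy as np

def replicator_step(p, fitness, dt=0.01):
    """One forward-Euler step of dp_i/dt = p_i * (f_i(p) - f_bar), equation (4.15)."""
    f = fitness(p)
    f_bar = np.dot(p, f)          # population-average fitness
    return p + dt * p * (f - f_bar)

# Illustrative strictly decreasing fitness: f_i(p_i) = B_i - c_i * p_i
B = np.array([3.0, 2.5, 2.0])     # maximum fitness values (assumed)
c = np.array([4.0, 3.0, 2.0])
fitness = lambda p: B - c * p

p = np.array([0.2, 0.3, 0.5])     # initial state inside the simplex
for _ in range(20000):
    p = replicator_step(p, fitness)

print(p, p.sum())                  # proportions stay on the simplex (sum = 1)
print(fitness(p))                  # at equilibrium all surviving strategies earn equal fitness
```

Since $\sum_i \dot{p}_i = \bar{f}\,(1 - \sum_i p_i)$, the simplex constraint is preserved along the trajectory, and the printed fitness values come out (approximately) equal, which is exactly the equal-fitness equilibrium condition discussed next.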

Equilibrium and Stability of the Replicator Dynamics

The equilibria of the replicator dynamics are given by

$$p_i(f(i) - \bar{f}) = 0 \qquad (4.16)$$

From (4.16) it is clear that $p^* = 0$ is an equilibrium of the replicator dynamics equation. This equilibrium is called a degenerate equilibrium point, since $0 \notin \Delta$. When there is no truncation, the equilibrium of (4.16) is characterized by

$$f_i(p_i^*) = \bar{f}(p^*) \qquad (4.17)$$

This indicates that at the equilibrium all population states earn the same fitness. A special case occurs when there is truncation (some of the strategies become extinct). In this case an additional condition on the fitness functions is imposed, i.e., $f_i(0) = B_i$ with $B_i > 0$ for all $i$. Since $f_i$ is assumed strictly decreasing and $p_i \geq 0$, $B_i$ is the maximum value of the $i$th fitness. The equilibrium in this case is given by

$$f_i(p_i^*) = \bar{f}^*, \qquad f_j(p_j^*) = B_j \qquad (4.18)$$

for all strategies $j$ that become extinct. If the fitness function $f_i(p_i)$ is a scalar Lipschitz continuous mapping on $\Delta$, strictly decreasing, with $f_i(0) = B_i$ and $B_i > 0$, then: if $p(0) \in \operatorname{int}(\Delta)$ and $B_i > \bar{f}^*$ for all $i \in H$, the equilibrium point $p^* \in \operatorname{int}(\Delta)$ that satisfies (4.16) is asymptotically stable for the replicator dynamics, with region of attraction $\Delta$; if $p(0) \in \operatorname{int}(\Delta)$ and $B_i \leq \bar{f}^*$ for some $i \in H$, the equilibrium point $p^* \in \operatorname{bd}(\Delta)$ is asymptotically stable with region of attraction $\Delta$.
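The truncated case can be illustrated with the same kind of sketch (again with assumed, illustrative values): if one strategy's maximum fitness $B_i$ lies below the average fitness $\bar{f}^*$ earned by the others at equilibrium, that strategy is driven extinct and the equilibrium lands on the boundary of the simplex.

```python
import numpy as np

# Affine, strictly decreasing fitness f_i(p_i) = B_i - c_i * p_i (illustrative values);
# B_3 = 0.8 is below the average fitness (1.0) earned by strategies 1 and 2 at equilibrium.
B = np.array([3.0, 2.5, 0.8])
c = np.array([4.0, 3.0, 2.0])

p = np.array([1/3, 1/3, 1/3])
dt = 0.01
for _ in range(60000):
    f = B - c * p
    p = p + dt * p * (f - np.dot(p, f))

print(p)            # approximately [0.5, 0.5, 0.0]: the third strategy is truncated
print(B - c * p)    # the surviving strategies share the same fitness; f_3(0) = B_3 < f_bar*
```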

4.2 Asymptotic Methods for Analyzing Nonlinear Systems

Only some nonlinear differential equations admit simple exact analytic solutions. One set of tools for analyzing complex equations for which an analytic solution is very hard to find are the asymptotic methods. The idea is to analyze a simplified system that retains the convergence and stability properties of the original complex system. For the present study two methods are of interest: averaging and singular perturbation. The problems that can be handled with these theories are generally problems with small perturbations, which in general can be written as

$$\dot{x} = f(t, x, \epsilon) \qquad (4.19)$$

where $\epsilon$ is a small scalar parameter. Suppose (4.19) has a solution $x(t, \epsilon)$. The idea of these methods is to find an approximate solution $\tilde{x}(t, \epsilon)$ such that the approximation error $x(t, \epsilon) - \tilde{x}(t, \epsilon)$ is bounded in some norm. One of the main characteristics of these asymptotic methods is that they reveal the multiple time scales present in the original system. These multiple time scales are reflected in the fact that some variables move faster in time than others, which leads to the general classification into "slow" and "fast" variables. Consider a system of the form

$$\dot{x} = f(t, x, \epsilon), \qquad x(t_0) = \eta(\epsilon) \qquad (4.20)$$

As the goal of these methods is to exploit the "smallness" of the perturbation parameter $\epsilon$ to construct approximate solutions that are valid for sufficiently small $|\epsilon|$ [11], the simplest case is to set the perturbation parameter $\epsilon$ to zero, leading to the unperturbed system

$$\dot{x} = f(t, x, 0), \qquad x(t_0) = \eta_0 \qquad (4.21)$$

If we suppose that $f$ is continuous in $(t, x, \epsilon)$ and locally Lipschitz in $(x, \epsilon)$, uniformly in $t$, and $\eta$ is locally Lipschitz in $\epsilon$, for $(t, x, \epsilon)$ in $[t_0, t_1] \times D \times [-\epsilon_0, \epsilon_0]$, then there is a positive constant $k$ such that

$$\|x(t, \epsilon) - x_0(t)\| \leq k\epsilon, \qquad \forall |\epsilon| < \epsilon_1, \ \forall t \in [t_0, t_1] \qquad (4.22)$$

When the approximation error satisfies the bound (4.22), it is said that the error is of order $O(\epsilon)$, written as

$$x(t, \epsilon) - x_0(t) = O(\epsilon) \qquad (4.23)$$

The following definition will be useful for the present study.

Definition 4.2.1. $\delta_1(\epsilon) = O(\delta_2(\epsilon))$ if there exist positive constants $k$ and $c$ such that

$$|\delta_1(\epsilon)| \leq k|\delta_2(\epsilon)|, \qquad \forall |\epsilon| < c \qquad (4.24)$$

The fact that $k$ is independent of $|\epsilon|$ guarantees that the bound $k|\epsilon|$ decreases monotonically as $|\epsilon|$ decreases. Note that, for a sufficiently small value of $\epsilon$, an $O(\epsilon^n)$ error is smaller than an $O(\epsilon^m)$ error for $n > m$.

4.2.1 Averaging Techniques

This method applies to systems of the form

$$\dot{x} = \epsilon f(t, x, \epsilon) \qquad (4.25)$$

where $\epsilon$ is a small positive parameter. The method approximates the solution of (4.25) by the solution of an averaged system. Suppose $f$ and its partial derivatives with respect to $(x, \epsilon)$ up to second order are continuous and bounded for $(t, x, \epsilon) \in [0, \infty) \times D_0 \times [0, \epsilon_0]$, for every compact set $D_0 \subset D$, where $D \subset \mathbb{R}^n$ is a domain and the parameter $\epsilon$ is positive. The averaging method applies to systems in which the function $f(t, x, 0)$ has a well-defined average according to the following definition.

Definition 4.2.1. A continuous, bounded function $g : [0, \infty) \times D \to \mathbb{R}^n$ is said to have an average $g_{av}(x)$ if the limit

$$g_{av}(x) = \lim_{T \to \infty} \frac{1}{T}\int_t^{t+T} g(\tau, x)\, d\tau$$

exists and

$$\left\| \frac{1}{T}\int_t^{t+T} g(\tau, x)\, d\tau - g_{av}(x) \right\| \leq k\,\sigma(T), \qquad \forall (t, x) \in [0, \infty) \times D_0 \qquad (4.26)$$

for every compact set $D_0 \subset D$, where $k$ is a positive constant and $\sigma : [0, \infty) \to [0, \infty)$ is a strictly decreasing, continuous, bounded function such that $\sigma(T) \to 0$ as $T \to \infty$.

According to this definition, it is of interest to state the general averaging theorem [11].

Theorem 4.2.1. Let $f(t, x, \epsilon)$ and its partial derivatives with respect to $(x, \epsilon)$ up to the second order be continuous and bounded for $(t, x, \epsilon) \in [0, \infty) \times D_0 \times [0, \epsilon_0]$, for every compact set $D_0 \subset D$, where $\epsilon_0 > 0$ and $D \subset \mathbb{R}^n$ is a domain. Suppose $f(t, x, 0)$ has the average function $f_{av}(x)$ on $[0, \infty) \times D$ and the Jacobian of $h(t, x) = f(t, x, 0) - f_{av}(x)$ has zero average with the same convergence function as $f$. Let $x(t, \epsilon)$ and $x_{av}(t)$ denote the solutions of the original system $\dot{x} = \epsilon f(t, x, \epsilon)$ and the averaged system $\dot{x} = \epsilon f_{av}(x)$, respectively, and let $\alpha$ be a class $\mathcal{K}$ function.

• If $x_{av}(t) \in D$ for all $t \in [0, b/\epsilon]$ and $x(0, \epsilon) - x_{av}(0) = O(\alpha(\epsilon))$, then there exists $\epsilon^* > 0$ such that for all $0 < \epsilon < \epsilon^*$, $x(t, \epsilon)$ is defined and
$$x(t, \epsilon) - x_{av}(t) = O(\alpha(\epsilon)) \quad \text{on } [0, b/\epsilon] \qquad (4.27)$$

• If the origin $x = 0 \in D$ is an exponentially stable equilibrium point of the averaged system, $\omega \subset D$ is a compact subset of its region of attraction, $x_{av}(0) \in \omega$, and $x(0, \epsilon) - x_{av}(0) = O(\alpha(\epsilon))$, then there exists $\epsilon^* > 0$ such that for all $0 < \epsilon < \epsilon^*$, $x(t, \epsilon)$ is defined and
$$x(t, \epsilon) - x_{av}(t) = O(\alpha(\epsilon)) \quad \forall t \in [0, \infty) \qquad (4.28)$$

• If the origin $x = 0 \in D$ is an exponentially stable equilibrium point of the averaged system and $f(t, 0, \epsilon) = 0$ for all $(t, \epsilon) \in [0, \infty) \times [0, \epsilon_0]$, then there exists $\epsilon^* > 0$ such that for all $0 < \epsilon < \epsilon^*$, the origin is an exponentially stable equilibrium point of the original system.
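A small numerical illustration of the second item of the theorem follows (a sketch; the example system $\dot{x} = -\epsilon x \sin^2 t$ and its average $\dot{x} = -\epsilon x/2$ are assumed textbook-style choices, not taken from the thesis):

```python
import numpy as np

eps = 0.05
dt = 1e-3
T = 50.0

x = 1.0       # original system   dx/dt = eps * f(t, x),   f(t, x) = -x * sin(t)^2
x_av = 1.0    # averaged system   dx/dt = eps * f_av(x),   f_av(x) = -x / 2

for k in range(int(T / dt)):
    t = k * dt
    x    += dt * eps * (-x * np.sin(t) ** 2)
    x_av += dt * eps * (-x_av / 2.0)

print(x, x_av, abs(x - x_av))   # the difference stays of order eps, as the theorem predicts
```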

According to this, if the averaged system has an exponentially stable equilibrium point, then the original system converges to a neighborhood of that equilibrium, with an error of order $O(\epsilon)$ for a sufficiently small constant $\epsilon$. Based on this, the averaging method is a powerful tool for analyzing complex systems for which an averaged system can be formulated.

4.2.2 Singular Perturbation Theory

Another kind of perturbed system is

$$\dot{x} = f(t, x, z, \epsilon)$$
$$\epsilon\dot{z} = g(t, x, z, \epsilon) \qquad (4.29)$$

In this kind of system, if $\epsilon = 0$ an abrupt change in the dynamic properties of the system occurs, as the differential equation $\epsilon\dot{z} = g$ degenerates into the algebraic equation

$$0 = g(t, x, z, 0) \qquad (4.30)$$

This discontinuity can be avoided by studying the system (4.29) on multiple time scales. For instance, if $\epsilon = 0$ then $z$ evolves fast (in fact instantaneously) to the equilibrium $z = h(t, x)$. If this is the case, the variable $x$ evolves with $z$ at its steady state, that is,

$$\dot{x} = f(t, x, h(t, x), 0) \qquad (4.31)$$

The system described by (4.31) is called the quasi-steady-state model and describes the behaviour of the original system (4.29) once the fast variable $z$ has converged to its equilibrium. Even though this system is a good approximation of the original one, an extra model is needed to describe the transient behaviour of $z$. If a change of variables is defined as

$$y = z - h(t, x), \qquad t = \epsilon\tau \qquad (4.32)$$

then the fast model or boundary-layer model can be defined as

$$\frac{dy}{d\tau} = g(t, x, y + h(t, x), 0) \qquad (4.33)$$

This model describes the transient response of the difference between $z$ and its equilibrium $h(t, x)$. The basic property needed in singular perturbation theory is the stability of the boundary-layer model, which is defined as follows.

Definition 4.2.1. The equilibrium point $y = 0$ of the boundary-layer system is exponentially stable, uniformly in $(t, x) \in [0, t_1] \times D_x$, if there exist positive constants $k$, $\gamma$, and $\rho_0$ such that the solutions of the boundary-layer system satisfy

$$\|y(\tau)\| \leq k\|y(0)\|\exp(-\gamma\tau), \qquad \forall \|y(0)\| \leq \rho_0, \ \forall (t, x) \in [0, t_1] \times D_x, \ \forall \tau \geq 0 \qquad (4.34)$$

The following theorem (Tikhonov's theorem on the infinite interval) gives an insight into the way singularly perturbed problems can be analyzed.

Theorem 4.2.1. Consider the singular perturbation problem (4.29) and let $z = h(t, x)$ be an isolated root of (4.30). Assume that the following conditions are satisfied for all

$$[t, x, z - h(t, x), \epsilon] \in [0, \infty) \times D_x \times D_y \times [0, \epsilon_0] \qquad (4.35)$$

for some domains $D_x \subset \mathbb{R}^n$ and $D_y \subset \mathbb{R}^m$, which contain their respective origins:

• On any compact subset of $D_x \times D_y$, the functions $f$, $g$, their first partial derivatives with respect to $(x, z, \epsilon)$, and the first partial derivative of $g$ with respect to $t$ are continuous and bounded, $h(t, x)$ and $[\partial g(t, x, z, 0)/\partial z]$ have bounded first partial derivatives with respect to their arguments, and $[\partial f(t, x, h(t, x), 0)/\partial x]$ is Lipschitz in $x$, uniformly in $t$; the initial data $\zeta(\epsilon)$ and $\eta(\epsilon)$ are smooth functions of $\epsilon$;

• the origin is an exponentially stable equilibrium point of the reduced system;

• the origin is an exponentially stable equilibrium point of the boundary-layer system, uniformly in $(t, x)$.

Then, for each compact set $\omega_x \subset \{W_2(x) \leq \rho c, \ 0 < \rho < 1\}$ there is a positive constant $\epsilon^*$ such that for all $t_0 \geq 0$, $\zeta_0 \in \omega_x$, $\eta_0 - h(t_0, \zeta_0) \in \omega_y$, and $0 < \epsilon < \epsilon^*$, the singular perturbation problem has a unique solution $x(t, \epsilon)$, $z(t, \epsilon)$ on $[t_0, \infty)$, and

$$x(t, \epsilon) - \bar{x}(t) = O(\epsilon) \qquad (4.36)$$
$$z(t, \epsilon) - h(t, \bar{x}(t)) - \hat{y}(t/\epsilon) = O(\epsilon) \qquad (4.37)$$

hold uniformly for $t \in [t_0, \infty)$, where $\bar{x}(t)$ and $\hat{y}(\tau)$ are the solutions of the reduced and boundary-layer problems. Moreover, given any $t_b > t_0$, there is $\epsilon^{**} \leq \epsilon^*$ such that

$$z(t, \epsilon) - h(t, \bar{x}(t)) = O(\epsilon) \qquad (4.38)$$

holds uniformly for $t \in [t_b, \infty)$ whenever $\epsilon < \epsilon^{**}$.
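The two-time-scale behaviour can be illustrated with a simple sketch (the toy system below is an assumed example, not from the thesis): the slow variable of the singularly perturbed system stays close to the solution of its quasi-steady-state model, and the fast variable settles onto $z = h(t, x)$ after a short boundary layer.

```python
import numpy as np

eps = 0.01
dt = 1e-4          # small step so that the fast dynamics are resolved
T = 2.0

# Full system:  x' = -x + z ,   eps * z' = -(z + x)   =>   h(x) = -x
x, z = 1.0, 3.0
# Reduced (quasi-steady-state) model:  x' = -x + h(x) = -2x
xr = 1.0

for _ in range(int(T / dt)):
    x, z = x + dt * (-x + z), z + (dt / eps) * (-(z + x))
    xr += dt * (-2.0 * xr)

print(x, xr)        # the slow variable tracks the reduced model closely
print(z, -x)        # after the boundary layer, z stays near h(x) = -x
```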

4.3 Nash Equilibrium Seeking

For the present study, a non-cooperative game with $N$ players and a dynamic mapping from the players' actions $u_i$ to their payoff values $J_i$, which the players wish to maximize, is considered. This system can be modeled by

$$\dot{x} = f(x, u) \qquad (4.39)$$
$$J_i = h_i(x), \qquad i = 1, \ldots, N \qquad (4.40)$$

where $x \in \mathbb{R}^n$ is the state, $u \in \mathbb{R}^N$ is the vector of the players' actions, $J_i \in \mathbb{R}$ is the payoff value of player $i$, and $f : \mathbb{R}^n \times \mathbb{R}^N \to \mathbb{R}^n$ and $h_i : \mathbb{R}^n \to \mathbb{R}$ are smooth. In order to analyze this system the following assumptions are made.

Assumption 1. There exists a smooth function $l : \mathbb{R}^N \to \mathbb{R}^n$ such that

$$f(x, u) = 0 \quad \text{if and only if} \quad x = l(u) \qquad (4.41)$$

Assumption 2. For each $u \in U \subseteq \mathbb{R}^N$, where $U$ is the action set of the players, the equilibrium $x = l(u)$ of system (4.39) is locally exponentially stable.

Assumption 3. There exists at least one, possibly multiple, isolated stable Nash equilibrium $u^* = [u_1^*, \ldots, u_N^*]$ such that

$$\frac{\partial p_i}{\partial u_i}(u^*) = 0, \qquad \frac{\partial^2 p_i}{\partial u_i^2}(u^*) < 0 \qquad (4.42)$$

for all $i \in \{1, \ldots, N\}$, where $p_i(u) = h_i(l(u))$.

Assumption 4. The Hessian matrix of $p_i$ at $u^*$ is diagonally dominant and hence nonsingular.

By using extremum seeking control, the goal is to converge to a Nash equilibrium $u^*$ without requiring the players to know the actions of the other players, the mathematical form of the payoff functions $h_i$, or the dynamical system $f$. A scheme of the classical extremum seeking algorithm is shown in Figure 4.1. Player $i$ implements the following Nash seeking strategy:

$$u_i(t) = \hat{u}_i(t) + \mu_i(t), \qquad \dot{\hat{u}}_i = k_i\,\mu_i(t)\,J_i(t) \qquad (4.43)$$

where $\mu_i(t) = a_i\sin(w_it + \phi_i)$ and $a_i, k_i, w_i > 0$, with $k_i$ defined as

$$k_i = \epsilon\,\bar{w}\,K_i = O(\epsilon\bar{w}) \qquad (4.44)$$

where $\bar{w} = \min_i\{w_i\}$ and $\epsilon$ is a small positive constant.

Figure 4.1: Scheme of extremum seeking control [28]

By denoting the error relative to the Nash equilibrium as $\tilde{u}_i(t) = u_i(t) - \mu_i(t) - u_i^*$ and changing the time scale to $\tau = \bar{w}t$, the error system is formulated as

$$\bar{w}\frac{dx}{d\tau} = f(x, u^* + \tilde{u} + \mu(\tau)) \qquad (4.45)$$

$$\frac{d\tilde{u}_i}{d\tau} = \epsilon K_i\,\mu_i(\tau)\,h_i(x) \qquad (4.46)$$

Note that for $\bar{w}$ small this system is in the standard singular perturbation form and, since $\epsilon$ is also small, the reduced model is in the averaging form.

Theorem 4.3.1. Consider the system (4.45)-(4.46) with (4.43) for an $N$-player game under Assumptions 1-4, and where $w_i \neq w_j$, $w_i \neq w_j + w_k$, $2w_i \neq w_j + w_k$ and $w_i \neq 2w_j + w_k$ for all distinct $i, j, k \in \{1, \ldots, N\}$. There exist constants $\bar{w}$, $\bar{\epsilon}$ and $\bar{a}$ such that for all $\min_i w_i \in (0, \bar{w})$, $\epsilon \in (0, \bar{\epsilon})$ and $\max_i a_i \in (0, \bar{a})$, the solution $(x(t), u_1(t), \ldots, u_N(t))$ converges exponentially to an $O(\min_i w_i + \epsilon + \max_i a_i)$ neighborhood of the point $(l(u^*), u_1^*, \ldots, u_N^*)$, provided the initial conditions are sufficiently close to this point.

The proof of Theorem 4.3.1, and some extensions of it, can be found in [15], [7] and [8].
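The following is a minimal simulation sketch of the Nash seeking strategy (4.43) for a static two-player game with quadratic payoffs; the payoff functions, gains, amplitudes, and frequencies are illustrative assumptions and are not taken from the thesis. Each player updates its action using only its own measured payoff.

```python
import numpy as np

# Quadratic payoffs with a single Nash equilibrium at approximately (0.533, 1.867)
J1 = lambda u1, u2: -(u1 - 1.0) ** 2 - 0.5 * u1 * u2
J2 = lambda u1, u2: -(u2 - 2.0) ** 2 - 0.5 * u1 * u2

a1, a2 = 0.1, 0.1          # perturbation amplitudes
w1, w2 = 30.0, 37.0        # distinct probing frequencies (resonance conditions avoided)
k1, k2 = 20.0, 20.0        # adaptation gains (playing the role of k_i in (4.43))

dt, T = 1e-3, 200.0
u1_hat, u2_hat = 0.0, 0.0   # initial action estimates

for k in range(int(T / dt)):
    t = k * dt
    u1 = u1_hat + a1 * np.sin(w1 * t)
    u2 = u2_hat + a2 * np.sin(w2 * t)
    # each player measures only its own payoff value and demodulates it
    u1_hat += dt * k1 * a1 * np.sin(w1 * t) * J1(u1, u2)
    u2_hat += dt * k2 * a2 * np.sin(w2 * t) * J2(u1, u2)

print(u1_hat, u2_hat)       # settles in a small neighborhood of the Nash equilibrium
```

The printed estimates oscillate in a small neighborhood of the Nash equilibrium, approximately $(0.53, 1.87)$ for these payoffs, in agreement with Theorem 4.3.1: the size of the residual neighborhood shrinks with the perturbation amplitudes and gains.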

Chapter 5
Methodology

The methodology of this project is based on an extensive literature review on ESC, EGT and nonlinear systems. Based on this review, results presented in the recent literature have been replicated via simulation, and new schemes have been analyzed both theoretically and by simulation. The methodology associated with this project is summarized by the following stages:

• Study of nonlinear systems.
• Study of game theory.
• Study of classical extremum seeking schemes.
• Reproduction of previous results on classical extremum seeking control.
• Study and reproduction of results on extremum seeking control for solving Nash games with quadratic payoff functions.
• Study and reproduction of results on extremum seeking control for solving Nash games with non-quadratic payoff functions.
• Analysis of extremum seeking control for solving zero-sum games, non-zero-sum games, and evolutionary games.

Chapter 6
Results

6.1 On-line Optimization of Distributed Generation Using Extremum Seeking Control

6.1.1 Abstract

We propose an extremum seeking approach for solving the resource allocation problem of power dispatch in a distributed generation process, which we model as a classical non-cooperative game where the players only need to measure their own individual payoff functions in order to attain a Nash equilibrium that coincides with the optimal and feasible solution of the dispatch problem. The power distribution system is analyzed at two different levels: at the lower level, the dispatch of distributed generators inside each microgrid is carried out so as to maximize the utility of the microgrid, and at the upper level, a fixed demanded power is allocated among different microgrids so as to maximize the utility of the distribution network. Under the structure of the proposed payoff function, asymptotic convergence to a Nash equilibrium is achieved. Dynamics are included in the actions of the players, modeling their dynamic response. The results of applying extremum seeking control to solve the non-cooperative game are illustrated via simulations.

6.1.2 Introduction

Power distribution systems can be subdivided into small groups or subsystems composed of distributed generators (DGs), local loads, and storage devices. These subsystems are called microgrids, and they are part of a hierarchical and distributed structure that offers several advantages over classical schemes. One of these advantages is the possibility of controlling the power dispatched by each entity by using local information and resource allocation techniques.

In general, we are interested in analyzing the particular problem of power allocation in power distribution systems at two different levels. At the lower level, a given amount of demanded power assigned to a microgrid must be generated by the DGs that belong to it, aiming to maximize the utility of the microgrid; at the upper level, a fixed demanded power of a given geographical zone must be covered by the power generation of different microgrids, aiming to maximize the total utility of the energy network. These two allocation problems can be analyzed separately or together by separation of time scales. Most of the algorithms developed so far to solve these problems are based on knowledge of the structural form of the system. For instance, the market-based technique [23] uses concepts from economic theory to model the interactions in a market where the participants attempt to reach an optimal equilibrium. However, in some cases the commonly used market-based control cannot fulfill the constraints associated with the problem and new control strategies are needed [17]. In [23], a replicator dynamics approach was implemented to solve the dispatch problem of distributing a fixed demanded power among the distributed generators that compose a microgrid. This replicator dynamics approach has been shown to be superior to classical market-based control thanks to the adaptability of the replicator dynamics and the complete fulfillment of the constraints associated with this optimization problem, which is achieved through the structural formulation of the fitness function. However, the adaptability of the replicator dynamics is subject to the adaptability of the fitness function, which is designed by engineers based on the model of the plant to solve the specific problem. Conversely, extremum seeking control (ESC) has emerged as one of the most interesting schemes for on-line optimization of dynamical systems without a model. Several applications of ESC have been proposed in the last ten years after the publication of [16], where the first formal proof of the scheme, based on singular perturbation theory and averaging, was presented. The idea behind the ESC proposed in [16] is to estimate the gradient of a function by introducing a small periodic perturbation signal that is used in a modulation/demodulation process in the dynamical system. A complete explanation of ESC can be found in [1] and [13]. Several contributions have been made in recent years to understand ESC and formulate new feasible applications, such as motion control and coordination of different types of vehicles [14], [32], [9], control of power systems such as wind turbines, photovoltaic systems and fuel cells [5], [21], [34], and mechanical structure design [12]. A complete survey on continuous ESC can be found in [27]. However, it was not until the publication of [15] that the possibility of using ESC for solving non-cooperative games was shown. In [15], ESC was proved to solve non-cooperative games where the players only need to know their

individual payoffs in order to control their actions, aiming to selfishly maximize their individual profit. This idea is extended in [7] to non-cooperative games where the payoff functions are not necessarily quadratic, leading to the possibility of multiple Nash equilibria. It was also shown how, using ESC as a strategy, the actions of the players converge to a neighborhood of a stable Nash equilibrium. Later, in [8], the authors introduced dynamics into the actions of the players and convergence to a neighborhood of a Nash equilibrium was still achieved. Both levels of the power allocation problem in power distribution systems can be viewed as optimization problems with different constraints that can alternatively be modeled using game theory ideas. In fact, the study of consensus algorithms, multi-agent control, and coordination problems from the game theory perspective offers an alternative way to solve the power allocation problem optimally. Based on these ideas, we propose a new scheme for solving the power dispatch allocation problem, which we model as a non-cooperative game played by N players (the local controllers of the DGs or of the microgrids). In this case we address the question of designing the payoff functions of the players of the non-cooperative game so that the Nash equilibrium coincides with the solution of an optimization problem. This area fits the definition of distributed welfare games, which formalize a resource allocation game with a specific structure enforced on the players' utility functions [18]. By implementing an ESC strategy the game is solved and, just by measuring their individual payoff functions, the players reach a Nash equilibrium that coincides with the optimal solution of the resource allocation problem. Modeling this system as a non-cooperative game has several advantages, i.e., more robust and scalable systems as well as simpler communication protocols. Following the ideas of [8], the proposed scheme allows the inclusion of dynamics in the actions of the players. This work is organized as follows: in Section 6.1.3 the framework of the problem at the two different levels is formally presented, in Section 6.1.4 the control strategies with and without dynamics are introduced, and Section 6.1.5 illustrates the results by simulation. Finally, Section 6.1.6 ends with some conclusions.

6.1.3 Problem Framework

Dispatch of DGs in Microgrids

A power distribution system formed by M microgrids can be seen as a hierarchical system where the distribution network operator and the market operator assign a desired amount of power to each microgrid. This assigned power corresponds to the demanded power that the N DGs associated with the microgrid have to produce. A local controller (LC) is associated with each of the N DGs that belong to a given microgrid. These LCs receive the information of the demanded power that has been assigned to the entire microgrid. Each LC calculates the set point of the power to be produced

by its DG, aiming to maximize the entire profit of the microgrid. This is done because each DG has an associated power production cost and nominal power. In other words, the idea is that the total demanded power assigned to the microgrid should be split among the DGs following a certain rule or policy. For the present study, only controllable DGs are considered, and it is assumed that each DG has been optimally spatially located. The optimization problem in a dispatch, where the goal is to maximize the total utility of the microgrid subject to a maximum amount of available power, for N DGs, can be specified by

$$\max \ u_{tot}(p_1, p_2, \ldots, p_N) = \sum_{j=1}^{N} u_j(p_j)$$
$$\text{s.t.} \quad \sum_{j=1}^{N} p_j = P_{max} = \sum_{j=1}^{N} P_{n_j} \qquad (6.1)$$

where $p_i$ is the power supplied by the $i$th DG, $u_i(p_i)$ is its utility for $i = 1, \ldots, N$, and $u_{tot}$ represents the total utility of the microgrid. If the utility function $u_i : \mathbb{R}_+ \to \mathbb{R}_+$ is defined as a strictly concave function, then the maximization problem (6.1) is separable and has a unique optimal solution, which is obtained when all the marginal utilities are equal,

$$\frac{\partial u_i(p_i^*)}{\partial p_i} = d \qquad \forall i = 1, \ldots, N \qquad (6.2)$$

where $d > 0$ is such that $\sum_{j=1}^{N} p_j^* = P_d$. Following common cost functions [2], the following quadratic utility function associated with the generation of each unit is used,

$$u_i(p_i) = \frac{-p_i(p_i - 2P_{n_i})}{c_i P_{n_i}}, \qquad i = 1, \ldots, N \qquad (6.3)$$

where $c_i$ is the generation cost associated with the $i$th DG and $P_{n_i}$ is its nominal power. It can be seen that the maximum utility is achieved when the DG generates its nominal power, leading to a utility

$$u_i(P_{n_i}) = \frac{P_{n_i}}{c_i} \qquad (6.4)$$

The maximization problem (6.1) is separable and concave, and it can be solved,

leading to the optimal power to be dispatched,

$$p_i^* = P_{n_i} - \frac{c_i P_{n_i}(P_{max} - P_d)}{\sum_{j=1}^{N} c_j P_{n_j}} \qquad (6.5)$$

The goal is to achieve this optimal solution while respecting the constraints of the problem.

Dispatch of Demanded Power in Energy Networks

The scheme shown above can be generalized to the case in which N microgrids or power distribution systems belonging to a network compete in a geographic area with a total demanded power $P_t$. In this scheme, each microgrid has an operation cost $C_j$ and a nominal power $P_{n_j}$, given by the sum of the nominal powers of its DGs, $P_{n_{ji}}$. As shown in the previous section, each of these DGs also has an operation cost $c_{ji}$, for $j = 1, \ldots, N$ and $i = 1, \ldots, M$. Given N microgrids, the amount of demanded power that the $j$th microgrid can handle is given by

$$P_{d_j} = \sum_{i=1}^{M} P_{n_{ji}} \qquad (6.6)$$

The total power demanded in the area, $P_t$, must be distributed among the microgrids that cover the zone, aiming to maximize the total utility of the network by controlling the amount of demanded power $P_{d_j}$ assigned to the $j$th microgrid. In this case the maximization problem can be formulated in the same form as the dispatch problem,

$$\max \ u_{tot}(P_{d_1}, P_{d_2}, \ldots, P_{d_N}) = \sum_{j=1}^{N} u_j(P_{d_j})$$
$$\text{s.t.} \quad \sum_{j=1}^{N} P_{d_j} = P_{max} = \sum_{j=1}^{N} P_{n_j} \qquad (6.7)$$

where $u_{tot}$ represents the total utility of the network where the microgrids operate and $u_j$ represents the utility of each microgrid. Bearing in mind that the $j$th microgrid has an operation cost $C_j$ and a maximum power generation capability $P_{n_j}$, the individual utility of each microgrid can also be stated as in (6.3), but in terms of the variables $P_{n_j}$ and $C_j$,

$$u_j(P_{d_j}) = \frac{-P_{d_j}(P_{d_j} - 2P_{n_j})}{C_j P_{n_j}}, \qquad j = 1, \ldots, N \qquad (6.8)$$

which means that the utility of the microgrid is a function of the demanded power $P_{d_j}$ that it decides to handle, where the total demanded power in the geographical area

is given by $P_t = \sum_{j=1}^{N} P_{d_j}$. The maximization problem (6.7) is the same as (6.1) and the optimal solution has the same form, but in terms of the demanded power handled by each microgrid $P_{d_j}$, the nominal power of each microgrid $P_{n_j}$, and the operation cost of each microgrid $C_j$,

$$P_{d_j}^* = P_{n_j} - \frac{P_{n_j} C_j\left(\sum_{j=1}^{N} P_{n_j} - P_t\right)}{\sum_{j=1}^{N} P_{n_j} C_j} \qquad (6.9)$$

The goal is again to achieve this optimal solution while respecting the constraints of the problem.

6.1.4 Dispatch Strategies

On-line Optimization of Dispatch of DGs in Microgrids

Dispatch of Distributed Generators Modeled as a Nash Game without Dynamics

The dispatch of distributed generators is modeled as a non-cooperative game where the LCs of the DGs are the players, each aiming to maximize its individual payoff function $J_i$ by controlling its own action $p_i$. Based on this competition idea, under an appropriate definition of the payoff function of each player, convergence to a Nash equilibrium can be achieved. We define $f_i$ as a continuous, Lipschitz, bounded, strictly decreasing function, $f_i : D \to \mathbb{R}$, where $D$ is the action set of the players, with

$$f_i(p_i) = \beta\,\frac{\partial u_i}{\partial p_i} \qquad (6.10)$$

where $\beta \in \mathbb{R}_+$ is a constant parameter. Recalling the mathematical characterization of a Nash equilibrium $p^*$ of a non-cooperative game played by N players, i.e.,

$$\frac{\partial J_i(p^*)}{\partial p_i} = 0 \qquad \text{for } i = 1, \ldots, N \qquad (6.11)$$

where $J_i$ is the payoff function associated with the $i$th player, the payoff function for a non-cooperative game played by N players is defined as

$$J_i = \int (f_i - \bar{f})\, dp_i \qquad (6.12)$$

where $\bar{f} = \frac{1}{P_d}\sum_{j=1}^{N} f_j\,p_j$. If the utility function $u_i$ of the $i$th DG is defined as in (6.3), the function $f_i$ for the $i$th DG is given by

$$f_i = \beta\frac{du_i}{dp_i} = \frac{1}{c_i}\left(1 - \frac{p_i}{P_{n_i}}\right) \qquad (6.13)$$

i.e., $\beta = 1/2$. Replacing (6.13) in (6.12), for a power distribution system with N DGs the proposed payoff function associated with the $i$th DG can be formulated in terms of the utility functions as

$$J_i = \beta\,u_i(p_i) - \frac{1}{P_d}\int \sum_{j=1}^{N} f_j(p_j)\,p_j\; dp_i \qquad (6.14)$$

Note that, because $f_i(p_i)$ is strictly decreasing in $p_i$, at the Nash equilibrium given by $\partial J_i(p^*)/\partial p_i = 0$ the second derivative of the payoff function is negative while the first derivative is zero. Replacing $u_i$ by the utility function (6.3), we obtain the payoff function of each player (DG),

$$J_i = \frac{p_i}{c_i} - \frac{p_i^2}{6 c_i P_d P_{n_i}}\left(3P_d - 2p_i + 3P_{n_i}\right) - \frac{p_i}{P_d}\sum_{\substack{j=1 \\ j\neq i}}^{N} \frac{p_j}{c_j}\left(1 - \frac{p_j}{P_{n_j}}\right) \qquad (6.15)$$

Note that this is a third-order polynomial in $p_i$, so multiple Nash equilibria can be present in this game. To attain a Nash equilibrium, each player implements the following deterministic extremum seeking strategy,

$$p_i = \hat{p}_i + a_i\sin(w_it) \qquad (6.16)$$

where $\hat{p}_i$ is given by the solution of

$$\dot{\hat{p}}_i = k_i\,a_i\sin(w_it)\,J_i(p_1, \ldots, p_N) \qquad (6.17)$$

where $a_i$ and $w_i$ are positive parameters to be tuned and $k_i = \epsilon_i\bar{w}$, with $\epsilon_i$ a small positive constant and $\bar{w} = \min_i(w_i)$. A scheme of a power distribution system modeled as a non-cooperative game played by four DGs using extremum seeking control is shown in Figure 6.1.

In order to analyze the optimality of the equilibrium point of this system, the next proposition shows the relation between the market-based approach and this non-cooperative game approach solved by ESC.

Proposition 7.1: If $p_i \in D$, where $D = \{p_i \in \mathbb{R}_+ : p_i \leq P_{n_i}\}$, and $P_d \leq \sum_{j=1}^{N} P_{n_j}$,

then the power dispatch problem modeled as a non-cooperative game, where the players aim to maximize the payoff function (6.15) by implementing the extremum seeking strategy (6.16)-(6.17), converges asymptotically to a neighborhood of a pure Nash equilibrium, which corresponds to the optimal solution of the market-based control.

Figure 6.1: Scheme of extremum seeking control for 4 DGs playing a Nash game.

Proof: Changing the time scale to $\tau = \bar{w}t$, the closed-loop system can be analyzed by classical averaging techniques [11], where $\epsilon_i$ acts as a small constant perturbation parameter. For N DGs the average system is given by

$$\dot{\hat{p}}_i^{ave} = \lim_{T\to\infty}\frac{1}{T}\int_0^T \epsilon_i a_i\sin\!\left(\frac{w_i}{\bar{w}}\tau\right) J_i(p)\, d\tau \qquad \forall i = 1, \ldots, N \qquad (6.18)$$

which yields the average system for the $i$th player,

$$\dot{\hat{p}}_i^{ave} = \frac{a_i^2\epsilon_i}{2}\left( \frac{1}{c_i}\Big(1 - \frac{\hat{p}_i}{P_{n_i}}\Big) - \frac{1}{P_d}\sum_{j=1}^{N}\frac{\hat{p}_j}{c_j}\Big(1 - \frac{\hat{p}_j}{P_{n_j}}\Big) + \frac{1}{2P_d}\sum_{j=1}^{N}\frac{a_j^2}{2P_{n_j}c_j} + \frac{a_i^2}{2P_{n_i}c_i} \right) \qquad (6.19)$$

The equilibrium points of (6.19) for N players are given by

$$\hat{p}_i^{ave,*} = \frac{2P_dP_{n_i}c_i}{c_N\rho}\Bigg( P_{n_i}(c_N - c_i) - \Big[ P_d + P_{n_N} + \overbrace{\sum_{j=1}^{N-1}\frac{a_j^2}{2P_d}}^{\alpha_1} + \overbrace{\frac{\rho a_N^2}{8P_d^2P_{n_N}c_N}}^{\alpha_2} + \sum_{j=1}^{N-1}P_j\Big(1 + c_j + \overbrace{\frac{c_j a_N^2}{4P_{n_N}c_NP_d^2}}^{\alpha_3}\Big) \Big] \Bigg) \pm \frac{2P_{n_i}c_i}{P_Nc_N\rho}\sqrt{l_1 + l_2 + l_3 + l_4} \qquad (6.20)$$

where $l_1$, $l_2$, $l_3$ and $l_4$ are given by

$$l_1 = P_d^2P_{n_N}^2c_N^2\sum_{j=1}^{N}\Big( P_{n_j} - \overbrace{a_j^2}^{\alpha_4} \Big) \qquad (6.21)$$

$$l_2 = 2P_d^2P_{n_N}^3c_N^2\sum_{j=1}^{N-1}\Bigg( P_{n_j} + \overbrace{\frac{1}{4P_d^2P_n}\sum_{k=1}^{N-j}a_ja_{j+k} - \frac{a_j^2}{2P_d} - \frac{c_Na_j^4}{8P_d^2P_{n_N}c_j}}^{\alpha_5} + \overbrace{\frac{c_Na_j^2}{2P_dc_j^2} - \frac{c_Na_j^2}{P_{n_j}c_j}}^{\alpha_6} \Bigg) \qquad (6.22)$$

$$l_3 = -2P_d^3P_{n_N}^2c_N^2\sum_{j=1}^{N-1}P_{n_j}\Big(1 + \overbrace{\frac{1}{8P_d}\sum_{\substack{k=1\\k\neq j}}^{N}a_k}^{\alpha_7}\Big) \qquad (6.23)$$

$$l_4 = 2P_d^2P_{n_N}^2c_N^2\sum_{j=1}^{N-1}P_{n_j}\Bigg( \sum_{k=1}^{N-j-1}P_{n_{j+k}} + \overbrace{\frac{c_j}{2P_d}\sum_{\substack{k=1\\k\neq j}}^{N-1}\frac{a_k^2}{c_k} + \frac{c_j}{8P_d^2}\sum_{\substack{k=1\\k\neq j}}^{N-1}\frac{a_k^4}{c_kP_{n_k}}}^{\alpha_8} - c_j\sum_{\substack{k=1\\k\neq j}}^{N-1}\frac{a_k^2}{P_{n_k}c_k} + c_j + \overbrace{\frac{c_ja_N^4}{8P_d^2P_{n_N}c_N^2} - \frac{c_ja_N^2}{2P_dc_N} + \frac{c_ja_N^2}{P_{n_N}c_N}}^{\alpha_9} \Bigg) \qquad (6.24)$$

and $\rho$ is given by

$$\rho = 4P_d\sum_{i=1}^{N}P_{n_i}c_i \qquad (6.25)$$

Note that for $a_i$ sufficiently small, the terms $\alpha_1, \alpha_2, \ldots, \alpha_9$ in Equations (6.20)-(6.24) are close to zero, leading to the following two approximate equilibrium points:

$$\hat{p}_i^{aprox,*} = \frac{2P_dP_{n_i}c_i}{c_N\rho}\left( P_{n_i}(c_N - c_i) - \Big(P_d + P_{n_N} + \sum_{j=1}^{N-1}p_j(1+c_j)\Big) \right) \pm 2\frac{P_{n_i}c_i}{P_Nc_N\rho}\sqrt{\left( P_dP_N^2c_N + P_dP_Nc_N\sum_{j=1}^{N-1}P_{n_j} - P_d^2P_{n_N}c_N \right)^2} \qquad (6.26)$$

Expanding and reorganizing,

$$\hat{p}_i^{aprox,*} = P_{n_i} - \frac{P_{n_i}c_i}{\sum_{i=1}^{N}P_{n_i}c_i}\sum_{j=1}^{N-1}P_{n_j}c_j + \frac{P_{n_i}c_iP_{n_N}}{P_Nc_N} + \frac{P_{n_i}c_i}{P_{n_N}c_N} \qquad (6.27)$$

the first approximate equilibrium point of the average system is obtained,

$$\hat{p}_i^{aprox,*} = P_{n_i} - \frac{P_{n_i}c_i}{c_N}\left(\frac{P_Nc_N + \sum_{j=1}^{N-1}P_Nc_j}{\sum_{i=1}^{N}P_{n_i}c_i}\right) + \frac{P_{n_i}c_i}{c_N} \qquad (6.28)$$

$$\hat{p}_i^{aprox,*} = P_{n_i} \qquad (6.29)$$

In the same way, the second approximate equilibrium point is obtained,

$$\hat{p}_i^{aprox,*} = P_{n_i} - \frac{P_{n_i}c_i}{\sum_{i=1}^{N}P_{n_i}c_i}\sum_{j=1}^{N-1}P_{n_j} - \frac{P_dP_{n_i}c_i}{\sum_{i=1}^{N}P_{n_i}c_i} + \frac{P_{n_i}c_i}{c_N}\sum_{j=1}^{N-1}\frac{P_{n_j}c_j}{\sum_{i=1}^{N}P_{n_i}c_i} + \frac{P_{n_i}c_i}{c_N} \qquad (6.30)$$

$$\hat{p}_i^{aprox,*} = P_{n_i} - \frac{P_{n_i}c_i}{\sum_{i=1}^{N}P_{n_i}c_i}\left(\sum_{j=1}^{N}P_{n_j} - P_d\right) \qquad (6.31)$$

For sufficiently small $a_i$, the terms $\alpha_1, \ldots, \alpha_9$ are approximately zero and the equilibrium points of the averaged system (6.19) are approximately given by

$$\hat{p}_i^* \approx \hat{p}_i^{aprox,*} \qquad (6.32)$$

highlighting the importance of using small amplitudes $a_i$ in the perturbation signals. For simplicity, the stability analysis is carried out for a 2-player Nash game. The Jacobian evaluated at the approximate equilibrium point $\hat{p}_i^* \approx P_{n_i}$ is given by

$$J = \begin{pmatrix} \dfrac{a_1^2\epsilon_1}{2c_1}\Big(\dfrac{P_{n_1}-P_d}{P_dP_{n_1}}\Big) & \dfrac{a_1^2\epsilon_1}{2P_dc_2} \\[2ex] \dfrac{a_2^2\epsilon_2}{2P_dc_1} & \dfrac{a_2^2\epsilon_2}{2c_2}\Big(\dfrac{P_{n_2}-P_d}{P_dP_{n_2}}\Big) \end{pmatrix} \qquad (6.33)$$

whose characteristic equation is

0 = \lambda^2 + \lambda\left[\frac{a_1^2\epsilon_1}{2c_1}\left(\frac{P_d-P_{n_1}}{P_dP_{n_1}}\right) + \frac{a_2^2\epsilon_2}{2c_2}\left(\frac{P_d-P_{n_2}}{P_dP_{n_2}}\right)\right] + \frac{a_1^2a_2^2\epsilon_1\epsilon_2}{4c_1c_2P_d^2}\left[\left(\frac{P_{n_1}-P_d}{P_{n_1}}\right)\left(\frac{P_{n_2}-P_d}{P_{n_2}}\right) - 1\right]    (6.34)

which yields the following two conditions for the Jacobian matrix to be Hurwitz:

1. Condition 1: \frac{a_1^2a_2^2\epsilon_1\epsilon_2}{4c_1c_2P_d^2}\left[\left(\frac{P_{n_1}-P_d}{P_{n_1}}\right)\left(\frac{P_{n_2}-P_d}{P_{n_2}}\right) - 1\right] > 0

2. Condition 2: \frac{a_1^2\epsilon_1}{2c_1}\left(\frac{P_d-P_{n_1}}{P_dP_{n_1}}\right) + \frac{a_2^2\epsilon_2}{2c_2}\left(\frac{P_d-P_{n_2}}{P_dP_{n_2}}\right) > 0

Since P_{n_1}, P_{n_2}, c_1, c_2, \epsilon_1, \epsilon_2, a_1 and a_2 are positive by definition, Condition 1 can be written as

\frac{P_{n_1}P_{n_2} - P_dP_{n_1} - P_dP_{n_2} + P_d^2}{P_{n_1}P_{n_2}} - 1 > 0    (6.35)

\frac{P_d^2 - P_dP_{n_1} - P_dP_{n_2}}{P_{n_1}P_{n_2}} > 0    (6.36)

\frac{P_d(P_d - P_{n_1} - P_{n_2})}{P_{n_1}P_{n_2}} > 0    (6.37)

P_d > P_{n_1} + P_{n_2}    (6.38)

and Condition 2 holds whenever Condition 1 is achieved. Therefore, the approximate equilibrium point of the average system given by \hat{p}_i^{aprox,*} \approx P_{n_i} will be locally asymptotically stable. The second equilibrium point generates the following Jacobian matrix,

J = \begin{pmatrix} \dfrac{a_1^2\epsilon_1}{2}\left(\dfrac{P_{n_1}-P_d}{P_dP_{n_1}c_1} - 2\dfrac{P_{n_1}+P_{n_2}-P_d}{P_d(P_{n_1}c_1+P_{n_2}c_2)}\right) & \dfrac{a_1^2\epsilon_1}{2}\left(\dfrac{1}{c_2} - 2\dfrac{P_{n_1}+P_{n_2}-P_d}{P_{n_1}c_1+P_{n_2}c_2}\right) \\ \dfrac{a_2^2\epsilon_2}{2}\left(\dfrac{1}{c_1} - 2\dfrac{P_{n_1}+P_{n_2}-P_d}{P_{n_1}c_1+P_{n_2}c_2}\right) & \dfrac{a_2^2\epsilon_2}{2}\left(\dfrac{P_{n_2}-P_d}{P_dP_{n_2}c_2} - 2\dfrac{P_{n_1}+P_{n_2}-P_d}{P_d(P_{n_1}c_1+P_{n_2}c_2)}\right) \end{pmatrix}    (6.39)

which generates the characteristic equation

0 = \lambda^2 + \lambda\,\frac{1}{2P_d^2P_{n_1}^2P_{n_2}^2c_1^2c_2^2 + P_dP_{n_1}P_{n_2}c_1c_2}\Big[3P_{n_1}P_{n_2}c_1c_2(a_1^2\epsilon_1 + a_2^2\epsilon_2)(P_{n_1}+P_{n_2}) - P_{n_1}P_{n_2}(P_{n_1}c_1a_2^2\epsilon_2 - P_{n_2}c_2a_1^2\epsilon_1) + P_d(P_{n_1}^2c_1^2a_2^2\epsilon_2 + P_{n_2}^2c_2^2a_1^2\epsilon_1) - 2P_dP_{n_1}P_{n_2}c_1c_2(a_1^2\epsilon_1 + a_2^2\epsilon_2)\Big] + \frac{a_1^2\epsilon_1\,a_2^2\epsilon_2}{4P_d^2P_{n_1}P_{n_2}c_1c_2(c_1+c_2)}\left(\sum_{j=1}^{2}P_{n_j}c_j\sum_{i=1}^{2}P_{n_i} - P_d\sum_{i=1}^{2}P_{n_i}c_i\right)    (6.40)

Two conditions are needed in order to obtain a Hurwitz Jacobian, i.e.,

1. Condition 1: \sum_{i=1}^{2}P_{n_i}c_i\sum_{j=1}^{2}P_{n_j} - P_d\sum_{i=1}^{2}P_{n_i}c_i > 0

2. Condition 2: 3P_{n_1}P_{n_2}c_1c_2(a_1^2\epsilon_1 + a_2^2\epsilon_2)(P_{n_1}+P_{n_2}) - P_{n_1}P_{n_2}(P_{n_1}c_1a_2^2\epsilon_2 - P_{n_2}c_2a_1^2\epsilon_1) + P_d(P_{n_1}^2c_1^2a_2^2\epsilon_2 + P_{n_2}^2c_2^2a_1^2\epsilon_1) - 2P_dP_{n_1}P_{n_2}c_1c_2(a_1^2\epsilon_1 + a_2^2\epsilon_2) > 0

Note that Condition 1 can be written as

\sum_{i=1}^{2}P_{n_i}c_i\left(\sum_{j=1}^{2}P_{n_j} - P_d\right) > 0    (6.41)

which can be simplified into

\sum_{j=1}^{2}P_{n_j} > P_d    (6.42)

Because P_{n_1}, P_{n_2}, c_1, c_2, \epsilon_1, \epsilon_2, a_1 and a_2 are positive constants by definition, Condition 2 can be written as

P_{n_1}P_{n_2}c_1c_2(a_1^2\epsilon_1 + a_2^2\epsilon_2)\left(3\sum_{i=1}^{2}P_{n_i} - 2P_d\right) > 0    (6.43)

so Condition 2 is satisfied whenever Condition 1 holds. Therefore, if \sum_{i=1}^{N}P_{n_i} > P_d, the equilibrium given by \hat{p}_i^{aprox,*} will be locally exponentially stable for the average system. Given the stability properties of the approximate equilibrium point of the average system, by general averaging theory [11], for sufficiently small values of a_i, \hat{p}_i will exponentially converge to a neighborhood of the equilibrium point given by

\hat{p}_i^* \approx p_i^{aprox,*} + O(\epsilon)    (6.44)

and since p_i is given by

p_i = \hat{p}_i + a_i\sin(w_it)    (6.45)

for a set of initial conditions sufficiently close to the equilibrium point, the original system will converge to an O(\max_i a_i + \epsilon) neighborhood of the equilibrium point of the average system (the Nash equilibrium). This fact illustrates the importance of using small a_i and k_i. The stability proof for N players could be carried out using Lyapunov methods, where the candidate Lyapunov function can be defined as the one in [22].

According to these results, if the demanded power is larger than the sum of the nominal powers of the DGs in the microgrid, the equilibrium point is given by \hat{p}_i^* = P_{n_i}; in other words, when an infeasible P_d > \sum_{j=1}^{N}P_{n_j} is requested from the microgrid, the DGs saturate at their nominal power and the rest of the demanded power is not dispatched. This result is coherent with the physical constraints of the system.
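The behavior established above can be reproduced numerically. The following is a minimal Python sketch (not the thesis code): each local controller integrates an update of the form of (6.50), with the payoff taken in the form of Eq. (6.47) with p_i in place of p_{out_i}, and applies p_i = \hat{p}_i + a_i\sin(w_it). All numerical values (gains, dither amplitudes and frequencies) are illustrative assumptions, not the parameters used in the thesis simulations.

```python
import numpy as np

# Minimal sketch (illustrative parameters): two DGs playing the Nash game via
# sinusoidal-perturbation extremum seeking, integrated with forward Euler.
Pn = np.array([70.0, 50.0])      # nominal powers [kW]
c  = np.array([0.9, 0.8])        # generation cost coefficients
Pd = 105.0                       # demanded power [kW], here Pd < Pn1 + Pn2
a  = np.array([1.0, 1.0])        # dither amplitudes (kept small w.r.t. Pn)
w  = np.array([300.0, 400.0])    # dither frequencies [rad/h]
k  = np.array([20.0, 20.0])      # adaptation gains

def payoff(i, p):
    """Payoff of player i, with the structure of Eq. (6.47)."""
    others = sum(p[j] / c[j] * (1.0 - p[j] / Pn[j]) for j in range(len(p)) if j != i)
    return (p[i] / c[i]
            - p[i] ** 2 / (6.0 * c[i] * Pd * Pn[i]) * (3.0 * Pd - 2.0 * p[i] + 3.0 * Pn[i])
            - p[i] / Pd * others)

p_hat = np.array([10.0, 10.0])   # initial dispatch estimates [kW]
dt, T = 1e-4, 50.0               # integration step and horizon [h]
for n in range(int(T / dt)):
    t = n * dt
    p = p_hat + a * np.sin(w * t)                   # applied powers
    J = np.array([payoff(i, p) for i in range(2)])
    p_hat += dt * k * a * np.sin(w * t) * J         # Nash-seeking update, Eq. (6.50)

print("dispatch estimates:", p_hat)  # should approach the market-based split for suitable gains
```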

Dispatch of Distributed Generators Modeled as a Nash Game with Dynamics

Even though the dispatch of DGs can be modeled as a static game, this is not a very practical approximation of real power generation. In practice, each DG does not respond instantaneously to a demanded power; therefore, a transient in the DG response is generally associated with the generation process. The extremum seeking control can manage a power distribution system modeled as a non-cooperative game where the actions of the players (the power dispatched to each generator) have associated dynamics. For the present case, it is assumed that each DG has a local controller that ensures an exponentially stable equilibrium of each DG, given a commanded power from the LC.

Suppose a game played by N locally controlled DGs, where the payoff function of each player is given by

J_i = u_i(p_{out_i}) - \frac{1}{P_d}\sum_{j=1}^{N}\int\Big(u_j(p_{out_j})p_j - u_j(p_{out_j})\Big)dp_{out_j}    (6.46)

which can be expressed as

J_i = \frac{p_{out_i}}{c_i} - \frac{p_{out_i}^2}{6c_iP_dP_{n_i}}\big(3P_d - 2p_{out_i} + 3P_{n_i}\big) - \frac{p_{out_i}}{P_d}\sum_{j=1,\,j\neq i}^{N}\frac{p_{out_j}}{c_j}\left(1 - \frac{p_{out_j}}{P_{n_j}}\right)    (6.47)

where

\dot{p}_{out_i} = -A_i\big[p_{out_i} - p_{in_i}\big]    (6.48)

with A_i \in \mathbb{R}^+. Equation (6.48) represents the first order dynamics that model the dynamic response of most DGs (combustion motors, fuel cells, wind energy, and photovoltaic systems) under a given control law (however, any type of dynamics can be included in the DGs as long as the stability conditions are guaranteed). Here p_{out_i} represents the measured power generated by the ith DG and p_{in_i} is the commanded power given by the extremum seeking strategy,

p_{in_i} = \hat{p}_{in_i} + a_i\sin(w_it)    (6.49)

where \hat{p}_{in_i} is given by the solution of

\dot{\hat{p}}_{in_i} = k_ia_i\sin(w_it)\,J_i(p_{out_j}) \quad \forall j = 1,\dots,N    (6.50)

This scheme can be analyzed by classical singular perturbation theory: if the dynamics associated with the transient between p_{out_i} and p_{in_i} in each DG are fast enough compared to the learning dynamics of the extremum seeking control, the dispatch will be done in the quasi-steady state of the fast variable p_{out}.
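As a complement to the analysis that follows, the sketch below adds the first order lag (6.48) between the commanded and the generated power, so the extremum seeking update (6.50) acts on the measured output of each locally controlled DG. The local-controller poles A_i, gains and dither parameters are illustrative assumptions, not values from the thesis.

```python
import numpy as np

# Sketch of the scheme with DG dynamics, Eqs. (6.48)-(6.50): each DG tracks its
# commanded power through a first-order lag (fast), while the ESC updates the
# command (slow).  All numerical values are illustrative assumptions.
Pn = np.array([70.0, 50.0]); c = np.array([0.9, 0.8]); Pd = 105.0
a  = np.array([1.0, 1.0]);   w = np.array([300.0, 400.0]); k = np.array([20.0, 20.0])
A  = np.array([3000.0, 3000.0])   # local-controller poles, much faster than the ESC

def payoff(i, p_out):
    """Payoff of player i evaluated on the measured powers, Eq. (6.47)."""
    others = sum(p_out[j] / c[j] * (1 - p_out[j] / Pn[j]) for j in range(2) if j != i)
    return (p_out[i] / c[i]
            - p_out[i] ** 2 / (6 * c[i] * Pd * Pn[i]) * (3 * Pd - 2 * p_out[i] + 3 * Pn[i])
            - p_out[i] / Pd * others)

p_hat = np.array([10.0, 10.0])    # ESC estimates of the commanded powers
p_out = np.zeros(2)               # measured generated powers
dt, T = 1e-4, 50.0
for n in range(int(T / dt)):
    t = n * dt
    p_in   = p_hat + a * np.sin(w * t)                      # Eq. (6.49)
    p_out += dt * (-A * (p_out - p_in))                     # Eq. (6.48), fast dynamics
    J      = np.array([payoff(i, p_out) for i in range(2)])
    p_hat += dt * k * a * np.sin(w * t) * J                 # Eq. (6.50), slow dynamics

print("generated powers:", p_out)
```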

In general, changing again the time scale to \tau = \bar{w}t, where \bar{w} = \min_i w_i, and recalling that k_i = \epsilon_i\bar{w}, the system is represented as

\bar{w}\frac{dp_{out_i}}{d\tau} = A_i\big[-p_{out_i} + p_{in_i}\big]    (6.51)

\frac{d\hat{p}_{in_i}}{d\tau} = \epsilon_ia_i\sin\!\left(\frac{w_i}{\bar{w}}\tau\right)J_i(p_{out_j}) \quad \forall j = 1,\dots,N    (6.52)

For the fast dynamics given by (6.51) the equilibrium is

p_{out_i}^* = p_{in_i}    (6.53)

Replacing (6.53) in (6.52), the reduced system is obtained,

\frac{d\hat{p}_{in_i}}{d\tau} = \epsilon_ia_i\sin\!\left(\frac{w_i}{\bar{w}}\tau\right)J_i(p_{in_1},\dots,p_{in_N})    (6.54)

This reduced system is exactly (6.17), which was shown via averaging analysis to converge to a stable equilibrium point given by

p_{in_i}^* = P_{n_i} - \frac{P_{n_i}c_i\left(\sum_{j=1}^{N}P_{n_j} - P_d\right)}{\sum_{j=1}^{N}P_{n_j}c_j} + O(\epsilon + \max_ia_i) \quad \text{if } P_d < \sum_{j=1}^{N}P_{n_j}    (6.55)

p_{in_i}^* = P_{n_i} + O(\epsilon + \max_ia_i) \quad \text{if } P_d > \sum_{j=1}^{N}P_{n_j}    (6.56)

Now, the boundary layer model in the t time scale is given by

\frac{dy}{dt} = -A_iy    (6.57)

and recalling that A_i \in \mathbb{R}^+, (6.57) has the origin as a stable equilibrium point. By Tikhonov's Theorem on the Infinite Interval [11], it can be concluded that the system (6.51)-(6.52) converges to a stable equilibrium point given by

p_{out_i}^* = p_{in_i}, \quad p_{in_i}^* = P_{n_i} - \frac{P_{n_i}c_i\left(\sum_{j=1}^{N}P_{n_j} - P_d\right)}{\sum_{j=1}^{N}P_{n_j}c_j} + O(\epsilon + \max_ia_i + \min_iw_i) \quad \text{if } P_d \le \sum_{j=1}^{N}P_{n_j}    (6.58)

or

p_{out_i}^* = p_{in_i}, \quad p_{in_i}^* = P_{n_i} + O(\epsilon + \max_ia_i + \min_iw_i) \quad \text{if } P_d \ge \sum_{j=1}^{N}P_{n_j}    (6.59)
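The quasi-steady state dispatch in (6.55)/(6.58) can be evaluated directly. The short sketch below (example values only, not from the thesis simulations) also verifies that the resulting dispatches add up to the demanded power.

```python
import numpy as np

# Closed-form dispatch of Eq. (6.55)/(6.58) for Pd below the total nominal power.
def market_dispatch(Pn, c, Pd):
    Pn, c = np.asarray(Pn, float), np.asarray(c, float)
    return Pn - Pn * c * (Pn.sum() - Pd) / np.dot(Pn, c)

p_star = market_dispatch(Pn=[20.0, 30.0, 50.0], c=[1.0, 0.8, 0.6], Pd=70.0)
print(p_star, p_star.sum())   # the individual dispatches sum exactly to Pd
```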

On-line optimization of Dispatch of Demanded Power in Energy Networks

In [23] the authors showed that the dispatch problem of DGs can be modeled from a population dynamics perspective, where the ith strategy corresponds to choosing one of the N DGs in the microgrid, the amount of power assigned to the ith DG is denoted by p_i, and P_d is the fixed power demanded from the microgrid (i.e., \sum_{i=1}^{N}p_i = P_d). In this model, a fitness function that represents the payoff obtained by an individual for choosing a given strategy is compared with the average fitness of the entire population. Based on the difference between the individual fitness and the average fitness of the entire population, the number of individuals living in a certain habitat (using some strategy) decreases or increases. This idea has been expressed mathematically in terms of the replicator dynamics equation,

\dot{p}_i = p_i\big(f_i(p_i) - \bar{f}\big)    (6.60)

where f_i(p_i) represents the payoff function associated with the ith strategy (DG) and \bar{f} = \frac{1}{P_d}\sum_{j=1}^{N}p_jf_j represents the average fitness of the population. Clearly, the equilibrium of (6.60) is reached when p_i^* = 0 or when f_i(p_i^*) = \bar{f}, and the definition of the average fitness \bar{f} guarantees the invariance of the constraint set \Delta_p defined as

\Delta_p = \left\{p \in \mathbb{R}_+^N : \sum_{i=1}^{N}p_i = P_d\right\}    (6.61)

so if the initial condition p_i(0) is inside the simplex \Delta_p, then p_i(t) will remain in it for all t \ge 0. The fitness function implemented in [23] was given by

f_{ij} = \frac{1}{c_{ij}}\left(1 - \frac{p_{ij}}{P_{n_{ij}}}\right)    (6.62)

Using (6.62) as the fitness function, it was shown in [23] that if \sum_{j=1}^{M}P_{n_{ij}} \le P_{d_i}, then, given the invariance of the constraint set \Delta_p, at the equilibrium of the RDs the following condition holds,

\sum_{j=1}^{N}p_j = P_d    (6.63)

This important property allows the parameterization of p_i in terms of P_d (this was in fact the origin of the control problem). This parameterization can be used to control P_{d_i} in N microgrids, in such a manner that if the variation of P_{d_i} is sufficiently slow compared to the convergence velocity of the RDs given in (6.60), then the optimization of P_{d_i} will take place in a quasi-steady state (P_{d_i} always sees p_i at its equilibrium p_i^*).
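A minimal sketch of the replicator dynamics dispatch described above is given next (forward Euler integration; the parameters are illustrative and the β time-scaling factor mentioned later is omitted). It illustrates how the fitness (6.62) drives the dispatch while the simplex constraint is preserved.

```python
import numpy as np

# Replicator dynamics (6.60) with the fitness (6.62), single-microgrid case.
Pn = np.array([20.0, 30.0, 50.0])   # nominal powers of the DGs [kW]
c  = np.array([1.0, 0.8, 0.6])      # cost coefficients
Pd = 70.0                           # demanded power [kW]

p  = np.array([30.0, 20.0, 20.0])   # p(0) inside the simplex: it sums to Pd
dt = 1e-3
for _ in range(200_000):
    f     = (1.0 - p / Pn) / c      # fitness of each strategy, Eq. (6.62)
    f_bar = np.dot(p, f) / Pd       # average fitness of the population
    p    += dt * p * (f - f_bar)    # replicator dynamics, Eq. (6.60)

print(p, p.sum())   # fitnesses equalize at equilibrium, with sum(p) = Pd preserved
```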

This system can then be analyzed as a classical singular perturbation problem, where the fast dynamics are associated with the RDs that control the dispatch of the DGs inside each microgrid. As this dispatch is parameterized by the demanded power P_{d_i} of the microgrid, if the control of P_{d_i} is sufficiently slow compared to the RDs, the system can be analyzed using the reduced (or quasi-steady state) model and the boundary layer model (see [11] for a complete explanation of singular perturbation theory). Since the convergence velocity of the RDs can be controlled by adjusting the \beta factor, the separation of time scales will be evident.

Based on these ideas, we are interested in maximizing the utility of an energy network system composed of N microgrids by controlling the amount of the total demanded power P_t that is managed by each of the jth microgrids, where j = 1,\dots,N. For the present case, the allocation of the demanded power inside each microgrid is done via the RD scheme (6.60), where the fitness function is chosen as (6.62). The allocation of P_t among the N microgrids is modeled as a non-cooperative game where the microgrids act as players aiming to maximize their individual utility by controlling the demanded power P_{d_j} they want to handle. According to this, the overall system presents two different time scales: the first one (fast) is related to the dispatch of the DGs inside each microgrid, and the second one (slow) corresponds to the control of the demanded power that each microgrid has to manage. As the payoff function of each player J_j must reflect its actions (P_{d_j}) in the quasi-steady state, the following payoff function is proposed for the jth microgrid (player) with M DGs associated,

J_j = \frac{\sum_{i=1}^{M}p_{ji}}{C_j} - \frac{\left(\sum_{i=1}^{M}p_{ji}\right)^2}{6C_jP_tP_{n_j}}\left(3P_t - 2\sum_{i=1}^{M}p_{ji} + 3P_{n_j}\right) - \frac{\sum_{i=1}^{M}p_{ji}}{P_t}\sum_{l=1,\,l\neq j}^{N}\frac{\sum_{k=1}^{T}p_{lk}}{C_l}\left(1 - \frac{\sum_{k=1}^{T}p_{lk}}{P_{n_l}}\right)    (6.64)

where the lth microgrid has T DGs and k = 1,\dots,T. Note that this payoff function has the same form as (6.15), with the difference that in this case \bar{p}_i is replaced by \sum_{i=1}^{M}p_{ji}. The Nash seeking strategy implemented by each player is again given by

P_{d_j} = \hat{P}_{d_j} + a_j\sin(w_jt) \quad \forall j = 1,\dots,N    (6.65)

where \hat{P}_{d_j} is given by the solution of

\dot{\hat{P}}_{d_j} = k_ja_j\sin(w_jt)\,J_j(p_{ji}) \quad \forall i = 1,\dots,M    (6.66)
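A compact sketch of the resulting two-time-scale loop is shown below: the outer Nash-seeking ESC (6.65)-(6.66) assigns a demanded power to each microgrid, while an inner replicator dynamics with the fitness (6.62) dispatches that power among the DGs of the microgrid. The microgrid sizes, costs and all tuning values are illustrative assumptions (the factor beta plays the role of the β time-scaling of the RDs).

```python
import numpy as np

# Two-time-scale sketch: outer ESC over the microgrids, inner replicator
# dynamics inside each microgrid.  All numerical values are illustrative.
Pt = 60.0                                                 # total demanded power [kW]
mg = [dict(Pn=np.array([40.0, 27.0]),       c=np.array([1.0, 0.9]),      C=0.8),
      dict(Pn=np.array([30.0, 27.0, 25.0]), c=np.array([0.9, 0.7, 0.8]), C=0.9)]
Pn_mg = np.array([m["Pn"].sum() for m in mg])             # aggregate nominal powers
C     = np.array([m["C"] for m in mg])                    # microgrid production costs
a, w, k = np.array([1.0, 1.0]), np.array([300.0, 400.0]), np.array([20.0, 20.0])

def payoff(j, Pdv):
    """Payoff of microgrid j in the quasi-steady state, structure of Eqs. (6.64)/(6.70)."""
    others = sum(Pdv[l] / C[l] * (1 - Pdv[l] / Pn_mg[l]) for l in range(2) if l != j)
    return (Pdv[j] / C[j]
            - Pdv[j] ** 2 / (6 * C[j] * Pt * Pn_mg[j]) * (3 * Pt - 2 * Pdv[j] + 3 * Pn_mg[j])
            - Pdv[j] / Pt * others)

Pd_hat = np.array([20.0, 20.0])                 # slow ESC states (demanded powers)
p = [m["Pn"] * 0.3 for m in mg]                 # fast RD states inside each microgrid
dt, beta = 2e-4, 200.0                          # beta speeds up the inner RDs
for n in range(int(50.0 / dt)):
    t  = n * dt
    Pd = Pd_hat + a * np.sin(w * t)                                     # Eq. (6.65)
    for j, m in enumerate(mg):                  # fast loop: replicator dynamics per microgrid
        f    = (1 - p[j] / m["Pn"]) / m["c"]                            # Eq. (6.62)
        p[j] = p[j] + dt * beta * p[j] * (f - np.dot(p[j], f) / Pd[j])  # Eq. (6.60)
    J = np.array([payoff(j, Pd) for j in range(2)])
    Pd_hat += dt * k * a * np.sin(w * t) * J                            # Eq. (6.66)

print("power per microgrid:", Pd_hat)
print("dispatch inside each microgrid:", p)
```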

Defining \tau = \bar{w}_jt, where \bar{w}_j = \min_iw_{ij} and k_j = \epsilon_j\bar{w}_j, the following singular perturbation problem can be formulated:

\bar{w}_j\frac{dp_{ij}}{d\tau} = p_{ij}\big(f_{ij} - \bar{f}\big) \quad \forall i = 1,\dots,M_j    (6.67)

\frac{d\hat{P}_{d_j}}{d\tau} = \epsilon_ja_j\sin\!\left(\frac{w_j}{\bar{w}}\tau\right)J_j(p_j) \quad \forall j = 1,\dots,N    (6.68)

where M_j represents the number of DGs that the jth microgrid has, p_{ij} is the power assigned to the ith DG of the jth microgrid, P_{d_j} is the demanded power of the jth microgrid, and \bar{w}_j is the minimum frequency of the dither signal of the ESC used in the jth microgrid. It is important to note that the number of DGs M_j does not necessarily need to be the same inside each microgrid. The RDs of (6.67) satisfy the following mapping at the equilibrium,

\sum_{i=1}^{M_j}p_{ij} = P_{d_j}    (6.69)

Replacing (6.69) in (6.68), the reduced model is obtained,

\frac{d\hat{P}_{d_j}}{d\tau} = \epsilon_ja_j\sin\!\left(\frac{w_j}{\bar{w}}\tau\right)\left[\frac{P_{d_j}}{C_j} - \frac{P_{d_j}^2}{6C_jP_tP_{n_j}}\big(3P_t - 2P_{d_j} + 3P_{n_j}\big) - \frac{P_{d_j}}{P_t}\sum_{k=1,\,k\neq j}^{N}\frac{P_{d_k}}{C_k}\left(1 - \frac{P_{d_k}}{P_{n_k}}\right)\right] \quad \forall j = 1,\dots,N    (6.70)

which is exactly the reduced system analyzed in the problem of dispatching DGs, with the variable p_i replaced by P_{d_j}. It has been shown that the average system of the quasi-steady state model (6.70) converges to a Nash equilibrium that coincides with the solution of the market based problem, so the original reduced model converges to an O(\epsilon) neighborhood of the equilibrium given by

P_{d_j} = P_{n_j} - \frac{P_{n_j}C_j\left(\sum_{i=1}^{M}P_{j_i} - P_t\right)}{\sum_{i=1}^{M}P_{j_i}C_j}    (6.71)

However, the convergence of the quasi-steady state model (6.70) is not sufficient to show stability of the overall system (6.67)-(6.68). The boundary layer model has to be analyzed, and its origin must be an asymptotically stable equilibrium point. According to (4.33), the boundary layer model of this system is stated as

\frac{dy}{dt} = (p_i^* + y)\Big(f_i(p_i^* + y) - \bar{f}_i(p_i^* + y)\Big) \quad \forall i = 1,\dots,M    (6.72)

For y = 0,

\frac{dy}{dt} = p_i^*\big(f_i(p_i^*) - \bar{f}_i(p_i^*)\big) = 0    (6.73)

Since p_i^* is a stable equilibrium of the RDs, the right hand side of (6.73) is equal to zero; hence the origin is an equilibrium point of the boundary layer model (6.72), and it retains the stability properties of the RDs, where p_i^* is an exponentially stable equilibrium point as shown in [22]. Using again Tikhonov's theorem (Section 4.2.2), it can be seen that the original system converges to a neighborhood biased away from the Nash equilibrium, in proportion to the amplitude of the perturbation, such that

P_{d_i}^* = P_{n_i} - \frac{P_{n_i}c_i\left(\sum_{j=1}^{M}P_{n_{ij}} - P_t\right)}{\sum_{j=1}^{M}P_{n_{ij}}} + O(\epsilon + \max_ia_i + \min_iw_i)    (6.74)

6.1.5 Simulation and Discussion

Dispatch of DGs without Dynamics

Figure 6.2b shows the reaction curves RC_1 and RC_2 of equations (6.75) and (6.76), associated with a game played by two DGs, together with the vector field of the closed loop system under extremum seeking control and the trajectories superimposed when P_d < \sum_{j=1}^{N}P_{n_j}. Figure 6.2a shows the convergence of the system to the equilibrium point. Figure 6.2d shows the reaction curves when P_d > \sum_{j=1}^{N}P_{n_j}, and the dispatched power is shown in Figure 6.2c.

RC_1(p_2) = \frac{P_d + P_{n_1}}{2} \pm \frac{1}{2}\sqrt{\frac{P_{n_2}c_2P_d^2 - 2P_{n_2}c_2P_dP_{n_1} + P_{n_2}c_2P_{n_1}^2 - 4c_1P_{n_1}p_2^2 + 4P_{n_2}c_1P_{n_1}p_2}{P_{n_2}c_2}}    (6.75)

RC_2(p_1) = \frac{P_d + P_{n_1}}{2} \pm \frac{1}{2}\sqrt{\frac{P_{n_2}c_2P_d^2 - 2P_{n_2}c_2P_dP_{n_1} + P_{n_2}c_2P_{n_1}^2 - 4c_1P_{n_1}p_1^2 + 4P_{n_2}c_1P_{n_1}p_1}{P_{n_1}c_1}}    (6.76)

Figure 6.3 shows the dispatch of 3 DGs for a low demand that changes over time. When the demanded power is low, the optimal solution leads to negative dispatched powers. In the present scheme, the players (LCs) whose dispatch becomes negative (DG1 in Figure 6.3) are truncated to zero and forced to abandon the game. When this truncation occurs, the game is automatically reorganized in terms of the remaining players, who maintain feasible actions. Once the actions of the truncated players (LCs) tend to be positive again, the truncated players are automatically included
