In this chapter we introduced the QNG filter, which agents can run locally to update their beliefs and select equilibrium actions in Bayesian network games with Gaussian information and quadratic payoffs. The QNG filter provides a mechanism for Bayesian belief updates when agents' initial prior over the state of the world is Gaussian. We began by showing that when the prior estimates of private signals are Gaussian with means equal to linear combinations of private signals, and the equilibrium strategies of agents are linear combinations of the mean estimates of private signals, Bayesian updates of the estimates of private signals and the underlying state follow a sequential LMMSE estimator. This meant that the estimates remain linear combinations of private signals and, hence, Gaussian. By induction, estimates remain Gaussian at all times if equilibrium actions that are linear in the means of the estimates exist at all stages. Further, we derived an explicit recursion for tracking the estimates of private signals and calculating equilibrium actions, which we leveraged to develop the QNG filter. We then extended the QNG filter to the case in which the state of the world is a vector. We exemplified the QNG filter in a Cournot competition game and in the coordination of mobile agents in 3-dimensional space. In the former, the state of the world, the effective profit, was a scalar, whereas in the latter the state of the world was a vector comprising heading and take-off angles. In both examples, the QNG filter converged to the BNE of the game in a number of steps on the order of the diameter of the network. This meant that rational agents learn the sufficient statistic of the state while not necessarily learning all the individual private signals.
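As a reminder of what a sequential LMMSE step looks like, the sketch below shows the textbook scalar recursion for a Gaussian state observed through a noisy linear measurement. It is not the QNG filter itself: the function name `lmmse_update` and the parameters `h` (observation coefficient) and `noise_var` (observation noise variance) are assumptions introduced only for illustration.

```python
def lmmse_update(mean, var, y, h, noise_var):
    """One sequential LMMSE step for a scalar Gaussian state.

    Assumed observation model (illustrative): y = h * theta + noise,
    where noise has zero mean and variance noise_var.
    """
    # Gain that weighs the new observation against the current estimate.
    gain = var * h / (h * h * var + noise_var)
    # The updated mean stays linear in the observations, hence Gaussian.
    new_mean = mean + gain * (y - h * mean)
    # The variance shrinks with every informative observation.
    new_var = (1.0 - gain * h) * var
    return new_mean, new_var


# Usage: process two observations one after the other.
mean, var = 0.0, 1.0
for y in (2.0, 1.0):
    mean, var = lmmse_update(mean, var, y, h=1.0, noise_var=1.0)
```

Because each step maps a Gaussian estimate to another Gaussian estimate, repeating it preserves the linear-Gaussian structure, which is the property the induction argument above relies on.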

Chapter 3

Distributed Fictitious Play

3.1 Introduction

Based on the fictitious play algorithm, we introduce a decentralized decision-making model for unknown environments with networked interactions, which we call the distributed fictitious play algorithm. In fictitious play algorithms, each agent builds a model of the future behavior of other agents by forming a histogram of their observed past actions and best responds to its expected payoff [77, 78]. As in the setup of previous chapters, each agent in a network receives a payoff that depends on its own action, the actions of others, and an unknown state of the world. In a networked setting, agents have access to information only via their neighbors, that is, not all past actions are available to them. Therefore, agents need to reason about the behavior of non-neighboring agents based only on past observations of their neighbors. In addition, agents are uncertain about the state of the world and update their beliefs on the state using private or local information. Our analysis shows that the agents can carry out the two processes, namely, reasoning about others' behavior and learning about the state, independently and still converge to a Nash equilibrium of a potential game, a game with identical payoffs [48].
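For reference, the classical (fully observed) fictitious play loop that the distributed version builds on can be sketched as follows. This is a minimal illustration for a two-player identical interest game, not the distributed algorithm of this chapter; the function name `fictitious_play`, the payoff matrix, and the initial actions are assumptions chosen for the example.

```python
import numpy as np


def fictitious_play(payoff, steps=200, init=(0, 0)):
    """Two-player fictitious play on an identical interest game.

    payoff[a0, a1] is the common payoff received by both players
    (illustrative setup; both players observe all past actions).
    """
    n0, n1 = payoff.shape
    counts = [np.zeros(n0), np.zeros(n1)]  # histograms of past actions
    actions = list(init)
    for _ in range(steps):
        # Record the actions just played.
        counts[0][actions[0]] += 1
        counts[1][actions[1]] += 1
        # Empirical frequencies serve as the model of the other player.
        freq0 = counts[0] / counts[0].sum()
        freq1 = counts[1] / counts[1].sum()
        # Each player best responds to the other's empirical frequency.
        a0 = int(np.argmax(payoff @ freq1))
        a1 = int(np.argmax(freq0 @ payoff))
        actions = [a0, a1]
    return actions, freq0, freq1


# Usage: a coordination game in which both players prefer to match.
common_payoff = np.array([[1.0, 0.0], [0.0, 2.0]])
final_actions, f0, f1 = fictitious_play(common_payoff, init=(0, 1))
```

In this coordination game the players start miscoordinated and settle on the matching action pair with the higher payoff, and their empirical frequencies concentrate on it.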

We consider two models of belief formation on other agents' behavior, distinguished by the type of local information exchanged. In the first model, agents share only their actions with their neighbors and assume that all other agents follow a 'centroid' empirical distribution, which they estimate by keeping track of the frequency of observed neighboring actions [59]. In the second model, agents share with their neighbors the estimated empirical distributions that they keep on all other agents. Agents average the estimated empirical distributions observed from their neighbors to obtain their own estimates for the next time step. In both models, agents take actions that maximize their expected utility at each stage. In the action sharing model, expected utility is computed assuming that all other agents independently follow the estimated 'centroid' empirical distribution. In the histogram sharing model, agents keep a separate estimate for each agent, so they take the expectation over the joint distribution of the estimated empirical frequencies of all agents. We analyze the convergence rates of the two models in Lemmas 3.2 and 3.5. For both models, we show that agents' estimates approach the true empirical distribution that they estimate at a rate of $O(\log t / t)$, irrespective of the state learning and agent response rules.
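The two estimation rules can be sketched as follows. The exact update equations are not given in this introduction, so the forms below are assumptions made for illustration: the centroid estimate is taken to be a running average of the action frequencies observed among neighbors, and in the histogram sharing rule each agent is assumed to know its own empirical frequency exactly and to include itself in its neighbor list. The function names are hypothetical.

```python
import numpy as np


def centroid_update(estimate, neighbor_actions, num_actions, t):
    """Action sharing (assumed form): running average, at step t >= 1,
    of the action frequencies observed among neighbors."""
    observed = np.bincount(neighbor_actions, minlength=num_actions)
    observed = observed / len(neighbor_actions)
    return estimate + (observed - estimate) / t


def histogram_sharing_update(estimates, neighbors, own_freq):
    """Histogram sharing (assumed form): average the neighbors' estimates.

    estimates[i, k] is agent i's estimate of agent k's empirical
    distribution; neighbors[i] lists the agents that agent i hears
    from (including i); own_freq[i] is agent i's true frequency.
    """
    new = np.empty_like(estimates)
    for i, nbrs in enumerate(neighbors):
        # Average what the neighbors currently believe about every agent.
        new[i] = estimates[nbrs].mean(axis=0)
        # Assumption: an agent always knows its own empirical frequency.
        new[i, i] = own_freq[i]
    return new


# Usage: three agents on a line network 0 - 1 - 2 with two actions each.
neighbors = [[0, 1], [0, 1, 2], [1, 2]]
own_freq = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
estimates = np.full((3, 3, 2), 0.5)
for i in range(3):
    estimates[i, i] = own_freq[i]
for _ in range(200):
    estimates = histogram_sharing_update(estimates, neighbors, own_freq)
```

In this usage example the true frequencies are held fixed, so repeated averaging carries each agent's frequency through the network to agents that never observe it directly. In the actual process the frequencies change as agents act, which is why the text bounds the estimation error by a rate rather than claiming exact recovery.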

The equilibrium convergence results for the two models assume that agents use a local state learning process in which they asymptotically agree on a distribution on the state of the world at a rate faster than or equal to $O(\log t / t)$. Various decentralized learning models in the literature achieve the desired convergence rate under different assumptions [20, 43, 79, 80]. The main convergence result for the action sharing model states that agents asymptotically reach a consensus Nash equilibrium of a symmetric potential game in which agents have identical beliefs on the state (Theorem 3.4). At a consensus Nash equilibrium, all agents use the same strategy and play optimally with respect to the others' equilibrium strategy. For the model in which agents share their estimated empirical distributions, the process converges to a Nash equilibrium of a potential game in which agents have identical beliefs on the state of the world (Theorem 3.6).

We numerically analyze the transient and asymptotic equilibrium properties of decentralized fictitious play in the beauty contest and target covering games (Section 3.5). In the beauty contest game, a team of robots trades off moving toward a target direction, about which they receive noisy information, against moving in coordination with each other. In the target covering game, a team of robots would like to coordinate on covering a given set of targets and receive payoffs from covering a target that are inversely proportional to their positions. In both settings, the communication constraints among robots limit their information sources to their local neighborhood. In addition, robots have asymmetric and incomplete information on the state of the world.

The setup of this work falls within the literature on learning in games, which considers dynamic processes that lead to equilibrium in games [81, 82]. Fictitious play, in which all agents are assumed to observe the past history of the game, is one such simple update mechanism that has been shown to converge to a Nash equilibrium strategy in zero-sum [81], certain 2 × 2 [52], and identical interest (potential) games [78].

Recently, the convergence results of the fictitious play algorithm have been shown to hold for potential games in a setting where agents make only local observations [59]. Our results build on these results and incorporate incomplete and asymmetric information into the environment considered, which is important for technological settings. Our motivation stems from the fact that the computational burden that Bayesian Nash equilibrium strategies (the optimal decision for each selfish agent given uncertainty about others and the state) place on each agent is not realistic even when the computation is possible [64]. However, the impossibility of learning 'Bayesian equilibria' strategies in games of incomplete information has been demonstrated in [60]. We circumvent this issue by forcing asymptotic agreement among agents' beliefs on the state of the world. We use the fact that an identical interest game with a common belief on the state of the world is an identical interest game with complete information, in which agents' payoffs equal the expectation of the potential function of the original game with respect to the belief over the state.
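This reduction can be written out explicitly. The symbols below are assumptions introduced for illustration and are not taken from the text above: $u(a, \theta)$ is the common payoff (potential) function, $a = (a_i, a_{-i})$ is the action profile, $\theta$ is the state of the world, $\mu$ is the belief on $\theta$ shared by all agents, and $A_i$ is the action set of agent $i$. Under the common belief, every agent evaluates an action profile by the same expected payoff

\[
\bar{u}(a; \mu) := \mathbb{E}_{\theta \sim \mu}\left[ u(a, \theta) \right] = \int u(a, \theta) \, d\mu(\theta).
\]

The state no longer appears as an argument of $\bar{u}$, so the game with payoff $\bar{u}(\cdot\,; \mu)$ is a complete information game with identical interests. Extending $\bar{u}$ to mixed strategies by taking expectations, a strategy profile $\sigma^*$ is a Nash equilibrium of this game if, for every agent $i$ and every $\sigma_i \in \triangle(A_i)$,

\[
\bar{u}(\sigma_i^*, \sigma_{-i}^*; \mu) \ge \bar{u}(\sigma_i, \sigma_{-i}^*; \mu).
\]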

Other variations of the fictitious play algorithm [50, 51], payoff-based learning algorithms such as reinforcement learning [58], and their combinations [49] are also pertinent to the work here. The focus in these works is either to extend the class of games that admit convergence to a Nash equilibrium through the proposed dynamics [51], or to generate dynamics that lead to certain types of Nash equilibrium, e.g., a pure (deterministic) Nash equilibrium [49] or an optimal equilibrium [83].

Notation: For any finite set $X$, we use $\triangle(X)$ to denote the space of probability distributions over $X$. We use the notation $-i$ to denote the set of players except $i$, that is, $-i := N \setminus \{i\}$. For a generic vector $x \in X^N$, $x_{-i}$ denotes the vector of elements of $x$ except the $i$th element, that is, $x_{-i} = (x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_N)$. We use $\|\cdot\|$ to denote the Euclidean norm.