Conventions emerge as agents in a population come to select the same action, learning the best strategy (action choice) over time. We assume that a population consists of a set of agents, Ag = {1, ..., N}, who select from a set of actions, Σ = {σ1, σ2, ..., σn}. At each timestep, each agent selects an interaction partner at random, and both partners choose an action from Σ. The individual payoff for each agent is determined by the combination of action choices, the joint action. We adopt the n-action coordination game, in which interaction partners receive a positive payoff if they select the same action and a negative payoff if their actions differ. The 2-action coordination game is often used in exploring convention emergence, but we expand to the n-action coordination game to avoid restricting the number of possible conventions, as discussed above. We otherwise utilise the payoff matrix of Sen & Airiau [2007], such that choosing the same action gives a positive payoff (+4) and choosing differing actions results in a negative payoff (−1). Sen & Airiau showed that these values facilitate rapid convention emergence, and as such they are well suited to our exploration of other factors that might affect emergence. We explore different values for these payoffs in Section 3.10 to examine the effect that this asymmetry might have.
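As a minimal sketch of the payoff rule described above, the following function returns the payoff each partner receives for a joint action. The function name and the keyword parameters are illustrative assumptions rather than part of the original model description; the default values are the +4 and −1 payoffs given in the text.

```python
def coordination_payoff(action_a, action_b, match=4.0, mismatch=-1.0):
    """Payoff received by each partner in the n-action coordination game.

    Both partners receive the same payoff: `match` if their actions agree,
    `mismatch` otherwise. The names and defaults are illustrative.
    """
    return match if action_a == action_b else mismatch


# Actions are represented here as integer indices into the action set.
same = coordination_payoff(2, 2)       # both chose action 2
different = coordination_payoff(0, 3)  # partners chose different actions
```

Keeping the two payoff values as parameters makes it straightforward to vary the asymmetry between them.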

Each agent chooses the action that it believes will result in the highest payoff based on its previous interactions. It does this using a simplified version of the Q-Learning algorithm [Watkins, 1989]. For each action σ ∈ Σ, an agent maintains an estimate of the payoff it expects to receive from choosing that action in the future (a “Q-value”). After receiving a payoff for choosing an action σ in an interaction, the agent updates the corresponding value such that:

$$Q(\sigma) = (1 - \alpha) \times Q(\sigma) + \alpha \times \text{payoff} \tag{3.1}$$

where α is a variable in the range [0, 1] that controls the learning rate. For all agents we start with Q(σ) = 0 ∀σ ∈ Σ so as not to bias any agent towards a specific action.
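The update rule in Equation 3.1 can be sketched as follows. This is an illustrative sketch only: the function name, the list-based representation of Q-values, and the learning rate of 0.5 used in the worked example are assumptions, not values taken from the text.

```python
def update_q(q_values, action, payoff, alpha):
    """Apply Equation 3.1 in place to the Q-value of the chosen action.

    q_values: list of Q-values indexed by action (illustrative representation).
    alpha: learning rate in [0, 1].
    """
    q_values[action] = (1 - alpha) * q_values[action] + alpha * payoff
    return q_values


# Worked example with three actions, all Q-values starting at 0.
q = [0.0, 0.0, 0.0]
update_q(q, 1, 4.0, alpha=0.5)   # (1 - 0.5) * 0.0 + 0.5 * 4.0  = 2.0
update_q(q, 1, -1.0, alpha=0.5)  # (1 - 0.5) * 2.0 + 0.5 * -1.0 = 0.5
```

Only the Q-value of the action actually played changes; the estimates for all other actions are left untouched.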

We also assume an element of exploration, such that with probability p_explore an agent will choose a random action from those available instead of the action it believes to be optimal. This allows agents to avoid local optima in the convention space and facilitates the emergence of a global convention. If several actions share the highest Q-value, the agent chooses randomly between them. In this regard our model adopts the approach of Villatoro et al. [2009] by using this Q-Learning algorithm for both partners in an interaction to update their strategies. Airiau et al. [2014] show that populations composed entirely of Q-Learners reach conventions faster than populations using the related “Win or Learn Fast” policy hill-climbing strategy (WoLF-PHC) or populations of mixed learners, and so we adopt this learning method globally.
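The action-selection rule described above (explore with probability p_explore, otherwise pick a highest-valued action with random tie-breaking) might be sketched as below. The function names and the explicit random-number-generator argument are assumptions made for illustration.

```python
import random


def greedy_action(q_values, rng):
    """Return an action with the highest Q-value, breaking ties at random."""
    best = max(q_values)
    candidates = [a for a, q in enumerate(q_values) if q == best]
    return rng.choice(candidates)


def select_action(q_values, p_explore, rng):
    """With probability p_explore pick any action uniformly at random;
    otherwise pick the action currently believed to be optimal."""
    if rng.random() < p_explore:
        return rng.randrange(len(q_values))
    return greedy_action(q_values, rng)


rng = random.Random(0)  # seeded only to make the sketch repeatable
chosen = select_action([0.0, 3.0, 1.0], p_explore=0.0, rng=rng)  # never explores
```

Passing the generator in explicitly, rather than using the module-level functions, keeps runs repeatable when a seed is fixed.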

We assume that agents are situated on a topology that restricts their interactions such that agents can only interact with their neighbours (and hence select randomly from amongst these). The particular topologies used are discussed in each relevant section.
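To illustrate how a topology restricts partner selection, the sketch below stores the topology as a mapping from each agent to its list of neighbours and draws the partner uniformly from that list. The ring topology is a hypothetical example chosen only for brevity, and the function names are assumptions; the topologies actually used are those described in the relevant sections.

```python
import random


def ring_topology(n, k=1):
    """Hypothetical example topology: agent i is linked to the k nearest
    agents on each side of it around a ring of n agents."""
    return {
        i: sorted({(i + d) % n for d in range(-k, k + 1)} - {i})
        for i in range(n)
    }


def choose_partner(agent, neighbours, rng):
    """Select an interaction partner uniformly at random from the agent's neighbours."""
    return rng.choice(neighbours[agent])


neighbours = ring_topology(6)
partner = choose_partner(0, neighbours, random.Random(0))  # one of agent 0's neighbours
```

Any other topology can be substituted by supplying a different neighbour mapping; `choose_partner` does not depend on how the mapping was built.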

Agents are queried every timestep on which action they would choose if they were not exploring. The strategy they respond with will thus be the one with the highest Q-value for them, or one chosen randomly from amongst those with equal highest Q-values.

3.3.1 Intervention Agents

As discussed, fixed strategy agents, which we refer to as Intervention Agents (IAs), have been shown to influence convention emergence when introduced at the beginning of a simulation. Building on the work of Franks et al. [2013] and Griffiths & Anand [2012], we propose inserting these IAs at locations within the topology to affect convention emergence.

We generally seek to place these IAs at topologically influential locations as determined by a number of graph metrics. This has been shown to increase their efficacy, with placement at both high-degree locations [Franks et al., 2013] and high-betweenness-centrality locations [Griffiths & Anand, 2012] performing better than random placement. Franks et al. [2014] additionally show that placement by eigenvector centrality (EC), highest edge embeddedness (HEE) and hyperlink-induced topic search (HITS) increases efficacy, but they only consider the case of a single IA being positioned by these metrics.

As such, we generally utilise the following four metrics to place IAs in this chapter: degree, eigenvector centrality, highest edge embeddedness and HITS. We also consider random placement as a baseline where appropriate. These metrics are discussed in detail in Section 2.7 and have been shown to be good indicators of agent influence [Franks et al., 2014].
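As a rough sketch of metric-based placement, the example below ranks agents by degree and selects the top-ranked locations, alongside the random baseline. Only the degree metric is shown; the function names, the neighbour-mapping representation, and the tie-breaking by agent identifier are assumptions made for illustration, and the other metrics would supply a different ranking score in the same way.

```python
import random


def place_by_degree(neighbours, num_ias):
    """Return the num_ias agents with the highest degree.

    Ties are broken by agent identifier, which is an illustrative assumption.
    """
    ranked = sorted(neighbours, key=lambda a: (-len(neighbours[a]), a))
    return ranked[:num_ias]


def place_randomly(neighbours, num_ias, rng):
    """Baseline: choose num_ias distinct locations uniformly at random."""
    return rng.sample(sorted(neighbours), num_ias)


# Hypothetical star-like topology: agent 0 is linked to every other agent,
# and agents 1 and 2 are additionally linked to each other.
topology = {0: [1, 2, 3, 4], 1: [0, 2], 2: [0, 1], 3: [0], 4: [0]}
by_degree = place_by_degree(topology, 2)
baseline = place_randomly(topology, 2, random.Random(0))
```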

We choose to exclude both betweenness centrality (BC) and closeness centrality (CC) for two main reasons: (i) Franks et al. [2014] and Griffiths & Anand [2012] have shown that they offer little, if any, improvement over the metrics chosen, and (ii) they are substantially more computationally expensive than the other metrics, making them ill-suited to larger topologies or to topologies where the metrics must be recalculated often, as is the case for dynamic topologies. This view has been previously raised in the literature. Kang et al. [2011] argue that these metrics were defined when large-scale networks (such as social networks or the Internet) were uncommon, and that they are inherently inappropriate for use on large graphs as they do not consider scalability and are not amenable to parallelisation. Pfeffer & Carley [2012] agree, stating that both betweenness centrality and closeness centrality are limited in applicability. They introduce approximations based on bounded-distance shortest path calculations, but these still carry a computational cost that makes them infeasible for our use. Additionally, both Lawyer [2015] and Šikić et al. [2013] argue that these centrality measures are only good indicators of influence for generally central nodes in the network and severely underestimate the influence of more peripheral nodes. Due to these limitations we exclude both centrality measures from our investigations.

The IAs will always choose to play their assigned strategy and have no ability to explore or deviate from it. They do, however, continue to learn via the same Q-Learning mechanism as all other agents whilst being used as IAs. As they are unable to explore, they will only learn the value of their assigned strategy, but this allows them to establish how well that strategy is performing as a convention. As mentioned in Section 2.3, we query these agents as we would any other, meaning that the convention membership they report may not be that of the fixed strategy assigned to them if their assigned strategy is not performing well.
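The behaviour described above can be sketched as a small class in which the action played is fixed while learning and querying follow the same rules as for any other agent. The class name, method names, and the learning rate of 0.5 in the example are illustrative assumptions.

```python
import random


class InterventionAgent:
    """Sketch of an IA: always plays its assigned action, but still learns."""

    def __init__(self, n_actions, fixed_action, alpha):
        self.q = [0.0] * n_actions   # all Q-values start at 0
        self.fixed_action = fixed_action
        self.alpha = alpha

    def act(self):
        """The IA never explores or deviates from its assigned action."""
        return self.fixed_action

    def learn(self, payoff):
        """Q-Learning update; only the assigned action is ever played,
        so only its Q-value ever changes."""
        a = self.fixed_action
        self.q[a] = (1 - self.alpha) * self.q[a] + self.alpha * payoff

    def query(self, rng):
        """Report the action with the highest Q-value (random tie-break),
        exactly as a non-IA agent would when queried."""
        best = max(self.q)
        return rng.choice([a for a, q in enumerate(self.q) if q == best])


ia = InterventionAgent(n_actions=3, fixed_action=0, alpha=0.5)
ia.learn(-1.0)                            # Q for action 0 becomes -0.5
reported = ia.query(random.Random(1))     # action 1 or 2, not the assigned action
```

In this example the IA keeps playing action 0, yet after a mismatch its Q-value for that action falls below the untouched zero estimates, so the queried response differs from the assigned strategy until that strategy starts to pay off.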

This interaction model is the general one used throughout this chapter and indeed throughout this thesis. Any deviations from it will be described in the relevant section. It is our view that this model is well-understood from previous work in the literature and will be particularly applicable to the exploration of destabilisation as it facilitates rapid and robust convention emergence and thus our experimentation can instead focus on the other aspects that affect destabilisation.
