work Formation
One conclusion of the literature review is that actual human behaviour in the experiments often differs from the equilibrium condition. This section describes the RL-based model of network formation and asks how its outcome differs from the theoretical predictions.
A dynamic version of the connections model is considered, similar to Watts (2001) and Jackson and Watts (2002), but adapted to a setting with RL agents. As a benchmark, the original analysis of Jackson and Wolinsky (1996) for the static model and of Watts (2001) for the dynamic model can be used.
The game proceeds as follows:
– Two agents are picked randomly.
– Both agents decide whether to offer a link or not.
– If both agents offer a link, the connection is added, otherwise not.
– The new network is computed.
– The two agents who acted receive their rewards, calculated with equation 4.1.
If a link was formed, it exists as long as the two agents do not meet again. When they meet another time, the link is maintained if both agents offer a link again, otherwise it is severed.
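The procedure above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the names `play_step`, `decide` and `reward_fn` are hypothetical, the network is stored as a set of unordered pairs, and the reward of equation 4.1 is passed in as a function rather than reproduced here.

```python
import random

def play_step(n, links, decide, reward_fn, rng=random):
    """Play one encounter and update `links` (the network g) in place.

    n         -- number of agents, labelled 0 .. n-1
    links     -- set of frozenset({i, j}) pairs currently in the network
    decide    -- hypothetical callback: decide(i, j) is True if i offers a link to j
    reward_fn -- hypothetical callback standing in for equation 4.1:
                 reward_fn(i, links) is the payoff of agent i in the network
    """
    i, j = rng.sample(range(n), 2)        # two distinct agents are picked randomly
    offer_i, offer_j = decide(i, j), decide(j, i)
    pair = frozenset((i, j))
    if offer_i and offer_j:
        links.add(pair)                   # link is formed, or maintained if it existed
    else:
        links.discard(pair)               # link is not formed, or severed if it existed
    # only the two agents who acted receive a reward, computed on the new network
    rewards = {i: reward_fn(i, links), j: reward_fn(j, links)}
    return i, j, (offer_i, offer_j), rewards
```

Because a link between two agents is only re-evaluated when the same pair meets again, the single `add`/`discard` branch covers both the formation and the maintenance rule.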
Learning. In the reviewed network games literature, bounded rationality was described as an injection of ‘irrationality’, for example as an error term ϵ as in Jackson and Watts (2002) or Bala and Goyal (2000), or as a limited memory as in Beal and Querou (2007).
In the model presented here, RL can be seen as a form of limited rationality. Agents start with no information at all and learn by trial and error about the game and about which actions are appropriate. Players know only the names of the other players and may choose from the action set $A = \{a_0, \ldots, a_i, \ldots, a_n\}$, given here by {offer link, not-offer link}.
Using the concepts of BRA introduced in chapter 2, the internal choice model for agent i is given by $r^{k}_{u,v} : C^{k,\,k \neq i}_{u,v} \to \{\text{offer}, \text{not-offer}\}$. There are k − 1 mappings, and the initial conditions contain only one attribute with one value (player-name = k), so no further expansion is possible. BRA thus reduces to disjoint sets of simple RL rules. For each $r^{k}$, agent i updates the action strengths, that is, $\forall r^{k}$, using
$$q(a_j(t,k)) = q(a_j(t-1,k)) + \gamma\,\bigl(u_i(g,t) - q(a_j(t-1,k))\bigr)$$
Using the exponential selection rule in equation 2.10, agent i chooses his action at the next encounter with agent j.
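The update of the action strengths and the subsequent choice can be illustrated as follows. The sketch makes two assumptions that are not taken from the text: equation 2.10 is assumed to have the usual Boltzmann form, with choice probabilities proportional to exp(q(a)/α), and all strengths are assumed to start at zero. The function names are hypothetical.

```python
import math
import random

def update_strength(q, action, reward, gamma):
    """Apply q(a) <- q(a) + gamma * (reward - q(a)) to the action just taken."""
    q[action] += gamma * (reward - q[action])

def choice_probabilities(q, alpha):
    """Exponential selection, ASSUMED here to be Boltzmann with temperature alpha."""
    top = max(q.values())                 # subtract the maximum for numerical stability
    weights = {a: math.exp((v - top) / alpha) for a, v in q.items()}
    total = sum(weights.values())
    return {a: w / total for a, w in weights.items()}

def choose_action(q, alpha, rng=random):
    """Draw one action according to the selection probabilities."""
    probs = choice_probabilities(q, alpha)
    actions = list(probs)
    return rng.choices(actions, weights=[probs[a] for a in actions])[0]

# One disjoint rule per opponent k: strengths[k] holds the two action strengths.
# Initial values of 0.0 are an assumption of this sketch.
strengths = {k: {"offer": 0.0, "not-offer": 0.0} for k in (1, 2)}
update_strength(strengths[1], "offer", reward=0.4, gamma=0.25)
next_action = choose_action(strengths[1], alpha=0.1, rng=random.Random(0))
```

Under this assumed form, a large α flattens the probabilities towards one half each, whereas a small α concentrates them on the action with the higher strength.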
Parameter settings. The model has four parameters of interest: α and γ, cost c and value δ. As in the original JW model, $w_{ij}$ is set to 1 and $w_{ii}$ to 0. Agents are homogeneous; cost and value are the same for all players.
In the simulations, the parameters c and α are varied. c can be seen as the structural parameter influencing the opportunities for the players; α determines the rate of exploration. The greater α, the more likely exploration becomes in the action selection process and the more similar the selection propensities for both actions become; the smaller α, the faster the agents settle on a reasonably good solution. The central question for the adaptive network model is whether it is possible to generate stable and efficient solutions, and which properties the learning rule needs for this. The influence of randomness on the outcome has led to the choice of stochastic stability as the benchmark stability definition for the RL model.
The discount parameter γ is only of minor importance for the analysis.
γ sets the rate at which the reward is updated. The smaller this weight, the faster the experienced reward approximates the true reward. Experiments with various γ values were used to select the best model for a more detailed analysis of α. A short overview of different γ settings is given in section 4.6.4.
The value of δ, 0 < δ < 1, is fixed. Since there are no requirements or other substantial reasons for a particular value except that decay exists, it has been set to 0.5. For each cost range, the values for c are drawn randomly in order to obtain some samples within each cost range. α is incremented by 0.01 from 0.01 to 1.
Table 4.1 shows the parameters in summary.
| cost range | α | δ | γ |
|---|---|---|---|
| c < 0.25 (low cost range) | 0.01 ... 1 | 0.5 | 0.1, 0.25, 0.75, 1 |
| 0.25 < c ≤ 0.5 (medium cost range) | 0.01 ... 1 | 0.5 | 0.1, 0.25, 0.75, 1 |
| c > 0.5 (high cost range) | 0.01 ... 1 | 0.5 | 0.1, 0.25, 0.75, 1 |

Table 4.1: RL network model parameter settings
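The parameter settings can be written down as a small generator of simulation configurations. This is a sketch only: the lower bound 0 for the low cost range, the upper bound 1 for the high cost range, the uniform draw and the number of cost samples per range are assumptions that the text does not specify, and all names are hypothetical.

```python
import random

ALPHAS = [round(0.01 * k, 2) for k in range(1, 101)]   # 0.01, 0.02, ..., 1.0
GAMMAS = [0.1, 0.25, 0.75, 1.0]
DELTA = 0.5

# ASSUMPTION: the outer bounds 0.0 and 1.0 are not given in the text.
COST_RANGES = {"low": (0.0, 0.25), "medium": (0.25, 0.5), "high": (0.5, 1.0)}

def draw_costs(samples_per_range, rng):
    """Draw a few random cost values inside each cost range (uniform draw assumed)."""
    return {
        name: [rng.uniform(low, high) for _ in range(samples_per_range)]
        for name, (low, high) in COST_RANGES.items()
    }

def parameter_grid(samples_per_range=3, seed=0):
    """Yield one configuration per combination of cost sample, gamma and alpha."""
    rng = random.Random(seed)
    for name, costs in draw_costs(samples_per_range, rng).items():
        for c in costs:
            for gamma in GAMMAS:
                for alpha in ALPHAS:
                    yield {"cost_range": name, "c": c, "alpha": alpha,
                           "gamma": gamma, "delta": DELTA}
```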
Measurements for the simulations. Networks and network formation can be described in a variety of ways. In section 4.2.1 the measures D (density) and L (average path length) were already introduced. Three additional measures are defined here:
A stability measure is computed to assess how robust the solutions are. It might occur that a simulation result comes very close to the theoretic equilibrium in settings where agents explore enough and discover better solutions. Since exploration comes at the cost of more random decisions in the process, the whole system can become unstable.
Definition 15. Stability.
$$S_t = 1 - \frac{\lvert n(g,t) - n(g,t-1) \rvert}{\tfrac{1}{2}\,n(n-1)}$$
Stability is simply the difference in the number of links between two time steps, divided by the maximum possible number of links to standardise the measure. For a single simulation step, the value can be either 0 or 1. Over a sample of simulations, $S_t$ can be interpreted as the probability that a link changes at t. It thus varies between 0 and 1 and the closer it is to 0 the more stable the network is.
To compare the results with the game-theoretic prediction, a fitness measure is defined as follows:
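Definition 16. Fitness / Efficiency. Let the vector $g_{\text{stable}}$ be the stochastically stable network (efficient network), and $g_{\text{actual}}$ a simulated network. Let $\text{steps}_{\text{max}}$ be the maximum number of modifications starting from any network to $g_{\text{stable}}$, and $\text{steps}_{\text{actual}}$ the number of modifications to reach $g_{\text{stable}}$ from $g_{\text{actual}}$. Define the fitness at time t as:
$$\text{fit}_t = \frac{1}{2}\left(\frac{\text{steps}_{\text{actual},t}}{\text{steps}_{\text{max}}} + \frac{\text{steps}_{\text{actual},t}}{\text{steps}_{\text{max}}}\,S_t\right)$$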
The resulting measure varies between 0 and 1 and tends towards 1 the closer the network structure is to the stochastically stable network, and the more stable the simulation result is (multiplying the distance with $S_t$ and adding it in the numerator has the effect that stable states are weighted higher as
To determine the stochastically stable network, the procedure in Jackson and Watts (2002) has been implemented as a computer program. The program computes the set of all possible networks and identifies the pairwise stable network with the minimal resistance from all other networks in the set.
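The first part of such a program, the enumeration of all networks and the test for pairwise stability, can be sketched as follows. This is not the program used for the simulations: it assumes the standard connections-model payoff of Jackson and Wolinsky (1996) with $w_{ij} = 1$ and $w_{ii} = 0$, uses hypothetical function names, and omits the resistance computation of Jackson and Watts (2002) entirely. The enumeration grows exponentially in the number of possible links, so it is only feasible for small n.

```python
from itertools import combinations

def distances_from(i, n, links):
    """Breadth-first search: shortest path length from i to every reachable agent."""
    dist, frontier = {i: 0}, [i]
    while frontier:
        reached = []
        for u in frontier:
            for v in range(n):
                if v not in dist and frozenset((u, v)) in links:
                    dist[v] = dist[u] + 1
                    reached.append(v)
        frontier = reached
    return dist

def payoff(i, n, links, delta, c):
    """Connections-model payoff: sum of delta^distance minus c per direct link."""
    dist = distances_from(i, n, links)
    benefit = sum(delta ** d for j, d in dist.items() if j != i)
    degree = sum(1 for link in links if i in link)
    return benefit - c * degree

def is_pairwise_stable(n, links, delta, c, eps=1e-12):
    """No agent gains by severing a link; no pair gains by adding one."""
    for i, j in combinations(range(n), 2):
        pair = frozenset((i, j))
        if pair in links:
            without = links - {pair}
            if any(payoff(a, n, without, delta, c) > payoff(a, n, links, delta, c) + eps
                   for a in (i, j)):
                return False              # one endpoint strictly prefers to sever
        else:
            added = links | {pair}
            gain_i = payoff(i, n, added, delta, c) - payoff(i, n, links, delta, c)
            gain_j = payoff(j, n, added, delta, c) - payoff(j, n, links, delta, c)
            if (gain_i > eps and gain_j >= -eps) or (gain_j > eps and gain_i >= -eps):
                return False              # one strictly gains and the other does not lose
    return True

def pairwise_stable_networks(n, delta, c):
    """Enumerate every network on n agents and keep the pairwise stable ones."""
    all_pairs = [frozenset(p) for p in combinations(range(n), 2)]
    stable = []
    for size in range(len(all_pairs) + 1):
        for chosen in combinations(all_pairs, size):
            links = frozenset(chosen)
            if is_pairwise_stable(n, links, delta, c):
                stable.append(links)
    return stable
```

For example, `pairwise_stable_networks(3, 0.5, 0.1)` can be used to inspect the low cost range with three agents; the resulting candidate set would then be the input to the resistance computation.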