We have proved that games on graphs of out-degree one are determined. We will now use this result to prove determinacy in the general case.
When both players fix their positional state strategies, the game becomes a game on a graph of out-degree one. We already know that we can define the solutions of the average-price-per-reward optimality equations for this class of games. We also know that we can compare the values of gain and bias for different states. This allows us to use a strategy improvement technique to find an optimal pair of state strategies: each subsequent strategy is strictly better than the previous one, where gain and bias are used to compare strategies. A pair of strategies that cannot be improved yields gain and bias that satisfy the average-price-per-reward optimality equations. Such a pair of strategies must exist because the sets of state strategies are finite. In the following we formalise this intuition.
We fix some finite average-price-per-reward game $\Gamma$ for the remainder of this section. Let $\mu_S$ and $\chi_S$ be state strategies for players Min and Max respectively, and let $(G, B)$ be gain and bias functions such that $(G, B) \models \mathrm{OptAvgPR}(\Gamma_{\mu_S \chi_S})$.
Remark 4.18. The solution to the average-price-per-reward optimality equations $\mathrm{OptAvgPR}(\Gamma_{\mu_S \chi_S})$ is not unique. However, in the remainder of this section, for technical convenience (see the proof of Theorem 4.19), by a solution we will mean a particular solution, i.e., a solution constructed as in the proof of Theorem 4.17. This solution has the following property: for every sun $S$, the bias of the vertex $\min(S)$ is equal to zero.
Given $s \in S_{\mathrm{Min}}$ and $e = (s, s') \in E \setminus E_{\mu_S \chi_S}$, we say that $e$ is an improvement of $\mu_S$ with respect to $\chi_S$ if:
1. $G(s) > G(s')$, or
2. $G(s) = G(s')$ and $B(s) > \mathrm{Val}(a_e(G(s))) + B(s')$.
A strategy $\mu'_S$ is an improvement of $\mu_S$ with respect to $\chi_S$ if for every state $s$, either $\mu_S(s) = \mu'_S(s)$, or $\mu'_S(s) = s'$ and $(s, s')$ is an improvement of $\mu_S$ with respect to $\chi_S$. An improvement is strict if $\mu_S \neq \mu'_S$. An improvement of $\chi_S$ is defined similarly.
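The edge test in this definition can be sketched as a small predicate. This is only an illustration: the names `is_improvement` and `edge_value` are hypothetical, `edge_value(e, g)` stands in for $\mathrm{Val}(a_e(g))$, and the first condition is assumed to be the strict gain decrease $G(s) > G(s')$, matching case i) in the proof of Theorem 4.19.

```python
from typing import Callable, Dict, Hashable, Tuple

State = Hashable
Edge = Tuple[State, State]


def is_improvement(
    edge: Edge,
    gain: Dict[State, float],
    bias: Dict[State, float],
    edge_value: Callable[[Edge, float], float],
) -> bool:
    """Lexicographic test for an edge e = (s, t) of player Min.

    `edge_value(e, g)` is a hypothetical stand-in for Val(a_e(g)).
    The comparisons are exact, so exact arithmetic (for example
    fractions.Fraction) is assumed for gain and bias.
    """
    s, t = edge
    # Assumed condition 1: the target has strictly smaller gain.
    if gain[s] > gain[t]:
        return True
    # Condition 2: equal gain, and the edge value plus the target's bias
    # is strictly below the current bias of s.
    return gain[s] == gain[t] and bias[s] > edge_value(edge, gain[s]) + bias[t]
```

An improved strategy in the sense of the definition then changes the choice at any subset of Min's states for which this predicate holds and keeps the old choice elsewhere.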
We say that $\chi_S$, a state strategy for player Max, is a best response to $\mu_S$, a state strategy of player Min, if there are no possible improvements of $\chi_S$ with respect to $\mu_S$.
To prove the existence of mutual best response strategies, we apply Theorem 4.19, together with the fact that the set of edge strategies is finite, to average-price-per-reward games in which all the states belong to only one player.
Theorem 4.19. Let $\mu_S$ be a state strategy of player Min, $\chi_S$ a best response strategy of player Max, and $(G, B)$ gain and bias such that $(G, B) \models \mathrm{OptAvgPR}(\Gamma_{\mu_S \chi_S})$. If $\mu'_S$ is an improvement of $\mu_S$ with respect to $\chi_S$, $\chi'_S$ is a best response to $\mu'_S$, and $(G', B') \models \mathrm{OptAvgPR}(\Gamma_{\mu'_S \chi'_S})$, then the following holds:
1. $G(s) \geq G'(s)$ for all $s \in S$, and $G(s) > G'(s)$ for some $s \in S$, or
2. $G(s) = G'(s)$ and $B(s) \geq B'(s)$ for all $s \in S$.
Moreover, if $\mu_S \neq \mu'_S$ then $(G, B) \neq (G', B')$.
Proof. Consider the game graph $\Gamma_{\mu'_S \chi'_S}$. For every edge $e = (s, s')$, either i) $G(s) > G(s')$, or ii) $G(s) = G(s')$ and $B(s) \geq \mathrm{Val}(a_e(G(s))) + B(s')$.
We start by proving point 1. Observe that, for every edge $(s, s')$ in $\Gamma_{\mu'_S \chi'_S}$, we have $G(s) \geq G(s')$. This implies that for every edge $(s, s')$, if $G(s) > G(s')$, then $s$ is a ray state. This observation allows us to use the same argument as in Lemma 4.12 to prove that $G(s) \geq G'(s)$ for every state $s$. In particular, if an edge $(s, s')$ in $\Gamma_{\mu'_S \chi'_S}$ is such that $G(s) > G(s')$, or $B(s) > \mathrm{Val}(a_e(G(s))) + B(s')$ and $s$ is a rim state, then $G(s) > G'(s)$.
It remains to prove point 2. We know that $G(s) = G'(s)$ for all $s \in S$. Let $s$ be a vertex, and let $S$ be a sun in $\Gamma_{\mu'_S \chi'_S}$ such that $s \in S$. If $s_0, \ldots, s_k$ is the path from $s$ to $\min(S)$ then, for every $(s_i, s_{i+1})$,
$$B(s_i) \geq \mathrm{Val}(a_{(s_i, s_{i+1})}(G(s))) + B(s_{i+1}).$$
From the proof of point 1, we can assume that the sun $S$ existed in the graph $\Gamma_{\mu_S \chi_S}$, and hence $B'(\min(S)) = B(\min(S)) = 0$. Hence, if we sum up and simplify the $k$ inequalities, we get:
$$B(s_0) \geq \sum_{i=0}^{k-1} \mathrm{Val}(a_{(s_i, s_{i+1})}(G(s))) + B(s_k) = B'(s_0),$$
as $s_k = \min(S)$. To complete the proof, notice that if $\mu_S \neq \mu'_S$, then $B(s_i) > \mathrm{Val}(a_{(s_i, s_{i+1})}(G(s))) + B(s_{i+1})$ for some $i$, and hence $B(s) > B'(s)$.
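The "sum up and simplify" step in the proof of point 2 is a telescoping sum. A worked version is sketched below, using only the symbols of the proof: the path $s_0, \ldots, s_k$, the bias $B$, and the edge values $\mathrm{Val}(a_{(s_i,s_{i+1})}(G(s)))$.

```latex
\begin{align*}
\sum_{i=0}^{k-1} B(s_i)
  &\geq \sum_{i=0}^{k-1} \mathrm{Val}\bigl(a_{(s_i,s_{i+1})}(G(s))\bigr)
      + \sum_{i=0}^{k-1} B(s_{i+1}) \\
B(s_0) + \sum_{i=1}^{k-1} B(s_i)
  &\geq \sum_{i=0}^{k-1} \mathrm{Val}\bigl(a_{(s_i,s_{i+1})}(G(s))\bigr)
      + \sum_{i=1}^{k-1} B(s_i) + B(s_k) \\
B(s_0)
  &\geq \sum_{i=0}^{k-1} \mathrm{Val}\bigl(a_{(s_i,s_{i+1})}(G(s))\bigr) + B(s_k)
\end{align*}
```

The intermediate biases $B(s_1), \ldots, B(s_{k-1})$ appear on both sides and cancel. The identification of the right-hand side with the new bias at $s_0$ then relies on the gains being unchanged, on the bias of $s_k = \min(S)$ being zero in both solutions, and, as an assumption made explicit here, on the new bias satisfying each step along the path with equality.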
Corollary 4.20. A solution to the optimality equations for finite average-price-per-reward games exists.
Proof. The set of edge strategies for both players is finite. This, together with
Theorem 4.19, guarantees the existence of mutual best response edge strategies. The rest follows from Theorem 4.17.
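The finiteness argument can be illustrated by a strategy improvement loop on a deliberately simplified instance. The sketch below is not the construction of the text: it assumes a single player (Min), a constant price `weight[(s, t)]` and unit reward on every edge, so that the value $\mathrm{Val}(a_e(g))$ is replaced by `weight[e] - g`. The names `solve`, `improving_edge` and `strategy_improvement` are hypothetical. The bias of the smallest state on each cycle is fixed to zero, mirroring Remark 4.18.

```python
from fractions import Fraction


def solve(succ, weight):
    """Gain and bias of the out-degree-one graph given by `succ`.

    Assumption (not from the text): constant edge prices and unit rewards,
    so the gain is the mean price of the cycle a state reaches.
    """
    gain, bias = {}, {}
    for start in succ:
        path, seen, s = [], {}, start
        while s not in gain and s not in seen:
            seen[s] = len(path)
            path.append(s)
            s = succ[s]
        if s not in gain:  # the walk closed a new cycle at s
            cycle = path[seen[s]:]
            g = Fraction(sum(weight[(u, succ[u])] for u in cycle), len(cycle))
            root = min(cycle)  # the bias of the smallest cycle state is zero
            gain[root], bias[root] = g, Fraction(0)
            rest, u = [], succ[root]
            while u != root:
                rest.append(u)
                u = succ[u]
            for u in reversed(rest):  # walk the cycle backwards from root
                gain[u] = g
                bias[u] = weight[(u, succ[u])] - g + bias[succ[u]]
            path = path[:seen[s]]
        for u in reversed(path):  # states on the way into a solved state
            gain[u] = gain[succ[u]]
            bias[u] = weight[(u, succ[u])] - gain[u] + bias[succ[u]]
    return gain, bias


def improving_edge(s, t, weight, gain, bias):
    """Lexicographic test: smaller gain first, then smaller bias."""
    if gain[s] > gain[t]:
        return True
    return gain[s] == gain[t] and bias[s] > weight[(s, t)] - gain[s] + bias[t]


def strategy_improvement(edges, weight):
    """Repeat improvement steps until no edge is an improvement."""
    succ = {s: targets[0] for s, targets in edges.items()}
    while True:
        gain, bias = solve(succ, weight)
        better = {s: t for s, targets in edges.items() for t in targets
                  if improving_edge(s, t, weight, gain, bias)}
        if not better:
            return succ, gain, bias
        succ.update(better)


# Example: three states; the cheapest cycle is 0 -> 1 -> 2 -> 0.
EDGES = {0: [0, 1], 1: [2], 2: [1, 0]}
WEIGHT = {(0, 0): 5, (0, 1): 1, (1, 2): 1, (2, 1): 3, (2, 0): 1}
```

On this example, `strategy_improvement(EDGES, WEIGHT)` stops after three evaluations with every state on the cycle 0 → 1 → 2 → 0, whose mean price is 1. Exact `Fraction` arithmetic is used so that the strict comparisons are not affected by rounding.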
Corollary 4.21. In finite definable price-per-reward games, ε-optimal strategies are computable, provided that, under the optimal state strategies, every edge game admits rational ε-optimal moves.
Proof. Corollary 4.20 ensures that a solution to the average-price-per-reward optimality equations exists, and by Remark 4.8 this solution is definable. From this, by Corollary 4.11, we obtain definability of ε-optimal strategies. The existence of rational ε-optimal moves allows us to apply Proposition 2.7 to obtain the postulated computability.
Theorem 4.22. Finite definable average-price-per-reward games are decidable.
Proof. The set of equations OptAvgPR(Γ) is finite, thus the set of solutions is a set of
finite-dimensional vectors over the reals. Remark 4.8 ensures that (G, B), such that