We have proved that games on graphs of out-degree one are determined. We will now use this result to prove determinacy in the general case.

When both players fix their positional state strategies, the game becomes a game on a graph of out-degree one. We already know that solutions of the average-price-per-reward optimality equations can be defined for this class of games, and that the values of gain and bias can be compared across different states. This allows us to use a strategy improvement technique to find an optimal pair of state strategies. Each subsequent strategy is strictly better than the previous one, where gain and bias are used to compare strategies. A pair of strategies that cannot be improved yields gain and bias that satisfy the average-price-per-reward optimality equations. Such a pair of strategies must exist because the sets of state strategies are finite. In the following we formalise this intuition.

We fix some finite average-price-per-reward game $\Gamma$ for the remainder of this section. Let $\mu_S$ and $\chi_S$ be state strategies for players Min and Max, respectively, and let $(G, B)$ be gain and bias functions such that $(G, B) \models \mathrm{Opt}_{\mathrm{AvgPR}}(\Gamma_{\mu_S \chi_S})$.

Remark 4.18. The solution to the average-price-per-reward optimality equations $\mathrm{Opt}_{\mathrm{AvgPR}}(\Gamma_{\mu_S \chi_S})$ is not unique. However, in the remainder of this section, for technical convenience (proof of Theorem 4.19), by a solution we will mean a particular solution, i.e., a solution constructed as in the proof of Theorem 4.17. This solution has the following property: for every sun $S$, the bias of the vertex $\min(S)$ is equal to zero.

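Given $s \in S_{\mathrm{Min}}$ and $e = (s, s') \in E \setminus E_{\mu_S \chi_S}$, we say that $e$ is an improvement of $\mu_S$ with respect to $\chi_S$ if:

1. $G(s) > G(s')$, or

2. $G(s) = G(s')$ and $B(s) > \mathrm{Val}(a_e(G(s))) + B(s')$.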

A strategy $\mu'_S$ is an improvement of $\mu_S$ with respect to $\chi_S$ if for every state $s$, either $\mu_S(s) = \mu'_S(s)$, or $\mu'_S(s) = s'$ and $(s, s')$ is an improvement of $\mu_S$ with respect to $\chi_S$. An improvement is strict if $\mu_S \neq \mu'_S$. An improvement of $\chi_S$ is defined similarly.

We say that $\chi_S$, a state strategy for player Max, is a best response to $\mu_S$, a state strategy of player Min, if there are no possible improvements of $\chi_S$ with respect to $\mu_S$.

To prove the existence of mutual best response strategies, we apply Theorem 4.19, together with the fact that the set of edge strategies is finite, to average-price-per-reward games in which all states belong to a single player.

Theorem 4.19. Let $\mu_S$ be a state strategy of player Min, $\chi_S$ a best response strategy of player Max, and $(G, B)$ gain and bias such that $(G, B) \models \mathrm{Opt}_{\mathrm{AvgPR}}(\Gamma_{\mu_S \chi_S})$. If $\mu'_S$ is an improvement of $\mu_S$ with respect to $\chi_S$, $\chi'_S$ is a best response to $\mu'_S$, and $(G', B') \models \mathrm{Opt}_{\mathrm{AvgPR}}(\Gamma_{\mu'_S \chi'_S})$, then one of the following holds:

1. $G(s) \geq G'(s)$ for all $s \in S$, and $G(s) > G'(s)$ for some $s \in S$, or

2. $G(s) = G'(s)$ and $B(s) \geq B'(s)$ for all $s \in S$.

Moreover, if $\mu_S \neq \mu'_S$ then $(G, B) \neq (G', B')$.

Proof. Consider the game graph $\Gamma_{\mu'_S \chi'_S}$. For every edge $e = (s, s')$, either i) $G(s) > G(s')$, or ii) $G(s) = G(s')$ and $B(s) \geq \mathrm{Val}(a_e(G(s))) + B(s')$.

We start by proving point 1. Observe that, for every edge $(s, s')$ in $\Gamma_{\mu'_S \chi'_S}$, we have $G(s) \geq G(s')$. This implies that for every edge $(s, s')$, if $G(s) > G(s')$, then $s$ is a ray state. This observation allows us to use the same argument as in Lemma 4.12 to prove that $G(s) \geq G'(s)$ for every state $s$. In particular, if an edge $(s, s')$ in $\Gamma_{\mu'_S \chi'_S}$ is such that $G(s) > G(s')$, or $B(s) > \mathrm{Val}(a_e(G(s))) + B(s')$ and $s$ is a rim state, then $G(s) > G'(s)$.

It remains to prove point 2. We know that $G(s) = G'(s)$ for all $s \in S$. Let $s$ be a vertex, and let $S$ be a sun in $\Gamma_{\mu'_S \chi'_S}$ such that $s \in S$. If $s_0, \ldots, s_k$ is the path from $s$ to $\min(S)$ then, for every edge $(s_i, s_{i+1})$,

$$B(s_i) \geq \mathrm{Val}\bigl(a_{(s_i, s_{i+1})}(G(s))\bigr) + B(s_{i+1}).$$

From the proof of point 1, we can assume that the sun $S$ existed in the graph $\Gamma_{\mu_S \chi_S}$, and hence $B'(\min(S)) = B(\min(S)) = 0$. Hence, if we sum up and simplify the $k$ inequalities, we get:

$$B(s_0) \geq \sum_{i=0}^{k-1} \mathrm{Val}\bigl(a_{(s_i, s_{i+1})}(G(s))\bigr) + B(s_k) = B'(s_0),$$

as $s_k = \min(S)$. To complete the proof, notice that if $\mu_S \neq \mu'_S$, then $B(s_i) > \mathrm{Val}\bigl(a_{(s_i, s_{i+1})}(G(s))\bigr) + B(s_{i+1})$ for some $i$, and hence $B(s) > B'(s)$.
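The summation step in the proof of point 2 can be written out in full. The following is a sketch under two assumptions taken from the proof: that $G = G'$, and that $(G', B')$ satisfies the optimality equations with equality along the edges of the sun, which is how we read $(G', B') \models \mathrm{Opt}_{\mathrm{AvgPR}}(\Gamma_{\mu'_S \chi'_S})$ on a graph of out-degree one. The shorthand $v_i$ is introduced here only for readability and does not appear in the original text.

```latex
% Shorthand (an assumption of this sketch): v_i = Val(a_{(s_i, s_{i+1})}(G(s))).
\begin{align*}
B(s_0) - B(s_k)
  &= \sum_{i=0}^{k-1} \bigl( B(s_i) - B(s_{i+1}) \bigr)
     && \text{(telescoping sum)} \\
  &\geq \sum_{i=0}^{k-1} v_i
     && \text{(edge inequalities for $(G, B)$)} \\
  &= \sum_{i=0}^{k-1} \bigl( B'(s_i) - B'(s_{i+1}) \bigr)
     && \text{(assumed equalities for $(G', B')$, using $G = G'$)} \\
  &= B'(s_0) - B'(s_k).
\end{align*}
% Since s_k = min(S) and B(min(S)) = B'(min(S)) = 0, this gives B(s_0) >= B'(s_0).
% If at least one edge inequality is strict, the second line is strict,
% and therefore B(s_0) > B'(s_0).
```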

Corollary 4.20. A solution to the optimality equations for finite average-price-per-reward games exists.

Proof. The set of edge strategies for both players is finite. This, together with Theorem 4.19, guarantees the existence of mutual best response edge strategies. The rest follows from Theorem 4.17.
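The finiteness argument can be made slightly more explicit. The sketch below assumes an iteration, indexed by $n$, in which $\mu^{n+1}_S$ is a strict improvement of $\mu^n_S$ with respect to a best response $\chi^n_S$, and in which $(G_n, B_n)$ is the particular solution of Remark 4.18, assumed here to be determined by the strategy pair. The index $n$ and the relation symbol $\succ$ are notation introduced for this sketch, not taken from the text.

```latex
% Write (G, B) \succ (G', B') when point 1 or point 2 of Theorem 4.19 holds
% and (G, B) \neq (G', B').
\begin{align*}
&(G_0, B_0) \succ (G_1, B_1) \succ (G_2, B_2) \succ \cdots
   && \text{(Theorem 4.19 at every step)} \\
&(G_m, B_m) \succ (G_n, B_n) \ \text{for all } m < n
   && \text{($\succ$ is transitive)} \\
&(\mu^m_S, \chi^m_S) \neq (\mu^n_S, \chi^n_S) \ \text{for all } m < n
   && \text{($\succ$ is irreflexive)}
\end{align*}
% Only finitely many strategy pairs exist, so the iteration stops after finitely
% many steps, at a strategy of Min that admits no strict improvement with
% respect to a best response of Max.
```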

Corollary 4.21. In finite definable price-per-reward games, ε-optimal strategies are computable, provided that, under the optimal state strategies, every edge game admits rational ε-optimal moves.

Proof. Corollary 4.20 ensures that a solution to the optimality equations for average-price-per-reward games exists, and by Remark 4.8 this solution is definable. From this, by Corollary 4.11, we obtain definability of ε-optimal strategies. The existence of rational ε-optimal moves allows us to apply Proposition 2.7 to obtain the postulated computability.

Theorem 4.22. Finite definable average-price-per-reward games are decidable.

Proof. The set of equations $\mathrm{Opt}_{\mathrm{AvgPR}}(\Gamma)$ is finite, thus the set of solutions is a set of finite-dimensional vectors over the reals. Remark 4.8 ensures that $(G, B)$, such that
