
In document DIÁLOGO INTERRELIGIOSO (pages 73-78)

Finally, we address the task of computing the minimal probabilities for $\psi[r]$. The MDP $\mathcal{M}^{\mathrm{lo}}_{\min}$ is essentially the same as $\mathcal{M}^{\mathrm{lo}}_{\max}$, with two differences: the self-loop in the goal state is assigned reward 1, and the transition probabilities of the $\tau$-transitions from the $B$-states to the goal state are defined via the minimal probability for $\mathit{Acc}$ in $\mathcal{M}$.

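Formally, $\mathcal{M}^{\mathrm{lo}}_{\min} = (\tilde{S}, \mathit{Act} \cup \{\tau, \iota\}, P^{\mathrm{lo}}_{\min}, \widetilde{\mathit{rew}})$, where $\tilde{S}$ as well as the action set and the enabled actions are as for $\mathcal{M}^{\mathrm{lo}}_{\max}$. Likewise, $P^{\mathrm{lo}}_{\min}$ is as for $P^{\mathrm{lo}}_{\max}$, except for the probabilities in the $B$-states, i.e., for $s \in B$,

\[
\begin{aligned}
P^{\mathrm{lo}}_{\min}(s, \tau, \mathit{goal}) &= \mathrm{Pr}^{\min}_{\mathcal{M},s}(\mathit{Acc}) \\
P^{\mathrm{lo}}_{\min}(s, \tau, \mathit{fail}) &= 1 - P^{\mathrm{lo}}_{\min}(s, \tau, \mathit{goal}) \\
P^{\mathrm{lo}}_{\min}(s, \iota, s_c) &= 1
\end{aligned}
\]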

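The reward structure for $\mathcal{M}^{\mathrm{lo}}_{\min}$ is given by:

\[
\begin{aligned}
\widetilde{\mathit{rew}}(s, \mathit{act}) &= \mathit{rew}(s, \mathit{act}) && \text{for } s \in S \setminus B \\
\widetilde{\mathit{rew}}(s_c, \mathit{act}) &= \mathit{rew}(s, \mathit{act}) && \text{for } s \in B \\
\widetilde{\mathit{rew}}(s, \tau) &= 0 && \text{for } s \in B \cup \{\mathit{fail}\} \\
\widetilde{\mathit{rew}}(s, \iota) &= 0 && \text{for } s \in B \\
\widetilde{\mathit{rew}}(s, \tau) &= 1 && \text{for } s = \mathit{goal}
\end{aligned}
\]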
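The parts of the definition that are specific to $\mathcal{M}^{\mathrm{lo}}_{\min}$ can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the dictionary encoding, the names `b_state_transitions`, `reward_tilde` and `copy_of`, and the input `pr_min_acc` (the values $\mathrm{Pr}^{\min}_{\mathcal{M},s}(\mathit{Acc})$, assumed to be precomputed) are hypothetical and do not come from the text. The remaining transitions, which are as for $\mathcal{M}^{\mathrm{lo}}_{\max}$, are not modelled.

```python
TAU, IOTA = "tau", "iota"      # the two fresh actions
GOAL, FAIL = "goal", "fail"    # the two fresh trap states


def copy_of(s):
    """Hypothetical encoding of the copy s_c of a B-state s."""
    return (s, "c")


def b_state_transitions(B, pr_min_acc):
    """Transition probabilities of the B-states, as {(state, action): {target: prob}}.

    pr_min_acc[s] is assumed to hold the minimal probability for Acc from s in M.
    """
    P = {}
    for s in B:
        p_goal = pr_min_acc[s]
        P[(s, TAU)] = {GOAL: p_goal, FAIL: 1.0 - p_goal}
        P[(s, IOTA)] = {copy_of(s): 1.0}
    return P


def reward_tilde(B, rew):
    """Reward structure of the transformed MDP from rew = {(state, action): reward} of M."""
    out = {}
    for (s, act), value in rew.items():
        if s in B:
            out[(copy_of(s), act)] = value  # original actions of a B-state live on its copy
        else:
            out[(s, act)] = value
    for s in B:
        out[(s, TAU)] = 0
        out[(s, IOTA)] = 0
    out[(FAIL, TAU)] = 0
    out[(GOAL, TAU)] = 1  # positive-reward self-loop on goal
    return out
```

For a single $B$-state `"b"` with `pr_min_acc = {"b": 0.25}`, the $\tau$-action leads to `goal` with probability 0.25 and to `fail` with probability 0.75, while $\iota$ moves to the copy of `"b"` with probability 1.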

As for $\mathcal{M}^{\mathrm{lo}}_{\max}$ in the previous transformation, we can transform between paths in $\mathcal{M}$ and $\mathcal{M}^{\mathrm{lo}}_{\min}$.

Intuitively, $\mathcal{M}^{\mathrm{lo}}_{\min}$ can simulate $\mathcal{M}$-paths satisfying $\psi[r] = (A \,\mathrm{U}^{\geq r}\, B) \wedge \mathit{Acc}$ by paths satisfying $(A' \,\mathrm{U}^{\geq r}\, C)$, and vice versa, where $A'$ consists of the $A$-states, the $B$-states, those $B_c$-states $b_c$ where $b \in A$, and the goal state. The right-hand side $C$ of the until consists of all $B_c$-states and the goal state. In contrast to $\mathcal{M}^{\mathrm{lo}}_{\max}$, the goal state is included on the left side of the until to ensure (in conjunction with the self-loop on goal with reward 1) that every finite path $\tilde{\rho}$ that satisfies $(A' \,\mathrm{U}\, \mathit{goal})$ will be extended to an infinite path that satisfies $(A' \,\mathrm{U}^{\geq r}\, \mathit{goal})$, regardless of the concrete value of $r$. Without the positive reward loop for the goal state, a minimising scheduler could ensure by scheduling $\tau$ that every path fragment $\tilde{\rho}$ consisting of $A'$-states, ending in a $B$-state $b$ and having reward $\widetilde{\mathit{rew}}(\tilde{\rho}) < r$ is extended in such a way that $(A' \,\mathrm{U}^{\geq r}\, \mathit{goal})$ (and thus also $(A' \,\mathrm{U}^{\geq r}\, C)$) is never satisfied.

With the presented construction of $\mathcal{M}^{\mathrm{lo}}_{\min}$, a minimising scheduler has the following choice for continuing these paths: if it chooses $\tau$, $(A' \,\mathrm{U}^{\geq r}\, C)$ will be satisfied with probability $\mathrm{Pr}^{\min}_{\mathcal{M},b}(\mathit{Acc})$. Essentially, choosing $\tau$ focuses on minimising the probability of $\psi[r] = (A \,\mathrm{U}^{\geq r}\, B) \wedge \mathit{Acc}$ by minimising the probability only of $\mathit{Acc}$, ignoring the possibility of minimising the probability for $(A \,\mathrm{U}^{\geq r}\, B)$. The choice $\iota$ in this situation postpones the minimisation of the probability of $\mathit{Acc}$ to a later moment. For a path fragment $\tilde{\rho}$ consisting of $A'$-states, ending in a $B$-state $b$ and having reward $\widetilde{\mathit{rew}}(\tilde{\rho}) \geq r$ that has not yet satisfied $(A' \,\mathrm{U}^{\geq r}\, C)$, the choice of $\tau$ becomes attractive in all cases, because choosing $\iota$ will surely lead to satisfaction of $(A' \,\mathrm{U}^{\geq r}\, C)$: a $B_c$-state is reached in the next step.
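As a small worked illustration of the $\tau$-choice, consider a hypothetical value that is not taken from the text: suppose $\mathrm{Pr}^{\min}_{\mathcal{M},b}(\mathit{Acc}) = 0.3$ for some $B$-state $b$ reached by a path fragment $\tilde{\rho}$ of $A'$-states. Then

\[
P^{\mathrm{lo}}_{\min}(b, \tau, \mathit{goal}) = 0.3,
\qquad
P^{\mathrm{lo}}_{\min}(b, \tau, \mathit{fail}) = 1 - 0.3 = 0.7 .
\]

Once the goal state is reached, each traversal of its self-loop adds reward 1, so after $k$ loop steps the accumulated reward is

\[
\widetilde{\mathit{rew}}(\tilde{\rho}) + k \;\geq\; r
\quad\text{as soon as}\quad
k \;\geq\; r - \widetilde{\mathit{rew}}(\tilde{\rho}) .
\]

Under this assumption, choosing $\tau$ in $b$ satisfies $(A' \,\mathrm{U}^{\geq r}\, C)$ with probability $0.3$ for every value of $r$, since paths that move to the fail state never reach a $C$-state.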

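Lemma 3.6.3. For all states $s$ of $\mathcal{M}$ and all $r \in \mathbb{N}$:

\[
\mathrm{Pr}^{\min}_{\mathcal{M},s}\bigl((A \,\mathrm{U}^{\geq r}\, B) \wedge \mathit{Acc}\bigr)
= \mathrm{Pr}^{\min}_{\mathcal{M}^{\mathrm{lo}}_{\min},s}\bigl(A' \,\mathrm{U}^{\geq r}\, C\bigr)
\]

where $A' = A \cup B \cup \{b_c \in B_c : b \in A\} \cup \{\mathit{goal}\}$ and $C = B_c \cup \{\mathit{goal}\}$.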

Proof. As before, we will provide scheduler transformations in both directions.

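Part 1. We first show that the minimal probability for $(A \,\mathrm{U}^{\geq r}\, B) \wedge \mathit{Acc}$ in $\mathcal{M}$ is greater than or equal to the minimal probability for $A' \,\mathrm{U}^{\geq r}\, C$ in $\mathcal{M}^{\mathrm{lo}}_{\min}$. For this, we consider an arbitrary scheduler $\mathfrak{S}$ for $\mathcal{M}$. We define a scheduler $\mathfrak{T}$ for $\mathcal{M}^{\mathrm{lo}}_{\min}$ as in Part 1 of the proof of Lemma 3.6.2, i.e., $\mathfrak{T}$ first simulates $\mathfrak{S}$ and switches to scheduling $\tau$ continuously as soon as a finite path

\[
\tilde{\rho} = \tilde{s}_0\, \mathit{act}_0\, \tilde{s}_1\, \mathit{act}_1 \ldots \tilde{s}_n
\quad\text{where } \tilde{s}_0, \ldots, \tilde{s}_{n-1} \in A',\ \tilde{s}_n \in B \text{ and } \widetilde{\mathit{rew}}(\tilde{\rho}) \geq r
\]

has been generated.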

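With the above definition of $\mathfrak{T}$, there is no finite $\mathfrak{T}$-path of the form:

\[
\tilde{\rho} = \tilde{s}_0\, \mathit{act}_0\, \tilde{s}_1\, \mathit{act}_1 \ldots \tilde{s}_n
\quad\text{where } \tilde{s}_0, \ldots, \tilde{s}_{n-1} \in A',\ \tilde{s}_n \in B_c \text{ and } \widetilde{\mathit{rew}}(\tilde{\rho}) \geq r
\]

Note that for such paths we would have $\tilde{s}_{n-1} \in B$ and $\mathit{act}_{n-1} = \iota$, which conflicts with $\mathfrak{T}(\tilde{s}_0\, \mathit{act}_0\, \tilde{s}_1\, \mathit{act}_1 \ldots \mathit{act}_{n-2}\, \tilde{s}_{n-1}) = \tau$. This yields: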

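\[
\mathrm{Pr}^{\mathfrak{T}}_{\mathcal{M}^{\mathrm{lo}}_{\min},s}\bigl(A' \,\mathrm{U}^{\geq r}\, \mathit{goal}\bigr)
= \mathrm{Pr}^{\mathfrak{T}}_{\mathcal{M}^{\mathrm{lo}}_{\min},s}\bigl(A' \,\mathrm{U}^{\geq r}\, (\mathit{goal} \vee B_c)\bigr)
= \mathrm{Pr}^{\mathfrak{T}}_{\mathcal{M}^{\mathrm{lo}}_{\min},s}\bigl(A' \,\mathrm{U}^{\geq r}\, C\bigr)
\]

With a calculation as in the proof of Lemma 3.6.2 we get:

\[
\mathrm{Pr}^{\mathfrak{S}}_{\mathcal{M},s}\bigl((A \,\mathrm{U}^{\geq r}\, B) \wedge \mathit{Acc}\bigr)
\geq \mathrm{Pr}^{\mathfrak{T}}_{\mathcal{M}^{\mathrm{lo}}_{\min},s}\bigl(A' \,\mathrm{U}^{\geq r}\, \mathit{goal}\bigr)
\]

Hence, we get:

\[
\mathrm{Pr}^{\mathfrak{S}}_{\mathcal{M},s}\bigl((A \,\mathrm{U}^{\geq r}\, B) \wedge \mathit{Acc}\bigr)
\geq \mathrm{Pr}^{\mathfrak{T}}_{\mathcal{M}^{\mathrm{lo}}_{\min},s}\bigl(A' \,\mathrm{U}^{\geq r}\, C\bigr)
\]

As a consequence we obtain:

\[
\mathrm{Pr}^{\min}_{\mathcal{M},s}\bigl((A \,\mathrm{U}^{\geq r}\, B) \wedge \mathit{Acc}\bigr)
\geq \mathrm{Pr}^{\min}_{\mathcal{M}^{\mathrm{lo}}_{\min},s}\bigl(A' \,\mathrm{U}^{\geq r}\, C\bigr)
\]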

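Part 2. To prove that the minimal probability for $A' \,\mathrm{U}^{\geq r}\, C$ in $\mathcal{M}^{\mathrm{lo}}_{\min}$ is greater than or equal to the minimal probability for $(A \,\mathrm{U}^{\geq r}\, B) \wedge \mathit{Acc}$ in $\mathcal{M}$, we pick an arbitrary scheduler $\mathfrak{T}$ for $\mathcal{M}^{\mathrm{lo}}_{\min}$. Let $\mathfrak{S}_{\min}$ be a scheduler for $\mathcal{M}$ that minimises the probabilities for $\mathit{Acc}$ from all states. We now construct a scheduler $\mathfrak{S}$ for $\mathcal{M}$ as follows. In its initial mode, $\mathfrak{S}$ mimics $\mathfrak{T}$, provided that $\mathfrak{T}$ does not select action $\tau$. As soon as $\mathfrak{T}$ schedules action $\tau$, scheduler $\mathfrak{S}$ switches its mode and simulates $\mathfrak{S}_{\min}$ from then on. The goal is now to show that:

\[
\mathrm{Pr}^{\mathfrak{S}}_{\mathcal{M},s}\bigl((A \,\mathrm{U}^{\geq r}\, B) \wedge \mathit{Acc}\bigr)
\leq \mathrm{Pr}^{\mathfrak{T}}_{\mathcal{M}^{\mathrm{lo}}_{\min},s}\bigl(A' \,\mathrm{U}^{\geq r}\, C\bigr)
\]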

3.6 Quantiles under side conditions

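The set of $\mathfrak{T}$-paths in $\mathcal{M}^{\mathrm{lo}}_{\min}$ that satisfy $A' \,\mathrm{U}^{\geq r}\, C$ can be partitioned into two sets: those that satisfy $A' \,\mathrm{U}^{\geq r}\, B_c$ and those that satisfy $A' \,\mathrm{U}^{\geq r}\, \mathit{goal}$. Since a path may satisfy both $A' \,\mathrm{U}^{\geq r}\, B_c$ and $A' \,\mathrm{U}^{\geq r}\, \mathit{goal}$, i.e., with a prefix that satisfies $A' \,\mathrm{U}^{\geq r}\, B_c$ before the $\tau$-action is scheduled at a later point, we partition according to which of the two path formulas is satisfied first. Formally, let $\widetilde{\mathit{FP}}[A' \,\mathrm{U}^{\geq r}\, B_c]$ denote the set of finite $\mathfrak{T}$-paths of the form

\[
\tilde{\rho} = \tilde{s}_0\, \mathit{act}_0\, \tilde{s}_1\, \mathit{act}_1 \ldots \tilde{s}_n
\quad\text{where } \tilde{s}_n \in B_c,\ \tilde{s}_0, \ldots, \tilde{s}_{n-1} \in A' \text{ and } \widetilde{\mathit{rew}}(\tilde{\rho}) \geq r
\]

such that no proper prefix of $\tilde{\rho}$ belongs to $\widetilde{\mathit{FP}}[A' \,\mathrm{U}^{\geq r}\, B_c]$, i.e., if $m < n$ and $\tilde{s}_m \in B_c$ then $\widetilde{\mathit{rew}}(\tilde{s}_0\, \mathit{act}_0\, \tilde{s}_1\, \mathit{act}_1 \ldots \mathit{act}_{m-1}\, \tilde{s}_m) < r$.

Let $\widetilde{\mathit{FP}}_{\tau}[A' \,\mathrm{U}\, B]$ be the set of $\mathfrak{T}$-paths $\tilde{\rho}$ in $\mathcal{M}^{\mathrm{lo}}_{\min}$ of the form

\[
\tilde{\rho} = \tilde{s}_0\, \mathit{act}_0\, \tilde{s}_1\, \mathit{act}_1 \ldots \mathit{act}_{n-1}\, \tilde{s}_n
\quad\text{with } \tilde{s}_0, \ldots, \tilde{s}_{n-1} \in A',\ \tilde{s}_n \in B,
\]

with $\mathfrak{T}(\tilde{\rho}) = \tau$, i.e., paths up to the $B$-state where $\mathfrak{T}$ schedules $\tau$ for the first time, and which do not have a prefix in $\widetilde{\mathit{FP}}[A' \,\mathrm{U}^{\geq r}\, B_c]$. Obviously, no path $\tilde{\rho} \in \widetilde{\mathit{FP}}_{\tau}[A' \,\mathrm{U}\, B]$ is a proper prefix of some other path in $\widetilde{\mathit{FP}}_{\tau}[A' \,\mathrm{U}\, B]$, and all infinite $\mathfrak{T}$-paths that satisfy $A' \,\mathrm{U}^{\geq r}\, \mathit{goal}$ have a prefix in $\widetilde{\mathit{FP}}_{\tau}[A' \,\mathrm{U}\, B]$. For $t \in B$, we write $\widetilde{\mathit{FP}}_{\tau}[A' \,\mathrm{U}\, t]$ to denote the set of paths in $\widetilde{\mathit{FP}}_{\tau}[A' \,\mathrm{U}\, B]$ ending in $t$, with an equivalent notation for $\widetilde{\mathit{FP}}[A' \,\mathrm{U}^{\geq r}\, t]$. Then:

\[
\mathrm{Pr}^{\mathfrak{T}}_{\mathcal{M}^{\mathrm{lo}}_{\min},s}\bigl(A' \,\mathrm{U}^{\geq r}\, C\bigr)
= \mathrm{Pr}^{\mathfrak{T}}_{\mathcal{M}^{\mathrm{lo}}_{\min},s}\bigl(\widetilde{\mathit{FP}}[A' \,\mathrm{U}^{\geq r}\, B_c]\bigr)
+ \sum_{t \in B}\ \sum_{\tilde{\rho} \in \widetilde{\mathit{FP}}_{\tau}[A' \,\mathrm{U}\, t]} \mathrm{Pr}(\tilde{\rho}) \cdot P^{\mathrm{lo}}_{\min}(t, \tau, \mathit{goal})
\]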

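Let $\mathit{FP}[A \,\mathrm{U}^{\geq r}\, B]$ be the set of finite $\mathcal{M}$-paths $\rho$ such that $\tilde{\rho}|_{\mathcal{M}} = \rho$ for some $\tilde{\rho} \in \widetilde{\mathit{FP}}[A' \,\mathrm{U}^{\geq r}\, B_c]$, and let $\mathit{FP}_{\tau}[A \,\mathrm{U}\, B]$ be the set of finite $\mathcal{M}$-paths $\rho$ such that $\tilde{\rho}|_{\mathcal{M}} = \rho$ for some $\tilde{\rho} \in \widetilde{\mathit{FP}}_{\tau}[A' \,\mathrm{U}\, B]$. No path in $\mathit{FP}[A \,\mathrm{U}^{\geq r}\, B]$ has a prefix in $\mathit{FP}_{\tau}[A \,\mathrm{U}\, B]$ and vice versa. Additionally, all paths in $\mathit{FP}[A \,\mathrm{U}^{\geq r}\, B]$ or $\mathit{FP}_{\tau}[A \,\mathrm{U}\, B]$ are $\mathfrak{S}$-paths.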

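In particular, all $\mathfrak{S}$-paths that satisfy $(A \,\mathrm{U}^{\geq r}\, B) \wedge \mathit{Acc}$ have a prefix in either $\mathit{FP}[A \,\mathrm{U}^{\geq r}\, B]$ or $\mathit{FP}_{\tau}[A \,\mathrm{U}\, B]$. However, not all infinite $\mathfrak{S}$-paths that have a prefix in $\mathit{FP}[A \,\mathrm{U}^{\geq r}\, B]$ satisfy $(A \,\mathrm{U}^{\geq r}\, B) \wedge \mathit{Acc}$, as $\mathit{Acc}$ is not guaranteed to be satisfied. Likewise, not all infinite $\mathfrak{S}$-paths that satisfy $\mathit{Acc}$ with a prefix in $\mathit{FP}_{\tau}[A \,\mathrm{U}\, B]$ satisfy $(A \,\mathrm{U}^{\geq r}\, B) \wedge \mathit{Acc}$, as it is not guaranteed that $(A \,\mathrm{U}^{\geq r}\, B)$ holds. Thus, we have

\[
\begin{aligned}
\mathrm{Pr}^{\mathfrak{S}}_{\mathcal{M},s}\bigl((A \,\mathrm{U}^{\geq r}\, B) \wedge \mathit{Acc}\bigr)
&= \mathrm{Pr}^{\mathfrak{S}}_{\mathcal{M},s}\bigl(\mathit{FP}[A \,\mathrm{U}^{\geq r}\, B] \wedge \mathit{Acc}\bigr)
 + \mathrm{Pr}^{\mathfrak{S}}_{\mathcal{M},s}\bigl(\mathit{FP}_{\tau}[A \,\mathrm{U}\, B] \wedge (A \,\mathrm{U}^{\geq r}\, B) \wedge \mathit{Acc}\bigr) \\
&\leq \mathrm{Pr}^{\mathfrak{S}}_{\mathcal{M},s}\bigl(\mathit{FP}[A \,\mathrm{U}^{\geq r}\, B]\bigr)
 + \mathrm{Pr}^{\mathfrak{S}}_{\mathcal{M},s}\bigl(\mathit{FP}_{\tau}[A \,\mathrm{U}\, B] \wedge \mathit{Acc}\bigr) \\
&= \mathrm{Pr}^{\mathfrak{S}}_{\mathcal{M},s}\bigl(\mathit{FP}[A \,\mathrm{U}^{\geq r}\, B]\bigr)
 + \sum_{t \in B}\ \sum_{\rho \in \mathit{FP}_{\tau}[A \,\mathrm{U}\, t]} \mathrm{Pr}(\rho) \cdot \mathrm{Pr}^{\mathfrak{S}}_{\mathcal{M},t}(\mathit{Acc}) \\
&= \mathrm{Pr}^{\mathfrak{S}}_{\mathcal{M},s}\bigl(\mathit{FP}[A \,\mathrm{U}^{\geq r}\, B]\bigr)
 + \sum_{t \in B}\ \sum_{\rho \in \mathit{FP}_{\tau}[A \,\mathrm{U}\, t]} \mathrm{Pr}(\rho) \cdot \mathrm{Pr}^{\mathfrak{S}_{\min}}_{\mathcal{M},t}(\mathit{Acc}) \\
&= \mathrm{Pr}^{\mathfrak{T}}_{\mathcal{M}^{\mathrm{lo}}_{\min},s}\bigl(\widetilde{\mathit{FP}}[A' \,\mathrm{U}^{\geq r}\, B_c]\bigr)
 + \sum_{t \in B}\ \sum_{\tilde{\rho} \in \widetilde{\mathit{FP}}_{\tau}[A' \,\mathrm{U}\, t]} \mathrm{Pr}(\tilde{\rho}) \cdot P^{\mathrm{lo}}_{\min}(t, \tau, \mathit{goal}) \\
&= \mathrm{Pr}^{\mathfrak{T}}_{\mathcal{M}^{\mathrm{lo}}_{\min},s}\bigl(A' \,\mathrm{U}^{\geq r}\, C\bigr)
\end{aligned}
\]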

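Thus, for every scheduler $\mathfrak{T}$ in $\mathcal{M}^{\mathrm{lo}}_{\min}$ we can construct a scheduler $\mathfrak{S}$ in $\mathcal{M}$ with

\[
\mathrm{Pr}^{\mathfrak{S}}_{\mathcal{M},s}\bigl((A \,\mathrm{U}^{\geq r}\, B) \wedge \mathit{Acc}\bigr)
\leq \mathrm{Pr}^{\mathfrak{T}}_{\mathcal{M}^{\mathrm{lo}}_{\min},s}\bigl(A' \,\mathrm{U}^{\geq r}\, C\bigr)
\]

Hence, we get:

\[
\mathrm{Pr}^{\min}_{\mathcal{M},s}\bigl((A \,\mathrm{U}^{\geq r}\, B) \wedge \mathit{Acc}\bigr)
\leq \mathrm{Pr}^{\min}_{\mathcal{M}^{\mathrm{lo}}_{\min},s}\bigl(A' \,\mathrm{U}^{\geq r}\, C\bigr)
\]

This completes the proof of Lemma 3.6.3.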

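As a consequence of Lemma 3.6.2 and Lemma 3.6.3, the transformations $\mathcal{M} \mapsto \mathcal{M}^{\mathrm{lo}}_{\min}$ and $\mathcal{M} \mapsto \mathcal{M}^{\mathrm{lo}}_{\max}$ make it possible to apply the methods presented in Section 3.4 to the computation of quantiles for lower reward-bounded until properties under side conditions:

\[
\begin{aligned}
\mathrm{qu}_{\mathcal{M},s}\bigl(\exists \mathrm{P}_{\trianglerighteq p}((A \,\mathrm{U}^{\geq ?}\, B) \wedge \mathit{Acc})\bigr)
&= \mathrm{qu}_{\mathcal{M}^{\mathrm{lo}}_{\max},s}\bigl(\exists \mathrm{P}_{\trianglerighteq p}(A' \,\mathrm{U}^{\geq ?}\, \mathit{goal})\bigr) \\
\mathrm{qu}_{\mathcal{M},s}\bigl(\forall \mathrm{P}_{\trianglerighteq p}((A \,\mathrm{U}^{\geq ?}\, B) \wedge \mathit{Acc})\bigr)
&= \mathrm{qu}_{\mathcal{M}^{\mathrm{lo}}_{\min},s}\bigl(\forall \mathrm{P}_{\trianglerighteq p}(A' \,\mathrm{U}^{\geq ?}\, C)\bigr)
\end{aligned}
\]

where $A'$ and $C$ are defined as in Lemma 3.6.2 and Lemma 3.6.3, respectively. Note that the definition of $A'$ differs slightly between the two cases.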
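To make the role of the quantile concrete, the following Python sketch searches for the largest reward bound whose probability still meets the threshold. It is a naive linear search, not the method of Section 3.4, and it rests on assumptions that are not stated in this section: that the quantile for a lower reward bound is the largest $r$ for which the probability bound holds, and that a hypothetical oracle `prob_at(r)` returns the extremal probability of the transformed until formula for a given $r$. The name `lower_bound_quantile` and the cut-off `r_max` are likewise illustrative.

```python
from typing import Callable, Optional


def lower_bound_quantile(prob_at: Callable[[int], float], p: float,
                         r_max: int, strict: bool = False) -> Optional[int]:
    """Largest r in 0..r_max with prob_at(r) >= p (or > p if strict).

    Relies on prob_at being non-increasing in r, which holds for lower
    reward bounds: a path reaching the target with reward >= r + 1 also
    reaches it with reward >= r. Returns None if even r = 0 fails, and
    r_max if the threshold is met everywhere in the searched range.
    """
    def meets(r: int) -> bool:
        value = prob_at(r)
        return value > p if strict else value >= p

    if not meets(0):
        return None
    best = 0
    for r in range(1, r_max + 1):
        if not meets(r):
            break  # monotonicity: no larger r can meet the threshold
        best = r
    return best
```

A result equal to `r_max` only says that the threshold holds throughout the searched range, so the true value may be larger or unbounded.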
