
Finally, we address the task of computing the minimal probabilities for $\psi[r]$. The MDP $\mathcal{M}^{\mathrm{lo}}_{\min}$ is essentially the same as $\mathcal{M}^{\mathrm{lo}}_{\max}$, with two differences: the self-loop in the goal state is assigned reward 1, and the transition probabilities of the $\tau$-transitions from the $B$-states to the goal state are defined via the minimal probability for $\mathit{Acc}$ in $\mathcal{M}$.

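Formally, $\mathcal{M}^{\mathrm{lo}}_{\min} = (\tilde{S}, \mathit{Act} \cup \{\tau, \iota\}, P^{\mathrm{lo}}_{\min}, \widetilde{\mathit{rew}})$, where $\tilde{S}$ as well as the action set and the enabled actions are as for $\mathcal{M}^{\mathrm{lo}}_{\max}$. Likewise, $P^{\mathrm{lo}}_{\min}$ is as for $P^{\mathrm{lo}}_{\max}$, except for the probabilities in the $B$-states, i.e., for $s \in B$:

\begin{align*}
P^{\mathrm{lo}}_{\min}(s, \tau, \mathit{goal}) &= \mathrm{Pr}^{\min}_{\mathcal{M},s}(\mathit{Acc}) \\
P^{\mathrm{lo}}_{\min}(s, \tau, \mathit{fail}) &= 1 - P^{\mathrm{lo}}_{\min}(s, \tau, \mathit{goal}) \\
P^{\mathrm{lo}}_{\min}(s, \iota, s_c) &= 1
\end{align*}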

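The reward structure for $\mathcal{M}^{\mathrm{lo}}_{\min}$ is given by:

\begin{align*}
\widetilde{\mathit{rew}}(s, \mathit{act}) &= \mathit{rew}(s, \mathit{act}) && \text{for } s \in S \setminus B \\
\widetilde{\mathit{rew}}(s_c, \mathit{act}) &= \mathit{rew}(s, \mathit{act}) && \text{for } s \in B \\
\widetilde{\mathit{rew}}(s, \tau) &= 0 && \text{for } s \in B \cup \{\mathit{fail}\} \\
\widetilde{\mathit{rew}}(s, \iota) &= 0 && \text{for } s \in B \\
\widetilde{\mathit{rew}}(s, \tau) &= 1 && \text{for } s = \mathit{goal}
\end{align*}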
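The construction can be made concrete with a small program. The following Python sketch is illustrative only and rests on assumptions that this section does not spell out: the MDP M is given as dictionaries keyed by (state, action) pairs, the copy of a B-state b is encoded as the pair (b, "c"), a copy state inherits the outgoing transitions of its original (as the reward definition for copy states suggests), the fail state carries a tau self-loop, and the values of the minimal probability for Acc are supplied precomputed in `pr_min_acc`. The names `build_lo_min`, `copy_of` and `pr_min_acc` are hypothetical.

```python
TAU, IOTA, GOAL, FAIL = "tau", "iota", "goal", "fail"


def copy_of(state):
    """Hypothetical encoding of the copy state s_c of a B-state s."""
    return (state, "c")


def build_lo_min(P, rew, B, pr_min_acc):
    """Build transition probabilities and rewards of the transformed MDP.

    P:          dict (state, action) -> dict successor -> probability
    rew:        dict (state, action) -> reward of that state-action pair
    B:          set of B-states of the original MDP
    pr_min_acc: dict B-state -> minimal probability of Acc (assumed given)
    """
    P_t, rew_t = {}, {}
    # Original actions: unchanged outside B, moved to the copy state for B.
    for (s, act), dist in P.items():
        src = copy_of(s) if s in B else s
        P_t[(src, act)] = dict(dist)
        rew_t[(src, act)] = rew[(s, act)]
    # In a B-state only tau (stop and bet on Acc) and iota (continue) remain.
    for s in B:
        p = pr_min_acc[s]
        P_t[(s, TAU)] = {GOAL: p, FAIL: 1.0 - p}
        P_t[(s, IOTA)] = {copy_of(s): 1.0}
        rew_t[(s, TAU)] = 0
        rew_t[(s, IOTA)] = 0
    # goal loops with reward 1; fail loops with reward 0 (loop is assumed).
    P_t[(GOAL, TAU)] = {GOAL: 1.0}
    rew_t[(GOAL, TAU)] = 1
    P_t[(FAIL, TAU)] = {FAIL: 1.0}
    rew_t[(FAIL, TAU)] = 0
    return P_t, rew_t
```

For a two-state MDP with B = {"b"} and a minimal Acc-probability of 0.25 in "b", the tau-transition from "b" reaches goal with probability 0.25 and fail with probability 0.75, while the original action of "b" is only available in its copy.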

As for $\mathcal{M}^{\mathrm{lo}}_{\max}$ in the previous transformation, we can transform between paths in $\mathcal{M}$ and $\mathcal{M}^{\mathrm{lo}}_{\min}$.

Intuitively, $\mathcal{M}^{\mathrm{lo}}_{\min}$ can simulate $\mathcal{M}$-paths satisfying $\psi[r] = (A \,\mathrm{U}^{\geq r} B) \wedge \mathit{Acc}$ by paths satisfying $(A' \,\mathrm{U}^{\geq r} C)$, and vice versa, where $A'$ consists of the $A$-states, the $B$-states, those $B_c$-states $b_c$ where $b \in A$, and the goal state. The right-hand side $C$ of the until consists of all $B_c$-states and the goal state. In contrast to $\mathcal{M}^{\mathrm{lo}}_{\max}$, the goal state is included on the left side of the until to ensure (in conjunction with the self-loop on goal with reward 1) that every finite path $\tilde{\rho}$ that satisfies $(A' \,\mathrm{U}\, \mathit{goal})$ will be extended to an infinite path that satisfies $(A' \,\mathrm{U}^{\geq r} \mathit{goal})$, regardless of the concrete value of $r$. Without the positive reward loop for the goal state, a minimising scheduler could ensure by scheduling $\tau$ that every path fragment $\tilde{\rho}$ consisting of $A'$-states, ending in a $B$-state $b$ and having reward $\widetilde{\mathit{rew}}(\tilde{\rho}) < r$ is extended in such a way that $(A' \,\mathrm{U}^{\geq r} \mathit{goal})$ (and thus also $(A' \,\mathrm{U}^{\geq r} C)$) is never satisfied.

With the presented construction of $\mathcal{M}^{\mathrm{lo}}_{\min}$, a minimising scheduler has the following choice for continuing these paths: if it chooses $\tau$, $(A' \,\mathrm{U}^{\geq r} C)$ will be satisfied with probability $\mathrm{Pr}^{\min}_{\mathcal{M},b}(\mathit{Acc})$. Essentially, choosing $\tau$ focuses on minimising the probability of $\psi[r] = (A \,\mathrm{U}^{\geq r} B) \wedge \mathit{Acc}$ by minimising the probability only of $\mathit{Acc}$, ignoring the possibility of minimising the probability for $(A \,\mathrm{U}^{\geq r} B)$. The choice $\iota$ in this situation postpones the minimisation of the probability of $\mathit{Acc}$ to a later moment. For a path fragment consisting of $A'$-states, ending in a $B$-state $b$ and having reward $\widetilde{\mathit{rew}}(\tilde{\rho}) \geq r$ that has not yet satisfied $(A' \,\mathrm{U}^{\geq r} C)$, the choice of $\tau$ becomes attractive in all cases, as choosing $\iota$ will surely lead to satisfaction of $(A' \,\mathrm{U}^{\geq r} C)$, since a $B_c$-state is reached in the next step.
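The effect of the reward-1 self-loop can be checked with a one-line calculation. As a sketch, take a finite path $\tilde{\rho}$ ending in a $B$-state in which $\tau$ is scheduled and leads to goal, followed by $k$ iterations of the goal self-loop; the counter $k$ is introduced here only for illustration. The $\tau$-step out of the $B$-state has reward 0 and every loop iteration has reward 1, so:

```latex
\[
\widetilde{\mathit{rew}}\bigl(\tilde{\rho}\;\tau\;\mathit{goal}\;(\tau\;\mathit{goal})^{k}\bigr)
  \;=\; \widetilde{\mathit{rew}}(\tilde{\rho}) + 0 + k \cdot 1
  \;\geq\; r
  \quad\Longleftrightarrow\quad
  k \;\geq\; r - \widetilde{\mathit{rew}}(\tilde{\rho})
\]
```

Hence the reward bound is met after at most $\max(0,\, r - \widetilde{\mathit{rew}}(\tilde{\rho}))$ loop iterations, whatever the value of $r$. With reward 0 on the loop the accumulated reward would stay at $\widetilde{\mathit{rew}}(\tilde{\rho})$ forever.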

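Lemma 3.6.3. For all states $s$ of $\mathcal{M}$ and all $r \in \mathbb{N}$:

\[
\mathrm{Pr}^{\min}_{\mathcal{M},s}\bigl((A \,\mathrm{U}^{\geq r} B) \wedge \mathit{Acc}\bigr) = \mathrm{Pr}^{\min}_{\mathcal{M}^{\mathrm{lo}}_{\min},s}\bigl(A' \,\mathrm{U}^{\geq r} C\bigr)
\]

where $A' = A \cup B \cup \{b_c \in B_c : b \in A\} \cup \{\mathit{goal}\}$ and $C = B_c \cup \{\mathit{goal}\}$.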

Proof. As before, we will provide scheduler transformations in both directions.

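Part 1. We first show that the minimal probability for $(A \,\mathrm{U}^{\geq r} B) \wedge \mathit{Acc}$ in $\mathcal{M}$ is greater than or equal to the minimal probability for $A' \,\mathrm{U}^{\geq r} C$ in $\mathcal{M}^{\mathrm{lo}}_{\min}$. For this, we consider an arbitrary scheduler $\mathfrak{S}$ for $\mathcal{M}$. We define a scheduler $\mathfrak{T}$ for $\mathcal{M}^{\mathrm{lo}}_{\min}$ as in Part 1 of the proof of Lemma 3.6.2, i.e., $\mathfrak{T}$ first simulates $\mathfrak{S}$ and switches to scheduling $\tau$ continuously as soon as a finite path

\[
\tilde{\rho} = \tilde{s}_0\, \mathit{act}_0\, \tilde{s}_1\, \mathit{act}_1 \ldots \tilde{s}_n \quad \text{where } \tilde{s}_0, \ldots, \tilde{s}_{n-1} \in A',\ \tilde{s}_n \in B \text{ and } \widetilde{\mathit{rew}}(\tilde{\rho}) \geq r
\]

has been generated.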

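With the above definition of $\mathfrak{T}$, there is no finite $\mathfrak{T}$-path of the form:

\[
\tilde{\rho} = \tilde{s}_0\, \mathit{act}_0\, \tilde{s}_1\, \mathit{act}_1 \ldots \tilde{s}_n \quad \text{where } \tilde{s}_0, \ldots, \tilde{s}_{n-1} \in A',\ \tilde{s}_n \in B_c \text{ and } \widetilde{\mathit{rew}}(\tilde{\rho}) \geq r
\]

Note that for such paths we would have $\tilde{s}_{n-1} \in B$ and $\mathit{act}_{n-1} = \iota$, which conflicts with $\mathfrak{T}(\tilde{s}_0\, \mathit{act}_0\, \tilde{s}_1\, \mathit{act}_1 \ldots \mathit{act}_{n-2}\, \tilde{s}_{n-1}) = \tau$. This yields: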

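\[
\mathrm{Pr}^{\mathfrak{T}}_{\mathcal{M}^{\mathrm{lo}}_{\min},s}\bigl(A' \,\mathrm{U}^{\geq r} \mathit{goal}\bigr)
= \mathrm{Pr}^{\mathfrak{T}}_{\mathcal{M}^{\mathrm{lo}}_{\min},s}\bigl(A' \,\mathrm{U}^{\geq r} (\mathit{goal} \vee B_c)\bigr)
= \mathrm{Pr}^{\mathfrak{T}}_{\mathcal{M}^{\mathrm{lo}}_{\min},s}\bigl(A' \,\mathrm{U}^{\geq r} C\bigr)
\]

With a calculation as in the proof of Lemma 3.6.2 we get:

\[
\mathrm{Pr}^{\mathfrak{S}}_{\mathcal{M},s}\bigl((A \,\mathrm{U}^{\geq r} B) \wedge \mathit{Acc}\bigr) \geq \mathrm{Pr}^{\mathfrak{T}}_{\mathcal{M}^{\mathrm{lo}}_{\min},s}\bigl(A' \,\mathrm{U}^{\geq r} \mathit{goal}\bigr)
\]

Hence, we get:

\[
\mathrm{Pr}^{\mathfrak{S}}_{\mathcal{M},s}\bigl((A \,\mathrm{U}^{\geq r} B) \wedge \mathit{Acc}\bigr) \geq \mathrm{Pr}^{\mathfrak{T}}_{\mathcal{M}^{\mathrm{lo}}_{\min},s}\bigl(A' \,\mathrm{U}^{\geq r} C\bigr)
\]

As a consequence we obtain:

\[
\mathrm{Pr}^{\min}_{\mathcal{M},s}\bigl((A \,\mathrm{U}^{\geq r} B) \wedge \mathit{Acc}\bigr) \geq \mathrm{Pr}^{\min}_{\mathcal{M}^{\mathrm{lo}}_{\min},s}\bigl(A' \,\mathrm{U}^{\geq r} C\bigr)
\]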

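Part 2. To prove that the minimal probability for $A' \,\mathrm{U}^{\geq r} C$ in $\mathcal{M}^{\mathrm{lo}}_{\min}$ is greater than or equal to the minimal probability for $(A \,\mathrm{U}^{\geq r} B) \wedge \mathit{Acc}$ in $\mathcal{M}$, we pick an arbitrary scheduler $\mathfrak{T}$ for $\mathcal{M}^{\mathrm{lo}}_{\min}$. Let $\mathfrak{S}_{\min}$ be a scheduler for $\mathcal{M}$ that minimises the probabilities for $\mathit{Acc}$ from all states. We now construct a scheduler $\mathfrak{S}$ for $\mathcal{M}$ as follows. In its initial mode, $\mathfrak{S}$ mimics $\mathfrak{T}$, provided that $\mathfrak{T}$ does not select action $\tau$. As soon as $\mathfrak{T}$ schedules action $\tau$, scheduler $\mathfrak{S}$ switches its mode and simulates $\mathfrak{S}_{\min}$ from then on. The goal is now to show that:

\[
\mathrm{Pr}^{\mathfrak{S}}_{\mathcal{M},s}\bigl((A \,\mathrm{U}^{\geq r} B) \wedge \mathit{Acc}\bigr) \leq \mathrm{Pr}^{\mathfrak{T}}_{\mathcal{M}^{\mathrm{lo}}_{\min},s}\bigl(A' \,\mathrm{U}^{\geq r} C\bigr)
\]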
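The two-mode scheduler of Part 2 can be pictured as a small program. The following Python sketch is illustrative only and makes several assumptions beyond the text: a finite path is passed as a list of states plus a list of actions, the scheduler T is a function on finite paths of the transformed MDP (given as a tuple alternating states and actions), the scheduler S_min is a function on the path suffix generated since the mode switch, T chooses only tau or iota in a B-state, and the copy of a B-state b is encoded as (b, "c"). The names `make_scheduler_S` and `copy_of` are hypothetical.

```python
TAU, IOTA = "tau", "iota"


def copy_of(state):
    """Hypothetical encoding of the copy state s_c of a B-state s."""
    return (state, "c")


def make_scheduler_S(T, s_min, B):
    """Return a scheduler S for the original MDP built from T and s_min.

    T:     function on a tuple (s0, act0, s1, ...) of the transformed MDP
    s_min: function (states_suffix, actions_suffix) -> action, assumed to
           minimise the probability of Acc from the first state of the suffix
    B:     set of B-states of the original MDP
    """
    def S(states, actions):
        lifted = [states[0]]  # corresponding path in the transformed MDP
        for i, s in enumerate(states):
            if s in B:
                # First time T picks tau: switch mode and follow s_min
                # on the suffix that starts in this B-state.
                if T(tuple(lifted)) == TAU:
                    return s_min(tuple(states[i:]), tuple(actions[i:]))
                # Otherwise T picked iota, which moves to the copy state.
                lifted += [IOTA, copy_of(s)]
            if i == len(states) - 1:
                return T(tuple(lifted))  # initial mode: mimic T
            lifted += [actions[i], states[i + 1]]
    return S
```

The design point is that the mode is recomputed from the path itself, so S is an ordinary history-dependent scheduler for the original MDP and needs no extra state of its own.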

3.6 Quantiles under side conditions

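The set of $\mathfrak{T}$-paths in $\mathcal{M}^{\mathrm{lo}}_{\min}$ that satisfy $A' \,\mathrm{U}^{\geq r} C$ can be partitioned into two sets, those that satisfy $A' \,\mathrm{U}^{\geq r} B_c$ and those that satisfy $A' \,\mathrm{U}^{\geq r} \mathit{goal}$. As a path can satisfy both $A' \,\mathrm{U}^{\geq r} B_c$ and $A' \,\mathrm{U}^{\geq r} \mathit{goal}$, i.e., with a prefix that satisfies $A' \,\mathrm{U}^{\geq r} B_c$ before the $\tau$-action is scheduled at a later point, we partition according to which of the two path formulas is satisfied first. Formally, let $\widetilde{\mathrm{FP}}[A' \,\mathrm{U}^{\geq r} B_c]$ denote the set of finite $\mathfrak{T}$-paths of the form

\[
\tilde{\rho} = \tilde{s}_0\, \mathit{act}_0\, \tilde{s}_1\, \mathit{act}_1 \ldots \tilde{s}_n \quad \text{where } \tilde{s}_n \in B_c,\ \tilde{s}_0, \ldots, \tilde{s}_{n-1} \in A' \text{ and } \widetilde{\mathit{rew}}(\tilde{\rho}) \geq r
\]

such that no proper prefix of $\tilde{\rho}$ belongs to $\widetilde{\mathrm{FP}}[A' \,\mathrm{U}^{\geq r} B_c]$, i.e., if $m < n$ and $\tilde{s}_m \in B_c$ then $\widetilde{\mathit{rew}}(\tilde{s}_0\, \mathit{act}_0\, \tilde{s}_1\, \mathit{act}_1 \ldots \mathit{act}_{m-1}\, \tilde{s}_m) < r$. Let $\widetilde{\mathrm{FP}}_\tau[A' \,\mathrm{U}\, B]$ be the set of $\mathfrak{T}$-paths $\tilde{\rho}$ in $\mathcal{M}^{\mathrm{lo}}_{\min}$ of the form

\[
\tilde{\rho} = \tilde{s}_0\, \mathit{act}_0\, \tilde{s}_1\, \mathit{act}_1 \ldots \mathit{act}_{n-1}\, \tilde{s}_n \quad \text{with } \tilde{s}_0, \ldots, \tilde{s}_{n-1} \in A',\ \tilde{s}_n \in B
\]

with $\mathfrak{T}(\tilde{\rho}) = \tau$, i.e., the paths up to the $B$-state where $\mathfrak{T}$ schedules $\tau$ for the first time, and which do not have a prefix in $\widetilde{\mathrm{FP}}[A' \,\mathrm{U}^{\geq r} B_c]$. Obviously, no path $\tilde{\rho} \in \widetilde{\mathrm{FP}}_\tau[A' \,\mathrm{U}\, B]$ is a proper prefix of some other path in $\widetilde{\mathrm{FP}}_\tau[A' \,\mathrm{U}\, B]$, and all infinite $\mathfrak{T}$-paths that satisfy $A' \,\mathrm{U}^{\geq r} \mathit{goal}$ have a prefix in $\widetilde{\mathrm{FP}}_\tau[A' \,\mathrm{U}\, B]$. For $t \in B$, we write $\widetilde{\mathrm{FP}}_\tau[A' \,\mathrm{U}\, t]$ to denote the set of paths in $\widetilde{\mathrm{FP}}_\tau[A' \,\mathrm{U}\, B]$ ending in $t$, with an equivalent notation for $\widetilde{\mathrm{FP}}[A' \,\mathrm{U}^{\geq r} t]$. Then:

\[
\mathrm{Pr}^{\mathfrak{T}}_{\mathcal{M}^{\mathrm{lo}}_{\min},s}\bigl(A' \,\mathrm{U}^{\geq r} C\bigr)
= \mathrm{Pr}^{\mathfrak{T}}_{\mathcal{M}^{\mathrm{lo}}_{\min},s}\bigl(\widetilde{\mathrm{FP}}[A' \,\mathrm{U}^{\geq r} B_c]\bigr)
+ \sum_{t \in B} \ \sum_{\tilde{\rho} \in \widetilde{\mathrm{FP}}_\tau[A' \mathrm{U}\, t]} \mathrm{Pr}(\tilde{\rho}) \cdot P^{\mathrm{lo}}_{\min}(t, \tau, \mathit{goal})
\]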

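Let $\mathrm{FP}[A \,\mathrm{U}^{\geq r} B]$ be the set of finite $\mathcal{M}$-paths $\rho$ with $\tilde{\rho}|_{\mathcal{M}} = \rho$ and $\tilde{\rho} \in \widetilde{\mathrm{FP}}[A' \,\mathrm{U}^{\geq r} B_c]$, and let $\mathrm{FP}_\tau[A \,\mathrm{U}\, B]$ be the set of finite $\mathcal{M}$-paths $\rho$ with $\tilde{\rho}|_{\mathcal{M}} = \rho$ and $\tilde{\rho} \in \widetilde{\mathrm{FP}}_\tau[A' \,\mathrm{U}\, B]$. No path in $\mathrm{FP}[A \,\mathrm{U}^{\geq r} B]$ has a prefix in $\mathrm{FP}_\tau[A \,\mathrm{U}\, B]$ and vice versa. Additionally, all paths in $\mathrm{FP}[A \,\mathrm{U}^{\geq r} B]$ or $\mathrm{FP}_\tau[A \,\mathrm{U}\, B]$ are $\mathfrak{S}$-paths.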

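In particular, all $\mathfrak{S}$-paths that satisfy $(A \,\mathrm{U}^{\geq r} B) \wedge \mathit{Acc}$ have a prefix in either $\mathrm{FP}[A \,\mathrm{U}^{\geq r} B]$ or $\mathrm{FP}_\tau[A \,\mathrm{U}\, B]$. However, not all infinite $\mathfrak{S}$-paths that have a prefix in $\mathrm{FP}[A \,\mathrm{U}^{\geq r} B]$ satisfy $(A \,\mathrm{U}^{\geq r} B) \wedge \mathit{Acc}$, as $\mathit{Acc}$ is not guaranteed to be satisfied. Likewise, not all infinite $\mathfrak{S}$-paths that satisfy $\mathit{Acc}$ with a prefix in $\mathrm{FP}_\tau[A \,\mathrm{U}\, B]$ satisfy $(A \,\mathrm{U}^{\geq r} B) \wedge \mathit{Acc}$, as it is not guaranteed that $(A \,\mathrm{U}^{\geq r} B)$ holds. Thus, we have

\begin{align*}
\mathrm{Pr}^{\mathfrak{S}}_{\mathcal{M},s}\bigl((A \,\mathrm{U}^{\geq r} B) \wedge \mathit{Acc}\bigr)
&= \mathrm{Pr}^{\mathfrak{S}}_{\mathcal{M},s}\bigl(\mathrm{FP}[A \,\mathrm{U}^{\geq r} B] \wedge \mathit{Acc}\bigr)
 + \mathrm{Pr}^{\mathfrak{S}}_{\mathcal{M},s}\bigl(\mathrm{FP}_\tau[A \,\mathrm{U}\, B] \wedge (A \,\mathrm{U}^{\geq r} B) \wedge \mathit{Acc}\bigr) \\
&\leq \mathrm{Pr}^{\mathfrak{S}}_{\mathcal{M},s}\bigl(\mathrm{FP}[A \,\mathrm{U}^{\geq r} B]\bigr)
 + \mathrm{Pr}^{\mathfrak{S}}_{\mathcal{M},s}\bigl(\mathrm{FP}_\tau[A \,\mathrm{U}\, B] \wedge \mathit{Acc}\bigr) \\
&= \mathrm{Pr}^{\mathfrak{S}}_{\mathcal{M},s}\bigl(\mathrm{FP}[A \,\mathrm{U}^{\geq r} B]\bigr)
 + \sum_{t \in B} \ \sum_{\rho \in \mathrm{FP}_\tau[A \mathrm{U}\, t]} \mathrm{Pr}(\rho) \cdot \mathrm{Pr}^{\mathfrak{S}}_{\mathcal{M},t}(\mathit{Acc}) \\
&= \mathrm{Pr}^{\mathfrak{S}}_{\mathcal{M},s}\bigl(\mathrm{FP}[A \,\mathrm{U}^{\geq r} B]\bigr)
 + \sum_{t \in B} \ \sum_{\rho \in \mathrm{FP}_\tau[A \mathrm{U}\, t]} \mathrm{Pr}(\rho) \cdot \mathrm{Pr}^{\mathfrak{S}_{\min}}_{\mathcal{M},t}(\mathit{Acc}) \\
&= \mathrm{Pr}^{\mathfrak{T}}_{\mathcal{M}^{\mathrm{lo}}_{\min},s}\bigl(\widetilde{\mathrm{FP}}[A' \,\mathrm{U}^{\geq r} B_c]\bigr)
 + \sum_{t \in B} \ \sum_{\tilde{\rho} \in \widetilde{\mathrm{FP}}_\tau[A' \mathrm{U}\, t]} \mathrm{Pr}(\tilde{\rho}) \cdot P^{\mathrm{lo}}_{\min}(t, \tau, \mathit{goal}) \\
&= \mathrm{Pr}^{\mathfrak{T}}_{\mathcal{M}^{\mathrm{lo}}_{\min},s}\bigl(A' \,\mathrm{U}^{\geq r} C\bigr)
\end{align*}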

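Thus, for every scheduler $\mathfrak{T}$ in $\mathcal{M}^{\mathrm{lo}}_{\min}$ we can construct a scheduler $\mathfrak{S}$ in $\mathcal{M}$ with

\[
\mathrm{Pr}^{\mathfrak{S}}_{\mathcal{M},s}\bigl((A \,\mathrm{U}^{\geq r} B) \wedge \mathit{Acc}\bigr) \leq \mathrm{Pr}^{\mathfrak{T}}_{\mathcal{M}^{\mathrm{lo}}_{\min},s}\bigl(A' \,\mathrm{U}^{\geq r} C\bigr)
\]

Hence, we get:

\[
\mathrm{Pr}^{\min}_{\mathcal{M},s}\bigl((A \,\mathrm{U}^{\geq r} B) \wedge \mathit{Acc}\bigr) \leq \mathrm{Pr}^{\min}_{\mathcal{M}^{\mathrm{lo}}_{\min},s}\bigl(A' \,\mathrm{U}^{\geq r} C\bigr)
\]

This completes the proof of Lemma 3.6.3.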

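As a consequence of Lemma 3.6.2 and Lemma 3.6.3, the transformations $\mathcal{M} \mapsto \mathcal{M}^{\mathrm{lo}}_{\min}$ and $\mathcal{M} \mapsto \mathcal{M}^{\mathrm{lo}}_{\max}$ make it possible to apply the methods presented in Section 3.4 to the computation of quantiles for lower reward-bounded until properties under side conditions:

\begin{align*}
\mathrm{qu}_{\mathcal{M},s}\bigl(\exists \mathrm{P}_{\trianglerighteq p}((A \,\mathrm{U}^{\geq ?} B) \wedge \mathit{Acc})\bigr) &= \mathrm{qu}_{\mathcal{M}^{\mathrm{lo}}_{\max},s}\bigl(\exists \mathrm{P}_{\trianglerighteq p}(A' \,\mathrm{U}^{\geq ?} \mathit{goal})\bigr) \\
\mathrm{qu}_{\mathcal{M},s}\bigl(\forall \mathrm{P}_{\trianglerighteq p}((A \,\mathrm{U}^{\geq ?} B) \wedge \mathit{Acc})\bigr) &= \mathrm{qu}_{\mathcal{M}^{\mathrm{lo}}_{\min},s}\bigl(\forall \mathrm{P}_{\trianglerighteq p}(A' \,\mathrm{U}^{\geq ?} C)\bigr)
\end{align*}

where $A'$ and $C$ are defined as in Lemma 3.6.2 and Lemma 3.6.3, respectively. Note that the definition of $A'$ differs slightly between the two cases.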
