
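We first address the task of computing probability quantiles for the path formulas $\varphi[r] = (A \,\mathrm{U}^{\leq r} B) \wedge \mathit{Acc}$ and a lower probability bound $\trianglerighteq p$. The induced quantiles are:

\[
\begin{aligned}
\mathrm{qu}^{\forall \mathrm{P}_{\trianglerighteq p}}_{\mathcal{M},s}(\varphi[?]) &= \min \bigl\{\, r \in \mathbb{N} : \Pr^{\min}_{\mathcal{M},s}(\varphi[r]) \trianglerighteq p \,\bigr\} \\
\mathrm{qu}^{\exists \mathrm{P}_{\trianglerighteq p}}_{\mathcal{M},s}(\varphi[?]) &= \min \bigl\{\, r \in \mathbb{N} : \Pr^{\max}_{\mathcal{M},s}(\varphi[r]) \trianglerighteq p \,\bigr\}
\end{aligned}
\]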

Such quantiles for upper reward-bounded until properties under ω-regular side conditions can be computed by a reduction to quantiles for pure upper reward-bounded until properties. For this purpose, we define the transformation $\mathcal{M} \mapsto \mathcal{M}^{up}$, which maps $\mathcal{M}$ to a new MDP $\mathcal{M}^{up} = \mathcal{M}^{up}_{\min}$ or $\mathcal{M}^{up} = \mathcal{M}^{up}_{\max}$ that arises from $\mathcal{M}$ by inserting two fresh trap states $\mathit{goal}$ and $\mathit{fail}$, both equipped with reward zero. The behaviour of the target states $s \in B$ of $\mathcal{M}$ becomes purely probabilistic in $\mathcal{M}^{up}$. That is, their enabled actions in $\mathit{Act}$ are discarded and new probabilistic transitions to $\mathit{goal}$ or $\mathit{fail}$ via a fresh action $\tau$ are added. The probability of moving to $\mathit{goal}$ is given by the minimal or maximal probability of satisfying $\mathit{Acc}$ from $s$, depending on whether we compute the universal or the existential quantile. Formally:

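\[
\mathcal{M}^{up}_{\min} = (\tilde{S},\, \mathit{Act} \cup \{\tau\},\, P^{up}_{\min},\, \widetilde{\mathit{rew}})
\qquad
\mathcal{M}^{up}_{\max} = (\tilde{S},\, \mathit{Act} \cup \{\tau\},\, P^{up}_{\max},\, \widetilde{\mathit{rew}})
\]

where $\tilde{S} = S \cup \{\mathit{goal}, \mathit{fail}\}$ and $\tau$ is a fresh action symbol not contained in $\mathit{Act}$. The reward function in the transformed MDP is given by $\widetilde{\mathit{rew}}(s, \mathit{act}) = \mathit{rew}(s, \mathit{act})$ if $s \in S$, $\mathit{act} \in \mathit{Act}(s)$, and $\widetilde{\mathit{rew}}(\mathit{goal}, \tau) = \widetilde{\mathit{rew}}(\mathit{fail}, \tau) = 0$. For $s \in S$:

\[
\mathit{Act}_{\mathcal{M}^{up}_{\min}}(s) = \mathit{Act}_{\mathcal{M}^{up}_{\max}}(s) =
\begin{cases}
\mathit{Act}_{\mathcal{M}}(s) & \text{if } s \notin B \\
\{\tau\} & \text{if } s \in B
\end{cases}
\]

If $s \notin B$ then $P^{up}_{\min}(s, \mathit{act}, t) = P^{up}_{\max}(s, \mathit{act}, t) = P(s, \mathit{act}, t)$ for all $\mathit{act} \in \mathit{Act}$ and $t \in S$. For the states $s \in B$: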

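\[
P^{up}_{\min}(s, \tau, \mathit{goal}) \stackrel{\text{def}}{=} \Pr^{\min}_{\mathcal{M},s}(\mathit{Acc})
\qquad
P^{up}_{\max}(s, \tau, \mathit{goal}) \stackrel{\text{def}}{=} \Pr^{\max}_{\mathcal{M},s}(\mathit{Acc})
\]

and for $\ast \in \{\min, \max\}$:

\[
P^{up}_{\ast}(s, \tau, \mathit{fail}) = 1 - P^{up}_{\ast}(s, \tau, \mathit{goal})
\]

The states $\mathit{goal}$ and $\mathit{fail}$ are traps, e.g., we may deal with $\mathit{Act}_{\mathcal{M}^{up}_{\ast}}(\mathit{goal}) = \mathit{Act}_{\mathcal{M}^{up}_{\ast}}(\mathit{fail}) = \{\tau\}$ and $P^{up}_{\ast}(\mathit{goal}, \tau, \mathit{goal}) = P^{up}_{\ast}(\mathit{fail}, \tau, \mathit{fail}) = 1$.

Intuitively, the constructed MDP $\mathcal{M}^{up}_{\ast}$ can simulate $\mathcal{M}$-paths satisfying $\varphi[r] = (A \,\mathrm{U}^{\leq r} B) \wedge \mathit{Acc}$ by paths satisfying $(A \cup B) \,\mathrm{U}^{\leq r} \mathit{goal}$, and $\mathcal{M}$-paths satisfying $(A \,\mathrm{U}^{\leq r} B) \wedge \neg\mathit{Acc}$ by paths satisfying $(A \cup B) \,\mathrm{U}^{\leq r} \mathit{fail}$, and vice versa. We use $(A \cup B)$ on the left side of the until operator in $\mathcal{M}^{up}_{\ast}$ to allow $\mathit{goal}$ to be reached even if there are $B$-states that are not $A$-states. This is safe because $A \,\mathrm{U}^{\leq r} B \equiv (A \cup B) \,\mathrm{U}^{\leq r} B$ (see Proposition 2.0.1): the until formula is satisfied as soon as the first $B$-state is reached, and reaching the $\mathit{goal}$-state in $\mathcal{M}^{up}_{\ast}$ requires passing through a $B$-state. The transition probabilities in $\mathcal{M}^{up}_{\ast}$ for moving from a $B$-state to $\mathit{goal}$ or $\mathit{fail}$ reflect the objective in $\mathcal{M}$ to maximise or minimise the probability for $\varphi[r]$.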
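The construction can be sketched in Python as follows. This is only an illustration under assumptions that are not part of the text: the MDP is stored as nested dictionaries, the names `build_up_mdp`, `acc_prob`, `GOAL`, `FAIL` and `TAU` are hypothetical, and the optimal probabilities for Acc from the B-states are assumed to have been computed beforehand by some other procedure. The reward of the τ-action in B-states is not specified above; the sketch assumes it is zero.

```python
GOAL, FAIL, TAU = "goal", "fail", "tau"  # hypothetical names for the fresh symbols


def build_up_mdp(P, rew, B, acc_prob):
    """Build the transformed MDP from M.

    P[s][act]       -- dict mapping successor states to probabilities
    rew[(s, act)]   -- reward of taking act in s
    B               -- set of target states
    acc_prob[s]     -- precomputed min (or max) probability of Acc from s, for s in B
    Returns (P_up, rew_up) in the same dictionary format.
    """
    P_up, rew_up = {}, {}
    for s, actions in P.items():
        if s in B:
            # B-states become purely probabilistic: only tau is enabled.
            p = acc_prob[s]
            P_up[s] = {TAU: {GOAL: p, FAIL: 1.0 - p}}
            rew_up[(s, TAU)] = 0  # assumption: reward of tau in B-states is zero
        else:
            # All other states keep their actions, transitions and rewards.
            P_up[s] = {act: dict(dist) for act, dist in actions.items()}
            for act in actions:
                rew_up[(s, act)] = rew[(s, act)]
    for trap in (GOAL, FAIL):
        # goal and fail are traps with a zero-reward self-loop.
        P_up[trap] = {TAU: {trap: 1.0}}
        rew_up[(trap, TAU)] = 0
    return P_up, rew_up
```

Passing the minimal probabilities as `acc_prob` yields the variant for the universal quantile, passing the maximal ones yields the variant for the existential quantile.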

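Lemma 3.6.1. For all states $s$ of $\mathcal{M}$ and $r \in \mathbb{N}$:

(a) $\Pr^{\min}_{\mathcal{M},s}\bigl((A \,\mathrm{U}^{\leq r} B) \wedge \mathit{Acc}\bigr) = \Pr^{\min}_{\mathcal{M}^{up}_{\min},s}\bigl((A \cup B) \,\mathrm{U}^{\leq r} \mathit{goal}\bigr)$

(b) $\Pr^{\max}_{\mathcal{M},s}\bigl((A \,\mathrm{U}^{\leq r} B) \wedge \mathit{Acc}\bigr) = \Pr^{\max}_{\mathcal{M}^{up}_{\max},s}\bigl((A \cup B) \,\mathrm{U}^{\leq r} \mathit{goal}\bigr)$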

Proof. To prove statement (a) we first show that the minimal probability for $(A \,\mathrm{U}^{\leq r} B) \wedge \mathit{Acc}$ in $(\mathcal{M}, s)$ is bounded from above by the minimal probability for $(A \cup B) \,\mathrm{U}^{\leq r} \mathit{goal}$ in $(\mathcal{M}^{up}_{\min}, s)$. For this, we pick a scheduler $T$ for $\mathcal{M}^{up}_{\min}$. Furthermore, let $S_{\min}$ be a scheduler for $\mathcal{M}$ such that

\[
\Pr^{S_{\min}}_{\mathcal{M},s}(\mathit{Acc}) = \Pr^{\min}_{\mathcal{M},s}(\mathit{Acc})
\]

for all states $s$ of $\mathcal{M}$. We now combine $T$ and $S_{\min}$ to construct a scheduler $S$ for $\mathcal{M}$ as follows. In its initial mode, $S$ simulates $T$, unless the last state of the generated path is a $B$-state, in which case $S$ switches to its second mode, where it behaves as $S_{\min}$. Formally, this means:

3.6 Quantiles under side conditions

• $S(\rho) = T(\rho)$ for all finite paths $\rho$ in $\mathcal{M}$ that do not contain a $B$-state,

• $S(\rho) = S_{\min}(\rho_B)$ for all finite paths $\rho$ in $\mathcal{M}$ containing at least one $B$-state, where $\rho_B$ is the suffix of $\rho$ starting in the first $B$-state of $\rho$.
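The two-mode scheduler can be illustrated with a short Python sketch. The encoding is an assumption made for the example only: a finite path is a tuple `(s0, act0, s1, ..., sn)` with states at even positions, schedulers are functions from such tuples to actions, and the name `combine_schedulers` is hypothetical.

```python
def combine_schedulers(T, S_min, B):
    """Return the scheduler S that follows T until a B-state occurs and
    afterwards follows S_min on the suffix starting at the first B-state."""

    def S(path):
        # States sit at the even positions of the path tuple.
        for i in range(0, len(path), 2):
            if path[i] in B:
                # Second mode: forget the prefix before the first B-state.
                return S_min(path[i:])
        # Initial mode: no B-state seen so far, so simulate T.
        return T(path)

    return S
```

For instance, with `B = {"b"}` the path `("s", "a", "b", "c", "x")` is handed to `S_min` as `("b", "c", "x")`, while `("s", "a", "s")` is handed unchanged to `T`.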

In the following calculation, let $\mathit{FP}[A \,\mathrm{U}^{\leq r} t]$ denote the set of all finite $S$-paths $\rho = s_0\, \mathit{act}_0\, s_1\, \mathit{act}_1 \ldots \mathit{act}_{n-1}\, s_n$ in $\mathcal{M}$ with $s_0 = s$ and $\{s_0, s_1, \ldots, s_{n-1}\} \subseteq A \setminus B$ such that $s_n = t$ and $\mathit{rew}(\rho) \leq r$. We write $\Pr(\rho)$ for the probability of $\rho$, given by the product of its transition probabilities. Note that each $\rho \in \mathit{FP}[A \,\mathrm{U}^{\leq r} t]$ is also a $T$-path with the same probability. Furthermore, we write $S[\rho]$ for the scheduler "$S$ after $\rho$", which means $S[\rho](\rho') = S(\rho \circ \rho')$ if $\mathit{last}(\rho) = \mathit{first}(\rho')$, where $\circ$ denotes the concatenation of finite paths. We then have:

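\[
\begin{aligned}
\Pr^{S}_{\mathcal{M},s}\bigl((A \,\mathrm{U}^{\leq r} B) \wedge \mathit{Acc}\bigr)
&= \sum_{t \in B} \Pr^{S}_{\mathcal{M},s}\bigl(((A \wedge \neg B) \,\mathrm{U}^{\leq r} t) \wedge \mathit{Acc}\bigr) \\
&= \sum_{t \in B} \; \sum_{\rho \in \mathit{FP}[A \,\mathrm{U}^{\leq r} t]} \Pr(\rho) \cdot \Pr^{S[\rho]}_{\mathcal{M},t}(\mathit{Acc}) \\
&\overset{(\ast)}{=} \sum_{t \in B} \; \sum_{\rho \in \mathit{FP}[A \,\mathrm{U}^{\leq r} t]} \Pr(\rho) \cdot \Pr^{\min}_{\mathcal{M},t}(\mathit{Acc}) \\
&= \sum_{t \in B} \; \sum_{\rho \in \mathit{FP}[A \,\mathrm{U}^{\leq r} t]} \Pr(\rho) \cdot P^{up}_{\min}(t, \tau, \mathit{goal}) \\
&= \Pr^{T}_{\mathcal{M}^{up}_{\min},s}\bigl((A \cup B) \,\mathrm{U}^{\leq r} \mathit{goal}\bigr)
\end{aligned}
\]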

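where $(\ast)$ holds because $S[\rho] = S_{\min}$ for each $\rho \in \bigcup_{t \in B} \mathit{FP}[A \,\mathrm{U}^{\leq r} t]$. Recall that $S$ mimics $S_{\min}$ after having visited a $B$-state. In summary, for every scheduler $T$ for $\mathcal{M}^{up}_{\min}$ we can construct a scheduler $S$ for $\mathcal{M}$ such that

\[
\Pr^{S}_{\mathcal{M},s}\bigl((A \,\mathrm{U}^{\leq r} B) \wedge \mathit{Acc}\bigr) = \Pr^{T}_{\mathcal{M}^{up}_{\min},s}\bigl((A \cup B) \,\mathrm{U}^{\leq r} \mathit{goal}\bigr)
\]

This yields:

\[
\Pr^{\min}_{\mathcal{M},s}\bigl((A \,\mathrm{U}^{\leq r} B) \wedge \mathit{Acc}\bigr) \leq \Pr^{\min}_{\mathcal{M}^{up}_{\min},s}\bigl((A \cup B) \,\mathrm{U}^{\leq r} \mathit{goal}\bigr)
\]

Vice versa, given a scheduler $S$ for $\mathcal{M}$, we construct a scheduler $T$ for $\mathcal{M}^{up}_{\min}$ such that the probability for $(A \cup B) \,\mathrm{U}^{\leq r} \mathit{goal}$ under $T$ is less than or equal to the probability for $(A \,\mathrm{U}^{\leq r} B) \wedge \mathit{Acc}$ under $S$, as follows. As long as no state $s \in B$ has been reached, scheduler $T$ for $\mathcal{M}^{up}_{\min}$ behaves as $S$. If the input path for $T$ ends in a state of $B$, then $T$ has no choice and must schedule $\tau$ from this moment on. That is, $T(\rho) = \tau$ for all finite paths $\rho$ in $\mathcal{M}^{up}_{\min}$ that contain a $B$-state. (Note that these paths end either in a $B$-state or in one of the trap states $\mathit{goal}$ or $\mathit{fail}$.)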

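We can now use the same calculation as before, but replace the equality symbol in $(\ast)$ with $\geq$, since:

\[
\Pr^{S[\rho]}_{\mathcal{M},t}(\mathit{Acc}) \geq \Pr^{\min}_{\mathcal{M},t}(\mathit{Acc})
\]

This yields that for every scheduler $S$ for $\mathcal{M}$ there is a scheduler $T$ for $\mathcal{M}^{up}_{\min}$ with

\[
\Pr^{S}_{\mathcal{M},s}\bigl((A \,\mathrm{U}^{\leq r} B) \wedge \mathit{Acc}\bigr) \geq \Pr^{T}_{\mathcal{M}^{up}_{\min},s}\bigl((A \cup B) \,\mathrm{U}^{\leq r} \mathit{goal}\bigr)
\]

As a consequence we obtain:

\[
\Pr^{\min}_{\mathcal{M},s}\bigl((A \,\mathrm{U}^{\leq r} B) \wedge \mathit{Acc}\bigr) \geq \Pr^{\min}_{\mathcal{M}^{up}_{\min},s}\bigl((A \cup B) \,\mathrm{U}^{\leq r} \mathit{goal}\bigr)
\]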

The proof of statement (b) is analogous. To show that the maximal probability for $(A \,\mathrm{U}^{\leq r} B) \wedge \mathit{Acc}$ in the pointed MDP $(\mathcal{M}, s)$ is greater than or equal to the maximal probability for the event $(A \cup B) \,\mathrm{U}^{\leq r} \mathit{goal}$ in $(\mathcal{M}^{up}_{\max}, s)$, we pick a scheduler $T$ for $\mathcal{M}^{up}_{\max}$ and a scheduler $S_{\max}$ for $\mathcal{M}$ such that

\[
\Pr^{S_{\max}}_{\mathcal{M},s}(\mathit{Acc}) = \Pr^{\max}_{\mathcal{M},s}(\mathit{Acc})
\]

for all states $s$ of $\mathcal{M}$. We design a scheduler $S$ for $\mathcal{M}$ that operates in two modes. In its initial mode, scheduler $S$ simulates $T$, unless a $B$-state has been reached, in which case $S$ switches to its second mode and behaves as $S_{\max}$ from then on. Formally, the behaviour of $S$ for a given finite input path $\rho$ in $\mathcal{M}$ is as follows:

• $S(\rho) = T(\rho)$ if $\rho$ does not contain a $B$-state,

• $S(\rho) = S_{\max}(\rho_B)$ if $\rho$ contains at least one $B$-state and $\rho_B$ is the longest suffix of $\rho$ starting in a $B$-state.

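Using calculations analogous to those in the proof of statement (a), we get:

\[
\Pr^{S}_{\mathcal{M},s}\bigl((A \,\mathrm{U}^{\leq r} B) \wedge \mathit{Acc}\bigr) \geq \Pr^{T}_{\mathcal{M}^{up}_{\max},s}\bigl((A \cup B) \,\mathrm{U}^{\leq r} \mathit{goal}\bigr)
\]

This shows:

\[
\Pr^{\max}_{\mathcal{M},s}\bigl((A \,\mathrm{U}^{\leq r} B) \wedge \mathit{Acc}\bigr) \geq \Pr^{\max}_{\mathcal{M}^{up}_{\max},s}\bigl((A \cup B) \,\mathrm{U}^{\leq r} \mathit{goal}\bigr)
\]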

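It remains to prove that the maximal probability for $(A \,\mathrm{U}^{\leq r} B) \wedge \mathit{Acc}$ in the pointed MDP $(\mathcal{M}, s)$ is less than or equal to the maximal probability for $(A \cup B) \,\mathrm{U}^{\leq r} \mathit{goal}$ in $(\mathcal{M}^{up}_{\max}, s)$. To see this, we take an arbitrary scheduler $S$ for $\mathcal{M}$. Let $T$ be the scheduler that behaves as $S$ as long as no $B$-state has been reached. For all input paths for $T$ that contain at least one $B$-state, $T$ schedules action $\tau$. With arguments analogous to those in the proof of statement (a), we obtain:

\[
\Pr^{T}_{\mathcal{M}^{up}_{\max},s}\bigl((A \cup B) \,\mathrm{U}^{\leq r} \mathit{goal}\bigr) \geq \Pr^{S}_{\mathcal{M},s}\bigl((A \,\mathrm{U}^{\leq r} B) \wedge \mathit{Acc}\bigr)
\]

As a consequence, we get:

\[
\Pr^{\max}_{\mathcal{M},s}\bigl((A \,\mathrm{U}^{\leq r} B) \wedge \mathit{Acc}\bigr) \leq \Pr^{\max}_{\mathcal{M}^{up}_{\max},s}\bigl((A \cup B) \,\mathrm{U}^{\leq r} \mathit{goal}\bigr)
\]

This yields the claim.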

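As a consequence of Lemma 3.6.1, the transformations $\mathcal{M} \mapsto \mathcal{M}^{up}_{\min}$ and $\mathcal{M} \mapsto \mathcal{M}^{up}_{\max}$ can be used to compute quantiles for upper reward-bounded until properties under side conditions by means of the linear programming techniques discussed in Section 3.3:

\[
\begin{aligned}
\mathrm{qu}^{\forall \mathrm{P}_{\trianglerighteq p}}_{\mathcal{M},s}\bigl((A \,\mathrm{U}^{\leq ?} B) \wedge \mathit{Acc}\bigr) &= \mathrm{qu}^{\forall \mathrm{P}_{\trianglerighteq p}}_{\mathcal{M}^{up}_{\min},s}\bigl((A \cup B) \,\mathrm{U}^{\leq ?} \mathit{goal}\bigr) \\
\mathrm{qu}^{\exists \mathrm{P}_{\trianglerighteq p}}_{\mathcal{M},s}\bigl((A \,\mathrm{U}^{\leq ?} B) \wedge \mathit{Acc}\bigr) &= \mathrm{qu}^{\exists \mathrm{P}_{\trianglerighteq p}}_{\mathcal{M}^{up}_{\max},s}\bigl((A \cup B) \,\mathrm{U}^{\leq ?} \mathit{goal}\bigr)
\end{aligned}
\]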
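To make the right-hand sides concrete, the following Python sketch computes such a quantile on an already transformed MDP. It does not implement the linear programming approach of Section 3.3; it swaps in a plain dynamic program over the reward budget. It also relies on assumptions not made in the text: every action of a state outside $B$ has a positive integer reward, the only zero-reward actions are the τ-steps into the trap states, and the search stops at a user-supplied cutoff `r_max`. All names (`quantile_upper_bounded`, `safe`, `opt`) are hypothetical.

```python
def quantile_upper_bounded(P_up, rew_up, safe, s, p, r_max, opt=max, strict=False):
    """Least r <= r_max whose optimal probability of (safe U^{<=r} goal)
    from s meets the bound p; None if no such r exists up to r_max.

    P_up[t][act]  -- successor distribution in the transformed MDP
    rew_up        -- rewards, keyed by (state, action)
    safe          -- the state set A ∪ B
    opt           -- max for the existential, min for the universal quantile
    strict        -- True for the bound "> p", False for ">= p"
    """
    GOAL, FAIL = "goal", "fail"
    levels = []  # levels[r][t] = optimal probability from t with budget r
    for r in range(r_max + 1):
        x = {}
        for t, actions in P_up.items():
            if t == GOAL:
                x[t] = 1.0
            elif t == FAIL or t not in safe:
                x[t] = 0.0
            else:
                vals = []
                for act, dist in actions.items():
                    c = rew_up[(t, act)]
                    if c == 0:
                        # Assumed: a zero-reward action is a tau-step into a trap.
                        vals.append(dist.get(GOAL, 0.0))
                    elif c <= r:
                        # Spend c units of budget and continue from the successors.
                        vals.append(sum(q * levels[r - c][u] for u, q in dist.items()))
                    else:
                        vals.append(0.0)  # the action would exceed the budget
                x[t] = opt(vals)
        levels.append(x)
        if x[s] > p or (not strict and x[s] >= p):
            return r
    return None
```

Since the quantile is a minimum over a possibly empty set, a result of `None` only says that no budget up to `r_max` suffices; it does not by itself show that the quantile is infinite.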
