The first result states the correctness of the construction of the scheduler D̄, that is, it asserts that D and D̄ assign the same probability to corresponding sets of paths.
Theorem 4.1. Let C = (S, Act, R, ν) be a CTMDP and D a GM-scheduler on C. Further, let C̄ = (S̄, Act, R̄, ν̄) be the induced locally uniform CTMDP and D̄ the scheduler that corresponds to D. Then for all Π ∈ F_{Paths^ω},
\[ \Pr^{\omega}_{\nu,D}(\Pi) = \Pr^{\omega}_{\overline{\nu},\overline{D}}\bigl(\mathit{extend}(\Pi)\bigr). \]
Proof. Each cylinder Π ∈ F_{Paths^ω(C)} is induced by a measurable base [ADD00, Thm. 2.7.2]; hence Π = Cyl(B) for some B ∈ F_{Paths^n(C)} and n ∈ ℕ. But then Pr^ω_{ν,D}(Π) = Pr^n_{ν,D}(B). Further, it is easy to verify that extend(Cyl(B)) = Cyl(extend(B)). Thus Pr^n_{ν,D}(B) = Pr^ω_{ν̄,D̄}(extend(Π)) by Lemma 4.7. ◻
With Lemma 4.4 and its extension, we are now ready to prove that local uniformization does not lose probability mass with respect to the most important scheduler classes:
Theorem 4.2. Let C = (S, Act, R, ν) be a CTMDP and let C̄ = (S̄, Act, R̄, ν̄) be its induced locally uniform CTMDP. For all Π ∈ F_{Paths^ω(C)} and each scheduler class 𝒟 from the set {GM, TTHR, TTPR, TAHR, TAPR} it holds that
\[ \sup_{D \in \mathcal{D}(C)} \Pr^{\omega}_{\nu,D}(\Pi) \;\le\; \sup_{D' \in \mathcal{D}(\overline{C})} \Pr^{\omega}_{\overline{\nu},D'}\bigl(\mathit{extend}(\Pi)\bigr). \tag{4.11} \]
Proof. By Thm. 4.1, the claim follows for the class of all GM-schedulers, that is, for 𝒟 = GM. For the other classes, it remains to check that the GM-scheduler D̄ used in Lemma 4.4 also falls into the respective class. Here, we state the proof for TTPR: If D ∶ S × R≥0 → Distr(Act) is a TTPR-scheduler on C, define D̄(s, ∆) = D(s, ∆) if s ∈ S and D̄(s^α, ∆) = {α ↦ 1} for s^α ∈ S^cp. Then D̄ depends only on the current state and ∆, so D̄ ∈ TTPR(C̄), and Lemma 4.4 applies verbatim. ◻
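The lifting of a TTPR-scheduler used in this proof can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `lift_ttpr` and `example_scheduler`, the encoding of a copy-state s^α as the tuple `("copy", s, α)`, and the concrete decisions of the example scheduler are hypothetical and do not come from the text.

```python
from typing import Callable, Dict, Hashable, Set

Distribution = Dict[str, float]                        # action name -> probability
Scheduler = Callable[[Hashable, float], Distribution]  # (state, time) -> distribution


def lift_ttpr(d: Scheduler, original_states: Set[Hashable]) -> Scheduler:
    """Lift a TTPR-scheduler on C to the locally uniform CTMDP.

    Assumption (not from the text): a copy-state s^alpha is encoded
    as the tuple ("copy", s, alpha).
    """
    def d_bar(state: Hashable, delta: float) -> Distribution:
        if state in original_states:
            # On original states the lifted scheduler copies D.
            return d(state, delta)
        if isinstance(state, tuple) and len(state) == 3 and state[0] == "copy":
            # On a copy-state s^alpha the only choice is the Dirac distribution on alpha.
            return {state[2]: 1.0}
        raise ValueError(f"unknown state: {state!r}")
    return d_bar


def example_scheduler(state: Hashable, delta: float) -> Distribution:
    """Hypothetical TTPR-scheduler: picks beta in s1 before time 0.5, alpha otherwise."""
    if state == "s1" and delta < 0.5:
        return {"beta": 1.0}
    return {"alpha": 1.0}


lifted = lift_ttpr(example_scheduler, {"s0", "s1", "s3"})
```

Because `d_bar` reads nothing but the current state and the time argument, the lifted scheduler stays positional and total-time dependent, which is the property the proof needs.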
Note that Thm. 4.2 does not mention the scheduler classes TPR and TAHOPR. This is for good reason: In Thm. 4.4, we will construct a counterexample that disproves Eq. (4.11) for these scheduler classes. Although we obtain a GM-scheduler D̄ on C̄ for any D ∈ TPR(C) ∪ TAHOPR(C) by Thm. 4.1, D̄ is not guaranteed to lie in TPR(C̄) (or TAHOPR(C̄), respectively). Hence, Eq. (4.11) does not carry over automatically to every scheduler class that is a subset of GM.
For the main result, we identify the scheduler classes that do not gain probability mass by local uniformization:
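Theorem 4.3. Let C = (S, Act, R, ν) be a CTMDP, C̄ = (S̄, Act, R̄, ν̄) its induced locally uniform CTMDP and Π ∈ F_{Paths^ω(C)}. Then
\[ \sup_{D \in \mathcal{D}(C)} \Pr^{\omega}_{\nu,D}(\Pi) \;=\; \sup_{D' \in \mathcal{D}(\overline{C})} \Pr^{\omega}_{\overline{\nu},D'}\bigl(\mathit{extend}(\Pi)\bigr) \quad \text{for } \mathcal{D} \in \{\mathrm{TTPR}, \mathrm{TAPR}\}. \]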
Proof. Theorem 4.2 proves the direction from left to right. For the reverse, let D′ ∈ TTPR(C̄) and define D ∈ TTPR(C) such that D(s, ∆) = D′(s, ∆) for all s ∈ S, ∆ ∈ R≥0. Then D̄ = D′ and Pr^ω_{ν̄,D′}(extend(Π)) = Pr^ω_{ν,D}(Π) by Thm. 4.1. Hence the claim for TTPR follows; the argument for D′ ∈ TAPR(C̄) is analogous. ◻
Conjecture 4.1. We conjecture that Thm. 4.3 also holds for GM and TTHR. For D′ ∈ GM(C̄), we aim at defining a scheduler D ∈ GM(C) that induces the same probabilities on C. However, a history π ∈ Paths⋆(C) corresponds to the uncountable set extend(π) in C̄, and D′(π̄, ⋅) may be different for each π̄ ∈ extend(π). As D can only decide once on history π, in order to mimic D′ on C, we propose to weigh each distribution D′(π̄, ⋅) with the conditional probability of dπ̄ given extend(π).
In the following, we disprove Eq. (4.11) for TPR- and TAHOPR-schedulers. Intuitively, TPR-schedulers rely on the sojourn time in the last state; however, local uniformization changes the exit rates of states by adding transitions to copy-states.
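Theorem 4.4. For G ∈ {TPR, TAHOPR}, there exists a CTMDP C = (S, Act, R, ν) and a measurable set of paths Π ∈ F_{Paths^ω(C)} such that
\[ \sup_{D \in G(C)} \Pr^{\omega}_{\nu,D}(\Pi) \;>\; \sup_{D' \in G(\overline{C})} \Pr^{\omega}_{\overline{\nu},D'}\bigl(\mathit{extend}(\Pi)\bigr). \]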
Proof. We give the proof for TPR: Consider the CTMDPs C and C̄ in Fig. 4.2(a) and Fig. 4.5(a), respectively.
Let Π ∈ F_{Paths^ω(C)} be the set of paths in C that reach state s3 within 1 time unit and let Π̄ = extend(Π). To optimize Pr^ω_{ν,D}(Π) and Pr^ω_{ν̄,D′}(Π̄), any scheduler D (resp. D′) must choose {α ↦ 1} in state s0. Nondeterminism only remains in state s1; here, the optimal distribution over {α, β} depends on the time t0 that was spent to reach state s1: In C and C̄, the probability to go from s1 to s3 in the remaining t = 1 − t0 time units is fα(t) = 1/3 − (1/3)·e^{−3t} for α and fβ(t) = 1 + (1/2)·e^{−3t} − (3/2)·e^{−t} for β. Fig. 4.5(b) shows the cumulative distribution functions (cdfs) fα and fβ; as any convex combination of α and β results in a cdf in the shaded area of Fig. 4.5(b), we only need to consider the extreme distributions {α ↦ 1} and {β ↦ 1} for maximal reachability. Let d be the unique solution (in R>0) of fα(t) = fβ(t), i.e. the point where the two cdfs cross. Then Dopt(s0 -α,t0→ s1, ⋅) = {α ↦ 1} if 1 − t0 ≤ d and {β ↦ 1} otherwise, is an optimal GM-scheduler for Π on C, and Dopt ∈ TPR(C) ∩ TTPR(C) as it depends only on the delay of the last transition.
Figure 4.5: Timed reachability of state s3 (starting in s1) in C and C̄.
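The crossing point d of the two cdfs can be checked numerically. The following Python sketch uses plain bisection on fα − fβ; the function names and the bracketing interval [0.1, 1.0] are choices made for this illustration only and are not taken from the text.

```python
import math


def f_alpha(t: float) -> float:
    """Probability to reach s3 from s1 within t time units when choosing alpha."""
    return (1.0 - math.exp(-3.0 * t)) / 3.0


def f_beta(t: float) -> float:
    """Probability to reach s3 from s1 within t time units when choosing beta."""
    return 1.0 + 0.5 * math.exp(-3.0 * t) - 1.5 * math.exp(-t)


def crossing_point(lo: float = 0.1, hi: float = 1.0, tol: float = 1e-12) -> float:
    """Bisection on g = f_alpha - f_beta, which is positive at lo and negative at hi."""
    def g(t: float) -> float:
        return f_alpha(t) - f_beta(t)

    assert g(lo) > 0.0 > g(hi), "interval does not bracket the crossing point"
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            lo = mid   # alpha is still better at mid: the crossing lies to the right
        else:
            hi = mid   # beta is at least as good at mid: the crossing lies to the left
    return 0.5 * (lo + hi)


def optimal_action(t0: float, bound: float = 1.0) -> str:
    """Decision of D_opt in s1 after a sojourn of t0 time units in s0."""
    return "alpha" if bound - t0 <= crossing_point() else "beta"


d = crossing_point()
```

Substituting x = e^{−t} into fα(t) = fβ(t) gives 5x³ − 9x + 4 = 0, which factors as (x − 1)(5x² + 5x − 4) = 0. The root in (0, 1) yields the closed form d = ln((5 + √105)/8) ≈ 0.645, which agrees with the bisection result.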
For Π̄, D′ is an optimal GM-scheduler on C̄ if D′(s0
For TAHOPR, a similar proof applies that relies on the fact that local uniformization changes the number of transitions needed to reach a goal state. ◻
This proves that local uniformization loses information that is essential for TPR- and TAHOPR-schedulers. In other cases, schedulers from TAHR and TAHOPR gain information by local uniformization:
Proof. Consider the CTMDPs C and C̄ in Fig. 4.2(a) and Fig. 4.5(a), respectively. Let Π be the time-bounded reachability property of state s3 within 1 time unit and let Π̄ = extend(Π).
We prove the claim for TAHR: To this end, we derive D ∈ TAHR(C) such that Pr^ω_{ν,D}(Π) = sup_{D′∈TAHR(C)} Pr^ω_{ν,D′}(Π). For this, D(s0) = {α ↦ 1} must obviously hold. Thus, the only nondeterministic choice occurs in state s1 for the time-abstract history s0 -α→ s1, where D(s0 -α→ s1) = µ with µ ∈ Distr({α, β}). For initial state s0, Fig. 4.6(a) depicts Pr^ω_{ν,D}(Π) for all µ ∈ Distr({α, β}); obviously, D(s0 -α→ s1) = {β ↦ 1} maximizes Pr^ω_{ν,D}(Π).
On C̄, we prove that there exists D′ ∈ TAHR(C̄) such that Pr^ω_{ν,D}(Π) < Pr^ω_{ν̄,D′}(Π̄): To maximize Pr^ω_{ν̄,D′}(Π̄), define D′(s0) = {α ↦ 1}. Note that D′ may yield different distributions for the time-abstract paths s0 -α→ s1 and s0 -α→ s0^α -α→ s1; for µ, µc ∈ Distr({α, β}) such that µ = D′(s0 -α→ s1) and µc = D′(s0 -α→ s0^α -α→ s1), the probability of Π̄ under D′ is depicted in Fig. 4.6(b) for all µ, µc ∈ Distr({α, β}). Clearly, Pr^ω_{ν̄,D′}(Π̄) is maximal if D′(s0 -α→ s1) = {β ↦ 1} and D′(s0 -α→ s0^α -α→ s1) = {α ↦ 1}. Further, Fig. 4.6(b) shows that with this choice of D′, Pr^ω_{ν̄,D′}(Π̄) > Pr^ω_{ν,D}(Π), and the claim follows. For TAHOPR, the proof applies analogously. ◻
With these counterexamples, we complete our discussion of local uniformization and come back to the question that was raised at the beginning of Sec. 4.2: The motivation to study locally uniform CTMDPs is to delay the scheduling decision until the current state is left.
As we have seen, for TTPR- and TAPR-schedulers, any given CTMDP can be transformed into a locally uniform one while preserving all measures. Moreover, in this thesis, we are particularly interested in time-bounded reachability objectives; for them, we know that TTPR-schedulers are sufficient, that is, we do not need to consider any other class of schedulers to obtain the optimal reachability probabilities.
However, a word of caution is necessary at this point: The results of this chapter might lead to the conclusion that for time-bounded reachability objectives, one can transform an arbitrary CTMDP into a locally uniform one and investigate it with respect to late schedulers. Although this is possible, an open theoretical problem remains in this approach:
The results of this chapter do not prove that local uniformization preserves measures with respect to late schedulers. For such a proof, we would need to define the semantics of non-locally uniform CTMDPs under late schedulers. However, in this setting, the scheduling decision and the sojourn time distribution become dependent on each other. The natural result is a class of measurable schedulers that decide continuously during the sojourn in the current state. The implications of such a definition are ongoing research and outside the scope of this thesis.
Figure 4.6: Optimal TAHR-schedulers for time-bounded reachability. (a) TAHR-schedulers on C: probability plotted against D(s0 -α→ s1)(α). (b) TAHR-schedulers on C̄: probability plotted against D′(s0 -α→ s1)(α) and D′(s0 -α→ s0^α -α→ s1)(α).