
The first result states the correctness of the construction of the scheduler D̄, that is, it asserts that D and D̄ assign the same probability to corresponding sets of paths.

Theorem 4.1. Let C = (S, Act, R, ν) be a CTMDP and D a GM-scheduler on C. Further, let C̄ = (S̄, Act, R̄, ν̄) be the induced locally uniform CTMDP and D̄ the scheduler that corresponds to D. Then it holds for all Π ∈ F_{Paths^ω(C)} that

Pr^ω_{ν,D}(Π) = Pr^ω_{ν̄,D̄}(extend(Π)).

Proof. Each cylinder Π ∈ F_{Paths^ω(C)} is induced by a measurable base [ADD00, Thm. 2.7.2]; hence Π = Cyl(B) for some B ∈ F_{Paths^n(C)} and n ∈ ℕ. But then, Pr^ω_{ν,D}(Π) = Pr^n_{ν,D}(B). Further, it is easy to verify that extend(Cyl(B)) = Cyl(extend(B)). Thus Pr^n_{ν,D}(B) = Pr^ω_{ν̄,D̄}(extend(Π)) by Lemma 4.7. ◻

With Lemma 4.4 and its extension, we are now ready to prove that local uniformization does not alter the CTMDP in a way that leaks probability mass with respect to the most important scheduler classes:

Theorem 4.2. Let C = (S, Act, R, ν) be a CTMDP and let C̄ = (S̄, Act, R̄, ν̄) be its induced locally uniform CTMDP. For all Π ∈ F_{Paths^ω(C)} and each scheduler class 𝒟 from the set {GM, TTHR, TTPR, TAHR, TAPR} it holds that

sup_{D ∈ 𝒟(C)} Pr^ω_{ν,D}(Π) ≤ sup_{D′ ∈ 𝒟(C̄)} Pr^ω_{ν̄,D′}(extend(Π)).   (4.11)

Proof. By Thm. 4.1, the claim follows for the class of all GM-schedulers, that is, for 𝒟 = GM. For the other classes, it remains to check that the GM-scheduler D̄ used in Lemma 4.4 also falls into the respective class. Here, we state the proof for TTPR: If D ∶ S × ℝ≥0 → Distr(Act) ∈ TTPR, define D̄(s, ∆) = D(s, ∆) if s ∈ S and D̄(s^α, ∆) = {α ↦ 1} for s^α ∈ S_cp. Then Lemma 4.4 applies verbatim. ◻
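The TTPR case of this construction can be illustrated by a minimal sketch. The encoding is an assumption made purely for illustration and is not part of the text: original states are strings, a copy-state s^α is a hypothetical pair (s, α), a distribution over actions is a dictionary, and the names lift_ttpr and example_scheduler are invented here.

```python
from typing import Callable, Dict, Tuple, Union

Action = str
State = str
CopyState = Tuple[State, Action]   # hypothetical encoding of a copy-state s^alpha
Distr = Dict[Action, float]        # maps each action to its probability


def lift_ttpr(d: Callable[[State, float], Distr]
              ) -> Callable[[Union[State, CopyState], float], Distr]:
    """Lift a TTPR scheduler d on C to the locally uniform CTMDP.

    On original states the lifted scheduler agrees with d; on a copy-state
    s^alpha it chooses alpha with probability 1, as in the proof.
    """
    def lifted(state: Union[State, CopyState], delta: float) -> Distr:
        if isinstance(state, tuple):      # copy-state (s, alpha)
            _, alpha = state
            return {alpha: 1.0}
        return d(state, delta)            # original state: unchanged decision
    return lifted


def example_scheduler(state: State, delta: float) -> Distr:
    """An arbitrary TTPR scheduler on C, chosen only for illustration."""
    return {"alpha": 1.0} if delta <= 0.5 else {"beta": 1.0}


lifted = lift_ttpr(example_scheduler)
print(lifted("s1", 0.2))            # decision inherited from example_scheduler
print(lifted(("s1", "beta"), 0.2))  # copy-state: Dirac distribution on beta
```

The lifted scheduler still depends only on the current state and the elapsed time, which is the reason the construction stays inside TTPR.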

Note that Thm. 4.2 does not mention the scheduler classes TPR and TAHOPR. This is for good reason: In Thm. 4.4, we will construct a counterexample that disproves Eq. (4.11) for these scheduler classes. Although Thm. 4.1 yields a GM-scheduler D̄ on C̄ for any D ∈ TPR(C) ∪ TAHOPR(C), D̄ is not guaranteed to lie in TPR(C̄) (or TAHOPR(C̄), respectively). Hence, Eq. (4.11) does not hold directly for all scheduler classes that are subsets of GM.

For the main result, we identify the scheduler classes that do not gain probability mass by local uniformization:

Theorem 4.3. Let C = (S, Act, R, ν) be a CTMDP, C̄ = (S̄, Act, R̄, ν̄) its induced locally uniform CTMDP and Π ∈ F_{Paths^ω(C)}. Then

sup_{D ∈ 𝒟(C)} Pr^ω_{ν,D}(Π) = sup_{D′ ∈ 𝒟(C̄)} Pr^ω_{ν̄,D′}(extend(Π))   for 𝒟 ∈ {TTPR, TAPR}.

Proof. Theorem 4.2 proves the direction from left to right. For the reverse, let D′ ∈ TTPR(C̄) and define D ∈ TTPR(C) such that D(s, ∆) = D′(s, ∆) for all s ∈ S, ∆ ∈ ℝ≥0. Then D̄ = D′ and Pr^ω_{ν̄,D′}(extend(Π)) = Pr^ω_{ν,D}(Π) by Thm. 4.1. Hence the claim for TTPR follows; the argument for D′ ∈ TAPR(C̄) is analogous. ◻

Conjecture 4.1. We conjecture that Thm. 4.3 also holds for GM and TTHR. For D′ ∈ GM(C̄), we aim at defining a scheduler D ∈ GM(C) that induces the same probabilities on C. However, a history π ∈ Paths^⋆(C) corresponds to the uncountable set extend(π) in C̄ such that D′(π̄, ⋅) may be different for each π̄ ∈ extend(π).

As D can only decide once on history π, in order to mimic D′ on C, we propose to weigh each distribution D′(π̄, ⋅) with the conditional probability of dπ̄ given extend(π).

In the following, we disprove Eq. (4.11) for TPR- and TAHOPR-schedulers. Intuitively, TPR-schedulers rely on the sojourn time in the last state; however, local uniformization changes the exit rates of states by adding transitions to copy-states.

Theorem 4.4. For G ∈ {TPR, TAHOPR}, there exists a CTMDP C = (S, Act, R, ν) and a measurable set of paths Π ∈ F_{Paths^ω(C)} such that

sup_{D ∈ G(C)} Pr^ω_{ν,D}(Π) > sup_{D′ ∈ G(C̄)} Pr^ω_{ν̄,D′}(extend(Π)).

Proof. We give the proof for TPR: Consider the CTMDPs C and C̄ in Fig. 4.2(a) and Fig. 4.5(a), respectively.

Let Π ∈ F_{Paths^ω(C)} be the set of paths in C that reach state s₃ in 1 time unit and let Π̄ = extend(Π). To optimize Pr^ω_{ν,D}(Π) and Pr^ω_{ν̄,D′}(Π̄), any scheduler D (resp. D′) must choose {α ↦ 1} in state s₀. Nondeterminism only remains in state s₁; here, the optimal distribution over {α, β} depends on the time t₀ that was spent to reach state s₁: In C and C̄, the probability to go from s₁ to s₃ in the remaining t = 1 − t₀ time units is f_α(t) = 1/3 − (1/3)e^{−3t} for α and f_β(t) = 1 + (1/2)e^{−3t} − (3/2)e^{−t} for β. Fig. 4.5(b) shows the cumulative distribution functions (cdfs) f_α and f_β; as any convex combination of α and β results in a cdf in the shaded area of Fig. 4.5(b), we only need to consider the extreme distributions {α ↦ 1} and {β ↦ 1} for maximal reachability. Let d be the unique solution (in ℝ>0) of f_α(t) = f_β(t), i.e. the point where the two cdfs cross. Then D_opt(s₀ −(α, t₀)→ s₁, ⋅) = {α ↦ 1} if 1 − t₀ ≤ d and {β ↦ 1} otherwise, is an optimal GM-scheduler for Π on C and D_opt ∈ TPR(C) ∩ TTPR(C) as it depends only on the delay of the last transition.

Figure 4.5: Timed reachability of state s₃ (starting in s₁) in C and C̄.
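The crossing point d is not given numerically in the text. The following sketch approximates it by bisection, using only the two formulas for f_α and f_β stated above. The function names, the bracketing interval and the tolerance are assumptions chosen for illustration. As a cross-check, substituting x = e^{−t} into f_α(t) = f_β(t) gives 5x³ − 9x + 4 = 0, whose only root in (0, 1) is x = (√105 − 5)/10, so d = −ln((√105 − 5)/10), approximately 0.645.

```python
import math


def f_alpha(t: float) -> float:
    """Probability to reach s3 from s1 within t time units when choosing alpha."""
    return 1.0 / 3.0 - math.exp(-3.0 * t) / 3.0


def f_beta(t: float) -> float:
    """Probability to reach s3 from s1 within t time units when choosing beta."""
    return 1.0 + 0.5 * math.exp(-3.0 * t) - 1.5 * math.exp(-t)


def crossing_point(lo: float = 1e-6, hi: float = 5.0, tol: float = 1e-12) -> float:
    """Bisection on g = f_alpha - f_beta, which is positive near 0 and negative at hi."""
    def g(t: float) -> float:
        return f_alpha(t) - f_beta(t)

    assert g(lo) > 0.0 > g(hi), "interval does not bracket the crossing point"
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if g(mid) > 0.0:
            lo = mid      # alpha still better: the crossing lies to the right
        else:
            hi = mid      # beta better: the crossing lies to the left
    return (lo + hi) / 2.0


def d_opt(t0: float, d: float) -> str:
    """Action chosen by D_opt in s1 after a delay of t0 in s0 (time bound 1)."""
    return "alpha" if 1.0 - t0 <= d else "beta"


d = crossing_point()
closed_form = -math.log((math.sqrt(105.0) - 5.0) / 10.0)
print(d, closed_form)
print(d_opt(0.0, d), d_opt(0.9, d))
```

With little time remaining (t₀ close to 1) the sketch picks α, and with much time remaining it picks β, matching the case distinction in D_opt.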

For Π̄, D′ is an optimal GM-scheduler on C̄ if D′(s₀

For TAHOPR, a similar proof applies that relies on the fact that local uniformization changes the number of transitions needed to reach a goal state. ◻

This proves that by local uniformization, essential information for TPR and TAHOPR schedulers is lost. In other cases, schedulers from TAHR and TAHOPR gain information by local uniformization:

Proof. Consider the CTMDPs C and C̄ in Fig. 4.2(a) and Fig. 4.5(a), respectively. Let Π be the time-bounded reachability property of state s₃ within 1 time unit and let Π̄ = extend(Π).

We prove the claim for TAHR: To this end, we derive D ∈ TAHR(C) such that Pr^ω_{ν,D}(Π) = sup_{D′ ∈ TAHR(C)} Pr^ω_{ν,D′}(Π). For this, D(s₀) = {α ↦ 1} must obviously hold. Thus, the only nondeterministic choice occurs in state s₁ for the time-abstract history s₀ −α→ s₁, where D(s₀ −α→ s₁) = µ, µ ∈ Distr({α, β}). For initial state s₀, Fig. 4.6(a) depicts Pr^ω_{ν,D}(Π) for all µ ∈ Distr({α, β}); obviously, D(s₀ −α→ s₁) = {β ↦ 1} maximizes Pr^ω_{ν,D}(Π).

On C̄, we prove that there exists D′ ∈ TAHR(C̄) such that Pr^ω_{ν,D}(Π) < Pr^ω_{ν̄,D′}(Π̄): To maximize Pr^ω_{ν̄,D′}(Π̄), define D′(s₀) = {α ↦ 1}. Note that D′ may yield different distributions for the time-abstract paths s₀ −α→ s₁ and s₀ −α→ s₀^α −α→ s₁; for µ, µ_c ∈ Distr({α, β}) such that µ = D′(s₀ −α→ s₁) and µ_c = D′(s₀ −α→ s₀^α −α→ s₁), the probability of Π̄ under D′ is depicted in Fig. 4.6(b) for all µ, µ_c ∈ Distr({α, β}). Clearly, Pr^ω_{ν̄,D′}(Π̄) is maximal if D′(s₀ −α→ s₁) = {β ↦ 1} and D′(s₀ −α→ s₀^α −α→ s₁) = {α ↦ 1}. Further, Fig. 4.6(b) shows that with this choice of D′, Pr^ω_{ν̄,D′}(Π̄) > Pr^ω_{ν,D}(Π) and the claim follows. For TAHOPR, the proof applies analogously. ◻

With these counterexamples, we complete our discussion of local uniformization and come back to the question that was raised at the beginning of Sec. 4.2: The motivation to study locally uniform CTMDPs is to delay the scheduling decision until the current state is left.

As we have seen, for TTPR- and TAPR-schedulers, any given CTMDP can be transformed into a locally uniform one while preserving all measures. Moreover, in this thesis, we are particularly interested in time-bounded reachability objectives; for them, we know that TTPR schedulers are sufficient, that is, we do not need to consider any other class of schedulers to obtain the optimal reachability probabilities.

However, a word of caution is necessary at this point: The results of this chapter might lead to the conclusion that for time-bounded reachability objectives, one can transform an arbitrary CTMDP into a locally uniform one and investigate it with respect to late schedulers. Although this is possible, there is still an open theoretical problem in this approach:

The results of this chapter do not prove in any way that local uniformization preserves measures with respect to late schedulers. Obviously, for such a proof, we need to define the semantics of non-locally uniform CTMDPs under late schedulers. However, in this setting, the scheduling decision and the sojourn time distribution become dependent on each other. The natural result is measurable schedulers that decide continuously during the sojourn in the current state. However, the implications of such a definition are ongoing research and outside the scope of this thesis.

Figure 4.6: Optimal TAHR-schedulers for time-bounded reachability. (a) TAHR-schedulers on C; axes: D(s₀ −α→ s₁)(α) and probability. (b) TAHR-schedulers on C̄; axes: D′(s₀ −α→ s₁)(α), D′(s₀ −α→ s₀^α −α→ s₁)(α) and probability.
