5. Oral interaction in textbooks
5.1 Collaborative interactive practices and the topics addressed
The previous paragraphs established a more general result on the stability of V through the TMDP Bellman backups and provided some insight into why [Boyan and Littman, 2001] could perform such a resolution. The next question is to identify all cases where such an exact resolution is feasible under the modeling hypotheses given at the beginning of section 5.1, namely piecewise polynomial functions and piecewise polynomial or discrete distributions. If the degree of V increases at every iteration, Value Iteration cannot converge in a finite number of iterations since the polynomials’ degree keeps growing. Actually, convergence is possible, but the optimal value function might have an infinite degree. Thus an exact resolution necessarily implies A + C = −1. We then need to check whether all values of B allow for an exact calculation of V’s coefficients.
For all values of B, equation 5.4 can easily be solved since we know how to calculate the coefficients of U_n without approximation⁵. The same remark also holds for equation 5.3.
However, equation 5.2 requires finding, for a fixed s, the intersections of the |A| curves (Q(s, t, a))_{a∈A}. This corresponds to finding the roots of several polynomials of degree B. This operation is known to be feasible without approximation only in the following cases:
• B = 0, trivial case
• B = 1, linear case
• B = 2, Newton’s formula
• B = 3, Cardan’s or Sotta’s formula
• B = 4, Ferrari’s or Descartes’s formula
⁴ The calculations can easily be done with the max operator; the reasoning is the same.
⁵ For details, please see appendix A.
For B ≥ 5, Galois theory shows that there is no general formula giving the exact roots of a polynomial in a finite number of arithmetic operations and root extractions. An interesting approximation technique used to find the smallest real root is Sturm’s method⁶ ([Sturm, 1835]).
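As an illustration of the root-isolation idea mentioned above, the following is a minimal Python sketch of a Sturm-sequence root counter combined with bisection. It is not the implementation used in this work: the function names (`sturm_sequence`, `count_real_roots`, `isolate_smallest_root`) and the test polynomial are assumptions made for the example, and exact rational arithmetic via `fractions.Fraction` is one possible design choice among others.

```python
from fractions import Fraction

# Polynomials are lists of coefficients, highest degree first (illustrative convention).

def trim(p):
    """Drop leading zero coefficients, keeping at least one coefficient."""
    i = 0
    while i < len(p) - 1 and p[i] == 0:
        i += 1
    return p[i:]

def poly_rem(a, b):
    """Remainder of the Euclidean division of a by b (b must be non-zero and trimmed)."""
    a = list(a)
    while len(a) >= len(b):
        factor = a[0] / b[0]
        for i in range(len(b)):
            a[i] -= factor * b[i]
        a.pop(0)  # the leading coefficient is now zero
    return trim(a) if a else [Fraction(0)]

def derivative(p):
    n = len(p) - 1
    return [c * (n - i) for i, c in enumerate(p[:-1])] or [Fraction(0)]

def evaluate(p, x):
    """Horner evaluation."""
    acc = Fraction(0)
    for c in p:
        acc = acc * x + c
    return acc

def sturm_sequence(p):
    """p, p', then the negated remainders of successive Euclidean divisions."""
    p = trim([Fraction(c) for c in p])
    seq = [p, trim(derivative(p))]
    while any(seq[-1]):
        r = poly_rem(seq[-2], seq[-1])
        if not any(r):
            break
        seq.append([-c for c in r])
    return seq

def sign_changes(seq, x):
    values = [v for v in (evaluate(q, x) for q in seq) if v != 0]
    return sum(1 for u, v in zip(values, values[1:]) if (u > 0) != (v > 0))

def count_real_roots(p, lo, hi):
    """Distinct real roots of p in (lo, hi], assuming lo and hi are not multiple roots."""
    seq = sturm_sequence(p)
    return sign_changes(seq, Fraction(lo)) - sign_changes(seq, Fraction(hi))

def isolate_smallest_root(p, lo, hi, tol=Fraction(1, 1000)):
    """Bisect (lo, hi] until the smallest root it contains is bracketed within tol."""
    lo, hi = Fraction(lo), Fraction(hi)
    if count_real_roots(p, lo, hi) == 0:
        return None
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if count_real_roots(p, lo, mid) > 0:
            hi = mid   # a root lies in the left half: keep it
        else:
            lo = mid   # no root on the left: the smallest root is on the right
    return lo, hi

# Hypothetical degree-5 example: x^5 - 3x - 1
quintic = [1, 0, 0, 0, -3, -1]
n_roots = count_real_roots(quintic, -2, 2)
bracket = isolate_smallest_root(quintic, -2, 2)
```

The bracket returned by the bisection can be made as tight as desired, but it remains an approximation of the root, which is consistent with the absence of a closed-form solution for B ≥ 5.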
Finally, equation 5.1 requires finding the maxima of piecewise polynomial functions of degree B. This corresponds to finding the roots of polynomials of degree B − 1; if B < 6, this is an exact calculation. Thus, the limiting constraint here comes from equation 5.2. In summary:
Exact resolution of TMDPs with piecewise polynomial modeling is feasible if:
\[
\begin{cases}
P_\mu \in DP_{-1} \\
r_i \in P_4 \\
L \in P_0
\end{cases}
\tag{5.12}
\]
These results highlight why there is little room for extending [Boyan and Littman, 2001]’s exact resolution scheme. However, this analysis opens the door to the understanding of an approximate resolution in P_m.
The next chapter presents how the exact resolution is calculated, followed by the approximate resolution scheme. This chapter’s conclusion generalizes the results presented in [Boyan and Littman, 2001] by showing the limits of the piecewise polynomial representation framework for exact resolution, which allows us to focus on the difficulties associated with the approximate resolution of piecewise polynomial TMDPs.
Chapter 5. Solving TMDPs via Dynamic Programming
6 The TMDPpoly algorithm: solving generalized TMDPs
Bellman backups over TMDPs with piecewise polynomial transition, reward and duration functions can be performed analytically, yielding piecewise polynomial value functions. The previous chapter defined the cases when the value function’s degree was stable throughout the iterations and when the calculations could be made without approximation. On this basis, we introduce the TMDPpoly algorithm, which combines analytical computation of Bellman backups on value functions (with either exact or approximate calculations), L∞-bounded value function approximation and prioritized dynamic programming for solving the general case of piecewise polynomial TMDPs.
In the case where the TMDP definition obeys equation 5.12, it is possible to perform analytical calculations for the successive Bellman backups of Value Iteration. The next paragraph summarizes the properties obtained from the previous chapters which are used in order to solve TMDPs. It also introduces the basis of the TMDPpoly algorithm, which is detailed in the rest of the chapter.
6.1 Extending exact TMDP resolution: some conclusions and properties
1. Closed-form Bellman backups: If the reward, transition and duration functions of a TMDP model obey equation 5.12, then value iteration yields a sequence of piecewise polynomial value functions which have a stable (non-increasing) degree.
2. Interleaving idleness and action: TMDP resolution can interleave “wait” and “action” phases because wait(τ = 0) is an action which has no effect on the process’ state and no effect on rewards.
3. Decoupling the equations: Interleaving these phases corresponds to alternating wait and other actions. The consequence for the optimality equations is a decoupling of the calculation. One can first calculate the Q-values of the standard actions (equation 5.3), find the optimal action and the associated value function V (equation 5.2), then calculate the Q-value of wait(t′) as in equation 6.1 and choose the best t′ (equation 5.1).
\[
Q(s, \mathrm{wait}(t')) = Q_{wait}(s, t') = \int_{t}^{t'} K(s, \theta)\, d\theta + V(s, t')
\tag{6.1}
\]
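The last step of this decoupled calculation, choosing the date until which the agent waits, can be sketched on a single polynomial piece as follows. This is only an illustration under stated assumptions: the function name `best_wait`, the use of `numpy.polynomial.Polynomial`, and the example values of K and V are hypothetical and not taken from this work; roots are computed numerically rather than with closed-form formulas, and a full implementation would repeat this over every piece of the piecewise polynomial functions.

```python
from numpy.polynomial import Polynomial

def best_wait(K, V, t, t_max):
    """Maximize Q_wait(t_end) = integral of K from t to t_end + V(t_end) over [t, t_max].

    K and V are polynomials in time for one fixed discrete state s (single piece).
    Returns the best waiting date and the associated Q-value.
    """
    K_int = K.integ()                 # an antiderivative of K
    Q = (K_int - K_int(t)) + V        # Q_wait as a polynomial in t_end
    # Candidate maximizers: interval endpoints and real critical points inside it.
    critical = [r.real for r in Q.deriv().roots()
                if abs(r.imag) < 1e-9 and t <= r.real <= t_max]
    candidates = [t, t_max] + critical
    best = max(candidates, key=Q)
    return best, Q(best)

# Hypothetical example: waiting costs 1 per time unit (K is constant, as for L in P_0),
# and V(t_end) = -t_end^2 + 6 t_end - 5 peaks at t_end = 3.
K = Polynomial([-1.0])
V = Polynomial([-5.0, 6.0, -1.0])
t_star, q_star = best_wait(K, V, t=0.0, t_max=5.0)
```

In this example the waiting cost shifts the best date from 3 (the maximum of V alone) to 2.5, which shows why the maximization has to be carried out on the sum of the integral term and V rather than on V alone.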
4. Ordering dynamic programming passes: As presented in section 2.3.3, making time observable in a planning problem avoids transitions that loop exactly on the same augmented state. However, loops are possible if one only takes the discrete, non-temporal part of the state space into account. The resolution scheme presented above updates the value function, in each discrete state, for all t. Therefore, taking the structure of time and causality into account in TMDP solving via dynamic programming corresponds to updating the states in a pertinent order. Intuitively, a good strategy would be to first update the states that are close to the “reward providing states”. The idea of updating first the states whose value function changes significantly is generalized in the prioritized sweeping algorithm first introduced by [Moore and Atkeson, 1993]. Based on these remarks, we design an algorithm which implements a simple version of prioritized sweeping on the TMDP framework, first with the exact resolution hypotheses, then with an approximation scheme which allows faster convergence and easier calculations.
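To make the ordering idea concrete, here is a minimal sketch of prioritized sweeping on a generic finite, discounted MDP. It is not the TMDPpoly algorithm: the function name, the dictionary-based model (`P`, `R`), the discount factor and the three-state chain are all assumptions chosen for illustration. In a TMDP, each backup would update a whole function of time for the discrete state, and the priority could be, for instance, the L∞ change of that function.

```python
import heapq

def prioritized_sweeping(states, actions, P, R, gamma=0.9, theta=1e-9, max_updates=10_000):
    """Value iteration where the state with the largest Bellman error is updated first.

    P[s][a] is a list of (probability, next_state); R[s][a] is the immediate reward.
    States must be orderable (ints or strings) so that heap ties can be broken.
    """
    V = {s: 0.0 for s in states}

    # Predecessors: states whose backup depends on the value of s.
    preds = {s: set() for s in states}
    for s in states:
        for a in actions:
            for prob, s_next in P[s][a]:
                if prob > 0:
                    preds[s_next].add(s)

    def backup(s):
        return max(R[s][a] + gamma * sum(prob * V[s_next] for prob, s_next in P[s][a])
                   for a in actions)

    # Max-priority queue implemented with negated priorities.
    queue = []
    for s in states:
        error = abs(backup(s) - V[s])
        if error > theta:
            heapq.heappush(queue, (-error, s))

    for _ in range(max_updates):
        if not queue:
            break
        _, s = heapq.heappop(queue)
        V[s] = backup(s)
        # A change in V[s] may invalidate the values of its predecessors.
        for s_prev in preds[s]:
            error = abs(backup(s_prev) - V[s_prev])
            if error > theta:
                heapq.heappush(queue, (-error, s_prev))
    return V

# Hypothetical chain 0 -> 1 -> 2, where state 2 is absorbing and entering it pays 1.
states = [0, 1, 2]
actions = ["go"]
P = {0: {"go": [(1.0, 1)]}, 1: {"go": [(1.0, 2)]}, 2: {"go": [(1.0, 2)]}}
R = {0: {"go": 0.0}, 1: {"go": 1.0}, 2: {"go": 0.0}}
values = prioritized_sweeping(states, actions, P, R)
```

On this chain, the state next to the rewarding transition is updated first and its predecessor second, so each state is backed up exactly once, whereas a sweep in the order 0, 1, 2 would need two passes.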