
The BP-approach reuses values already calculated in previous iterations. This enables optimisation techniques that reduce the memory needed to store those previously calculated values. To reduce the memory requirements, one can use the observation that the constants $c_{s,i}$ in $\mathrm{LP1}_i$ are obtained from the values $p_{t,\,i-\mathrm{rew}(s,\alpha)}$, where $\alpha \in Act(s)$ and $\mathrm{rew}(s,\alpha) > 0$. As a consequence, the solution $(p_{t,i})_{t \in S}$ for $\mathrm{LP1}_i$ can be discarded as soon as $\mathrm{LP1}_{i+w}$ has been solved, where $w = \max\{\mathrm{rew}(s,\alpha) : s \in S,\ \alpha \in Act(s)\}$ is the maximal reward value in $\mathcal{M}$. A further improvement considers, per state, the maximum reward of all incoming transitions. That is, the value $p_{t,i}$ is no longer needed as soon as $\mathrm{LP1}_{i+w_t}$ has been solved, where $w_t$ is the maximal reward of the state-action pairs $(s,\alpha)$ with $P(s,\alpha,t) > 0$.
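The following Python sketch computes the global window size $w$ and the per-state sizes $w_t$ obtained from incoming transitions, which makes both discard rules concrete. It is a sketch under stated assumptions. The dictionary-based model encoding (`transitions`, `rew`) and the function names are hypothetical, and rewards are assumed to be non-negative integers attached to state-action pairs, as in $\mathrm{rew}(s,\alpha)$. This is not the data structure of the actual implementation.

```python
from typing import Dict, List, Tuple

# Hypothetical encoding of the MDP M, used only for this sketch:
#   transitions[(s, a)] -> list of (t, P(s, a, t)) pairs
#   rew[(s, a)]         -> non-negative integer reward rew(s, a)
Transitions = Dict[Tuple[str, str], List[Tuple[str, float]]]
Rewards = Dict[Tuple[str, str], int]


def global_window(rew: Rewards) -> int:
    """w = max{rew(s, a)}: p_{t,i} may be dropped once LP1_{i+w} is solved."""
    return max(rew.values(), default=0)


def incoming_windows(transitions: Transitions, rew: Rewards) -> Dict[str, int]:
    """For every state t, the largest rew(s, a) over pairs with P(s, a, t) > 0.

    p_{t,i} is no longer needed once LP1_{i + w_t} has been solved.
    """
    windows: Dict[str, int] = {}
    for (s, a), successors in transitions.items():
        r = rew.get((s, a), 0)
        for t, prob in successors:
            if prob > 0:
                windows[t] = max(windows.get(t, 0), r)
    return windows


# Small example model: s0 -a-> {s1, s2}, s0 -b-> s2, s1 -a-> s1, s2 -a-> s0
transitions: Transitions = {
    ("s0", "a"): [("s1", 0.5), ("s2", 0.5)],
    ("s0", "b"): [("s2", 1.0)],
    ("s1", "a"): [("s1", 1.0)],
    ("s2", "a"): [("s0", 1.0)],
}
rew: Rewards = {("s0", "a"): 3, ("s0", "b"): 1, ("s1", "a"): 0, ("s2", "a"): 2}

print(global_window(rew))                  # 3
print(incoming_windows(transitions, rew))  # {'s1': 3, 's2': 3, 's0': 2}
```

In this example, state `s0` is only entered through a pair with reward 2. Its values can therefore be dropped one iteration earlier than the global bound $w = 3$ would allow.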

A previously calculated value is only referenced when a positive reward occurs (see the third case of the linear program in Figure 3.3 on page 29). Consequently, only the values of direct successors of positive-reward states appear on the right-hand side of the inequalities. When no reward is involved, the references inside $\mathrm{LP1}_i$ point into the very same linear program, so the referenced value has not been calculated previously. Therefore, the BP-approach only propagates the values of direct successors of positive-reward states, and only the computed values of such states need to be stored.
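The sketch below shows how the set of states whose values must be kept could be determined. It uses the same hypothetical dictionary encoding as before, with rewards attached to state-action pairs. It collects every direct successor $t$ of a pair $(s,\alpha)$ with $\mathrm{rew}(s,\alpha) > 0$ and $P(s,\alpha,t) > 0$. The function name `states_to_store` is illustrative only.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical encoding (illustration only): (s, a) -> [(t, P(s, a, t))]
Transitions = Dict[Tuple[str, str], List[Tuple[str, float]]]
Rewards = Dict[Tuple[str, str], int]


def states_to_store(transitions: Transitions, rew: Rewards) -> Set[str]:
    """Direct successors t of state-action pairs (s, a) with rew(s, a) > 0.

    Only these values are referenced by a later LP1_{i'}; zero-reward
    references are resolved inside the same linear program.
    """
    stored: Set[str] = set()
    for (s, a), successors in transitions.items():
        if rew.get((s, a), 0) > 0:
            stored.update(t for t, prob in successors if prob > 0)
    return stored


transitions: Transitions = {
    ("s0", "a"): [("s1", 1.0)],
    ("s1", "a"): [("s2", 0.5), ("s0", 0.5)],
    ("s2", "a"): [("s2", 1.0)],
}
rew: Rewards = {("s0", "a"): 2, ("s1", "a"): 0, ("s2", "a"): 0}

print(states_to_store(transitions, rew))  # {'s1'}
```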

Using these observations, we can utilise a reward window to store the values already computed in previous iterations. This reward window has a fixed capacity, which determines for how many iterations of the BP-approach the stored values remain available. New values are then added to the storage by simply replacing values that are no longer needed. The implementation of the quantile calculations supports three different approaches for storing the previously calculated values in a reward window:
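A reward window of fixed capacity behaves like a ring buffer. The sketch below uses a hypothetical class `RewardWindow` that keeps the values of one state for the last `capacity` reward levels. Level $i$ is stored in slot $i \bmod$ `capacity`, so writing a new level overwrites the oldest one. The class also records which level occupies each slot, only so that accesses to evicted values can be detected.

```python
from typing import List, Optional


class RewardWindow:
    """Hypothetical fixed-capacity ring buffer for the values p_{t,i} of one state t.

    Level i lives in slot i % capacity, so writing level i overwrites
    level i - capacity, which is no longer needed at that point.
    """

    def __init__(self, capacity: int) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self._levels: List[Optional[int]] = [None] * capacity  # level held by each slot
        self._values: List[float] = [0.0] * capacity

    def put(self, i: int, value: float) -> None:
        slot = i % self.capacity
        self._levels[slot] = i
        self._values[slot] = value

    def get(self, i: int) -> float:
        slot = i % self.capacity
        if self._levels[slot] != i:
            raise KeyError(f"reward level {i} is not (or no longer) stored")
        return self._values[slot]


window = RewardWindow(capacity=3)
for i in range(5):  # results of LP1_0, ..., LP1_4 for one state
    window.put(i, i / 10)
print(window.get(4), window.get(2))  # 0.4 0.2 (level 1 was overwritten by level 4)
```

A capacity of $w$ suffices if the solution of $\mathrm{LP1}_i$ is written only after $\mathrm{LP1}_i$ has been solved, because $\mathrm{LP1}_i$ only reads the levels $i-w, \dots, i-1$.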

All states

With this approach, the values for every state of the model are stored, regardless of whether they are needed in the computations or not. Therefore, each state can be addressed directly, without any index recalculation whenever a reference is resolved. In a precomputation step, the maximal reward value $w = \max\{\mathrm{rew}(s,\alpha) : s \in S,\ \alpha \in Act(s)\}$ of the whole model is determined, and every state receives a reward window of size $w$. Consequently, many values may be stored even though the computations of the BP-approach never need them.

Depending on the investigated model, this approach can waste a large amount of memory on values that are never used during the entire calculation. On the other hand, it needs no additional index recalculations, unlike the two approaches described next. So, for a model in which almost all states are positive-reward successors, this approach might be a good choice.
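The "All states" layout can be sketched as one flat array with a window of size $w$ per state, assuming states are numbered $0, \dots, |S|-1$. The class name and its methods are hypothetical. The point is that the slot of $(s, i)$ is computed directly as $s \cdot w + (i \bmod w)$, without any mapping.

```python
from typing import List, Optional


class AllStatesStore:
    """Hypothetical sketch of 'All states': a window of size w for every state 0..n-1.

    The slot of (state, i) is computed directly, so no mapping structure is
    needed, but num_states * w values are always allocated.
    """

    def __init__(self, num_states: int, w: int) -> None:
        if w < 1:
            raise ValueError("w must be at least 1")
        self.w = w
        self._levels: List[Optional[int]] = [None] * (num_states * w)
        self._values: List[float] = [0.0] * (num_states * w)

    def _slot(self, state: int, i: int) -> int:
        return state * self.w + i % self.w  # direct addressing, no lookup

    def put(self, state: int, i: int, value: float) -> None:
        k = self._slot(state, i)
        self._levels[k], self._values[k] = i, value

    def get(self, state: int, i: int) -> float:
        k = self._slot(state, i)
        if self._levels[k] != i:
            raise KeyError((state, i))
        return self._values[k]

    def num_elements(self) -> int:
        return len(self._values)


store = AllStatesStore(num_states=4, w=3)
store.put(2, 5, 0.7)
print(store.get(2, 5), store.num_elements())  # 0.7 12
```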

Uniform positive-reward successors

As stated earlier, values only need to be stored for the successors of positive-reward states, that is, states with positive state rewards, positive transition rewards, or both. Therefore, this approach stores only the computed values for those successors. Depending on the model and its reward function, this can save a lot of memory compared to the previous approach. However, assigning the stored values to the corresponding states of the model requires an additional data structure that maintains the correct mapping. As a downside, a mapping computation has to be performed each time a stored value is accessed, so one has to check whether the memory savings make up for the overhead of the additional index recalculations. As in the previous approach, a uniform reward window of size $w = \max\{\mathrm{rew}(s,\alpha) : s \in S,\ \alpha \in Act(s)\}$ is used for each stored state.

Depending on the model and the reward structure, this approach might be a good compromise between the memory used for storage and the time spent on the computations.
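For the uniform variant, the sketch below uses hypothetical names and assumes integer state identifiers. It allocates rows only for the states passed in and uses a dictionary as the additional mapping structure. That dictionary lookup is the extra index computation paid on every access.

```python
from typing import Dict, Iterable, List, Optional


class UniformSuccessorStore:
    """Hypothetical sketch of 'Uniform positive-reward successors'.

    Only the given states (successors of positive-reward states) get a window,
    all of the same size w. _row maps a model state to its row in the compact
    storage and is consulted on every access.
    """

    def __init__(self, stored_states: Iterable[int], w: int) -> None:
        if w < 1:
            raise ValueError("w must be at least 1")
        self.w = w
        self._row: Dict[int, int] = {
            state: row for row, state in enumerate(sorted(set(stored_states)))
        }
        size = len(self._row) * w
        self._levels: List[Optional[int]] = [None] * size
        self._values: List[float] = [0.0] * size

    def _slot(self, state: int, i: int) -> int:
        # Mapping step; raises KeyError for states whose values are never stored.
        return self._row[state] * self.w + i % self.w

    def put(self, state: int, i: int, value: float) -> None:
        k = self._slot(state, i)
        self._levels[k], self._values[k] = i, value

    def get(self, state: int, i: int) -> float:
        k = self._slot(state, i)
        if self._levels[k] != i:
            raise KeyError((state, i))
        return self._values[k]

    def num_elements(self) -> int:
        return len(self._values)


store = UniformSuccessorStore(stored_states=[7, 2], w=3)
store.put(7, 4, 0.25)
print(store.get(7, 4), store.num_elements())  # 0.25 6
```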

Individual positive-reward successors

As before, this approach only stores the calculated values for the successors of positive-reward states. Here, however, each state has an individual reward window tailored to that specific state: each state $s \in S$ has its own reward window of size $w_s = \max\{\mathrm{rew}(s,\alpha) : \alpha \in Act(s)\}$. This minimises the consumed memory. The approach is useful when only a few states need a very large reward window while most states require just a small one. In that case, the two previous approaches would allocate huge reward windows even for states that do not need them, consuming a significant amount of memory without any benefit for the calculations.

As a drawback, this approach needs more operations to resolve the references to the corresponding stored values from previous iterations. Therefore, the computation time might be affected negatively in comparison to the other two approaches.
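The individual variant can be sketched by laying the windows of all stored states back to back in one flat array and keeping an offset and a size per state. Here the per-state window sizes are taken as input, for example from the incoming-transition rule described at the start of this section. The class `IndividualSuccessorStore` is hypothetical. Each access needs two lookups (offset and size), which illustrates the extra work mentioned above.

```python
from typing import Dict, List, Optional


class IndividualSuccessorStore:
    """Hypothetical sketch of 'Individual positive-reward successors'.

    Every stored state gets its own window size; the windows are laid out
    back to back, and resolving (state, i) needs both offset and size.
    """

    def __init__(self, windows: Dict[int, int]) -> None:
        self._offset: Dict[int, int] = {}
        self._size: Dict[int, int] = {}
        total = 0
        for state in sorted(windows):
            size = windows[state]
            if size < 1:
                continue  # a zero-size window means nothing has to be stored
            self._offset[state] = total
            self._size[state] = size
            total += size
        self._levels: List[Optional[int]] = [None] * total
        self._values: List[float] = [0.0] * total

    def _slot(self, state: int, i: int) -> int:
        return self._offset[state] + i % self._size[state]

    def put(self, state: int, i: int, value: float) -> None:
        k = self._slot(state, i)
        self._levels[k], self._values[k] = i, value

    def get(self, state: int, i: int) -> float:
        k = self._slot(state, i)
        if self._levels[k] != i:
            raise KeyError((state, i))
        return self._values[k]

    def num_elements(self) -> int:
        return len(self._values)


store = IndividualSuccessorStore({1: 1, 4: 5, 9: 2})
store.put(4, 7, 0.5)
print(store.get(4, 7), store.num_elements())  # 0.5 8
```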

See the eBond protocol on page 140 for performance statistics on the different reward-window approaches. There, the number of elements stored by each approach (see Figure 6.20) can be related to the computation time needed to compute a complete query (see Figure 6.21). The approaches that only store values for the positive-reward successors can reduce the memory consumption substantially. However, this reduction is paid for with increased computation times, due to the additional operations needed to resolve the correct references within the stored values.
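The number of stored elements per approach can be summarised as follows, ignoring the memory of the mapping structures themselves. Here $T \subseteq S$ denotes the set of direct successors of positive-reward states, $w$ the global window size, and $w_t \le w$ the individual window size of $t \in T$. The symbols $T$ and $N_{\dots}$ are notation introduced for this illustration.

```latex
N_{\mathrm{all}} = |S| \cdot w, \qquad
N_{\mathrm{uniform}} = |T| \cdot w, \qquad
N_{\mathrm{individual}} = \sum_{t \in T} w_t,
\qquad\text{hence}\qquad
N_{\mathrm{individual}} \le N_{\mathrm{uniform}} \le N_{\mathrm{all}}
\quad\text{since } T \subseteq S \text{ and } w_t \le w .
```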

5.1 Computation optimisations
