
Since the Exponential Weights Algorithm will never produce deterministic strategies, it is clear that Blackwell's method is genuinely different. The reason for the non-deterministic strategies of the Exponential Weights method is simple: the update rule never pushes the mixed strategy to a corner of the probability simplex, always keeping a nonzero weight even on hopelessly bad experts.
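As a small illustration of this point, here is a minimal sketch of the Exponential Weights prediction for bit sequences. It assumes the standard two-expert setup (one expert always predicts 0, the other always predicts 1) and weights proportional to exp(-eta * cumulative loss); the function name and the value of `eta` are hypothetical choices made for this example, not taken from the text.

```python
import math

def exp_weights_prob_of_one(bits, eta):
    """Probability of predicting 1 after observing `bits` (two constant experts)."""
    ones = sum(bits)
    zeros = len(bits) - ones
    # The expert "always 1" errs on every 0; the expert "always 0" errs on every 1.
    w_one = math.exp(-eta * zeros)
    w_zero = math.exp(-eta * ones)
    return w_one / (w_one + w_zero)

# Even after fifty consecutive 1s, the "always 0" expert keeps positive weight.
q = exp_weights_prob_of_one([1] * 50, eta=0.1)
print(q)
```

The printed value is close to 1 but strictly below it, so the prediction remains randomized.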

What if, instead of Follow the Regularized Leader with entropic regularization, which yields the Exponential Weights Algorithm as shown in Example 19, we use Euclidean regularization, which yields a gradient-descent type update as shown in Example 18?

To phrase the Follow the Regularized Leader (FTRL) method, we need to define the loss as a linear or convex function. Of course, the indicator loss $\mathbb{I}\{\hat{y}_t \neq z_t\}$ is neither of these, but the trick is to consider the linearized problem where the choice of the learner is $q_t \in [0,1]$, interpreted as the probability of predicting $\hat{y}_t = 1$. Since $\mathbb{I}\{\hat{y}_t \neq z_t\} = \hat{y}_t + z_t - 2\hat{y}_t z_t$ for $\hat{y}_t, z_t \in \{0,1\}$, the expected loss can be written as

$$\mathbb{E}_{\hat{y}_t \sim q_t}\, \mathbb{I}\{\hat{y}_t \neq z_t\} = q_t \cdot (1 - 2z_t) + z_t =: \ell(q_t, z_t)$$

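for $q_t \in [0,1]$. We can now define the FTRL method

$$q_{t+1} = \operatorname*{argmin}_{q \in [0,1]} \left\{ \sum_{s=1}^{t} q \cdot (1 - 2z_s) + \eta^{-1}\, \frac{1}{2} \|q - 1/2\|^2 \right\} \tag{21.7}$$

with the Euclidean regularizer centered at $q = 1/2$. The unconstrained problem over the real line has its solution at $\tilde{q}_{t+1} = \frac{1}{2} - \eta \sum_{s=1}^{t} (1 - 2z_s)$, which is subsequently clipped (or projected) to the interval $[0,1]$. If this clipping happens, the strategy becomes deterministic, either $q_t = 1$ or $q_t = 0$. Clipping is needed precisely when $\left|\sum_{s=1}^{t} (1 - 2z_s)\right| \geq \frac{1}{2\eta}$, or, equivalently, $|\bar{z}_t - 1/2| \geq \frac{1}{4t\eta}$, where $\bar{z}_t$ is the empirical frequency of ones among $z_1, \ldots, z_t$.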
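The update in (21.7) can be sketched in a few lines. This is an illustrative sketch under the assumption that the unconstrained minimizer is 1/2 minus eta times the sum of the terms (1 - 2 z_s), obtained by setting the derivative of the objective to zero; the function name and the sample values of `eta` are hypothetical.

```python
def ftrl_euclidean(bits, eta):
    """Return (unclipped, clipped) FTRL prediction after observing `bits`."""
    # Sum of the linear loss coefficients (1 - 2 z_s) seen so far.
    grad_sum = sum(1 - 2 * z for z in bits)
    # Unconstrained minimizer of q * grad_sum + (q - 1/2)^2 / (2 * eta).
    q_tilde = 0.5 - eta * grad_sum
    # Projection onto [0, 1]; a clipped value means a deterministic prediction.
    q = min(1.0, max(0.0, q_tilde))
    return q_tilde, q

bits = [1, 1, 1, 0]                    # empirical frequency of ones is 0.75
print(ftrl_euclidean(bits, eta=0.1))   # small eta: stays inside (0, 1)
print(ftrl_euclidean(bits, eta=0.5))   # larger eta: clipped to 1
```

For this sequence, |0.75 - 1/2| = 0.25, which is below 1/(4 t eta) = 0.625 when eta = 0.1 (no clipping) and above 1/(4 t eta) = 0.125 when eta = 0.5 (clipping), in line with the condition stated above.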

As shown in Figure 21.4, the FTRL strategy gives a deterministic prediction when the empirical frequency $\bar{z}_t$ is outside the band of width $\frac{1}{2t\eta}$, centered at $1/2$. The typical guarantee of $O(1/\sqrt{n})$ for the regret of FTRL sets $\eta = c/\sqrt{n}$ for some constant $c$ (or, in a time-changing manner, $\eta_t = c/\sqrt{t}$), and thus the width of the region where the prediction is randomized is of the order $1/\sqrt{n}$.
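To make the last claim concrete, the band width can be evaluated directly at the final round $t = n$ with the fixed rate, and at a generic round $t$ with the time-changing rate. This is only a worked substitution into the width formula, with $c$ the unspecified constant from the text:

$$\frac{1}{2n\eta}\bigg|_{\eta = c/\sqrt{n}} = \frac{\sqrt{n}}{2cn} = \frac{1}{2c\sqrt{n}}, \qquad \frac{1}{2t\eta_t}\bigg|_{\eta_t = c/\sqrt{t}} = \frac{1}{2c\sqrt{t}}.$$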

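[Figure: the unit square from $(0,0)$ to $(1,1)$, with axes labeled $\bar{x}$ and $\bar{c}$, and a band of width $1/(2t\eta)$ marked around $1/2$.]

Figure 21.4: FTRL with a fixed learning rate $\eta$ gives a mixed strategy only in the small band around $1/2$.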

Since FTRL enjoys small regret, we have derived yet another strategy that attains the goal in (2.1). Observe, however, that the FTRL strategy with the Euclidean regularizer is different both from the Exponential Weights Algorithm and from Blackwell's method. If we are to replicate the behavior of the latter, the information about the frequency $\bar{c}_t$ of our correct predictions needs to be taken into account. This information (or statistic about the past) cannot be deduced from $\bar{z}_t$ alone, and so FTRL or Mirror Descent methods seem to be genuinely distinct from Blackwell's algorithm based on $(\bar{z}_t, \bar{c}_t)$.

It turns out that the extra information about the frequency $\bar{c}_t$ of correct predictions can be used to set the learning rate $\eta$ in Follow the Regularized Leader. With the time-changing learning rate

$$\eta_t = \frac{1}{4t\,|\bar{c}_t - 1/2|}$$

the FTRL method

$$q_{t+1} = \operatorname*{argmin}_{q \in [0,1]} \left\{ \sum_{s=1}^{t} q \cdot (1 - 2z_s) + \eta_t^{-1}\, \frac{1}{2} \|q - 1/2\|^2 \right\} \tag{21.8}$$

becomes exactly Blackwell's algorithm. Let us see why this is so. First, let us check the case when the optimum in (21.8) is achieved at the pure strategy $q_t = 1$ or $q_t = 0$. As argued earlier, this occurs when

$$|\bar{z}_t - 1/2| \geq \frac{1}{4t\eta_t} = |\bar{c}_t - 1/2|.$$

This corresponds exactly to the regions $D_1$ and $D_2$ in Figure 21.1, thus matching the behavior of Blackwell's method. It remains to check the behavior in $D_3$. Setting the derivative in (21.8) to zero,

$$q_t = \frac{1}{2} + (2t\eta_t)(\bar{z}_t - 1/2) = \frac{1}{2} + \frac{\bar{z}_t - 1/2}{2\,|\bar{c}_t - 1/2|}.$$

We need to check that this is the same value as that obtained geometrically in Figure 21.3. It is indeed the case, as can be seen from similar triangles: the ratio of $q_t - 1/2$ to $1/2$ is equal to the ratio of $\bar{z}_t - 1/2$ to $|\bar{c}_t - 1/2|$.
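The closed form above can be turned into a short sketch that maps the pair of statistics (frequency of ones, frequency of correct predictions) to a prediction probability. This is an illustrative sketch, not the author's implementation: the function name is hypothetical, and the handling of the degenerate case where the frequency of correct predictions equals 1/2 (an infinite learning rate) is an assumption made here, since the text does not discuss it.

```python
def adaptive_ftrl_prediction(z_bar, c_bar):
    """Probability of predicting 1 from (z_bar, c_bar), using eta_t = 1 / (4 t |c_bar - 1/2|)."""
    dz = z_bar - 0.5
    dc = abs(c_bar - 0.5)
    if dc == 0.0:
        # Assumption: with no regularization left, follow the sign of dz
        # and predict 1/2 on an exact tie.
        if dz == 0.0:
            return 0.5
        return 1.0 if dz > 0 else 0.0
    # Stationary point of (21.8): 1/2 + (z_bar - 1/2) / (2 |c_bar - 1/2|).
    q_tilde = 0.5 + dz / (2.0 * dc)
    # Clipping occurs exactly when |z_bar - 1/2| >= |c_bar - 1/2|.
    return min(1.0, max(0.0, q_tilde))

print(adaptive_ftrl_prediction(0.6, 0.75))  # randomized: |0.1| < |0.25|
print(adaptive_ftrl_prediction(0.9, 0.6))   # deterministic: |0.4| >= |0.1|
```

In the first call the ratio of the output minus 1/2 to 1/2 equals the ratio 0.1 / 0.25, which is the similar-triangles relation stated above.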

21.4 Discussion

We have described three different methods for the problem of {0, 1}-sequence prediction. The Exponential Weights Algorithm puts exponentially more weight on the bit that occurs more often, but never lets go of the other bit, and the strategy is always randomized. The Follow the Regularized Leader method with a Euclidean regularizer produces a randomized strategy only in the narrow band around $1/2$. A variant of FTRL which adapts the learning rate with respect to the proportion of correct predictions yields a randomized strategy in a triangular region $D_3$ (Figure 21.1), and a deterministic prediction otherwise. Further, this behavior matches Blackwell's method, which is based on a geometric construction.

In the worst case, the performance (that is, the regret) of the three methods described above is the same, up to a multiplicative constant. However, the methods that use the extra information about the proportion $\bar{c}_t$ of correct predictions may have better convergence properties for "benign" sequences. Such adaptive procedures have been analyzed in the literature.

The beauty of Blackwell's proof is its geometric simplicity and the rather surprising construction of the mixed strategy $q_t$. As we will see in the next lecture, the construction appears out of a generalization of the minimax theorem.

Of course, our application of the Exponential Weights Algorithm or Follow the Regularized Leader is a shorter proof, but remember that we spent some time building these hammers. Is there a similar hammer based on the idea of approaching a desired set? The answer is yes, and it is called Blackwell's Approachability Theorem. Using this theorem, one can actually prove a wide range of results in repeated games that go beyond the "regret" formulation. Interestingly, Blackwell's Approachability itself can be proved using online convex optimization algorithms whose performance is defined in terms of regret. This result points to an equivalence of these big "hammers". Finally, the sequential symmetrization tools we developed earlier in the course can be used to prove Blackwell approachability in even more generality, without exhibiting an algorithm.