CUBA, COLLECTIVE SURPLUS AND NATIONAL IDENTITY

IÑAKI GIL DE SAN VICENTE

In the previous diagnosis, the curve of $\text{F-score}(l(\theta), y^*)$ versus $\theta$ on a set of candidates $\theta \in \{\theta_1, \ldots, \theta_G\}$ can be computed efficiently. However, $\widehat{\text{ExpFs}}(l(\theta))$ is expensive to compute in practice. We make a concrete analysis using the Reuters dataset as an example, where the number of classes is $C = 300$, the number of features is $D = 5 \times 10^4$, the number of test examples is $n = 10^5$, the average number of non-zero features per example is $\bar{D} = 70$, and the number of $\theta$ candidates is $G = 20$.

1. Due to memory constraints, the testing data $\{x_1, \ldots, x_n\}$ can only be read in an online fashion and cannot be stored. In some cases, privacy or other accessibility constraints prevent us from revisiting the testing examples. In other cases, although revisiting the data is allowed, we can afford at most a dozen passes because of the computational cost of reading and parsing the data.

2. Sampling is also expensive in time and space. The $w$ vector costs $O(CD)$ memory; for the Reuters dataset, this is $8CD$ bytes $= 120$ MB. In terms of computational complexity, applying one sample to all the testing data takes $O(nC\bar{D})$ time, which is about $2 \times 10^9$ operations per sample. Therefore we can neither compute nor store more than a dozen samples of $w$, and we let $S = 10$.
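The figures in item 2 can be checked with a few lines of arithmetic. The sketch below uses only the constants quoted in the text; the variable names are illustrative, and reading the factor 8 as one double-precision float per weight is an assumption.

```python
# Constants quoted in the text for the Reuters example.
C = 300               # number of classes
D = 5 * 10**4         # number of features
n = 10**5             # number of test examples
D_bar = 70            # average number of non-zero features per example
BYTES_PER_WEIGHT = 8  # assumption: one double-precision float per weight

# Memory needed to hold one full sample of the weight vector w.
weight_bytes = BYTES_PER_WEIGHT * C * D

# Multiply-adds needed to apply one sample of w to all test examples,
# exploiting sparsity (only D_bar features are non-zero on average).
ops_per_sample = n * C * D_bar

print(f"memory per sample: {weight_bytes / 10**6:.0f} MB")
print(f"work per sample:   {ops_per_sample:.1e} operations")
```

With $S = 10$ samples, both numbers scale by a factor of ten, which is why the text treats a dozen samples as the practical ceiling.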

Taking into account the above constraints, we propose two efficient exact algorithms: one takes a single pass over the testing data and the other uses multiple passes. Both algorithms rely on careful buffering, which is best illustrated by writing out the empirical expected F-score in ground terms. For class $c$, combining the definitions in Eq. (3.12) and (3.14), we have

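$$
\widehat{\text{ExpFs}}_c(l(\theta_g)) = \frac{1}{S} \sum_{s=1}^{S} \frac{\overbrace{\sum_{i=1}^{n} \delta\big(\langle x_i, \tilde{w}_{s,c} \rangle - \tilde{b}_s > 0\big) \cdot \delta\big(p(y_i^c = 1) > \theta_g\big)}^{:= \alpha_{c,s,g}}}{\underbrace{\sum_{i=1}^{n} \delta\big(\langle x_i, \tilde{w}_{s,c} \rangle - \tilde{b}_s > 0\big)}_{:= \beta_{c,s}} + \underbrace{\sum_{i=1}^{n} \delta\big(p(y_i^c = 1) > \theta_g\big)}_{:= \gamma_{c,g}}}.
$$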

Technically, we maintain three counters: $\alpha_{c,s,g}$, $\beta_{c,s}$, and $\gamma_{c,g}$. They are all cheap in space, costing $O(CSG)$ for $\alpha$, $O(CS)$ for $\beta$, and $O(CG)$ for $\gamma$. $\gamma$ does not depend on the samples and can be computed efficiently, so the only remaining problem is computing $\alpha$ and $\beta$.
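To make the role of the three counters concrete, here is a minimal sketch of how the estimate could be assembled once the counts are available. It follows the expression as written in the text, $\frac{1}{S}\sum_s \alpha_{c,s,g} / (\beta_{c,s} + \gamma_{c,g})$. The function name, the nested-list layout, and the choice to let an empty denominator contribute zero are assumptions made for illustration.

```python
def exp_fscore_from_counters(alpha, beta, gamma):
    """Combine counters into the estimate for every class c and threshold g.

    alpha[c][s][g], beta[c][s] and gamma[c][g] are plain nested lists of
    counts (hypothetical layout chosen for this sketch).
    """
    C, S, G = len(gamma), len(beta[0]), len(gamma[0])
    result = [[0.0] * G for _ in range(C)]
    for c in range(C):
        for g in range(G):
            total = 0.0
            for s in range(S):
                denom = beta[c][s] + gamma[c][g]
                # Assumption: a sample with no positives on either side adds 0.
                if denom > 0:
                    total += alpha[c][s][g] / denom
            result[c][g] = total / S
    return result


# Toy counts: one class, two weight samples, two candidate thresholds.
alpha = [[[3, 1], [2, 1]]]
beta = [[4, 3]]
gamma = [[5, 2]]
scores = exp_fscore_from_counters(alpha, beta, gamma)
print(scores)
```

The combination step touches only the $O(CSG)$ counters, so it is negligible next to the cost of filling them.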

Single pass. If we are only allowed to visit the test dataset in a single pass, then for each testing example we must apply all the samples of $w$. Since there is not enough memory to store all the weight samples, we have to regenerate these samples for every testing example. To ensure good statistical performance, all the testing examples need to "see" the same samples of $w$, and therefore we store the seed of the random number generator for all the weight components. Algorithm 9 shows the whole algorithm, which essentially trades computation (resampling the weights) for IO passes.

Labeling could be done class by class which allows us to store all the weight samples of that class in memory. However, it will require reading through the testing data for

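Algorithm 9: IO-bounded computation of $\widehat{\text{ExpFs}}(l(\theta))$ for $\theta \in \{\theta_g : g \in [G]\}$.

Input: A set of candidate thresholds $\theta \in \{\theta_g : g \in [G]\}$, bias $b \sim \mathcal{N}(\mu_0, \sigma_0^2)$, and posterior model $w_{c,d} \sim \mathcal{N}(\mu_{c,d}, \sigma_{c,d}^2)$ for class $c \in [C]$ and feature $d \in [D]$.
Output: $\widehat{\text{ExpFs}}_c(l(\theta_g))$ for all $c \in [C]$ and $g \in [G]$.

1 Randomly generate seed $s_{c,d}$ for $c \in [C]$ and $d \in [D]$.
2 Draw iid random samples $\tilde{b}_1, \ldots, \tilde{b}_S$ from $\mathcal{N}(\mu_0, \sigma_0^2)$.
3 Clear buffer $\alpha_{c,s,g} = \beta_{c,s} = \gamma_{c,g} = 0$ for $c \in [C]$, $s \in [S]$, $g \in [G]$.
4 while there is still testing data do
5   Load the next test example $x$, which has non-zero features $d_1, \ldots, d_F$.
6   for $c \in [C]$ (class index) do
7     $p(y^c = 1) \leftarrow \Phi\left( \dfrac{\langle \mu_c, x \rangle - \mu_0}{\sqrt{\sigma_0^2 + \sum_d x_d^2 \sigma_{c,d}^2}} \right)$, utilizing feature sparsity.
8     for $g \in [G]$ (threshold candidate index) do
9       Increment $\gamma_{c,g}$ by 1 if $p(y^c = 1) > \theta_g$.
10    Create random number generators $r_{d_1}, \ldots, r_{d_F}$ seeded by $s_{c,d_1}, \ldots, s_{c,d_F}$ respectively.
11    for $s \in [S]$ (sample index) do
12      for $d = d_1, d_2, \ldots, d_F$ (index of non-zero features) do
13        Sample $\tilde{w}_{s,d} \sim \mathcal{N}(\mu_{c,d}, \sigma_{c,d}^2)$ using generator $r_d$.
14      if $\sum_d x_d \tilde{w}_{s,d} - \tilde{b}_s > 0$ (i.e., $y^c = 1$) then
15        Increment $\beta_{c,s}$ by 1.
16        for $g \in [G]$ (threshold candidate index) do
17          Increment $\alpha_{c,s,g}$ by 1 if $p(y^c = 1) > \theta_g$.
18 for $c \in [C]$ (class index) and $g \in [G]$ (threshold candidate index) do
     Output: $\widehat{\text{ExpFs}}_c(l(\theta_g)) = \dfrac{1}{S} \sum_{s=1}^{S} \dfrac{\alpha_{c,s,g}}{\beta_{c,s} + \gamma_{c,g}}$.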
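The single-pass procedure can be sketched in Python as follows. This is an illustrative reading of Algorithm 9, not the author's implementation: the function names, the sparse `{feature: value}` dict format, the use of `random.Random` as the seeded generator, and the `master_seed` parameter are all assumptions. The key point it tries to show is that re-seeding one generator per (class, feature) pair for every example reproduces the same $S$ weight samples without ever storing them.

```python
import math
import random


def normal_cdf(z):
    """Standard normal CDF, written with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def single_pass_counters(stream, mu, sigma2, mu0, sigma0_sq, thetas, S,
                         master_seed=0):
    """Fill alpha, beta, gamma in one pass over `stream`.

    stream yields sparse examples as {feature index: value} dicts;
    mu[c][d] and sigma2[c][d] are the posterior mean and variance.
    """
    C, D, G = len(mu), len(mu[0]), len(thetas)
    master = random.Random(master_seed)
    # One stored seed per (class, feature), as in step 1 of the algorithm.
    seeds = [[master.getrandbits(32) for _ in range(D)] for _ in range(C)]
    # Bias samples are few (S of them), so they are simply kept in memory.
    bias = [master.gauss(mu0, math.sqrt(sigma0_sq)) for _ in range(S)]

    alpha = [[[0] * G for _ in range(S)] for _ in range(C)]
    beta = [[0] * S for _ in range(C)]
    gamma = [[0] * G for _ in range(C)]

    for x in stream:
        for c in range(C):
            mean = sum(mu[c][d] * v for d, v in x.items()) - mu0
            var = sigma0_sq + sum(v * v * sigma2[c][d] for d, v in x.items())
            p = normal_cdf(mean / math.sqrt(var))
            above = [p > theta for theta in thetas]
            for g in range(G):
                gamma[c][g] += above[g]
            # Fresh generators, re-seeded identically for every example, so
            # the s-th draw for feature d is the same weight sample each time.
            gens = {d: random.Random(seeds[c][d]) for d in x}
            for s in range(S):
                score = sum(
                    v * gens[d].gauss(mu[c][d], math.sqrt(sigma2[c][d]))
                    for d, v in x.items()
                ) - bias[s]
                if score > 0:
                    beta[c][s] += 1
                    for g in range(G):
                        alpha[c][s][g] += above[g]
    return alpha, beta, gamma


# Toy usage: one class, three features, the same example seen twice.
example = {0: 1.0, 2: 2.0}
a, b, g = single_pass_counters(
    [example, example], mu=[[1.0, -1.0, 0.5]], sigma2=[[0.1, 0.1, 0.1]],
    mu0=0.0, sigma0_sq=1.0, thetas=[0.3, 0.7], S=5)
print(b, g)
```

Because the toy stream repeats one example, every entry of `b` is either 0 or 2: both copies receive the same label under each regenerated sample, which is the consistency property the stored seeds are meant to guarantee.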

Multiple passes. If the testing data can be visited in multiple passes, then we no longer need to regenerate weight samples. For each weight sample, we go through the whole testing data and update the counters $\alpha$, $\beta$, and $\gamma$. Since only about a dozen samples are drawn, visiting the testing data for a dozen passes is affordable. This algorithm is simpler than the single-pass version, and we omit the details. Essentially, it trades multiple IO passes for the computational cost of regenerating samples.
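Since the text omits the details, the following is only one plausible sketch of the multiple-pass variant. It assumes a hypothetical `make_stream` callable that reopens the test data for each pass, draws one full weight sample per pass, and updates $\gamma$ during the first pass only because it does not depend on the sample. All names are illustrative.

```python
import math
import random


def normal_cdf(z):
    """Standard normal CDF, written with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def multi_pass_counters(make_stream, mu, sigma2, mu0, sigma0_sq, thetas, S,
                        seed=0):
    """Fill alpha, beta, gamma using one pass over the data per sample."""
    C, D, G = len(mu), len(mu[0]), len(thetas)
    rng = random.Random(seed)
    alpha = [[[0] * G for _ in range(S)] for _ in range(C)]
    beta = [[0] * S for _ in range(C)]
    gamma = [[0] * G for _ in range(C)]

    for s in range(S):
        # One full weight sample is held in memory only for this pass.
        w = [[rng.gauss(mu[c][d], math.sqrt(sigma2[c][d])) for d in range(D)]
             for c in range(C)]
        b_s = rng.gauss(mu0, math.sqrt(sigma0_sq))
        for x in make_stream():  # one IO pass over the test data
            for c in range(C):
                mean = sum(mu[c][d] * v for d, v in x.items()) - mu0
                var = sigma0_sq + sum(
                    v * v * sigma2[c][d] for d, v in x.items())
                p = normal_cdf(mean / math.sqrt(var))
                positive = sum(w[c][d] * v for d, v in x.items()) - b_s > 0
                if positive:
                    beta[c][s] += 1
                for g, theta in enumerate(thetas):
                    hit = p > theta
                    if s == 0:
                        gamma[c][g] += hit  # sample-independent: count once
                    if positive:
                        alpha[c][s][g] += hit
    return alpha, beta, gamma


# Toy usage: count how many times the data is reopened.
passes = []
example = {0: 1.0, 2: 2.0}


def make_stream():
    passes.append(1)
    return iter([example, example])


a, b, g = multi_pass_counters(
    make_stream, mu=[[1.0, -1.0, 0.5]], sigma2=[[0.1, 0.1, 0.1]],
    mu0=0.0, sigma0_sq=1.0, thetas=[0.3, 0.7], S=4)
print(len(passes), b, g)
```

Compared with the single-pass sketch, no per-feature seeds or regeneration are needed; the price is $S$ reads of the test data.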

Finally, although 10 samples may seem a very low number, the experimental results in Sections 3.3.3 and 3.4.3 show that 5 samples already provide a good, though approximate, characterization of how $\text{ExpFs}(l(\theta))$ and $\text{F-score}(l(\theta), y^*)$ depend on $\theta$, which allows us to find the optimal $\theta$ approximately. Remember that for each weight sample, the whole testing dataset is used to compute the approximation. Moreover, we only need the mode of $\text{ExpFs}(l(\theta))$, which can probably be captured roughly with a small set of weight samples.
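Locating the mode over the candidate grid amounts to an argmax over $G$ values per class. A minimal sketch, with a hypothetical helper name and made-up scores used purely for illustration:

```python
def best_threshold(thetas, scores):
    """Return the candidate threshold whose estimated score is largest."""
    best_index = max(range(len(thetas)), key=lambda g: scores[g])
    return thetas[best_index]


# Made-up estimates for one class, not results from the text.
thetas = [0.1, 0.3, 0.5, 0.7]
scores = [0.21, 0.34, 0.31, 0.18]
chosen = best_threshold(thetas, scores)
print(chosen)  # prints 0.3
```

Only the position of the maximum matters here, so a noisy estimate of the curve can still pick the same candidate as an exact one.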
