
4.2.2 Single Instruction Multiple Data (SIMD) Circuits

The Sharemind framework [BLW08,BJL12] for secure three-party computation showed that Single Instruction Multiple Data (SIMD) circuits can substantially reduce the memory footprint. The idea of SIMD circuits is to replace the evaluation of n identical copies of the same sub-circuit on one-bit values by one evaluation of the sub-circuit on n-bit values. This optimization reduces the overall computation time and the memory footprint, as the circuit needs to be generated only once. SIMD circuits are especially beneficial in data mining applications [BJL12], but can also speed up other applications, e.g., AES, where the same S-Box is applied in parallel (cf. §4.2.1.12), or PSI (cf. §5.3).

We implement the SIMD evaluation by introducing two new virtual gate types that only require re-wiring of values, depicted in Figure 4.4: a combiner gate and a splitter gate. The combiner gate combines n one-bit input wires into one n-bit output wire. Subsequently, AND and XOR gates can be placed as usual to process the n-bit values. When the SIMD evaluation is done, a splitter gate can be used to convert n-bit wires back to n one-bit wires (or, alternatively, to an arbitrary subset or permutation of the output values).
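The re-wiring idea can be illustrated with a minimal plaintext sketch. It is not the framework's implementation and does not operate on secret shares: the helper names `combine`, `split`, and `sub_circuit`, as well as the example function s = (x AND y) XOR z, are assumptions chosen for illustration. The sketch only shows that one evaluation on n-bit values gives the same result as n evaluations on one-bit values.

```python
from typing import List


def combine(bits: List[int]) -> int:
    """Combiner gate (illustrative): pack n one-bit wires into one n-bit value."""
    value = 0
    for i, b in enumerate(bits):
        value |= (b & 1) << i  # wire i becomes bit i of the packed value
    return value


def split(value: int, n: int) -> List[int]:
    """Splitter gate (illustrative): unpack an n-bit value into n one-bit wires."""
    return [(value >> i) & 1 for i in range(n)]


def sub_circuit(x: int, y: int, z: int) -> int:
    """Hypothetical sub-circuit s = (x AND y) XOR z; bitwise, so it works per lane."""
    return (x & y) ^ z


def evaluate_simd(xs: List[int], ys: List[int], zs: List[int]) -> List[int]:
    """Evaluate the sub-circuit once on n-bit values."""
    n = len(xs)
    s = sub_circuit(combine(xs), combine(ys), combine(zs))
    return split(s, n)


def evaluate_sisd(xs: List[int], ys: List[int], zs: List[int]) -> List[int]:
    """Evaluate n separate copies of the sub-circuit on one-bit values."""
    return [sub_circuit(x, y, z) for x, y, z in zip(xs, ys, zs)]
```

In the SIMD variant the sub-circuit description exists only once, regardless of n, which is where the memory saving comes from.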

Efficiency. The efficiency improvement of the SIMD programming style depends greatly on the function that is evaluated. The biggest improvement is observed if the same function is evaluated on multiple independent inputs in parallel. In this case, for n independent function evaluations, the memory requirement of the circuit is reduced by a factor of n. For functions where the data needs to be re-arranged more often, the memory improvement varies greatly. For instance, when evaluating the sort-compare-shuffle PSI circuit of [HEK12] on sets of 65 536 elements of 32 bits each, the SIMD instructions reduce the memory requirement by a factor of about 30×, from 382 million to 12 million gates. The biggest disadvantage of the SIMD programming style, however, is that it is more complicated than a regular single instruction single data (SISD) programming style, since the programmer has to consider data-flow dependencies.

4.3 Optimized Pre-Computation

In this section, we show how to improve the pre-computation complexity of GMW using OT extension. We first show how to balance the computation and communication complexity equally among both parties (§4.3.1). Then, we show how to pre-compute MTs in a more communication-efficient manner using $\binom{2}{1}$ R-OT extension (§4.3.2). Finally, we outline how to further improve communication for pre-computing MTs at the cost of increased computation complexity using $\binom{N}{1}$ OT extension (§4.3.3).

Figure 4.4: One-time evaluation of identical circuits using SIMD operations. (Figure labels: one-bit inputs x, y, z; Combine gates; an n-bit Split gate; outputs s.)

4.3.1 Load Balancing

The GMW implementation of [CHK+12] implements the $\binom{4}{1}$ OT extension protocol of [LLXX05] in the setup phase, where the majority of the data is sent by the receiver (cf. §2.4.3). As the MTs used in the online phase are symmetric (cf. §2.4.2), we can run the OT extensions that generate them in the setup phase in either direction. Hence, to balance the communication, we run two instantiations of the $\binom{4}{1}$ OT protocol (each for half of the AND gates) in parallel with the roles reversed. With this optimization, each party has the same workload per AND gate. Note that we now also need to run the base-OT protocol for the seed OTs twice; this cost, however, amortizes fairly quickly (35 ms computation time and 10 KBytes to be transferred).

4.3.2 2-MT: MTs via $\binom{2}{1}$ R-OT

An AND gate in the GMW protocol can be computed efficiently using MTs (cf. §2.4.2), which are bits $a_0, a_1, b_0, b_1, c_0, c_1$ under the constraint that $c_0 \oplus c_1 = (a_0 \oplus a_1)(b_0 \oplus b_1)$. Each $P_i$ receives the shares labeled with index $i$. To pre-compute one MT, the work of [CHK+12] evaluates $\binom{4}{1}$ OT$^1_1$ using the OT extension protocol of [LLXX05].
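How an MT is consumed by an AND gate is deferred to §2.4.2, which is not reproduced here. The following minimal sketch assumes the standard Beaver-style evaluation on XOR shares: each party masks its input shares with its MT shares, the parties exchange the masked bits (two bits per party), and each party derives its output share locally. The function names `random_mt` and `gmw_and` are hypothetical, and `random_mt` is a trusted-dealer stand-in for the setup phase rather than the OT-based generation discussed in this section.

```python
import secrets


def random_mt():
    """Trusted-dealer stand-in (assumption): sample one MT satisfying
    c0 ^ c1 == (a0 ^ a1) & (b0 ^ b1)."""
    a0, a1, b0, b1, c0 = (secrets.randbits(1) for _ in range(5))
    c1 = c0 ^ ((a0 ^ a1) & (b0 ^ b1))
    return (a0, b0, c0), (a1, b1, c1)


def gmw_and(x_shares, y_shares, mt0, mt1):
    """Compute XOR shares of x AND y from XOR shares of x and y using one MT."""
    (x0, x1), (y0, y1) = x_shares, y_shares
    (a0, b0, c0), (a1, b1, c1) = mt0, mt1
    # Each party masks its shares and sends the two masked bits to the other party.
    d0, e0 = x0 ^ a0, y0 ^ b0
    d1, e1 = x1 ^ a1, y1 ^ b1
    # Both parties reconstruct the public values d = x ^ a and e = y ^ b.
    d, e = d0 ^ d1, e0 ^ e1
    # Local output shares; only P0 adds the public term d & e.
    z0 = (d & e) ^ (d & b0) ^ (e & a0) ^ c0
    z1 = (d & b1) ^ (e & a1) ^ c1
    return z0, z1
```

Under this assumption, the exchange of (d0, e0) and (d1, e1) accounts for four bits of online communication per AND gate.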

In the following, we present our new 2-MT approach for generating MTs, which is outlined in Protocol 12 and uses $\binom{2}{1}$ R-OT$^2_1$ (cf. §3.4.4). The R-OT functionality is exactly the same as OT, except that the parties have no inputs: the sender obtains two random messages, and the receiver obtains a random choice bit and the corresponding message as output.

To understand the high-level idea of our protocol, note that an MT can be re-written as $c_0 \oplus c_1 = (a_0 \oplus a_1)(b_0 \oplus b_1) = a_0b_0 \oplus a_0b_1 \oplus a_1b_0 \oplus a_1b_1$. $P_0$ and $P_1$ can compute the terms $a_0b_0$ and $a_1b_1$, respectively, locally from their shares. We then compute the mixed terms $a_0b_1$ and $a_1b_0$ using two R-OT invocations, in which the parties hold no inputs. From the first OT they receive $((b_0, v_0), (a_1, u_1))$ and from the second OT they receive $((a_0, u_0), (b_1, v_1))$, under the constraints that $a_1b_0 = u_1 \oplus v_0$ and $a_0b_1 = u_0 \oplus v_1$. Finally, each $P_i$ sets $c_i = a_ib_i \oplus u_i \oplus v_i$.

PROTOCOL 12 (Generating Random MTs from $\binom{2}{1}$ R-OT$^2_1$).

• Oracles: The parties have oracle access to the $\binom{2}{1}$ R-OT$^1_1$ functionality.

1. $P_0$ and $P_1$ invoke the $\binom{2}{1}$ R-OT$^1_1$ functionality, where $P_0$ plays the sender and $P_1$ plays the receiver. $P_0$ receives as outputs two random bits $(x_0, x_1)$, while $P_1$ receives a random choice bit $a_1$ and $x_{a_1}$ as output. $P_0$ sets $b_0 = x_0 \oplus x_1$ and $v_0 = x_0$; $P_1$ sets $u_1 = x_{a_1}$.

[Note that $a_1b_0 = u_1 \oplus v_0$, as $a_1b_0 = a_1(x_0 \oplus x_1) = (a_1(x_0 \oplus x_1) \oplus x_0) \oplus x_0 = x_{a_1} \oplus x_0 = u_1 \oplus v_0$.]

2. $P_0$ and $P_1$ again invoke the $\binom{2}{1}$ R-OT$^1_1$ functionality with reversed roles, where $P_1$ plays the sender and $P_0$ plays the receiver. $P_1$ receives as outputs two random bits $(y_0, y_1)$, while $P_0$ receives a random choice bit $a_0$ and $y_{a_0}$ as output. $P_1$ sets $b_1 = y_0 \oplus y_1$ and $v_1 = y_0$; $P_0$ sets $u_0 = y_{a_0}$.

[Note that $a_0b_1 = u_0 \oplus v_1$, as $a_0b_1 = a_0(y_0 \oplus y_1) = (a_0(y_0 \oplus y_1) \oplus y_0) \oplus y_0 = y_{a_0} \oplus y_0 = u_0 \oplus v_1$.]

3. $P_0$ computes $c_0 = a_0b_0 \oplus u_0 \oplus v_0$; $P_1$ computes $c_1 = a_1b_1 \oplus u_1 \oplus v_1$.

• Output: $P_0$ outputs $(a_0, b_0, c_0)$; $P_1$ outputs $(a_1, b_1, c_1)$.
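The steps of Protocol 12 can be traced in a short simulation. This is a sketch under stated assumptions: `r_ot` models the ideal R-OT functionality on single bits inside one process (a real instantiation would use R-OT extension, cf. §3.4.4), and both function names are hypothetical.

```python
import secrets


def r_ot():
    """Ideal R-OT on bits (assumption): the sender gets two random bits (m0, m1),
    the receiver gets a random choice bit and the corresponding message."""
    m0, m1 = secrets.randbits(1), secrets.randbits(1)
    choice = secrets.randbits(1)
    return (m0, m1), (choice, m1 if choice else m0)


def generate_mt():
    """Simulate both parties of Protocol 12 and return their MT shares."""
    # Step 1: P0 is the sender, P1 is the receiver.
    (x0, x1), (a1, x_a1) = r_ot()
    b0, v0 = x0 ^ x1, x0          # set by P0
    u1 = x_a1                     # set by P1

    # Step 2: roles reversed, P1 is the sender, P0 is the receiver.
    (y0, y1), (a0, y_a0) = r_ot()
    b1, v1 = y0 ^ y1, y0          # set by P1
    u0 = y_a0                     # set by P0

    # Step 3: local computation of the c shares.
    c0 = (a0 & b0) ^ u0 ^ v0
    c1 = (a1 & b1) ^ u1 ^ v1
    return (a0, b0, c0), (a1, b1, c1)
```

Calling `generate_mt()` repeatedly and checking `c0 ^ c1 == (a0 ^ a1) & (b0 ^ b1)` gives a quick empirical check of the MT constraint.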

Efficiency. As we have shown in §3.4, R-OT can be instantiated more efficiently than OT: in comparison to performing $\binom{4}{1}$ OT$^1_1$ using the protocol of [LLXX05], using our $\binom{2}{1}$ R-OT$^2_1$ protocol only slightly increases the computation complexity per party from 2.5 to 3 PRG and CRF evaluations (and one additional matrix transposition), but improves the total communication complexity by a factor of approximately 2×, from $4(\kappa + 1)$ to $2(\kappa - 1)$ bits.
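As a worked instance of the two communication formulas, assume κ = 128, the value used in §4.3.3:

```latex
\begin{align*}
4(\kappa + 1) &= 4 \cdot 129 = 516 \text{ bits}, \\
2(\kappa - 1) &= 2 \cdot 127 = 254 \text{ bits}, \\
\frac{4(\kappa + 1)}{2(\kappa - 1)} &= \frac{516}{254} \approx 2.03 .
\end{align*}
```

The 254 bits agree with the per-MT cost of 2-MT quoted in the efficiency discussion of §4.3.3.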

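Correctness. For correctness, observe that $c_0 \oplus c_1 = (a_0b_0 \oplus u_0 \oplus v_0) \oplus (a_1b_1 \oplus u_1 \oplus v_1) = a_0b_0 \oplus (u_0 \oplus v_1) \oplus (u_1 \oplus v_0) \oplus a_1b_1 = a_0b_0 \oplus a_0b_1 \oplus a_1b_0 \oplus a_1b_1 = (a_0 \oplus a_1)(b_0 \oplus b_1)$, as required.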

Security. In Protocol 12, $a_1$, $x_0$, and $x_1$ are generated randomly by the first R-OT, and $a_0$, $y_0$, and $y_1$ are generated randomly by the second R-OT. By the definition of OT, $P_0$ gains no information on $(a_1, y_{1-a_0})$ and hence none on $b_1 = y_0 \oplus y_1$, and $P_1$ gains no information on $(a_0, x_{1-a_1})$ and hence none on $b_0 = x_0 \oplus x_1$.

4.3.3 N-MT: MTs via $\binom{N}{1}$ OT

To improve the communication in secure computation, the work of [KK13] proposed to use their $\binom{N}{1}$ OT protocol to reduce $\binom{N}{1}$ OT$^1_{\log_2 N}$ to $\binom{2}{1}$ OT$^{\log_2 N}_1$ (i.e., to convert from $\binom{N}{1}$ OT to $\binom{2}{1}$ OT). They achieved a communication saving of up to 1.6× per $\binom{2}{1}$ OT$^2_1$, from 256 bits to 160 bits, when setting κ = 128 and N = 16. In the following, we introduce our N-MT protocol, which further improves on their communication savings by using our optimized $\binom{N}{1}$ OT protocol from §3.2.4 to directly compute an MT, which corresponds to a $\binom{4}{1}$ OT$^1_1$. For this reduction, we evaluate $\binom{N}{1}$ OT$^1_{\log_4(N)}$, which can be directly transformed to $\binom{4}{1}$ OT$^{\log_4(N)}_1$. We vary possible choices for N in Table 4.2 and observe that the highest improvement of 1.9× is obtained for N = 16, where one MT can be computed at the cost of 134 bits in the setup phase (2 MTs at the cost of 268 bits). Adding the 4 bits for the evaluation of AND gates in the online phase, the total communication cost of a single AND gate is 138 bits.

N              4      8      16     32     64     128    256
#MTs           1      1.5    2      2.5    3      3.5    4
2-MT [bits]    256    384    512    640    768    896    1 024
N-MT [bits]    194    223    268    339    438    759    1 271
Improvement    1.32   1.72   1.91*  1.89   1.75   1.18   0.81

Table 4.2: Communication in bits for generating #MTs MTs using 2-MT (cf. §4.3.2) and our N-MT, based on our optimized variant (cf. §3.2.4) of the $\binom{N}{1}$ OT protocol of [KK13]. Best result marked with *.
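The derived rows of Table 4.2 can be recomputed from the raw communication figures. The sketch below takes the 2-MT and N-MT bit counts from the table as given and only assumes that #MTs equals $\log_4(N)$ and that the improvement is the ratio of the two rows; the constant and function names are hypothetical.

```python
import math

# Raw values copied from Table 4.2 (communication in bits).
N_VALUES = [4, 8, 16, 32, 64, 128, 256]
TWO_MT_BITS = [256, 384, 512, 640, 768, 896, 1024]
N_MT_BITS = [194, 223, 268, 339, 438, 759, 1271]


def mts_per_ot(n: int) -> float:
    """Number of MTs obtained from one 1-out-of-N OT, i.e. log_4(N)."""
    return math.log2(n) / 2


def improvement(two_mt_bits: int, n_mt_bits: int) -> float:
    """Communication improvement of N-MT over 2-MT, rounded as in the table."""
    return round(two_mt_bits / n_mt_bits, 2)


mts_row = [mts_per_ot(n) for n in N_VALUES]
improvement_row = [improvement(t, m) for t, m in zip(TWO_MT_BITS, N_MT_BITS)]

# Best choice of N and the resulting per-MT cost in the setup phase.
best_index = max(range(len(N_VALUES)), key=lambda i: improvement_row[i])
best_n = N_VALUES[best_index]
bits_per_mt = N_MT_BITS[best_index] / mts_row[best_index]
```

For values of N above 128 the ratio drops below 1, so N-MT then costs more communication than 2-MT.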

Efficiency. To pre-compute one MT using the N-MT technique, each party has to perform approximately 0.75 PRG evaluations and 4.25 CRF evaluations, instantiated using AES-256 with key-schedule (cf. §3.2.4.2), and both parties have to send 134 bits. In contrast, for 2-MT each party has to perform only 3 CRF evaluations, instantiated using the more efficient fixed-key AES-128, but both parties have to send 254 bits. Thus, we obtain a computation vs. communication trade-off, similar to that of $\binom{2}{1}$ OT extension vs. $\binom{N}{1}$ OT extension (cf. §3.5.5).
