• No se han encontrado resultados

OEEs provide a way to safely execute simple computations on small amounts of data, without side channels. However, to answer complex queries over larger data, Hermetic also needs higher-level operators, which we describe next.

4.4.1 Background

Prior work [13,130] has already developed a number of oblivious operators, which are guaranteed to access the data in a deterministic way. A simple example would be a projection operator that performs a linear scan over all the input tuples and extracts a particular column of interest. is already addresses a subset of the MC, since the data is accessed in a deterministic, data-independent order, so we use these operators as a starting point. However, note that there are other kinds of memory accesses (to code, to the stack, etc.), and that thetiming of the accesses can be disclosive as well;

we will discuss how we address these shortly.

Previous work includes some operators that are relatively standard; for instance,

project,rename,union, andcartesian-productare similar to the operators found in any standard DBMS, and they are oblivious by nature, e.g.,cartesian-product

considers all different pairs of tuples from two relations in a data-independent way. However, there are some additional primitives that are needed for oblivious oper- ation. e linchpin of oblivious query processing is an oblivious sort primitive. Classical sorting algorithms, such as Quicksort, would leak the ordering of the data elements, so this requires special algorithms, such as Batcher’s odd-even mergesort (batcher-sort) [22], that access the data in a deterministic order.

Building on oblivious sorting, one can implement a variety of low-level oblivious primitives. augment adds a new column to a relation and sets it to the value of a particular expression; grsum (group-running sum) adds up the values in a column for fields that share the same key; filter discards all tuples that do not satisfy a predicate;semijoin-aggregationcounts the occurrences of one relation’s attribute in another relation; expand createsk clones of each tuple, where kis a value that is taken from a special column; and stitch linearly “stitches together” two relations of the same size by combining their columns.

Higher-level operators can be implemented by combining the above primitives. For instance, to implement a join, one first uses semijoin-aggregation on each relation to compute how many matches each tuple has in the other relation, then performs an expand to create as many clones of each tuple, and applies stitch to create the result. Similarly, a groupby would first sort the relation by the group key, then apply grsum to add up the values in the column that is being aggregated, and finally use filter to leave only the last tuple for each key, which contains the total sum. ese primitives, as well as the algorithms for building larger query plans from these smaller primitives, are fairly standard, so we do not discuss them in detail; instead, we focus on the points where Hermetic differs from prior work.

4.4.2 Challenges

ere are three key reasons why the existing operators are not sufficient for our pur- poses. e first and simplest one is that prior work tends to specify the operators in pseudocode, whereas the TC and IC very much depend on the finer details of the implementation. For instance, somex86instructions, such as floating-point in- structions and integer division, have data-dependent timing, and must be avoided to prevent the TC; similarly, to prevent the IC, we must avoid data-dependent branches (e.g., by using thecmovinstruction) and preload the code and the stack. e neces- sary steps are known, and we do not claim them as a contribution.

e second reason is that some prior work [13] often assumes a client-server model: the data is stored on the server and queries are processed on the client, which can issue read and write requests to the server. e threat model typically limits the adversary to observing thesequence of reads and writes, which excludes the TC and IC entirely and limits the MC to only data accesses. In other words, accesses to code or the stack are not considered, and it is assumed that the adversary cannot observe the precise timing of the accesses. In Hermetic, we make use of the OEE to block these channels; however, since the OEE is limited to small data sets, we also need operators to handle larger workloads.

e third reason is that, to our knowledge, prior work pays almost no attention to the OC: a selecteither reveals the exact number of rows that match the predicate, which is disclosive, or massively pads the output to the worst-case size, which is inefficient. To address this, we introduce a new technique, which we discuss next.

4.4.3 Dummy rows

To address the OC, Hermetic can pad relations with dummy tuples, so they appear larger (to the adversary) than the actual data tuples they contain. In essence, we add a special isDummycolumn to each relation, which is set to 1 for dummy tuples and to 0 otherwise, and we augment all the oblivious operators to ignore tuples that have

isDummy set to 1. e latter is necessary to maintain correctness: for instance, we must prevent the dummy tuples from appearing in the output of aselect, and we achieve this by adding an&& !isDummyto each predicate in the filter.

Hermetic also needs a way to introduce dummy tuples in all the operators that have a variable-size output (i.e., whose output size is not a deterministic function of their input size) – specifically, select, groupby, andjoin. We modified the above three operators to take an extra parameter that specifies the number of dummy tuples to add. For details please refer to AppendixC.2.

4.4.4 Differential privacy

To ensure differential privacy, the number of dummy tuples that a given operator must add needs to be drawn from a Laplace distribution with parameter λ =s/ε, whereε is the privacy parameter (Section4.2.2) andsis thesensitivity of the output

size to the input data – the maximum change in the output size that can result from adding or removing one input tuple. e sensitivity can be determined by the query planner (Section 4.5), but the actual Laplace value has to be drawn in the enclave, when the query is being executed. It is critical that this draw be immune to our four side channels as well: if the adversary can learn the value that is being drawn, she can compute the actual number of data tuples in the output and thus potentially learn facts about the input data.

To guard against this, we developed a special operator that can draw values from a Laplace distribution in constant time, with a deterministic control flow and with- out accessing main memory. To some degree, this involves only careful program- ming, e.g., to ensure that only static loop bounds are used; however, one important challenge is that the floating-point instructions onx86cannot be used because they have data-dependent timing [12]. We instead used a fixed-point math library based on [12], which uses integer instructions.

𝐶 𝑇 𝑃 𝜎age𝜖=0.01⋈cid 𝜖=0.01 ⋈location 𝜖=0.01 𝜎category𝜖=0.01 𝐶 𝑇 𝑃 𝜎age𝜖=0.001⋈cid 𝜖=0.001⋈location 𝜖=0.001 𝜎category𝜖=0.001 𝐶 𝑇 𝑃 𝜎age𝜖=0.001 ⋈cid𝜖=0.001 ⋈location𝜖=0.001 𝜎category𝜖=0.001 𝐶 𝑇 𝑃 𝜎age, category, cid, location 𝜖=0.001 × × (a) (b) (c) (d) Leakage  query  𝑄)*+,: SELECT  count(*)       FROM  Customer  C   WHERE  age  ≤27

Figure 4.2: Different plans for query ”SELECT count(*) FROM C, T, P

WHERE C.cid=T.cid AND T.location = P.location AND C.age27 AND

P.category='hospital'”.

4.4.5 New primitives

Finally, Hermetic needs three additional oblivious primitives. e first two are

histogram, which computes a histogram over the values in a given column, and

multiplicity, which computes the multiplicity of a column – that is, the num- ber of times that the most common value appears. ese operators are used by the query planner to compute statistics about the input data that it needs to find the best privacy-efficiency tradeoff. We discuss these more in Section4.5.

e third new primitive ishybrid-sort, which can obliviously sort large relations by repeatedly invokingto sort smaller chunks of data using a fast, non-oblivious sorting algorithm (mergesort), and to combine the results by merging chunks of data using a non-oblivious merging algorithm (linear-merge). hybrid-sortis essentially a block-based variant of batcher-sort that takes advantage of the OEE, so the obliviousness properties of batcher-sorttrivially apply tohybrid-sortas well.

Documento similar