It should be noted that given the frequency of enciphered items in the noise column, any exact covering of these occurrences by means of a suitable set of transactions yields a correct realization of our encryption scheme. However, we aim at devising a method for arranging fake transactions that allows for a compact synopsis with a strong protection level.

Given a noise table specifying the noise $N(e)$ needed for each enciphered item $e$, we generate the fake transactions as follows. First, we drop the rows with zero noise, corresponding to the most frequent items of each group or to other items with support equal to the maximum support of a group. Second, we sort the remaining rows in descending order of noise. Let $e'_1, \ldots, e'_m$ be the obtained ordering of the (remaining) enciphered items, with associated noise $N(e'_1), \ldots, N(e'_m)$. The following fake transactions are generated:

• $N(e'_1) - N(e'_2)$ instances of the transaction $\{e'_1\}$

• $N(e'_2) - N(e'_3)$ instances of the transaction $\{e'_1, e'_2\}$

• $\ldots$

• $N(e'_{m-1}) - N(e'_m)$ instances of the transaction $\{e'_1, \ldots, e'_{m-1}\}$

• $N(e'_m)$ instances of the transaction $\{e'_1, \ldots, e'_m\}$
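As a quick check that this construction covers the noise table exactly, note that the item $e'_j$ occurs in every generated transaction $\{e'_1, \ldots, e'_k\}$ with $k \ge j$. Summing the instance counts of those transactions gives a telescoping sum (the prime denotes the sorted order defined above):

\[
\sum_{k=j}^{m-1} \bigl( N(e'_k) - N(e'_{k+1}) \bigr) + N(e'_m) = N(e'_j), \qquad 1 \le j \le m .
\]

So each enciphered item receives exactly the number of fake occurrences that the noise table requires.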

Continuing the example, we consider the enciphered items of non-zero noise in Table 7.3. The following two fake transactions are generated: 2 instances of the transaction $\{e_5, e_3, e_1\}$ and 1 instance of the transaction $\{e_5\}$. Note that even though the attacker may know the details of the construction method, he/she is not able to distinguish these fake transactions from the true ones, since the attacker does not have any background knowledge of the frequency of itemsets or of the original transaction length distribution.
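The construction can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `generate_fake_transactions` and the dictionary representation of the noise table are assumptions, the noise values for e5, e3 and e1 are the ones implied by the running example, and the zero-noise item `e2` is hypothetical.

```python
def generate_fake_transactions(noise):
    """Return a list of (count, itemset) pairs describing the fake transactions.

    noise: dict mapping each enciphered item to its required noise N(e).
    """
    # Step 1: drop zero-noise rows. Step 2: sort by descending noise
    # (sorted() is stable, so ties keep their input order).
    ordered = sorted((e for e, n in noise.items() if n > 0),
                     key=lambda e: -noise[e])
    fake = []
    for i, item in enumerate(ordered):
        # Noise of the next item in the ordering, or 0 after the last one.
        next_noise = noise[ordered[i + 1]] if i + 1 < len(ordered) else 0
        count = noise[item] - next_noise
        if count > 0:  # skip prefixes that would be generated zero times
            fake.append((count, tuple(ordered[:i + 1])))
    return fake


# Noise values implied by the running example; "e2" is a hypothetical
# zero-noise item added only to show that such rows are dropped.
example_noise = {"e5": 3, "e3": 2, "e1": 2, "e2": 0}
print(generate_fake_transactions(example_noise))
# [(1, ('e5',)), (2, ('e5', 'e3', 'e1'))]
```

Each pair gives an instance count and the prefix of the sorted item list that forms the transaction, so the output matches the two fake transactions of the example.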

It can be shown that this method yields a minimum number of different types of fake transactions, equal to the number of distinct non-zero noise values among the enciphered items. This observation yields a compact synopsis, for the client, of the introduced fake transactions.

The purpose of using a compact synopsis is to reduce the storage overhead at the side of the data owner who may not be equipped with sufficient computational resources and storage, which is common in the outsourcing data model.

An adversary may observe the duplication in fake transactions. However, first, notice that duplicates may also be present in the real transactions. Second, we assume the attacker only has knowledge of the frequency of items, but not that of itemsets, nor the distribution of transaction lengths in the original database. Thus, the attacker cannot distinguish the fake transactions from the true ones.

As a final remark, we observe that fake transactions introduced by this method may be longer than any transaction in the original TDB $D$. Recall that the attack model includes only plain items and their exact support in $D$ as the background knowledge of the attacker, not the transaction lengths in $D$. So, adding longer fake transactions technically does not constitute a privacy breach. However, for added protection, we can shorten the added fake transactions so that their lengths are in line with the transaction lengths in $D$. In our running example above, we obtain 2 instances of the fake transaction $\{e_5, e_3\}$, 2 of $\{e_1\}$ and 1 instance of $\{e_5\}$. These transactions are of length either 1 or 2. We briefly illustrate the idea here. Let $l$ be the length suggested by RobFrugal for a fake transaction and let $l > l_{max}$, where $l_{max}$ is the maximum length of a transaction in $D$. Then find the largest number $q$ such that $q \le l_{max}$ and one of the following holds: (i) $q$ divides $l$ evenly, or (ii) $l \bmod q \approx q$, or (iii) $l \bmod q < \lfloor l/q \rfloor$. Here, we can take $l \bmod q \approx q$ to mean $l \bmod q = q - 1$. If condition (i) or (ii) holds, we simply split the fake transaction of length $l$ into smaller ones of size $q$ or $q - 1$. If condition (iii) holds, then we create $\lfloor l/q \rfloor$ transactions of size $q$. From the remaining set of $l \bmod q$ items, we add one each to $l \bmod q$ distinct transactions. So, we will have transactions of size $q$ or $q + 1$. For example, suppose $l = 50$ and $l_{max} = 7$; the calculated value of $q$ is 7, i.e., the fake transaction of length 50 is split into 6 shorter ones of length 6 and 2 of length 7. More generally, there is enough flexibility in our framework to ensure that the distribution of fake transaction lengths is similar to that of the true transaction database. Our experiments (Section 8.3) show the effectiveness of the procedure.
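To make the length-shortening idea concrete, the sketch below computes piece sizes for a fake transaction of length $l$ so that no piece exceeds $l_{max}$. It is deliberately not the $q$-selection rule described above: it is a simpler even split, swapped in for illustration, that uses the fewest pieces possible and keeps their sizes within one of each other. The function name `split_lengths` is an assumption.

```python
import math


def split_lengths(l, l_max):
    """Split a length l into piece sizes, each at most l_max.

    Uses the smallest possible number of pieces and balances them so
    that sizes differ by at most one (a simplification, not the
    q-based rule from the text).
    """
    if l <= 0 or l_max <= 0:
        raise ValueError("l and l_max must be positive")
    pieces = math.ceil(l / l_max)        # fewest pieces that can fit
    base, extra = divmod(l, pieces)      # 'extra' pieces get one more item
    return [base + 1] * extra + [base] * (pieces - extra)


print(split_lengths(50, 7))  # [7, 7, 6, 6, 6, 6, 6, 6]
print(split_lengths(3, 2))   # [2, 1]
```

For $l = 50$ and $l_{max} = 7$ this gives 6 pieces of length 6 and 2 of length 7, and for the three non-zero-noise items of the running example with $l_{max} = 2$ it gives groups of sizes 2 and 1, in line with the figures quoted in the text.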

In order to implement the synopsis efficiently, we use a hash table generated with a minimal perfect hash function [40]. Minimal perfect hash functions are widely used for memory-efficient storage and fast retrieval of items from static sets. A minimal perfect hash function is a perfect hash function that maps $n$ keys to $n$ consecutive integers, usually $[0 \ldots n-1]$. Hence, $h$ is a minimal perfect hash function over a set $S$ if and only if $\forall i, j \in S$, $h(j) = h(i)$ implies $j = i$, and there exists an integer $p$ such that the range of $h$ is $\{p, \ldots, p + |S| - 1\}$. A minimal perfect hash function $h$ is order-preserving if for any keys $j$ and $i$, $j < i$ implies $h(j) < h(i)$.
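The two definitions can be checked mechanically on a small key set. The following sketch tests a candidate function against them; it does not construct a minimal perfect hash function, and the helper names `is_minimal_perfect` and `is_order_preserving` as well as the sample keys are assumptions made for illustration.

```python
def is_minimal_perfect(h, keys):
    """True if h is injective on keys and its values are consecutive integers."""
    values = [h(k) for k in keys]
    if not values:
        return True
    if len(set(values)) != len(values):   # a collision: not a perfect hash
        return False
    p = min(values)
    # The range must be exactly {p, ..., p + |S| - 1}.
    return set(values) == set(range(p, p + len(values)))


def is_order_preserving(h, keys):
    """True if j < i implies h(j) < h(i) for all keys."""
    ordered = sorted(keys)
    return all(h(a) < h(b) for a, b in zip(ordered, ordered[1:]))


keys = [10, 20, 30]
in_order = {10: 0, 20: 1, 30: 2}.__getitem__   # minimal, perfect, order-preserving
shuffled = {10: 2, 20: 0, 30: 1}.__getitem__   # minimal and perfect only
with_gap = {10: 0, 20: 2, 30: 4}.__getitem__   # perfect but not minimal

print(is_minimal_perfect(in_order, keys), is_order_preserving(in_order, keys))  # True True
print(is_minimal_perfect(shuffled, keys), is_order_preserving(shuffled, keys))  # True False
print(is_minimal_perfect(with_gap, keys))                                       # False
```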

In our scheme, the items $e_i$ of the noise table with $N(e_i) > 0$ are the keys of the minimal perfect hash function. Given $e_i$, function $h$ computes an integer in $[0 \ldots n-1]$, denoting the position of the hash table storing the triple of values $\langle e_i, times_i, occ_i \rangle$, where:

• $times_i$ represents the number of times the fake transaction $\{e_1, e_2, \ldots, e_i\}$ occurs in the set of fake transactions

• $occ_i$ is the number of times that $e_i$ occurs altogether in the future fake transactions after the transaction $\{e_1, e_2, \ldots, e_i\}$


Given a noise table with $m$ items with non-null noise, our approach generates hash tables for the groups of items. In general, the $i$-th entry of a hash table $HT$ containing the item $e_i$ has

\[ times_i = N(e_i) - N(e_{i+1}) \]

\[ occ_i = \sum_{j=i+1}^{g} N(e_j) \]

where $g$ is the number of items in the current group and $N(e_{g+1})$ is taken to be 0. Notice that each hash table $HT$ concisely represents the fake transactions involving all and only the items in a group of $g \le l_{max}$ items. The hash tables for the items of non-zero noise in Table 7.3 are shown in Table 7.4. Given that in our example $l_{max} = 2$, we need to split the 3 items of non-zero noise in Table 7.3 into two sets, each with associated fake transactions, coded by the two hash tables. Notice that any pattern consisting of items from different hash tables is not supported by any fake transaction.

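Hash table   Position   Entry $\langle e_i, times_i, occ_i \rangle$

Table 1      0          $\langle e_5, 1, 2 \rangle$

Table 1      1          $\langle e_3, 2, 0 \rangle$

Table 2      0          $\langle e_1, 2, 0 \rangle$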

Table 7.4: Hash tables of items of non-zero noise in Table 7.3
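The entries of Table 7.4 can be reproduced with a short sketch. Several things here are assumptions made for illustration: the function name `build_hash_tables`, the grouping of consecutive items (in descending noise order) into groups of at most `group_size`, and the use of plain Python dictionaries in place of a real minimal perfect hash function $h$ and second-level hash function $H$. The value $occ$ is computed directly from its prose definition, as the number of later fake transactions of the group that contain the item; for groups of at most two items, as in the running example, this coincides with the summation given above.

```python
def build_hash_tables(noise, group_size):
    """Build the synopsis: one table of (item, times, occ) triples per group.

    Returns (tables, table_of, position), where table_of stands in for
    the second-level hash H and position stands in for h.
    """
    ordered = sorted((e for e, n in noise.items() if n > 0),
                     key=lambda e: -noise[e])
    tables, table_of, position = [], {}, {}
    for start in range(0, len(ordered), group_size):
        group = ordered[start:start + group_size]
        # times_i = N(e_i) - N(e_{i+1}), with N taken as 0 past the group end.
        times = [noise[e] - (noise[group[i + 1]] if i + 1 < len(group) else 0)
                 for i, e in enumerate(group)]
        table = []
        for i, item in enumerate(group):
            occ = sum(times[i + 1:])   # occurrences in later fake transactions
            table.append((item, times[i], occ))
            table_of[item] = len(tables)
            position[item] = i
        tables.append(table)
    return tables, table_of, position


# Noise values implied by the running example, with l_max = 2.
tables, table_of, position = build_hash_tables({"e5": 3, "e3": 2, "e1": 2}, 2)
print(tables)
# [[('e5', 1, 2), ('e3', 2, 0)], [('e1', 2, 0)]]
```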

Finally, we use a (second-level) ordinary hash function H to map each item e to the hash table HT containing e.

Note that after the data owner outsources the encrypted database (including the fake transactions), he/she does not need to maintain the fake transactions in his/her own storage. Instead, the data owner only has to maintain a compact synopsis, which stores all the information needed on the fake transactions, for later recovery of the real supports of itemsets. The size of the synopsis is linear in the number of items and is much smaller than that of the fake transactions.

With the above data structure, we can define the function $RS$ that allows an efficient computation of the real support of a pattern $E = \{e_1, e_2, \ldots, e_n\}$ with fake support $s$ (defined by Equation 7.1 in Section 7.4.2) as follows:

\[ RS(E) = s - (HT[h(e_{max})].times + HT[h(e_{max})].occ) \qquad (7.2) \]

where: (i) $e_{max}$ is the item in $E$ such that for $1 \le j \le n$ we have $h(e_j) \le h(e_{max})$, and (ii) $HT = H(e_i)$ is the hash table associated by $H$ to any item $e_i$ of $E$. E.g., in Table 7.4, for $E_1 = \{e_5\}$, $RS(E_1) = s_1 - (1 + 2)$, whereas for $E_2 = \{e_5, e_3\}$, $RS(E_2) = s_2 - (2 + 0)$, where $s_i$ is the fake support of $E_i$. This is exactly right, since $e_5$ occurs in 3 fake transactions while $e_3$ occurs in 2.
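A minimal sketch of Equation 7.2 over the synopsis of Table 7.4 follows. The names `TABLES`, `TABLE_OF`, `POSITION` and `real_support` are assumptions, dictionaries again stand in for the hash functions $h$ and $H$, and the fake support value 10 used in the usage lines is hypothetical. The handling of patterns that span several hash tables follows the remark that such patterns are not supported by any fake transaction; returning the fake support unchanged for patterns containing an item absent from every table is an additional assumption.

```python
# Synopsis of Table 7.4: each entry is (item, times, occ).
TABLES = [[("e5", 1, 2), ("e3", 2, 0)], [("e1", 2, 0)]]
TABLE_OF = {"e5": 0, "e3": 0, "e1": 1}   # stands in for H
POSITION = {"e5": 0, "e3": 1, "e1": 0}   # stands in for h


def real_support(pattern, fake_support):
    """Recover the real support of a pattern from its fake support s."""
    # Assumption: an item outside every table occurs in no fake transaction.
    if any(item not in TABLE_OF for item in pattern):
        return fake_support
    table_ids = {TABLE_OF[item] for item in pattern}
    # Patterns mixing items of different tables have no fake occurrences.
    if len(table_ids) != 1:
        return fake_support
    table = TABLES[table_ids.pop()]
    e_max = max(pattern, key=POSITION.__getitem__)   # item with largest h
    _, times, occ = table[POSITION[e_max]]
    return fake_support - (times + occ)


print(real_support({"e5"}, 10))        # 7
print(real_support({"e5", "e3"}, 10))  # 8
print(real_support({"e5", "e1"}, 10))  # 10
```

The first two calls mirror $RS(E_1) = s_1 - (1 + 2)$ and $RS(E_2) = s_2 - (2 + 0)$ from the text.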