

We search the nodes in the inverse itemset enumeration tree in a depth-first traversal to identify the most general undecorated threats. If the itemset IS(c) of the node c currently being visited is a threat, which can be determined by counting over TS(c), and is not a specialization of any previously found threat, then IS(c) is a most general threat and will be materialized. Since we are only interested in finding the most general threats, we prune the subtree rooted at c whenever we can tell that it contains no most general threat. Pruning is critical because such a tree is huge. In the following, we present a series of pruning techniques.

Baseline pruning: We can prune the subtree rooted at c if IS(c) is beyond the power of any adversary’s knowledge by Observation 4.1, or if there is no sensitive item in TS(c) and we only consider the sensitive item disclosure (k = 1). Under these conditions, there will be no privacy threat in the subtree. We call this the baseline pruning because, on its own, it is far from sufficient.

Pruning I (Global pruning): If IS(c) is a specialization of a previously materialized threat, we can prune the subtree rooted at c and the subtrees rooted at c’s siblings (in the inverse itemset enumeration tree) whose items are taxonomic descendants of c’s item, since it is impossible for any node in such subtrees to represent a most general threat.

Example 4.10: Node (Liquor, 3) in Figure 4.2 represents {Liquor}. As {Liquor} is the first threat found by the depth-first search, node (Liquor, 3) is materialized. Afterwards, any node holding Liquor, Beer, or Wine will be pruned by Pruning I. ■

Note that Pruning I comes with the computational overhead of globally checking whether IS(c) is a specialization of any threat materialized so far, which is why we call it the global pruning. The second and third pruning techniques below try to avoid as many such global checks as possible, and hence improve efficiency.
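The global check behind Pruning I can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `PARENT` map is a hypothetical taxonomy assumed from the item names in the examples, and the specialization test assumes that an itemset specializes a threat when every item of the threat equals, or is a taxonomic ancestor of, some item in the itemset.

```python
# Hypothetical taxonomy (child -> parent), assumed from the item names in the examples.
PARENT = {
    "Liquor": "Nutrient", "Dairy": "Nutrient",
    "Beer": "Liquor", "Wine": "Liquor",
    "Milk": "Dairy", "Yogurt": "Dairy",
}

def ancestors_or_self(item):
    """Yield the item followed by its taxonomic ancestors, nearest first."""
    while item is not None:
        yield item
        item = PARENT.get(item)

def is_specialization(itemset, threat):
    """Assumed definition: every item of `threat` equals, or is a taxonomic
    ancestor of, some item in `itemset`."""
    covered = {a for item in itemset for a in ancestors_or_self(item)}
    return all(t in covered for t in threat)

def pruned_by_global_check(itemset, materialized_threats):
    """Pruning I test: compare against every threat materialized so far."""
    return any(is_specialization(itemset, t) for t in materialized_threats)

materialized = [frozenset({"Liquor"})]
print(pruned_by_global_check({"Beer"}, materialized))          # True
print(pruned_by_global_check({"Wine", "Milk"}, materialized))  # True
print(pruned_by_global_check({"Milk"}, materialized))          # False
```

The loop over `materialized_threats` is the overhead discussed above: its cost grows with the number of threats found so far.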

Pruning II (Local pruning): If IS(c) is a threat and c is not pruned by Pruning I, then IS(c) is a most general threat. We keep c on the tree but stop growing its children, since they represent specializations of IS(c), and we prune c’s siblings (in the inverse itemset enumeration tree) whose items are taxonomic descendants of c’s item.

Example 4.11: Continue with the example for Pruning I. Pruning II prunes only the siblings of node (Liquor, 3), i.e., node (Beer, 2) and node (Wine, 2), which comes with little computational overhead. Any other node holding Liquor, Beer, or Wine will be left to the global pruning. This is why it is called the local pruning. ■
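The sibling filter used by Pruning II can be illustrated with a short sketch. The `PARENT` map and the sibling list are hypothetical, chosen only to mirror the item names in the examples; the function looks at the siblings of the current node and nothing else, which is what keeps the check local.

```python
# Hypothetical taxonomy (child -> parent), assumed from the item names in the examples.
PARENT = {
    "Liquor": "Nutrient", "Dairy": "Nutrient",
    "Beer": "Liquor", "Wine": "Liquor",
    "Milk": "Dairy", "Yogurt": "Dairy",
}

def is_descendant(item, ancestor):
    """True if `ancestor` lies strictly above `item` in the taxonomy."""
    item = PARENT.get(item)
    while item is not None:
        if item == ancestor:
            return True
        item = PARENT.get(item)
    return False

def local_prune(threat_item, sibling_items):
    """Return the siblings that Pruning II would leave on the tree."""
    return [s for s in sibling_items if not is_descendant(s, threat_item)]

# Hypothetical sibling items of node (Liquor, 3).
print(local_prune("Liquor", ["Beer", "Wine", "Dairy", "Milk"]))  # ['Dairy', 'Milk']
```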

Theorem 4.2: For any node c and its parent node p, if IS(c) and IS(p) have the same support, there is no most general threat in the subtree rooted at c.

Proof: We show that for any node v in the subtree rooted at c, there is a node w (≠v) in the subtree rooted at p such that (1) IS(v) is a specialization of IS(w), and (2) if IS(v) is a threat, so is IS(w). This means that v will never represent a most general threat. Below, we construct the node w.

First, TS(c) is the subset of transactions in TS(p) that support node c’s item. Since IS(c) and IS(p) have the same support, i.e., |TS(c)| = |TS(p)|, we have TS(c) = TS(p). Second, let Z = IS(v) – IS(c). Every item in Z is listed before the item of p in the imposed ordering by the construction of the tree, so there is a node w in the subtree rooted at p such that IS(w) = IS(p) ∪ Z. Clearly, IS(w) is derived by removing node c’s item from IS(v). Therefore, w ≠ v and point (1) holds.

Notice that TS(v) is the subset of transactions in TS(c) that support Z, and TS(w) is the subset of transactions in TS(p) that support Z. Since TS(c) = TS(p), we have TS(v) = TS(w), which implies that if IS(v) is a threat, so is IS(w). Therefore, point (2) holds. ■

Pruning III (Generic pruning): For a child node c and its parent node p, if IS(c) and IS(p) have the same support, the subtree rooted at c can be pruned by Theorem 4.2. We call it the generic pruning as it depends only on supports of itemsets and is independent of the privacy requirement (i.e., regardless of the values of k, l, and m).

Example 4.12: As a trivial example in Figure 4.2, on the path from the null root [...] support of 4. By Pruning III, node (Nutrient, 4) can be pruned without missing any most general threat. ■
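The support comparison behind Pruning III can be sketched as follows. The taxonomy and the transactions are made up for illustration, and the sketch assumes that a transaction supports an item when it contains the item itself or any of its taxonomic descendants. TS(c) is computed by filtering TS(p), as in the proof of Theorem 4.2, so equal sizes imply equal sets.

```python
# Hypothetical taxonomy (child -> parent), assumed from the item names in the examples.
PARENT = {
    "Liquor": "Nutrient", "Dairy": "Nutrient",
    "Beer": "Liquor", "Wine": "Liquor",
    "Milk": "Dairy", "Yogurt": "Dairy",
}

def supports(transaction, item):
    """Assumed: a transaction supports an item if it contains the item
    itself or any of its taxonomic descendants."""
    for x in transaction:
        while x is not None:
            if x == item:
                return True
            x = PARENT.get(x)
    return False

def child_ts(parent_ts, item):
    """TS(c): the transactions in TS(p) that support c's item."""
    return [t for t in parent_ts if supports(t, item)]

def generic_prune(parent_ts, item):
    """Pruning III test: equal supports mean TS(c) = TS(p)."""
    return len(child_ts(parent_ts, item)) == len(parent_ts)

# Made-up transactions; each contains some descendant of Nutrient.
transactions = [{"Beer", "Milk"}, {"Wine"}, {"Yogurt"}, {"Beer"}]
print(generic_prune(transactions, "Nutrient"))  # True
print(generic_prune(transactions, "Liquor"))    # False
```

The test never consults k, l, or m, which reflects why the technique is independent of the privacy requirement.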

The fourth pruning technique can drastically reduce the number of privacy threats to be materialized. We will show the effectiveness of all the pruning techniques in the experimental evaluation.

Pruning IV (Permanently locked item pruning): If a threat only consists of the taxonomic children of an item x, specializing x will always introduce the threat because it results in a specialization of the threat. Therefore, x will not be specialized by any valid cut. For this reason, we call such x a permanently locked item and we can remove all the taxonomic descendants of such a permanently locked x from the inverse itemset enumeration tree.

As a consequence of Pruning IV, only a subset of the most general threats will be materialized. If we exclude any cuts containing the descendants of permanently locked items (which are invalid cuts) in the process of searching an optimal cut, this subset of the most general threats is sufficient for the validity check of any other cut.

Example 4.13: In the motivating example, {Liquor} is a threat, so Liquor’s taxonomic parent Nutrient is a permanently locked item. By Pruning IV, all taxonomic descendants of Nutrient, i.e., Liquor, Beer, Wine, Dairy, Milk, and Yogurt, can be excluded from further consideration. Therefore, {Liquor}, one of the most general threats, will not be included in the subset of the most general threats materialized, and so forth. ■
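Identifying permanently locked items can be sketched in a few lines. The `PARENT` map is a hypothetical taxonomy: the text only states which items descend from Nutrient, so the exact parent of each item is an assumption. Both helper names are illustrative as well.

```python
# Hypothetical taxonomy (child -> parent); only the descendant set of
# Nutrient is given in the text, the exact parent links are assumed.
PARENT = {
    "Liquor": "Nutrient", "Dairy": "Nutrient",
    "Beer": "Liquor", "Wine": "Liquor",
    "Milk": "Dairy", "Yogurt": "Dairy",
}

def permanently_locked(threats):
    """Items x such that some threat consists only of taxonomic children of x."""
    locked = set()
    for threat in threats:
        parents = {PARENT.get(item) for item in threat}
        if len(parents) == 1 and None not in parents:
            locked |= parents
    return locked

def descendants(item):
    """All items strictly below `item` in the taxonomy."""
    found = set()
    for x in PARENT:
        y = PARENT.get(x)
        while y is not None:
            if y == item:
                found.add(x)
                break
            y = PARENT.get(y)
    return found

locked = permanently_locked([{"Liquor"}])
removed = set().union(*(descendants(x) for x in locked))
print(locked)           # {'Nutrient'}
print(sorted(removed))  # ['Beer', 'Dairy', 'Liquor', 'Milk', 'Wine', 'Yogurt']
```

With {Liquor} as the only threat, the sketch locks Nutrient and removes the six items listed in Example 4.13.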
