

To prove the correctness of our algorithm, the next theorem summarizes key properties of the partitioning {T1,…,Tk} produced by the PP algorithm when SA′ = SA. Property (i) ensures that each $T_i$ is as balanced as $T$, and Property (ii) ensures that the condition $\rho_1^i < \rho_2$ holds, which is a problem requirement in Definition 4. We handle the case SA′ ≠ SA immediately after Theorem 4, in Theorem 5.

Theorem 4 (Partitioning properties when SA’ = SA). Let SA‟ = SA. If 1i is the

maximum relative frequency in sub-table Ti, 2 is the bound on posterior

knowledge Pr[X = x | Y = y], {T1,…,Tk} is a partitioning of T returned by the PP

algorithm, and  = |T|/fmax, where fmax is the maximum frequency of SA- values in T, then

(i) Ti is -balanced wrt SA and 1i ≤ 1/

(ii) If 2 > 1/, 1i < 2

Proof: We will prove (i) by showing 1i ≤ 1/, i.e., Ti is -balanced wrt SA. Note

that (ii) immediately follows from (i), since we are given 2 > 1/ and we can

substitute “< 2” for “≤1/” in 1i ≤ 1/ to get the desired result. We will prove (i),

i.e., show 1i ≤ 1/, in several steps and provide details for each step after

Equation (54).

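Let $f_i$ and $f_j$ be the frequencies of the same SA-value in initial groups $g_i$ and $g_j$ before merging. We have

$$\frac{f_i + f_j}{|g_i| + |g_j|} \le \max\left(\frac{f_i}{|g_i|}, \frac{f_j}{|g_j|}\right) \tag{52}$$

$$\max\left(\frac{f_i}{|g_i|}, \frac{f_j}{|g_j|}\right) \le \frac{1}{\alpha} \tag{53}$$

$$\rho_1^i \le \frac{1}{\alpha} \tag{54}$$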

Equation (52) says that a relative frequency in a merged group can never be larger than the maximum relative frequency in an initial group prior to merging. We prove this inequality by contradiction. Without loss of generality, assume $f_i/|g_i| \ge f_j/|g_j|$ (the proof is symmetric if $f_j/|g_j|$ is larger), so the right-hand side of (52) equals $f_i/|g_i|$. For the purpose of contradiction, assume $\frac{f_i + f_j}{|g_i| + |g_j|} > \frac{f_i}{|g_i|}$. Cross-multiplying and simplifying gives $f_j |g_i| > f_i |g_j|$, and dividing both sides by $|g_i||g_j|$ gives $f_j/|g_j| > f_i/|g_i|$, which contradicts our assumption that $f_i/|g_i| \ge f_j/|g_j|$; therefore, Equation (52) must hold.
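As a quick numerical sanity check of Equation (52) (an illustration, not part of the original proof), the Python sketch below draws random group sizes and frequencies and confirms that the relative frequency of a value after merging two groups never exceeds the larger of its two pre-merge relative frequencies. The helper names `merged_relative_frequency` and `check_equation_52` are hypothetical; exact `Fraction` arithmetic is used so that ties are compared without floating-point error.

```python
import random
from fractions import Fraction

def merged_relative_frequency(f_i, size_i, f_j, size_j):
    """Relative frequency of one SA-value after merging groups g_i and g_j."""
    return Fraction(f_i + f_j, size_i + size_j)

def check_equation_52(trials=10_000, seed=0):
    """Randomly test (f_i + f_j)/(|g_i| + |g_j|) <= max(f_i/|g_i|, f_j/|g_j|)."""
    rng = random.Random(seed)
    for _ in range(trials):
        size_i, size_j = rng.randint(1, 50), rng.randint(1, 50)
        f_i, f_j = rng.randint(0, size_i), rng.randint(0, size_j)
        merged = merged_relative_frequency(f_i, size_i, f_j, size_j)
        if merged > max(Fraction(f_i, size_i), Fraction(f_j, size_j)):
            return False
    return True

print(merged_relative_frequency(3, 10, 1, 10))  # 1/5
print(check_equation_52())                       # True
```

For example, merging a group in which a value has frequency 3 out of 10 with one in which it has frequency 1 out of 10 gives 4/20 = 1/5, which lies between 1/10 and 3/10.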

Equation (53) is derived from the fact that both gi and gj are -balanced

(Lemma 4), so we know that the maximum relative frequency of either gi and gj

must be (Definition 5); i.e.,

.

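Finally, Equation (54) holds because $\frac{f_i + f_j}{|g_i| + |g_j|}$ is a general expression for any relative frequency in a merged group $T_i$, and combining (52) and (53) bounds it by $1/\alpha$. Replacing it with the specific maximum relative frequency of $T_i$, namely $\rho_1^i$, gives the desired result: $\rho_1^i \le 1/\alpha$. (If $T_i$ merges more than two initial groups, applying (52) repeatedly gives the same bound.) □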
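The bound in Theorem 4 (i) can be illustrated on a toy example. The sketch below assumes that each group is represented simply as a list of the SA-values of its records, computes $\alpha = \lfloor |T|/f_{max} \rfloor$, and checks that merging two $\alpha$-balanced initial groups yields a sub-table whose maximum relative frequency is still at most $1/\alpha$. The table `T`, the groups `g1`, `g2`, `g3`, and the helper names are hypothetical; the sketch does not reproduce how the PP algorithm actually forms its initial groups.

```python
from collections import Counter
from fractions import Fraction

def alpha(table):
    """alpha = floor(|T| / f_max), where f_max is the largest SA-value frequency in T."""
    return len(table) // max(Counter(table).values())

def max_relative_frequency(group):
    """Largest relative frequency of any SA-value in a group (list of SA-values)."""
    return Fraction(max(Counter(group).values()), len(group))

def is_alpha_balanced(group, a):
    """Treat a group as a-balanced if no SA-value exceeds relative frequency 1/a."""
    return max_relative_frequency(group) <= Fraction(1, a)

# Hypothetical toy table T (SA-values only) and an initial grouping of its 12 records.
T = ["flu"] * 4 + ["hiv"] * 3 + ["cold"] * 3 + ["acne"] * 2
g1 = ["flu", "hiv", "cold"]
g2 = ["flu", "hiv", "acne"]
g3 = ["flu", "cold", "acne", "flu", "hiv", "cold"]

a = alpha(T)  # |T| = 12 and f_max = 4, so a = 3
assert all(is_alpha_balanced(g, a) for g in (g1, g2, g3))

T1 = g1 + g2  # merge g1 and g2 into one sub-table
T2 = g3
print(a, max_relative_frequency(T1), max_relative_frequency(T2))  # 3 1/3 1/3
```

Merging `g1` and `g2` doubles the count of "flu" and "hiv" but also doubles the group size, so the maximum relative frequency stays at 1/3, as Equation (54) predicts.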

A question remains: how likely is it that the condition 2 > 1/ in Property

(ii) of Theorem 4 holds? The answer is as likely as the gap 21 is greater than

1/  fmax/|T|, which is the gap created by the floor function  = |T|/fmax. In fact, if 2 1/ > 1 – fmax/|T|, then 2 > 1/ follows because 1 ≥ fmax/|T|

(Equation (8)). In practice, 2 1/ > 1 – fmax/|T| normally holds.

The next theorem is the correctness counterpart of Theorem 4 for the case of SA‟  SA. Notice a difference in Theorem 5 (i) from Theorem 4 (i). In Theorem 5 (i) we say Ti is “nearly” -balanced wrt SA‟. We say this because

approaches when  approaches 1 and the definition of  = 1 / (1 –

(1/‟)) given in Theorem 5 implies  approaches 1 when ‟ = |T‟|/fmax is large. Note that we expect ‟ to be large when SA‟  SA because we expect SA‟ to contain SA-values with small frequencies (including the maximum frequency in SA‟, fmax).

Theorem 5 (Partitioning properties when SA’ SA). Let SA‟  SA. If 1i is the

maximum relative frequency in sub-table Ti, 2 is the bound on posterior

algorithm, and  = 1 / (1 – (1/‟)), ‟ = |T‟|/fmax, where fmax is the maximum frequency of SA‟-values in T (and T‟), then

(i) Ti is nearly -balanced wrt SA‟ and 1i ≤ /()

(ii) If 2 > /( ), then 1i < 2

Proof: We will prove (i) by showing 1i ≤ /( ). Note that (ii) immediately

follows from (i), since we are given 2 > /( ) and we can substitute “< 2” for

“≤ /( )” in 1i ≤ /( ) to get the desired result. We will prove (i) in several

steps and provide details for each step after Equation (57).

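Let $f_i$ and $f_j$ be the frequencies of the same SA′-value in initial groups $g_i$ and $g_j$ before merging. We have

$$\frac{f_i + f_j}{|g_i| + |g_j|} \le \max\left(\frac{f_i}{|g_i|}, \frac{f_j}{|g_j|}\right) \tag{55}$$

$$\max\left(\frac{f_i}{|g_i|}, \frac{f_j}{|g_j|}\right) \le \frac{\gamma}{\alpha'} \tag{56}$$

$$\rho_1^i \le \frac{\gamma}{\alpha'} \tag{57}$$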

Equation (55) says a relative frequency in a merged group can never be larger than the maximum relative frequency in an initial group prior to merging, as in the proof of Theorem 4 (Equation (52)).

Equation (56) follows from Lemma 5, which says when SA‟  SA, an SA‟- value in an initial group has a relative frequency less than or equal to /(), so we know that the maximum relative frequency of either gi and gj must be

  ; i.e.,

.

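Finally, Equation (57) holds because $\frac{f_i + f_j}{|g_i| + |g_j|}$ is a general expression for any relative frequency in a merged group $T_i$, and combining (55) and (56) bounds it by $\gamma/\alpha'$. Replacing it with the specific maximum relative frequency of $T_i$, namely $\rho_1^i$, gives the desired result: $\rho_1^i \le \gamma/\alpha'$. □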

Now that we have shown that our PP algorithm is correct, we next consider its time complexity.

Theorem 6 (Time complexity). Let n be the size of the dataset T, i.e., n = |T|, let

m be the domain size of SA, and let t be the number of initial groups generated by the balancing phase of the PP algorithm. The time complexity of the PP algorithm is .

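Proof: The total time complexity is the sum of the time complexities of the three phases of the PP algorithm (balancing, rearranging, and merging):

$$\mathrm{Time}_{PP} = \mathrm{Time}_{balancing} + \mathrm{Time}_{rearranging} + \mathrm{Time}_{merging} \tag{58}$$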

The balancing phase first requires the $m$ SA-values to be sorted, which takes $O(m \log m)$ time. Then each of the $n$ records is examined only once, taking $O(n)$ time. Therefore, the balancing phase takes $O(m \log m + n)$ time.

The rearranging phase first multiplies the $t \times m$ matrix $A$ by its $m \times t$ transpose $A^T$ (recall that $A$ represents the $t$ initial groups $g_1, \ldots, g_t$), which takes $O(t^2 m)$ time with standard matrix multiplication. Then the resulting $t \times t$ matrix $A \times A^T$ is used as input to the Reverse Cuthill-McKee algorithm, taking time [23]. Therefore,

Finally, the merging phase involves running the dynamic programming algorithm in Figure 14. For an input sequence of size t, all the values (g[i..j]) for all i < j are first computed in a pre-processing step, which takes time. Then, for each i, at most t values of r are evaluated in the recursion in Step 2 of Figure 14, taking time. Therefore,

Hence, using Equation (58), we can say the time complexity is

Since , adding to leaves us with a term , which can be simplified to ). Therefore, the overall time complexity is

as desired. □
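The rearranging phase in the proof above (computing $A \times A^T$ and reordering it with Reverse Cuthill-McKee) can be sketched in plain Python. This is an illustration under stated assumptions, not the PP implementation: we assume row $i$ of $A$ holds the counts of each of the $m$ SA-values in initial group $g_i$, and we treat two groups as adjacent when the corresponding off-diagonal entry of $A \times A^T$ is nonzero (i.e., they share an SA-value). The RCM routine is a simple textbook variant that starts each breadth-first search at a minimum-degree vertex; production code would more likely call a library routine such as SciPy's `scipy.sparse.csgraph.reverse_cuthill_mckee`.

```python
from collections import deque

def gram_matrix(A):
    """Return A x A^T for a t x m matrix A given as a list of rows (O(t^2 m) time)."""
    t = len(A)
    return [[sum(a * b for a, b in zip(A[i], A[j])) for j in range(t)] for i in range(t)]

def reverse_cuthill_mckee(M):
    """Return an RCM ordering (0-based row indices) of a symmetric t x t matrix M.

    Off-diagonal nonzeros of M are treated as edges of an undirected graph.
    """
    t = len(M)
    neighbors = [[j for j in range(t) if j != i and M[i][j] != 0] for i in range(t)]
    degree = [len(nb) for nb in neighbors]
    visited = [False] * t
    order = []
    # Cover every connected component; start each BFS at a minimum-degree vertex.
    for start in sorted(range(t), key=lambda v: degree[v]):
        if visited[start]:
            continue
        visited[start] = True
        queue = deque([start])
        while queue:
            v = queue.popleft()
            order.append(v)
            # Cuthill-McKee step: visit unvisited neighbors in increasing degree order.
            for u in sorted(neighbors[v], key=lambda w: degree[w]):
                if not visited[u]:
                    visited[u] = True
                    queue.append(u)
    order.reverse()  # the "reverse" in Reverse Cuthill-McKee
    return order

# Hypothetical A: rows are initial groups g1..g4, columns are counts of 3 SA-values.
A = [
    [1, 0, 0],
    [0, 0, 1],
    [1, 1, 0],
    [0, 1, 1],
]
M = gram_matrix(A)
print(M)                         # [[1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 2, 1], [0, 1, 1, 2]]
print(reverse_cuthill_mckee(M))  # [1, 3, 2, 0]
```

In this toy input the groups form the chain g1, g3, g4, g2 in the adjacency graph, and the returned 0-based ordering [1, 3, 2, 0] places the groups of that chain next to each other, which is the band-narrowing effect RCM is used for.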

Our experiments show that the number of initial groups, t, is quite small on real-life datasets (no more than 20), because the balancing phase maximizes the size of each initial group. Therefore, in practice, the running time of the PP algorithm is linear in the cardinality n of T.
