
To examine the performance of the proposed protocols, we need a lower bound on the amount of communication required. We can obtain an information-theoretic lower bound by considering how many bits are necessary for B to be able to compute $a$, given that B already holds $b$. If A already knew $b$ in addition to $a$, A could send a single message containing sufficient information for B to derive $a$ from $b$. If, for any $b$, there are at most $m$ strings that $a$ could possibly be, then $\log m$ bits are necessary to distinguish between them. The number of such possible strings is determined by the distance between $a$ and $b$. We shall denote by $k$ the quantity $|\sigma| - 1$, where $\sigma$ is the alphabet from which our strings are drawn.
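As an illustration of the counting argument, the following minimal sketch enumerates, by brute force, every string $a$ at a given Hamming distance from a fixed $b$ and reports the resulting $\lceil \log_2 m \rceil$. The function names `hamming` and `candidates` and the example inputs are assumptions made for this sketch; they do not come from the text.

```python
from itertools import product
from math import ceil, log2


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def candidates(b, alphabet, h):
    """All strings a over `alphabet` with |a| = |b| and Hamming distance exactly h from b.

    Brute force over all |alphabet|^|b| strings, so only usable for tiny inputs.
    """
    return ["".join(t) for t in product(alphabet, repeat=len(b)) if hamming(t, b) == h]


if __name__ == "__main__":
    b = "abca"
    alphabet = "abc"  # |sigma| = 3, so k = 2
    m = len(candidates(b, alphabet, h=2))
    # m strings must be distinguished, so at least ceil(log2(m)) bits are needed
    print(m, ceil(log2(m)))  # prints: 24 5
```

Here $m = 24$ because there are 6 ways to choose the 2 differing positions and $k^2 = 4$ ways to fill them.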

Lemma 6.2.1 The amount $S$ of communication needed to correct $h$ Hamming differences between strings $a$ and $b$ ($|a| = |b| = n$) is bounded in bits as: $h \log(nk/h) \le S \le h(\log(nk/h) + 2)$, provided $h \le n/2$.

Proof. The number of strings which satisfy $d(a, b) = h$ can be found exactly: we can choose $h$ locations out of the $n$ and make a change there. Each of these characters can be changed to any one of $|\sigma| - 1 = k$ different values. This gives precisely $\binom{n}{h} k^h$ different strings. We show $\binom{n}{h} k^h \ge \left(\frac{kn}{h}\right)^h$ by induction in $h$, which follows since $(1 + 1/h)^h (n-h) \ge 2(n-h) \ge n$ for all $2 \le h \le n/2$. The base case $h = 1$ gives equality. For the upper bound, we can similarly show that $\binom{n}{h} k^h \le \left(\frac{ken}{h}\right)^h$

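also by induction in $h$. Taking logarithms of both bounds gives the lemma, since $\log e < 2$.

Lemma 6.2.2 The amount $S$ of communication needed to correct $e$ edit operations is bounded as: $e \log(2(|\sigma| - 1)n/e) \le S \le e(\log(2|\sigma|(n+1)/e) + \log\log n)$, where $n$ denotes the length of the string $a$.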

Proof. The lower bound follows by considering strings generated by only changes or insertions. The number of such strings that can be formed from $a$ with $e$ insertions or changes can be found by choosing $i$ locations to make an insert and $(e-i)$ locations to make a change. This is at least $\sum_{i=0}^{e} \binom{n}{i} k^i \binom{n}{e-i} k^{e-i}$. This expression simplifies to $k^e \sum_{i=0}^{e} \binom{n}{i} \binom{n}{e-i}$, which is just $k^e \binom{2n}{e}$. Using the methods of Lemma 6.2.1 bounds this quantity from below with $\left(\frac{2kn}{e}\right)^e$. Taking the logarithm of this expression gives the required lower bound.
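The simplification and the final lower bound can be checked numerically for small parameters. The sketch below is an illustration only, with hypothetical helper names `edit_candidates_lower` and `check`; it uses exact integer arithmetic so that no rounding is involved.

```python
from math import comb


def edit_candidates_lower(n, e, k):
    """Sum over i of C(n, i) k^i * C(n, e - i) k^(e - i), the count used in the proof."""
    return sum(comb(n, i) * k**i * comb(n, e - i) * k**(e - i) for i in range(e + 1))


def check(n, e, k):
    """Return (sum equals k^e C(2n, e), k^e C(2n, e) >= (2kn/e)^e) for 1 <= e <= n."""
    total = edit_candidates_lower(n, e, k)
    closed = k**e * comb(2 * n, e)  # closed form via Vandermonde's identity
    # (2kn/e)^e <= closed  is equivalent to  (2kn)^e <= closed * e^e  over the integers
    bound_ok = (2 * k * n) ** e <= closed * e**e
    return total == closed, bound_ok


if __name__ == "__main__":
    print(check(n=10, e=3, k=25))  # prints: (True, True)
```

The first component confirms the step from the sum to $k^e \binom{2n}{e}$; the second confirms the bound by $\left(\frac{2kn}{e}\right)^e$ for the chosen values.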

For the upper bound, we instead consider a bitwise encoding of the operations. Clearly, if we can encode all possible sequences of $e$ edit operations in some number of bits, this upper bounds the amount of communication needed. We consider the operations in order of occurrence (left to right). For the $i$th difference, we encode the distance $d_i$ from the last difference with $\log d_i$ bits, plus $\log\log n$ bits to encode the length of the encoding of $d_i$. The total cost of this is $\sum_i (\log d_i + \log\log n) = e \log\log n + e \sum_i (\log d_i)/e$. Using Jensen's inequality, this is no more than $e \log\log n + e \log \sum_i (d_i/e) \le e(\log((n+1)/e) + \log\log n)$.

It is possible to use a bit flag to denote whether each edit is an insertion or a change, requiring a further $e$ bits. We then use $\log|\sigma|$ bits to code the character concerned. In the case of a deletion, this is represented as a 'change', but using the code of the existing character at that location. The total cost of this encoding is given by $e(\log(|\sigma|(n+1)/e) + \log\log n) + e$. Summing these two costs gives the upper bound, as required.
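The cost of this encoding can be sketched as follows. This is an idealised calculation under stated assumptions: bit counts are treated as real numbers (no rounding up), edit positions are distinct integers in $1, \ldots, n+1$, and $n \ge 2$. The names `encoding_cost` and `upper_bound` are hypothetical and not taken from the text.

```python
from math import log2


def encoding_cost(positions, n, sigma_size):
    """Idealised bit cost of the left-to-right encoding of edits at `positions`.

    Assumes distinct integer positions in 1..n+1 and n >= 2.
    """
    e = len(positions)
    prev, gap_bits = 0, 0.0
    for p in sorted(positions):
        gap_bits += log2(p - prev)  # log d_i bits for the gap to the previous edit
        prev = p
    length_bits = e * log2(log2(n))  # log log n bits per edit for the length of d_i
    flag_and_char_bits = e * (1 + log2(sigma_size))  # 1 flag bit + log|sigma| per edit
    return gap_bits + length_bits + flag_and_char_bits


def upper_bound(e, n, sigma_size):
    """The bound e(log(2|sigma|(n+1)/e) + log log n) from Lemma 6.2.2."""
    return e * (log2(2 * sigma_size * (n + 1) / e) + log2(log2(n)))


if __name__ == "__main__":
    positions = [3, 10, 11, 40]
    cost = encoding_cost(positions, n=64, sigma_size=4)
    print(cost <= upper_bound(len(positions), n=64, sigma_size=4))  # prints: True
```

The comparison holds because the gaps $d_i$ sum to at most $n+1$, which is exactly where Jensen's inequality is applied in the proof.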

For each of the other distances, we can take this idea of encoding each edit operation using a certain number of bits to get a bound on the amount of communication needed.

Lemma 6.2.3 The amount $S$ of communication needed to exchange strings or permutations such that the relevant distance between the two items is $d$ is bounded in the following ways for the following distance measures ($|a| = n$):

i) Compression distances: $S \le 9d\log(|a|+|b|)$

ii) Tichy's distance: $S \le 2d\log n$

iii) LZ distance: $S \le 2d\log(|a|+|b|)$

iv) Edit distance with moves: $S \le 3d\log 2n$

v) Reversal distance: $S \le 2d\log n$

vi) Transposition distance: $S \le 3d\log n$

vii) Swap distance: $S \le d\log n$

viii) RITE distance: $S \le 3d\log 2(|a|+|b|)$

ix) Permutation edit distance: $S \le 2d\log n$

Proof. In each case we show a simple bitwise encoding of the relevant operations to give the corresponding upper bound.

i) Compression distance: We show a bitwise encoding of the allowed operations to give the upper bound. A copy or move operation can use $\log|a|$ bits to specify each of the start, length of substring and destination. Uncopies, or other operations, can be encoded using fewer bits. Provided $|a|+|b|$ is more than $|\sigma|$, we will require no more than $3d\log(|a|+|b|)^3$ bits for $d$ operations. In order to show that each operation can be specified using $O(\log n)$ bits, we claim that any intermediate string consists solely of substrings of $a$ or $b$. Clearly, this is true at the start and the end of the editing operation. Then note that any block operation (copying, moving or uncopying) which operates on a string containing only substrings of $a$ and $b$ results in a string consisting of substrings of $a$ and $b$. Creating a substring that does not belong to either the source or destination string can be accomplished by character operations, but we must remove all such substrings at the end, and so these operations are superfluous, and can be removed from any optimal transformation. Hence, intermediate strings can be parsed into substrings of $a$ and $b$. There is no need for any intermediate string to contain more than one copy of any substring of $a$ or $b$. The total length of such substrings is $O(n^3)$, and so the length of any intermediate string never exceeds this. Hence locations in each intermediate string can be specified using $O(\log n)$ bits.

ii) Tichy's distance: Each operation consists of copying a block from the original string. The cost of describing a copy is at most $2\log n$ bits.

iii) LZ distance: As with Tichy's distance, each operation is a copy of a block from one or other of the strings, totalling at most $2\log n$ bits.

iv) Edit distance with moves: A substring move operation requires $3\log n$ bits to describe the block being moved and its new location. The other operations require fewer bits to encode since $|\sigma| \le n$. An additional 2 bits are required to flag whether the operation is an insertion, deletion, change or move.

v) Reversal distance: There are $n(n-1)/2$ distinct reversals, so any one can be specified in $2\log n$ bits.

vi) Transposition distance: There are $\binom{n+1}{3}$ distinct transpositions, so any one can be specified in $3\log n$ bits.

vii) Swap distance: There are $(n-1)$ distinct swaps, so any one can be specified in $\log n$ bits.

viii) RITE distance: The sequence need never exceed $\max(|a|, |b|)$ in length, given a suitable organisation of the operations. This is because if the sequence does exceed this length, then there is a character being inserted that is subsequently deleted. We will use $n$ to denote $\max(|a|, |b|)$. Each of the operations can be specified in at most $3\log n$ bits, to describe its location (start, end and destination), plus an additional 3 bits to describe which of the five possible operations it is. This gives a cost of $3(\log n + 1) = 3\log 2n$ bits.

ix) Permutation edit distance: Each operation moves one element to a new position, and so can be described using $2\log n$ bits.
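The counting arguments in parts v), vi) and vii) can be illustrated for small $n$. The sketch below enumerates the operations under an assumed parametrisation (a reversal by its two endpoints, a transposition by three cut points among the $n+1$ gaps, a swap by its left position) and checks that $\log_2$ of each count fits within the stated number of bits. The function names are hypothetical.

```python
from itertools import combinations
from math import comb, log2


def reversals(n):
    """Reversals of a block with endpoints i < j in a sequence of length n."""
    return list(combinations(range(n), 2))


def transpositions(n):
    """Exchanges of adjacent blocks [i, j) and [j, l), with cut points i < j < l in 0..n."""
    return list(combinations(range(n + 1), 3))


def swaps(n):
    """Swaps of adjacent elements at positions (i, i + 1)."""
    return [(i, i + 1) for i in range(n - 1)]


def fits(count, factor, n):
    """True if an index into `count` operations fits in factor * log2(n) (idealised) bits."""
    return log2(count) <= factor * log2(n)


if __name__ == "__main__":
    n = 8
    print(len(reversals(n)), len(transpositions(n)), len(swaps(n)))  # prints: 28 84 7
    print(fits(28, 2, n), fits(84, 3, n), fits(7, 1, n))  # prints: True True True
```

For $n = 8$ the counts agree with $n(n-1)/2 = 28$, $\binom{n+1}{3} = 84$ and $n - 1 = 7$.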
