In this section, we prove some useful properties of MDS array codes with optimal update. Let $A = (a_{i,j})$ be an array of size $p \times k$ over a finite field $F$, where $i \in [0,p-1]$, $j \in [0,k-1]$, and each of its entries is an information element. Let $R = \{R_0, R_1, \dots, R_{p-1}\}$ and $Z = \{Z_0, Z_1, \dots, Z_{p-1}\}$ be two sets such that $R_l, Z_l$ are subsets of elements in $A$ for all $l \in [0,p-1]$. Then for all $l \in [0,p-1]$, define the row/zigzag parity elements as $r_l = \sum_{a \in R_l} \alpha_a a$ and $z_l = \sum_{a \in Z_l} \beta_a a$, for some sets of coefficients $\{\alpha_a\}, \{\beta_a\} \subseteq F$. We call $R$ and $Z$ the sets that generate the parity columns.
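The definition of $r_l$ and $z_l$ can be illustrated with a minimal sketch. The prime field size `q = 5`, the toy $2 \times 2$ array, the particular sets and the coefficients below are all illustrative assumptions, not values taken from the text.

```python
# Minimal sketch: row and zigzag parities over a prime field GF(q).
# q, the array A, the sets R, Z and the coefficients are illustrative assumptions.

def parity(array, index_sets, coeffs, q):
    """Return, for each set S_l, the sum of coeffs[(i, j)] * array[i][j] over (i, j) in S_l, mod q."""
    return [sum(coeffs[(i, j)] * array[i][j] for (i, j) in s) % q for s in index_sets]

q = 5
A = [[1, 2],
     [3, 4]]                                   # p = 2 rows, k = 2 columns
R = [{(0, 0), (0, 1)}, {(1, 0), (1, 1)}]       # row sets R_0, R_1
Z = [{(0, 0), (1, 1)}, {(1, 0), (0, 1)}]       # zigzag sets Z_0, Z_1
alpha = {(i, j): 1 for i in range(2) for j in range(2)}
beta = {(0, 0): 1, (1, 0): 1, (0, 1): 2, (1, 1): 2}

r = parity(A, R, alpha, q)                     # row parity column
z = parity(A, Z, beta, q)                      # zigzag parity column

# Update a single information element and count how many parity elements change.
A2 = [row[:] for row in A]
A2[0][1] = 0
new_parities = parity(A2, R, alpha, q) + parity(A2, Z, beta, q)
changed = sum(x != y for x, y in zip(r + z, new_parities))
```

In this toy instance one information element touches exactly one row parity and one zigzag parity, so an update changes three array elements in total.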

An MDS array code over $F_q$ with $r$ parities is said to be optimal-update if a change to any information element changes only $r+1$ elements in the array. It is easy to see that $r+1$ changes is the minimum possible number: if an information element appears only $r$ times in the array, then deleting at most $r$ columns results in an unrecoverable $r$-erasure pattern, which contradicts the MDS property. A small finite-field size is desirable because it lets us update a small amount of information at a time if needed, and it also gives low computational complexity. Therefore we assume that the code is optimal-update, while we try to use the smallest possible finite field. When $r = 2$, only 3 elements in the code are updated when an information element is updated. Under this assumption, the following theorem characterizes the sets $R$ and $Z$.

Theorem 4.7 For a $(k+2,k)$ MDS code with optimal update, the sets $R$ and $Z$ are partitions of $A$ into $p$ equally sized sets of size $k$, where each set in $R$ or $Z$ contains exactly one element from each column.

Proof: Since the code is a $(k+2,k)$ MDS code, each information element should appear at least once in each parity column $C_k, C_{k+1}$. However, since the code has optimal update, each element appears exactly once in each parity column.

Let $X \in R$. Note that if $X$ contains two entries of $A$ from the systematic column $C_i$, $i \in [0,k-1]$, then rebuilding is impossible if columns $C_i$ and $C_{k+1}$ are erased. Thus $X$ contains at most one entry from each column, and therefore $|X| \le k$. However, each element of $A$ appears exactly once in each parity column, so the $p$ sets of $R$ together contain all $pk$ elements. Thus if $|X| < k$ for some $X \in R$, there is $Y \in R$ with $|Y| > k$, which leads to a contradiction. Therefore, $|X| = k$ for all $X \in R$. As each information element appears exactly once in the first parity column, $R = \{R_0, \dots, R_{p-1}\}$ is a partition of $A$ into $p$ equally sized sets of size $k$. A similar proof holds for the sets $Z = \{Z_0, \dots, Z_{p-1}\}$.
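The structural condition of Theorem 4.7 is easy to test mechanically. The following is a minimal sketch; the function name `is_transversal_partition` and the representation of array entries as `(row, column)` index pairs are assumptions made for illustration.

```python
def is_transversal_partition(sets, p, k):
    """Check that `sets` partitions the p x k index grid into p sets,
    each containing exactly one entry from every column."""
    if len(sets) != p:
        return False
    seen = set()
    for s in sets:
        # Every entry must lie inside the array.
        if not all(0 <= i < p and 0 <= j < k for (i, j) in s):
            return False
        # Exactly one entry per column means the column indices are 0..k-1.
        if sorted(j for (_, j) in s) != list(range(k)):
            return False
        # The sets must be pairwise disjoint.
        if seen & s:
            return False
        seen |= s
    return len(seen) == p * k

good = [{(0, 0), (1, 1)}, {(1, 0), (0, 1)}]    # zigzag-like sets for p = k = 2
bad = [{(0, 0), (1, 0)}, {(0, 1), (1, 1)}]     # each set takes two entries from one column
```

Here `good` satisfies the theorem's condition while `bad` does not, because its sets contain two entries of the same systematic column.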

By the above theorem, for the $j$-th systematic column $(a_{0,j}, \dots, a_{p-1,j})^T$, its $p$ elements are contained in $p$ distinct sets $R_l$, $l \in [0,p-1]$. In other words, the membership of the $j$-th column's elements in the sets $\{R_l\}$ defines a permutation $g_j : [0,p-1] \to [0,p-1]$, such that $g_j(i) = l$ iff $a_{i,j} \in R_l$. Similarly, we can define a permutation $f_j$ corresponding to the second parity column, where $f_j(i) = l$ iff $a_{i,j} \in Z_l$. For example, in Figure 4.2 each systematic column corresponds to a permutation of the four symbols.
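The correspondence between parity sets and per-column permutations can be sketched as follows. The helper name `column_permutation` and the toy sets are hypothetical; entries are again represented as `(row, column)` pairs.

```python
def column_permutation(sets, j, p):
    """Return the list f with f[i] = l such that entry (i, j) belongs to sets[l]."""
    f = [None] * p
    for l, s in enumerate(sets):
        for (i, col) in s:
            if col == j:
                f[i] = l
    return f

R = [{(0, 0), (0, 1)}, {(1, 0), (1, 1)}]       # row sets
Z = [{(0, 0), (1, 1)}, {(1, 0), (0, 1)}]       # zigzag sets

g0, g1 = column_permutation(R, 0, 2), column_permutation(R, 1, 2)
f0, f1 = column_permutation(Z, 0, 2), column_permutation(Z, 1, 2)
```

With these toy sets both $g_j$ are the identity, $f_0$ is the identity and $f_1$ swaps the two rows.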

Observing that there is no significance in the elements' ordering in each column, w.l.o.g. we can assume that the first parity column contains the sum of each row of $A$ and the $g_j$'s correspond to identity permutations, i.e., $r_i = \sum_{j=0}^{k-1} \alpha_{i,j} a_{i,j}$ for some coefficients $\{\alpha_{i,j}\}$.

First we show that any set of zigzag sets $Z = \{Z_0, \dots, Z_{p-1}\}$ defines a $(k+2,k)$ MDS array code over a field $F$ large enough.

Theorem 4.8 Let $A = (a_{i,j})$ be an array of size $p \times k$ and let the zigzag sets be $Z = \{Z_0, \dots, Z_{p-1}\}$. Then there exists a $(k+2,k)$ MDS array code for $A$ with $Z$ as its zigzag sets over the field $F$ of size greater than $p(k-1)+1$.

In order to prove Theorem 4.8, we use the well-known Combinatorial Nullstellensatz by Alon [Alo99]:

Theorem 4.9 (Combinatorial Nullstellensatz) [Alo99, Th 1.2] Let $F$ be an arbitrary field, and let $f = f(x_1, \dots, x_q)$ be a polynomial in $F[x_1, \dots, x_q]$. Suppose the degree of $f$ is $\deg(f) = \sum_{i=1}^{q} t_i$, where each $t_i$ is a nonnegative integer, and suppose the coefficient of $\prod_{i=1}^{q} x_i^{t_i}$ in $f$ is nonzero. Then, if $S_1, \dots, S_q$ are subsets of $F$ with $|S_i| > t_i$, there are $s_1 \in S_1, s_2 \in S_2, \dots, s_q \in S_q$ so that $f(s_1, \dots, s_q) \neq 0$.

Proof of Theorem 4.8: Assume the information of $A$ is given in a column vector $W$ of length $pk$, where column $i \in [0,k-1]$ of $A$ occupies the rows $[ip, (i+1)p-1]$ of $W$. Each systematic node $i$, $i \in [0,k-1]$, can be represented as $Q_i W$, where $Q_i = [0_{p \times pi}, I_{p \times p}, 0_{p \times p(k-i-1)}]$. Moreover, define $Q_k = [I_{p \times p}, I_{p \times p}, \dots, I_{p \times p}]$ and $Q_{k+1} = [x_0 P_0, x_1 P_1, \dots, x_{k-1} P_{k-1}]$, where the $P_i$'s are permutation matrices (not necessarily distinct) of size $p \times p$ and the $x_i$'s are variables, such that $C_k = Q_k W$ and $C_{k+1} = Q_{k+1} W$. The permutation matrix $P_i = (p^{(i)}_{l,m})$ is defined by $p^{(i)}_{l,m} = 1$ if and only if $a_{m,i} \in Z_l$. In order to show that there exists such an MDS code, it is sufficient to show that there is an assignment for the indeterminates $\{x_i\}$ in the field $F$ such that for any set of integers $\{s_1, s_2, \dots, s_k\} \subseteq [0,k+1]$ the matrix $Q = [Q_{s_1}^T, Q_{s_2}^T, \dots, Q_{s_k}^T]$ is of full rank. It is easy to see that if the parity column $C_{k+1}$ is erased, i.e., $k+1 \notin \{s_1, s_2, \dots, s_k\}$, then $Q$ is of full rank. If $k \notin \{s_1, s_2, \dots, s_k\}$ and $k+1 \in \{s_1, s_2, \dots, s_k\}$, then $Q$ is of full rank if none of the $x_i$'s equals zero. The last case is when both $k, k+1 \in \{s_1, s_2, \dots, s_k\}$, i.e., there are $0 \le i < j \le k-1$ such that $i, j \notin \{s_1, s_2, \dots, s_k\}$. It is easy to see that in that case $Q$ is of full rank if and only if the submatrix

$$B_{i,j} = \begin{pmatrix} x_i P_i & x_j P_j \\ I_{p \times p} & I_{p \times p} \end{pmatrix}$$

is of full rank. This is equivalent to $\det(B_{i,j}) \neq 0$. Note that $\deg(\det(B_{i,j})) = p$ and the coefficient of $x_i^p$ is $\det(P_i) \in \{1, -1\}$. Define the polynomial

$$T = T(x_0, x_1, \dots, x_{k-1}) = \prod_{0 \le i < j \le k-1} \det(B_{i,j}),$$

and the result follows if there are elements $a_0, a_1, \dots, a_{k-1} \in F$ such that $T(a_0, a_1, \dots, a_{k-1}) \neq 0$. $T$ is of degree $p\binom{k}{2}$ and the coefficient of $\prod_{i=0}^{k-1} x_i^{p(k-1-i)}$ is $\prod_{i=0}^{k-1} \det(P_i)^{k-1-i} \neq 0$. Set $S_i = F \setminus \{0\}$ for every $i$ in Theorem 4.9, and the result follows.
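The existence argument can be made concrete for small parameters by exhaustive search. The sketch below assumes a prime field GF(q) (so that integer arithmetic modulo `q` suffices) and tries every nonzero assignment of the $x_i$ until all $\det(B_{i,j})$ are nonzero. The helper names `det_mod`, `block_b` and `find_coefficients` are hypothetical, and brute force is only a demonstration, not the method of the proof.

```python
from itertools import combinations, product

def det_mod(matrix, q):
    """Determinant of a square matrix over GF(q), q prime, by Gaussian elimination."""
    a = [[v % q for v in row] for row in matrix]
    n, det = len(a), 1
    for c in range(n):
        pivot = next((r for r in range(c, n) if a[r][c]), None)
        if pivot is None:
            return 0                               # singular matrix
        if pivot != c:
            a[c], a[pivot] = a[pivot], a[c]        # a row swap flips the sign
            det = -det
        det = det * a[c][c] % q
        inv = pow(a[c][c], -1, q)                  # modular inverse (Python 3.8+)
        for r in range(c + 1, n):
            factor = a[r][c] * inv % q
            a[r] = [(x - factor * y) % q for x, y in zip(a[r], a[c])]
    return det % q

def block_b(xi, Pi, xj, Pj):
    """Build B_{i,j} = [[x_i P_i, x_j P_j], [I, I]] as a list of rows."""
    p = len(Pi)
    top = [[xi * Pi[r][c] for c in range(p)] + [xj * Pj[r][c] for c in range(p)]
           for r in range(p)]
    eye = [[int(r == c) for c in range(p)] for r in range(p)]
    return top + [eye[r] + eye[r] for r in range(p)]

def find_coefficients(perms, q):
    """Return nonzero x_0..x_{k-1} in GF(q) with det(B_{i,j}) != 0 for all i < j, or None."""
    k = len(perms)
    for xs in product(range(1, q), repeat=k):
        if all(det_mod(block_b(xs[i], perms[i], xs[j], perms[j]), q)
               for i, j in combinations(range(k), 2)):
            return xs
    return None

I2 = [[1, 0], [0, 1]]          # P_0: identity permutation matrix
SWAP = [[0, 1], [1, 0]]        # P_1: swaps the two rows
over_gf5 = find_coefficients([I2, SWAP], 5)
over_gf3 = find_coefficients([I2, SWAP], 3)
```

For this toy case with $p = k = 2$, Theorem 4.8 guarantees a solution when the field has more than $p(k-1)+1 = 3$ elements. The search finds one over GF(5), while over GF(3) no assignment works because $\det(B_{0,1})$ equals $x_0^2 - x_1^2$ up to sign, which vanishes for all nonzero $x_0, x_1$ in GF(3).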

Theorem 4.8 states that there exist coefficients such that the code is MDS, and thus we will focus first on finding proper zigzag permutations $\{f_j\}$. The idea behind choosing the zigzag sets is as follows: assume a systematic column $(a_{0,j}, a_{1,j}, \dots, a_{p-1,j})^T$ is erased. Each element $a_{i,j}$ is rebuilt either by row or by zigzag. The set $S = \{S_0, S_1, \dots, S_{p-1}\}$ is called a rebuilding set for column $(a_0, a_1, \dots, a_{p-1})^T$ if for each $i$, $S_i \in R \cup Z$ and $a_i \in S_i$. In order to minimize the number of accesses to rebuild the erased column, we need to minimize the size of

$$\left| \cup_{i=0}^{p-1} S_i \right|, \tag{4.6}$$

which is equivalent to maximizing the number of intersections between the sets $\{S_i\}_{i=0}^{p-1}$. More specifically, the intersections between the row sets in $S$ and the zigzag sets in $S$.

For a $(k+2,k)$ MDS code $C$ with $p$ rows, define the rebuilding ratio $R(C)$ as the average fraction of accesses in the surviving systematic and parity nodes while rebuilding one systematic node, i.e.,

$$R(C) = \frac{\sum_{j} \min_{S_0, \dots, S_{p-1} \text{ rebuilds } j} \left| \cup_{i=0}^{p-1} S_i \right|}{p(k+1)k}.$$

Notice that in the two parity nodes we access $p$ elements in total, because each erased element must be rebuilt either by row or by zigzag; on the other hand, $\cup_{i=0}^{p-1} S_i$ contains $p$ elements from the erased column, which are not accessed. These two counts cancel, so the above expression is exactly the rebuilding ratio. Define the ratio function for all $(k+2,k)$ MDS codes with $p$ rows as

$$R(k) = \min_{C} R(C),$$

which is the minimal average portion of the array that needs to be accessed in order to rebuild one erased column. By (4.1), we know that $R(k) \ge 1/2$. For example, the code in Figure 4.4 achieves the lower bound of ratio $1/2$, and therefore $R(3) = 1/2$. Moreover, we will see in Corollary 4.17 that $R(k)$ is almost $1/2$ for all $k$ and $p = 2^m$, where $m$ is large enough.
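For small arrays the rebuilding ratio $R(C)$ can be evaluated directly from its definition by trying all $2^p$ row/zigzag choices for each erased column. The following is a minimal brute-force sketch; the function name `rebuilding_ratio` and the toy sets are assumptions for illustration, and entries are represented as `(row, column)` pairs.

```python
from fractions import Fraction
from itertools import product

def rebuilding_ratio(R, Z, p, k):
    """Evaluate R(C) by exhaustive search over rebuilding sets for each erased column."""
    total = 0
    for j in range(k):
        # For each erased element (i, j): the row set and the zigzag set containing it.
        options = []
        for i in range(p):
            row_set = next(s for s in R if (i, j) in s)
            zig_set = next(s for s in Z if (i, j) in s)
            options.append((row_set, zig_set))
        # Minimize the size of the union over all 2^p choices.
        total += min(len(set().union(*choice)) for choice in product(*options))
    return Fraction(total, p * (k + 1) * k)

R = [{(0, 0), (0, 1)}, {(1, 0), (1, 1)}]       # row sets
Z = [{(0, 0), (1, 1)}, {(1, 0), (0, 1)}]       # zigzag sets
ratio = rebuilding_ratio(R, Z, p=2, k=2)
```

For this $2 \times 2$ toy code the best rebuilding set for either column mixes one row set with one zigzag set, giving a union of size 3 and a ratio of $6/12 = 1/2$.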

So far we have discussed the characteristics of an arbitrary MDS array code with optimal update. Next, let us look at our code in Construction 4.1.

Recall that by Theorem 4.8 this code can be an MDS code over a field large enough. The ratio of the constructed code will be proportional to the size of the union of the elements in the rebuilding set in (4.6). The following theorem gives the ratio for Construction 4.1 and can be easily derived from Lemma 4.4 part (i). Recall that given vectors $v_0, \dots, v_{k-1}$, we write $f_i = f_{v_i}$ and $X_i = X_{v_i}$.

Theorem 4.10 The code described in Construction 4.1 and generated by the vectors $v_0, v_1, \dots, v_{k-1}$ is a $(k+2,k)$ MDS array code with ratio

$$R = \frac{1}{2} + \frac{\sum_{i=0}^{k-1} \sum_{j \neq i} |f_i(X_i) \cap f_j(X_i)|}{2^m k(k+1)}. \tag{4.7}$$
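Formula (4.7) can be evaluated numerically for small $m$. The sketch below rests on assumptions about Construction 4.1, which is not restated in this section: it takes $f_v(x) = x + v$ over $F_2^m$ (bitwise XOR on integers) and $X_v = \{x : x \cdot v = 0\}$, with the zero vector using the all-ones vector in place of $v$, in line with the sets $Y_i$ and $Y_0$ of Theorem 4.3. The function names are hypothetical.

```python
from fractions import Fraction

def dot(x, v):
    """Inner product over F_2 of two m-bit vectors stored as integers."""
    return bin(x & v).count("1") % 2

def ratio_4_7(vectors, m):
    """Evaluate (4.7), assuming f_v(x) = x XOR v and X_v = {x : x . v = 0}."""
    n = 2 ** m
    k = len(vectors)

    def subset(v):
        w = v if v else n - 1          # zero vector: use the all-ones vector (assumption)
        return {x for x in range(n) if dot(x, w) == 0}

    total = 0
    for i, vi in enumerate(vectors):
        Xi = subset(vi)
        image_i = {x ^ vi for x in Xi}                 # f_i(X_i)
        for j, vj in enumerate(vectors):
            if j != i:
                total += len(image_i & {x ^ vj for x in Xi})   # |f_i(X_i) ∩ f_j(X_i)|
    return Fraction(1, 2) + Fraction(total, n * k * (k + 1))

standard = ratio_4_7([0b10, 0b01, 0b00], m=2)          # standard basis and zero vector
weight3 = ratio_4_7([v for v in range(32) if bin(v).count("1") == 3], m=5)
```

Under these assumptions the standard basis with the zero vector gives ratio exactly $1/2$, since all intersections in (4.7) are empty, whereas the ten weight-3 vectors of length 5 give a ratio strictly above $1/2$.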

Note that different orthogonal sets of permutations can generate equivalent codes, hence we define equivalence of two sets of orthogonal permutations as follows. Let $F = \{f_1, f_2, \dots, f_{k-1}, f_0\}$ be an orthogonal set of permutations over the integers $[0,p-1]$, associated with subsets $X_1, X_2, \dots, X_{k-1}, X_0$. Let $\Sigma = \{\sigma_1, \sigma_2, \dots, \sigma_{k-1}, \sigma_0\}$ be another orthogonal set over $[0,p-1]$, associated with subsets $Y_1, Y_2, \dots, Y_{k-1}, Y_0$. Then $F$ and $\Sigma$ are said to be equivalent if there exist permutations $g, h$ such that for all $i \in [0,k-1]$,

$$h f_i g = \sigma_i, \qquad g^{-1}(X_i) = Y_i.$$

Note that multiplying by $g$ on the right is the same as permuting the rows of the systematic nodes, and multiplying by $h$ on the left permutes the rows of the second parity node. Therefore, codes constructed using $F$ or $\Sigma$ are essentially the same.
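The two equivalence conditions can be checked mechanically once candidate permutations $g$ and $h$ are given. In the minimal sketch below, permutations are lists mapping index to image, the product $h f_i g$ is read as "apply $g$ first, then $f_i$, then $h$", and the helper names are hypothetical. The example builds $F$ from the standard-basis set $\Sigma$ for $m = 2$ using two arbitrarily chosen involutions.

```python
def compose(*perms):
    """Compose permutations given as lists: compose(h, f, g)[x] == h[f[g[x]]]."""
    def apply(x):
        for perm in reversed(perms):
            x = perm[x]
        return x
    return [apply(x) for x in range(len(perms[0]))]

def are_equivalent_via(F, X, Sigma, Y, g, h):
    """Check h f_i g = sigma_i and g^{-1}(X_i) = Y_i for every i."""
    g_inv = [0] * len(g)
    for x, y in enumerate(g):
        g_inv[y] = x
    return all(
        compose(h, f, g) == s and {g_inv[x] for x in Xi} == set(Yi)
        for f, Xi, s, Yi in zip(F, X, Sigma, Y)
    )

# Sigma for m = 2: x -> x XOR e for e = e_1, e_2 and the zero vector.
Sigma = [[x ^ e for x in range(4)] for e in (0b10, 0b01, 0b00)]
Y = [{0, 1}, {0, 2}, {0, 3}]

g = [1, 0, 2, 3]                   # involution, so g^{-1} = g
h = [0, 1, 3, 2]                   # involution, so h^{-1} = h
F = [compose(h, s, g) for s in Sigma]          # f_i = h^{-1} sigma_i g^{-1}
X = [{g[y] for y in Yi} for Yi in Y]           # X_i = g(Y_i)

equivalent = are_equivalent_via(F, X, Sigma, Y, g, h)
identity = list(range(4))
not_via_identity = are_equivalent_via(F, X, Sigma, Y, identity, identity)
```

The check only verifies a given pair $(g, h)$; deciding equivalence in general would require searching over all such pairs.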

In particular, let us assume that the permutations are over the integers $[0, 2^m-1]$, and the set of permutations $\Sigma$ and the subsets $Y_i$ are the same as in Theorem 4.3: $\sigma_i = f_{e_i}$, $Y_i = \{x \in [0, 2^m-1] : x \cdot e_i = 0\}$, and $Y_0 = \{x \in [0, 2^m-1] : x \cdot (1, 1, \dots, 1) = 0\}$. Next we show that the optimal code in Theorem 4.3 is optimal in size, namely, it has the maximum number of columns given the number of rows. In addition, any optimal-update, optimal-access code with maximum size is equivalent to the construction using standard-basis vectors.

Theorem 4.11 Let $F$ be an orthogonal set of permutations over the integers $[0, 2^m-1]$. Then
(i) the size of $F$ is at most $m+1$;
(ii) if $|F| = m+1$ then it is equivalent to $\Sigma$ defined by the standard basis and zero vector.

Proof: We prove the theorem by induction on $m$. For $m = 0$ there is nothing to prove.

(i) We first show that $|F| = k \le m+1$. It is trivial to see that for any permutations $g, h$ on $[0, 2^m-1]$, the set $hFg = \{h f_0 g, h f_1 g, \dots, h f_{k-1} g\}$ is also a set of orthogonal permutations with sets $g^{-1}(X_0), g^{-1}(X_1), \dots, g^{-1}(X_{k-1})$. Thus w.l.o.g. we can assume that $f_0$ is the identity permutation and $X_0 = [0, 2^{m-1}-1]$. From the orthogonality we get that

$$\cup_{i=1}^{k-1} f_i(X_0) = \overline{X_0} = [2^{m-1}, 2^m-1].$$

We claim that for any $i \neq 0$, $|X_i \cap X_0| = \frac{|X_0|}{2} = 2^{m-2}$. Assume the contrary. If $|X_i \cap X_0| > 2^{m-2}$, then for any distinct $i, j \in [1,k-1]$ we get that

$$f_j(X_i \cap X_0),\ f_i(X_i \cap X_0) \subseteq \overline{X_0}, \tag{4.8}$$

$$|f_j(X_i \cap X_0)| = |f_i(X_i \cap X_0)| > 2^{m-2} = \frac{|\overline{X_0}|}{2}. \tag{4.9}$$

From equations (4.8) and (4.9) we conclude that $f_j(X_i \cap X_0) \cap f_i(X_i \cap X_0) \neq \emptyset$, which contradicts the orthogonality property. If $|X_i \cap X_0| < 2^{m-2}$, the contradiction follows by a similar reasoning. Define the set of permutations $F^* = \{f_i^*\}_{i=1}^{k-1}$ over the set of integers $[0, 2^{m-1}-1]$ by $f_i^*(x) = f_i(x) - 2^{m-1}$, which is a set of orthogonal permutations with sets $X_i^* = X_i \cap X_0$, $i = 1, \dots, k-1$. By induction $k-1 \le m$ and the result follows.

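(ii) Next we show that if $|F| = m+1$ then it is equivalent to $\Sigma$ associated with $\{Y_i\}$. Let $F = \{f_1, f_2, \dots, f_m, f_0\}$. Take two permutations $g', h'$ such that $g'^{-1}(X_1) = Y_1$ and $h' f_1 g'(Y_1) = \overline{Y_1}$. Define $f_i' = h' f_i g'$ for all $i \in [0,m]$. Then

$$f_1'(Y_1) = \overline{Y_1}, \qquad f_1'(\overline{Y_1}) = Y_1.$$

The new set of permutations $\{f_i'\}_{i=0}^{m}$ is also orthogonal with subsets $\{g'^{-1}(X_i)\}_{i=0}^{m}$, so $f_i'(Y_1) \cap f_1'(Y_1) = \emptyset$. Hence for all $i \neq 1$,

$$f_i'(Y_1) = Y_1 = [0, 2^{m-1}-1],$$

$$f_i'(\overline{Y_1}) = \overline{Y_1} = [2^{m-1}, 2^m-1].$$

By an argument similar to that of part (i), we know that $\{f_2', \dots, f_m', f_0'\}$ restricted to $Y_1$ (or to $\overline{Y_1}$) is an orthogonal set of permutations, associated with subsets $Y_1 \cap g'^{-1}(X_i)$ (or with $\overline{Y_1} \cap g'^{-1}(X_i)$, respectively), $i \neq 1$. By the induction hypothesis, there exist permutations $p, q$ over $Y_1$ such that for $i \neq 1$

$$\sigma_i = p f_i' q, \qquad q^{-1}(Y_1 \cap g'^{-1}(X_i)) = Y_1 \cap Y_i, \tag{4.10}$$

where $\sigma_i, f_i'$ are restricted to $Y_1$. Similarly, there exist permutations $r, s$ over $\overline{Y_1}$ such that for $i \neq 1$

$$\sigma_i = r f_i' s, \qquad s^{-1}(\overline{Y_1} \cap g'^{-1}(X_i)) = \overline{Y_1} \cap Y_i, \tag{4.11}$$

where $\sigma_i, f_i'$ are restricted to $\overline{Y_1}$. Define the permutation $g''$ over $[0, 2^m-1]$ as the union of $q$ and $s$: $g''(x) = q(x)$ if $x \in Y_1$, and $g''(x) = s(x)$ if $x \in \overline{Y_1}$. Also define $h''$ over $[0, 2^m-1]$ as the union of $p$ and $r$. So $g'', h''$ map $Y_1$ (or $\overline{Y_1}$) to itself. We will show that $\{f_i\}_{i=0}^{m}$ is equivalent to $\Sigma$ using $g = g' g''$ and $h = h'' h'$. For $i \neq 1$, this is obvious from (4.10) and (4.11). For $i = 1$, we have

$$g^{-1}(X_1) = g''^{-1} g'^{-1}(X_1) = g''^{-1}(Y_1) = Y_1.$$

We know $\sigma_i = h f_i g$ for $i \neq 1$. Let $f = h f_1 g$; we will show $f = \sigma_1$. By orthogonality, $f(Y_i) \cap \sigma_i(Y_i) = \emptyset$ for $i \neq 1$. It is easy to see that for $i \in [2,m]$, $\sigma_i(Y_i) = \overline{Y_i}$. Hence for $i \in [2,m]$,

$$f(Y_i) = Y_i, \qquad f(\overline{Y_i}) = \overline{Y_i}. \tag{4.12}$$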

                standard basis    duplication of standard basis                    constant weight vectors
# sys. nodes    m+1               s(m+1)                                           O(m^c)
ratio           1/2               1/2 + (s-1)/(2s(m+1)+2) ≈ 1/2 + 1/(2(m+1))       1/2 + c^2/(2m)
field size      3                 s+2                                              2c+1

Figure 4.5: Comparison among codes constructed by the standard basis and zero vector, by $s$-duplication of the standard basis and zero vector, and by constant weight vectors. The number of systematic nodes, the rebuilding ratio, and the finite-field size are listed. We assume that all the codes have 2 parities and $2^m$ rows. For the duplication code, the rebuilding ratio is obtained when the number of copies $s$ is large. For the constant weight code, the weight of each vector is equal to $c$, which is an odd number and relatively small compared to $m$.

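Moreover, by construction $f(Y_1) = h'' f_1' g''(Y_1) = h'' f_1'(Y_1) = h''(\overline{Y_1}) = \overline{Y_1}$, so

$$f(Y_1) = \overline{Y_1}, \qquad f(\overline{Y_1}) = Y_1. \tag{4.13}$$

Any integer $x \in [0, 2^m-1]$ can be written as the intersection of $Y_i$ or $\overline{Y_i}$, for all $i \in [m]$, depending on its binary representation. For example, $x = 1$ means $\{x\} = \cap_{i=1}^{m-1} Y_i \cap \overline{Y_m}$. For another example, if $x = 0$ then $\{x\} = \cap_{i=1}^{m} Y_i$, and $f(\{0\}) = f(\cap_{i=1}^{m} Y_i) = \cap_{i=2}^{m} Y_i \cap \overline{Y_1} = \{2^{m-1}\}$ by (4.12), (4.13), and the fact that $f$ is a bijection. Thus $f(0) = 2^{m-1}$. By a similar argument, $f(x) = 2^{m-1} + x$ for all $x$, where the addition is taken over $F_2^m$, and hence $f = \sigma_1$.

Thus the proof is completed.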
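The final step of the proof relies on every integer being pinned down by its membership in the sets $Y_i$ or their complements. A minimal sketch of that fact follows, assuming that $e_1$ corresponds to the most significant bit (consistent with $Y_1 = [0, 2^{m-1}-1]$ above) and that $\sigma_1(x) = x + e_1$ over $F_2^m$; the function names are hypothetical.

```python
def Y(i, m):
    """Y_i = {x in [0, 2^m - 1] : bit i of x is 0}, with i = 1 the most significant bit."""
    mask = 1 << (m - i)
    return {x for x in range(2 ** m) if not x & mask}

def singleton_from_bits(x, m):
    """Intersect Y_i or its complement for i = 1..m, following the binary digits of x."""
    universe = set(range(2 ** m))
    result = set(universe)
    for i in range(1, m + 1):
        yi = Y(i, m)
        result &= yi if x in yi else universe - yi
    return result

m = 3
universe = set(range(2 ** m))
sigma1 = [x ^ (1 << (m - 1)) for x in range(2 ** m)]   # assumed form of sigma_1

# Image of {0}: stay inside Y_i for i >= 2, move to the complement of Y_1.
image_of_zero = universe - Y(1, m)
for i in range(2, m + 1):
    image_of_zero &= Y(i, m)
```

For $m = 3$ the intersection describing the image of $0$ is $\{4\} = \{2^{m-1}\}$, matching $f(0) = 2^{m-1}$ in the proof.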

Note that by similar reasoning we can show that if $|F| = m$, it is equivalent to $\{\sigma_1, \dots, \sigma_m\}$ defined by the standard basis. Part (ii) in the above theorem says that if we consider codes with optimal update, optimal access, and optimal size, then they are equivalent to the standard-basis construction. In this sense, Theorem 4.3 gives the unique code. Moreover, if we find the smallest finite field for one code (as in Construction 4.5), there does not exist a code using a smaller field.

Part (i) of the above theorem implies that the number of rows has to be exponential in the number of columns in any systematic code with optimal ratio and optimal update. Notice that the code in Theorem 4.3 achieves the maximum possible number of columns, $m+1$. An exponential number of rows can be practical in some storage systems, since they are composed of dozens of nodes (disks), each of which has a size on the order of gigabytes. However, a code may correspond to only a small portion of each disk, and we will need flexibility in the array size. The following example shows a code with a flatter array size at the cost of a small increase in the ratio.

Example 4.12 Let $T = \{v \in F_2^m : \|v\|_1 = 3\}$ be the set of vectors with weight 3 and length $m$. Notice that $|T| = \binom{m}{3}$. Construct the code $C$ by $T$ according to Construction 4.1. Given $v \in T$, $|\{u \in T : |v \setminus u| = 3\}| = \binom{m-3}{3}$, which is the number of vectors with 1's in different positions than $v$. Similarly, $|\{u \in T : |v \setminus u| = 2\}| = 3\binom{m-3}{2}$ and $|\{u \in T : |v \setminus u| = 1\}| = 3(m-3)$. By Theorem 4.10 and Lemma 4.4, for large $m$ the ratio is

$$\frac{1}{2} + \frac{2^{m-1} \binom{m}{3} \cdot 3\binom{m-3}{2}}{2^m \binom{m}{3} \left( \binom{m}{3} + 1 \right)} \approx \frac{1}{2} + \frac{9}{2m}.$$

Note that this code reaches the lower bound of the ratio as $m$ tends to infinity, and has $O(m^3)$ columns. More discussion on increasing the number of columns is presented in the next section.
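The approximation in Example 4.12 can be spelled out as a short worked computation. It uses only the counts stated in the example and keeps the leading powers of $m$ in numerator and denominator.

```latex
\begin{align*}
\frac{2^{m-1}\binom{m}{3}\cdot 3\binom{m-3}{2}}{2^{m}\binom{m}{3}\left(\binom{m}{3}+1\right)}
  &= \frac{3\binom{m-3}{2}}{2\left(\binom{m}{3}+1\right)}
   = \frac{\tfrac{3}{2}(m-3)(m-4)}{\tfrac{1}{3}m(m-1)(m-2)+2} \\
  &\approx \frac{\tfrac{3}{2}m^{2}}{\tfrac{1}{3}m^{3}}
   = \frac{9}{2m}
   \qquad (m \to \infty).
\end{align*}
```

The correction term therefore decays like $1/m$, which is why the ratio approaches the lower bound $1/2$ while the number of columns grows as $\binom{m}{3} = O(m^3)$.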
