
Compressed sensing and coding theory



by

Javier Hernán García Sánchez

Thesis advisors

Ph.D. Mauricio Velasco Gregory
Ph.D. Mauricio Junca Peláez

A thesis presented for the degree of Master in Mathematics
Departamento de Matemáticas
Facultad de Ciencias
Universidad de los Andes
Colombia, 2016


Acknowledgements

I thank God for His unconditional presence; my advisors, Mauricio Velasco and Mauricio Junca, for their academic and personal guidance and help; and my family for their constant support.

Also, I thank Tristram Bogart and Valérie Gauthier for agreeing to be part of the jury committee.


Contents

1 Compressed Sensing
  1.1 Definition of the problem
  1.2 Restricted Isometry Property
  1.3 Null Space Property
  1.4 Deterministic Construction of Compressed Sensing Matrices
    1.4.1 First construction
2 Coding Theory
  2.1 A brief introduction to coding theory
  2.2 Block linear codes
  2.3 Decoding in binary block linear codes
  2.4 All-zero codeword assumption
  2.5 Algebraic codes over curves
  2.6 Toric codes over surfaces
    2.6.1 Toric codes
3 Construction of Binary Compressed Sensing Matrices from Codes
  3.1 Description of the construction
    3.1.1 Second Construction: Sections of sheaves
    3.1.2 Third Construction: Matrices from Codes
  3.2 Computational experiments
    3.2.1 Recovering of deterministic matrices
    3.2.2 Sparsities of deterministic matrices vs. sparsities of random matrices
    3.2.3 Time deterministic vs. time random
    3.2.4 Deterministic vs. Random: time and sparsities
4 Parity Check Matrices and Measurement Matrices
  4.1 Relaxation to a cone
  4.2 Relation theorem


The problem that originated compressed sensing theory was the following [5]:

Let $f \in \mathbb{R}^l$ be a signal. We want to recover $f$ from the corrupted signal $z = Af + e$, where $A$ is a full rank $n \times l$ matrix with $n > l$ and $e \in \mathbb{R}^n$ is an unknown vector of errors. Note that if $e = \vec{0}$, then $f$ is easily recovered, because $A$ is full rank. So the problem makes sense when $e \ne \vec{0}$. Also, if the fraction of corrupted entries is too large, recovery will be impossible, so it is natural to assume that the number of nonzero entries of $e$ is small.

This problem is very similar to the error correcting problem in coding theory. In that setting, $A$ is the matrix whose columns are the codewords of a code, $f$ is called the plaintext and $Af$ the ciphertext. When the ciphertext is sent through a channel it is corrupted, and the received message is $z = Af + e$. We want to recover the original message $f$ from the received message $z$, which is equivalent to recovering the unknown error vector $e$.

In order to recover $e$, we find a matrix $M$ such that $MA = 0$. This gives
$$y = Mz = M(Af + e) = MAf + Me = Me.$$

So the problem turns into recovering $e$ knowing the product $y = Me$, which is the classical compressed sensing problem described in detail in Chapter 1. Note that this description completely forgets the relation with coding theory.

In [5], Emmanuel Candès and Terence Tao show that random matrices whose entries come from a Gaussian distribution are good choices for the matrix $M$, in the sense that they allow recovery of the error vector $e$. This conclusion is supported by many successful experiments suggesting, in fact, that these matrices exceed the theoretical quality bounds known for them.

But problems arise when the size of the signal increases: because Gaussian matrices are randomly generated, the probability of obtaining zero as an entry is zero. This implies that these matrices are highly non-sparse and, therefore, difficult to treat computationally.


So, in order to reduce the computational cost of the recovery process, we want sparse matrices $M$.

One of the first sparse compressed sensing matrices appears in [1]. In this paper DeVore constructs $\{0,1\}$-matrices with good performance in compressed sensing with respect to the Restricted Isometry Property. DeVore's constructions recall the construction of algebraic block linear codes. This motivates us to generalize one of these constructions, first to algebraic geometric codes over curves and then to general block linear codes, returning to the coding-theoretic origins of compressed sensing.

This document is organized as follows:

Chapter 1 is devoted to presenting the principal definitions and results about compressed sensing; the Restricted Isometry Property and the Null Space Property are presented there. We also show one of the DeVore constructions appearing in [1], specifically the one we generalize. Chapter 2 gives a brief review of some important concepts in coding theory. Special emphasis is placed on the decoding process for binary linear codes, in order to support Chapter 4, and on the construction of algebraic codes, which will be useful in the computational experiments of Chapter 3. That chapter describes our new constructions of compressed sensing matrices, which are the main contribution of this project, and presents some experiments that compare the behaviour of our matrices with that of the Gaussian matrices from [5]. Chapter 4 presents the results appearing in [10] that explicitly relate compressed sensing and binary coding theory. Finally, Chapter 5 shows some of the remaining and future work.


Chapter 1

Compressed Sensing

In this chapter we give a general overview of compressed sensing theory. In Section 1 the compressed sensing signal recovery problem is explained. Section 2 is devoted entirely to explaining the Restricted Isometry Property [2], one of the few existing tools for the analysis of compressed sensing measurement matrices. Section 3 explains the Null Space Property, a geometric viewpoint of compressed sensing that gives necessary and sufficient conditions for exact recovery. Finally, Section 4 exhibits one of the constructions of deterministic compressed sensing matrices appearing in [1].

1.1 Definition of the problem

Given a signal represented by a sparse vector $x \in \mathbb{R}^N$, we want to recover $x$ by taking $n < N$ measurements. Each measurement is represented by a linear functional $\lambda_i : \mathbb{R}^N \to \mathbb{R}$. More precisely, the measurements can be represented by a matrix $\Phi$ of dimensions $n \times N$, where

$$\Phi = \begin{pmatrix} \text{--} & \lambda_1 & \text{--} \\ & \vdots & \\ \text{--} & \lambda_n & \text{--} \end{pmatrix},$$
and our objective is to recover the vector $x$ from knowing the vector of measurements $y := \Phi x$. The mechanism for recovery is encoded in a function $\Delta : \mathbb{R}^n \to \mathbb{R}^N$ such that the error
$$\|x - \Delta(y)\|_X = \|x - \Delta(\Phi x)\|_X$$
is small on a set of vectors of interest, where $\|\cdot\|_X$ is some chosen norm.

These functions $\Delta$ are called decoders; they are generally non-linear functions. For a vector $x \in \mathbb{R}^N$, let $\mathrm{supp}(x)$ be the set of components $j$ where $x_j$ is nonzero. For an integer $k$, consider the set $\Sigma_k = \{x \in \mathbb{R}^N \mid |\mathrm{supp}(x)| \le k\}$, called the set of $k$-sparse vectors in $\mathbb{R}^N$. If $x$ is a $k$-sparse vector, then a natural decoder for $x$ is to pick a minimizer of Problem 1.1.1:

$$\min_{x' \in \mathbb{R}^N} \|x'\|_0 \quad \text{s.t.} \quad \Phi x' = y, \tag{1.1.1}$$
where $\|x'\|_0 := |\mathrm{supp}(x')|$.

Proposition 1.1.1. Problem 1.1.1 is in general NP-hard.

Proof. The proof described here is essentially the same as in [12]. The idea is to reduce the 3-set exact cover problem, which is NP-hard, to an instance of Problem 1.1.1.

3-Set Exact Cover Problem

• Instance: a set $S$ and a collection $C$ of 3-element subsets of $S$.

• Problem: find an exact cover of $S$ from $C$; more precisely, we want to find sets $c_1, \ldots, c_k \in C$ such that $\bigcup_{i=1}^{k} c_i = S$ and $c_i \cap c_j = \emptyset$ for $i \ne j$.

This problem is usually denoted X3C. For given $S = \{s_1, \ldots, s_m\}$ and $C = \{c_1, \ldots, c_n\}$, we construct a vector $v_{c_i}$ in the following way:
$$(v_{c_i})_j = \begin{cases} 1 & \text{if } s_j \in c_i \\ 0 & \text{otherwise.} \end{cases}$$

Let $A$ be the matrix whose set of columns is $\{v_{c_i}\}_{i=1}^{n}$ and let $b = [1, \ldots, 1]$ be a vector of $m$ 1's. The problem we associate to X3C is Problem 1.1.2:
$$\min \|x\|_0 \quad \text{s.t.} \quad Ax = b. \tag{1.1.2}$$
We will see that this problem has an at most $m/3$-sparse feasible solution if and only if X3C has a solution.

First assume that X3C has a solution $B = \{c_{i_1}, \ldots, c_{i_l}\}$. This implies that $m$ is a multiple of 3, since otherwise $S$ cannot be covered exactly. Let $x'$ be the vector
$$x'_i = \begin{cases} 1 & \text{if } c_i \in B \\ 0 & \text{otherwise.} \end{cases}$$

Because $c_i \cap c_j = \emptyset$, $Ax' = b$. Also, $l$ must equal $m/3$, so $x'$ is $m/3$-sparse.

Now suppose that Problem 1.1.2 has a feasible solution $x'$ which is at most $m/3$-sparse. Because each column of $A$ has exactly 3 nonzero entries and $Ax' = b$, $x'$ must have at least $m/3$ nonzero entries. So $x'$ is exactly $m/3$-sparse, by our assumption. Consider $B = \{c_{i_1}, \ldots, c_{i_{m/3}}\}$, where the sets considered are those corresponding to the nonzero entries of $x'$. If $a_j$ is the $j$-th row of $A$, we have $a_j x' = 1$. This implies that $s_j \in c_{i_l}$ for some $l$. So $S \subseteq \bigcup_{r=1}^{m/3} c_{i_r}$. But there are $m/3$ sets and each one has 3 elements, so $B$ must cover $S$ exactly.

Let $D^i_a(u) := \{v \in \mathbb{R}^N \mid \|v - u\|_i \le a\}$ be the disc of radius $a$ in the norm $\ell_i$ centred at $u$. A feasible solution $x'$ of Problem 1.1.1 with $x' \in \Sigma_k$ is obtained when the affine space $\Phi x = y$ intersects $D^0_k(\vec{0})$. Note that $D^1_1(\vec{0})$ contains all $1$-sparse vectors of length 1, which are precisely the vectors $\{\pm e_1, \ldots, \pm e_N\}$. Moreover, we have
$$D^1_1(\vec{0}) = \mathrm{conv}\{\pm e_1, \ldots, \pm e_N\}.$$
This disc is the so-called cross polytope in $\mathbb{R}^N$ and contains all $k$-sparse vectors of $\ell_1$ norm at most 1, for any sparsity $k$.

In general, $D^1_a(\vec{0})$ contains all $k$-sparse vectors of $\ell_1$ norm at most $a$, for any $k$. Also, $a\,D^1_1(\vec{0}) = D^1_a(\vec{0})$. So $x'$ belongs to the intersection of the dilation $b\,D^1_1(\vec{0})$, where $b = \|x'\|_1$, and the affine space $\Phi x = y$. Therefore $x'$ is also a feasible solution of Problem 1.1.3:
$$\min_{x' \in \mathbb{R}^N} \|x'\|_1 \quad \text{s.t.} \quad \Phi x' = y. \tag{1.1.3}$$
So this problem is a relaxation of Problem 1.1.1.

The advantage of this relaxation is that it can be formulated as the following equivalent linear program:
$$\min_{(x', s) \in \mathbb{R}^{2N}} \langle \vec{1}, s \rangle \quad \text{s.t.} \quad -s \le x' \le s, \quad \Phi x' = y,$$
where $\vec{1} = (1, 1, \ldots, 1)$ and $x' \le s$ means that $x'_i \le s_i$ for all $i$.
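As a sketch (not from the thesis; the function name is ours), this linear program can be handed directly to an off-the-shelf LP solver such as `scipy.optimize.linprog`:

```python
import numpy as np
from scipy.optimize import linprog

def basis_pursuit(Phi, y):
    """Solve min ||x||_1 s.t. Phi x = y via the LP above, in variables (x, s)."""
    n, N = Phi.shape
    c = np.concatenate([np.zeros(N), np.ones(N)])        # objective <1, s>
    # -s <= x <= s, written as  x - s <= 0  and  -x - s <= 0
    A_ub = np.block([[np.eye(N), -np.eye(N)],
                     [-np.eye(N), -np.eye(N)]])
    A_eq = np.hstack([Phi, np.zeros((n, N))])            # Phi x = y
    res = linprog(c, A_ub=A_ub, b_ub=np.zeros(2 * N), A_eq=A_eq, b_eq=y,
                  bounds=[(None, None)] * N + [(0, None)] * N)
    return res.x[:N]

rng = np.random.default_rng(0)
Phi = rng.standard_normal((20, 50))
x_true = np.zeros(50)
x_true[[3, 17, 41]] = [1.5, -2.0, 0.7]                   # a 3-sparse signal
x_hat = basis_pursuit(Phi, Phi @ x_true)
```

For a Gaussian $\Phi$ of this size, the $\ell_1$ minimizer typically coincides with the sparse signal, in line with the RIP results of the next section.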

1.2 Restricted Isometry Property

We now want to know when Problem 1.1.3 has a unique solution and when this solution coincides with $x$. This is clearly desirable, because the relaxed problem is much easier to solve. Candès establishes in [2] a sufficient condition on the measurement matrix $\Phi$. To describe it we need two basic definitions, both appearing in [3].


Definition 1. Let $k$ be a non-negative integer. The linear operator $\Phi : \mathbb{R}^N \to \mathbb{R}^n$ satisfies the Restricted Isometry Property (RIP) of order $k$ with isometry constant $\delta_k$ if there is a constant $0 \le \delta_k < 1$ satisfying
$$(1 - \delta_k)\|z\|_2^2 \le \|\Phi z\|_2^2 \le (1 + \delta_k)\|z\|_2^2, \tag{1.2.1}$$
for all $z \in \Sigma_k$.

Lemma 1.2.1. Consider $S \subseteq [N]$ with $|S| = k$, and let $\Phi_S$ be the matrix whose columns are the columns of $\Phi$ labelled by the elements of $S$. The following are equivalent:

1. $\Phi$ satisfies RIP of order $k$ with isometry constant $\delta_k$.
2. Every eigenvalue $\lambda$ of the matrix $A_S := \Phi_S^t \Phi_S$ satisfies $1 - \delta_k \le \lambda \le 1 + \delta_k$.

Proof. The matrix $A_S = \Phi_S^t \Phi_S$ is symmetric and hence orthogonally diagonalizable. Let $\{x_1, \ldots, x_k\}$ be an orthogonal basis of eigenvectors of $A_S$. If 2 is assumed, we have
$$(1 - \delta)\, x_i^t x_i \le x_i^t A_S x_i \le (1 + \delta)\, x_i^t x_i,$$
$$(1 - \delta)\, x_i^t x_i \le x_i^t \Phi_S^t \Phi_S x_i \le (1 + \delta)\, x_i^t x_i,$$
$$(1 - \delta)\|x_i\|_2^2 \le \|\Phi_S x_i\|_2^2 \le (1 + \delta)\|x_i\|_2^2.$$
Because any $x' \in \mathbb{R}^k$ satisfies $x' = \sum_{i=1}^{k} a_i x_i$, we also have
$$(1 - \delta)\|x'\|_2^2 \le \|\Phi_S x'\|_2^2 \le (1 + \delta)\|x'\|_2^2.$$
Now, the product $\Phi_S x'$ is equivalent to $\Phi x$ where $x \in \Sigma_k$ and $\mathrm{supp}(x) = S$. This implies that $\Phi$ satisfies RIP of order $k$ with isometry constant $\delta$.

On the other hand, if $x$ is $k$-sparse, then $\Phi x = \Phi_S x_S$, where $x_S \in \mathbb{R}^k$. Under these considerations the proof that condition 1 implies condition 2 follows similarly to the last proof.

Definition 2. We say that $x_k \in \Sigma_k$ is a best $k$-approximation of $x$ if $x_k$ is a minimizer of the following problem:
$$\min_{x' \in \Sigma_k} \|x - x'\|_1.$$

Theorem 1.2.2 ([2]). If the matrix $\Phi$ satisfies
$$(1 - \delta_{2k})\|z\|_2^2 \le \|\Phi z\|_2^2 \le (1 + \delta_{2k})\|z\|_2^2$$
for any $z \in \Sigma_{2k}$ and some $\delta_{2k} < \sqrt{2} - 1$, then, if $x^*$ is the solution of 1.1.3, the following inequality is satisfied:
$$\|x - x^*\|_1 \le C_0 \|x - x_k\|_1,$$
for some positive constant $C_0$ and $x_k$ a best $k$-sparse approximation of $x$. In particular, if $x$ is $k$-sparse, the recovery is exact.


We can also consider the case when the measurements of $x$ are contaminated with noise, by which we mean $y = \Phi x + z$, where $z$ is the added noise. If we assume that $z$ satisfies $\|z\|_2 \le \epsilon$, we can try to solve Problem 1.2.2 in order to estimate the measured signal $x$:
$$\min_{x' \in \mathbb{R}^N} \|x'\|_1 \quad \text{s.t.} \quad \|y - \Phi x'\|_2 \le \epsilon. \tag{1.2.2}$$

In this situation, we have the following theorem:

Theorem 1.2.3 ([2]). In the situation of Problem 1.2.2, if $x^*$ is the solution of that problem, then
$$\|x^* - x\|_2 \le C_0 k^{-1/2}\|x - x_k\|_1 + C_1 \epsilon, \tag{1.2.3}$$
where $C_0, C_1$ are constants given explicitly below.

In the following, we explain the proof of Theorems 1.2.2 and 1.2.3. Note that any $x^*$ is of the form $x^* = x + h$ (in the noiseless case, $h \in \ker(\Phi)$). For a set $S \subseteq [N]$ and a vector $u \in \mathbb{R}^N$, we denote by $u_S \in \mathbb{R}^N$ the vector with entries $(u_S)_i$ satisfying
$$(u_S)_i := \begin{cases} u_i & \text{if } i \in S \\ 0 & \text{otherwise.} \end{cases}$$
We construct a collection of sets $T_i$ inductively in the following way:

1. Consider $T_0$ as the set of the $k$ largest entries of $x$.
2. Let $T_i$ be the set of the $k$ largest coefficients of $h_S$, where $S = \big(\bigcup_{j=0}^{i-1} T_j\big)^c$.

The proof of the theorems proceeds in two steps. The first consists in bounding the quantity $\|h_{(T_0 \cup T_1)^c}\|_2$ using $h_{T_0 \cup T_1}$. The second shows that $\|h_{T_0 \cup T_1}\|_2$ is small enough. For that we need the following results:

Lemma 1.2.4. The following statement holds:
$$|\langle \Phi x, \Phi x' \rangle| \le \delta_{s+s'} \|x\|_2 \|x'\|_2,$$
for any nonzero vectors $x, x'$ with disjoint supports such that $\#\,\mathrm{supp}(x) \le s$ and $\#\,\mathrm{supp}(x') \le s'$.

Proof. Suppose first that $x, x'$ are unit vectors. Then $x + x', x - x' \in \Sigma_{s+s'}$ and, because the supports are disjoint, $\|x \pm x'\|_2^2 = 2$. Hence
$$2(1 - \delta_{s+s'}) \le \|\Phi(x \pm x')\|_2^2 \le 2(1 + \delta_{s+s'}).$$
Now,
$$\|\Phi(x \pm x')\|_2^2 = \|\Phi x\|_2^2 \pm 2\langle \Phi x, \Phi x' \rangle + \|\Phi x'\|_2^2.$$
So,
$$4\,|\langle \Phi x, \Phi x' \rangle| = \big|\, \|\Phi(x + x')\|_2^2 - \|\Phi(x - x')\|_2^2 \,\big| \le |2(1 + \delta_{s+s'}) - 2(1 - \delta_{s+s'})| = 4\delta_{s+s'}$$
and
$$|\langle \Phi x, \Phi x' \rangle| \le \delta_{s+s'}.$$
Now, for nonzero $x, x'$ not necessarily unitary, $\frac{x}{\|x\|_2}$ and $\frac{x'}{\|x'\|_2}$ are unit vectors, and the last inequality becomes
$$|\langle \Phi x, \Phi x' \rangle| \le \delta_{s+s'} \|x\|_2 \|x'\|_2. \qquad \square$$

Lemma 1.2.5.

$$\|h_{T_0^c}\|_1 \le \|h_{T_0}\|_1 + 2\|x_{T_0^c}\|_1 \tag{1.2.4}$$

Proof. Note that, for $j \ge 2$, the following holds:
$$\|h_{T_j}\|_2 \le k^{1/2}\|h_{T_j}\|_\infty \le k^{-1/2}\|h_{T_{j-1}}\|_1.$$
The last inequality follows because $k\,\|h_{T_j}\|_\infty \le \|h_{T_{j-1}}\|_1$. Then
$$\sum_{j \ge 2} \|h_{T_j}\|_2 \le k^{-1/2} \sum_{j \ge 1} \|h_{T_j}\|_1 \le k^{-1/2}\|h_{T_0^c}\|_1. \tag{1.2.5}$$
Because $\|h_{(T_0 \cup T_1)^c}\|_2 \le \sum_{j \ge 2} \|h_{T_j}\|_2$, then
$$\|h_{(T_0 \cup T_1)^c}\|_2 \le k^{-1/2}\|h_{T_0^c}\|_1. \tag{1.2.6}$$
Now we want to bound $\|h_{T_0^c}\|_1$. We obtain the following:
$$\|x\|_1 \ge \|x^*\|_1 = \|x + h\|_1 = \sum_{i \in T_0} |x_i + h_i| + \sum_{i \in T_0^c} |x_i + h_i| \ge \|x_{T_0}\|_1 - \|h_{T_0}\|_1 + \|h_{T_0^c}\|_1 - \|x_{T_0^c}\|_1.$$
So, because $\|x\|_1 - \|x_{T_0}\|_1 = \|x_{T_0^c}\|_1$, we get
$$\|h_{T_0^c}\|_1 \le \|h_{T_0}\|_1 + 2\|x_{T_0^c}\|_1. \qquad \square$$

From Lemma 1.2.5 and Equation 1.2.6 we obtain
$$|\langle \Phi h_{T_0 \cup T_1}, \Phi h \rangle| \le \|\Phi h_{T_0 \cup T_1}\|_2 \|\Phi h\|_2 \le 2\epsilon\sqrt{1 + \delta_{2k}}\;\|h_{T_0 \cup T_1}\|_2 \tag{1.2.7}$$
and also
$$(1 - \delta_{2k})\|h_{T_0 \cup T_1}\|_2^2 \le \|\Phi h_{T_0 \cup T_1}\|_2^2 \le \|h_{T_0 \cup T_1}\|_2 \Big( 2\epsilon\sqrt{1 + \delta_{2k}} + \sqrt{2}\,\delta_{2k} \sum_{j \ge 2} \|h_{T_j}\|_2 \Big). \tag{1.2.8}$$
Equation 1.2.5 together with Equation 1.2.8 gives us
$$\|h_{T_0 \cup T_1}\|_2 \le \alpha\epsilon + \rho\, k^{-1/2}\|h_{T_0^c}\|_1, \tag{1.2.9}$$
and, since Lemma 1.2.5 implies $k^{-1/2}\|h_{T_0^c}\|_1 \le \|h_{T_0 \cup T_1}\|_2 + 2e_0$, therefore
$$\|h_{T_0 \cup T_1}\|_2 \le (1 - \rho)^{-1}(\alpha\epsilon + 2\rho e_0), \tag{1.2.10}$$
where $\alpha = \frac{2\sqrt{1 + \delta_{2k}}}{1 - \delta_{2k}}$, $\rho = \frac{\sqrt{2}\,\delta_{2k}}{1 - \delta_{2k}}$ and $e_0 = k^{-1/2}\|x - x_k\|_1$. From this we obtain
$$\|h\|_2 \le \|h_{T_0 \cup T_1}\|_2 + \|h_{(T_0 \cup T_1)^c}\|_2 \le 2\|h_{T_0 \cup T_1}\|_2 + 2e_0 \le 2(1 - \rho)^{-1}\big(\alpha\epsilon + (1 + \rho)e_0\big).$$
So we conclude
$$\|x^* - x\|_2 = \|h\|_2 \le 2(1 - \rho)^{-1}\big(\alpha\epsilon + (1 + \rho)k^{-1/2}\|x - x_k\|_1\big). \tag{1.2.11}$$
This proves Equation 1.2.3. To prove Theorem 1.2.2, just take $\epsilon = 0$.

Remark 1.2.6. The Restricted Isometry Property is one of the very few known conditions for testing whether a given measurement matrix allows reliable compressed sensing of noisy data.

1.3 Null Space Property

Theorem 1.2.2 gives us a way to check how well the decoder in Problem 1.1.3 performs with respect to the best $k$-sparse approximation of a signal $x$. In particular, it gives a sufficient condition for exact recovery of $k$-sparse signals.

Now we focus on finding necessary conditions for exact recovery, via Problem 1.1.3, when our signal $x$ is $k$-sparse. As the following result shows, the main role is played by the null space of $\Phi$.

Lemma 1.3.1 ([4]). If $w \in \mathbb{R}^N$ is any vector, let
$$F_w := \{z \in \mathbb{R}^N \mid \Phi z = \Phi w\}$$
and
$$B_w := \{z \in \mathbb{R}^N \mid \|z\|_1 \le \|w\|_1\}.$$
Then $w$ is the unique solution of Problem 1.1.3 if and only if $B_w \cap F_w = \{w\}$.

Proof. Note that any feasible solution of Problem 1.1.3 is of the form $x + h$, where $h \in \ker(\Phi)$; thus $F_x$ is the set of feasible solutions of this problem. Recall that the cost of a vector $z$ in Problem 1.1.3 is $\|z\|_1$, so $B_x$ corresponds to the vectors with cost less than or equal to the cost of $x$.

Suppose first that $B_x \cap F_x = \{x\}$. This implies that there is no feasible vector $z \ne x$ whose cost is at most that of $x$, so $x$ is the unique solution.

Now suppose that $x$ is the unique solution of 1.1.3, and suppose also that there exists $z \ne x$ such that $z \in B_x \cap F_x$. Because $z$ is feasible and $\|z\|_1 \le \|x\|_1$, $x$ is not the unique solution of 1.1.3, a contradiction.

The following definitions, together with Lemma 1.3.1, help us determine when a $k$-sparse signal $x$ can be recovered using Problem 1.1.3.

Definition 3. $\Phi$ satisfies the Exact Recovery Property (ERP) over the set $S \subseteq [N]$, $\#S = k$, if $x$ is the unique solution of Problem 1.1.3 for any $x$ supported in $S$.

Definition 4. $\Phi$ satisfies the Null Space Property (NSP) over the set $S \subseteq [N]$, $\#S = k$, if all $h \in \ker(\Phi) \setminus \{\vec{0}\}$ satisfy the inequality
$$\|h_S\|_1 < \|h_{S^c}\|_1.$$

Theorem 1.3.2 ([4]). $\Phi$ satisfies NSP if and only if it satisfies ERP.

Proof. We will prove the contrapositive of both directions. Suppose first that $\Phi$ does not satisfy ERP. Then, for some $x$ supported in $S$, there exists $z \ne x$ that minimizes Problem 1.1.3. Because $z$ is feasible, it satisfies $\Phi z = \Phi x$ and $\|z\|_1 \le \|x\|_1$. This implies that $0 \ne h = x - z \in \ker(\Phi)$.

Now, because $x_{S^c}$ is zero, we have:
$$\|h_{S^c}\|_1 = \|z_{S^c}\|_1 = \|z_{S^c}\|_1 + \|z_S + (x - z_S)\|_1 - \|x\|_1 \le \|z_{S^c}\|_1 + \|z_S\|_1 + \|x - z_S\|_1 - \|x\|_1 = \|z\|_1 + \|h_S\|_1 - \|x\|_1 \le \|h_S\|_1.$$

Now assume that $\Phi$ does not satisfy NSP. Then there exists $h \ne 0$ such that $\|h_{S^c}\|_1 \le \|h_S\|_1$. Consider $x = h_S$. We have that $z = -h_{S^c}$ satisfies $\Phi x = \Phi z$ and $\|z\|_1 \le \|x\|_1$. So $x$ is not the unique solution of 1.1.3.
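When $\ker(\Phi)$ is one-dimensional, the NSP of Definition 4 (over all supports of size $k$) can be checked directly, since every kernel vector is a scalar multiple of a single basis vector. A sketch, with our own helper name:

```python
import itertools
import numpy as np
from scipy.linalg import null_space

def has_nsp(Phi, k, tol=1e-9):
    """Check the NSP of order k for a matrix whose kernel is one-dimensional.
    The inequality ||h_S||_1 < ||h_{S^c}||_1 is invariant under scaling of h,
    so testing one kernel basis vector suffices."""
    ker = null_space(Phi)
    assert ker.shape[1] == 1, "this simple check assumes a 1-dimensional kernel"
    h = np.abs(ker[:, 0])
    for S in itertools.combinations(range(len(h)), k):
        hS = h[list(S)].sum()
        if hS >= h.sum() - hS - tol:   # ||h_S||_1 not strictly smaller
            return False
    return True

# kernel spanned by (1, 1, 1, 1): NSP holds for k = 1 (1 < 3) but fails for k = 2
Phi = np.array([[1., 0., 0., -1.],
                [0., 1., 0., -1.],
                [0., 0., 1., -1.]])
```

By Theorem 1.3.2, this matrix recovers every 1-sparse signal exactly via Problem 1.1.3, but not every 2-sparse one.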


1.4 Deterministic Construction of Compressed Sensing Matrices

The most common source of measurement matrices $\Phi$ with good recovery properties is random matrix theory. It was proven in [5] that matrices with standard Gaussian entries satisfy good RIP bounds, and such matrices offer good results in practice.

However, deterministic constructions are desirable in order to produce 100% reliable results and to decrease storage costs. In this section we review an interesting deterministic construction, based on polynomials over finite fields, due to DeVore [1]. The main output will be a $\{0,1\}$-matrix satisfying RIP for some known $\delta$ and $k$.

The proof of Theorem 1.4.4 shown in this section differs from the one in [1]: we use Theorem 1.4.3 (the Gershgorin circle theorem), which simplifies the original proof.

1.4.1 First construction

Let $F := \mathbb{F}_q$ be the field of size $q = p^l$, for a prime number $p$. Order $F \times F$ lexicographically. Let $P_r := \{s \in F[x] \mid \deg(s) \le r\}$, also ordered lexicographically. For $s \in P_r$ define the vector $v_s : F \times F \to \{0,1\}$ as
$$v_s(x, y) = \begin{cases} 1 & \text{if } y = s(x) \\ 0 & \text{otherwise.} \end{cases}$$
If $P_r = \{s_1, \ldots, s_m\}$, consider the following matrix:

$$\Phi_0 := \begin{pmatrix} | & & | \\ v_{s_1} & \cdots & v_{s_m} \\ | & & | \end{pmatrix}.$$

Lemma 1.4.1. For $s, t \in P_r$ with $t \ne s$, the following inequality holds:
$$v_s \cdot v_t \le r.$$

Proof. Let $u = t - s$. This is a nonzero polynomial in $P_r$, so it has at most $r$ roots. As a consequence, $s$ and $t$ agree on at most $r$ points of $F$. Therefore $v_s$ and $v_t$ share at most $r$ entries different from zero. Because the nonzero entries are all equal to 1, $v_s \cdot v_t \le r$.

Example 1.4.2. Consider $F = \mathbb{F}_2$ and $r = 1$. $F \times F$ ordered lexicographically gives the labels $[(0,0), (0,1), (1,0), (1,1)]$. So
$$v_0 = \begin{pmatrix}1\\0\\1\\0\end{pmatrix}, \quad v_1 = \begin{pmatrix}0\\1\\0\\1\end{pmatrix}, \quad v_x = \begin{pmatrix}1\\0\\0\\1\end{pmatrix}, \quad v_{x+1} = \begin{pmatrix}0\\1\\1\\0\end{pmatrix}.$$
On the other hand, the lexicographic order applied to $P_1$ gives the ordered list $[0, 1, x, x+1]$.

Therefore, we obtain the matrix
$$\Phi_0 = \begin{pmatrix} 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \\ 1 & 0 & 0 & 1 \\ 0 & 1 & 1 & 0 \end{pmatrix}.$$
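The construction is easy to implement in the prime case $q = p$; a sketch (the function name is ours):

```python
import itertools
import numpy as np

def devore_matrix(p, r):
    """{0,1} matrix Phi_0 of the first construction over F_p (p prime):
    one row per point (x, y) of F_p x F_p in lexicographic order, one column
    per polynomial of degree <= r; the entry is 1 iff y = s(x) mod p."""
    points = list(itertools.product(range(p), repeat=2))
    polys = list(itertools.product(range(p), repeat=r + 1))  # coefficient tuples
    Phi0 = np.zeros((p * p, p ** (r + 1)), dtype=int)
    for col, coeffs in enumerate(polys):
        for row, (x, y) in enumerate(points):
            # evaluate s(x) = sum_i coeffs[i] * x**i mod p
            if sum(c * x ** i for i, c in enumerate(coeffs)) % p == y:
                Phi0[row, col] = 1
    return Phi0

Phi0 = devore_matrix(3, 1)   # a 9 x 9 matrix; each column has exactly p = 3 ones
```

The Gram matrix $\Phi_0^t \Phi_0$ then has diagonal entries $p$ and, by Lemma 1.4.1, off-diagonal entries at most $r$, which is exactly what the RIP argument of Theorem 1.4.4 uses.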

Theorem 1.4.3 (Gershgorin circle theorem). Let $A = (a_{ij})_{i,j}$ be a $t \times t$ matrix. Consider $R_j = \sum_{i \ne j} |a_{ij}|$ and $D_j = \{x \mid |x - a_{jj}| \le R_j\}$, for all $j$. Then, if $\lambda$ is an eigenvalue of $A$, $\lambda$ lies within at least one $D_j$.

Proof. Suppose that $\lambda$ is an eigenvalue of $A$ and $x$ is a corresponding eigenvector. Let $i \in [t]$ be such that $|x_i| = \max_j |x_j|$. Because $x$ is an eigenvector, $|x_i| > 0$. Now,
$$\lambda x_i = \sum_j a_{ij} x_j.$$
Rearranging, we obtain
$$(\lambda - a_{ii})\, x_i = \sum_{j \ne i} a_{ij} x_j.$$
Dividing by $x_i$ and noting that $\big|\frac{x_j}{x_i}\big| \le 1$, we have
$$|\lambda - a_{ii}| = \Big| \sum_{j \ne i} a_{ij} \frac{x_j}{x_i} \Big| \le \sum_{j \ne i} |a_{ij}| \Big|\frac{x_j}{x_i}\Big| \le \sum_{j \ne i} |a_{ij}| = R_i.$$
Therefore $\lambda \in D_i$.
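The theorem is easy to sanity-check numerically (a sketch of our own, using the row-sum version that appears in the proof):

```python
import numpy as np

def in_gershgorin_discs(A, tol=1e-12):
    """True iff every eigenvalue of A lies in at least one disc
    D_i = {z : |z - a_ii| <= R_i}, with R_i the off-diagonal row sum of A."""
    radii = np.abs(A).sum(axis=1) - np.abs(np.diag(A))
    centers = np.diag(A)
    return all(np.any(np.abs(lam - centers) <= radii + tol)
               for lam in np.linalg.eigvals(A))
```

By the theorem, this returns `True` for any square matrix.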

Theorem 1.4.4. Consider the matrix $\Phi = \frac{1}{\sqrt{p}}\,\Phi_0$. Then $\Phi$ satisfies RIP for $k < \frac{p}{r} + 1$ and $\delta = \frac{(k-1)r}{p}$.


Proof. Let $T \subseteq [N]$ with $|T| = k$ and let $A_T = \Phi_T^t \Phi_T$. Each diagonal entry of $A_T$ equals 1, and the off-diagonal entry corresponding to $s, t \in P_r$ is $\frac{1}{p}\, v_s \cdot v_t$, where $v_s \cdot v_t$ equals the number of $x \in F$ such that $s(x) = t(x)$. Hence, by Lemma 1.4.1, every off-diagonal entry of $A_T$ is at most $\frac{r}{p}$. By the Gershgorin circle theorem, every eigenvalue of $A_T$ is contained in some circle with center 1 and radius $\sum_{j \ne i} |(A_T)_{ij}| \le \frac{r}{p}(k-1)$, and thus every eigenvalue $\lambda$ of $A_T$ satisfies
$$1 - \delta \le \lambda \le 1 + \delta$$
for $\delta := \frac{r(k-1)}{p}$. So, by Lemma 1.2.1, $\Phi$ satisfies RIP with isometry constant $\delta$. Since we need $\delta < 1$, we obtain the bound $k < \frac{p}{r} + 1$.


Chapter 2

Coding Theory

The goals of this chapter are to describe a decoding process for binary linear codes and to show some examples of linear codes. In Section 1 a brief introduction to the basics of coding theory is given; the elements and structure of a communication system are described. Section 2 treats block linear codes; in this section we also introduce the important concepts of Hamming distance and minimal distance, and establish some results about them. Sections 3 and 4 study a decoding process in the particular case of binary linear codes: Section 3 describes the decoding process when a codeword is sent through a binary symmetric channel, and Section 4 proves the validity of the all-zero codeword assumption, which will be important for us in Chapter 4. At the end, Sections 5 and 6 show the examples of codes that will be used in Chapter 3: Section 5 explains the construction of algebraic codes over curves, while Section 6 presents the construction of toric codes over surfaces.

2.1 A brief introduction to coding theory

This section is based on Chapters 1 and 2 of [9]. Suppose that you want to send a message through a noisy channel which will corrupt the original message. If our message is written over an alphabet $A$, we can take each piece of the message, say for example each word, and encode it in such a way that we reduce the damage caused by the channel. For example, suppose that the alphabet $A$ is $\{0,1\}$. A way to transmit 1 is to map it to the word 11111; likewise, to transmit 0 we map it to the word 00000. If the receiver chooses as the message the most frequent bit (for instance, if the word 11001 is received, the receiver decides that 1 was sent), it is possible to correct up to 2 errors caused by the channel. But there is a problem with this model: to correct more errors, you need to send more bits, which can make the process computationally very expensive.


Coding theory tries to find codes that are more compact than the above scheme and allow faithful transmission of information through a noisy channel. In general, a communication system consists of the following elements: a message alphabet $A$, an input alphabet $X$, an encoder, a channel, an output alphabet $Y$ and a decoder.

Definition 5 (Encoder). An encoder is an algorithm that transforms the symbols of the message alphabet $A$ into sequences of symbols of the input alphabet $X$.

Definition 6 (Channel). A channel $(X, Y, P)$ consists of an input alphabet $X$, an output alphabet $Y$ and, for each pair $(x, y) \in X \times Y$, a conditional probability $P(y \mid x)$: the probability of obtaining $y$ given that $x$ was transmitted. This probability is assumed independent of previous and later transmissions.

Definition 7. A channel is called Binary Symmetric (BSC) if $X = Y = \{0,1\}$ and
$$P(y \mid x) = \begin{cases} p & \text{if } x \ne y \\ 1 - p & \text{if } x = y, \end{cases}$$
for some real number $p$ with $0 \le p \le 1$.
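A BSC is straightforward to simulate (a sketch; the function name is ours, not from the thesis):

```python
import random

def bsc(word, p, rng=random):
    """Send a binary word through a BSC(p): each bit is flipped
    independently with probability p."""
    return tuple(b ^ int(rng.random() < p) for b in word)
```

For example, `bsc((0, 1, 0, 1), 0.0)` returns the word unchanged, while `p = 1.0` flips every bit.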

Definition 8 (Decoder). A decoder is an algorithm which allows us to obtain a message from the received symbols coming from the channel.

Remark 2.1.1. The following statements hold:

1. $\sum_{x \in X} P(y \mid x) = 1$ and $\sum_{y \in Y} P(y \mid x) = 1$.
2. We assume that the receiver knows the probabilities $P(y \mid x)$.
3. In a BSC, if $p > \frac{1}{2}$ then it is more effective for the receiver to flip the received symbols (i.e. exchange 0 and 1), so we can always assume that $p < \frac{1}{2}$.
4. If $p = \frac{1}{2}$, we will see that our decoding algorithm, presented in Section 2.3, will not be useful.

A communication system looks as follows:
$$\text{sender} \xrightarrow{\;a\;} \text{encoder} \xrightarrow{\;x\;} \text{channel} \xrightarrow{\;y\;} \text{decoder} \xrightarrow{\;a'\;} \text{receiver},$$
where $a$ is the sent word, $x$ the encoded word, $y$ the word received after sending $x$ through the channel, and $a'$ the decoded word obtained by the receiver.

Definition 9. Consider $A^k$ as the set of $k$-tuples of symbols in the alphabet $A$ and $X^n$ as the set of $n$-tuples of symbols in the input alphabet $X$. Suppose that the encoder restricted to $A^k$ is a map
$$\psi : A^k \to X^n.$$


2.2 Block linear codes

Now we forget the alphabet $A$ to focus on the encoded part of the communication system, centering our attention on the codes themselves. More precisely, we will study linear block codes over finite fields.

Definition 10. A linear block code over a finite field is a linear subspace $C \subseteq \mathbb{F}_q^n$, where $q$ is a power of a prime number. A codeword is an element of $C$.

In the last definition, $C$ can be considered as the image of the function $\psi$ from Definition 9, forgetting the alphabet $A$.

Definition 11. A parity check matrix $H_C$ associated to the code $C$ is a matrix over $\mathbb{F}_q$ satisfying
$$H_C x = 0 \iff x \in C.$$

From now on we assume that all our codes are block linear codes over some finite field. We will discuss the decoding process when a block linear code is used. For that, the following definition will be necessary:

Definition 12. If $u, v$ are codewords of a code $C$, the Hamming distance between $u$ and $v$ is given by
$$d_H(u, v) := |\{i \mid u_i \ne v_i\}|.$$

Remark 2.2.1. The Hamming distance $d_H$ is a metric over $\mathbb{F}_q^n$.

Definition 13. The minimal distance of a code $C$ is
$$d(C) := \min_{u, v \in C,\; u \ne v} d_H(u, v).$$

Note that if a channel introduces $d(C)$ or more errors, then it can transform one codeword into another. This suggests that the quality of recovery of a code $C$ depends on the size of $d(C)$. This intuition is formalized in Theorem 2.2.3 below.
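Definitions 12 and 13 translate directly into code (a sketch; the tiny example code is ours, not from the thesis):

```python
from itertools import combinations

def hamming(u, v):
    """d_H(u, v): number of coordinates in which u and v differ."""
    return sum(a != b for a, b in zip(u, v))

def min_distance(code):
    """d(C): minimum Hamming distance over pairs of distinct codewords."""
    return min(hamming(u, v) for u, v in combinations(code, 2))

# the [3,1] binary repetition code: d(C) = 3, so by Theorem 2.2.3 below
# it detects up to 2 errors and corrects 1 error
rep3 = [(0, 0, 0), (1, 1, 1)]
```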

Suppose that $x \in C$ is a codeword and that, when it is sent through the channel, the word $y$ is received; $y = x + e$, where $e \in \mathbb{F}_q^n$ represents the errors added by the channel. If the channel introduces a sufficiently small number of errors, which means that $e$ is small enough, then $y$ will not be far from $x$ in terms of the Hamming distance $d_H$. So, a way to recover the original codeword $x$ from $y$ is to consider $x$ as the minimizer of Problem 2.2.1:
$$\min_{x' \in C} d_H(x', y). \tag{2.2.1}$$

This method of recovery is called the maximum likelihood decoder. Problem 2.2.1, however, does not necessarily have a unique solution; conditions for uniqueness of this solution are given in Theorem 2.2.3.

In order to analyse how well the maximum likelihood decoder works over the code $C$ with respect to a channel adding $k$ errors, we establish the following definition:

Definition 14. Let $C$ be a code.

1. $C$ detects up to $k$ errors if, for $x \in C$, $0 < d_H(x, y) \le k$ implies that $y \notin C$.
2. $C$ corrects $k$ errors if $d_H(x, y) \le k$ implies that $x$ is the unique solution of Problem 2.2.1.

Remark 2.2.2. If $C$ detects up to $k$ errors, then $k$ or fewer errors cannot convert a codeword into another one.

Theorem 2.2.3. The following statements hold for any code $C$:

1. $1 + k \le d(C)$ if and only if $C$ can detect up to $k$ errors.
2. $2k + 1 \le d(C)$ if and only if $C$ can correct $k$ errors.

Proof. For (1), assume first that $1 + k \le d(C)$. Suppose that a codeword $x$ was sent and that the channel introduced $k$ or fewer errors. If $y$ is the received word, we have $d_H(x, y) \le k$. This implies that $y$ cannot be a different codeword, since otherwise $d_H(x, y) \ge d(C) \ge k + 1$. Therefore $k$ or fewer errors can be detected. In the other direction, if $C$ can detect up to $k$ errors, suppose that $d(C) \le k$. Then there exist two distinct codewords $v, w \in C$ such that $d_H(v, w) \le k$. By the detection hypothesis this implies that $w \notin C$, which is clearly a contradiction.

For (2), assume first that $2k + 1 \le d(C)$. If the channel transmits the word $x$ and introduces $k$ or fewer errors, giving $y$ as the received word, we have $d_H(x, y) \le k$. Suppose that there exists another codeword $z$ such that $d_H(z, y) \le d_H(y, x) \le k$; then $d_H(x, z) \le d_H(x, y) + d_H(y, z) \le k + k = 2k$, which contradicts our assumption. So $x$ is the nearest codeword to $y$, and the errors of $y$ can be corrected.

In the other direction, assume that $C$ can correct $k$ or fewer errors and suppose that $d(C) \le 2k$. This implies that there exist two words $x, z \in C$ such that $d_H(x, z) = d(C)$. Suppose, without loss of generality, that $x$ and $z$ differ in the first $d(C)$ entries. Define $x'$ as the vector such that

• $x'_{S_1} = x_{S_1}$, where $S_1 = \{1, \ldots, k\}$;
• $x'_{S_2} = z_{S_2}$, where $S_2 = \{k+1, \ldots, d(C)\}$;
• $x'_{S_3} = x_{S_3} = z_{S_3}$, where $S_3 = \{d(C)+1, \ldots, n\}$.

We have that $d_H(x, x') = d(C) - k \le k$ and $d_H(z, x') = k$. Because $x \ne z$, this contradicts the fact that $C$ corrects $k$ errors.

2.3 Decoding in binary block linear codes

We say that a block linear code $C$ is binary if $C \subseteq \mathbb{F}_2^n$; our input alphabet is then $\{0,1\}$, and both 0 and 1 are called bits. If we transmit words $x \in C$ along a BSC, we can consider the output $y$ as a word in $\mathbb{F}_2^n$. Moreover, as we saw before, $y := x + e$, where $e = x - y \in \mathbb{F}_2^n$ is the vector representing the errors added by the channel. Recall that a BSC is defined by transition probabilities $P(y_i \mid x_i)$, for $1 \le i \le n$; in this case $P(y_i \mid x_i)$ is the probability of obtaining the bit $y_i$ given that the bit $x_i$ was sent.

Let $H_C$ be the parity check matrix associated to $C$. Enumerating the columns, let $I(H_C)$ be the set of labels of the columns of $H_C$; enumerating the rows, let $J(H_C)$ be the set of labels of the rows of $H_C$.

Definition 15. Let $I_j$ be the set of column labels of $H_C$ corresponding to the entries equal to 1 in the $j$-th row of $H_C$. Also, let $J_i$ be the set of row labels of $H_C$ corresponding to the rows which have a 1 in column $i$.

Example 2.3.1. Consider the code
$$C = \langle [1,1,1,0], [0,0,0,1] \rangle = \{\vec{0},\, [1,1,1,0],\, [0,0,0,1],\, [1,1,1,1]\}.$$
A parity check matrix associated to that code is
$$H_C = \begin{pmatrix} 1 & 0 & 1 & 0 \\ 0 & 1 & 1 & 0 \end{pmatrix}.$$
Because $H_C$ has dimensions $2 \times 4$, $I(H_C) = [4]$ and $J(H_C) = [2]$. In this case $I_1$ is the set $\{1, 3\}$; also, $J_2 = \{2\}$.
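As a mechanical check of Definitions 11 and 15 (a sketch; we take the kernel of the matrix $H_C$ of Example 2.3.1 as the code, enumerating all of $\mathbb{F}_2^4$, which is fine at this size):

```python
import itertools
import numpy as np

H = np.array([[1, 0, 1, 0],
              [0, 1, 1, 0]])        # the parity check matrix H_C above

# C = ker(H) over F_2, by brute-force enumeration of F_2^4
C = [x for x in itertools.product((0, 1), repeat=4)
     if not (H @ np.array(x) % 2).any()]

# the index sets of Definition 15 (1-based labels)
I = {j + 1: {i + 1 for i in range(H.shape[1]) if H[j, i] == 1}
     for j in range(H.shape[0])}
J = {i + 1: {j + 1 for j in range(H.shape[0]) if H[j, i] == 1}
     for i in range(H.shape[1])}
```

Since $H$ has rank 2 over $\mathbb{F}_2$, the kernel has $2^{4-2} = 4$ codewords.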

Definition 16. Suppose we have received the word $y$ once $x \in C$ was sent through the channel. For each $i \in I(H_C)$, the log-likelihood ratio $\lambda_i$ is defined as
$$\lambda_i := \log \frac{P(y_i \mid 0)}{P(y_i \mid 1)}.$$

As before, if we have received $y$ once $x \in C$ is transmitted, the maximum likelihood decoding rule (MLD) decides for a maximizer of Problem 2.3.1:
$$\max_{x' \in C} P(y \mid x'), \tag{2.3.1}$$


where $P(y \mid x') = \prod_{i=1}^{n} P(y_i \mid x'_i)$. So, if $\bar{x}$ is an optimum of that problem, it is a most probable sent codeword. Maximizing $P(y \mid x')$ is equivalent to maximizing $\log P(y \mid x')$ over the codewords $x' \in C$, because $\log$ is an increasing function. Note that
$$\log P(y \mid x') = \sum_i \log P(y_i \mid x'_i) \overset{\star}{=} \sum_i \Big( -\log \frac{P(y_i \mid 0)}{P(y_i \mid 1)}\, x'_i + \log P(y_i \mid 0) \Big) = \sum_i \big( -\lambda_i x'_i + \log P(y_i \mid 0) \big).$$
The equality $\star$ follows from the fact that, for $x'_i \in \{0,1\}$,
$$\log P(y_i \mid x'_i) = -\log \frac{P(y_i \mid 0)}{P(y_i \mid 1)}\, x'_i + \log P(y_i \mid 0).$$

P

logP(yi |0)is a positive constant, so Problem 2.3.1 can be seen as

minhλ, x0i s.t x0 ∈C.

Because the functional in the last problem is linear, and all linear problems attains its optima at a vertex, we can consider the polytopeconv(C)as the convex hull of the codewords inC seen as elements inRn. Eachx∈Cis a vertex of then-cubeconv{P

i∈Iei |I ⊆[n]}. So, words inCare also the vertices inconv(C). We have then the equivalentproblem 2.3.2:

minhλ, x0i s.t x0 ∈conv(C). (2.3.2) This problem is not easy to solve, despite being linear, because isN P-hard [13]. It is because of this that we consider the following relaxation,Problem 2.3.3, proposed in [6].

minhλ, x0i s.t x0 ∈P(HC) (2.3.3) whereP(HC) =

T

j∈J(HC)conv(Cj)andCj is the code constructed taking the row hj ofHC and considering

    C_j := { x′ ∈ F_2^n | ⟨h_j, x′⟩ = 0 mod 2 }.

Note that C = ∩_{j∈J(H_C)} C_j ⊆ P(H_C). Also, because P(H_C) is a subset of the n-cube and all codewords are vertices of that n-cube, all codewords are vertices of P(H_C). In [6], conv(C_j) is described in the form of inequalities. To do this, begin by considering x_1, …, x_n as the variables denoting the bits of a codeword and (x_1, …, x_n) the associated codeword in C_j. To represent the convex closure we require:

    0 ≤ x_i ≤ 1  for all i ∈ I(H_C).    (2.3.4)

Let I_j := supp(h_j), E_j := { S ⊆ I_j | |S| even } and D_j := { R ∪ S | S ∈ E_j, R ⊆ I_j^c }.


Remark 2.3.2. For a codeword v ∈ C_j, there is a set T ∈ D_j such that supp(v) = T. Reciprocally, for any T ∈ D_j, there exists a codeword v ∈ C_j such that supp(v) = T.

Define the auxiliary variables w_{j,T}, for all T ∈ D_j. The idea is that, for a codeword x ∈ C_j, the auxiliary variables indicate the positions of supp(x): if x satisfies supp(x) = T ∈ D_j, then w_{j,T} = 1 and, for T′ ≠ T, w_{j,T′} = 0. So, these variables must satisfy

    0 ≤ w_{j,T} ≤ 1  ∀ T ∈ D_j,    (2.3.5)

and

    Σ_{T∈D_j} w_{j,T} = 1.    (2.3.6)

Finally, x satisfies

    x_i = Σ_{T∈D_j : i∈T} w_{j,T}  for all i.    (2.3.7)

For T ∈ D_j, consider the codeword c_T ∈ C_j such that

    (c_T)_i := 1 if i ∈ T, and 0 otherwise.

Note that c_T ∈ C_j. Moreover, Remark 2.3.2 tells us that any codeword c ∈ C_j has this form. Let Q_j be the polytope described by Equations 2.3.4–2.3.7, and let (x, w) ∈ Q_j, where x = (x_1, …, x_n) and w = (w_{j,T})_{T∈D_j}. We have the following result:

Proposition 2.3.3. x = Σ_{T∈D_j} w_{j,T} c_T.

Proof. By Equation 2.3.7, we have

    x_i = Σ_{T∈D_j : i∈T} w_{j,T} = Σ_{T∈D_j : i∈T} w_{j,T} (c_T)_i =⋆ Σ_{T∈D_j} w_{j,T} (c_T)_i.

Equality ⋆ follows from the fact that (c_T)_i = 0 if i ∉ T.

Lemma 2.3.4. Let Q̄_j be the projection of Q_j onto the coordinates (x_1, …, x_n). Then Q̄_j = conv(C_j).

Proof. Let x ∈ conv(C_j). Then, because {c_T}_{T∈D_j} = C_j,

    x = Σ_{T∈D_j} a_T c_T,

where a_T ≥ 0 and Σ a_T = 1. Set w_{j,T} = a_T. We will see that (x, w), where w_T = w_{j,T}, satisfies Equations 2.3.4–2.3.7.

First note that, by construction, Equations 2.3.4, 2.3.5 and 2.3.6 are satisfied. Also by construction, x = Σ w_{j,T} c_T. Then

    x_k = Σ_{T∈D_j} w_{j,T} (c_T)_k = Σ_{T∈D_j : k∈T} w_{j,T}.

This is because (c_T)_k = 0 if k ∉ T and (c_T)_k = 1 otherwise. So, Equation 2.3.7 holds, which implies (x_1, …, x_n) ∈ Q̄_j, and therefore conv(C_j) ⊆ Q̄_j.

The affirmation Q̄_j ⊆ conv(C_j) follows from Proposition 2.3.3.

By the last Lemma, the optimum in Problem 2.3.3 can also be attained by solving

    min ⟨λ, x′⟩   s.t.   x′ ∈ ∩_{j∈J(H_C)} Q_j.    (2.3.8)

From now on, P′(H_C) := ∩_{j∈J(H_C)} Q_j.

Corollary 2.3.5. For (x, w) ∈ P′(H_C), x = Σ_{T∈D_j} w_{j,T} c_T for all j ∈ J(H_C).

Proof. The corollary follows from Proposition 2.3.3 and the fact that P′(H_C) is the intersection of the Q_j's.

Corollary 2.3.6. P(H_C) = ∩_{j∈J(H_C)} Q̄_j.

Proof. This follows directly from Lemma 2.3.4 and the definition of P(H_C).

2.4 All-zero codeword assumption

Throughout this section we consider the following situation: a codeword x is sent through a BSC and the word y is received. The important information in y is which bits of x were flipped by the channel and which were not. So, in order to use Problem 2.3.3 to recover x, it does not matter what the sent codeword x was; in particular, we could always assume that x = 0. In this section this assumption is formalized.

Definition 17. Let z, z′ ∈ P(H_C). |z − z′| is the vector with entries |z − z′|_i := |z_i − z′_i|. The vector z^r := |x − z|, for any z ∈ P(H_C), is called the relative solution associated to z.

Remark 2.4.1. If z, z′ ∈ C_j for some j, then |z − z′| is the vector in P(H_C) associated to the codeword z + z′ mod 2.

Proposition 2.4.2. For z ∈ P(H_C) and z^r its relative solution, the relative solution of z^r is z. That is, the operation of taking relative solutions is its own inverse.

Proof. z^r = |z − x| and (z^r)^r = |z^r − x|. Now,

    |z^r − x|_i = |(z^r)_i − x_i| = ||z_i − x_i| − x_i|.

Because x_i is zero or 1 and 0 ≤ z_i ≤ 1, we have ||z_i − x_i| − x_i| = z_i, as we wanted.

Lemma 2.4.3. If z ∈ P(H_C), then z^r ∈ P(H_C).

Proof. Recall that z ∈ Q̄_j for all j. So, by definition, z = Σ_{T∈D_j} w_T c_T, where 0 ≤ w_T ≤ 1 and Σ_{T∈D_j} w_T = 1. Then

    |z − x|_i = | Σ_{T∈D_j : i∈T} w_T (c_T)_i − Σ_{T∈D_j} w_T x_i | = | Σ_{T∈D_j} w_T ((c_T)_i − x_i) |.

If x_i = 0, the following trivially holds:

    | Σ_{T∈D_j} w_T ((c_T)_i − x_i) | = Σ_{T∈D_j} w_T |(c_T)_i − x_i|.

If x_i = 1, we have

    | Σ_{T∈D_j} w_T ((c_T)_i − x_i) |
      = | Σ_{T : i∈T} w_T ((c_T)_i − x_i) + Σ_{T : i∉T} w_T ((c_T)_i − x_i) |
      =⋆ | Σ_{T : i∉T} w_T (−x_i) |
      = Σ_{T : i∉T} w_T |−x_i|
      =⋆ Σ_{T : i∉T} w_T |(c_T)_i − x_i|
      =⋆ Σ_{T : i∉T} w_T |(c_T)_i − x_i| + Σ_{T : i∈T} w_T |(c_T)_i − x_i|
      = Σ_T w_T |(c_T)_i − x_i|.

The equalities ⋆ follow from the fact that (c_T)_i = 1 if i ∈ T and zero if i ∉ T. These calculations give us

    |z − x| = Σ_{T∈D_j} w_T |c_T − x|.

By Remark 2.4.1, |c_T − x| is the point associated to the codeword c_T + x mod 2. Because x ∈ C, x ∈ C_j. This implies that

    |z − x| = Σ_{T∈D_j} w_T |c_T − x| ∈ Q̄_j.

As j is arbitrary, we have |z − x| ∈ ∩_j Q̄_j = P(H_C).

Definition 18. Note that the maximum likelihood ratio vector λ depends on y, so we denote by λ(y) the likelihood ratio vector associated to y. Consider

    B(x) := { y | ∃ z ∈ P(H_C) with z ≠ x and ⟨λ(y), z⟩ ≤ ⟨λ(y), x⟩ }.

B(x) is the set of words y such that, when x is sent through the channel and y is received, recovery of x using Problem 2.3.3 is not possible.

Proposition 2.4.4. There is a bijection φ between the words that can be received when x is sent and the words that can be received when 0 is sent.

Proof. Fix a codeword x ∈ C. Let y be the word received after sending x. Define u′ as the word satisfying

    u′_i := y_i if x_i = 0;  1 if x_i = 1 and y_i = 0;  0 if x_i = 1 and y_i = 1.

Note that u′ can only have nonzero entries in the positions flipped by the channel. So, it is a possible received word after sending 0. Define φ(y) := u′. Now, let y′ be the word received after sending 0. Consider u′ as the word with coordinates

    u′_i := y′_i if x_i = 0;  1 if x_i = 1 and y′_i = 0;  0 if x_i = 1 and y′_i = 1.

As before, u′ is a possible received word after sending x. Define φ′(y′) := u′. Note that φ′ = φ^{−1}. So, φ is a bijection, as desired.
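In binary terms, φ simply flips y at the support of x, so φ(y) = |y − x| coordinatewise. A quick Python check that φ is an involution, with an arbitrary codeword x chosen for illustration:

```python
from itertools import product

def phi(y, x):
    """The bijection of Proposition 2.4.4: flip y at the positions where x_i = 1."""
    return tuple(yi if xi == 0 else 1 - yi for yi, xi in zip(y, x))

x = (1, 0, 1, 1)
for y in product((0, 1), repeat=4):
    assert phi(phi(y, x), x) == y                                    # phi is its own inverse
    assert phi(y, x) == tuple(abs(yi - xi) for yi, xi in zip(y, x))  # phi(y) = |y - x|
```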

Lemma 2.4.5. Suppose that φ(y) = y′, where φ is as in Proposition 2.4.4. If λ′ := λ(y′), then

    λ′_i = λ_i(y) if x_i = 0;  −λ_i(y) if x_i = 1.

Proof. Suppose first that x_i = 0. Then y′_i = y_i. It implies that

    λ′_i = log( P(y_i | x_i = 0) / P(y_i | x_i = 1) ) = λ_i(y).

Now, recall that, because the channel is a BSC, P(1|0) = P(0|1). If x_i = 1, y′_i is the symmetric symbol associated to y_i: if y_i = 1 then y′_i = 0, and if y_i = 0 then y′_i = 1. So,

    λ′_i = log( P(y′_i | x_i = 0) / P(y′_i | x_i = 1) ) =⋆ log( P(y_i | x_i = 1) / P(y_i | x_i = 0) ) = −λ_i(y).

The equality ⋆ follows from the symmetry of the channel and the definition of y′_i.

Proposition 2.4.6. If y, y′ are as in the last Lemma and z is a feasible solution of Problem 2.3.3, then:

    ⟨λ(y), z⟩ − ⟨λ(y), x⟩ = ⟨λ′, z^r⟩ − ⟨λ′, 0⟩.

Proof. ⟨λ′, 0⟩ = 0. Also,

    ⟨λ′, z^r⟩ = Σ_i λ′_i |z_i − x_i| = Σ_{i : x_i=1} λ′_i (1 − z_i) + Σ_{j : x_j=0} λ′_j z_j.

By Lemma 2.4.5, we have

    Σ_{i : x_i=1} λ′_i (1 − z_i) + Σ_{j : x_j=0} λ′_j z_j
      = −Σ_{i : x_i=1} λ_i(y)(1 − z_i) + Σ_{j : x_j=0} λ_j(y) z_j
      = Σ_i λ_i(y) z_i − Σ_{j : x_j=1} λ_j(y)
      = ⟨λ(y), z⟩ − ⟨λ(y), x⟩.

Lemma 2.4.7. There is a bijection between B(x) and B(0).

Proof. Let y ∈ B(x). We will see that y′ = φ(y) belongs to B(0). Consider z ∈ P(H_C) such that

    ⟨λ(y), z⟩ − ⟨λ(y), x⟩ ≤ 0.

By Proposition 2.4.6,

    ⟨λ(y), z⟩ − ⟨λ(y), x⟩ = ⟨λ′, z^r⟩.

By Lemma 2.4.3, z^r ∈ P(H_C). So, z^r is a word such that

    ⟨λ(y′), z^r⟩ − ⟨λ(y′), 0⟩ = ⟨λ(y′), z^r⟩ ≤ 0,

because λ(y′) = λ′. This implies that y′ ∈ B(0).

Consider the restriction map φ|_{B(x)} : B(x) → B(0), where φ is the map from Proposition 2.4.4. By Propositions 2.4.2 and 2.4.4, φ is bijective and φ|_{B(x)} is injective. A similar argument follows taking φ^{−1}|_{B(0)}.

Definition 19. P(error | x) is the probability that, when x is sent through the BSC, x cannot be recovered using Problem 2.3.3. Formally,

    P(error | x) := Σ_{y∈B(x)} P(y | x).

Recall that P(y | x) = ∏_{i=1}^n P(y_i | x_i) and each of the probabilities P(y_i | x_i) is given by the channel.

To see that we can always assume 0 as the sent word, the following will be proven.

Theorem 2.4.8. P(error | x) = P(error | 0).

Proof. We first see that P(y | x) = P(y′ | 0):

    P(y | x) = ∏_{i=1}^n P(y_i | x_i)
             = ∏_{i : x_i=1} P(y_i | 1) ∏_{j : x_j=0} P(y_j | 0)
             =⋆ ∏_{i : x_i=1} P(y′_i | 0) ∏_{j : x_j=0} P(y′_j | 0)
             = P(y′ | 0).

Equality ⋆ follows from the symmetry of the channel and the definition of y′.

Now, because φ is a bijection between B(x) and B(0), we have

    P(error | x) = Σ_{y∈B(x)} P(y | x) = Σ_{y′∈B(0)} P(y′ | 0) = P(error | 0).
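The key identity P(y | x) = P(y′ | 0) in the proof above can be checked exhaustively for a small block length. A Python sketch assuming a BSC with crossover probability p = 0.2 (an illustrative value):

```python
import math
from itertools import product

p = 0.2                                  # assumed BSC crossover probability

def prob(y, x):
    """P(y | x) for the BSC: each flipped bit costs p, each kept bit 1 - p."""
    return math.prod(p if yi != xi else 1 - p for yi, xi in zip(y, x))

def phi(y, x):
    """The bijection of Proposition 2.4.4, written as coordinatewise |y - x|."""
    return tuple(abs(yi - xi) for yi, xi in zip(y, x))

x = (1, 1, 0, 1)
zero = (0, 0, 0, 0)
for y in product((0, 1), repeat=4):
    assert math.isclose(prob(y, x), prob(phi(y, x), zero))
```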

2.5 Algebraic codes over curves

We list some important definitions and results from algebraic geometry in order to define algebraic codes over curves. For a more detailed description of the concepts presented here, see [11].

Definition 20. For a normal algebraic variety X, let A be the set of irreducible subvarieties of X of codimension one, and consider the free group generated by A:

    WDiv := ⊕_{Y∈A} Z·Y.

Each element in WDiv is called a Weil divisor of X. The group WDiv is called the Weil group of X.

For D ∈ WDiv, D = Σ_{Y∈A} a_Y Y. We say that D is effective if 0 ≤ a_Y for all Y.

Definition 21. Consider X a normal nonsingular curve and a function f ∈ k(X). Note that in this case the elements of A are the points of X. We define

    div(f) := Σ_{P∈A} e_P(f) P,

where e_P(f) is the order of the zero minus the order of the pole of the function f at the point P. The value e_P(f) is called the valuation of f at P.

In general, over any normal variety X we can construct div(f), for f ∈ k(X), in a similar way. However, the definition of the valuation is more technical; because the general case is not necessary in our context, we omit it.

Definition 22. A Weil divisor D is called Cartier if there exist open affine sets {U_j}_j such that ∪_j U_j = X and D|_{U_j} = div(f_j), where f_j ∈ O_X(U_j).

Definition 23 (Sheaf of a divisor). Given a divisor D, consider

    O_X(D)(U) := { f ∈ k(X) | (div(f) + D)|_U ≥ 0 } ∪ {0},

for any open U ⊆ X. O_X(D) is the sheaf with sections O_X(D)(U) over the open set U.

Definition 24. If D, E ∈ WDiv(X), we say that D is linearly equivalent to E (denoted D ∼ E) if D − E = div(f) for some f ∈ k(X).

Remark 2.5.1. If D ∼ E, then O_X(D) ≅ O_X(E).

Remark 2.5.1 allows us to consider only elements D ∈ WDiv/∼ to construct the sheaf O_X(D). The set WDiv/∼ is in fact a group, called the Picard group of X and denoted Pic(X).

Assume from now on that X is a non-singular algebraic curve over F_q with genus g, for q = p^n and p a prime number. The theory shown here comes from [7].

Definition 25. Let D = P_1 + … + P_m be a divisor such that each P_i is a rational point of X over F_q. Suppose that G ∈ Pic(X) and supp(D) ∩ supp(G) = ∅. Consider the evaluation map

    ev_D : O_X(G)(X) → F_q^m,   ev_D(f) = (f(P_1), …, f(P_m)),

and let L(D, G) := ev_D(O_X(G)(X)) be its image.

Proposition 2.5.2. L(D, G) is an F_q-linear code with minimal distance d ≥ m − deg(G) if deg(G) ≤ m.

Proof. Because L(D, G) comes from evaluating global sections of a sheaf, it is a vector space. A section f ∈ O_X(G)(X) has at most deg(G) zeros in {P_1, …, P_m}, so ev_D(f) has at least m − deg(G) entries different from zero. This implies that d ≥ m − deg(G).

Remark 2.5.3. If supp(D) ∩ supp(G) ≠ ∅, two things can happen:

1. There exists f ∈ O_X(G)(X) such that f has a pole at some point P ∈ supp(D), if the sign of P in G is positive.

2. All sections f ∈ O_X(G)(X) must have a zero at the point P ∈ supp(D) ∩ supp(G), if the sign of P in G is negative.

In the first case the evaluation f(P) is not well defined. In the second case, all the vectors in C have an additional zero, so the minimal distance of L(D, G) decreases with respect to L(D′, G′), where supp(D′) ∩ supp(G′) = ∅, deg(D) = deg(D′) and deg(G) = deg(G′).

Example 2.5.4 (Reed–Solomon codes). Consider X = P^1(F_q). Let D = P_1 + … + P_m, where P_i = [a_i : 1] and a_i ≠ 0. Let also G = d[1 : 0]. O_X(G)(X) corresponds to the functions f = p/q where deg(p) = deg(q) and f has at most d poles at [1 : 0]. This means that

    O_X(G)(X) = { p/x_1^k | deg(p) = k ≤ d }.

Making the change of variable x_0/x_1 = x, this set corresponds to F_q[x]_{≤d}, the set of polynomials of degree at most d. In this case ev_D(f) = (f(a_1), …, f(a_m)), for f ∈ F_q[x]_{≤d}. The code L(D, G) is called in this case a Reed–Solomon code and is denoted RS_d(a, x), where a = (a_1, …, a_m).

Definition 26. Consider Ω_X, the sheaf of differential forms over X. For ω ∈ Ω_X we can consider div(ω) = Σ_{P∈X} v_P(ω) P, where v_P(ω) is obtained by noting that ω, localized at P ∈ X, has the form ω = f du, where u is the generator of the ideal of P in the localization; v_P(ω) is then the usual valuation of f with respect to u in the local ring.

Definition 27. Given G ∈ Pic(X), consider the sheaf

    Ω_X(G)(U) := { ω ∈ Ω_X | (div(ω) − G)|_U ≥ 0 } ∪ {0}.

For D = P_1 + … + P_m, define the residue map

    Res_D : Ω_X(G − D)(X) → F_q^m

such that, for ω ∈ Ω_X(G − D)(X), Res_D(ω) = (res_{P_1}(ω), …, res_{P_m}(ω)), where res_{P_i}(ω) is calculated in the following way: localizing around P_i, ω = f du. This function has the Laurent series expansion f = Σ_{i=v_{P_i}(f)}^∞ a_i u^i, where a_{v_{P_i}(f)} ≠ 0 and v_{P_i}(f) is the valuation of f at P_i. Then res_{P_i}(ω) = Tr_{F_{p^n}/F_p}(a_{−1}), where Tr_{F_{p^n}/F_p}(b) = b^{p^{n−1}} + … + b^p + b.

The code obtained as the image of Res_D is called a geometric Goppa code and is denoted Ω(D, G).

Proposition 2.5.5. Ω(D, G) is an F_q-linear code with distance d ≥ deg(G) − (2g − 2).

Proof. Note that Res_D is linear. Because Ω_X(G − D)(X) is a vector space, Ω(D, G) is a vector space.

Let V(ω) and S(ω) be the set of zeros and the set of poles of ω, respectively. Note that the number of poles of ω is equal to

    |V(ω)| − deg(div(ω)) = |V(ω)| − (2g − 2).

Because div(ω) − G + D ≥ 0, |V(ω)| ≥ deg(G). So,

    |S(ω)| ≥ deg(G) − (2g − 2).

On the other hand, the poles of ω in {P_1, …, P_m} can only be simple, otherwise div(ω) − G + D ≥ 0 is not satisfied. Therefore d ≥ deg(G) − (2g − 2).

Example 2.5.6 (Classical Goppa codes). Take D and the P_i as in Example 2.5.4. Consider P = [1 : 0] and let g be a polynomial such that g(a_i) ≠ 0. The classical Goppa code is defined as follows:

    Γ(L, g) := { c ∈ F_q^n | Σ c_i/(x − a_i) ≡ 0 mod g }.

If deg(g) ≥ m, the condition defining Γ(L, g) can only be satisfied by 0, so assume that deg(g) < m.

Consider the code Ω(D, E − P), where E is the divisor of zeros of g. Each ω ∈ Ω_X(E − P)(X) must have zeros at all Q ∈ supp(E). Also, ω can have some poles in {P} ∪ supp(D). This implies that

    ω|_{x≠0} = ( h / ∏_{j=1}^m (x − a_j) ) dx,

where g | h and some of the poles a_i are removable.

Because deg(g) < m, we can express h as rg + ∏(x − a_j) h_1. Then the restriction of ω becomes ( rg + (∏(x − a_j)) h_1 ) / ∏(x − a_j), so the poles of the section are those of rg / ∏(x − a_j). Expanding this expression in partial fractions we obtain

    rg / ∏(x − a_j) = Σ c_j/(x − a_j) ≡ 0 mod g,

so the residue vector (c_1, …, c_m) lies in Γ(L, g).

2.6 Toric codes over surfaces

We begin this section with some results about divisors on toric varieties and the relation between divisors and polytopes over toric varieties; these results come from [11]. After this, toric codes over surfaces are defined and some results about their distances, appearing in [8], are established. From now on, suppose that N is a lattice and M is its dual lattice.

Definition 28. Let P ⊆ M be a lattice polytope. If F is the set of facets of P, P can be described as follows:

    P := { m ∈ M | ⟨m, u_F⟩ ≥ −a_F, ∀F ∈ F },

where u_F ∈ N is the shortest vector orthogonal to the facet F and a_F ∈ Z. Define the inner normal fan associated to P, Σ_P, as

    Σ_P := { σ_Q | Q ⪯ P },

where σ_Q is the cone generated by the set { u_F | Q ⪯ F, F ∈ F }. The toric variety associated to P, denoted X_P, is X(Σ_P), the toric variety generated by Σ_P.

Claim 2.6.1. The toric variety X_P is projective.

Recall that for ρ ∈ Σ(1), where Σ(1) is the set of generators of the one-dimensional cones of Σ, we can construct the divisor D_ρ := V(ρ).

Definition 29. We say that D is a T-invariant divisor if D ∈ CDiv_T, where CDiv_T is the free abelian group generated by {D_ρ}_{ρ∈Σ(1)}.

Definition 30. If

    P = { x ∈ R^n | ⟨x, u_F⟩ ≥ −a_F, ∀F ∈ F },

we define the divisor D_P over X_P as

    D_P = Σ_{u_F} a_F D_{u_F}.

Remark 2.6.2. D_P is a T-invariant divisor.

Lemma 2.6.3. O(D_P)(X) is generated, as a vector space, by the characters corresponding to the lattice points of P.

Proposition 2.6.4. If P′ is a polytope resulting from a translation of P, then D_P and D_{P′} are linearly equivalent.

2.6.1 Toric codes

Let P ⊆ R^2 be an integral polytope, i.e. a polytope whose vertices lie in Z^2, such that, up to translation, P ⊆ [0, q−1] × [0, q−1], for q = p^m and p some prime number. Consider X := X_P(F_q). This toric variety has torus T = (F_q^*)^2. If ζ is a generator of F_q^* as a multiplicative group, all the points in T have the form P_{ij} = (ζ^i, ζ^j). Let D_P be the divisor associated to P. Recall that the global sections of O_X(D_P) are generated by the characters corresponding to the points in P ∩ Z^2. The toric code over F_q associated to P is

    C_P := { (s(P_{i,j}))_{i,j∈[q−1]} | s ∈ O_X(D_P)(X) }.

In the following, we show some results about the minimal distances of the codes C_P for particular instances of the polytope P. These come from [8].

Proposition 2.6.5. Let P = conv{(0,0), (a,0)} be a line segment. Then, for q ≥ a + 1,

    d(C_P) = (q−1)^2 − a(q−1).

Proof. O_X(D_P)(X) = ⟨1, x, x^2, …, x^a⟩. For each v ∈ F_q^*,

    C_v = { (χ(u, v))_{u∈F_q^*} | χ ∈ O_X(D_P)(X) }

is a Reed–Solomon code. So, C_P is a product of q−1 Reed–Solomon codes. Each C_v ⊆ F_q^{q−1} has distance q−1−a ≥ 0, because a section can have at most a zeros and there exist sections with exactly a of them. Then, the minimum number of nonzero entries in a codeword v ∈ C_P is (q−1)^2 − a(q−1).

Proposition 2.6.6. Let P = conv{(0,0), (a,0), (b,c)}. If a, b, c ≥ 0, a ≥ b + c and P ⊂ [0, q−1] × [0, q−1], then

    d(C_P) = (q−1)^2 − a(q−1).

To prove Proposition 2.6.6 we need some results.

Proposition 2.6.7. Let Σ_{i=1}^l P_i ⊆ P, with P_i, P lattice polytopes contained in [0, q−1]^2. Let m_i be the maximum number of zeros of a section in O_{X_{P_i}}(D_{P_i})(X_{P_i}). Suppose that there exist sections s_i ∈ O_{X_{P_i}}(D_{P_i})(X_{P_i}) whose zero sets are pairwise disjoint, with s_i having m_i zeros. Then

    d(C_P) ≤ Σ_{i=1}^l d(C_{P_i}) − (l−1)(q−1)^2.

Note that d(C_{P_i}) = (q−1)^2 − m_i.

Definition 31. For f(x_0, …, x_m) a homogeneous polynomial with coefficients in F_q,

    V_{P^m(F_q)}(f) := { y ∈ P^m(F_q) | f(y) = 0 }.

Theorem 2.6.8 (Serre–Tsfasman). If f(x_0, …, x_m) is a homogeneous polynomial over F_q, then

    |V_{P^m(F_q)}(f)| ≤ d q^{m−1} + p_{m−2},

where deg(f) = d and p_m = |P^m(F_q)| = (q^{m+1} − 1)/(q − 1). Moreover, equality is obtained for a reducible hypersurface composed of d hyperplanes passing through a common linear space of codimension 2.

Proposition 2.6.9. If P = conv{(0,0), (0,a), (a,0)}, then d(C_P) = (q−1)^2 − a(q−1).

Proof. Note that X_P is the toric variety P^2(F_q), so any hyperplane l is a line, which is determined by two points. By the Serre–Tsfasman theorem, there exist lines l_1, …, l_a in X_P, all passing through a unique common point t, such that f = l_1 ⋯ l_a has the maximum number of zeros. Note that the choice of the a distinct lines and the common point t does not matter; the number of zeros remains the same. So, in order to maximize the number of zeros on T, we choose t = (0 : 0 : 1). Also, we choose the l_i different from the axes in the affine chart

    U_z := { (u : v : 1) | u, v ∈ F_q } ⊇ T.

In this situation, each line l_i has q−1 points in T. Because the intersection point t is unique, these lines do not intersect over T. Then, the total number of zeros of f over T is a(q−1). Therefore, d(C_P) = (q−1)^2 − a(q−1).

Proof (Prop. 2.6.6). P ⊆ P_Δ, where P_Δ := conv{(0,0), (a,0), (0,a)}. By Proposition 2.6.9, the distance of C_{P_Δ} is (q−1)^2 − a(q−1). So,

    d(C_P) ≥ d(C_{P_Δ}) = (q−1)^2 − a(q−1).

On the other hand, note that P contains the segment L = conv{(0,0), (a,0)}. So, by Proposition 2.6.7,

    d(C_P) ≤ d(C_L) = (q−1)^2 − a(q−1).

The next example uses Propositions 2.6.5, 2.6.6 and 2.6.7 to calculate the minimum distance of a certain toric code. It also uses the following definition.

Definition 32. Let P, Q be lattice polytopes. They are lattice equivalent if there exists an affine isomorphism L : Z^n → Z^n of the form L = A + t, where A is a matrix with determinant ±1, such that L : P → Q is a bijection sending vertices of P to vertices of Q.

Example 2.6.10. Let

    P = conv{(0,0), (d,0), (0,e), (d, e+rd)},

for some e, d, r ∈ N. P defines the Hirzebruch surface H_r. Suppose that e + rd ≤ q−1. P can be written as the Minkowski sum T + L, where T = conv{(0,0), (d,0), (d,rd)} and L = conv{(0,0), (0,e)}. We want to calculate d(C_P).

The triangle T is lattice equivalent to the triangle T′ = conv{(0,0), (rd,0), (0,d)}. By Proposition 2.6.6,

    d(C_{T′}) = d(C_T) = (q−1)^2 − rd(q−1).

On the other hand, by Proposition 2.6.5,

    d(C_L) = (q−1)^2 − e(q−1).

The section t = x^d ∏_{j=1}^{rd} (y − α_j) ∈ O_X(D_T)(X) has exactly rd(q−1) zeros, if all the α_j are distinct in F_q^*. Now, because e + rd ≤ q−1, we can choose {β_1, …, β_e} ⊂ F_q^* such that α_j ≠ β_i for all i, j. So, the section s = ∏_{l=1}^e (y − β_l) ∈ O_X(D_L)(X) is disjoint from t and has exactly e(q−1) zeros. By Proposition 2.6.7 we then have the upper bound

    d(C_P) ≤ (q−1)^2 − e(q−1) + (q−1)^2 − rd(q−1) − (q−1)^2 = (q−1)^2 − (rd+e)(q−1).

On the other hand, the polygon P is contained in the triangle

    Δ = conv{(0,0), (0, e+rd), (e+rd, 0)},

so, by Proposition 2.6.9,

    d(C_P) ≥ d(C_Δ) = (q−1)^2 − (e+rd)(q−1).

Therefore, d(C_P) = (q−1)^2 − (e+rd)(q−1).

Construction of Binary Compressed Sensing Matrices from Codes

In this chapter our constructions of deterministic measurement matrices are exhibited. Section 3.1 describes a construction using sections of sheaves over algebraic curves over finite fields, a natural generalization of the construction appearing in Section 1.4.1, followed by a generalization to linear codes over finite fields. Section 3.2 exhibits computational experiments which try to determine the quality of the constructions of the previous sections; Gaussian matrices from [5] are taken as reference point.

3.1 Description of the construction

3.1.1 Second Construction: Sections of sheaves

Let C be a non-singular irreducible normal curve contained in P^2, C := V_{F_q}(f(x, y, z)), where F_q is as in Section 2.5 and f is a homogeneous polynomial with deg(f) = n. Let C_0 = {c_1, …, c_l} be the points of the curve lying in the open set

    U_z := { (x : y : z) ∈ P^2 | z ≠ 0 }.

For such points, we choose the representatives of the form (x : y : 1).

Let O_C(d) := O_{P^2}(d)|_C and Γ(C) := O_C(d)(C). For s ∈ Γ(C), consider the vector

    v_s(a, b) := 1 if s(a) = b, and 0 otherwise,

for a ∈ C_0 and b ∈ F_q. Note that the same can be done using the open sets U_x or U_y and choosing representatives accordingly. If F_q = {r_1, …, r_q}, the entries of v_s can be ordered lexicographically with respect to the indices of the pairs (c_i, r_j). Moreover, if Γ(C) = {s_1, …, s_m}, we can construct the following matrix:

    Φ′ := ( v_{s_1} | ⋯ | v_{s_m} ),

the matrix whose columns are the vectors v_{s_i}, with entries ordered with respect to the lexicographic order mentioned before.

Example 3.1.1. Let C = { (0 : y : z) | y, z ∈ F }, where F = F_2. C is isomorphic to P^1(F). We consider

    O_C(1)(C) = {0, y, z, y + z}.

Over U_z, O_C(1)(C) corresponds to the set {0, 1, x, x+1} and C|_{U_z} ≅ F, so Φ′ will in this case be equal to the matrix Φ′ of Example 1.4.2.

Proposition 3.1.2. The number of nonzero positions in each column of Φ′ is |C_0|.

Proof. We are evaluating the section s_i at the points a ∈ C_0. For each a ∈ C_0, exactly one of the labels {(a, r_1), (a, r_2), …, (a, r_q)} corresponds to a nonzero value, where {r_1, …, r_q} = F_q. This is because s_i(a) = r_j for a unique value r_j ∈ F_q. So, we have exactly one nonzero entry in v_{s_i} for each a ∈ C_0, which gives exactly |C_0| nonzero entries.

Lemma 3.1.3. If u, v are two distinct columns of Φ′, then u·v ≤ d·deg(C).

Proof. Consider f_u and f_v, the sections corresponding to u and v respectively, and let p := f_u − f_v. By Bézout's theorem, if V(p) is the set of zeros of p over the algebraic closure of F_q, then V(p) intersects C in exactly l = d·deg(C) points, counted with multiplicity, over that closed field. So, p has at most l zeros over C. This implies that f_u and f_v coincide in at most l points of C and, therefore, u and v have at most l 1's in common. So u·v ≤ l.

Theorem 3.1.4. The matrix Φ := (1/√t) Φ′ satisfies RIP for δ = (k−1) d·deg(C)/t and k < t/(d·deg(C)) + 1, where t = |C_0|.

Proof. Suppose that Λ is a set of k columns of Φ. Consider A := Φ_Λ^t Φ_Λ. By Lemma 3.1.3, A_{i,j} ≤ d·deg(C)/t if i ≠ j. On the other hand, A_{i,i} = 1. So,

    Σ_{j≠i} A_{i,j} ≤ (k−1) d·deg(C)/t = δ.

By the Gershgorin circle theorem, the eigenvalues λ of A satisfy 1−δ ≤ λ ≤ 1+δ. So, for x with supp(x) = Λ, the RIP inequality is satisfied for the described δ. If we require that δ < 1, then k < t/(d·deg(C)) + 1.

3.1.2 Third Construction: Matrices from Codes

Let C be a linear code over F_q, with C = {c_1, …, c_m}. Suppose that the parameters of the code are (n, m, d), where m is the number of codewords of C, n is such that C ⊆ F_q^n and d is the minimal distance of the code.

Let V := {c_{i,j}}_{i,j} be the set of values of all entries of the codewords of C, and suppose that V = {v_1, …, v_l}. Consider [n] × V. For each c_i we construct the vector w_{c_i}, whose entries are labelled by the elements of [n] × V, ordered in some fixed order. The vector is constructed as follows:

    (w_{c_i})_{j,v_k} := 1 if c_{i,j} = v_k, and 0 if c_{i,j} ≠ v_k.

Let Φ′ be the matrix

    Φ′ := ( w_{c_1} | ⋯ | w_{c_m} ).

This construction regards each vector c_i as a section of some sheaf: the vectors w_{c_i} play the same role that the vectors v_s do in Construction 2. Moreover, this construction agrees with Constructions 1 and 2 in the particular cases of Reed–Solomon codes and codes from algebraic curves, respectively.

Example 3.1.5. Consider C as the binary code with codewords

    {(0,0,0,0), (0,1,0,1), (1,1,0,0), (1,0,0,1)}.

In this case n = 4 and V = {0, 1}, so

    [n] × V = {(1,0), (1,1), (2,0), (2,1), (3,0), (3,1), (4,0), (4,1)}.

If c_1 = (0,0,0,0), c_2 = (0,1,0,1), c_3 = (1,1,0,0), c_4 = (1,0,0,1), then

    w_{c_1} = (1,0,1,0,1,0,1,0)^t,
    w_{c_2} = (1,0,0,1,1,0,0,1)^t,
    w_{c_3} = (0,1,0,1,1,0,1,0)^t,
    w_{c_4} = (0,1,1,0,1,0,0,1)^t.

So

    Φ′ =
      ( 1 1 0 0 )
      ( 0 0 1 1 )
      ( 1 0 0 1 )
      ( 0 1 1 0 )
      ( 1 1 1 1 )
      ( 0 0 0 0 )
      ( 1 0 1 0 )
      ( 0 1 0 1 ).

Proposition 3.1.6. The number of nonzero entries in each column of Φ′ is n.

Proof. Consider the column w_{c_i}. For 0 < r ≤ n, consider the labels involving r, namely {(r, v_1), (r, v_2), …, (r, v_l)}. Because (c_i)_r = v_j for a unique v_j ∈ V, among these labels there is exactly one position different from zero. From this, we conclude that the number of nonzero entries is n.

Lemma 3.1.7. If u, v are two distinct columns of Φ′, then u·v ≤ n−d.

Proof. Let c_i, c_j be the codewords such that u = w_{c_i} and v = w_{c_j}. Because the minimal distance of the code is d, c_i and c_j coincide in at most n−d entries. So, the number of 1's in which u and v coincide is at most n−d. Then, u·v ≤ n−d.

Theorem 3.1.8. The matrix Φ := (1/√n) Φ′ satisfies RIP for δ = (k−1)(n−d)/n and k < n/(n−d) + 1.

Proof. Let Λ be a set of k columns of Φ. Consider A := Φ_Λ^t Φ_Λ. By Lemma 3.1.7, A_{i,j} ≤ (n−d)/n if i ≠ j. On the other hand, A_{i,i} = 1. So,

    Σ_{j≠i} A_{i,j} ≤ (k−1)(n−d)/n = δ.

By the Gershgorin circle theorem, the eigenvalues λ of A satisfy 1−δ ≤ λ ≤ 1+δ. So, for x with supp(x) = Λ, the RIP inequality is satisfied for the described δ. If we require that δ < 1, then k < n/(n−d) + 1.

3.2 Computational experiments

Let M be a measurement matrix coming from Construction 2 or 3. In this section we use the codes described in the previous sections to study the behaviour of M when recovering k-sparse signals exactly.

Theorem 3.1.8 gives a bound on the sparsity for which M satisfies RIP. Moreover, this theorem, together with Theorem 1.2.2, gives k-sparsities for which M recovers exactly. However, our experiments show that some of the matrices constructed from codes using Constructions 2 and 3 can recover signals of larger sparsity than that given by Theorem 3.1.8. They also suggest that some of these matrices do not behave better than what RIP guarantees.

3.2.1 Recovery with deterministic matrices

The matrices in our experiments were constructed using the algebraic software Macaulay2. For each matrix M of dimensions n × N, the experiments were carried out in the following way:

1. A sparsity size k is fixed.

2. 20 k-sparse random vectors of length N are generated.

3. For each one of the vectors x from item 2, Problem 1.1.3 is solved taking Φ = M and y = Φx, obtaining x′ as solution.

4. The cases in which x = x′ are counted.

To solve Problem 1.1.3 for matrices M of small size, the simplex method of Matlab was used. For bigger matrices we used the package cvx and the ℓ_1-minimization function of that package. The experiments were first made by sampling the nonzero entries of the random vectors from a distribution N(0,1). We also used the random value α = U · N(3,1), where U = (−1)^l and l chooses 0 or 1 uniformly. Both ways of constructing the k-sparse vectors gave similar results.

Table 3.1 shows experiments made with the constructions from [1], described in Section 1.4. This is, in fact, Construction 2 applied to the codes from Example 2.5.4: as was seen in that example, the set of polynomials of degree up to d used in this construction coincides with the global sections of O_{P^1}(d)|_{U_y}.

Each matrix is represented as a 4-tuple (p, r, k, k_δ), where p is the size of the field, r the maximum degree of the polynomials, k the bound on the sparsity such that for any l ≤ k RIP is satisfied with δ < 1, and k_δ the maximum k such that Theorem 1.2.2 ensures exact recovery of k-sparse vectors.

Recall that k < p/r + 1, δ = (k−1) r/p and, therefore, k_δ is the biggest k such that (k−1) r/p < √2 − 1.

The same experiment was made for the toric codes studied in Section 2.6. In this case, Construction 3 is used to obtain the matrices. Each matrix is described by the 4-tuple (p, q, k, k_δ), where p is the characteristic of the field, q is the size of the field, k is the bound on the k-sparsity given by Theorem 3.1.8 and k_δ is the maximum k that allows exact recovery according to Theorem 1.2.2.
