Equivalence Classes of the Derived Kernel

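Hence, for all $f, g \in \mathrm{Im}(v_{m+1})$ we have

$$
\begin{aligned}
K_{m+1}(f, g) &= \big\langle \widehat{N}_{m+1}(f),\, \widehat{N}_{m+1}(g) \big\rangle \\
&= \sum_{\tau \in \mathcal{T}_m} \widehat{N}_{m+1}(f)(\tau)\, \widehat{N}_{m+1}(g)(\tau) \\
&= \sum_{\tau \in \mathcal{T}_m} \widehat{N}_{m+1}(f \circ r_{|v_{m+1}|})(\tau \circ r_{|v_m|})\, \widehat{N}_{m+1}(g \circ r_{|v_{m+1}|})(\tau \circ r_{|v_m|}) \\
&= \big\langle \widehat{N}_{m+1}(f \circ r),\, \widehat{N}_{m+1}(g \circ r) \big\rangle \\
&= K_{m+1}(f \circ r, g \circ r),
\end{aligned}
$$

as desired.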

Having proven that $K_m$ is reversal symmetric for $1 \le m \le n$, we will now follow the general method of Case (1) to show that $K_n$ is not reversal invariant. Let $1 \le m \le n-1$ and assume that $K_m$ is not reversal invariant. Let $f \in \mathrm{Im}(v_m)$ be such that $K_m(f, f \circ r) < 1$. Then to prove that $K_{m+1}$ is not reversal invariant, it suffices to show that either $K_{m+1}(\tilde f, \tilde f \circ r) < 1$, or there exists $f' \in \mathrm{Im}(v_m)$ such that $\lambda(f') > \lambda(f)$ and $K_m(f', f' \circ r) < 1$.

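First suppose $K_m(f \circ r, \tilde f \circ h_i) < 1$ for all $2 \le i \le |v_{m+1}| - |v_m| + 1$. Then, noticing that $\tilde f \circ h_1 = f$ and $\tilde f \circ r_{|v_{m+1}|} \circ \chi(h_1) = \tilde f \circ h_1 \circ r_{|v_m|} = f \circ r_{|v_m|}$, by considering the template $\tau = \widehat{N}_m(f \circ r) \in \mathcal{T}_m$ we see that

$$
N_{m+1}(\tilde f)(\tau) = \max_{h \in H_m} K_m(\tilde f \circ h, f \circ r) < 1,
$$

while

$$
N_{m+1}(\tilde f \circ r)(\tau) = \max_{h \in H_m} K_m(\tilde f \circ r_{|v_{m+1}|} \circ h, f \circ r_{|v_m|}) = K_m(f \circ r, f \circ r) = 1.
$$

This shows that $N_{m+1}(\tilde f) \ne N_{m+1}(\tilde f \circ r)$, and hence $K_{m+1}(\tilde f, \tilde f \circ r) < 1$ by Proposition 5.4.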

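On the other hand, suppose $K_m(f \circ r, \tilde f \circ h_i) = 1$ for some $2 \le i \le |v_{m+1}| - |v_m| + 1$. Using the reversal symmetry of $K_m$, we get

$$
K_m(f, \tilde f \circ h_i \circ r) = K_m(f \circ r, \tilde f \circ h_i) = 1.
$$

Then notice that we must have

$$
K_m(\tilde f \circ h_i, \tilde f \circ h_i \circ r) < 1,
$$

for otherwise

$$
N_m(f) = N_m(\tilde f \circ h_i \circ r) = N_m(\tilde f \circ h_i) = N_m(f \circ r),
$$

contradicting the fact that $K_m(f, f \circ r) < 1$. Since $\tilde f$ is constructed by growing the tail of $f$, we have $\lambda(\tilde f \circ h_i) \ge \lambda(f) + 1 > \lambda(f)$. Thus we can take the new witness $f'$ to be $\tilde f \circ h_i \in \mathrm{Im}(v_m)$, and we are done.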

Proof. Without loss of generality we may assume $k_1 \le k_2$. Write $f = a_1 \dots a_p$. Since we already know $f$ is $k_1$-periodic, to prove $f$ is $k$-periodic it suffices to show that the substring $a_1 \dots a_{k_1}$ is $k$-periodic.

Fix $1 \le i \le k_1 - k$. Starting from $a_i$, we repeatedly use the periodicity of $f$ to move from one element of $f$ to the next by jumping either $k_1$ indices to the right or $k_2$ indices to the left. If we follow the rule of jumping to the left whenever possible, then we can reach $a_{i+k}$ while staying within the string $f$. More formally, consider the function $j \colon \mathbb{N}_0 \to \mathbb{Z}$ given by $j(0) = 0$, and for $d \in \mathbb{N}$,

$$
j(d) =
\begin{cases}
j(d-1) - k_2 & \text{if } j(d-1) - k_2 \ge -i + 1, \\
j(d-1) + k_1 & \text{otherwise.}
\end{cases}
$$

We claim that $-i + 1 \le j(d) \le k_1 + k_2 - i$ for all $d \in \mathbb{N}$, and that $j(d^*) = k$ for some $d^* \in \mathbb{N}$. The first claim tells us that $1 \le i + j(d) \le p$ for all $d \in \mathbb{N}$, and thus using the periodicity of $f$, we have $a_i = a_{i + j(d)}$ for all $d \in \mathbb{N}$. The second claim then gives us $a_i = a_{i + j(d^*)} = a_{i+k}$, as desired.

For the first claim, it is clear from the definition of $j$ that $j(d) \ge -i + 1$ for all $d \in \mathbb{N}$. Now suppose to the contrary that $j(d) > k_1 + k_2 - i$ for some $d \in \mathbb{N}$; let $d'$ be the smallest such $d$. We consider the move from $j(d'-1)$ to $j(d')$. On the one hand, if $j(d') = j(d'-1) - k_2$, then $j(d'-1) = j(d') + k_2 > k_1 + k_2 - i$, contradicting our choice of $d'$. On the other hand, if $j(d') = j(d'-1) + k_1$, then $j(d'-1) - k_2 = j(d') - k_1 - k_2 > -i$, which means we should have jumped to the left to go from $j(d'-1)$ to $j(d')$. This shows that $j(d) \le k_1 + k_2 - i$ for all $d \in \mathbb{N}$.

For the second claim, we first note that by the Euclidean algorithm we can find $x, y \in \mathbb{N}$ satisfying $k = x k_1 - y k_2$. Let $\tilde d_1$ be the first time we jump to the right $x$ times. That is, $\tilde d_1$ is the smallest $d \in \mathbb{N}$ such that from $j(0)$ to $j(d)$ we have jumped to the right $x$ times, not necessarily consecutively. Note that such a $\tilde d_1$ must exist, as we cannot jump to the left all the time. Similarly, let $\tilde d_2$ be the first time we jump to the left $y$ times. Observe that $\tilde d_1 \ne \tilde d_2$. We now consider two cases.

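1. If $\tilde d_1 < \tilde d_2$, then by the time we reach $j(\tilde d_1)$ we would have jumped to the right $x$ times and to the left $\tilde d_1 - x < y$ times. Thus

$$
j(\tilde d_1) = x k_1 - (\tilde d_1 - x) k_2 = k + y k_2 - (\tilde d_1 - x) k_2 = k + (x + y - \tilde d_1) k_2.
$$

Since $x + y - \tilde d_1 > 0$, we can now jump to the left $x + y - \tilde d_1$ times to obtain

$$
j(x + y) = j(\tilde d_1) - (x + y - \tilde d_1) k_2 = k,
$$

as desired.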

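2. If $\tilde d_1 > \tilde d_2$, then by the time we reach $j(\tilde d_2)$ we would have jumped to the left $y$ times and to the right $\tilde d_2 - y < x$ times. Thus

$$
j(\tilde d_2) = (\tilde d_2 - y) k_1 - y k_2 = (\tilde d_2 - y) k_1 + k - x k_1 = k + (\tilde d_2 - x - y) k_1.
$$

Note that since $i \le k_1 - k \le k_2 - k$, for any $0 \le d \le x + y - \tilde d_2 - 1$ we have

$$
j(\tilde d_2) + d k_1 - k_2 \le k + (\tilde d_2 - x - y) k_1 + (x + y - \tilde d_2 - 1) k_1 - k_2 = k - k_1 - k_2 \le -i.
$$

This means that from $j(\tilde d_2)$, the next $x + y - \tilde d_2$ moves have to be jumps to the right, and thus

$$
j(x + y) = j(\tilde d_2) + (x + y - \tilde d_2) k_1 = k,
$$

as desired.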
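The greedy jump rule used in the proof above can be simulated directly. The following Python sketch is only an illustration: the names `jump_sequence` and `claims_hold`, and the horizon of $k_1 + k_2$ steps, are choices made here rather than part of the original argument, and $k$ is taken to be $\gcd(k_1, k_2)$, as in the later applications of Lemma D.4. It checks both claims (the bounds on $j(d)$ and the existence of $d^*$ with $j(d^*) = k$) for small periods.

```python
from math import gcd


def jump_sequence(k1, k2, i, steps):
    """Return [j(0), ..., j(steps)] under the rule: jump left by k2 whenever
    the result stays >= -i + 1, otherwise jump right by k1."""
    js = [0]
    for _ in range(steps):
        prev = js[-1]
        if prev - k2 >= -i + 1:
            js.append(prev - k2)   # jump to the left
        else:
            js.append(prev + k1)   # jump to the right
    return js


def claims_hold(k1, k2):
    """Check both claims for every admissible start index 1 <= i <= k1 - k."""
    k = gcd(k1, k2)
    for i in range(1, k1 - k + 1):
        # The horizon k1 + k2 is an assumption of this sketch.
        js = jump_sequence(k1, k2, i, steps=k1 + k2)
        in_bounds = all(-i + 1 <= j <= k1 + k2 - i for j in js)
        reaches_k = k in js[1:]
        if not (in_bounds and reaches_k):
            return False
    return True


# Small case k1 = 4, k2 = 6, i = 1: two jumps right, then one jump left.
print(jump_sequence(4, 6, 1, steps=3))
# Check every pair k1 <= k2 <= 12.
print(all(claims_hold(k1, k2) for k2 in range(1, 13) for k1 in range(1, k2 + 1)))
```

In the small case shown, the walk lands on $k = 2$ after $x + y = 3$ moves, matching $k = 2 \cdot 4 - 1 \cdot 6$.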

Now we present the following result, which constitutes a substantial part of the proof of Theorem 5.13.

The following lemma says that when the jump in the patch sizes is larger than the jumps in the previous layer, we obtain a new equivalence class of the derived kernel.

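Lemma D.5. Consider an architecture with $n \ge 2$ layers. Let $q \in \mathbb{N}$ and $2 \le m \le n$, and suppose that:

(i) $|v_m| - |v_{m-1}| = q$,

(ii) $|v_m| \ge 4q + 2$, and

(iii) at layer $m-1$ we have

$$
K_{m-1}(f, g) = 1 \quad \text{implies} \quad f = g \ \text{ or } \ f \overset{(k)}{\sim} g \ \text{ for some } 2 \le k \le q + 1.
$$

Then at layer $m$ we have

$$
K_m(f, g) = 1 \quad \text{if and only if} \quad f = g \ \text{ or } \ f \overset{(k)}{\sim} g \ \text{ for some } 2 \le k \le q + 1.
$$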

Proof. We first prove the forward implication. Let $f = a_1 \dots a_{|v_m|}$ and $g = b_1 \dots b_{|v_m|}$ be two strings in $\mathrm{Im}(v_m)$ such that $K_m(f, g) = 1$. For $1 \le i \le q + 1$, let $f_i = a_i \dots a_{i + |v_{m-1}| - 1}$ and $g_i = b_i \dots b_{i + |v_{m-1}| - 1}$ be the $i$-th $|v_{m-1}|$-substrings of $f$ and $g$, respectively. We divide the proof into several steps.

Step 1. By Proposition 5.4, $K_m(f, g) = 1$ implies $N_m(f) = N_m(g)$, and so $N_m(f)(\tau) = N_m(g)(\tau)$ for all templates $\tau \in \mathcal{T}_{m-1}$. In particular, by taking $\tau = \widehat{N}_{m-1}(f_i)$, $1 \le i \le q + 1$, we see that $N_m(g)(\tau) = N_m(f)(\tau) = 1$. This means that for each $1 \le i \le q + 1$ there exists $1 \le j \le q + 1$ such that $K_{m-1}(f_i, g_j) = 1$. Similarly, for each $1 \le j \le q + 1$ there exists $1 \le i \le q + 1$ such that $K_{m-1}(f_i, g_j) = 1$.

Step 2. We will now show that

$$
f_1 = g_1 \quad \text{or} \quad f_1 \text{ is periodic with period} \le q + 1.
$$

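Let $j^*$ be the smallest index $1 \le j \le q + 1$ such that $K_{m-1}(f_1, g_j) = 1$. If $f_1 \overset{(k)}{\sim} g_{j^*}$ for some $2 \le k \le q + 1$, then $f_1$ is $k$-periodic and we are done. Now assume $f_1 = g_{j^*}$. If $j^* = 1$, then $f_1 = g_1$ and we are done.

Suppose now that $j^* > 1$. By Step 1 there exists $1 \le i^* \le q + 1$ such that $K_{m-1}(f_{i^*}, g_{j^*-1}) = 1$, and by the choice of $j^*$ we must have $i^* \ge 2$.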

We consider two cases.

1. Case 1: $f_{i^*} = g_{j^*-1}$, which means $b_d = a_{d + i^* - j^* + 1}$ for $j^* - 1 \le d \le j^* + |v_{m-1}| - 2$. Then for each $1 \le d \le |v_{m-1}| - i^*$, from $f_1 = g_{j^*}$ we have $a_d = b_{d + j^* - 1}$, and from $f_{i^*} = g_{j^*-1}$ we have $b_{d + j^* - 1} = a_{(d + j^* - 1) + i^* - j^* + 1} = a_{d + i^*}$. This shows that $f_1$ is $i^*$-periodic.

2. Case 2: $f_{i^*} \overset{(k)}{\sim} g_{j^*-1}$ for some $2 \le k \le q + 1$. Since $g_{j^*-1}$ is $k$-periodic, the substring $a_1 \dots a_{|v_{m-1}| - 1} = b_{j^*} \dots b_{j^* + |v_{m-1}| - 2}$ is also $k$-periodic. Moreover, since $f_{i^*}$ is $k$-periodic, we also have $a_{|v_{m-1}|} = a_{|v_{m-1}| - k}$. This shows that $f_1 = a_1 \dots a_{|v_{m-1}|}$ is $k$-periodic.

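Step 3. Similar to the previous step, we can show that the following conditions are true:

$$
\begin{gathered}
f_1 = g_1 \ \text{ or } \ g_1 \text{ is periodic with period} \le q + 1, \\
f_{q+1} = g_{q+1} \ \text{ or } \ f_{q+1} \text{ is periodic with period} \le q + 1, \text{ and} \\
f_{q+1} = g_{q+1} \ \text{ or } \ g_{q+1} \text{ is periodic with period} \le q + 1.
\end{gathered}
$$

Thus we can obtain the following conclusion:

$$
\begin{gathered}
f_1 = g_1 \ \text{ or } \ f_1 \text{ and } g_1 \text{ are periodic with period} \le q + 1, \text{ and} \\
f_{q+1} = g_{q+1} \ \text{ or } \ f_{q+1} \text{ and } g_{q+1} \text{ are periodic with period} \le q + 1.
\end{gathered}
$$

Note that in the statement above, when $f_1$ and $g_1$ are periodic their periods do not have to be the same, and similarly for $f_{q+1}$ and $g_{q+1}$.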

Step 4. Finally, we now show that either $f = g$ or $f \overset{(k)}{\sim} g$ for some $2 \le k \le q + 1$. Using the conclusion of the previous step, we consider four possibilities.

1. Suppose $f_1 = g_1$ and $f_{q+1} = g_{q+1}$. In this case we immediately obtain $f = g$.

2. Suppose $f_1 = g_1$, $f_{q+1}$ is $k_1$-periodic, and $g_{q+1}$ is $k_2$-periodic, where $k_1, k_2 \le q + 1$. Then the substring $a_{q+1} \dots a_{|v_{m-1}|} = b_{q+1} \dots b_{|v_{m-1}|}$, which is of length $|v_{m-1}| - q = |v_m| - 2q \ge 2q + 2$, is both $k_1$-periodic and $k_2$-periodic. By Lemma D.4, this implies that $a_{q+1} \dots a_{|v_{m-1}|} = b_{q+1} \dots b_{|v_{m-1}|}$ is periodic with period $k = \gcd(k_1, k_2)$. In particular, this means both $f_{q+1}$ and $g_{q+1}$ are also $k$-periodic. Now given $|v_{m-1}| + 1 \le d \le |v_m|$, choose $x \in \mathbb{N}$ such that $q + 1 \le d - xk \le |v_{m-1}|$, and observe that $a_d = a_{d - xk} = b_{d - xk} = b_d$. Together with $f_1 = g_1$, we conclude that $f = g$.

3. Suppose $f_1$ and $g_1$ are periodic with period at most $q + 1$, and $f_{q+1} = g_{q+1}$. By the same argument as in the previous case, we can show that $f = g$.

4. Now suppose that $f_1$, $f_{q+1}$, $g_1$, and $g_{q+1}$ are periodic with periods at most $q + 1$ (but the periods do not have to be the same). Applying Lemma D.4 to the substring $a_{q+1} \dots a_{|v_{m-1}|}$, we see that $f_1$ and $f_{q+1}$ are $k_1$-periodic for some $k_1 \le q + 1$, and hence $f$ is also $k_1$-periodic. Similarly, $g$ is $k_2$-periodic for some $k_2 \le q + 1$. Let $1 \le j \le q + 1$ be such that $K_{m-1}(f_1, g_j) = 1$. We have two possibilities to consider.

(a) If $f_1 = g_j$, then by Lemma D.4 we know that $f_1 = g_j$ is periodic with period $k' = \gcd(k_1, k_2) \le q + 1$, and thus $f$ and $g$ are also $k'$-periodic. Since $f_1 = g_j$, we conclude that $f \overset{(k')}{\sim} g$.

(b) On the other hand, suppose $f_1 \overset{(k)}{\sim} g_j$ for some $2 \le k \le q + 1$. Since $f_1$ is both $k$-periodic and $k_1$-periodic, Lemma D.4 tells us that $f_1$ is periodic with period $k_3 = \gcd(k, k_1)$, and hence $f$ is also $k_3$-periodic. Similarly, since $g_j$ is $k$-periodic and $k_2$-periodic, Lemma D.4 tells us that $g_j$ is periodic with period $k_4 = \gcd(k, k_2)$, and hence $g$ is $k_4$-periodic. In particular, this means both $f$ and $g$ are $k$-periodic. Then from $f_1 \overset{(k)}{\sim} g_j$, we conclude that $f \overset{(k)}{\sim} g$.

This completes the proof of the forward direction.
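Steps 2 to 4 lean repeatedly on Lemma D.4, in the form used in the text: a string that is both $k_1$-periodic and $k_2$-periodic, and long enough, is periodic with period $\gcd(k_1, k_2)$. The Python sketch below checks this by brute force on short strings. It rests on two assumptions that are not spelled out in this section: "$k$-periodic" is read as $a_i = a_{i+k}$ wherever both indices exist, and the length requirement is taken to be $p \ge k_1 + k_2$, as suggested by the bound $1 \le i + j(d) \le p$ in the proof of the lemma. The helper names are hypothetical.

```python
from itertools import product
from math import gcd


def is_periodic(s, k):
    """True if s[i] == s[i + k] for every index i where both positions exist."""
    return all(s[i] == s[i + k] for i in range(len(s) - k))


def common_period_holds(p, alphabet="ab"):
    """Brute-force check over all strings of length p and all k1 <= k2
    with k1 + k2 <= p (the length hypothesis assumed in this sketch)."""
    for k1 in range(1, p + 1):
        for k2 in range(k1, p - k1 + 1):
            k = gcd(k1, k2)
            for s in product(alphabet, repeat=p):
                if is_periodic(s, k1) and is_periodic(s, k2):
                    if not is_periodic(s, k):
                        return False
    return True


print(all(common_period_holds(p) for p in range(2, 9)))

# The length hypothesis matters: "aba" is 2-periodic and (vacuously)
# 3-periodic, but it is not 1-periodic.
print(is_periodic("aba", 2), is_periodic("aba", 3), is_periodic("aba", 1))
```

This is why condition (ii), $|v_m| \ge 4q + 2$, is needed in Step 4: it guarantees that the overlap $a_{q+1} \dots a_{|v_{m-1}|}$ has length at least $2q + 2 \ge k_1 + k_2$.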

Now we show the converse direction. Clearly if $f = g$ then we have $K_m(f, g) = 1$. Now suppose $f \overset{(k)}{\sim} g$ for some $2 \le k \le q + 1$. This implies that the collection of $|v_{m-1}|$-substrings of $f$ contains the same unique substrings as the collection of $|v_{m-1}|$-substrings of $g$. That is, as sets, $\{f_i \mid 1 \le i \le q + 1\} = \{g_i \mid 1 \le i \le q + 1\}$. Then for all $\tau \in \mathcal{T}_{m-1}$,

$$
N_m(f)(\tau) = \max_{1 \le i \le q+1} \big\langle \widehat{N}_{m-1}(f_i), \tau \big\rangle = \max_{1 \le i \le q+1} \big\langle \widehat{N}_{m-1}(g_i), \tau \big\rangle = N_m(g)(\tau).
$$

Therefore, $N_m(f) = N_m(g)$, and we conclude that $K_m(f, g) = 1$, as desired.
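The set equality used in the converse direction can be seen on a small example. The sketch below assumes a particular reading of $f \overset{(k)}{\sim} g$ that is not defined in this section: both strings are $k$-periodic and $g$ is a shifted copy of the same repeating pattern. Under that assumption, and with hypothetical helper names, it builds two distinct strings of length $|v_m| = 4q + 2$ and compares their sets of $|v_{m-1}|$-substrings.

```python
def periodic_string(pattern, length, shift=0):
    """String of the given length that repeats `pattern`, started `shift` symbols in."""
    k = len(pattern)
    return "".join(pattern[(i + shift) % k] for i in range(length))


def substrings(s, width):
    """Set of all contiguous substrings of `s` with the given width."""
    return {s[i:i + width] for i in range(len(s) - width + 1)}


q = 2                      # jump in patch size, |v_m| - |v_{m-1}|
size_m = 4 * q + 2         # smallest |v_m| allowed by condition (ii)
size_prev = size_m - q     # |v_{m-1}|

f = periodic_string("abc", size_m)            # 3-periodic, and 3 <= q + 1
g = periodic_string("abc", size_m, shift=1)   # shifted copy (assumed meaning of the relation)

print(f, g)
print(substrings(f, size_prev) == substrings(g, size_prev))
```

Because the period is at most $q + 1$, the $q + 1$ windows of each string already run through every phase of the pattern, so the two collections coincide as sets even though $f \ne g$.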

The following lemma, which is very similar to Lemma D.5, states that if at some layer we see a jump in the patch sizes that we have encountered before, then we do not obtain new equivalence classes of the derived kernel.

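Lemma D.6. Consider an architecture with $n \ge 2$ layers. Let $q \in \mathbb{N}$ and $2 \le m \le n$, and suppose that:

(i) $|v_m| - |v_{m-1}| \le q$,

(ii) $|v_m| \ge 4q + 2$, and

(iii) at layer $m-1$ we have

$$
K_{m-1}(f, g) = 1 \quad \text{if and only if} \quad f = g \ \text{ or } \ f \overset{(k)}{\sim} g \ \text{ for some } 2 \le k \le q + 1.
$$

Then at layer $m$ we have

$$
K_m(f, g) = 1 \quad \text{if and only if} \quad f = g \ \text{ or } \ f \overset{(k)}{\sim} g \ \text{ for some } 2 \le k \le q + 1.
$$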

Proof. The proof of the forward direction is identical to the proof of Lemma D.5. For the converse direction, clearly $f = g$ implies $K_m(f, g) = 1$. Now suppose $f \overset{(k)}{\sim} g$ for some $2 \le k \le q + 1$. Let $d = |v_m| - |v_{m-1}| \le q$. Then for each $1 \le i \le d + 1$ we have $f_i \overset{(k)}{\sim} g_i$, where $f_i$ and $g_i$ are the $i$-th $|v_{m-1}|$-substrings of $f$ and $g$, respectively. By our assumption, this means $K_{m-1}(f_i, g_i) = 1$, and thus $\widehat{N}_{m-1}(f_i) = \widehat{N}_{m-1}(g_i)$. Then for all $\tau \in \mathcal{T}_{m-1}$,

$$
N_m(f)(\tau) = \max_{1 \le i \le d+1} \big\langle \widehat{N}_{m-1}(f_i), \tau \big\rangle = \max_{1 \le i \le d+1} \big\langle \widehat{N}_{m-1}(g_i), \tau \big\rangle = N_m(g)(\tau).
$$

Hence $N_m(f) = N_m(g)$, and we conclude that $K_m(f, g) = 1$, as desired.

Given the preliminary results above, we can now prove Theorem 5.13 easily.

Proof of Theorem 5.13. Let $\ell_1 = 0$, and for $2 \le m \le n$ let $\ell_m$ denote the maximum jump between consecutive patch sizes up to layer $m$,

$$
\ell_m = \max_{2 \le m' \le m} \big( |v_{m'}| - |v_{m'-1}| \big).
$$

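Note that $\ell_m \le \ell_{m+1}$ and $\ell_n = \ell$. We will show that for each $1 \le m \le n$,

$$
K_m(f, g) = 1 \quad \text{if and only if} \quad f = g \ \text{ or } \ f \overset{(k)}{\sim} g \ \text{ for some } 2 \le k \le \ell_m + 1.
$$

The statement of the theorem will then follow from the claim above by taking $m = n$.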

We proceed by induction. The claim above is true for $m = 1$ since we assume $K_1$ is fully discriminative.

Now assume the claim is true at layer $m - 1$. Note that at layer $m$ we have

$$
|v_m| \ge |v_1| + \ell_m \ge 3\ell + 2 + \ell_m \ge 4\ell_m + 2.
$$

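If $\ell_m > \ell_{m-1}$, then $|v_m| - |v_{m-1}| = \ell_m$. By the induction hypothesis, at layer $m - 1$ we have

$$
K_{m-1}(f, g) = 1 \quad \text{implies} \quad f = g \ \text{ or } \ f \overset{(k)}{\sim} g \ \text{ for some } 2 \le k \le \ell_m + 1,
$$

and so by Lemma D.5 we conclude that the claim holds at layer $m$. On the other hand, if $\ell_m = \ell_{m-1}$, then by the induction hypothesis at layer $m - 1$ we have

$$
K_{m-1}(f, g) = 1 \quad \text{if and only if} \quad f = g \ \text{ or } \ f \overset{(k)}{\sim} g \ \text{ for some } 2 \le k \le \ell_m + 1,
$$

and so by Lemma D.6 we conclude that the claim holds at layer $m$.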
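The bookkeeping in this induction can be traced on a concrete architecture. The Python sketch below is an illustration with hypothetical patch sizes and helper names: it computes $\ell_m$ for each layer, checks the inequality $|v_m| \ge 4\ell_m + 2$, and reports which lemma the induction step invokes. The requirement $|v_1| \ge 3\ell + 2$ is read off from the displayed chain of inequalities and is treated here as an assumption about the hypothesis of Theorem 5.13.

```python
def max_jumps(patch_sizes):
    """Return [l_1, ..., l_n] with l_1 = 0 and l_m the largest jump
    |v_m'| - |v_{m'-1}| over 2 <= m' <= m."""
    ells = [0]
    for prev, cur in zip(patch_sizes, patch_sizes[1:]):
        ells.append(max(ells[-1], cur - prev))
    return ells


def induction_trace(patch_sizes):
    """For each layer m >= 2, return (m, l_m, lemma used in the induction step)."""
    ells = max_jumps(patch_sizes)
    ell = ells[-1]
    # Hypothesis inferred from the displayed inequality (assumption of this sketch).
    assert patch_sizes[0] >= 3 * ell + 2, "expected |v_1| >= 3*l + 2"
    trace = []
    for idx in range(1, len(patch_sizes)):     # idx is the 0-based position of layer idx + 1
        assert patch_sizes[idx] >= 4 * ells[idx] + 2
        lemma = "D.5" if ells[idx] > ells[idx - 1] else "D.6"
        trace.append((idx + 1, ells[idx], lemma))
    return trace


# Hypothetical patch sizes |v_1|, ..., |v_5|; the jumps are 1, 1, 2, 1.
print(induction_trace([8, 9, 10, 12, 13]))
```

In this example Lemma D.5 is used at layers 2 and 4, where a jump larger than all earlier ones appears, and Lemma D.6 at layers 3 and 5, where the jump has been seen before.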
