Reversal Invariance of the Derived Kernel

D.2.1 Reversal Invariance from the Initial Kernel

We note that the result of Theorem 5.6 actually follows immediately from Proposition 3.1 and Corol- lary 4.1 in [56]. However, for the sake of completeness we reconstruct the proof here because part of the argument will be needed to proveTheorem 5.7andTheorem 5.9.

Proof of Theorem 5.6. It suffices to show that if K_m is reversal invariant then K_m+1 is reversal invariant, for 1≤m≤n−1. First, from the assumption that K_m is reversal invariant we have

Nbm(g) =Nbm(g◦r) for allg∈Im(vm) .

Next, sinceH_mis closed under taking counterparts and χis an involution, χ:H_m→H_mis a bijection.

Furthermore, givenh=h_i∈H_m, we note that for 1≤j≤ |v_m|we have r_|v_m+1_|◦hi

(j) =r_|v_m+1_|(i+j−1)

=|vm+1|+ 1−(i+j−1)

=|v_m+1|+ 2−i−j , and similarly,

χ(hi)◦r|vm|

(j) = h|vm+1|−|vm|+2−i◦r|vm| (j)

=h|vm+1|−|vm|+2−i(|vm|+ 1−j)

=|v_m+1| − |v_m|+ 2−i+ (|v_m|+ 1−j)−1

=|vm+1|+ 2−i−j . This shows that

r_|v_m+1_|◦h=χ(h)◦r_|v_m_| for allh∈H_m . Therefore, for eachf ∈Im(vm+1) andτ∈ Tm we have

N_m+1(f ◦r_|v_m+1_|)(τ) = max

h∈Hm

bN_m(f◦r_|v_m+1_|◦h), τ

= max

h∈Hm

bNm(f◦χ(h)◦r_|v_m_|), τ

= max

h∈Hm

bN_m(f◦χ(h)), τ

=N_m+1(f)(τ) .

This gives usNm+1(f◦r) =Nm+1(f), and thusKm+1(f, f◦r) = 1. Since this holds for allf ∈Im(vm+1), we conclude thatKm+1 is reversal invariant.

Remark D.1. By inspecting the proof above, we see thatTheorem 5.6also holds when we use a general pooling function Ψ that is invariant under permutations of its inputs. That is, we can replace the max pooling function inTheorem 5.6with a pooling function Ψ satisfying the property that for eachk∈N,

Ψ(α) = Ψ(π(α)) for allα∈R^k andπ∈Sk ,

whereSk is the symmetric group of degreek, and given α= (α1, . . . , αk)∈R^k andπ∈Sk, π(α) = (α_π(1), . . . , α_π(k))∈R^k .

For example, the average pooling function is invariant under permutations of its inputs.

D.2.2 Reversal Invariance from the Transformations

Proof of Theorem 5.7. Recall from the proof ofTheorem 5.6that for eachh∈L⊆L_n−1 we have r_|v_n_|◦h=χ(h)◦r_|v_n−1_| .

Then, givenf ∈Im(vn) we note that f ◦r_|v_n_|◦h|h∈H_n−1 =

f◦r_|v_n_|◦h|h∈L ∪

f◦r_|v_n_|◦χ(h)◦r_|v_n−1_||h∈L

f◦χ(h)◦r_|v_n−1_||h∈L ∪

f ◦r_|v_n_|◦r_|v_n_|◦h|h∈L

f◦χ(h)◦r_|v_n−1_||h∈L ∪

f ◦h|h∈L

f◦h|h∈Hn−1 . Therefore, for eachτ∈ Tn−1 we have

N_n(f ◦r)(τ) = max

h∈Hn−1

bN_n−1(f ◦r◦h), τ

= max

h∈Hn−1

bN_n−1(f◦h), τ

=N_n(f)(τ) .

This gives us Nn(f◦r) = Nn(f), and thus Kn(f, f ◦r) = 1. Since this holds for all f ∈ Im(vn), we conclude thatK_n is reversal invariant.

Remark D.2. As noted in Remark D.1, by inspecting the proof above we see that Theorem 5.7 also holds when we use a general pooling function Ψ that is invariant under permutations of its inputs.

D.2.3 Reversal Invariance from the Templates

Proof of Theorem 5.9. It suffices to show that ifK_m is reversal symmetric then K_m+1 is reversal invariant, for 1 ≤ m ≤ n−1. The conclusion will then follow since we assume K₁ to be reversal symmetric, and as we noted above, reversal invariance implies reversal symmetry. First recall from the proof ofTheorem 5.6thatχ: H_m→H_mis a bijection and

r_|v_m+1_|◦h=χ(h)◦r_|v_m_| for allh∈H_m . Letf ∈Im(vm+1). Given a templateτ∈ Tm, write

τ=X^d

i=1

c_iNb_m(t_i) for somed∈N, c_i∈R, andt_i∈Im(v_m) . Then, using the reversal symmetry ofK_m, we can write

Nm+1(f◦r_|v_m+1_|)(τ) = max

h∈Hm

bNm(f ◦r_|v_m+1_|◦h), τ

= max

h∈Hm

bN_m(f ◦χ(h)◦r_|v_m_|), τ

= max

h∈Hm

Xd i=1

ciKm f◦χ(h)◦r|vm|, ti!

= max

h∈Hm

Xd i=1

c_iK_m f◦χ(h), t_i◦r_|v_m_|!

= max

h∈Hm

bNm(f ◦χ(h)), τ◦r_|v_m_|

=N_m+1(f)(τ◦r_|v_m_|) . Since we assumeτ is symmetric, we then have

Nm+1(f◦r)(τ) =Nm+1(f)(τ) for allτ∈ Tm .

This gives usNm+1(f) =Nm+1(f◦r), and thusKm+1(f, f◦r) = 1. Since this holds for allf ∈Im(vm+1), we conclude thatKm+1 is reversal symmetric, as desired.

Remark D.3. As noted inRemarks D.1andD.2, by inspecting the proof above we see thatTheorem 5.9 also holds when we use a general pooling function Ψ that is invariant under permutations of its inputs.

D.2.4 Impossibility of Learning Reversal Invariance with Exhaustive Templates

Proof of Theorem 5.11. In this proof we use of the following terminologies. Given a k-string f =a1. . . ak, thetailoff is the longest posterior substringaiai+1. . . ak with the property thataj=ak

for i ≤ j ≤ k. Let λ(f) denote the length of the tail of f. Furthermore, given 1 ≤ m ≤ n−1 and f ∈Im(vm), letef ∈Im(vm+1) denote theextensionoff obtained by extending the tail off, that is,

e_f(i) =

(f(i) if 1≤i≤ |v_m|, and f(|vm|) if|vm|+ 1≤i≤ |vm+1| .

For example, supposeS={a, b, c, d}, |v1|= 5, and |v2|= 8. Givenf =abcdd∈Im(v1), the tail of f is ddandλ(f) = 2, and the extension off isef =abcddddd∈Im(v2).

We now consider two cases, corresponding to the two conditions in the statement of the theorem.

Case (1). Suppose |vm+1| − |vm| = 1 for 1 ≤m ≤n−1. We will show that if Km is not reversal invariant then K_m+1 is not reversal invariant, for 1 ≤ m ≤ n−1. This will imply that if K₁ is not reversal invariant thenK_n is not reversal invariant, as desired.

Let 1≤m≤n−1, and assumeK_mis not reversal invariant. Choose awitnessfor the failure ofK_m to be reversal invariant, that is, a stringf ∈Im(v_m) such thatK_m(f, f◦r)<1. We will usef to exhibit a witness for the failure ofK_m+1 to be reversal invariant. To this end, we claim that either:

(a) K_m+1(e_f, e_f◦r)<1, or

(b) we can find another stringf⁰∈Im(v_m) with the property thatλ(f⁰)> λ(f) andK_m(f⁰, f⁰◦r)<1.

If we fall on Case (b), then we can repeat the procedure above withf⁰ in place off. However, observe that sinceλ(f)≤ |vm|and every iteration of this procedure through Case (b) increases λ(f), we must eventually land in Case (a), in which case we are done sinceef is a witness for the failure ofKm+1 to be reversal invariant. Thus it suffices to prove the claim above.

Since|v_m+1| − |v_m|= 1, the setH_m only consists of two translationsh₁ andh₂. Note that {e_f◦h|h∈H_m}={f, e_f◦h₂}

and {e_f◦r_|v_m+1_|◦h|h∈H_m}={e_f◦h₂◦r_|v_m_|, f◦r_|v_m_|} . IfKm(ef◦h2◦r, f)<1, then withτ =Nbm(f)∈ Tmwe have

N_m+1(e_f◦r_|v_m+1_|)(τ) = max

K_m(e_f ◦h₂◦r_|v_m_|, f), K_m(f◦r, f) <1 , while

N_m+1(e_f)(τ) = max

K_m(f, f), K_m(e_f ◦h₂, f) =K_m(f, f) = 1 .

This impliesN_m+1(e_f)6=N_m+1(e_f◦r), and hence K_m+1(e_f, e_f◦r)<1 byProposition 5.4. Similarly, ifKm(ef◦h2, f◦r)<1, then by considering τ⁰ =Nbm(f◦r) we can show that Nm+1(ef)(τ⁰)<1 and N_m+1(e_f◦r)(τ⁰) = 1, which impliesK_m+1(e_f, e_f◦r)<1.

Now suppose thatK_m(e_f◦h₂◦r, f) = 1 andK_m(e_f◦h₂, f◦r) = 1. Then notice that we must have K_m(e_f◦h₂, e_f◦h₂◦r)<1 ,

for otherwise

N_m(f) =N_m e_f◦h₂◦r

=N_m e_f◦h₂

=N_m(f ◦r) ,

contradicting the fact thatK_m(f, f◦r)<1. Sinceλ(e_f◦h₂) =λ(f) + 1 > λ(f), we can take the new witnessf⁰ to beef◦h2∈Im(vm), and we are done.

Case (2). SupposeK1 is reversal symmetric. We will first show that ifKmis reversal symmetric then Km+1is reversal symmetric, for 1≤m≤n−1. The assumption thatK1 is reversal symmetric will then imply thatKmis reversal symmetric for all 1≤m≤n.

Let 1 ≤m≤ n−1, and supposeKm is reversal symmetric. Recall from the proof of Theorem 5.6 thatχ:Hm→Hmis a bijection and

r_|v_m+1_|◦h=χ(h)◦r_|v_m_| for allh∈H_m .

Then, given a stringf ∈Im(vm+1) and a templateτ=Nbm(g⁰)∈ Tm, whereg⁰∈Im(vm), we can write Nm+1(f)(τ) = max

h∈Hm Km(f◦h, g⁰)

= max

h∈Hm K_m(f◦h◦r_|v_m_|, g⁰◦r_|v_m_|)

= max

h∈Hm K_m(f◦r_|v_m+1_|◦χ(h), g⁰◦r_|v_m_|)

=Nm+1(f◦r_|v_m+1_|)(τ◦r_|v_m_|) . Note that the mapping

Tm3τ =Nbm(g⁰)7−→^r Nbm(g⁰◦r) =τ◦r∈ Tm

is a bijection. This implies

kN_m+1(f)k=kN_m+1(f ◦r)k , and therefore,

Nb_m+1(f)(τ) =N_m+1(f)(τ)

kN_m+1(f)k = Nm+1(f ◦r_|v_m+1_|)(τ◦r_|v_m_|)

kN_m+1(f ◦r_|v_m+1_|)k =Nb_m+1(f◦r_|v_m+1_|)(τ◦r_|v_m_|) .

Hence, for allf, g∈Im(v_m+1) we have

K_m+1(f, g) = bN_m+1(f), Nb_m+1(g)

= X

τ∈Tm

Nb_m+1(f)(τ)Nb_m+1(g)(τ)

= X

τ∈Tm

Nbm+1(f◦r_|v_m+1_|)(τ◦r_|v_m_|)Nbm+1(g◦r_|v_m+1_|)(τ◦r_|v_m_|)

= bN_m+1(f◦r), Nb_m+1(g◦r)

=Km+1(f◦r, g◦r) , as desired.

Having proven thatK_mis reversal symmetric for 1≤m≤n, we will now follow the general method of Case (1) to show thatK_n is not reversal invariant. Let 1≤m ≤n−1 and assume thatK_m is not reversal invariant. Let f ∈ Im(v_m) be such that K_m(f, f ◦r) <1. Then to prove that K_m+1 is not reversal invariant, it suffices to show that eitherK_m+1(e_f, e_f◦r)<1, or there existsf⁰ ∈Im(v_m) such thatλ(f⁰)> λ(f) andK_m(f⁰, f⁰◦r)<1.

First supposeK_m(f◦r, e_f◦h_i)<1 for all 2≤i≤ |v_m+1| − |v_m|+ 1. Then, noticing that ef◦h1=f and ef◦r_|v_m+1_|◦χ(h1) =ef◦h1◦r_|v_m_|=f ◦r_|v_m_| , by considering the templateτ=Nb_m(f◦r)∈ T_m we see that

N_m+1(e_f)(τ) = max

h∈Hm K_m(e_f◦h, f◦r)<1 , while

Nm+1(ef◦r)(τ) = max

h∈Hm Km(ef ◦r_|v_m+1_|◦hi, f ◦r_|v_m_|) =Km(f ◦r, f◦r) = 1 . This shows thatN_m+1(e_f)6=N_m+1(e_f◦r), and henceK_m+1(e_f, e_f◦r)<1 byProposition 5.4.

On the other hand, suppose K_m(f ◦r, e_f ◦h_i) = 1 for some 2≤i ≤ |v_m+1| − |v_m|+ 1. Using the reversal symmetry ofK_m, we get

K_m(f, e_f◦h_i◦r) =K_m(f ◦r, e_f◦h_i) = 1 . Then notice that we must have

Km(ef◦hi, ef◦hi◦r)<1 , for otherwise

Nm(f) =Nm(ef◦hi◦r) =Nm(ef◦hi) =Nm(f◦r) ,

contradicting the fact that Km(f, f ◦r)<1. Sinceef is constructed by growing the tail of f, we have λ(ef◦hi)≥λ(f) + 1> λ(f). Thus we can take the new witness f⁰ to beef◦hi ∈Im(vm), and we are done.

In document Generalization and Properties of the Neural Response (página 47-51)