
^.7.02 .04 .04^{.05 .05} .08 .01 .01



.

Recalling the notation of this section, compute $\bar\ell^{(0)}$, $\bar\ell^{(1)}$, and $\bar\ell(S^2)$ for this source alphabet. Observe that $\bar\ell^{(1)} > \bar\ell(S^2)/2$.

7.2 The Shannon bound for higher-order encoding

Again, suppose that $k \ge 1$ and the “$(k+1)$-gram” frequencies $f(i_1,\dots,i_{k+1})$ are given. Recall that the $k$-gram frequencies are then known:

$$f(i_1,\dots,i_k) = \sum_{j=1}^{m} f(i_1,\dots,i_k,j) = \sum_{j=1}^{m} f(j,i_1,\dots,i_k).$$
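Both marginal sums can be checked numerically. The following is a minimal Python sketch, assuming (our convention, not the text's) that $n$-gram frequencies are counted cyclically, wrapping from the end of the text back to its start; under that convention the two sums agree exactly. The helper names `gram_freqs`, `drop_last`, and `drop_first` are hypothetical.

```python
from collections import Counter


def gram_freqs(text, n):
    """Relative n-gram frequencies of `text`, counted cyclically (wrap-around assumed)."""
    size = len(text)
    counts = Counter(tuple(text[(i + t) % size] for t in range(n)) for i in range(size))
    return {gram: c / size for gram, c in counts.items()}


def drop_last(f_next):
    """f(i1..ik) = sum over j of f(i1..ik, j)."""
    out = {}
    for gram, p in f_next.items():
        out[gram[:-1]] = out.get(gram[:-1], 0.0) + p
    return out


def drop_first(f_next):
    """f(i1..ik) = sum over j of f(j, i1..ik)."""
    out = {}
    for gram, p in f_next.items():
        out[gram[1:]] = out.get(gram[1:], 0.0) + p
    return out


text = "abracadabra"
f3 = gram_freqs(text, 3)  # (k+1)-gram frequencies with k = 2
f2 = gram_freqs(text, 2)  # k-gram frequencies counted directly
lhs_last, lhs_first = drop_last(f3), drop_first(f3)
for gram in sorted(f2):
    print(gram, f2[gram], lhs_last[gram], lhs_first[gram])
```

Without the wrap-around, the two sums can differ slightly for the $k$-grams at the very beginning and end of the text.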

In the preceding section we looked at $k$th-order Huffman encoding. Clearly other $k$th-order replacement strategies are possible; you need only supply a prefix-condition scheme for encoding $s_1,\dots,s_m$ for each context $s_{i_1}\cdots s_{i_k}$. For some such association of schemes to contexts, let $\ell(i_1,\dots,i_k,j)$ be the length of the code word for $s_j$ in the scheme corresponding to context $s_{i_1}\cdots s_{i_k}$, and

$$\bar\ell(i_1,\dots,i_k) = \sum_{j=1}^{m} P(s_j \mid s_{i_1}\cdots s_{i_k})\,\ell(i_1,\dots,i_k,j) = \sum_{j=1}^{m} \frac{f(i_1,\dots,i_k,j)}{f(i_1,\dots,i_k)}\,\ell(i_1,\dots,i_k,j).$$
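As a minimal sketch of the second form of $\bar\ell(i_1,\dots,i_k)$: only the $(k+1)$-gram frequencies are needed, since $f(i_1,\dots,i_k)$ is their sum over $j$. The function name `context_avg_length` and the frequencies and code-word lengths below are hypothetical, chosen only for illustration ($k = 1$, letters `a`, `b`, `c`).

```python
def context_avg_length(f_next, lengths, context):
    """Average code-word length for the letters that follow `context`.

    f_next:  {(i1, ..., ik, j): frequency f(i1, ..., ik, j)}
    lengths: {(i1, ..., ik, j): code-word length l(i1, ..., ik, j)}
    """
    # f(i1..ik) is the sum of f(i1..ik, j) over the successors j
    f_ctx = sum(p for g, p in f_next.items() if g[:-1] == context)
    # weight each length by P(s_j | context) = f(i1..ik, j) / f(i1..ik)
    return sum(p / f_ctx * lengths[g] for g, p in f_next.items() if g[:-1] == context)


# Hypothetical digram frequencies and code-word lengths.
f2 = {("a", "a"): 0.2, ("a", "b"): 0.1, ("a", "c"): 0.1, ("b", "a"): 0.3, ("b", "b"): 0.3}
ell = {("a", "a"): 1, ("a", "b"): 2, ("a", "c"): 2, ("b", "a"): 1, ("b", "b"): 1}
print(context_avg_length(f2, ell, ("a",)), context_avg_length(f2, ell, ("b",)))
```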


Then the average length of a code word replacing a source letter, using whatever our context schemes are, is

$$\bar\ell = \sum_{1\le i_1,\dots,i_k\le m} f(i_1,\dots,i_k)\,\bar\ell(i_1,\dots,i_k) = \sum_{1\le i_1,\dots,i_{k+1}\le m} f(i_1,\dots,i_{k+1})\,\ell(i_1,\dots,i_{k+1}),$$

as in the preceding section, where the method was kth-order Huffman encoding.
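The two expressions for $\bar\ell$ can be compared directly. This minimal sketch (hypothetical names `avg_length_by_context` and `avg_length_direct`, hypothetical data) computes the context-weighted sum and the plain sum over $(k+1)$-grams; the factors $f(i_1,\dots,i_k)$ cancel, so both give the same number.

```python
def avg_length_by_context(f_next, lengths):
    """Sum over contexts of f(context) times the conditional average length."""
    f_ctx = {}
    for g, p in f_next.items():
        f_ctx[g[:-1]] = f_ctx.get(g[:-1], 0.0) + p
    total = 0.0
    for ctx, fc in f_ctx.items():
        cond = sum(p / fc * lengths[g] for g, p in f_next.items() if g[:-1] == ctx)
        total += fc * cond
    return total


def avg_length_direct(f_next, lengths):
    """Sum over (k+1)-grams of f(i1..ik+1) * l(i1..ik+1)."""
    return sum(p * lengths[g] for g, p in f_next.items())


# Hypothetical digram frequencies and code-word lengths (k = 1).
f2 = {("a", "a"): 0.2, ("a", "b"): 0.1, ("a", "c"): 0.1, ("b", "a"): 0.3, ("b", "b"): 0.3}
ell = {("a", "a"): 1, ("a", "b"): 2, ("a", "c"): 2, ("b", "a"): 1, ("b", "b"): 1}
print(avg_length_by_context(f2, ell), avg_length_direct(f2, ell))
```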

Since Huffman’s algorithm gives the minimal $\bar\ell(i_1,\dots,i_k)$ for each context $s_{i_1}\cdots s_{i_k}$ among prefix-condition schemes associable to that context, it follows from the preceding that $\bar\ell^{(k)}$, the value of $\bar\ell$ for $k$th-order Huffman encoding, is the smallest $k$th-order $\bar\ell$ achievable. Therefore, in thinking about bounds on the compression achievable with $k$th-order replacement schemes, we may as well stick with $k$th-order Huffman encoding. Henceforward, $\ell(i_1,\dots,i_k,j)$ and $\bar\ell^{(k)}$ will be as in Section 7.1.
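$k$th-order Huffman encoding builds one Huffman code per context from the conditional frequencies. The sketch below is one way to do this in Python with `heapq`; the names `huffman_lengths` and `kth_order_huffman` are hypothetical. Two simplifying assumptions are ours: only successors that actually occur in a context receive code words, and a context with a single possible successor gets the empty code word (length 0).

```python
import heapq
from itertools import count


def huffman_lengths(probs):
    """Code-word lengths of a binary Huffman code for {symbol: probability}."""
    if len(probs) == 1:
        # Assumption: a lone successor needs no bits (empty code word).
        return {next(iter(probs)): 0}
    tiebreak = count()  # keeps heap entries comparable without comparing dicts
    heap = [(p, next(tiebreak), {s: 0}) for s, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, d1 = heapq.heappop(heap)
        p2, _, d2 = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        merged = {s: depth + 1 for s, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (p1 + p2, next(tiebreak), merged))
    return heap[0][2]


def kth_order_huffman(f_next):
    """Lengths l(i1..ik, j) from one Huffman code per context i1..ik."""
    by_context = {}
    for gram, p in f_next.items():
        by_context.setdefault(gram[:-1], {})[gram[-1]] = p
    lengths = {}
    for ctx, succ in by_context.items():
        total = sum(succ.values())  # f(i1..ik)
        conditional = {j: p / total for j, p in succ.items()}
        for j, length in huffman_lengths(conditional).items():
            lengths[ctx + (j,)] = length
    return lengths


# Hypothetical digram frequencies: after "a" the successors have
# probabilities 1/2, 1/4, 1/4; after "b" only "a" can follow.
f2 = {("a", "a"): 0.25, ("a", "b"): 0.125, ("a", "c"): 0.125, ("b", "a"): 0.5}
print(kth_order_huffman(f2))
```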

Let

$$H(i_1,\dots,i_k) = -\sum_{j=1}^{m} P(s_j \mid s_{i_1}\cdots s_{i_k}) \log_2 P(s_j \mid s_{i_1}\cdots s_{i_k}) = -\sum_{j=1}^{m} \frac{f(i_1,\dots,i_k,j)}{f(i_1,\dots,i_k)} \log_2 \frac{f(i_1,\dots,i_k,j)}{f(i_1,\dots,i_k)},$$

the “entropy of the source in context $s_{i_1}\cdots s_{i_k}$.” We define the $k$th-order entropy of $S = \{s_1,\dots,s_m\}$ to be

$$H^{(k)}(S) = H^{(k)} = \sum_{1\le i_1,\dots,i_k\le m} f(i_1,\dots,i_k)\,H(i_1,\dots,i_k).$$
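The definition translates almost line for line into code. In this minimal sketch (the function name `kth_order_entropy` is hypothetical) the $(k+1)$-gram frequencies are grouped by context, each group is normalized to a conditional distribution, and its entropy is weighted by $f(i_1,\dots,i_k)$. As a sanity check we use a hypothetical source whose letters occur independently with frequencies $1/2, 1/4, 1/4$: every context then looks the same, so $H^{(1)}$ should equal the zeroth-order entropy, $1.5$ bits.

```python
from math import log2


def kth_order_entropy(f_next):
    """H^(k) = sum over contexts of f(context) * H(context)."""
    by_context = {}
    for gram, p in f_next.items():
        by_context.setdefault(gram[:-1], []).append(p)
    h_k = 0.0
    for succ in by_context.values():
        f_ctx = sum(succ)
        # H(i1..ik): entropy of the conditional distribution f(i1..ik, j) / f(i1..ik)
        h_ctx = -sum(p / f_ctx * log2(p / f_ctx) for p in succ if p > 0)
        h_k += f_ctx * h_ctx
    return h_k


# Independent letters: f(x, y) = f(x) * f(y).
f1 = {"a": 0.5, "b": 0.25, "c": 0.25}
f2 = {(x, y): f1[x] * f1[y] for x in f1 for y in f1}
print(kth_order_entropy(f2))  # 1.5
```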

Plugging the full gory expression for $H(i_1,\dots,i_k)$ into the expression for $H^{(k)}$, thrashing about and doing what comes naturally with logarithms, one finds that

$$H^{(k)}(S) = H(S^{k+1}) - H(S^k),$$

where

$$H(S^{k+1}) = -\sum_{1\le i_1,\dots,i_{k+1}\le m} f(i_1,\dots,i_{k+1}) \log_2 f(i_1,\dots,i_{k+1})$$

is the plain old zeroth-order entropy of $S^{k+1}$. We have, for each context $s_{i_1}\cdots s_{i_k}$,
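The identity $H^{(k)}(S) = H(S^{k+1}) - H(S^k)$ can be confirmed numerically before proving it (Exercise 3 below asks for the proof). The sketch assumes cyclically counted frequencies from a sample string; the helper names are hypothetical. It computes $H^{(k)}$ straight from the definition and compares it with the difference of block entropies.

```python
from collections import Counter
from math import log2


def gram_freqs(text, n):
    """Relative n-gram frequencies, counted cyclically (wrap-around assumed)."""
    size = len(text)
    counts = Counter(tuple(text[(i + t) % size] for t in range(n)) for i in range(size))
    return {gram: c / size for gram, c in counts.items()}


def block_entropy(freqs):
    """Zeroth-order entropy H(S^n) of an n-gram frequency table."""
    return -sum(p * log2(p) for p in freqs.values() if p > 0)


def kth_order_entropy(f_next):
    """H^(k) from the definition: sum of f(context) * H(context)."""
    by_context = {}
    for gram, p in f_next.items():
        by_context.setdefault(gram[:-1], []).append(p)
    total = 0.0
    for succ in by_context.values():
        f_ctx = sum(succ)
        # f(context) * H(context) = -sum_j f(context, j) * log2(f(context, j) / f(context))
        total -= sum(p * log2(p / f_ctx) for p in succ)
    return total


text = "mississippi"
for k in (1, 2, 3):
    f_next, f_k = gram_freqs(text, k + 1), gram_freqs(text, k)
    print(k, kth_order_entropy(f_next), block_entropy(f_next) - block_entropy(f_k))
```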

$$H(i_1,\dots,i_k) \le \bar\ell(i_1,\dots,i_k) < H(i_1,\dots,i_k) + 1,$$

by the Noiseless Coding Theorem, plus the fact that the average code word length obtainable by Huffman’s algorithm is the best (smallest) obtainable with a prefix code.

7.2.1 Theorem The average code word length $\bar\ell^{(k)}(S)$ achieved by $k$th-order Huffman encoding applied to a source alphabet $S$ satisfies $H^{(k)}(S) \le \bar\ell^{(k)}(S) < H^{(k)}(S) + 1$.

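Proof: By the preceding remarks,

$$\begin{aligned}
H^{(k)}(S) &= \sum_{1\le i_1,\dots,i_k\le m} f(i_1,\dots,i_k)\,H(i_1,\dots,i_k) \\
&\le \sum_{1\le i_1,\dots,i_k\le m} f(i_1,\dots,i_k)\,\bar\ell(i_1,\dots,i_k) = \bar\ell^{(k)}(S) \\
&< \sum_{1\le i_1,\dots,i_k\le m} f(i_1,\dots,i_k)\,\bigl(H(i_1,\dots,i_k) + 1\bigr) \\
&= H^{(k)}(S) + \sum_{1\le i_1,\dots,i_k\le m} f(i_1,\dots,i_k) = H^{(k)}(S) + 1.
\end{aligned}$$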
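Theorem 7.2.1 can be observed on real frequencies. This sketch (hypothetical names, a hypothetical sample sentence, cyclic counting assumed) builds a Huffman code for each context, then returns both $H^{(k)}$ and $\bar\ell^{(k)}$ for $k = 0, \dots, 3$. As before, a context with only one successor is assumed to use the empty code word, which keeps the strict upper bound valid for that context.

```python
import heapq
from collections import Counter
from itertools import count
from math import log2


def huffman_lengths(probs):
    """Binary Huffman code-word lengths for {symbol: probability}."""
    if len(probs) == 1:
        return {next(iter(probs)): 0}  # assumption: lone successor -> empty code word
    tie = count()
    heap = [(p, next(tie), {s: 0}) for s, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, d1 = heapq.heappop(heap)
        p2, _, d2 = heapq.heappop(heap)
        merged = {s: depth + 1 for s, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (p1 + p2, next(tie), merged))
    return heap[0][2]


def kth_order_stats(text, k):
    """(H^(k), l_bar^(k)) from cyclic (k+1)-gram frequencies of `text`."""
    size = len(text)
    grams = Counter(tuple(text[(i + t) % size] for t in range(k + 1)) for i in range(size))
    by_context = {}
    for gram, c in grams.items():
        by_context.setdefault(gram[:-1], {})[gram[-1]] = c / size
    h_k = ell_k = 0.0
    for succ in by_context.values():
        f_ctx = sum(succ.values())
        cond = {j: p / f_ctx for j, p in succ.items()}  # P(s_j | context)
        lengths = huffman_lengths(cond)
        h_k -= f_ctx * sum(q * log2(q) for q in cond.values())
        ell_k += f_ctx * sum(q * lengths[j] for j, q in cond.items())
    return h_k, ell_k


text = "she sells sea shells by the sea shore"
for k in range(4):
    h, ell = kth_order_stats(text, k)
    print(f"k={k}: H^(k)={h:.4f}  l_bar^(k)={ell:.4f}")
```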

7.2.2 Corollary If the $s_j$ are binary words with average length $\bar L$, then the compression ratio $\bar L/\bar\ell^{(k)}(S)$ achieved by $k$th-order Huffman encoding applied to $S$ satisfies $\bar L/(H^{(k)}(S) + 1) < \bar L/\bar\ell^{(k)}(S) \le \bar L/H^{(k)}(S)$.
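The corollary is Theorem 7.2.1 with each side inverted and multiplied by $\bar L$. A tiny sketch with hypothetical figures: 8-bit source letters ($\bar L = 8$) and $H^{(k)} = 3$ bits would pin the compression ratio of $k$th-order Huffman encoding into the interval $(2, 8/3]$. The function name `ratio_bounds` is hypothetical.

```python
def ratio_bounds(L_bar, H_k):
    """Interval (lower, upper] containing L_bar / l_bar^(k), per Corollary 7.2.2.

    Assumes H_k > 0 so that the upper bound L_bar / H_k is finite.
    """
    if H_k <= 0:
        raise ValueError("H^(k) must be positive for a finite upper bound")
    return L_bar / (H_k + 1), L_bar / H_k


# Hypothetical figures: 8-bit letters and H^(k) = 3 bits.
low, high = ratio_bounds(8, 3)
print(low, high)  # 2.0 2.6666666666666665
```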

As mentioned in the last section, we do not know whether or not $\bar\ell^{(k)}$ always decreases as $k$ increases. If it did, then increasing the order would always repay your effort with a better compression ratio. However, when $m = 256$, as is often the case, it is a lot of trouble to increase the order, and actual case studies with $k = 0, 1, 2, 3$ show a discouragingly small improvement in the compression ratio going from $k = 1$ to $k = 2$, and a minuscule improvement obtainable by taking $k = 3$.

This sort of experimental observation agrees with the behavior predicted by the theory developed by Claude Shannon [63, 65]. In practice it is impossible to let $k$ get very large, much less go to infinity. And, in fact, there is a theoretical obstacle to letting $k$ go to infinity: we would have to have an infinitely long source text, given our notion of how the relative frequencies $f(i_1,\dots,i_{k+1})$ are obtained. Shannon gets around this difficulty by envisioning “the source” as a probabilistic finite state automaton, a system of states: as time pulses on discretely, the current state changes (or not) at each pulse, and source letters are emitted. What the next state will be and which letter is emitted are both random variables depending on the current state; that is, the different possibilities have their probabilities, and those probabilities vary with the current state. Thus there is a hypothetically endless string of source letters emitted, with statistical properties, including the probabilities $f(i_1,\dots,i_{k+1})$ for each $k$, determined by the nature of the source automaton.
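Shannon's picture is easy to simulate. The sketch below uses an encoding of our own (hypothetical, not Shannon's formal definition): each state maps to a list of (probability, emitted letter, next state) triples whose probabilities sum to 1, and `random.Random.choices` picks one triple per pulse. The two-state automaton `AUTOMATON` and the function `emit` are hypothetical.

```python
import random

# A hypothetical two-state source automaton. From each state, the list gives
# (probability, emitted letter, next state); probabilities in each list sum to 1.
AUTOMATON = {
    "A": [(0.6, "x", "A"), (0.4, "y", "B")],
    "B": [(0.3, "x", "A"), (0.7, "z", "B")],
}


def emit(automaton, start, n, seed=0):
    """Run the automaton for n pulses and return the emitted letters as a string."""
    rng = random.Random(seed)  # seeded so runs are reproducible
    state, out = start, []
    for _ in range(n):
        options = automaton[state]
        weights = [p for p, _, _ in options]
        # One random transition: emit its letter and move to its next state.
        _, letter, state = rng.choices(options, weights=weights)[0]
        out.append(letter)
    return "".join(out)


print(emit(AUTOMATON, "A", 40))
```

Because `z` is emitted only from state `B`, and emitting `x` always leads to state `A`, the digram `xz` never occurs in the output. Dependencies of this kind are exactly what the frequencies $f(i_1,\dots,i_{k+1})$ capture.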

Is every source “language” correctly (whatever that means) modeled by some such source automaton? This is a far deeper question than we will ever answer, although we will have a bit more to say about it in Section 7.4. For now, let us assume that our source is one of these Shannon automata. Shannon showed that the $k$th-order entropies $H^{(k)}$ tend to a limit, let us call it $H^{(\infty)}$, which Shannon called the entropy of the source. Thus the Shannon bound $\bar L/H^{(k)}$ on the compression ratio achieved with $k$th-order Huffman encoding tends to a limit $\bar L/H^{(\infty)}$ if $H^{(\infty)} > 0$. Consequently, when $H^{(\infty)} > 0$, the compression ratio $\bar L/\bar\ell^{(k)}$ cannot be increased without bound by taking $k$ larger and larger. The experimental case studies mentioned above, with $\bar\ell^{(2)}$ not much smaller than $\bar\ell^{(1)}$ and $\bar\ell^{(3)}$ very close to $\bar\ell^{(2)}$, are very much in accord with the picture, suggested by Shannon’s results and Theorem 7.2.1, of the compression ratio coming to a screeching halt at some unbreachable limit as $k$ increases.

This is an instance of difficult mathematics confirming intuition. If we require lossless compression, meaning that the original file shall always be completely recoverable from its encoded version, then surely there should be some natural limits, depending on the nature of the original file, to how much compression can be realized. However, it is important to realize that the Shannon bounds on the compression ratio, of the form $\bar L/H^{(k)}$, $k = 0, 1, \dots, \infty$, apply to the replacement-by-encoding-scheme methods discussed in this chapter. As we saw in the last chapter, these bounds can be beaten by other methods in some cases. So the natural bound on the compression ratio, even given a Shannon automaton-type source, may not be the Shannon bound $\bar L/H^{(\infty)}$.

One last remark about the Shannon bounds $\bar L/H^{(k)}$: Shannon asserts, but does not show, that the $H^{(k)}$ are non-increasing in $k$ when the source is a probabilistic finite state automaton. Therefore, the Shannon bounds on the compression ratio, $\bar L/H^{(k)}$, are going in the right direction (up!) as $k$ increases, even though we do not know the behavior of the actual compression ratios, $\bar L/\bar\ell^{(k)}$, achievable by $k$th-order Huffman encoding.

We finish this section with an elementary verification of Shannon’s assertion about the monotonicity of the $H^{(k)}$, without assuming anything about the nature of the source.

7.2.3 Theorem Suppose that $k \ge 1$ and the $(k+1)$-gram frequencies $f(i_1,\dots,i_{k+1})$, $1 \le i_1,\dots,i_{k+1} \le m$, for an $m$-letter source $S$ are known. Then

$$H^{(k)}(S) \le H^{(k-1)}(S).$$

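Proof: Observe that $h(x) = -x\log_2 x$ has negative second derivative on $(0,\infty)$ and so is concave on $[0,\infty)$. (Note: $h(0) = 0$, by convention.) Therefore, for $\lambda_1,\dots,\lambda_r \ge 0$ with $\lambda_1 + \cdots + \lambda_r = 1$ and any $x_1,\dots,x_r \ge 0$, $h\bigl(\sum_i \lambda_i x_i\bigr) \ge \sum_i \lambda_i h(x_i)$.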

7.2.4 Corollary With $k$ and $S$ as above,

$$(k+1)H^{(k)}(S) \le H(S^{k+1}) \le \frac{k+1}{k}\,H(S^k) \le (k+1)H(S).$$

The left-hand inequality, and its proof below, are due to Shannon [63].

Proof:

$$\begin{aligned}
H(S^{k+1}) &= \bigl(H(S^{k+1}) - H(S^k)\bigr) + \bigl(H(S^k) - H(S^{k-1})\bigr) + \cdots + \bigl(H(S) - 0\bigr) \\
&= H^{(k)}(S) + H^{(k-1)}(S) + \cdots + H^{(0)}(S) \\
&\ge (k+1)H^{(k)}(S),
\end{aligned}$$

by the theorem above. Therefore, also,

$$H(S^{k+1}) \ge (k+1)H^{(k)}(S) = (k+1)\bigl(H(S^{k+1}) - H(S^k)\bigr)$$
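Rearranging the last inequality gives $kH(S^{k+1}) \le (k+1)H(S^k)$, which is the middle inequality of the corollary, and applying it repeatedly down to $H(S)$ gives the right-hand one (a sketch of one way to finish, not necessarily the book's wording). The Python sketch below checks Theorem 7.2.3 and all three inequalities of Corollary 7.2.4 numerically; the sample string, the cyclic counting of $n$-grams, and the helper names are our own assumptions.

```python
from collections import Counter
from math import log2


def block_entropy(text, n):
    """H(S^n) from cyclic n-gram frequencies of `text` (wrap-around assumed)."""
    size = len(text)
    wrapped = text + text[:n - 1]  # lets the last n-1 windows wrap to the start
    counts = Counter(wrapped[i:i + n] for i in range(size))
    return -sum(c / size * log2(c / size) for c in counts.values())


def entropies(text, K):
    """Return ([H^(0), ..., H^(K)], [H(S^0), ..., H(S^(K+1))]) with H(S^0) = 0."""
    blocks = [0.0] + [block_entropy(text, n) for n in range(1, K + 2)]
    return [blocks[k + 1] - blocks[k] for k in range(K + 1)], blocks


text = "to be or not to be that is the question"
H, blocks = entropies(text, 4)
for k in range(1, 5):
    # Columns: H^(k), (k+1)H^(k), H(S^(k+1)), (k+1)/k H(S^k), (k+1)H(S)
    print(k, round(H[k], 4), round((k + 1) * H[k], 4), round(blocks[k + 1], 4),
          round((k + 1) / k * blocks[k], 4), round((k + 1) * blocks[1], 4))
```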

1. Compute $H^{(0)}$ and $H^{(1)}$ for the source of Exercise 7.1.1, and the Shannon bounds $\bar L/H^{(0)}$ and $\bar L/H^{(1)}$.

2. Compute $H^{(0)}$ and $H^{(1)}$ for the source of Exercise 7.1.4.

3. Show that, for $k \ge 1$, $H^{(k)}(S) = H(S^{k+1}) - H(S^k)$, as asserted in this section.

4. Show that if the $s_j$ occur randomly and independently in the source text, with relative frequencies $f_1,\dots,f_m$, so that, for each $k$, $f(i_1,\dots,i_k) = f_{i_1}\cdots f_{i_k}$,
