CAPÍTULO III. ASPECTOS METODOLÓGICOS
3.1 LA APROXIMACIÓN AL OBJETO DE ESTUDIO
One of the basic laws of physics, the second law of thermodynamics, states that the entropy of an isolated system is nondecreasing. We now explore the relationship between the second law and the entropy function that we defined earlier in this chapter.
In statistical thermodynamics, entropy is often defined as the log of the number of microstates in the system. This corresponds exactly to our notion of entropy if all the states are equally likely. But why does entropy increase?
We model the isolated system as a Markov chain with transitions obey- ing the physical laws governing the system. Implicit in this assumption is the notion of an overall state of the system and the fact that knowing the present state, the future of the system is independent of the past. In such a system we can find four different interpretations of the second law. It may come as a shock to find that the entropy does not always increase. However,relative entropy always decreases.
1. Relative entropyD(µn||µn)decreases withn. Letµnandµnbe two probability distributions on the state space of a Markov chain at time
n, and letµn+1 andµn+1 be the corresponding distributions at time n+1. Let the corresponding joint mass functions be denoted by
p and q. Thus, p(xn, xn+1)=p(xn)r(xn+1|xn) and q(xn, xn+1)= q(xn)r(xn+1|xn), where r(·|·) is the probability transition function
for the Markov chain. Then by the chain rule for relative entropy, we have two expansions:
D(p(xn, xn+1)||q(xn, xn+1))=D(p(xn)||q(xn))
=D(p(xn+1)||q(xn+1))
+D(p(xn|xn+1)||q(xn|xn+1)).
Since both p and q are derived from the Markov chain, the con- ditional probability mass functions p(xn+1|xn) and q(xn+1|xn) are
both equal tor(xn+1|xn), and henceD(p(xn+1|xn)||q(xn+1|xn))=0.
Now using the nonnegativity of D(p(xn|xn+1)||q(xn|xn+1))(Corol-
lary to Theorem 2.6.3), we have
D(p(xn)||q(xn))≥D(p(xn+1)||q(xn+1)) (4.44)
or
D(µn||µn)≥D(µn+1||µn+1). (4.45)
Consequently, the distance between the probability mass functions is decreasing with time nfor any Markov chain.
An example of one interpretation of the preceding inequality is to suppose that the tax system for the redistribution of wealth is the same in Canada and in England. Then if µn and µn represent the distributions of wealth among people in the two countries, this inequality shows that the relative entropy distance between the two distributions decreases with time. The wealth distributions in Canada and England become more similar.
2. Relative entropyD(µn||µ)between a distributionµnon the states at
time n and a stationary distribution µdecreases with n. In (4.45),
µn is any distribution on the states at time n. If we let µn be any stationary distribution µ, the distribution µn+1 at the next time is also equal to µ. Hence,
D(µn||µ)≥D(µn+1||µ), (4.46)
which implies that any state distribution gets closer and closer to each stationary distribution as time passes. The sequence D(µn||µ)
is a monotonically nonincreasing nonnegative sequence and must therefore have a limit. The limit is zero if the stationary distribution is unique, but this is more difficult to prove.
3. Entropy increases if the stationary distribution is uniform. In gen- eral, the fact that the relative entropy decreases does not imply that the entropy increases. A simple counterexample is provided by any Markov chain with a nonuniform stationary distribution. If we start
this Markov chain from the uniform distribution, which already is the maximum entropy distribution, the distribution will tend to the stationary distribution, which has a lower entropy than the uniform. Here, the entropy decreases with time.
If, however, the stationary distribution is the uniform distribution, we can express the relative entropy as
D(µn||µ)=log|X| −H (µn)=log|X| −H (Xn). (4.47)
In this case the monotonic decrease in relative entropy implies a monotonic increase in entropy. This is the explanation that ties in most closely with statistical thermodynamics, where all the micro- states are equally likely. We now characterize processes having a uniform stationary distribution.
Definition A probability transition matrix [Pij], Pij =Pr{Xn+1=
j|Xn=i}, is calleddoubly stochastic if
i Pij =1, j =1,2, . . . (4.48) and j Pij =1, i =1,2, . . . . (4.49)
Remark The uniform distribution is a stationary distribution of P if
and only if the probability transition matrix is doubly stochastic (see Problem 4.1).
4. The conditional entropy H (Xn|X1) increases with n for a station-
ary Markov process. If the Markov process is stationary, H (Xn) is
constant. So the entropy is nonincreasing. However, we will prove that H (Xn|X1) increases with n. Thus, the conditional uncertainty
of the future increases. We give two alternative proofs of this result. First, we use the properties of entropy,
H (Xn|X1)≥H (Xn|X1, X2) (conditioning reduces entropy)
(4.50)
=H (Xn|X2) (by Markovity) (4.51)
Alternatively, by an application of the data-processing inequality to the Markov chain X1 →Xn−1→Xn, we have
I (X1;Xn−1)≥I (X1;Xn). (4.53)
Expanding the mutual informations in terms of entropies, we have
H (Xn−1)−H (Xn−1|X1)≥H (Xn)−H (Xn|X1). (4.54)
By stationarity, H (Xn−1)=H (Xn), and hence we have
H (Xn−1|X1)≤H (Xn|X1). (4.55)
[These techniques can also be used to show that H (X0|Xn) is
increasing in nfor any Markov chain.]
5. Shuffles increase entropy. If T is a shuffle (permutation) of a deck of cards and X is the initial (random) position of the cards in the deck, and if the choice of the shuffle T is independent ofX, then
H (T X)≥H (X), (4.56) where T X is the permutation of the deck induced by the shuffle T
on the initial permutation X. Problem 4.3 outlines a proof.