The common mistake is to think that if we satisfy the criteria of convergence, that is, independence and finite variance, the central limit is a given. Take the conventional formulation of the Central Limit Theorem¹:
Let $X_1, X_2, \dots$ be a sequence of independent identically distributed random variables with mean $m$ and variance $\sigma^2$ satisfying $m < \infty$ and $0 < \sigma^2 < \infty$; then
¹ Feller 1971, Vol. II
5.2. PREASYMPTOTICS AND CENTRAL LIMIT IN THE REAL WORLD 103
$$\frac{\sum_{i=1}^{N} X_i - N m}{\sigma \sqrt{N}} \xrightarrow{D} N(0,1) \quad \text{as } N \to \infty,$$
where $\xrightarrow{D}$ denotes convergence “in distribution” and $N(0,1)$ is the Gaussian with mean 0 and unit standard deviation.
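To make the statement concrete, here is a minimal sketch that forms the standardized sum and checks that about 95% of draws land in $[-1.96, 1.96]$, as $N(0,1)$ predicts. It assumes Uniform(0, 1) summands (so $m = 1/2$ and $\sigma^2 = 1/12$), a choice made here for illustration and not taken from the text; the helper names `standardized_sum` and `coverage` are hypothetical.

```python
import math
import random

def standardized_sum(xs, m, sigma):
    """(sum(xs) - N*m) / (sigma * sqrt(N)), the statistic in the CLT statement."""
    n = len(xs)
    return (sum(xs) - n * m) / (sigma * math.sqrt(n))

def coverage(n_terms=30, n_sums=20_000, z=1.96, seed=1):
    """Fraction of standardized Uniform(0, 1) sums that land in [-z, z]."""
    rng = random.Random(seed)
    m, sigma = 0.5, math.sqrt(1.0 / 12.0)  # mean and sd of Uniform(0, 1)
    inside = 0
    for _ in range(n_sums):
        s = standardized_sum([rng.random() for _ in range(n_terms)], m, sigma)
        inside += abs(s) <= z  # bool counts as 0 or 1
    return inside / n_sums

# P(|Z| <= 1.96) for Z ~ N(0, 1), about 0.95
gaussian_target = math.erf(1.96 / math.sqrt(2.0))
print(coverage(), gaussian_target)
```

With a thin-tailed summand such as the uniform, 30 terms are already enough for the two numbers to agree closely; the rest of the section is about why this does not carry over to the tails of fat-tailed summands.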
Granted, convergence “in distribution” is about the weakest form of convergence.
Effectively, we are dealing with a double problem.
The first, as uncovered by Jaynes, corresponds to the abuses of measure theory: some properties that hold at infinity might not hold in all limiting processes.
There is a large difference between convergence a.s. (almost surely) and the weaker forms.
Jaynes 2003 (p. 44): “The danger is that the present measure theory notation presupposes the infinite limit already accomplished, but contains no symbol indicating which limiting process was used (...) Any attempt to go directly to the limit can result in nonsense”.
We agree with him on this point, along with his definition of probability as information incompleteness, about which more later.
The second problem is that we do not have a “clean” limiting process: the process is itself idealized.
Now, how should we look at the Central Limit Theorem? Let us see how we arrive at it assuming “independence”.
The Kolmogorov-Lyapunov Approach and Convergence in the Body.² The CLT does not fill in uniformly, but in a Gaussian way indeed, disturbingly so.
Simply, whatever your distribution (assuming one mode), your sample is going to be skewed to deliver more central observations and fewer tail events. The consequence is that, under aggregation, the sum of these variables will converge “much” faster in the body of the distribution than in the tails. As N, the number of observations, increases, the Gaussian zone should cover more ground... but not in the “tails”.
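The body-versus-tails asymmetry can be seen in a small simulation. This is a minimal sketch, assuming Student T summands with 3 degrees of freedom (the distribution used in the figures of this section), drawn as $Z/\sqrt{\chi^2_3/3}$ with $\chi^2_3$ sampled as a Gamma(1.5, 2) variate. The thresholds (1 and 4 standard deviations), $N = 50$, the number of sums and the seed are arbitrary choices, and the function names are hypothetical.

```python
import math
import random

def student_t3(rng):
    """One Student T draw with 3 degrees of freedom: Z / sqrt(chi2_3 / 3).

    chi2_3 is drawn as Gamma(shape=1.5, scale=2); the variance of T is 3.
    """
    z = rng.gauss(0.0, 1.0)
    chi2 = rng.gammavariate(1.5, 2.0)
    return z / math.sqrt(chi2 / 3.0)

def normalized_sums(n_terms, n_sums, seed=7):
    """Sums of n_terms Student T(3) draws, rescaled to unit variance."""
    rng = random.Random(seed)
    scale = math.sqrt(3.0 * n_terms)  # sd of a sum of n_terms variables of variance 3
    return [sum(student_t3(rng) for _ in range(n_terms)) / scale
            for _ in range(n_sums)]

def gaussian_two_sided_tail(u):
    """P(|Z| > u) for a standard Gaussian."""
    return math.erfc(u / math.sqrt(2.0))

sums = normalized_sums(n_terms=50, n_sums=10_000)
n = len(sums)
body = sum(abs(s) <= 1.0 for s in sums) / n   # inside one standard deviation
tail = sum(abs(s) > 4.0 for s in sums) / n    # beyond four standard deviations
body_ratio = body / (1.0 - gaussian_two_sided_tail(1.0))
tail_ratio = tail / gaussian_two_sided_tail(4.0)
print(f"body: {body_ratio:.3f} x Gaussian, tail: {tail_ratio:.1f} x Gaussian")
```

The body ratio should sit close to 1 while the tail ratio comes out many times larger than 1: the single largest draw, not the Gaussian center, governs the sum beyond a few standard deviations.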
This quick note shows the intuition of the convergence and presents the difference between distributions.
Take the sum of random independent variables $X_i$ with finite variance under distribution $\varphi(X)$. Assume zero mean for simplicity (and symmetry, i.e. absence of skewness, to simplify further).
A more useful formulation is the Kolmogorov, or what we can call the “Russian”, approach of working with bounds:
² See Loève for a presentation of the method of truncation used by Kolmogorov in the early days, before Lyapunov started using characteristic functions.
Figure 5.4: Q-Q plot of N sums of variables distributed according to the Student T with 3 degrees of freedom, N = 50, compared to the Gaussian, rescaled into standard deviations. We see on both sides a higher incidence of tail events. 10⁶ simulations.
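So the distribution is going to be Gaussian inside the “tunnel” $[-u, u]$, with
$$\frac{1}{\sqrt{2\pi}}\int_{-u}^{u} e^{-\frac{Z^2}{2}}\,dZ, \quad \text{for } -u \le z \le u,$$
the odds of falling inside the tunnel itself, and
$$\int_{-\infty}^{-u} \varphi'_N(z)\,dz + \int_{u}^{\infty} \varphi'_N(z)\,dz$$
outside the tunnel, in $(-\infty, -u) \cup (u, \infty)$, where $\varphi'_N$ is the $N$-summed distribution of $\varphi$.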
How $\varphi'_N$ behaves is a bit interesting here: it is distribution dependent.
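The Gaussian term, the odds of falling inside the tunnel $[-u, u]$, can be checked numerically. A minimal sketch integrates the standard Gaussian density over $[-u, u]$ with the composite Simpson rule (a quadrature picked here for illustration) and compares the result with the closed form $\operatorname{erf}(u/\sqrt{2})$; the mass outside the tunnel is one minus the inside mass. The function names are hypothetical.

```python
import math

def gaussian_density(z):
    """Standard Gaussian density e^{-z^2/2} / sqrt(2 pi)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def tunnel_mass(u, steps=2000):
    """P(-u <= Z <= u) for Z ~ N(0, 1), by composite Simpson's rule (steps must be even)."""
    h = 2.0 * u / steps
    total = gaussian_density(-u) + gaussian_density(u)
    for k in range(1, steps):
        weight = 4.0 if k % 2 else 2.0  # Simpson weights 4, 2, 4, ..., 4
        total += weight * gaussian_density(-u + k * h)
    return total * h / 3.0

for u in (1.0, 2.0, 3.0):
    inside = tunnel_mass(u)
    # inside the tunnel, outside it, and the closed-form check
    print(u, inside, 1.0 - inside, math.erf(u / math.sqrt(2.0)))
```

Only the inside term has a universal form; the outside term depends on $\varphi'_N$, which is why the next paragraphs turn to specific distributions.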
Before continuing, let us check the speed of convergence per distribution. It is quite interesting that the ratio of observations in a given sub-segment of the distribution is in proportion to the expected frequency $\frac{N_{-u}^{u}}{N_{-\infty}^{\infty}}$, where $N_{-u}^{u}$ is the number of observations falling between $-u$ and $u$. So the speed of convergence to the Gaussian will depend on $\frac{N_{-u}^{u}}{N_{-\infty}^{\infty}}$, as can be seen in the next two simulations.
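A minimal sketch of that ratio, assuming Student T summands with 3 degrees of freedom rescaled to unit variance: it counts the share of normalized sums inside $[-u, u]$, i.e. $N_{-u}^{u}/N_{-\infty}^{\infty}$, and divides it by the Gaussian expectation $\operatorname{erf}(u/\sqrt{2})$ for a few values of $N$. A ratio near 1 means that sub-segment has converged. The sampler, $u = 1$, the sample sizes and the seed are illustrative assumptions, and the function names are hypothetical.

```python
import math
import random

def student_t3(rng):
    """Student T with 3 degrees of freedom (variance 3): Z / sqrt(chi2_3 / 3)."""
    return rng.gauss(0.0, 1.0) / math.sqrt(rng.gammavariate(1.5, 2.0) / 3.0)

def tunnel_ratio(n_terms, u=1.0, n_sums=5_000, seed=11):
    """Observed share of normalized sums in [-u, u], divided by the Gaussian share."""
    rng = random.Random(seed)
    scale = math.sqrt(3.0 * n_terms)  # sd of the raw sum
    inside = 0
    for _ in range(n_sums):
        s = sum(student_t3(rng) for _ in range(n_terms)) / scale
        inside += abs(s) <= u
    observed = inside / n_sums                 # N_{-u}^{u} / N_{-inf}^{inf}
    expected = math.erf(u / math.sqrt(2.0))    # Gaussian frequency of [-u, u]
    return observed / expected

ratios = {n_terms: tunnel_ratio(n_terms) for n_terms in (1, 10, 100)}
for n_terms, r in ratios.items():
    print(n_terms, round(r, 3))
```

For $N = 1$ the ratio should come out near 1.2, since a unit-variance Student T with 3 degrees of freedom puts about 0.818 of its mass within one standard deviation against about 0.683 for the Gaussian; by $N = 100$ the body is much closer to Gaussian.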
To get an idea of how fast the tunnel $(-u, u)$ widens under summation, consider the symmetric (0-centered) Student T with tail exponent $\alpha = 3$, with density
$$\frac{2a^3}{\pi\left(a^2+x^2\right)^2},$$
and variance $a^2$. For large “tail values” of $x$, $P(x) \to \frac{2a^3}{\pi x^4}$. Under summation of $N$ variables, the tail $P(\Sigma x)$ will be $\frac{2Na^3}{\pi x^4}$. Now the center, by the Kolmogorov version of the central limit theorem, will have a variance of $Na^2$, hence
$$P(\Sigma x) = \frac{e^{-\frac{x^2}{2a^2N}}}{\sqrt{2\pi}\,a\sqrt{N}}.$$
Setting the point $u$ where the crossover takes place,
Figure 5.5: The Widening Center. Q-Q plot of variables distributed according to the Student T with 3 degrees of freedom compared to the Gaussian, rescaled into standard deviation, N = 500. We see on both sides a higher incidence of tail events. 10⁷ simulations.
Figure 5.6: The behavior of the “tunnel” under summation (axes N and u; N up to 10,000).
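$$\frac{e^{-\frac{u^2}{2a^2N}}}{\sqrt{2\pi}\,a\sqrt{N}} \simeq \frac{2Na^3}{\pi u^4},$$
hence
$$u^4 e^{-\frac{u^2}{2a^2N}} \simeq \frac{2\sqrt{2}\,a^4 N^{3/2}}{\sqrt{\pi}},$$
which produces the solution
$$\pm u = \pm 2a\sqrt{N}\sqrt{-W_{-1}\left(-\frac{1}{2N^{1/4}(2\pi)^{1/4}}\right)},$$
where $W_{-1}$ is the lower real branch of the Lambert W function, or product log (the principal branch gives a spurious root near the center, where the power-law approximation of the tail does not hold). It climbs very slowly³, particularly if, instead of considering the sum $u$, we rescale by $1/(a\sqrt{N})$.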
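A minimal sketch of the crossover computation, assuming the density $\frac{2a^3}{\pi(a^2+x^2)^2}$ with center variance $Na^2$ and tail $\frac{2Na^3}{\pi x^4}$: it evaluates $u = 2a\sqrt{N}\sqrt{-W_{-1}\left(-1/(2N^{1/4}(2\pi)^{1/4})\right)}$ and checks that the Gaussian center density and the power-law tail density agree at $u$. The lower branch $W_{-1}$ is computed by a hand-rolled bisection so that no library is assumed (if SciPy is available, `scipy.special.lambertw(x, k=-1)` returns the same branch as a complex number); all function names are hypothetical.

```python
import math

def lambert_w_lower(x, iters=200):
    """Lower real branch W_{-1}(x) for -1/e < x < 0, by bisection.

    On (-inf, -1], w * exp(w) decreases from 0 to -1/e, so the root there is unique.
    """
    if not (-1.0 / math.e < x < 0.0):
        raise ValueError("this sketch handles only -1/e < x < 0")
    hi = -1.0                         # hi * exp(hi) = -1/e < x
    lo = -2.0
    while lo * math.exp(lo) <= x:     # move left until lo * exp(lo) > x
        lo *= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mid * math.exp(mid) > x:
            lo = mid                  # root lies to the right of mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def crossover(a, n):
    """u = 2 a sqrt(n) sqrt(-W_{-1}(-1 / (2 n^(1/4) (2 pi)^(1/4))))."""
    arg = -1.0 / (2.0 * n ** 0.25 * (2.0 * math.pi) ** 0.25)
    return 2.0 * a * math.sqrt(n) * math.sqrt(-lambert_w_lower(arg))

def center_density(x, a, n):
    """Gaussian approximation of the sum in the center, variance n * a^2."""
    return math.exp(-x * x / (2.0 * a * a * n)) / (math.sqrt(2.0 * math.pi) * a * math.sqrt(n))

def tail_density(x, a, n):
    """Power-law tail of the sum: n times the single-variable tail 2 a^3 / (pi x^4)."""
    return 2.0 * n * a ** 3 / (math.pi * x ** 4)

for n in (10, 100, 1_000, 10_000):
    u = crossover(1.0, n)
    # u / sqrt(n) is the crossover in units of the sum's standard deviation (a = 1)
    print(n, round(u, 2), round(u / math.sqrt(n), 3),
          center_density(u, 1.0, n) / tail_density(u, 1.0, n))
```

The last column, the ratio of the two densities at $u$, should print as 1 up to rounding, while $u/\sqrt{N}$ creeps up only from about 3.3 to about 4.5 as $N$ goes from 10 to 10,000.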
³ Interestingly, Donald Knuth figures among the authors of the paper on the Lambert W function: Corless, R. M., Gonnet, G. H., Hare, D. E., Jeffrey, D. J., Knuth, D. E. (1996). On the Lambert W function. Advances in Computational Mathematics, 5(1), 329-359.
Note about the crossover. See the competing Nagaev brothers, such as S.V. Nagaev (1965, 1970, 1971, 1973), A.V. Nagaev (1969), etc. There are two sets of inequalities: a lower one, below which the sum is in regime 1 (thin-tailed behavior), and an upper one for the fat-tailed behavior, where the cumulative function for the sum behaves like the maximum. By Nagaev (1965), for a regularly varying tail, where $\mathbb{E}\left(|X|^m\right) < \infty$, the minimum of the crossover should be to the left of $\sqrt{\left(\frac{m}{2}-1\right) N \log(N)}$ (normalizing for unit variance) for the right tail (and with the proper sign adjustment for the left tail).
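The Nagaev bound $\sqrt{\left(\frac{m}{2}-1\right) N \log(N)}$ is easy to tabulate. A minimal sketch, assuming the natural logarithm and unit-variance summands, prints the bound for $m = 3$ in raw units and in units of $\sqrt{N}$, the standard deviation of the sum, to show how slowly it moves away from the center. The function name is hypothetical.

```python
import math

def nagaev_bound(m, n):
    """sqrt((m/2 - 1) * n * log(n)) for unit-variance summands with E|X|^m finite (m > 2)."""
    if m <= 2:
        raise ValueError("the bound requires m > 2")
    return math.sqrt((m / 2.0 - 1.0) * n * math.log(n))

for n in (10, 100, 1_000, 10_000):
    b = nagaev_bound(3, n)
    # raw units, then units of sqrt(n), i.e. standard deviations of the sum
    print(n, round(b, 2), round(b / math.sqrt(n), 3))
```

In standard deviations of the sum the bound grows only like $\sqrt{\log N}$, the same slow drift as the Lambert W solution for the Student T with $\alpha = 3$.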
So $P_{>}\left(\sum_{i}^{N} X_i\right)$
Generalizing for all exponents $> 2$. More generally, using the reasoning for a broader set and getting the crossover for power laws of all exponents:
since the standard deviation is $a\sqrt{\frac{\alpha}{\alpha-2}}$.
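Generalizing across exponents requires rescaling by the standard deviation, and this can be checked by simulation. A minimal sketch, assuming a Student T with $\alpha$ degrees of freedom and scale $a$ sampled as $a\,Z/\sqrt{\chi^2_\alpha/\alpha}$, compares the sample standard deviation with $a\sqrt{\alpha/(\alpha-2)}$. Here $\alpha = 8$ is chosen only so that the sample estimate settles quickly (for $\alpha$ close to 2 it converges very slowly); the sample size, seed and function names are illustrative.

```python
import math
import random

def student_t(alpha, a, rng):
    """Student T draw with alpha degrees of freedom and scale a: a * Z / sqrt(chi2_alpha / alpha)."""
    chi2 = rng.gammavariate(alpha / 2.0, 2.0)  # chi-square with alpha degrees of freedom
    return a * rng.gauss(0.0, 1.0) / math.sqrt(chi2 / alpha)

def sample_sd(xs):
    """Sample standard deviation with the n - 1 denominator."""
    mean = sum(xs) / len(xs)
    return math.sqrt(sum((x - mean) ** 2 for x in xs) / (len(xs) - 1))

alpha, a = 8.0, 1.0
rng = random.Random(3)
draws = [student_t(alpha, a, rng) for _ in range(100_000)]
theory = a * math.sqrt(alpha / (alpha - 2.0))  # a * sqrt(alpha / (alpha - 2))
print(round(sample_sd(draws), 4), round(theory, 4))
```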