
The common mistake is to think that if we satisfy the criteria of convergence, that is, independence and finite variance, the central limit is a given. Take the conventional formulation of the Central Limit Theorem¹:

Let $X_1, X_2, \ldots$ be a sequence of independent identically distributed random variables with mean $m$ and variance $\sigma^2$ satisfying $m < \infty$ and $0 < \sigma^2 < \infty$; then

¹Feller 1971, Vol. II

5.2. PREASYMPTOTICS AND CENTRAL LIMIT IN THE REAL WORLD 103

$$\frac{\sum_{i=1}^{n} X_i - n\,m}{\sigma\sqrt{n}} \xrightarrow{D} N(0,1) \quad \text{as } n \to \infty$$

where $\xrightarrow{D}$ denotes convergence “in distribution” and $N(0,1)$ is the Gaussian with mean 0 and unit standard deviation.
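A minimal simulation of the statement above, assuming Uniform(0,1) summands ($m = 1/2$, $\sigma^2 = 1/12$), checks that the share of standardized sums falling within ±1.96 approaches the Gaussian value of about 0.95. The summand distribution, sample sizes, seed and helper names are illustrative choices, not part of the theorem.

```python
import math
import random


def standardized_sum(xs, m, sigma):
    """CLT normalization: (sum(xs) - n*m) / (sigma * sqrt(n))."""
    n = len(xs)
    return (sum(xs) - n * m) / (sigma * math.sqrt(n))


def clt_coverage(n, trials, z=1.96, seed=0):
    """Share of standardized sums of n Uniform(0,1) draws that land in [-z, z]."""
    rng = random.Random(seed)
    m, sigma = 0.5, math.sqrt(1.0 / 12.0)  # mean and std of Uniform(0,1)
    hits = 0
    for _ in range(trials):
        s = standardized_sum([rng.random() for _ in range(n)], m, sigma)
        if -z <= s <= z:
            hits += 1
    return hits / trials


gaussian_mass = math.erf(1.96 / math.sqrt(2.0))  # P(|N(0,1)| <= 1.96)
coverage = clt_coverage(n=30, trials=20_000)
print(coverage, gaussian_mass)
```

This only probes the body of the distribution (±1.96); it says nothing about how fast the tails converge.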

Granted, convergence “in distribution” is about the weakest form of convergence.

Effectively, we are dealing with a double problem.

The first, as uncovered by Jaynes, corresponds to the abuses of measure theory: some properties that hold at infinity might not hold in all limiting processes.

There is a large difference between convergence a.s. (almost surely) and the weaker forms.

Jaynes 2003 (p. 44): “The danger is that the present measure theory notation presupposes the infinite limit already accomplished, but contains no symbol indicating which limiting process was used (...) Any attempt to go directly to the limit can result in nonsense.”

We agree with him on this point, along with his definition of probability as information incompleteness, about which more later.

The second problem is that we do not have a “clean” limiting process: the process is itself idealized.

Now how should we look at the Central Limit Theorem? Let us see how we arrive at it assuming “independence”.

The Kolmogorov-Lyapunov Approach and Convergence in the Body.² The CLT does not fill in the distribution uniformly but, disturbingly, in a Gaussian way.

Simply put, whatever your distribution (assuming one mode), your sample is going to be skewed to deliver more central observations and fewer tail events. The consequence is that, under aggregation, the sum of these variables will converge “much” faster in the body of the distribution than in the tails. As $N$, the number of observations, increases, the Gaussian zone should cover more ground... but not in the “tails”.
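The claim that sums converge faster in the body than in the tails can be probed with a small Monte Carlo sketch. It assumes unit-scale Student T summands with 3 degrees of freedom (as in the figures of this section), sampled as a normal divided by the square root of a scaled chi-square, sums of $N = 50$ of them, and a threshold of 4 standard deviations of the sum; the threshold, seed, trial count and function names are illustrative assumptions. It compares the observed frequency beyond ±4 standard deviations with the Gaussian benchmark.

```python
import math
import random


def student_t(rng, nu):
    """One unit-scale Student T draw with nu degrees of freedom, as Z / sqrt(V / nu)."""
    z = rng.gauss(0.0, 1.0)
    v = rng.gammavariate(nu / 2.0, 2.0)  # chi-square with nu degrees of freedom
    return z / math.sqrt(v / nu)


def tail_frequency(n_summands, trials, k=4.0, nu=3.0, seed=1):
    """Share of sums of n_summands Student T draws lying beyond k standard deviations."""
    rng = random.Random(seed)
    sd_sum = math.sqrt(n_summands * nu / (nu - 2.0))  # each draw has variance nu/(nu-2)
    exceed = 0
    for _ in range(trials):
        s = sum(student_t(rng, nu) for _ in range(n_summands))
        if abs(s) > k * sd_sum:
            exceed += 1
    return exceed / trials


gauss_tail = math.erfc(4.0 / math.sqrt(2.0))  # P(|N(0,1)| > 4), the thin-tailed benchmark
freq = tail_frequency(n_summands=50, trials=10_000)
print(freq, gauss_tail, freq / gauss_tail)
```

Even after summing 50 variables with finite variance, the frequency beyond 4 standard deviations should come out well above the Gaussian value, while the body (as in the previous sketch) already looks Gaussian.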

This quick note shows the intuition of the convergence and presents the difference between distributions.

Take the sum of random independent variables $X_i$ with finite variance under distribution $\varphi(X)$. Assume 0 mean for simplicity (and symmetry, i.e. absence of skewness, to simplify further).

A more useful formulation is the Kolmogorov approach, or what we can call the “Russian” approach, of working with bounds:

²See Loève for a presentation of the method of truncation used by Kolmogorov in the early days before Lyapunov started using characteristic functions.

Figure 5.4: Q-Q Plot of N Sums of variables distributed according to the Student T with 3 degrees of freedom, N=50, compared to the Gaussian, rescaled into standard deviations. We see on both sides a higher incidence of tail events. 10⁶ simulations.

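So the distribution is going to be:

$$\frac{1}{\sqrt{2\pi}}\int_{-u}^{u} e^{-\frac{Z^2}{2}}\, dZ, \quad \text{for } -u \le z \le u$$

inside the “tunnel” $[-u, u]$, i.e. the odds of falling inside the tunnel itself, and

$$\int_{-\infty}^{-u} \varphi'(N)\, dz + \int_{u}^{\infty} \varphi'(N)\, dz$$

outside the tunnel, on $(-\infty, -u) \cup (u, \infty)$, where $\varphi'(N)$ is the $N$-summed distribution of $\varphi$.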
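The Gaussian odds of falling inside the tunnel, and what a Gaussian alone would leave outside it, can be computed with a short sketch. It integrates the standard normal density over $[-u, u]$ with composite Simpson's rule and cross-checks against `math.erf`; the step count and helper names are arbitrary choices. The "outside" value here is only the Gaussian complement: the actual mass outside the tunnel under $\varphi'(N)$ is distribution dependent and, for fat tails, larger.

```python
import math


def gaussian_density(z):
    """Standard normal density."""
    return math.exp(-z * z / 2.0) / math.sqrt(2.0 * math.pi)


def simpson(f, lo, hi, steps=1000):
    """Composite Simpson's rule on [lo, hi]; steps must be even."""
    h = (hi - lo) / steps
    total = f(lo) + f(hi)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3.0


def tunnel_odds(u):
    """(inside, outside): Gaussian probability of [-u, u] and of its complement."""
    inside = simpson(gaussian_density, -u, u)
    return inside, 1.0 - inside


for u in (1.0, 2.0, 3.0):
    inside, outside = tunnel_odds(u)
    print(u, round(inside, 6), round(math.erf(u / math.sqrt(2.0)), 6), outside)
```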

How $\varphi'(N)$ behaves is a bit interesting here: it is distribution dependent.

Before continuing, let us check the speed of convergence per distribution. It is quite interesting that the ratio of observations in a given sub-segment of the distribution is in proportion to the expected frequency $\frac{N_{-u}^{u}}{N_{-\infty}^{\infty}}$, where $N_{-u}^{u}$ is the number of observations falling between $-u$ and $u$. So the speed of convergence to the Gaussian will depend on $\frac{N_{-u}^{u}}{N_{-\infty}^{\infty}}$, as can be seen in the next two simulations.
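The ratio $N_{-u}^{u}/N_{-\infty}^{\infty}$ is straightforward to estimate by simulation. A sketch, assuming sums of $N$ unit-scale Student T variables with 3 degrees of freedom, rescaled to unit standard deviation, and a tunnel of one standard deviation ($u = 1$); the Gaussian reference share is $\operatorname{erf}(u/\sqrt{2})$. The sample sizes, seed and function names are illustrative assumptions.

```python
import math
import random


def student_t3(rng):
    """Unit-scale Student T with 3 degrees of freedom (variance 3), as Z / sqrt(V / 3)."""
    z = rng.gauss(0.0, 1.0)
    v = rng.gammavariate(1.5, 2.0)  # chi-square with 3 degrees of freedom
    return z / math.sqrt(v / 3.0)


def tunnel_ratio(samples, u):
    """N_{-u}^{u} / N_{-inf}^{inf}: share of the observations falling in [-u, u]."""
    inside = sum(1 for s in samples if -u <= s <= u)
    return inside / len(samples)


def standardized_sums(n, trials, seed=2):
    """`trials` sums of n Student T(3) draws, rescaled to unit standard deviation."""
    rng = random.Random(seed)
    sd = math.sqrt(3.0 * n)
    return [sum(student_t3(rng) for _ in range(n)) / sd for _ in range(trials)]


u = 1.0
expected = math.erf(u / math.sqrt(2.0))  # Gaussian share inside [-1, 1]
ratios = {n: tunnel_ratio(standardized_sums(n, 5_000), u) for n in (1, 10, 50)}
for n, r in ratios.items():
    print(n, round(r, 3), round(expected, 3))
```

A single Student T(3) packs noticeably more than the Gaussian share into one standard deviation; as $N$ grows, the body ratio moves toward the Gaussian value.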

To get an idea of how fast the tunnel $(-u, u)$ widens under summation, consider the symmetric (0-centered) Student T with tail exponent $\alpha = 3$, with density

$$\frac{2a^3}{\pi\left(a^2 + x^2\right)^2},$$

and variance $a^2$. For large “tail values” of $x$, $P(x) \to \frac{2a^3}{\pi x^4}$. Under summation of $N$ variables, the tail $P(\Sigma x)$ will be $\frac{2Na^3}{\pi x^4}$. Now the center, by the Kolmogorov version of the central limit theorem, will have a variance of $Na^2$, hence

$$P(\Sigma x) = \frac{e^{-\frac{x^2}{2a^2N}}}{\sqrt{2\pi}\, a\sqrt{N}}.$$

Setting the point $u$ where the crossover takes place,


Figure 5.5: The Widening Center. Q-Q Plot of variables distributed according to the Student T with 3 degrees of freedom compared to the Gaussian, rescaled into standard deviations, N=500. We see on both sides a higher incidence of tail events. 10⁷ simulations.

Figure 5.6: The behavior of the “tunnel” under summation (u plotted against N, for N up to 10 000).

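$$\frac{e^{-\frac{u^2}{2a^2N}}}{\sqrt{2\pi}\, a\sqrt{N}} \approx \frac{2Na^3}{\pi u^4},$$

hence

$$u^4 e^{-\frac{u^2}{2a^2N}} \approx \frac{2\sqrt{2}\, a^4 N^{3/2}}{\sqrt{\pi}},$$

which produces the solution

$$\pm u = \pm 2a\sqrt{N}\sqrt{-W\left(-\frac{1}{2N^{1/4}(2\pi)^{1/4}}\right)},$$

where $W$ is the Lambert W function, or product log, taken on its lower branch $W_{-1}$ (the branch that yields the outer crossover), which climbs very slowly³, particularly if instead of considering the sum $u$ we rescale by $1/(a\sqrt{N})$.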
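The crossover formula can be checked numerically. Substituting $y = u^2/(2a^2N)$ into the crossover condition gives $y\,e^{-y/2} = (2\pi N)^{-1/4}$, which is where the Lambert W form comes from. The sketch below computes the lower branch $W_{-1}$ by plain bisection (our choice; no particular library is assumed), evaluates $u$, and verifies that the Gaussian center density and the summed power-law tail agree at that point. All function names are hypothetical.

```python
import math


def lambert_w_minus1(x, iters=200):
    """Lower real branch W_{-1}(x), for -1/e <= x < 0, by bisection on w * exp(w) = x.

    On (-inf, -1], g(w) = w * exp(w) decreases monotonically from 0 to -1/e,
    so the root on that interval is unique.
    """
    if not (-1.0 / math.e <= x < 0.0):
        raise ValueError("W_{-1}(x) is real only for -1/e <= x < 0")
    lo, hi = -2.0, -1.0
    while lo * math.exp(lo) < x:  # push lo left until g(lo) >= x
        lo *= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mid * math.exp(mid) > x:  # g(mid) above x: the root lies right of mid
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def crossover(n, a=1.0):
    """u = 2 a sqrt(N) sqrt(-W_{-1}(-1 / (2 N^{1/4} (2 pi)^{1/4})))."""
    arg = -1.0 / (2.0 * n ** 0.25 * (2.0 * math.pi) ** 0.25)
    return 2.0 * a * math.sqrt(n) * math.sqrt(-lambert_w_minus1(arg))


def gaussian_center(x, n, a):
    """Gaussian density of the N-sum, variance N a^2."""
    return math.exp(-x * x / (2.0 * a * a * n)) / (math.sqrt(2.0 * math.pi) * a * math.sqrt(n))


def power_law_tail(x, n, a):
    """Tail approximation 2 N a^3 / (pi x^4) of the N-summed Student T (alpha = 3)."""
    return 2.0 * n * a ** 3 / (math.pi * x ** 4)


for n in (1, 10, 100, 1_000, 10_000):
    u = crossover(n)
    # u / (a sqrt(N)): the tunnel edge in standard deviations of the sum (a = 1 here)
    print(n, round(u / math.sqrt(n), 3), gaussian_center(u, n, 1.0) / power_law_tail(u, n, 1.0))
```

The printed density ratio should be 1 up to rounding, and the rescaled edge $u/(a\sqrt{N})$ grows only from roughly 2.6 to about 4.5 standard deviations as $N$ goes from 1 to 10 000. The principal branch $W_0$ would instead return the inner crossing near the center, where the $x^{-4}$ tail approximation is not meaningful.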

Note about the crossover. See the competing Nagaev brothers, such as S.V. Nagaev (1965, 1970, 1971, 1973) and A.V. Nagaev (1969). There are two sets of inequalities: a lower one, below which the sum is in regime 1 (thin-tailed behavior), and an upper one for the fat-tailed behavior, where the cumulative function for the sum behaves like the maximum. By Nagaev (1965), for a regularly varying tail where $E\left(|X|^m\right) < \infty$, the minimum of the crossover

³Interestingly, Donald Knuth figures among the authors of the paper on the Lambert W function: Corless, R. M., Gonnet, G. H., Hare, D. E., Jeffrey, D. J., Knuth, D. E. (1996). On the Lambert W function. Advances in Computational Mathematics, 5(1), 329-359.

should be to the left of $\sqrt{\left(\frac{m}{2} - 1\right) N \log(N)}$ (normalizing for unit variance) for the right tail (and with the proper sign adjustment for the left tail).

So $P_{>}\left(\sum_{i}^{N} X_i\right)$

Generalizing for all exponents $\alpha > 2$. More generally, using the same reasoning for a broader set, we get the crossover for power laws of all exponents:

… since the standard deviation is $a\sqrt{\frac{\alpha}{\alpha - 2}}$.
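For other exponents, one hedged numerical route is to solve the same density-matching condition directly rather than in closed form. The sketch below assumes a unit-scale Student T with $\nu = \alpha$ degrees of freedom, approximates its tail by $K x^{-(\alpha+1)}$ (with $K$ from the Student T normalizing constant), takes a Gaussian center with variance $N\alpha/(\alpha-2)$, and locates the outer crossing by bisection on the log-density gap. For $\alpha = 3$ this should coincide with the Lambert W solution, since the density $\frac{2a^3}{\pi(a^2+x^2)^2}$ is exactly the unit-scale Student T(3) when $a = \sqrt{3}$. These modeling choices and names are our assumptions, not the author's formula.

```python
import math


def student_t_tail_constant(nu):
    """K such that the unit-scale Student T density behaves like K * x**-(nu + 1) for large x."""
    c = math.gamma((nu + 1) / 2) / (math.sqrt(nu * math.pi) * math.gamma(nu / 2))
    return c * nu ** ((nu + 1) / 2)


def log_gap(x, nu, n):
    """log(Gaussian center density of the N-sum) - log(N * power-law tail density)."""
    var = n * nu / (nu - 2)  # variance of the sum of n unit-scale Student T variables
    log_gauss = -x * x / (2 * var) - 0.5 * math.log(2 * math.pi * var)
    log_tail = math.log(n * student_t_tail_constant(nu)) - (nu + 1) * math.log(x)
    return log_gauss - log_tail


def crossover_general(nu, n, tol=1e-12):
    """Outer point where the Gaussian center density meets the N-summed power-law tail."""
    if nu <= 2:
        raise ValueError("finite variance requires nu > 2")
    var = n * nu / (nu - 2)
    lo = math.sqrt((nu + 1) * var)  # maximizer of log_gap; the gap decreases beyond it
    if log_gap(lo, nu, n) <= 0:
        raise ValueError("no crossover: the tail approximation dominates everywhere")
    hi = 2 * lo
    while log_gap(hi, nu, n) > 0:
        hi *= 2
    while hi - lo > tol * hi:  # bisection on a decreasing function
        mid = 0.5 * (lo + hi)
        if log_gap(mid, nu, n) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


for nu in (3, 4, 6):
    for n in (10, 1_000):
        sd = math.sqrt(n * nu / (nu - 2))
        print(nu, n, round(crossover_general(nu, n) / sd, 3))  # in std units of the sum
```

For small $N$ and larger exponents (for example $\alpha = 4$, $N = 1$) the tail asymptote lies above the Gaussian everywhere, so the function raises instead of returning a spurious point.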
