
We will now resume the consideration of weak convergence of distributions that we began in Section 1.3.4. Asymptotic distributions are the basis for the concept of asymptotic expectation, discussed in Section 1.3.8 below.

In many interesting cases, the limiting distribution of a sequence $\{X_n\}$ is degenerate. The fact that $\{X_n\}$ converges in probability to some given constant may be of interest, but in statistical applications we are more likely to be interested in how fast it converges, and in which characteristics of the sequence of probability distributions can be used for “large samples”. In this section we discuss how to modify a sequence so that the convergence is not degenerate. Statistical applications are discussed in Section 3.8.

Normalizing Constants

Three common types of sequences $\{X_n\}$ of interest are iid sequences, sequences of partial sums, and sequences of order statistics. Rather than focusing on the sequence $\{X_n\}$, it may be more useful to consider a sequence of linear transformations of $X_n$, $\{X_n - b_n\}$, where the form of $b_n$ is generally different for iid sequences, sequences of partial sums, and sequences of order statistics.

Given a sequence of constants $b_n$, if $X_n - b_n \overset{d}{\to} 0$, we may be interested in the rate of convergence, or in other properties of the sequence as $n$ becomes large. It may be useful to magnify the difference $X_n - b_n$ by use of some normalizing sequence of constants $a_n$:

$$Y_n = a_n(X_n - b_n). \tag{1.192}$$

While the limiting distribution of the sequence $\{X_n - b_n\}$ may be degenerate, the sequence $\{a_n(X_n - b_n)\}$ may have a limiting distribution that is nondegenerate, and this asymptotic distribution may be useful in statistical inference. (This approach is called “asymptotic inference”.) Even though we are using the asymptotic distribution of $\{a_n(X_n - b_n)\}$, for a reasonable choice of a sequence of normalizing constants $\{a_n\}$ we sometimes refer to it as the asymptotic distribution of $\{X_n\}$ itself, but we must remember that it is the distribution of the normalized sequence, $\{a_n(X_n - b_n)\}$.

The shift constants generally serve to center the distribution, especially if the limiting distribution is symmetric. Although linear transformations are often most useful, we could consider sequences of more general transformations of $X_n$; instead of $\{a_n(X_n - b_n)\}$, we might consider $\{h_n(X_n)\}$, for some sequence of functions $\{h_n\}$.
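The effect of a normalizing sequence can be seen in a small simulation. The sketch below is only an illustration under assumed choices that are not in the text: $X_n$ is taken to be the mean of $n$ iid U(0,1) variables, $b_n = 1/2$, and $a_n = \sqrt{n}$; the function names are hypothetical. The spread of $X_n - b_n$ shrinks toward 0, while the spread of $a_n(X_n - b_n)$ settles near $\sqrt{1/12}$.

```python
import math
import random
import statistics

def sample_mean(n, rng):
    """Mean of n iid U(0,1) draws."""
    return sum(rng.random() for _ in range(n)) / n

def spread_of_means(n, reps=2000, seed=0):
    """Return (sd of Xbar_n - 1/2, sd of sqrt(n) * (Xbar_n - 1/2))."""
    rng = random.Random(seed)
    diffs = [sample_mean(n, rng) - 0.5 for _ in range(reps)]
    raw_sd = statistics.pstdev(diffs)
    # Scaling every difference by sqrt(n) scales the standard deviation by sqrt(n).
    return raw_sd, math.sqrt(n) * raw_sd

for n in (10, 100, 1000):
    raw_sd, scaled_sd = spread_of_means(n)
    print(n, round(raw_sd, 4), round(scaled_sd, 4))
```

The first printed column of standard deviations decreases with $n$, which reflects the degenerate limit; the second stays roughly constant, which is what makes the normalized sequence useful for inference.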

The Asymptotic Distribution of {g(Xn)}

Applications often involve a differentiable Borel scalar function $g$, and we may be interested in the convergence of $\{g(X_n)\}$. (The same general ideas apply when $g$ is a vector function, but the higher-order derivatives quickly become almost unmanageable.) When we have $\{X_n\}$ converging in distribution to $X + b$, what we can say about the convergence of $\{g(X_n)\}$ depends on the differentiability of $g$ at $b$.

Theorem 1.46

Let $X$ and $\{X_n\}$ be random variables ($k$-vectors) such that

$$a_n(X_n - b_n) \overset{d}{\to} X, \tag{1.193}$$

where $b_1, b_2, \ldots$ is a sequence of constants such that $\lim_{n\to\infty} b_n = b < \infty$, and $a_1, a_2, \ldots$ is a sequence of constant scalars such that $\lim_{n\to\infty} a_n = \infty$ or such that $\lim_{n\to\infty} a_n = a > 0$. Now let $g$ be a Borel function from $\mathbb{R}^k$ to $\mathbb{R}$ that is continuously differentiable at each $b_n$. Then

$$a_n(g(X_n) - g(b_n)) \overset{d}{\to} (\nabla g(b))^{\mathrm{T}} X. \tag{1.194}$$

Proof. This follows from a Taylor series expansion of $g(X_n)$ and Slutsky’s theorem.

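A common application of Theorem 1.46 arises from the simple corollary for the case when $X$ in expression (1.193) has the multivariate normal distribution $\mathrm{N}_k(0, \Sigma)$ and $\nabla g(b) \neq 0$:

$$a_n(g(X_n) - g(b_n)) \overset{d}{\to} Y, \tag{1.195}$$

where $Y \sim \mathrm{N}(0, (\nabla g(b))^{\mathrm{T}} \Sigma \nabla g(b))$, a univariate normal distribution because $g$ is scalar-valued.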

One reason limit theorems such as Theorem 1.46 are important is that they can provide approximations useful in statistical inference. For example, we often get the convergence of expression (1.193) from the central limit theorem, and then the convergence of the sequence $\{g(X_n)\}$ provides a method for determining approximate confidence sets using the normal distribution, so long as $\nabla g(b) \neq 0$. This method in asymptotic inference is called the delta method, and is illustrated in Example 1.25 below. It is particularly applicable when the asymptotic distribution is normal.
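As a rough illustration of how the delta method yields approximate confidence sets, the sketch below builds an interval for $g(\mu) = \log(\mu)$ from an iid sample and checks its coverage by simulation. The choices here are assumptions made only for the example: the data are exponential with rate 2, the unknown mean and variance are replaced by sample estimates, the nominal level is 95%, and the names `delta_method_ci` and `coverage` are hypothetical.

```python
import math
import random
import statistics

def delta_method_ci(xs, g, dg, z=1.96):
    """Approximate 95% interval for g(mu) from an iid sample xs.

    Uses sqrt(n) (g(Xbar) - g(mu)) ~ N(0, dg(mu)^2 sigma^2) approximately,
    with mu and sigma^2 replaced by their sample estimates.
    """
    n = len(xs)
    xbar = statistics.fmean(xs)
    s2 = statistics.variance(xs)
    se = abs(dg(xbar)) * math.sqrt(s2 / n)
    return g(xbar) - z * se, g(xbar) + z * se

def coverage(n=200, reps=2000, rate=2.0, seed=0):
    """Fraction of simulated intervals for log(mu) that contain the true value."""
    rng = random.Random(seed)
    target = math.log(1.0 / rate)  # the mean of an exponential(rate) is 1/rate
    hits = 0
    for _ in range(reps):
        xs = [rng.expovariate(rate) for _ in range(n)]
        lo, hi = delta_method_ci(xs, math.log, lambda t: 1.0 / t)
        hits += lo <= target <= hi
    return hits / reps

print(coverage())
```

The printed coverage should be close to the nominal 0.95 for moderate $n$; it is only approximate because the interval rests on the asymptotic normal distribution.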

The Case of ∇g(b) = 0

Suppose $\nabla g(b) = 0$ in equation (1.194). In this case the convergence in distribution is to a degenerate random variable, which may not be very useful. If, however, $H_g(b) \neq 0$ (where $H_g$ is the Hessian of $g$), then we can use a second-order Taylor series expansion and get something useful:

$$2a_n^2(g(X_n) - g(b_n)) \overset{d}{\to} X^{\mathrm{T}} H_g(b) X, \tag{1.196}$$

where we are using the notation and assuming the conditions of Theorem 1.46. Note that while $a_n(g(X_n) - g(b_n))$ may have a degenerate limiting distribution at 0, $a_n^2(g(X_n) - g(b_n))$ may have a nondegenerate distribution. (Recalling that $\lim_{n\to\infty} a_n = \infty$, we see that this is plausible.) Equation (1.196) allows us also to get the asymptotic covariance for the pairs of individual elements of $X_n$.

Use of expression (1.196) is called a second-order delta method, and is illustrated in Example 1.25.

Example 1.25 an asymptotic distribution in a Bernoulli family

Consider the Bernoulli family of distributions with parameter $\pi$. The variance of a random variable distributed as Bernoulli($\pi$) is $g(\pi) = \pi(1-\pi)$. Now, suppose $X_1, X_2, \ldots \overset{\text{iid}}{\sim} \text{Bernoulli}(\pi)$. Since $\mathrm{E}(\overline{X}_n) = \pi$, we may be interested in the distribution of $T_n = g(\overline{X}_n) = \overline{X}_n(1 - \overline{X}_n)$.

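From the central limit theorem (Theorem 1.38),

$$\sqrt{n}(\overline{X}_n - \pi) \overset{d}{\to} \mathrm{N}(0, \pi(1-\pi)), \tag{1.197}$$

and so if $\pi \neq 1/2$, so that $g'(\pi) \neq 0$, we can use the delta method from expression (1.194) to get

$$\sqrt{n}(T_n - g(\pi)) \overset{d}{\to} \mathrm{N}(0, \pi(1-\pi)(1-2\pi)^2). \tag{1.198}$$

If $\pi = 1/2$, $g'(\pi) = 0$ and this is a degenerate distribution, so we cannot use the delta method. Let’s use expression (1.196). The Hessian is particularly simple: $g''(\pi) = -2$.

First, we note that in this case, the CLT yields $\sqrt{n}(\overline{X}_n - 1/2) \overset{d}{\to} \mathrm{N}(0, \tfrac{1}{4})$. Hence, if we scale and square, we get $4n(\overline{X}_n - \tfrac{1}{2})^2 \overset{d}{\to} \chi^2_1$. Because $T_n - g(\pi) = -(\overline{X}_n - \tfrac{1}{2})^2$ when $\pi = 1/2$, this is the same as $4n(g(\pi) - T_n) \overset{d}{\to} \chi^2_1$.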
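Both cases of this example can be checked numerically. The following is a minimal simulation sketch, with the sample size, the number of replications, and the helper name `simulate_T` chosen only for illustration. For $\pi = 0.3$ it compares $n\,\mathrm{V}(T_n)$ with $\pi(1-\pi)(1-2\pi)^2$; for $\pi = 1/2$ it looks at $4n(1/4 - T_n) = 4n(\overline{X}_n - 1/2)^2$, whose limit is chi-squared with 1 degree of freedom (mean 1, variance 2).

```python
import random
import statistics

def simulate_T(pi, n, reps, seed=0):
    """Draw reps copies of T_n = Xbar_n (1 - Xbar_n) for Bernoulli(pi) samples."""
    rng = random.Random(seed)
    out = []
    for _ in range(reps):
        xbar = sum(rng.random() < pi for _ in range(n)) / n
        out.append(xbar * (1.0 - xbar))
    return out

n, reps = 400, 4000

# First-order case: pi != 1/2; compare n * Var(T_n) with pi(1-pi)(1-2pi)^2.
pi = 0.3
ts = simulate_T(pi, n, reps)
print(n * statistics.pvariance(ts), pi * (1 - pi) * (1 - 2 * pi) ** 2)

# Second-order case: pi = 1/2; 4n(1/4 - T_n) is nonnegative and should
# have mean near 1 and variance near 2.
qs = [4 * n * (0.25 - t) for t in simulate_T(0.5, n, reps, seed=1)]
print(statistics.fmean(qs), statistics.pvariance(qs))
```

Note that in the second case the scaling is $n$ rather than $\sqrt{n}$, in line with the factor $a_n^2$ in expression (1.196).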

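We can summarize the previous discussion and the special results of Example 1.25 as follows (assuming all of the conditions on the objects involved):

$$\left.\begin{array}{l} \sqrt{n}(T_n - b_n) \overset{d}{\to} \mathrm{N}(0, \sigma^2) \\ g'(b) = 0 \\ g''(b) \neq 0 \end{array}\right\} \implies 2n\,\frac{g(T_n) - g(b_n)}{\sigma^2 g''(b)} \overset{d}{\to} \chi^2_1. \tag{1.199}$$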

Higher Order Expansions

Suppose the second derivatives of $g$ at $b$ are zero. We can easily extend this to higher order Taylor expansions in Theorem 1.47 below. (Note that because higher order Taylor expansions of vector expressions can become quite messy, in Theorem 1.47 we use $Y = (Y_1, \ldots, Y_k)$ in place of $X$ as the limiting random variable.)

Theorem 1.47

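Let $Y$ and $\{X_n\}$ be random variables ($k$-vectors) such that

$$a_n(X_n - b_n) \overset{d}{\to} Y,$$

where $b_n$ is a constant sequence and $a_1, a_2, \ldots$ is a sequence of constant scalars such that $\lim_{n\to\infty} a_n = \infty$. Now let $g$ be a Borel function from $\mathbb{R}^k$ to $\mathbb{R}$ whose $m$th order partial derivatives exist and are continuous in a neighborhood of $b_n$, and whose $j$th order partial derivatives, for $1 \le j \le m-1$, vanish at $b$. Then

$$m!\,a_n^m(g(X_n) - g(b_n)) \overset{d}{\to} \sum_{i_1=1}^{k} \cdots \sum_{i_m=1}^{k} \left.\frac{\partial^m g}{\partial x_{i_1} \cdots \partial x_{i_m}}\right|_{x=b} Y_{i_1} \cdots Y_{i_m}. \tag{1.200}$$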

Expansion of Statistical Functions

*** refer to functional derivatives, Sections 0.1.13 and 0.1.13.

Variance Stabilizing Transformations

The fact that the variance in the asymptotic distribution in expression (1.198) depends on $\pi$ may complicate our study of $T_n$ and its relationship to $\pi$. Of course, this dependence results initially from the variance $\pi(1-\pi)$ in the asymptotic distribution in expression (1.197). If $g(\pi)$ were chosen so that $(g'(\pi))^2 = (\pi(1-\pi))^{-1}$, the variance in an expression similar to (1.198) would be constant (in fact, it would be 1).

Instead of $g(\pi) = \pi(1-\pi)$ as in Example 1.25, we can use a solution to the differential equation

$$g'(\pi) = (\pi(1-\pi))^{-1/2}.$$

One solution is $g(t) = 2\arcsin(\sqrt{t})$, and following the same procedure as in Example 1.25 but using this function for the transformation, we have

$$2\sqrt{n}\left(\arcsin\sqrt{\overline{X}_n} - \arcsin\sqrt{\pi}\right) \overset{d}{\to} \mathrm{N}(0, 1).$$

A transformation such as this is called a variance stabilizing transformation, because the variance of the resulting asymptotic distribution does not depend on the parameter.
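The stabilizing effect of the arcsine transformation can be seen by simulation. The sketch below is illustrative only: the sample size, the replication count, the values of $\pi$, and the function name `scaled_variances` are assumptions. For each $\pi$ it estimates $n\,\mathrm{V}(\overline{X}_n)$, which should be near $\pi(1-\pi)$ and so changes with $\pi$, and the variance of $2\sqrt{n}\arcsin\sqrt{\overline{X}_n}$, which should be near 1 regardless of $\pi$.

```python
import math
import random
import statistics

def scaled_variances(pi, n=400, reps=3000, seed=0):
    """Return n * Var(Xbar_n) and Var(2 sqrt(n) arcsin(sqrt(Xbar_n)))."""
    rng = random.Random(seed)
    means, transformed = [], []
    for _ in range(reps):
        xbar = sum(rng.random() < pi for _ in range(n)) / n
        means.append(xbar)
        transformed.append(2.0 * math.sqrt(n) * math.asin(math.sqrt(xbar)))
    return n * statistics.pvariance(means), statistics.pvariance(transformed)

for pi in (0.2, 0.5, 0.8):
    raw, stabilized = scaled_variances(pi)
    print(pi, round(raw, 3), round(stabilized, 3))
```

The approximation is expected to degrade for $\pi$ very close to 0 or 1 at a fixed $n$, where the normal limit itself is slow to take hold.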

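Example 1.26 variance stabilizing transformation in a normal family

Consider the normal family of distributions with known mean 0 and variance $\sigma^2$, and suppose $X_1, X_2, \ldots \overset{\text{iid}}{\sim} \mathrm{N}(0, \sigma^2)$. Since $\mathrm{E}(X_i^2) = \sigma^2$, we may be interested in the distribution of $T_n = \sum X_i^2 / n$. We note that $\mathrm{V}(X_i^2) = 2\sigma^4$; hence, the central limit theorem gives

$$\sqrt{n}\left(\frac{1}{n}\sum_{i=1}^{n} X_i^2 - \sigma^2\right) \overset{d}{\to} \mathrm{N}(0, 2\sigma^4).$$

Following the ideas above, we seek a transformation $g(\sigma^2)$ such that $(g'(\sigma^2))^2 \sigma^4$ is constant with respect to $\sigma^2$. A solution to the differential equation that expresses this relationship is $g(t) = \log(t)$, and as above, we have

$$\sqrt{n}\left(\log\left(\frac{1}{n}\sum_{i=1}^{n} X_i^2\right) - \log\left(\sigma^2\right)\right) \overset{d}{\to} \mathrm{N}(0, 2). \tag{1.201}$$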
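A quick simulation can confirm that the limiting variance in (1.201) is free of $\sigma^2$. The sketch below makes illustrative assumptions about the sample size, the replication count, and the values of $\sigma$; `var_of_log_stat` is a hypothetical helper name. Subtracting the constant $\log(\sigma^2)$ does not change a variance, so it is enough to look at $\sqrt{n}\log(T_n)$.

```python
import math
import random
import statistics

def var_of_log_stat(sigma, n=200, reps=2000, seed=0):
    """Variance of sqrt(n) * log(T_n), with T_n = sum(X_i^2)/n and X_i ~ N(0, sigma^2)."""
    rng = random.Random(seed)
    vals = []
    for _ in range(reps):
        t_n = sum(rng.gauss(0.0, sigma) ** 2 for _ in range(n)) / n
        vals.append(math.sqrt(n) * math.log(t_n))
    return statistics.pvariance(vals)

for sigma in (0.5, 1.0, 3.0):
    print(sigma, round(var_of_log_stat(sigma), 3))
```

Each printed variance should be close to 2, whereas the variance of the untransformed $\sqrt{n}\,T_n$ would scale with $\sigma^4$.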

Order Statistics and Quantiles

The asymptotic distributions of order statistics $X_{(k_n:n)}$ are often of interest. The asymptotic properties of “central” order statistics are different from those of “extreme” order statistics.

A sequence of central order statistics $\{X_{(k:n)}\}$ is one such that for given $\pi \in\, ]0,1[$, $k/n \to \pi$ as $n \to \infty$. (Notice that $k$ depends on $n$, but we will generally not use the notation $k_n$.) As we suggested on page 64, the expected value of the $k$th order statistic in a sample of size $n$, if it exists, should be approximately the same as the $k/n$ quantile of the underlying distribution. Under mild regularity conditions, a sequence of asymptotic central order statistics can be shown to converge in expectation to $x_\pi$, the $\pi$ quantile.

Sample quantiles (defined on page 64) are the ordinary quantiles in the sense of equation (1.13) of the discrete distribution defined by the sample, $X_1, \ldots, X_n$, which has CDF $F_n(x)$, the ECDF, as defined in equation (1.34); that is, the $\pi$ sample quantile is

$$x_\pi = F_n^{-1}(\pi). \tag{1.202}$$

Properties of quantiles, of course, are different for discrete and continuous distributions. In the following, for $0 < \pi < 1$ we will assume that $F$ is twice differentiable in some neighborhood of $x_\pi$ and that $F''$ is bounded and $F'(x_\pi) > 0$ in that neighborhood. Denote $F'(x)$ as $f(x)$, and let $F_n(x)$ be the ECDF. Now, write the $k$th order statistic as

$$X_{(k:n)} = x_\pi - \frac{F_n(x_\pi) - \pi}{f(x_\pi)} + R_n(\pi). \tag{1.203}$$

This is called the Bahadur representation, after Bahadur (1966), who showed that $R_n(\pi) \to 0$ as $n \to \infty$. Kiefer (1967) determined the exact order of $R_n(\pi)$, so equation (1.203) is sometimes called the Bahadur-Kiefer representation. The Bahadur representation is useful in studying asymptotic properties of central order statistics.
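To get a feel for the size of the remainder in the Bahadur representation, one can compute each term for simulated data. The sketch below is a rough illustration under assumptions that are not in the text: the underlying distribution is exponential with rate 1 (so $x_\pi = -\log(1-\pi)$ and $f(x_\pi) = 1-\pi$), $\pi = 1/2$, and $k = \lceil n\pi \rceil$; the function names are hypothetical. It compares the root mean squares of $\sqrt{n}(X_{(k:n)} - x_\pi)$ and $\sqrt{n}R_n(\pi)$.

```python
import math
import random
import statistics

def bahadur_terms(n, pi=0.5, seed=0):
    """For one exponential(1) sample, return (X_(k:n) - x_pi, R_n) with k = ceil(n pi)."""
    rng = random.Random(seed)
    xs = sorted(rng.expovariate(1.0) for _ in range(n))
    x_pi = -math.log(1.0 - pi)   # true pi-quantile of exponential(1)
    f_x_pi = 1.0 - pi            # PDF e^{-x} evaluated at x_pi
    k = math.ceil(n * pi)
    order_stat = xs[k - 1]       # kth order statistic (k is 1-based)
    ecdf_at_x_pi = sum(x <= x_pi for x in xs) / n
    linear_term = -(ecdf_at_x_pi - pi) / f_x_pi
    remainder = order_stat - x_pi - linear_term
    return order_stat - x_pi, remainder

def root_mean_square(vals):
    return math.sqrt(statistics.fmean([v * v for v in vals]))

def rms_scaled(n, reps=100):
    """RMS of sqrt(n)*(X_(k:n) - x_pi) and of sqrt(n)*R_n over reps samples."""
    pairs = [bahadur_terms(n, seed=s) for s in range(reps)]
    scale = math.sqrt(n)
    return (scale * root_mean_square([p[0] for p in pairs]),
            scale * root_mean_square([p[1] for p in pairs]))

for n in (100, 1000, 10000):
    print(n, rms_scaled(n))
```

The first value in each printed pair should stay of order 1 as $n$ grows, while the second should shrink, which is why the linear term alone determines the asymptotic distribution of the central order statistic.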

There is some indeterminacy in relating order statistics to quantiles. In the Bahadur representation, for example, the details are slightly different if $n\pi$ happens to be an integer. (The results are the same, however.) Consider a slightly different formulation for a set of $m$ order statistics. The following result is due to Ghosh (1971).

Theorem 1.48

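Let $X_1, \ldots, X_n$ be iid random variables with PDF $f$. For $k = 1, \ldots, m$, let $n_k \le n$ and let $\lambda_k \in\, ]0,1[$ be such that $n_k = \lceil n\lambda_k \rceil + 1$. Now suppose $0 < \lambda_1 < \cdots < \lambda_m < 1$ and for each $k$, $f(x_{\lambda_k}) > 0$. Then the asymptotic distribution of the random $m$-vector

$$\left(n^{1/2}(X_{(n_1:n)} - x_{\lambda_1}), \ldots, n^{1/2}(X_{(n_m:n)} - x_{\lambda_m})\right)$$

is $m$-variate normal with mean of 0, and covariance matrix whose $i,j$ element, for $i \le j$, is

$$\frac{\lambda_i(1-\lambda_j)}{f(x_{\lambda_i})\, f(x_{\lambda_j})}.$$

For a proof of this theorem, see David and Nagaraja (2003).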
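The covariance structure in Theorem 1.48 can be checked by simulation in a case where everything is explicit. The sketch below assumes U(0,1) data, so that $x_\lambda = \lambda$ and $f = 1$, and uses $\lambda_1 = 0.25$, $\lambda_2 = 0.75$; these choices, the sample size, and the function names are illustrative only. The theorem then gives variances $\lambda_k(1-\lambda_k)$ and covariance $\lambda_1(1-\lambda_2)$.

```python
import math
import random
import statistics

def scaled_order_stats(n, lambdas, rng):
    """Return sqrt(n) * (X_(n_k:n) - x_lambda_k) for one U(0,1) sample,
    with n_k = ceil(n * lambda_k) + 1 and x_lambda = lambda for U(0,1)."""
    xs = sorted(rng.random() for _ in range(n))
    out = []
    for lam in lambdas:
        n_k = math.ceil(n * lam) + 1
        out.append(math.sqrt(n) * (xs[n_k - 1] - lam))
    return out

def empirical_cov(n=500, lambdas=(0.25, 0.75), reps=3000, seed=0):
    """Monte Carlo estimate of the covariance entries (v11, v12, v22)."""
    rng = random.Random(seed)
    draws = [scaled_order_stats(n, lambdas, rng) for _ in range(reps)]
    a = [d[0] for d in draws]
    b = [d[1] for d in draws]
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    v12 = statistics.fmean([(x - ma) * (y - mb) for x, y in zip(a, b)])
    return statistics.pvariance(a), v12, statistics.pvariance(b)

l1, l2 = 0.25, 0.75
print("simulated:", empirical_cov())
print("theorem:  ", (l1 * (1 - l1), l1 * (1 - l2), l2 * (1 - l2)))
```

The positive covariance reflects the fact that order statistics from the same sample tend to move together.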

A sequence of extreme order statistics $\{X_{(k:n)}\}$ is one such that $k/n \to 0$ or $k/n \to 1$ as $n \to \infty$. Sequences of extreme order statistics from a distribution with bounded support generally converge to a degenerate distribution, while those from a distribution with unbounded support do not have a meaningful distribution unless the sequence is normalized in some way. We will consider asymptotic distributions of extreme order statistics in Section 1.4.3.

We now consider some examples of sequences of order statistics. In Examples 1.27 and 1.28 below, we obtain degenerate distributions unless we introduce a normalizing factor. In Example 1.29, it is necessary to introduce a sequence of constant shifts.

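Example 1.27 asymptotic distribution of min or max order statistics from U(0,1)

Suppose $X_1, \ldots, X_n$ are iid U(0,1). The CDFs of the min and max, $X_{(1:n)}$ and $X_{(n:n)}$, are easy to work out. For $x \in [0,1]$,

$$F_{X_{(1:n)}}(x) = 1 - \Pr(X_1 > x, \ldots, X_n > x) = 1 - (1-x)^n$$

and

$$F_{X_{(n:n)}}(x) = x^n.$$

Notice that these are beta distributions, as we saw in Example 1.17.

Both of these extreme order statistics have degenerate limiting distributions. For $X_{(1:n)}$, we have $X_{(1:n)} \overset{d}{\to} 0$ and

$$\mathrm{E}\left(X_{(1:n)}\right) = \frac{1}{n+1}, \quad\text{and so}\quad \lim_{n\to\infty} \mathrm{E}\left(X_{(1:n)}\right) = 0.$$

This suggests the normalization $nX_{(1:n)}$. We have

$$\Pr\left(nX_{(1:n)} \le x\right) = 1 - \left(1 - \frac{x}{n}\right)^n \to 1 - e^{-x}, \quad x > 0. \tag{1.204}$$

This is the CDF of a standard exponential distribution. The distribution of $nX_{(1:n)}$ is more interesting than that of $X_{(1:n)}$.

For $X_{(n:n)}$, we have $X_{(n:n)} \overset{d}{\to} 1$ and

$$\mathrm{E}\left(X_{(n:n)}\right) = \frac{n}{n+1}, \quad\text{and so}\quad \lim_{n\to\infty} \mathrm{E}\left(X_{(n:n)}\right) = 1,$$

and there is no normalization of the form $a_n X_{(n:n)}$ that yields a nondegenerate distribution; a shift is needed as well. By symmetry, $n(1 - X_{(n:n)})$ has the same standard exponential limit as $nX_{(1:n)}$.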
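The limit in (1.204) is easy to examine numerically. The following sketch compares, on a small grid, a Monte Carlo estimate of $\Pr(nX_{(1:n)} \le x)$, the exact finite-$n$ value $1 - (1 - x/n)^n$, and the exponential limit $1 - e^{-x}$. The sample size, the grid, and the function names are illustrative assumptions.

```python
import math
import random

def scaled_min(n, rng):
    """n times the minimum of n iid U(0,1) draws."""
    return n * min(rng.random() for _ in range(n))

def ecdf_vs_exponential(n=200, reps=5000, seed=0, grid=(0.25, 0.5, 1.0, 2.0, 4.0)):
    """Rows of (x, empirical Pr(n X_(1:n) <= x), exact finite-n CDF, limit 1 - e^{-x})."""
    rng = random.Random(seed)
    draws = [scaled_min(n, rng) for _ in range(reps)]
    rows = []
    for x in grid:
        empirical = sum(d <= x for d in draws) / reps
        exact = 1.0 - (1.0 - x / n) ** n
        rows.append((x, empirical, exact, 1.0 - math.exp(-x)))
    return rows

for row in ecdf_vs_exponential():
    print(tuple(round(v, 4) for v in row))
```

Replacing `min` by `max` and `n * ...` by `n * (1 - ...)` in `scaled_min` gives the corresponding check for the shifted and scaled maximum.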

Now consider the asymptotic distribution of central order statistics from U(0,1).

Example 1.28 asymptotic distribution of a central order statistic from U(0,1)

Let $X_{(k:n)}$ be the $k$th order statistic from a random sample of size $n$ from U(0,1) and let

$$Y = nX_{(k:n)}.$$

Using Equation (1.138) with the CDF of U(0,1), we have

$$f_Y(y) = \frac{k}{n}\binom{n}{k}\left(\frac{y}{n}\right)^{k-1}\left(1 - \frac{y}{n}\right)^{n-k} I_{[0,n]}(y).$$

Observing that, for fixed $k$,

$$\lim_{n\to\infty} \frac{k}{n}\binom{n}{k}\, n^{-(k-1)} = \frac{k}{k!},$$

we have

$$\lim_{n\to\infty} f_Y(y) = \frac{1}{\Gamma(k)}\, y^{k-1} e^{-y} I_{[0,\infty[}(y), \tag{1.205}$$

that is, the limiting distribution of $\{nX_{(k:n)}\}$ is gamma with scale parameter 1 and shape parameter $k$. (Note, of course, $k! = k\Gamma(k)$.) For finite values of $n$, the asymptotic distribution provides better approximations when $k/n$ is relatively small. When $k$ is large, $n$ must be much larger in order for the asymptotic distribution to approximate the true distribution closely.

If $k = 1$, the PDF in equation (1.205) is that of the exponential distribution, as shown in Example 1.27. For $k \to n$, however, we must apply a limit similar to what is done in equation (1.204).
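The remark about the quality of the gamma approximation can be made concrete without any simulation. The sketch below evaluates the exact density of $Y = nX_{(k:n)}$, obtained by rescaling the Beta($k$, $n-k+1$) density of $X_{(k:n)}$, and reports its largest gap from the gamma density on a grid. The grid, the values of $n$ and $k$, and the function names are assumptions for illustration; `math.comb` requires Python 3.8 or later.

```python
import math

def exact_pdf(y, n, k):
    """PDF of Y = n X_(k:n) for U(0,1): the Beta(k, n-k+1) density rescaled by n."""
    if not 0.0 <= y <= n:
        return 0.0
    u = y / n
    return (k / n) * math.comb(n, k) * u ** (k - 1) * (1.0 - u) ** (n - k)

def gamma_pdf(y, k):
    """Gamma density with shape k and scale 1."""
    if y < 0.0:
        return 0.0
    return y ** (k - 1) * math.exp(-y) / math.gamma(k)

def max_abs_gap(n, k, grid_step=0.05, upper=20.0):
    """Largest |exact - limit| over an evenly spaced grid on [0, upper]."""
    steps = int(upper / grid_step)
    return max(abs(exact_pdf(i * grid_step, n, k) - gamma_pdf(i * grid_step, k))
               for i in range(steps + 1))

for k in (1, 3, 10):
    print(k, [round(max_abs_gap(n, k), 5) for n in (50, 500, 5000)])
```

Each printed row shows the gap for one $k$ as $n$ increases, so the rows can be compared to see how much larger $n$ must be when $k$ is larger.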

While the min and max of the uniform distribution considered in Example 1.27 are “extreme” values, the more interesting extremes are those from distributions with infinite support. In the next example, we consider an extreme value that has no bound. In such a case, in addition to any normalization, we must do a shift.

Example 1.29 extreme value distribution from an exponential distribution

Let $X_{(n:n)}$ be the largest order statistic from a random sample of size $n$ from an exponential distribution with PDF $e^{-x} I_{\bar{\mathbb{R}}_+}(x)$ and let

$$Y = X_{(n:n)} - \log(n).$$

We have

$$\lim_{n\to\infty} \Pr(Y \le y) = e^{-e^{-y}} \tag{1.206}$$

(Exercise 1.65). The distribution with CDF given in equation (1.206) is called an extreme value distribution. There are two other classes of “extreme value distributions”, which we will discuss in Section 1.4.3. The one in this example, which is the most common one, is called a type 1 extreme value distribution or a Gumbel distribution.
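The convergence in (1.206) can be checked directly, since for iid exponential variables with rate 1 the CDF of the maximum is $(1 - e^{-x})^n$, so that $\Pr(X_{(n:n)} - \log(n) \le y) = (1 - e^{-y}/n)^n$ whenever $y > -\log(n)$. The sketch below compares this exact value, the Gumbel limit, and a small Monte Carlo estimate. The values of $n$ and $y$, the replication count, and the function names are illustrative assumptions.

```python
import math
import random

def exact_cdf(y, n):
    """Pr(X_(n:n) - log(n) <= y) for n iid exponential(1) variables."""
    x = y + math.log(n)
    if x <= 0.0:
        return 0.0
    return (1.0 - math.exp(-x)) ** n

def gumbel_cdf(y):
    """Type 1 extreme value (Gumbel) CDF, exp(-exp(-y))."""
    return math.exp(-math.exp(-y))

def simulated_cdf(y, n=200, reps=1000, seed=0):
    """Monte Carlo estimate of Pr(X_(n:n) - log(n) <= y)."""
    rng = random.Random(seed)
    shift = math.log(n)
    hits = sum(max(rng.expovariate(1.0) for _ in range(n)) - shift <= y
               for _ in range(reps))
    return hits / reps

for y in (-1.0, 0.0, 1.0, 3.0):
    print(y, round(exact_cdf(y, 200), 4), round(gumbel_cdf(y), 4),
          round(simulated_cdf(y), 4))
```

Without the shift by $\log(n)$, the maximum drifts off to infinity as $n$ grows and no limiting distribution is obtained.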