Advanced Microeconometrics
Ignacio Lobato (ITAM)
Estimated explanatory and instrumental variables
(see Wooldridge, chap 6)
Consider the linear model:
\[
Y = \beta_0 + Z'\beta + U, \quad\text{with } E(U) = 0,
\]
where
\[
Z = h(S, \delta) \quad\text{with } E\|Z\|^2 < \infty,
\]
$h$ is a known function, $\delta$ an unknown element of the parameter space $\Delta$, and $S$ is a $d \times 1$ vector of observable variables.
There is also a $k \times 1$ vector of functions $g$ such that
\[
X = g(S, \lambda), \quad E\|X\|^2 < \infty \quad\text{and}\quad E(XU) = 0,
\]
for some unknown parameter vector $\lambda \in \Lambda$.
Some remarks are in order:
1. Some elements of $\delta$ and $\lambda$ are known, which means that some components of $Z$ and $X$ are observable.
2. Some elements of $Z$ and $X$ are identical.
EXAMPLE:
\[
Z = \begin{pmatrix} Z^{(1)} \\ \delta_1 + Z^{(2)\prime}\delta_2 \\ X^{(1)} \end{pmatrix},
\quad
\delta = \begin{pmatrix} \delta_1 \\ \delta_2 \end{pmatrix},
\quad
X = \begin{pmatrix} X^{(1)} \\ X^{(2)} \\ \Phi\!\left(\dfrac{Z^{(2)} - X'\lambda_1 - \lambda_2}{\lambda_3}\right) \end{pmatrix},
\quad
\lambda = \begin{pmatrix} \lambda_1 \\ \lambda_2 \\ \lambda_3 \end{pmatrix},
\]
with $\Phi$ the standard normal distribution function.
We have iid observations $(Y_i, S_i)$, $i = 1, \dots, n$, of $(Y, S)$.
It is also assumed that there are estimators $\delta_n$ and $\lambda_n$ of $\delta$ and $\lambda$, respectively, such that:
\[
n^{1/2}(\delta_n - \delta) = O_p(1) \quad\text{and}\quad n^{1/2}(\lambda_n - \lambda) = O_p(1).
\]
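To fix ideas, the two-step structure can be sketched in a small simulation (the design, variable names, and numbers below are hypothetical, not from the text): a first step produces a root-$n$ consistent $\lambda_n$, and the generated variable is the plug-in $g(S_i, \lambda_n)$.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# S: observable variables (hypothetical 2-dimensional design)
S = rng.normal(size=(n, 2))
lam = np.array([1.0, -0.5])        # true (unknown) lambda

# First step: lambda_n from an auxiliary OLS regression of an
# observable variable on S (any root-n consistent first step works)
aux = S @ lam + rng.normal(size=n)
lam_n, *_ = np.linalg.lstsq(S, aux, rcond=None)

# Generated variable: plug-in g(S_i, lambda_n), with g(S, l) = S'l here
X_tilde = S @ lam_n

# lambda_n is root-n consistent: n^{1/2}(lambda_n - lambda) = Op(1)
print(np.sqrt(n) * (lam_n - lam))
```

The generated regressor therefore inherits an $O_p(n^{-1/2})$ estimation error, which is what the expansions below keep track of.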
Assuming that the order and rank conditions are satisfied and, hence, $\beta$ is identified, the IV estimator is:
\[
\beta^{IV}_n = \Bigl[ E_n\bigl(\tilde Z_n \tilde X_n'\bigr)\, E_n\bigl(\tilde X_n \tilde X_n'\bigr)^{-1} E_n\bigl(\tilde X_n \tilde Z_n'\bigr) \Bigr]^{-1} E_n\bigl(\tilde Z_n \tilde X_n'\bigr)\, E_n\bigl(\tilde X_n \tilde X_n'\bigr)^{-1} E_n\bigl(\tilde X_n Y\bigr),
\]
with $\tilde Z_{ni} = h(S_i, \delta_n)$, $\tilde X_{ni} = g(S_i, \lambda_n)$ and, for any vector $a$, $E_n(a) = n^{-1}\sum_{i=1}^n a_i$ denotes the sample mean.
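The estimator above is just a function of sample moments, so it translates directly into code; a minimal numpy sketch (function name is mine), where the rows of `Zt` and `Xt` hold the generated $\tilde Z_{ni}$ and $\tilde X_{ni}$, including a constant:

```python
import numpy as np

def iv_generated(Zt, Xt, y):
    """IV estimator with (possibly generated) regressors Zt and
    instruments Xt: b = (Szx Sxx^{-1} Sxz)^{-1} Szx Sxx^{-1} Sxy."""
    n = len(y)
    Szx = Zt.T @ Xt / n                      # E_n(Z~ X~')
    Sxx = Xt.T @ Xt / n                      # E_n(X~ X~')
    Sxy = Xt.T @ y / n                       # E_n(X~ Y)
    A = Szx @ np.linalg.solve(Sxx, Szx.T)    # Szx Sxx^{-1} Sxz
    b = Szx @ np.linalg.solve(Sxx, Sxy)      # Szx Sxx^{-1} Sxy
    return np.linalg.solve(A, b)

# Hypothetical check: with Zt = Xt the formula collapses to OLS
rng = np.random.default_rng(1)
n = 2000
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
beta = np.array([1.0, 2.0, -1.0])
y = X @ beta + rng.normal(size=n)
print(iv_generated(X, X, y))                 # close to (1, 2, -1)
```

In the exactly identified case with `Zt = Xt` the weighting matrices cancel and the formula reduces to OLS, which the check above illustrates.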
▸ CONSISTENCY: Assume that $h$ and $g$ are differentiable in a neighborhood of $\delta$ and $\lambda$, respectively, and that there exist functions $\dot h : \mathbb{R}^d \to \mathbb{R}_+$ and $\dot g : \mathbb{R}^d \to \mathbb{R}_+$ such that:
\[
\sup_{d \in \Delta} \bigl\| \nabla_\delta h(S, d) \bigr\| \le \dot h(S) \quad\text{with } E\bigl[\dot h(S)^2\bigr] < \infty,
\]
\[
\sup_{l \in \Lambda} \bigl\| \nabla_\lambda g(S, l) \bigr\| \le \dot g(S) \quad\text{with } E\bigl[\dot g(S)^2\bigr] < \infty,
\]
with
\[
\nabla_b f(\bar b) = \left. \frac{\partial}{\partial b'} f(b) \right|_{b = \bar b}.
\]
There may be some rows of zeros in $\nabla_\delta h(S, d)$ and $\nabla_\lambda g(S, l)$.
Notice that
\[
E_n\bigl(\tilde Z_n \tilde X_n'\bigr) = E_n\bigl(ZX'\bigr) + E_n\bigl((\tilde Z_n - Z)(\tilde X_n - X)'\bigr) + E_n\bigl((\tilde Z_n - Z)X'\bigr) + E_n\bigl(Z(\tilde X_n - X)'\bigr).
\]
By the mean value theorem, there exist $\bar\delta_n^{(jl)}$ with $\|\bar\delta_n^{(jl)} - \delta\| \le \|\delta_n - \delta\|$ and $\bar\lambda_n^{(jl)}$ with $\|\bar\lambda_n^{(jl)} - \lambda\| \le \|\lambda_n - \lambda\|$ such that
\[
\Bigl[ E_n\bigl((\tilde Z_n - Z)(\tilde X_n - X)'\bigr) \Bigr]_{jl}
= \Bigl[ E_n\Bigl( \nabla_\delta h\bigl(S, \bar\delta_n^{(jl)}\bigr)(\delta_n - \delta)\,(\lambda_n - \lambda)'\,\nabla_\lambda g\bigl(S, \bar\lambda_n^{(jl)}\bigr)' \Bigr) \Bigr]_{jl}
\le E_n\bigl(\dot h(S)\,\dot g(S)\bigr)\, \|\delta_n - \delta\|\, \|\lambda_n - \lambda\|
= O_p(1)\, O_p\bigl(n^{-1/2}\bigr)\, O_p\bigl(n^{-1/2}\bigr) = O_p\bigl(n^{-1}\bigr) \quad\text{for all } j, l = 1, \dots, k,
\]
provided that $\dot h(S)\,\dot g(S)$ satisfies a LLN. Also,
\[
\bigl\| E_n\bigl((\tilde Z_n - Z)X'\bigr) \bigr\| \le E_n\bigl(\dot h(S)\,\|X\|\bigr)\, \|\delta_n - \delta\| = O_p(1)\, O_p\bigl(n^{-1/2}\bigr) = O_p\bigl(n^{-1/2}\bigr),
\]
\[
\bigl\| E_n\bigl(Z(\tilde X_n - X)'\bigr) \bigr\| \le E_n\bigl(\|Z\|\,\dot g(S)\bigr)\, \|\lambda_n - \lambda\| = O_p(1)\, O_p\bigl(n^{-1/2}\bigr) = O_p\bigl(n^{-1/2}\bigr).
\]
Therefore, after applying the LLN,
\[
E_n\bigl(\tilde Z_n \tilde X_n'\bigr) = E\bigl(ZX'\bigr) + o_p(1).
\]
Likewise,
\[
E_n\bigl(\tilde X_n \tilde X_n'\bigr) = E\bigl(XX'\bigr) + o_p(1) \quad\text{and}\quad E_n\bigl(\tilde X_n Y\bigr) = E(XY) + o_p(1).
\]
Hence,
\[
\beta^{IV}_n = \Bigl[ E(ZX')E(XX')^{-1}E(XZ') + o_p(1) \Bigr]^{-1} \Bigl[ E(ZX')E(XX')^{-1}E(XY) + o_p(1) \Bigr] = \beta + o_p(1).
\]
▸ ASYMPTOTIC NORMALITY: Assume that $h$ and $g$ are twice differentiable in a neighborhood of $\delta$ and $\lambda$, respectively, and that there exist functions $\ddot h : \mathbb{R}^d \to \mathbb{R}_+$ and $\ddot g : \mathbb{R}^d \to \mathbb{R}_+$ such that:
\[
\sup_{d \in \Delta} \bigl\| \nabla_{\delta\delta} h(S, d) \bigr\| \le \ddot h(S) \quad\text{with } E\bigl[\ddot h(S)^2\bigr] < \infty,
\]
\[
\sup_{l \in \Lambda} \bigl\| \nabla_{\lambda\lambda} g(S, l) \bigr\| \le \ddot g(S) \quad\text{with } E\bigl[\ddot g(S)^2\bigr] < \infty,
\]
with
\[
\nabla_{bb} f(\bar b) = \left. \frac{\partial^2}{\partial b\, \partial b'} f(b) \right|_{b = \bar b}.
\]
Notice that:
\[
n^{1/2}\bigl(\beta^{IV}_n - \beta\bigr) = \Bigl[ E_n\bigl(\tilde Z_n \tilde X_n'\bigr) E_n\bigl(\tilde X_n \tilde X_n'\bigr)^{-1} E_n\bigl(\tilde X_n \tilde Z_n'\bigr) \Bigr]^{-1} E_n\bigl(\tilde Z_n \tilde X_n'\bigr) E_n\bigl(\tilde X_n \tilde X_n'\bigr)^{-1} \Bigl\{ n^{1/2} E_n\bigl(\tilde X_n (Z - \tilde Z_n)'\bigr)\beta + n^{1/2} E_n\bigl(\tilde X_n U\bigr) \Bigr\}.
\]
First,
\[
n^{1/2} E_n\bigl(\tilde X_n (Z - \tilde Z_n)'\bigr)\beta
= \underbrace{n^{1/2} E_n\bigl(X (Z - \tilde Z_n)'\bigr)\beta}_{= -E_n(X\, n^{1/2}(\delta_n - \delta)'\nabla_\delta h(S,\delta)')\beta \,+\, O_p(n^{1/2}\|\delta_n - \delta\|^2)}
+ \underbrace{n^{1/2} E_n\bigl((\tilde X_n - X)(Z - \tilde Z_n)'\bigr)\beta}_{= O_p(n^{-1/2})}
\]
\[
= -E_n\bigl(X\, n^{1/2}(\delta_n - \delta)'\nabla_\delta h(S,\delta)'\bigr)\beta + O_p\bigl(n^{-1/2}\bigr).
\]
Now, since $\mathrm{vec}(ABC) = (C' \otimes A)\,\mathrm{vec}(B)$ and by the LLN,
\[
E_n\bigl(X\, n^{1/2}(\delta_n - \delta)'\nabla_\delta h(S,\delta)'\bigr)\beta
= E_n\bigl[\bigl(\beta'\nabla_\delta h(S,\delta)\bigr) \otimes X\bigr]\, n^{1/2}(\delta_n - \delta)
= E\bigl[\bigl(\beta'\nabla_\delta h(S,\delta)\bigr) \otimes X\bigr]\, n^{1/2}(\delta_n - \delta) + o_p(1).
\]
REMARK: When the coefficients in $\beta$ corresponding to the estimated explanatory variables are zero,
\[
E_n\bigl(\tilde X_n (Z - \tilde Z_n)'\bigr)\beta = 0
\quad\text{and}\quad
E\bigl[(\beta'\nabla_\delta h(S,\delta)) \otimes X\bigr] = 0.
\]
Also,
\[
n^{1/2} E_n\bigl(\tilde X_n U\bigr) = n^{1/2} E_n(XU) + n^{1/2} E_n\bigl((\tilde X_n - X)U\bigr)
= n^{1/2} E_n(XU) + E\bigl(U \nabla_\lambda g(S, \lambda)\bigr)\, n^{1/2}(\lambda_n - \lambda) + O_p\bigl(n^{1/2}\|\lambda_n - \lambda\|^2\bigr).
\]
REMARK: In many instances $E\bigl(U \nabla_\lambda g(S, \lambda)\bigr) = 0$, e.g., when $g = g\bigl(X^{(\cdot)}, \lambda\bigr)$, where $X^{(\cdot)}$ are observable exogenous variables and $E\bigl(U \mid X^{(\cdot)}\bigr) = 0$.
Therefore,
\[
n^{1/2}\bigl(\beta^{IV}_n - \beta\bigr) = \Bigl[ E(ZX')E(XX')^{-1}E(XZ') + o_p(1) \Bigr]^{-1} \Bigl[ E(ZX')E(XX')^{-1} + o_p(1) \Bigr]
\times \Bigl\{ n^{1/2} E_n(XU) - E\bigl[(\beta'\nabla_\delta h(S,\delta)) \otimes X\bigr]\, n^{1/2}(\delta_n - \delta) + E\bigl(U \nabla_\lambda g(S,\lambda)\bigr)\, n^{1/2}(\lambda_n - \lambda) \Bigr\}.
\]
Assuming that
\[
n^{1/2}\begin{pmatrix} E_n(XU) \\ \delta_n - \delta \\ \lambda_n - \lambda \end{pmatrix} \to_d N(0, G),
\]
then
\[
n^{1/2}\bigl(\beta^{IV}_n - \beta\bigr) \to_d B^{-1} E(ZX') E(XX')^{-1}\, N_{k+1}\bigl(0,\, M'GM\bigr),
\]
with
\[
B = E(ZX')E(XX')^{-1}E(XZ')
\quad\text{and}\quad
M = \begin{pmatrix} I_{k+1} \\ -E\bigl[(\beta'\nabla_\delta h(S,\delta)) \otimes X\bigr]' \\ E\bigl(U\nabla_\lambda g(S,\lambda)\bigr)' \end{pmatrix}.
\]
Thus, $\beta^{IV}_n$ is CAN with
\[
\mathrm{AsyVar}\bigl(\beta^{IV}_n\bigr) = \frac{1}{n}\, B^{-1} C B^{-1\prime},
\]
and
\[
C = E(ZX')E(XX')^{-1}\, M'GM\, E(XX')^{-1}E(XZ'),
\]
which can be estimated from the data.
REMARK: In many circumstances, $\delta_n$ and $\lambda_n$ are estimators computed from a sample other than $(Y_i, S_i)$, $i = 1, \dots, n$. The samples may have different sample sizes, which introduces further complications. In practice, it seems reasonable, cheap, and easy to use the bootstrap to approximate the distribution of the IV estimator.
REMARK: The discussion is perfectly valid in the OLS case: just apply the results with $Z = X$.
REMARK: If both
\[
E\bigl[(\beta'\nabla_\delta h(S,\delta)) \otimes X\bigr] = 0 \quad\text{and}\quad E\bigl(U\nabla_\lambda g(S,\lambda)\bigr) = 0,
\]
then
\[
M = \begin{pmatrix} I_{k+1} \\ 0 \\ 0 \end{pmatrix},
\]
so $M'GM$ reduces to the asymptotic variance of $n^{1/2}E_n(XU)$ and, as usual,
\[
C = E(ZX')E(XX')^{-1}\, E\bigl(XX'U^2\bigr)\, E(XX')^{-1}E(XZ').
\]
IMPORTANT CONCLUSION:
If both the coefficients in $\beta$ corresponding to the generated explanatory variables are zero and $\nabla_\lambda g(S,\lambda)$ and $U$ are uncorrelated, then the asymptotic distribution of $\beta^{IV}_n$ is identical to the asymptotic distribution of the corresponding IV estimator with observable variables, i.e., with known parameters.
The same conclusion applies to the OLS estimator, but now we require that $\nabla_\delta h(S,\delta)$ and $U$ be uncorrelated (take $Z = X$).
Otherwise, when we have generated explanatory or instrumental variables, the asymptotic distribution changes.
Testing homoskedasticity
Consider the linear model:
\[
Y = \beta_0 + Z'\beta + U, \quad\text{with } E(U) = 0.
\]
Using OLS, we want to test the assumption
\[
H_0 : E\bigl(ZZ'U^2\bigr) = E\bigl(ZZ'\bigr)E\bigl(U^2\bigr)
\]
versus
\[
H_1 : E\bigl(ZZ'U^2\bigr) \ne E\bigl(ZZ'\bigr)E\bigl(U^2\bigr).
\]
By the symmetry of $ZZ'$, half of the restrictions in $H_0$ are redundant, so the hypotheses can alternatively be written as follows. Let $\mathcal{Z}$ be the $k(k+1)/2$-vector containing the distinct components of $ZZ'$, and consider the linear projection
\[
L\bigl(U^2 \mid 1, \mathcal{Z}\bigr) = \delta_0 + \mathcal{Z}'\delta.
\]
Then $H_0$ and $H_1$ can alternatively be written as
\[
H_0 : \delta = 0 \quad\text{versus}\quad H_1 : \delta \ne 0.
\]
A suitable estimator of $\delta$ is
\[
\delta_n = V_n(\mathcal{Z})^{-1} C_n\bigl(\mathcal{Z}, U_n^2\bigr),
\]
where $U_{ni} = Y_i - \beta_{0n} - Z_i'\beta_n$ are the OLS residuals, $V_n$ denotes a sample variance matrix, and $C_n$ a sample covariance. Using standard arguments, under suitable moment conditions,
\[
C_n\bigl(\mathcal{Z}, U_n^2\bigr) - C_n\bigl(\mathcal{Z}, U^2\bigr) = o_p\bigl(n^{-1/2}\bigr),
\]
and
\[
\delta_n = \delta + V_n(\mathcal{Z})^{-1} C_n\bigl(\mathcal{Z},\, U^2 - \delta_0 - \mathcal{Z}'\delta\bigr) + o_p\bigl(n^{-1/2}\bigr).
\]
Therefore, by the CLT, $\delta_n$ is CAN with
\[
\mathrm{AsyVar}(\delta_n) = \frac{1}{n}\, V(\mathcal{Z})^{-1} D\, V(\mathcal{Z})^{-1},
\]
with
\[
D = E\Bigl[ \bigl(\mathcal{Z} - E(\mathcal{Z})\bigr)\bigl(\mathcal{Z} - E(\mathcal{Z})\bigr)' \bigl(U^2 - \delta_0 - \mathcal{Z}'\delta\bigr)^2 \Bigr],
\]
whose corresponding estimator is
\[
\widehat{\mathrm{AsyVar}}(\delta_n) = \frac{1}{n}\, V_n(\mathcal{Z})^{-1} D_n V_n(\mathcal{Z})^{-1}.
\]
We can test the null by means of the Wald test statistic:
\[
W_n = \delta_n' \,\widehat{\mathrm{AsyVar}}(\delta_n)^{-1} \delta_n,
\]
and under $H_0$,
\[
W_n \to_d \chi^2_{k(k+1)/2}.
\]
Alternatively, we could use the asymptotically equivalent LM statistic:
\[
LM_n = n R_n^2,
\]
where $R_n^2$ is the coefficient of determination in the model
\[
U_{ni}^2 = \delta_0 + \mathcal{Z}_i'\delta + \text{error}, \quad i = 1, \dots, n.
\]
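The LM version is easy to implement: regress the squared OLS residuals on a constant and the distinct components of $ZZ'$ and use $n R_n^2$. A minimal sketch (function name is mine; scipy is assumed available for the chi-square p-value):

```python
import numpy as np
from scipy import stats

def lm_homoskedasticity(Z, y):
    """LM = n R^2 test of H0: E(ZZ'U^2) = E(ZZ')E(U^2).
    Z: (n, k) regressors without the constant; y: (n,) response."""
    n, k = Z.shape
    W = np.column_stack([np.ones(n), Z])
    beta = np.linalg.lstsq(W, y, rcond=None)[0]
    u2 = (y - W @ beta) ** 2                       # squared OLS residuals
    # distinct components of ZZ': squares and cross products
    cols = [Z[:, i] * Z[:, j] for i in range(k) for j in range(i, k)]
    R = np.column_stack([np.ones(n)] + cols)
    g = np.linalg.lstsq(R, u2, rcond=None)[0]
    r2 = 1.0 - np.sum((u2 - R @ g) ** 2) / np.sum((u2 - u2.mean()) ** 2)
    df = k * (k + 1) // 2                          # chi2 degrees of freedom
    return n * r2, stats.chi2.sf(n * r2, df)

# hypothetical usage: homoskedastic errors, so the test should not reject
rng = np.random.default_rng(7)
Z = rng.normal(size=(400, 2))
y = 1.0 + Z @ np.array([1.0, -1.0]) + rng.normal(size=400)
lm, pval = lm_homoskedasticity(Z, y)
print(lm, pval)
```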
In practice, many people prefer to test the necessary condition for $H_0$:
\[
\underbrace{\beta' C\bigl(U^2, ZZ'\bigr)\beta}_{= C(U^2,\, L(Y \mid 1, Z)^2)} = 0
\quad\text{and}\quad
\underbrace{C\bigl(U^2, Z\bigr)'\beta}_{= C(U^2,\, L(Y \mid 1, Z))} = 0.
\]
If $\hat R_n^2$ is the coefficient of determination in the model
\[
U_{ni}^2 = \gamma_0 + L_n(Y \mid 1, Z = Z_i)\,\gamma_1 + L_n(Y \mid 1, Z = Z_i)^2\,\gamma_2 + \text{error}, \quad i = 1, \dots, n,
\]
the test statistic is the LM test:
\[
\widehat{LM}_n = n \hat R_n^2.
\]
Under $H_0$,
\[
\widehat{LM}_n \to_d \chi^2_2.
\]
Finite sample asymptotics
Suppose that we have a CAN estimator $\theta_n$ of $\theta_0$, with $\theta_0$ scalar for presentation purposes. When $\theta_n$ is CAN we have, under suitable regularity conditions, that a Berry-Esseen bound is satisfied, i.e.,
\[
\sup_x \left| \Pr\!\left( \frac{\theta_n - \theta_0}{\mathrm{AsyVar}(\theta_n)^{1/2}} \le x \right) - \Phi(x) \right| \le C n^{-1/2} \quad\text{for each } n = 1, 2, \dots,
\]
for some suitable constant $C$, where $\Phi$ is the standard normal distribution, i.e.,
\[
\Phi(w) = \int_{-\infty}^{w} \phi(\bar w)\, d\bar w \quad\text{with}\quad \phi(w) = \frac{1}{(2\pi)^{1/2}} \exp\!\left( -\frac{1}{2} w^2 \right).
\]
For instance, if $W_i$, $i = 1, \dots, n$, are iid with mean $\mu$ and variance $\sigma^2$,
\[
\sup_x \left| \Pr\!\left( \frac{n^{1/2}(\bar W_n - \mu)}{\sigma} \le x \right) - \Phi(x) \right| \le \frac{33}{4} \frac{E|W_1 - \mu|^3}{\sigma^3\, n^{1/2}}.
\]
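The bound can be eyeballed numerically; a hypothetical sketch with exponential data (so $\mu = \sigma = 1$), comparing the simulated sup-distance to the normal with the $\tfrac{33}{4}\, E|W_1 - \mu|^3 / (\sigma^3 n^{1/2})$ bound:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n, reps = 20, 20000

# W_i iid exponential(1), so mu = sigma = 1 and E|W - mu|^3 is finite
W = rng.exponential(size=(reps, n))
T = np.sqrt(n) * (W.mean(axis=1) - 1.0)          # n^{1/2}(Wbar - mu)/sigma

# sup_x |P(T <= x) - Phi(x)| approximated on a grid
x = np.linspace(-4.0, 4.0, 801)
emp = (T[:, None] <= x[None, :]).mean(axis=0)    # simulated CDF of T
dist = np.abs(emp - stats.norm.cdf(x)).max()

rho = np.mean(np.abs(W - 1.0) ** 3)              # estimate of E|W1 - mu|^3
bound = (33.0 / 4.0) * rho / np.sqrt(n)          # Berry-Esseen bound
print(dist, bound)
```

The simulated distance sits far below the bound, which is conservative; the point of the bound is the $n^{-1/2}$ rate, not its constant.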
▸ Edgeworth expansions (after Francis Ysidro Edgeworth): In many cases of practical relevance, for a given sample size $n$, if $\theta_n$ is CAN, the distribution of $n^{1/2}(\theta_n - \theta_0)$ can be expanded as a power series in $n^{-1/2}$, known as an Edgeworth expansion, for each $n = 1, 2, \dots$:
\[
\Pr\!\left( \frac{\theta_n - \theta_0}{\mathrm{AsyVar}(\theta_n)^{1/2}} \le x \right)
= \Phi(x) + \frac{1}{n^{1/2}}\, p_1(x)\phi(x) + \frac{1}{n}\, p_2(x)\phi(x) + \dots + \frac{1}{n^{j/2}}\, p_j(x)\phi(x) + \dots,
\]
where $\phi(x)$ is the standard normal density and the $p_j$'s are polynomials of degree at most $3j - 1$, with coefficients depending on the cumulants of $\theta_n - \theta_0$.
▸ Cornish-Fisher expansions: Define $x_\alpha$ by
\[
\Pr\!\left( \frac{\theta_n - \theta_0}{\mathrm{AsyVar}(\theta_n)^{1/2}} \le x_\alpha \right) = \alpha.
\]
The Cornish-Fisher expansion is the inversion of the Edgeworth expansion, and it has the form
\[
x_\alpha = z_\alpha + \frac{1}{n^{1/2}}\, p_{11}(z_\alpha) + \frac{1}{n}\, p_{21}(z_\alpha) + \dots + \frac{1}{n^{j/2}}\, p_{j1}(z_\alpha) + \dots,
\]
where the $p_{j1}$'s are polynomials derivable from the $p_j$'s, and $z_\alpha$ solves $\Phi(z_\alpha) = \alpha$.
See, e.g., Hall (1992), The Bootstrap and Edgeworth Expansion, Springer.
Bootstrap approximations
A bootstrap approximation often provides a more accurate approximation to the exact distribution of a statistic than the asymptotic one.
Again, suppose that we have an estimator $\theta_n = \theta(F_{Wn})$ of $\theta_0 = \theta(F_W)$.
Let $\mathcal{W}_n = \{W_1, \dots, W_n\}$ be a random sample from $F_W$. We also have an estimator of $F_W$, say $\hat F_{Wn}$, which can be the sample distribution function $F_{Wn}$, a parametric estimator, or even a smooth estimator. Let $\mathcal{W}_n^* = \{W_1^*, \dots, W_n^*\}$ be a random sample from $\hat F_{Wn}$; e.g., if $\hat F_{Wn} = F_{Wn}$, then $\mathcal{W}_n^*$ are iid observations from the distribution assigning mass $1/n$ to each observation $W_i$.
We may be interested in estimating $E[R(\mathcal{W}_n, F_W)]$ for a particular function $R$, which can represent different distributional features of $\theta_n$, e.g.:
1. $R(\mathcal{W}_n, F_W) = \theta_n - \theta(F_W) \;\longrightarrow\; Bias(\theta_n) = E\bigl(\theta_n - \theta(F_W)\bigr)$;
2. $R(\mathcal{W}_n, F_W) = \bigl(\theta_n - E(\theta_n)\bigr)^2 \;\longrightarrow\; V(\theta_n) = E\bigl[\bigl(\theta_n - E(\theta_n)\bigr)^2\bigr]$;
3. $R(\mathcal{W}_n, F_W) = 1\Bigl\{ \dfrac{\theta_n - \theta(F_W)}{\widehat{\mathrm{AsyVar}}(\theta_n)^{1/2}} \le w \Bigr\} \;\longrightarrow\; F_{(\theta_n - \theta(F_W))/\widehat{\mathrm{AsyVar}}(\theta_n)^{1/2}}(w) = E\Bigl[ 1\Bigl\{ \dfrac{\theta_n - \theta(F_W)}{\widehat{\mathrm{AsyVar}}(\theta_n)^{1/2}} \le w \Bigr\} \Bigr]$.
The bootstrap analog of $\theta_n$, $\theta_n^*$, is the estimator computed with $\mathcal{W}_n^*$. For instance, $\theta_n^* = \theta(F_{Wn}^*)$, where $F_{Wn}^*(w) = n^{-1}\sum_{i=1}^n 1\{W_i^* \le w\}$ is the empirical distribution based on $\mathcal{W}_n^* = \{W_1^*, \dots, W_n^*\}$, which are iid variables with distribution $\hat F_{Wn}$.
Then the bootstrap analog of $E[R(\mathcal{W}_n, F_W)]$ is
\[
E^*\bigl[R(\mathcal{W}_n^*, \hat F_{Wn})\bigr] = E\bigl[R(\mathcal{W}_n^*, \hat F_{Wn}) \mid \mathcal{W}_n\bigr].
\]
Some approximations of distributional features of $\theta_n$ are:
1. $\widehat{Bias}(\theta_n) = E^*(\theta_n^*) - \theta(\hat F_{Wn})$;
2. $\hat V(\theta_n) = E^*\bigl[\bigl(\theta_n^* - E^*(\theta_n^*)\bigr)^2\bigr]$;
3. the bootstrap estimate of the quantiles of the distribution of $\theta_n$.
Frequently, $E^*\bigl[R(\mathcal{W}_n^*, \hat F_{Wn})\bigr]$ is very expensive to compute, but it can be approximated very accurately in practice by generating $B$ resamples:
\[
\mathcal{W}_n^{*(b)} = \bigl\{ W_1^{*(b)}, \dots, W_n^{*(b)} \bigr\}, \quad b = 1, \dots, B.
\]
Then we approximate $E^*\bigl[R(\mathcal{W}_n^*, \hat F_{Wn})\bigr]$, as accurately as desired, by
\[
E_B^*\bigl[R(\mathcal{W}_n^*, \hat F_{Wn})\bigr] = \frac{1}{B} \sum_{b=1}^{B} R\bigl(\mathcal{W}_n^{*(b)}, \hat F_{Wn}\bigr).
\]
Example. Suppose that the object of interest is the bias:
\[
R(\mathcal{W}_n, F_W) = \theta_n - \theta(F_W) \;\longrightarrow\; Bias(\theta_n) = E\bigl(\theta_n - \theta(F_W)\bigr).
\]
We are going to estimate it by
\[
E_B^*\bigl[R(\mathcal{W}_n^*, F_{Wn})\bigr] = \frac{1}{B} \sum_{b=1}^{B} R\bigl(\mathcal{W}_n^{*(b)}, F_{Wn}\bigr),
\]
where we use as $\hat F_{Wn}$ the sample distribution function $F_{Wn}$ (this is also called the nonparametric bootstrap).
This means doing the following:
1. Generate $B$ independent random samples from $F_{Wn}$.
2. For each of them, estimate $\theta$; call the estimates $\theta_b^*$, $b = 1, \dots, B$.
3. Then the bootstrap estimate of the bias is just
\[
\frac{1}{B} \sum_{b=1}^{B} \theta_b^* - \theta_n.
\]
Note that this is the analogue, in the bootstrap world, of $E(\theta_n) - \theta$.
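The three steps above, in a minimal numpy sketch (a hypothetical example where $\theta$ is the variance and $\theta_n$ the biased plug-in sample variance, whose exact bias is $-\sigma^2/n$):

```python
import numpy as np

rng = np.random.default_rng(3)

# theta(F) = Var(W); plug-in theta_n = (1/n) sum (W_i - Wbar)^2,
# whose exact bias is -sigma^2/n
W = rng.normal(size=100)
theta_n = W.var()

B = 2000
theta_star = np.empty(B)
for b in range(B):
    Wb = rng.choice(W, size=W.size, replace=True)   # resample from F_Wn
    theta_star[b] = Wb.var()                        # theta*_b

bias_hat = theta_star.mean() - theta_n   # (1/B) sum_b theta*_b - theta_n
print(bias_hat)                          # close to -theta_n / 100
```

Subtracting `bias_hat` from `theta_n` gives the usual bootstrap bias-corrected estimate.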
Examples in regression:
▸ Naive: $\mathcal{W}_n^*$ is a random sample with replacement from $\mathcal{W}_n$, i.e., a random sample from $F_{Wn}$.
▸ Wild: Take $Z_1, \dots, Z_n$ fixed; then $W_i^* = (Y_i^*, Z_i)$ with
\[
Y_i^* = \beta_{0n} + Z_i'\beta_n + U_{ni} V_i^*,
\]
where the $V_i^*$ are iid variables with mean zero and variance one.
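A sketch of the wild bootstrap for OLS standard errors under heteroskedasticity (design and names are hypothetical; Rademacher weights are one common choice for $V_i^*$):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 500

# hypothetical heteroskedastic design: Var(U|Z) depends on Z
Z = rng.normal(size=n)
y = 1.0 + 2.0 * Z + np.abs(Z) * rng.normal(size=n)

W = np.column_stack([np.ones(n), Z])
beta_n = np.linalg.lstsq(W, y, rcond=None)[0]
u_n = y - W @ beta_n                       # OLS residuals

B = 999
boot = np.empty((B, 2))
for b in range(B):
    V = rng.choice([-1.0, 1.0], size=n)    # Rademacher: mean 0, variance 1
    y_star = W @ beta_n + u_n * V          # Y*_i = b_0n + Z_i' b_n + U_ni V*_i
    boot[b] = np.linalg.lstsq(W, y_star, rcond=None)[0]

se_wild = boot.std(axis=0)                 # wild-bootstrap standard errors
print(se_wild)
```

Because the $Z_i$ are held fixed and only the residuals are perturbed, this scheme preserves the conditional heteroskedasticity pattern, unlike the naive resampling of residuals.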
Improving the accuracy of asymptotic approximations by bootstrap: Suppose $\eta_n = R(\mathcal{W}_n, F_W)$ is an asymptotically pivotal statistic (its distribution does not depend on any unknown feature of $F_W$), and let $\eta_n^* = R(\mathcal{W}_n^*, F_{Wn})$ be its bootstrap analog. Then we have:
\[
\Pr(\eta_n \le x) = \Phi(x) + n^{-1/2}\, q(x)\phi(x) + O\bigl(n^{-1}\bigr),
\]
where $q$ is an even quadratic polynomial. Similarly,
\[
\Pr\bigl(\eta_n^* \le x \mid \mathcal{W}_n\bigr) = \Phi(x) + n^{-1/2}\, \hat q_n(x)\phi(x) + O_p\bigl(n^{-1}\bigr),
\]
where $\hat q_n(x)$ is obtained by replacing the unknowns in $q(x)$ by $n^{1/2}$-consistent estimators so that, as a result, $\hat q_n(x) - q(x) = O_p\bigl(n^{-1/2}\bigr)$. Therefore,
\[
\Pr\bigl(\eta_n^* \le x \mid \mathcal{W}_n\bigr) - \Pr(\eta_n \le x)
= n^{-1/2} \underbrace{\bigl( \hat q_n(x) - q(x) \bigr)}_{O_p(n^{-1/2})} \phi(x) + O_p\bigl(n^{-1}\bigr) = O_p\bigl(n^{-1}\bigr).
\]
So the bootstrap approximation is asymptotically better than the usual normal approximation, which satisfies
\[
\Pr(\eta_n \le x) - \Phi(x) = O\bigl(n^{-1/2}\bigr).
\]
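In practice this refinement is exploited by bootstrapping an asymptotically pivotal quantity such as the studentized mean; a hypothetical bootstrap-t confidence interval sketch:

```python
import numpy as np

rng = np.random.default_rng(5)
n, B = 50, 4999

W = rng.exponential(size=n)                # skewed population, mean 1
mu_hat, s = W.mean(), W.std(ddof=1)

# eta_n = studentized mean: asymptotically pivotal
t_boot = np.empty(B)
for b in range(B):
    Wb = rng.choice(W, size=n, replace=True)
    t_boot[b] = np.sqrt(n) * (Wb.mean() - mu_hat) / Wb.std(ddof=1)

# bootstrap-t 95% CI: quantiles of eta*_n replace the normal +-1.96
lo, hi = np.quantile(t_boot, [0.025, 0.975])
ci = (mu_hat - hi * s / np.sqrt(n), mu_hat - lo * s / np.sqrt(n))
print(ci)
```

Studentizing first is what makes the statistic (approximately) pivotal; bootstrapping the unstudentized mean would not deliver the $O_p(n^{-1})$ refinement.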
The previous discussion was for random samples; in time series one typically needs to resample blocks so as to preserve the time dependence:
- Moving block bootstrap (MBB)
- Stationary bootstrap
- Subsampling
- Sometimes the wild bootstrap is also used.
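A minimal sketch of the moving block bootstrap (function name and block length are my choices; a fixed short block length is only sensible when the dependence is short-range):

```python
import numpy as np

def mbb_resample(x, block_len, rng):
    """Moving block bootstrap: paste together randomly chosen
    overlapping blocks of length block_len, so that short-range
    time dependence within blocks is preserved."""
    n = len(x)
    n_blocks = int(np.ceil(n / block_len))
    starts = rng.integers(0, n - block_len + 1, size=n_blocks)
    out = np.concatenate([x[s:s + block_len] for s in starts])
    return out[:n]                       # truncate to the original length

rng = np.random.default_rng(6)
# hypothetical AR(1) series with coefficient 0.6
e = rng.normal(size=500)
x = np.empty(500)
x[0] = e[0]
for t in range(1, 500):
    x[t] = 0.6 * x[t - 1] + e[t]

xb = mbb_resample(x, block_len=25, rng=rng)
print(len(xb))    # 500
```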