Advanced Microeconometrics
Ignacio Lobato (ITAM)
Estimated explanatory and instrumental variables
(see Wooldridge, chap 6)
Consider the linear model:
\[
Y = \beta_0 + Z'\beta + U, \quad\text{with } E(U) = 0,
\]
where
\[
Z = h(S, \delta) \quad\text{with } E\|Z\|^2 < \infty,
\]
$h$ is a known function, $\delta$ an unknown element of the parameter space $\Delta$, and $S$ is a $d \times 1$ vector of observable variables.
There is also a $k \times 1$ vector of functions $g$ such that
\[
X = g(S, \lambda), \quad E\|X\|^2 < \infty \quad\text{and}\quad E(XU) = 0,
\]
for some unknown parameter vector $\lambda \in \Lambda$.
Some remarks are in order:
1. Some elements of $\delta$ and $\lambda$ are known, which means that some components of $Z$ and $X$ are observable.
2. Some elements of $Z$ and $X$ are identical.
EXAMPLE:
\[
Z = \begin{pmatrix} Z^{(1)} \\ \delta_1 + Z^{(2)\prime}\delta_2 \\ X^{(1)} \end{pmatrix},
\quad
\delta = \begin{pmatrix} \delta_1 \\ \delta_2 \end{pmatrix},
\quad
X = \begin{pmatrix} X^{(1)} \\ X^{(2)} \\ \Phi\!\left(\dfrac{Z^{(2)} - X'\lambda_1 - \lambda_2}{\lambda_3}\right) \end{pmatrix},
\quad
\lambda = \begin{pmatrix} \lambda_1 \\ \lambda_2 \\ \lambda_3 \end{pmatrix},
\]
with $\Phi$ the standard normal distribution function.
We have iid observations $(Y_i, S_i)$, $i = 1, \dots, n$, of $(Y, S)$.
It is also assumed that there are estimators $\delta_n$ and $\lambda_n$ of $\delta$ and $\lambda$, respectively, such that:
\[
n^{1/2}(\delta_n - \delta) = O_p(1) \quad\text{and}\quad n^{1/2}(\lambda_n - \lambda) = O_p(1).
\]
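To fix ideas, the two-step structure can be sketched in a small simulation (the design, variable names, and numbers below are hypothetical, not from the text): a first step produces a root-$n$ consistent $\lambda_n$, and the generated variable is the plug-in $g(S_i, \lambda_n)$.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# S: observable variables (hypothetical 2-dimensional design)
S = rng.normal(size=(n, 2))
lam = np.array([1.0, -0.5])        # true (unknown) lambda

# First step: lambda_n from an auxiliary OLS regression of an
# observable variable on S (any root-n consistent first step works)
aux = S @ lam + rng.normal(size=n)
lam_n, *_ = np.linalg.lstsq(S, aux, rcond=None)

# Generated variable: plug-in g(S_i, lambda_n), with g(S, l) = S'l here
X_tilde = S @ lam_n

# lambda_n is root-n consistent: n^{1/2}(lambda_n - lambda) = Op(1)
print(np.sqrt(n) * (lam_n - lam))
```

The generated regressor therefore inherits an $O_p(n^{-1/2})$ estimation error, which is what the expansions below keep track of.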
Assuming that the order and rank conditions are satisfied and, hence, $\beta$ is identified, the IV estimator is:
\[
\beta^{IV}_n = \Bigl[ E_n\bigl(\tilde Z_n \tilde X_n'\bigr)\, E_n\bigl(\tilde X_n \tilde X_n'\bigr)^{-1} E_n\bigl(\tilde X_n \tilde Z_n'\bigr) \Bigr]^{-1} E_n\bigl(\tilde Z_n \tilde X_n'\bigr)\, E_n\bigl(\tilde X_n \tilde X_n'\bigr)^{-1} E_n\bigl(\tilde X_n Y\bigr),
\]
with $\tilde Z_{ni} = h(S_i, \delta_n)$, $\tilde X_{ni} = g(S_i, \lambda_n)$ and, for any vector $a$, $E_n(a) = n^{-1}\sum_{i=1}^n a_i$ denotes the sample mean.
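The estimator above is just a function of sample moments, so it translates directly into code; a minimal numpy sketch (function name is mine), where the rows of `Zt` and `Xt` hold the generated $\tilde Z_{ni}$ and $\tilde X_{ni}$, including a constant:

```python
import numpy as np

def iv_generated(Zt, Xt, y):
    """IV estimator with (possibly generated) regressors Zt and
    instruments Xt: b = (Szx Sxx^{-1} Sxz)^{-1} Szx Sxx^{-1} Sxy."""
    n = len(y)
    Szx = Zt.T @ Xt / n                      # E_n(Z~ X~')
    Sxx = Xt.T @ Xt / n                      # E_n(X~ X~')
    Sxy = Xt.T @ y / n                       # E_n(X~ Y)
    A = Szx @ np.linalg.solve(Sxx, Szx.T)    # Szx Sxx^{-1} Sxz
    b = Szx @ np.linalg.solve(Sxx, Sxy)      # Szx Sxx^{-1} Sxy
    return np.linalg.solve(A, b)

# Hypothetical check: with Zt = Xt the formula collapses to OLS
rng = np.random.default_rng(1)
n = 2000
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
beta = np.array([1.0, 2.0, -1.0])
y = X @ beta + rng.normal(size=n)
print(iv_generated(X, X, y))                 # close to (1, 2, -1)
```

In the exactly identified case with `Zt = Xt` the weighting matrices cancel and the formula reduces to OLS, which the check above illustrates.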
▸ CONSISTENCY: Assume that $h$ and $g$ are differentiable in a neighborhood of $\delta$ and $\lambda$, respectively, and that there exist functions $\dot h : \mathbb{R}^d \to \mathbb{R}_+$ and $\dot g : \mathbb{R}^d \to \mathbb{R}_+$ such that:
\[
\sup_{d \in \Delta} \bigl\| \nabla_\delta h(S, d) \bigr\| \le \dot h(S) \quad\text{with } E\bigl[\dot h(S)^2\bigr] < \infty,
\]
\[
\sup_{l \in \Lambda} \bigl\| \nabla_\lambda g(S, l) \bigr\| \le \dot g(S) \quad\text{with } E\bigl[\dot g(S)^2\bigr] < \infty,
\]
with
\[
\nabla_b f(\bar b) = \left. \frac{\partial}{\partial b'} f(b) \right|_{b = \bar b}.
\]
There may be some rows of zeros in $\nabla_\delta h(S, d)$ and $\nabla_\lambda g(S, l)$.
Notice that
\[
E_n\bigl(\tilde Z_n \tilde X_n'\bigr) = E_n\bigl(ZX'\bigr) + E_n\bigl((\tilde Z_n - Z)(\tilde X_n - X)'\bigr) + E_n\bigl((\tilde Z_n - Z)X'\bigr) + E_n\bigl(Z(\tilde X_n - X)'\bigr).
\]
By the mean value theorem, there exist $\bar\delta_n^{(jl)}$ with $\|\bar\delta_n^{(jl)} - \delta\| \le \|\delta_n - \delta\|$ and $\bar\lambda_n^{(jl)}$ with $\|\bar\lambda_n^{(jl)} - \lambda\| \le \|\lambda_n - \lambda\|$ such that
\[
\Bigl[ E_n\bigl((\tilde Z_n - Z)(\tilde X_n - X)'\bigr) \Bigr]_{jl}
= \Bigl[ E_n\Bigl( \nabla_\delta h\bigl(S, \bar\delta_n^{(jl)}\bigr)(\delta_n - \delta)\,(\lambda_n - \lambda)'\,\nabla_\lambda g\bigl(S, \bar\lambda_n^{(jl)}\bigr)' \Bigr) \Bigr]_{jl}
\le E_n\bigl(\dot h(S)\,\dot g(S)\bigr)\, \|\delta_n - \delta\|\, \|\lambda_n - \lambda\|
= O_p(1)\, O_p\bigl(n^{-1/2}\bigr)\, O_p\bigl(n^{-1/2}\bigr) = O_p\bigl(n^{-1}\bigr) \quad\text{for all } j, l = 1, \dots, k,
\]
provided that $\dot h(S)\,\dot g(S)$ satisfies a LLN. Also,
\[
\bigl\| E_n\bigl((\tilde Z_n - Z)X'\bigr) \bigr\| \le E_n\bigl(\dot h(S)\,\|X\|\bigr)\, \|\delta_n - \delta\| = O_p(1)\, O_p\bigl(n^{-1/2}\bigr) = O_p\bigl(n^{-1/2}\bigr),
\]
\[
\bigl\| E_n\bigl(Z(\tilde X_n - X)'\bigr) \bigr\| \le E_n\bigl(\|Z\|\,\dot g(S)\bigr)\, \|\lambda_n - \lambda\| = O_p(1)\, O_p\bigl(n^{-1/2}\bigr) = O_p\bigl(n^{-1/2}\bigr).
\]
Therefore, after applying the LLN,
\[
E_n\bigl(\tilde Z_n \tilde X_n'\bigr) = E\bigl(ZX'\bigr) + o_p(1).
\]
Likewise,
\[
E_n\bigl(\tilde X_n \tilde X_n'\bigr) = E\bigl(XX'\bigr) + o_p(1) \quad\text{and}\quad E_n\bigl(\tilde X_n Y\bigr) = E(XY) + o_p(1).
\]
Hence,
\[
\beta^{IV}_n = \Bigl[ E(ZX')E(XX')^{-1}E(XZ') + o_p(1) \Bigr]^{-1} \Bigl[ E(ZX')E(XX')^{-1}E(XY) + o_p(1) \Bigr] = \beta + o_p(1).
\]
▸ ASYMPTOTIC NORMALITY: Assume that $h$ and $g$ are twice differentiable in a neighborhood of $\delta$ and $\lambda$, respectively, and that there exist functions $\ddot h : \mathbb{R}^d \to \mathbb{R}_+$ and $\ddot g : \mathbb{R}^d \to \mathbb{R}_+$ such that:
\[
\sup_{d \in \Delta} \bigl\| \nabla_{\delta\delta} h(S, d) \bigr\| \le \ddot h(S) \quad\text{with } E\bigl[\ddot h(S)^2\bigr] < \infty,
\]
\[
\sup_{l \in \Lambda} \bigl\| \nabla_{\lambda\lambda} g(S, l) \bigr\| \le \ddot g(S) \quad\text{with } E\bigl[\ddot g(S)^2\bigr] < \infty,
\]
with
\[
\nabla_{bb} f(\bar b) = \left. \frac{\partial^2}{\partial b\, \partial b'} f(b) \right|_{b = \bar b}.
\]
Notice that:
\[
n^{1/2}\bigl(\beta^{IV}_n - \beta\bigr) = \Bigl[ E_n\bigl(\tilde Z_n \tilde X_n'\bigr) E_n\bigl(\tilde X_n \tilde X_n'\bigr)^{-1} E_n\bigl(\tilde X_n \tilde Z_n'\bigr) \Bigr]^{-1} E_n\bigl(\tilde Z_n \tilde X_n'\bigr) E_n\bigl(\tilde X_n \tilde X_n'\bigr)^{-1} \Bigl\{ n^{1/2} E_n\bigl(\tilde X_n (Z - \tilde Z_n)'\bigr)\beta + n^{1/2} E_n\bigl(\tilde X_n U\bigr) \Bigr\}.
\]
First,
\[
n^{1/2} E_n\bigl(\tilde X_n (Z - \tilde Z_n)'\bigr)\beta
= \underbrace{n^{1/2} E_n\bigl(X (Z - \tilde Z_n)'\bigr)\beta}_{= -E_n(X\, n^{1/2}(\delta_n - \delta)'\nabla_\delta h(S,\delta)')\beta \,+\, O_p(n^{1/2}\|\delta_n - \delta\|^2)}
+ \underbrace{n^{1/2} E_n\bigl((\tilde X_n - X)(Z - \tilde Z_n)'\bigr)\beta}_{= O_p(n^{-1/2})}
\]
\[
= -E_n\bigl(X\, n^{1/2}(\delta_n - \delta)'\nabla_\delta h(S,\delta)'\bigr)\beta + O_p\bigl(n^{-1/2}\bigr).
\]
Now, since $\mathrm{vec}(ABC) = (C' \otimes A)\,\mathrm{vec}(B)$ and by the LLN,
\[
E_n\bigl(X\, n^{1/2}(\delta_n - \delta)'\nabla_\delta h(S,\delta)'\bigr)\beta
= E_n\bigl[\bigl(\beta'\nabla_\delta h(S,\delta)\bigr) \otimes X\bigr]\, n^{1/2}(\delta_n - \delta)
= E\bigl[\bigl(\beta'\nabla_\delta h(S,\delta)\bigr) \otimes X\bigr]\, n^{1/2}(\delta_n - \delta) + o_p(1).
\]
REMARK: When the coefficients in $\beta$ corresponding to the estimated explanatory variables are zero,
\[
E_n\bigl(\tilde X_n (Z - \tilde Z_n)'\bigr)\beta = 0
\quad\text{and}\quad
E\bigl[(\beta'\nabla_\delta h(S,\delta)) \otimes X\bigr] = 0.
\]
Also,
\[
n^{1/2} E_n\bigl(\tilde X_n U\bigr) = n^{1/2} E_n(XU) + n^{1/2} E_n\bigl((\tilde X_n - X)U\bigr)
= n^{1/2} E_n(XU) + E\bigl(U \nabla_\lambda g(S, \lambda)\bigr)\, n^{1/2}(\lambda_n - \lambda) + O_p\bigl(n^{1/2}\|\lambda_n - \lambda\|^2\bigr).
\]
REMARK: In many instances $E\bigl(U \nabla_\lambda g(S, \lambda)\bigr) = 0$, e.g., when $g = g\bigl(X^{(\cdot)}, \lambda\bigr)$, where $X^{(\cdot)}$ are observable exogenous variables and $E\bigl(U \mid X^{(\cdot)}\bigr) = 0$.
Therefore,
\[
n^{1/2}\bigl(\beta^{IV}_n - \beta\bigr) = \Bigl[ E(ZX')E(XX')^{-1}E(XZ') + o_p(1) \Bigr]^{-1} \Bigl[ E(ZX')E(XX')^{-1} + o_p(1) \Bigr]
\times \Bigl\{ n^{1/2} E_n(XU) - E\bigl[(\beta'\nabla_\delta h(S,\delta)) \otimes X\bigr]\, n^{1/2}(\delta_n - \delta) + E\bigl(U \nabla_\lambda g(S,\lambda)\bigr)\, n^{1/2}(\lambda_n - \lambda) \Bigr\}.
\]
Assuming that
\[
n^{1/2}\begin{pmatrix} E_n(XU) \\ \delta_n - \delta \\ \lambda_n - \lambda \end{pmatrix} \to_d N(0, G),
\]
then
\[
n^{1/2}\bigl(\beta^{IV}_n - \beta\bigr) \to_d B^{-1} E(ZX') E(XX')^{-1}\, N_{k+1}\bigl(0,\, M'GM\bigr),
\]
with
\[
B = E(ZX')E(XX')^{-1}E(XZ')
\quad\text{and}\quad
M = \begin{pmatrix} I_{k+1} \\ -E\bigl[(\beta'\nabla_\delta h(S,\delta)) \otimes X\bigr]' \\ E\bigl(U\nabla_\lambda g(S,\lambda)\bigr)' \end{pmatrix}.
\]
Thus, $\beta^{IV}_n$ is CAN with
\[
\mathrm{AsyVar}\bigl(\beta^{IV}_n\bigr) = \frac{1}{n}\, B^{-1} C B^{-1\prime},
\]
and
\[
C = E(ZX')E(XX')^{-1}\, M'GM\, E(XX')^{-1}E(XZ'),
\]
which can be estimated from the data.
REMARK: In many circumstances, $\delta_n$ and $\lambda_n$ are estimators computed from a sample other than $(Y_i, S_i)$, $i = 1, \dots, n$. The samples may have different sample sizes, which introduces further complications. In practice, it seems reasonable, cheap, and easy to use the bootstrap to approximate the distribution of the IV estimator.
REMARK: The discussion is perfectly valid in the OLS case: just apply the results with $Z = X$.
REMARK: If both
\[
E\bigl[(\beta'\nabla_\delta h(S,\delta)) \otimes X\bigr] = 0 \quad\text{and}\quad E\bigl(U\nabla_\lambda g(S,\lambda)\bigr) = 0,
\]
then
\[
M = \begin{pmatrix} I_{k+1} \\ 0 \\ 0 \end{pmatrix},
\]
so $M'GM$ reduces to the asymptotic variance of $n^{1/2}E_n(XU)$ and, as usual,
\[
C = E(ZX')E(XX')^{-1}\, E\bigl(XX'U^2\bigr)\, E(XX')^{-1}E(XZ').
\]
IMPORTANT CONCLUSION:
If both the coefficients in $\beta$ corresponding to the generated explanatory variables are zero and $\nabla_\lambda g(S,\lambda)$ and $U$ are uncorrelated, then the asymptotic distribution of $\beta^{IV}_n$ is identical to the asymptotic distribution of the corresponding IV estimator with observable variables, i.e., with known parameters.
The same conclusion applies to the OLS estimator, but now we require that $\nabla_\delta h(S,\delta)$ and $U$ be uncorrelated (take $Z = X$).
Otherwise, when we have generated explanatory or instrumental variables, the asymptotic distribution changes.
Testing homoskedasticity
Consider the linear model:
\[
Y = \beta_0 + Z'\beta + U, \quad\text{with } E(U) = 0.
\]
Using OLS, we want to test the assumption
\[
H_0 : E\bigl(ZZ'U^2\bigr) = E\bigl(ZZ'\bigr)E\bigl(U^2\bigr)
\]
versus
\[
H_1 : E\bigl(ZZ'U^2\bigr) \ne E\bigl(ZZ'\bigr)E\bigl(U^2\bigr).
\]
By the symmetry of $ZZ'$, half of the restrictions in $H_0$ are redundant, so the hypotheses can alternatively be written as follows. Let $\mathcal{Z}$ be the $k(k+1)/2$-vector containing the distinct components of $ZZ'$, and consider the linear projection
\[
L\bigl(U^2 \mid 1, \mathcal{Z}\bigr) = \delta_0 + \mathcal{Z}'\delta.
\]
Then $H_0$ and $H_1$ can alternatively be written as
\[
H_0 : \delta = 0 \quad\text{versus}\quad H_1 : \delta \ne 0.
\]
A suitable estimator of $\delta$ is
\[
\delta_n = V_n(\mathcal{Z})^{-1} C_n\bigl(\mathcal{Z}, U_n^2\bigr),
\]
where $U_{ni} = Y_i - \beta_{0n} - Z_i'\beta_n$ are the OLS residuals, $V_n$ denotes a sample variance matrix, and $C_n$ a sample covariance. Using standard arguments, under suitable moment conditions,
\[
C_n\bigl(\mathcal{Z}, U_n^2\bigr) - C_n\bigl(\mathcal{Z}, U^2\bigr) = o_p\bigl(n^{-1/2}\bigr),
\]
and
\[
\delta_n = \delta + V_n(\mathcal{Z})^{-1} C_n\bigl(\mathcal{Z},\, U^2 - \delta_0 - \mathcal{Z}'\delta\bigr) + o_p\bigl(n^{-1/2}\bigr).
\]
Therefore, by the CLT, $\delta_n$ is CAN with
\[
\mathrm{AsyVar}(\delta_n) = \frac{1}{n}\, V(\mathcal{Z})^{-1} D\, V(\mathcal{Z})^{-1},
\]
with
\[
D = E\Bigl[ \bigl(\mathcal{Z} - E(\mathcal{Z})\bigr)\bigl(\mathcal{Z} - E(\mathcal{Z})\bigr)' \bigl(U^2 - \delta_0 - \mathcal{Z}'\delta\bigr)^2 \Bigr],
\]
whose corresponding estimator is
\[
\widehat{\mathrm{AsyVar}}(\delta_n) = \frac{1}{n}\, V_n(\mathcal{Z})^{-1} D_n V_n(\mathcal{Z})^{-1}.
\]
We can test the null by means of the Wald test statistic:
\[
W_n = \delta_n' \,\widehat{\mathrm{AsyVar}}(\delta_n)^{-1} \delta_n,
\]
and under $H_0$,
\[
W_n \to_d \chi^2_{k(k+1)/2}.
\]
Alternatively, we could use the asymptotically equivalent LM statistic:
\[
LM_n = n R_n^2,
\]
where $R_n^2$ is the coefficient of determination in the model
\[
U_{ni}^2 = \delta_0 + \mathcal{Z}_i'\delta + \text{error}, \quad i = 1, \dots, n.
\]
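The LM version is easy to implement: regress the squared OLS residuals on a constant and the distinct components of $ZZ'$ and use $n R_n^2$. A minimal sketch (function name is mine; scipy is assumed available for the chi-square p-value):

```python
import numpy as np
from scipy import stats

def lm_homoskedasticity(Z, y):
    """LM = n R^2 test of H0: E(ZZ'U^2) = E(ZZ')E(U^2).
    Z: (n, k) regressors without the constant; y: (n,) response."""
    n, k = Z.shape
    W = np.column_stack([np.ones(n), Z])
    beta = np.linalg.lstsq(W, y, rcond=None)[0]
    u2 = (y - W @ beta) ** 2                       # squared OLS residuals
    # distinct components of ZZ': squares and cross products
    cols = [Z[:, i] * Z[:, j] for i in range(k) for j in range(i, k)]
    R = np.column_stack([np.ones(n)] + cols)
    g = np.linalg.lstsq(R, u2, rcond=None)[0]
    r2 = 1.0 - np.sum((u2 - R @ g) ** 2) / np.sum((u2 - u2.mean()) ** 2)
    df = k * (k + 1) // 2                          # chi2 degrees of freedom
    return n * r2, stats.chi2.sf(n * r2, df)

# hypothetical usage: homoskedastic errors, so the test should not reject
rng = np.random.default_rng(7)
Z = rng.normal(size=(400, 2))
y = 1.0 + Z @ np.array([1.0, -1.0]) + rng.normal(size=400)
lm, pval = lm_homoskedasticity(Z, y)
print(lm, pval)
```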
In practice, many people prefer to test the necessary condition for $H_0$:
\[
\underbrace{\beta' C\bigl(U^2, ZZ'\bigr)\beta}_{= C(U^2,\, L(Y \mid 1, Z)^2)} = 0
\quad\text{and}\quad
\underbrace{C\bigl(U^2, Z\bigr)'\beta}_{= C(U^2,\, L(Y \mid 1, Z))} = 0.
\]
If $\hat R_n^2$ is the coefficient of determination in the model
\[
U_{ni}^2 = \gamma_0 + L_n(Y \mid 1, Z = Z_i)\,\gamma_1 + L_n(Y \mid 1, Z = Z_i)^2\,\gamma_2 + \text{error}, \quad i = 1, \dots, n,
\]
the test statistic is the LM test:
\[
\widehat{LM}_n = n \hat R_n^2.
\]
Under $H_0$,
\[
\widehat{LM}_n \to_d \chi^2_2.
\]
Finite sample asymptotics
Suppose that we have a CAN estimator $\theta_n$ of $\theta_0$, with $\theta_0$ scalar for presentation purposes. When $\theta_n$ is CAN we have, under suitable regularity conditions, that a Berry-Esseen bound is satisfied, i.e.,
\[
\sup_x \left| \Pr\!\left( \frac{\theta_n - \theta_0}{\mathrm{AsyVar}(\theta_n)^{1/2}} \le x \right) - \Phi(x) \right| \le C n^{-1/2} \quad\text{for each } n = 1, 2, \dots,
\]
for some suitable constant $C$, where $\Phi$ is the standard normal distribution, i.e.,
\[
\Phi(w) = \int_{-\infty}^{w} \phi(\bar w)\, d\bar w \quad\text{with}\quad \phi(w) = \frac{1}{(2\pi)^{1/2}} \exp\!\left( -\frac{1}{2} w^2 \right).
\]
For instance, if $W_i$, $i = 1, \dots, n$, are iid with mean $\mu$ and variance $\sigma^2$,
\[
\sup_x \left| \Pr\!\left( \frac{n^{1/2}(\bar W_n - \mu)}{\sigma} \le x \right) - \Phi(x) \right| \le \frac{33}{4} \frac{E|W_1 - \mu|^3}{\sigma^3\, n^{1/2}}.
\]
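The bound can be eyeballed numerically; a hypothetical sketch with exponential data (so $\mu = \sigma = 1$), comparing the simulated sup-distance to the normal with the $\tfrac{33}{4}\, E|W_1 - \mu|^3 / (\sigma^3 n^{1/2})$ bound:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n, reps = 20, 20000

# W_i iid exponential(1), so mu = sigma = 1 and E|W - mu|^3 is finite
W = rng.exponential(size=(reps, n))
T = np.sqrt(n) * (W.mean(axis=1) - 1.0)          # n^{1/2}(Wbar - mu)/sigma

# sup_x |P(T <= x) - Phi(x)| approximated on a grid
x = np.linspace(-4.0, 4.0, 801)
emp = (T[:, None] <= x[None, :]).mean(axis=0)    # simulated CDF of T
dist = np.abs(emp - stats.norm.cdf(x)).max()

rho = np.mean(np.abs(W - 1.0) ** 3)              # estimate of E|W1 - mu|^3
bound = (33.0 / 4.0) * rho / np.sqrt(n)          # Berry-Esseen bound
print(dist, bound)
```

The simulated distance sits far below the bound, which is conservative; the point of the bound is the $n^{-1/2}$ rate, not its constant.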
▸ Edgeworth expansions (after Francis Ysidro Edgeworth): In many cases of practical relevance, for a given sample size $n$, if $\theta_n$ is CAN, the distribution of $n^{1/2}(\theta_n - \theta_0)$ can be expanded as a power series in $n^{-1/2}$, known as an Edgeworth expansion, for each $n = 1, 2, \dots$:
\[
\Pr\!\left( \frac{\theta_n - \theta_0}{\mathrm{AsyVar}(\theta_n)^{1/2}} \le x \right)
= \Phi(x) + \frac{1}{n^{1/2}}\, p_1(x)\phi(x) + \frac{1}{n}\, p_2(x)\phi(x) + \dots + \frac{1}{n^{j/2}}\, p_j(x)\phi(x) + \dots,
\]
where $\phi(x)$ is the standard normal density and the $p_j$'s are polynomials of degree at most $3j - 1$, with coefficients depending on the cumulants of $\theta_n - \theta_0$.
▸ Cornish-Fisher expansions: Define $x_\alpha$ by
\[
\Pr\!\left( \frac{\theta_n - \theta_0}{\mathrm{AsyVar}(\theta_n)^{1/2}} \le x_\alpha \right) = \alpha.
\]
The Cornish-Fisher expansion is the inversion of the Edgeworth expansion, and it has the form
\[
x_\alpha = z_\alpha + \frac{1}{n^{1/2}}\, p_{11}(z_\alpha) + \frac{1}{n}\, p_{21}(z_\alpha) + \dots + \frac{1}{n^{j/2}}\, p_{j1}(z_\alpha) + \dots,
\]
where the $p_{j1}$'s are polynomials derivable from the $p_j$'s, and $z_\alpha$ solves $\Phi(z_\alpha) = \alpha$.
See, e.g., Hall (1992), The Bootstrap and Edgeworth Expansion, Springer.
Bootstrap approximations
A bootstrap approximation often provides a more accurate approximation to the exact distribution of a statistic than the asymptotic one.
Again, suppose that we have an estimator $\theta_n = \theta(F_{Wn})$ of $\theta_0 = \theta(F_W)$.
Let $\mathcal{W}_n = \{W_1, \dots, W_n\}$ be a random sample from $F_W$. We also have an estimator of $F_W$, say $\hat F_{Wn}$, which can be the sample distribution function $F_{Wn}$, a parametric estimator, or even a smooth estimator. Let $\mathcal{W}_n^* = \{W_1^*, \dots, W_n^*\}$ be a random sample from $\hat F_{Wn}$; e.g., if $\hat F_{Wn} = F_{Wn}$, then $\mathcal{W}_n^*$ are iid observations from the distribution assigning mass $1/n$ to each observation $W_i$.
We may be interested in estimating $E[R(\mathcal{W}_n, F_W)]$ for a particular function $R$, which can represent different distributional features of $\theta_n$, e.g.:
1. $R(\mathcal{W}_n, F_W) = \theta_n - \theta(F_W) \;\longrightarrow\; Bias(\theta_n) = E\bigl(\theta_n - \theta(F_W)\bigr)$;
2. $R(\mathcal{W}_n, F_W) = \bigl(\theta_n - E(\theta_n)\bigr)^2 \;\longrightarrow\; V(\theta_n) = E\bigl[\bigl(\theta_n - E(\theta_n)\bigr)^2\bigr]$;
3. $R(\mathcal{W}_n, F_W) = 1\Bigl\{ \dfrac{\theta_n - \theta(F_W)}{\widehat{\mathrm{AsyVar}}(\theta_n)^{1/2}} \le w \Bigr\} \;\longrightarrow\; F_{(\theta_n - \theta(F_W))/\widehat{\mathrm{AsyVar}}(\theta_n)^{1/2}}(w) = E\Bigl[ 1\Bigl\{ \dfrac{\theta_n - \theta(F_W)}{\widehat{\mathrm{AsyVar}}(\theta_n)^{1/2}} \le w \Bigr\} \Bigr]$.
The bootstrap analog of $\theta_n$, $\theta_n^*$, is the estimator computed with $\mathcal{W}_n^*$. For instance, $\theta_n^* = \theta(F_{Wn}^*)$, where $F_{Wn}^*(w) = n^{-1}\sum_{i=1}^n 1\{W_i^* \le w\}$ is the empirical distribution based on $\mathcal{W}_n^* = \{W_1^*, \dots, W_n^*\}$, which are iid variables with distribution $\hat F_{Wn}$.
Then the bootstrap analog of $E[R(\mathcal{W}_n, F_W)]$ is
\[
E^*\bigl[R(\mathcal{W}_n^*, \hat F_{Wn})\bigr] = E\bigl[R(\mathcal{W}_n^*, \hat F_{Wn}) \mid \mathcal{W}_n\bigr].
\]
Some approximations of distributional features of $\theta_n$ are:
1. $\widehat{Bias}(\theta_n) = E^*(\theta_n^*) - \theta(\hat F_{Wn})$;
2. $\hat V(\theta_n) = E^*\bigl[\bigl(\theta_n^* - E^*(\theta_n^*)\bigr)^2\bigr]$;
3. the bootstrap estimate of the quantiles of the distribution of $\theta_n$.
Frequently, $E^*\bigl[R(\mathcal{W}_n^*, \hat F_{Wn})\bigr]$ is very expensive to compute, but it can be approximated very accurately in practice by generating $B$ resamples:
\[
\mathcal{W}_n^{*(b)} = \bigl\{ W_1^{*(b)}, \dots, W_n^{*(b)} \bigr\}, \quad b = 1, \dots, B.
\]
Then we approximate $E^*\bigl[R(\mathcal{W}_n^*, \hat F_{Wn})\bigr]$, as accurately as desired, by
\[
E_B^*\bigl[R(\mathcal{W}_n^*, \hat F_{Wn})\bigr] = \frac{1}{B} \sum_{b=1}^{B} R\bigl(\mathcal{W}_n^{*(b)}, \hat F_{Wn}\bigr).
\]
Example. Suppose that the object of interest is the bias:
\[
R(\mathcal{W}_n, F_W) = \theta_n - \theta(F_W) \;\longrightarrow\; Bias(\theta_n) = E\bigl(\theta_n - \theta(F_W)\bigr).
\]
We are going to estimate it by
\[
E_B^*\bigl[R(\mathcal{W}_n^*, F_{Wn})\bigr] = \frac{1}{B} \sum_{b=1}^{B} R\bigl(\mathcal{W}_n^{*(b)}, F_{Wn}\bigr),
\]
where we use as $\hat F_{Wn}$ the sample distribution function $F_{Wn}$ (this is also called the nonparametric bootstrap).
This means doing the following:
1. Generate $B$ independent random samples from $F_{Wn}$.
2. For each of them, estimate $\theta$; call the estimates $\theta_b^*$, $b = 1, \dots, B$.
3. Then the bootstrap estimate of the bias is just
\[
\frac{1}{B} \sum_{b=1}^{B} \theta_b^* - \theta_n.
\]
Note that this is the analogue, in the bootstrap world, of $E(\theta_n) - \theta$.
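The three steps above, in a minimal numpy sketch (a hypothetical example where $\theta$ is the variance and $\theta_n$ the biased plug-in sample variance, whose exact bias is $-\sigma^2/n$):

```python
import numpy as np

rng = np.random.default_rng(3)

# theta(F) = Var(W); plug-in theta_n = (1/n) sum (W_i - Wbar)^2,
# whose exact bias is -sigma^2/n
W = rng.normal(size=100)
theta_n = W.var()

B = 2000
theta_star = np.empty(B)
for b in range(B):
    Wb = rng.choice(W, size=W.size, replace=True)   # resample from F_Wn
    theta_star[b] = Wb.var()                        # theta*_b

bias_hat = theta_star.mean() - theta_n   # (1/B) sum_b theta*_b - theta_n
print(bias_hat)                          # close to -theta_n / 100
```

Subtracting `bias_hat` from `theta_n` gives the usual bootstrap bias-corrected estimate.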
Examples in regression:
▸ Naive: $\mathcal{W}_n^*$ is a random sample with replacement from $\mathcal{W}_n$, i.e., a random sample from $F_{Wn}$.
▸ Wild: Take $Z_1, \dots, Z_n$ fixed; then $W_i^* = (Y_i^*, Z_i)$ with
\[
Y_i^* = \beta_{0n} + Z_i'\beta_n + U_{ni} V_i^*,
\]
where the $V_i^*$ are iid variables with mean zero and variance one.
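A sketch of the wild bootstrap for OLS standard errors under heteroskedasticity (design and names are hypothetical; Rademacher weights are one common choice for $V_i^*$):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 500

# hypothetical heteroskedastic design: Var(U|Z) depends on Z
Z = rng.normal(size=n)
y = 1.0 + 2.0 * Z + np.abs(Z) * rng.normal(size=n)

W = np.column_stack([np.ones(n), Z])
beta_n = np.linalg.lstsq(W, y, rcond=None)[0]
u_n = y - W @ beta_n                       # OLS residuals

B = 999
boot = np.empty((B, 2))
for b in range(B):
    V = rng.choice([-1.0, 1.0], size=n)    # Rademacher: mean 0, variance 1
    y_star = W @ beta_n + u_n * V          # Y*_i = b_0n + Z_i' b_n + U_ni V*_i
    boot[b] = np.linalg.lstsq(W, y_star, rcond=None)[0]

se_wild = boot.std(axis=0)                 # wild-bootstrap standard errors
print(se_wild)
```

Because the $Z_i$ are held fixed and only the residuals are perturbed, this scheme preserves the conditional heteroskedasticity pattern, unlike the naive resampling of residuals.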
Improving the accuracy of asymptotic approximations by bootstrap: Suppose $\eta_n = R(\mathcal{W}_n, F_W)$ is an asymptotically pivotal statistic (its distribution does not depend on any unknown feature of $F_W$), and let $\eta_n^* = R(\mathcal{W}_n^*, F_{Wn})$ be its bootstrap analog. Then we have:
\[
\Pr(\eta_n \le x) = \Phi(x) + n^{-1/2}\, q(x)\phi(x) + O\bigl(n^{-1}\bigr),
\]
where $q$ is an even quadratic polynomial. Similarly,
\[
\Pr\bigl(\eta_n^* \le x \mid \mathcal{W}_n\bigr) = \Phi(x) + n^{-1/2}\, \hat q_n(x)\phi(x) + O_p\bigl(n^{-1}\bigr),
\]
where $\hat q_n(x)$ is obtained by replacing the unknowns in $q(x)$ by $n^{1/2}$-consistent estimators so that, as a result, $\hat q_n(x) - q(x) = O_p\bigl(n^{-1/2}\bigr)$. Therefore,
\[
\Pr\bigl(\eta_n^* \le x \mid \mathcal{W}_n\bigr) - \Pr(\eta_n \le x)
= n^{-1/2} \underbrace{\bigl( \hat q_n(x) - q(x) \bigr)}_{O_p(n^{-1/2})} \phi(x) + O_p\bigl(n^{-1}\bigr) = O_p\bigl(n^{-1}\bigr).
\]
So the bootstrap approximation is asymptotically better than the usual normal approximation, which satisfies
\[
\Pr(\eta_n \le x) - \Phi(x) = O\bigl(n^{-1/2}\bigr).
\]
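In practice this refinement is exploited by bootstrapping an asymptotically pivotal quantity such as the studentized mean; a hypothetical bootstrap-t confidence interval sketch:

```python
import numpy as np

rng = np.random.default_rng(5)
n, B = 50, 4999

W = rng.exponential(size=n)                # skewed population, mean 1
mu_hat, s = W.mean(), W.std(ddof=1)

# eta_n = studentized mean: asymptotically pivotal
t_boot = np.empty(B)
for b in range(B):
    Wb = rng.choice(W, size=n, replace=True)
    t_boot[b] = np.sqrt(n) * (Wb.mean() - mu_hat) / Wb.std(ddof=1)

# bootstrap-t 95% CI: quantiles of eta*_n replace the normal +-1.96
lo, hi = np.quantile(t_boot, [0.025, 0.975])
ci = (mu_hat - hi * s / np.sqrt(n), mu_hat - lo * s / np.sqrt(n))
print(ci)
```

Studentizing first is what makes the statistic (approximately) pivotal; bootstrapping the unstudentized mean would not deliver the $O_p(n^{-1})$ refinement.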
The previous discussion was for random samples; in time series one typically needs to resample blocks so as to preserve the time dependence:
- Moving block bootstrap (MBB)
- Stationary bootstrap
- Subsampling
- Sometimes the wild bootstrap is also used.
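A minimal sketch of the moving block bootstrap (function name and block length are my choices; a fixed short block length is only sensible when the dependence is short-range):

```python
import numpy as np

def mbb_resample(x, block_len, rng):
    """Moving block bootstrap: paste together randomly chosen
    overlapping blocks of length block_len, so that short-range
    time dependence within blocks is preserved."""
    n = len(x)
    n_blocks = int(np.ceil(n / block_len))
    starts = rng.integers(0, n - block_len + 1, size=n_blocks)
    out = np.concatenate([x[s:s + block_len] for s in starts])
    return out[:n]                       # truncate to the original length

rng = np.random.default_rng(6)
# hypothetical AR(1) series with coefficient 0.6
e = rng.normal(size=500)
x = np.empty(500)
x[0] = e[0]
for t in range(1, 500):
    x[t] = 0.6 * x[t - 1] + e[t]

xb = mbb_resample(x, block_len=25, rng=rng)
print(len(xb))    # 500
```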