# ESTIMATION IN THE LINEAR MODEL: The Analog Principle

## Macro Econometría Avanzada

## Parameter Identification

Definition: For any parametric model, we say that the parameters are identified if, for a given r × 1 vector of functions

q : ℝ^p × Θ → ℝ^r,

we have that θ is the only value in Θ such that:

E[q(W, θ)] = 0.

(We can also say that the vector of functions q identifies the parameter θ.)

Often there also exists a function Q such that

θ = arg min_{t ∈ Θ} E[Q(W, t)].

Next we see a couple of examples where identification is not straightforward.

Example 1. Let the model be E(Y | X) = X^{θ0}, where θ0 = 4, and X is a symmetric r.v. with E[X⁴] = E[X⁶] (this happens, for instance, for the N(0, 0.2)).

A method of moments estimator would be based on the sample analogues of the conditions

E[(Y − X^θ) · 1] = E[(Y − X^θ) · X] = 0.     (1)

This is an inefficient GMM estimator, since the instruments 1 and X are arbitrary.

Note that these two instruments do not identify θ0, since θ = 6 also satisfies (1).
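Under this reading of the example (the true regression is E(Y | X) = X⁴ and X ~ N(0, 0.2), i.e. variance 0.2), the failure of identification can be checked with the closed-form moments of the normal distribution; a minimal sketch:

```python
# For X ~ N(0, sigma2), E[X^4] = 3*sigma2^2 and E[X^6] = 15*sigma2^3,
# and all odd moments vanish by symmetry.
sigma2 = 0.2
EX4 = 3 * sigma2**2    # = 0.12
EX6 = 15 * sigma2**3   # = 0.12, equal to EX4 precisely when sigma2 = 1/5

# Moment condition E[(Y - X^t) * 1] = E[X^4] - E[X^t] at the candidates t = 4, 6;
# the second condition E[(Y - X^t) * X] is automatically 0 for even t (odd moments).
print(EX4 - EX4)  # t = 4 -> 0.0
print(EX4 - EX6)  # t = 6 -> 0.0: both candidates solve the moment conditions
```

So with instruments (1, X) both θ = 4 and θ = 6 satisfy the population moment conditions exactly.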

Example 2. Let the model be E(Y | X) = θ0² X + θ0 X², with θ0 = 5/4 and V(Y | X) constant. Suppose the researcher properly specifies the model and chooses the optimal instrument

W0 = 2θ0 X + X².

In this case θ = −5/4 also verifies

E[(Y − θ² X − θ X²) W0] = 0 when X follows a N(−1, 1).

W0 is not a feasible instrument because θ0 is unknown. If the researcher had employed W = 2θX + X² instead, θ = −5/4 would no longer satisfy the moment condition, and θ0 = 5/4 would be identified.

One typical solution is to restrict the parametric space Θ where θ moves; another solution is to restrict the behavior of the conditioning variable X.

EXAMPLES:

Quasi-Likelihood Function: Let FW be the d.f. of W and gW : ℝ^p × Θ → ℝ⁺ a parametric specification of the density function (or probability function) of W, which may not be equal to the true density (or probability function) of W.

We can define, under suitable regularity conditions:

θ = arg max_{t ∈ Θ} E[ln gW(W, t)],

or as the solution to

(∂/∂θ) E[ln gW(W, θ)] = 0.

Some motivation for the previous slide: think about a r.v. X with pdf f(x), and now consider any alternative pdf g(x), obviously with the same support. Note that g is a valid pdf, so that

∫ g(x) dx = 1,

and now consider E log[g(X)/f(X)]. By Jensen's inequality (log is concave),

E log h(X) ≤ log E h(X),

so that

E log[g(X)/f(X)] ≤ log E[g(X)/f(X)] = log ∫ [g(x)/f(x)] f(x) dx = log ∫ g(x) dx = 0,

that is,

E log g(X) ≤ E log f(X);

that is, E log g(X) is maximized when you choose g = f.
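This inequality (equivalently, nonnegativity of the Kullback-Leibler divergence) is easy to check by Monte Carlo; a minimal sketch, assuming for illustration that the true density f is N(0, 1) and the alternative g is N(1, 2²):

```python
import math
import random

random.seed(0)

def logpdf_normal(x, mu, sigma):
    # log density of N(mu, sigma^2)
    return -0.5 * math.log(2 * math.pi * sigma**2) - (x - mu)**2 / (2 * sigma**2)

# Draw X from the true density f = N(0, 1)
xs = [random.gauss(0.0, 1.0) for _ in range(100_000)]

# Monte Carlo estimates of E log f(X) and E log g(X), with g = N(1, 2^2)
Elogf = sum(logpdf_normal(x, 0.0, 1.0) for x in xs) / len(xs)
Elogg = sum(logpdf_normal(x, 1.0, 2.0) for x in xs) / len(xs)

print(Elogf > Elogg)  # True: E log g(X) <= E log f(X)
```

Any other choice of g with the same support gives the same ordering, which is what justifies maximizing the expected log quasi-likelihood.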

Motivation, Linear Predictors: Consider the linear predictor L(Y | Z = z) = β0 + z′β; the parameters are identified as the solution of

E[Y − β0 − Z′β] = 0,
E[Z (Y − β0 − Z′β)] = 0,

that is,

β0 = E(Y) − E(Z)′β and β = V(Z)⁻¹ C(Y, Z),

provided that V(Z) is non-singular. We can also write

(β0, β′)′ = arg min_{(b0, b′)′ ∈ Θ} E[(Y − b0 − Z′b)²].

LAD predictor and α-th Quantile predictor: The parameters of the LAD predictor are defined as

(β0, β′)′ = arg min_{(b0, b′)′ ∈ Θ} E|Y − b0 − Z′b|

(or through the FOC), and the parameters of the α-th Quantile predictor are defined as

(β0, β′)′ = arg min_{(b0, b′)′ ∈ Θ} E[|Y − b0 − Z′b| + (2α − 1)(Y − b0 − Z′b)].

In some cases models are not properly specified and hence parameters are not identified. For instance, in the linear model we need V(Z) to be non-singular; that is, if we specify

E(Y | Z1, Z2) = β0 + β1 Z1 + β2 Z2 + β3 (Z1 + Z2),

we cannot identify the β's. Or, if we specify

E(Y | Z1, Z2) = β0 + β1 Z1 + β2 Z2^{β3},

and the true β2 is zero, then β3 is not identified.

These are examples of models that are wrongly specified: there is no way of identifying the parameters. We focus on properly specified models, and the interesting issue is to construct, or to find, the function q or Q that identifies the parameters.

## Sampling Distribution and Moment Estimators

Henceforth, we assume that we observe a random sample W1, ..., Wn from the distribution FW, i.e. W1, ..., Wn are independent copies of W. Recalling that FW(w) = E[1{W ≤ w}], the sample analog of FW is

FWn(w) = (1/n) ∑_{i=1}^{n} 1{Wi ≤ w},

known as the sample, or empirical, distribution function.

FWn is a random variable taking values in the space of functions with jump discontinuities. When W is continuous, it can be interpreted as the cumulative distribution of a discrete uniform r.v. that takes values W1, ..., Wn.
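The empirical distribution function takes only a few lines to compute; a minimal sketch (the `ecdf` helper is an illustrative name, not from the course material):

```python
def ecdf(sample):
    """Return the empirical distribution function F_Wn of a 1-D sample."""
    data = sorted(sample)
    n = len(data)

    def F(w):
        # proportion of observations with W_i <= w
        return sum(1 for x in data if x <= w) / n

    return F

F = ecdf([2.0, 1.0, 3.0, 2.0])
print(F(0.5), F(2.0), F(10.0))  # 0.0 0.75 1.0
```

Note the jumps of size 1/n at each observation: F is exactly the c.d.f. of a discrete uniform r.v. on the sample values, as stated above.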

For a generic function m,

E[m(W)] = ∫_{ℝ^{p+1}} m(w) FW(dw)

is naturally estimated by

En[m(W)] = ∫_{ℝ^{p+1}} m(w) FWn(dw) = (1/n) ∑_{i=1}^{n} m(Wi),

which is known as the sample, or empirical, expectation of m(W). In particular, if m(W) = WW′,

En[WW′] = (1/n) ∑_{i=1}^{n} Wi Wi′.

EXAMPLES:

Sample or empirical mean: The empirical analog of µW = E(W) is:

W̄n = En[W] = (1/n) ∑_{i=1}^{n} Wi.

Sample or empirical variance and covariance matrix: The sample analog of V(W) is:

Vn(W) = En[(W − En(W))(W − En(W))′] = (1/n) ∑_{i=1}^{n} (Wi − W̄n)(Wi − W̄n)′.

The sample or empirical covariances are the elements outside the principal diagonal of Vn(W); i.e. the sample analog of the vector of covariances C(Y, Z) is:

Cn(Y, Z) = En[(Y − En(Y))(Z − En(Z))] = (1/n) ∑_{i=1}^{n} (Yi − Ȳn)(Zi − Z̄n).
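Note that these analog estimators use the 1/n normalization, not the 1/(n − 1) "unbiased" one; a minimal sketch with illustrative helper names:

```python
def sample_mean(xs):
    return sum(xs) / len(xs)

def vn(xs):
    # V_n: empirical variance with 1/n normalization (the analog estimator)
    m = sample_mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def cn(ys, zs):
    # C_n(Y, Z): empirical covariance with 1/n normalization
    my, mz = sample_mean(ys), sample_mean(zs)
    return sum((y - my) * (z - mz) for y, z in zip(ys, zs)) / len(ys)

ys = [1.0, 2.0, 4.0]
zs = [1.0, 3.0, 5.0]
print(vn(ys))      # 14/9 = 1.555...
print(cn(ys, zs))  # 2.0
```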

## The Analog Principle: Method of Moments

If θ ∈ Θ is the only value such that:

E[q(W, θ)] = 0,

the empirical version of θ is θn, the solution to:

En[q(W, θn)] = 0  →  Z-estimator      (2)

(Z from zero). If also

θ = arg min_{t ∈ Θ} E[Q(W, t)],

then

θn = arg min_{t ∈ Θ} En[Q(W, t)]  →  M-estimator.      (3)

Notice that parameters can also be written, for a given functional η, as:

θ = η(FW),

and the estimator as

θn = η(FWn).

This estimation approach, which consists of substituting FW by FWn in expressions defining (identifying) parameters by means of moment restrictions, is known as the analog principle.

EXAMPLES:

Maximum Quasi-Likelihood Estimator:

θn = arg max_{t ∈ Θ} En[ln gW(W, t)],

or as the solution to

(∂/∂θ) En[ln gW(W, θn)] = 0.

Method of Moments and Ordinary Least Squares:

En[Y − β0n − Z′βn] = 0,
En[Z (Y − β0n − Z′βn)] = 0,

that is,

β0n = En(Y) − En(Z)′βn and βn = Vn(Z)⁻¹ Cn(Y, Z),

provided that Vn(Z) is non-singular. We can also write

(β0n, βn′)′ = arg min_{(b0, b′)′ ∈ Θ} En[(Y − b0 − Z′b)²].
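These sample-moment formulas can be checked on simulated data; a sketch assuming a hypothetical DGP Y = 1 + 2 Z1 − Z2 + U:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data from Y = b0 + Z'beta + U, with b0 = 1, beta = (2, -1)'
n = 5000
Z = rng.normal(size=(n, 2))
U = rng.normal(size=n)
Y = 1.0 + Z @ np.array([2.0, -1.0]) + U

# Analog estimators: beta_n = Vn(Z)^{-1} Cn(Y, Z), b0n = En(Y) - En(Z)'beta_n
Vn = np.cov(Z, rowvar=False, bias=True)  # 1/n covariance matrix of Z
Cn = ((Y - Y.mean())[:, None] * (Z - Z.mean(axis=0))).mean(axis=0)
beta_n = np.linalg.solve(Vn, Cn)
b0n = Y.mean() - Z.mean(axis=0) @ beta_n

print(b0n, beta_n)  # close to 1 and (2, -1)
```

This is numerically identical to running OLS of Y on (1, Z1, Z2), which is the point of the method-of-moments characterization.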

Empirical Linear Predictor:

Ln(Y | 1, Z = z) = β0n + z′βn.

Remark: All the properties of L (see e.g. assignment 1) are inherited by Ln (see assignment 2). An empirical version of a parametric regression function can also be obtained in a similar way.

## LAD and Quantile Linear Regression Estimators

The parameters of the LAD predictor are estimated by

(β0n, βn′)′ = arg min_{(b0, b′)′ ∈ Θ} En|Y − b0 − Z′b|,

and the parameters of the α-th Quantile predictor are estimated by

(β0n, βn′)′ = arg min_{(b0, b′)′ ∈ Θ} En[|Y − b0 − Z′b| + (2α − 1)(Y − b0 − Z′b)].

When the α-th quantile regression model is linear, (β0n, βn′)′ are the estimators of its parameters; i.e. the curve

QY|Z,α(z) = β0 + z′β

is estimated by

QY|Z,α,n(z) = β0n + z′βn.
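The non-smooth sample objective above can be minimized numerically; a sketch on simulated data, using Nelder-Mead (derivative-free, adequate for this two-parameter illustration) and α = 1/2, which gives the LAD (median regression) estimator:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)

# Simulated data: Y = 1 + 2 Z + U, U ~ N(0,1), so median regression recovers (1, 2)
n = 4000
Z = rng.normal(size=n)
Y = 1.0 + 2.0 * Z + rng.normal(size=n)

def quantile_objective(b, alpha):
    # En[ |Y - b0 - Z*b1| + (2*alpha - 1)*(Y - b0 - Z*b1) ]
    u = Y - b[0] - Z * b[1]
    return np.mean(np.abs(u) + (2 * alpha - 1) * u)

res = minimize(quantile_objective, x0=[0.0, 0.0], args=(0.5,), method="Nelder-Mead")
print(res.x)  # approximately (1, 2)
```

In practice, quantile regression is computed by linear programming rather than by a generic simplex search; this sketch only illustrates the M-estimator definition.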

## Maximum Conditional Quasi-Likelihood Estimation

Consider the linear regression model:

Y = β0 + Z′β + U, with E(U | Z) = 0 a.s.

Assuming a particular parametric conditional density function for U, say fU|Z, depending on a set of parameters γ, the parameter vector (β0, β′)′ can be identified on some occasions as

θ = (β0, β′, γ′)′ = arg max_{(b0, b′, g′)′ ∈ Θ} E[ln fU|Z(Y − b0 − Z′b; g)].

In particular, this holds if the true conditional density function belongs to the linear exponential family of distributions (e.g. Normal, Poisson, Exponential, etc.) and the assumed conditional density also belongs to this family.

The sample analog of θ is the maximum conditional quasi-likelihood (MCQL) estimator:

θn = (β0n, βn′, γn′)′ = arg max_{(b0, b′, g′)′ ∈ Θ} En[ln fU|Z(Y − b0 − Z′b; g)].

The estimators of β are more efficient than OLS when fU|Z is correctly specified, and the resulting estimator is then known as the maximum conditional likelihood (MCL) estimator. Even when fU|Z is incorrectly specified, MCQL may have interesting properties.

Examples: For Gaussian MCQL,

fU|Z(u; σ²) = (1/σ) φ(u/σ),

where φ is the standard normal density. The Gaussian MCQL estimates of β0 and β are identical to the OLS estimates, and σ² is estimated by:

σ²n = En[(Y − β0n − Z′βn)²].

Another interesting choice, for robustness purposes, is the Laplace density,

fU|Z(u) = (1/2) exp(−|u|).

The Laplace MCQL estimates of β0 and β are identical to the LAD estimates.
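The equivalence between Gaussian MCQL and OLS can be verified numerically; a sketch on simulated data (`neg_gaussian_qll` is an illustrative name, and σ is parameterized through its log to keep it positive):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)
n = 2000
Z = rng.normal(size=n)
Y = 1.0 + 2.0 * Z + rng.normal(size=n)

def neg_gaussian_qll(params):
    # minus En[ ln (1/sigma) * phi((Y - b0 - Z*b)/sigma) ]
    b0, b, log_sigma = params
    sigma = np.exp(log_sigma)
    u = Y - b0 - Z * b
    return np.mean(0.5 * np.log(2 * np.pi) + log_sigma + u**2 / (2 * sigma**2))

res = minimize(neg_gaussian_qll, x0=[0.0, 0.0, 0.0], method="Nelder-Mead")

# OLS via sample moments, for comparison
b_ols = np.cov(Z, Y, bias=True)[0, 1] / np.var(Z)
b0_ols = Y.mean() - b_ols * Z.mean()
print(res.x[:2], (b0_ols, b_ols))  # the coefficient estimates coincide
```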

## Semiparametric Efficiency

One way of justifying the use of the analog principle is that, if the only information about the DGP one has is that

E[q(W, θ)] = 0,

then, under some conditions, the estimator θn defined by the equations

En[q(W, θn)] = 0

is, asymptotically, the most efficient (it achieves a semiparametric efficiency bound). (If very interested, see chapters 8 and 9 in Manski's book Analog Estimation Methods in Econometrics.)

## Frisch-Waugh-Lovell Theorem

Consider the correlation model:

Y = β0 + Z1′β1 + Z2′β2 + U, with E(U) = 0 and E(U Z1) = E(U Z2) = 0.

It is easy to prove from assignment 1 that:

β1 = V(Z1 − L(Z1 | 1, Z2))⁻¹ C(Y − L(Y | 1, Z2), Z1 − L(Z1 | 1, Z2)).

Proof: Since L is a linear operator,

L(Y | 1, Z2) = β0 + L(Z1 | 1, Z2)′β1 + Z2′β2,

and

Y − L(Y | 1, Z2) = [Z1 − L(Z1 | 1, Z2)]′β1 + U.

The Frisch-Waugh-Lovell Theorem says that the OLS estimate of β1 can be written as:

β1n = Vn(Z1 − Ln(Z1 | 1, Z2))⁻¹ Cn(Y − Ln(Y | 1, Z2), Z1 − Ln(Z1 | 1, Z2)).

That is, β1n is the OLS estimator in the artificial linear model (partitioned model).
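The theorem is easy to verify numerically: the coefficient on Z1 from the full regression equals the residual-on-residual coefficient. A sketch on a hypothetical DGP:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 1000
Z1 = rng.normal(size=n)
Z2 = rng.normal(size=n)
Y = 1.0 + 2.0 * Z1 - 1.0 * Z2 + rng.normal(size=n)

def ols(X, y):
    # OLS coefficients with an intercept column prepended
    X1 = np.column_stack([np.ones(len(y)), X])
    return np.linalg.lstsq(X1, y, rcond=None)[0]

# Full regression: coefficient on Z1
b_full = ols(np.column_stack([Z1, Z2]), Y)[1]

# FWL: residualize Y and Z1 on (1, Z2), then regress residual on residual
ry = Y - np.column_stack([np.ones(n), Z2]) @ ols(Z2, Y)
rz = Z1 - np.column_stack([np.ones(n), Z2]) @ ols(Z2, Z1)
b_fwl = (rz @ ry) / (rz @ rz)

print(np.isclose(b_full, b_fwl))  # True
```

No intercept is needed in the final residual-on-residual step because both residual series have sample mean zero.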

We call Yni the predicted value of Yi:

Yni = Ln(Y | 1, Z = Zi) = β0n + Zi′βn,

and Uni the corresponding OLS residual:

Uni = Yi − Ln(Y | 1, Z = Zi) = Yi − Yni.

The partitioned linear model can be written in terms of residuals as:

Uni(Y) = Uni(Z1)′ β1 + error_i,

with

Uni(Y) = Yi − Ln(Y | 1, Z2 = Z2i) and Uni(Z1) = Z1i − Ln(Z1 | 1, Z2 = Z2i).

## Analysis of Variance

In assignment 1 we proved that:

V(Y) = V(L(Y | 1, Z)) + V(Y − L(Y | 1, Z)).

The empirical analog of this relation is:

Vn(Y) = Vn(Ln(Y | 1, Z)) + Vn(Y − Ln(Y | 1, Z)),

and the sample coefficient of determination is:

R²n = Vn(Ln(Y | 1, Z)) / Vn(Y) = 1 − Vn(Y − Ln(Y | 1, Z)) / Vn(Y).

Remark:

En(Ln(Y | 1, Z)) = En(Y) = Ȳn,

and we can write:

R²n = En[(Ln(Y | 1, Z) − Ȳn)²] / Vn(Y) = 1 − En[(Y − Ln(Y | 1, Z))²] / Vn(Y);

remember that σ²n = En[(Y − Ln(Y | 1, Z))²] is the Gaussian MCQL estimate of the error variance.
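Both the empirical variance decomposition and the equality of the two R²n expressions hold exactly (up to floating-point error) for any OLS fit with an intercept; a sketch on simulated data:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 1000
Z = rng.normal(size=n)
Y = 1.0 + 2.0 * Z + rng.normal(size=n)

# OLS fit via sample moments
b = np.cov(Z, Y, bias=True)[0, 1] / np.var(Z)
b0 = Y.mean() - b * Z.mean()
fitted = b0 + b * Z
resid = Y - fitted

# Variance decomposition: Vn(Y) = Vn(fitted) + Vn(resid)
print(np.isclose(np.var(Y), np.var(fitted) + np.var(resid)))  # True

# The two expressions for the sample R^2 coincide
R2a = np.var(fitted) / np.var(Y)
R2b = 1 - np.var(resid) / np.var(Y)
print(np.isclose(R2a, R2b))  # True
```

The decomposition relies on Cn(fitted, resid) = 0, which is exactly the OLS first-order condition.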
