# ESTIMATION IN THE LINEAR MODEL: The Analog Principle

## Macro Econometría Avanzada

## Parameter Identification

Definition: For any parametric model, we say that the parameters are identified if, for a given r × 1 vector of functions

q : ℝ^p × Θ → ℝ^r,

we have that θ is the only value in Θ such that:

E[q(W, θ)] = 0.

(We can also say that the vector of functions q identifies the parameter θ.)

Often there also exists a function Q such that

θ = arg min_{t ∈ Θ} E[Q(W, t)].

Next we see a couple of examples where identification is not straightforward.

Example 1. Let the model be E(Y | X) = X^{θ0}, where θ0 = 4, and X is a symmetric r.v. with E[X⁴] = E[X⁶] (this happens, for instance, for the N(0, 0.2)).

A method of moments estimator would be based on the sample analogues of the conditions

E[(Y − X^θ) · 1] = E[(Y − X^θ) · X] = 0.     (1)

This is an inefficient GMM estimator, since the instruments 1 and X are arbitrary.

Note that these two instruments do not identify θ0, since θ = 6 also satisfies (1).
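Under this reading of the example (the true regression is E(Y | X) = X⁴ and X ~ N(0, 0.2), i.e. variance 0.2), the failure of identification can be checked with the closed-form moments of the normal distribution; a minimal sketch:

```python
# For X ~ N(0, sigma2), E[X^4] = 3*sigma2^2 and E[X^6] = 15*sigma2^3,
# and all odd moments vanish by symmetry.
sigma2 = 0.2
EX4 = 3 * sigma2**2    # = 0.12
EX6 = 15 * sigma2**3   # = 0.12, equal to EX4 precisely when sigma2 = 1/5

# Moment condition E[(Y - X^t) * 1] = E[X^4] - E[X^t] at the candidates t = 4, 6;
# the second condition E[(Y - X^t) * X] is automatically 0 for even t (odd moments).
print(EX4 - EX4)  # t = 4 -> 0.0
print(EX4 - EX6)  # t = 6 -> 0.0: both candidates solve the moment conditions
```

So with instruments (1, X) both θ = 4 and θ = 6 satisfy the population moment conditions exactly.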

Example 2. Let the model be E(Y | X) = θ0² X + θ0 X², with θ0 = 5/4 and V(Y | X) constant. Suppose the researcher properly specifies the model and chooses the optimal instrument

W0 = 2θ0 X + X².

In this case θ = −5/4 also verifies

E[(Y − θ² X − θ X²) W0] = 0 when X follows a N(−1, 1).

W0 is not a feasible instrument because θ0 is unknown. If the researcher had employed W = 2θX + X² instead, θ = −5/4 would no longer satisfy the moment condition, and θ0 = 5/4 would be identified.

One typical solution is to restrict the parametric space Θ where θ moves; another solution is to restrict the behavior of the conditioning variable X.

EXAMPLES:

Quasi-Likelihood Function: Let FW be the d.f. of W and gW : ℝ^p × Θ → ℝ⁺ a parametric specification of the density function (or probability function) of W, which may not be equal to the true density (or probability function) of W.

We can define, under suitable regularity conditions:

θ = arg max_{t ∈ Θ} E[ln gW(W, t)],

or as the solution to

(∂/∂θ) E[ln gW(W, θ)] = 0.

Some motivation for the previous slide: think about a r.v. X with pdf f(x), and now consider any alternative pdf g(x), obviously with the same support. Note that g is a valid pdf, so that

∫ g(x) dx = 1,

and now consider E log[g(X)/f(X)]. By Jensen's inequality (log is concave),

E log h(X) ≤ log E h(X),

so that

E log[g(X)/f(X)] ≤ log E[g(X)/f(X)] = log ∫ [g(x)/f(x)] f(x) dx = log ∫ g(x) dx = 0,

that is,

E log g(X) ≤ E log f(X);

that is, E log g(X) is maximized when you choose g = f.
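This inequality (equivalently, nonnegativity of the Kullback-Leibler divergence) is easy to check by Monte Carlo; a minimal sketch, assuming for illustration that the true density f is N(0, 1) and the alternative g is N(1, 2²):

```python
import math
import random

random.seed(0)

def logpdf_normal(x, mu, sigma):
    # log density of N(mu, sigma^2)
    return -0.5 * math.log(2 * math.pi * sigma**2) - (x - mu)**2 / (2 * sigma**2)

# Draw X from the true density f = N(0, 1)
xs = [random.gauss(0.0, 1.0) for _ in range(100_000)]

# Monte Carlo estimates of E log f(X) and E log g(X), with g = N(1, 2^2)
Elogf = sum(logpdf_normal(x, 0.0, 1.0) for x in xs) / len(xs)
Elogg = sum(logpdf_normal(x, 1.0, 2.0) for x in xs) / len(xs)

print(Elogf > Elogg)  # True: E log g(X) <= E log f(X)
```

Any other choice of g with the same support gives the same ordering, which is what justifies maximizing the expected log quasi-likelihood.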

Motivation, Linear Predictors: Consider the linear predictor L(Y | Z = z) = β0 + z′β; the parameters are identified as the solution of

E[Y − β0 − Z′β] = 0,
E[Z (Y − β0 − Z′β)] = 0,

that is,

β0 = E(Y) − E(Z)′β and β = V(Z)⁻¹ C(Y, Z),

provided that V(Z) is non-singular. We can also write

(β0, β′)′ = arg min_{(b0, b′)′ ∈ Θ} E[(Y − b0 − Z′b)²].

LAD predictor and α-th Quantile predictor: The parameters of the LAD predictor are defined as

(β0, β′)′ = arg min_{(b0, b′)′ ∈ Θ} E|Y − b0 − Z′b|

(or through the FOC), and the parameters of the α-th Quantile predictor are defined as

(β0, β′)′ = arg min_{(b0, b′)′ ∈ Θ} E[|Y − b0 − Z′b| + (2α − 1)(Y − b0 − Z′b)].

In some cases models are not properly specified and hence parameters are not identified. For instance, in the linear model we need V(Z) to be non-singular; that is, if we specify

E(Y | Z1, Z2) = β0 + β1 Z1 + β2 Z2 + β3 (Z1 + Z2),

we cannot identify the β's. Or, if we specify

E(Y | Z1, Z2) = β0 + β1 Z1 + β2 Z2^{β3},

and the true β2 is zero, then β3 is not identified.

These are examples of models that are wrongly specified: there is no way of identifying the parameters. We focus on properly specified models, and the interesting issue is to construct, or to find, the function q or Q that identifies the parameters.

## Sampling Distribution and Moment Estimators

Henceforth, we assume that we observe a random sample W1, ..., Wn from the distribution FW, i.e. W1, ..., Wn are independent copies of W. Recalling that FW(w) = E[1{W ≤ w}], the sample analog of FW is

FWn(w) = (1/n) ∑_{i=1}^{n} 1{Wi ≤ w},

known as the sample, or empirical, distribution function.

FWn is a random variable taking values in the space of functions with jump discontinuities. When W is continuous, it can be interpreted as the cumulative distribution of a discrete uniform r.v. that takes values W1, ..., Wn.
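The empirical distribution function takes only a few lines to compute; a minimal sketch (the `ecdf` helper is an illustrative name, not from the course material):

```python
def ecdf(sample):
    """Return the empirical distribution function F_Wn of a 1-D sample."""
    data = sorted(sample)
    n = len(data)

    def F(w):
        # proportion of observations with W_i <= w
        return sum(1 for x in data if x <= w) / n

    return F

F = ecdf([2.0, 1.0, 3.0, 2.0])
print(F(0.5), F(2.0), F(10.0))  # 0.0 0.75 1.0
```

Note the jumps of size 1/n at each observation: F is exactly the c.d.f. of a discrete uniform r.v. on the sample values, as stated above.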

For a generic function m,

E[m(W)] = ∫_{ℝ^{p+1}} m(w) FW(dw)

is naturally estimated by

En[m(W)] = ∫_{ℝ^{p+1}} m(w) FWn(dw) = (1/n) ∑_{i=1}^{n} m(Wi),

which is known as the sample, or empirical, expectation of m(W). In particular, if m(W) = WW′,

En[WW′] = (1/n) ∑_{i=1}^{n} Wi Wi′.

EXAMPLES:

Sample or empirical mean: The empirical analog of µW = E(W) is:

W̄n = En[W] = (1/n) ∑_{i=1}^{n} Wi.

Sample or empirical variance and covariance matrix: The sample analog of V(W) is:

Vn(W) = En[(W − En(W))(W − En(W))′] = (1/n) ∑_{i=1}^{n} (Wi − W̄n)(Wi − W̄n)′.

The sample or empirical covariances are the elements outside the principal diagonal of Vn(W); i.e. the sample analog of the vector of covariances C(Y, Z) is:

Cn(Y, Z) = En[(Y − En(Y))(Z − En(Z))] = (1/n) ∑_{i=1}^{n} (Yi − Ȳn)(Zi − Z̄n).
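Note that these analog estimators use the 1/n normalization, not the 1/(n − 1) "unbiased" one; a minimal sketch with illustrative helper names:

```python
def sample_mean(xs):
    return sum(xs) / len(xs)

def vn(xs):
    # V_n: empirical variance with 1/n normalization (the analog estimator)
    m = sample_mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def cn(ys, zs):
    # C_n(Y, Z): empirical covariance with 1/n normalization
    my, mz = sample_mean(ys), sample_mean(zs)
    return sum((y - my) * (z - mz) for y, z in zip(ys, zs)) / len(ys)

ys = [1.0, 2.0, 4.0]
zs = [1.0, 3.0, 5.0]
print(vn(ys))      # 14/9 = 1.555...
print(cn(ys, zs))  # 2.0
```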

## The Analog Principle: Method of Moments

If θ ∈ Θ is the only value such that:

E[q(W, θ)] = 0,

the empirical version of θ is θn, the solution to:

En[q(W, θn)] = 0  →  Z-estimator      (2)

(Z from zero). If also

θ = arg min_{t ∈ Θ} E[Q(W, t)],

then

θn = arg min_{t ∈ Θ} En[Q(W, t)]  →  M-estimator.      (3)

Notice that parameters can also be written, for a given functional η, as:

θ = η(FW),

and the estimator as

θn = η(FWn).

This estimation approach, which consists of substituting FW by FWn in expressions defining (identifying) parameters by means of moment restrictions, is known as the analog principle.

EXAMPLES:

Maximum Quasi-Likelihood Estimator:

θn = arg max_{t ∈ Θ} En[ln gW(W, t)],

or as the solution to

(∂/∂θ) En[ln gW(W, θn)] = 0.

Method of Moments and Ordinary Least Squares:

En[Y − β0n − Z′βn] = 0,
En[Z (Y − β0n − Z′βn)] = 0,

that is,

β0n = En(Y) − En(Z)′βn and βn = Vn(Z)⁻¹ Cn(Y, Z),

provided that Vn(Z) is non-singular. We can also write

(β0n, βn′)′ = arg min_{(b0, b′)′ ∈ Θ} En[(Y − b0 − Z′b)²].
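These sample-moment formulas can be checked on simulated data; a sketch assuming a hypothetical DGP Y = 1 + 2 Z1 − Z2 + U:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data from Y = b0 + Z'beta + U, with b0 = 1, beta = (2, -1)'
n = 5000
Z = rng.normal(size=(n, 2))
U = rng.normal(size=n)
Y = 1.0 + Z @ np.array([2.0, -1.0]) + U

# Analog estimators: beta_n = Vn(Z)^{-1} Cn(Y, Z), b0n = En(Y) - En(Z)'beta_n
Vn = np.cov(Z, rowvar=False, bias=True)  # 1/n covariance matrix of Z
Cn = ((Y - Y.mean())[:, None] * (Z - Z.mean(axis=0))).mean(axis=0)
beta_n = np.linalg.solve(Vn, Cn)
b0n = Y.mean() - Z.mean(axis=0) @ beta_n

print(b0n, beta_n)  # close to 1 and (2, -1)
```

This is numerically identical to running OLS of Y on (1, Z1, Z2), which is the point of the method-of-moments characterization.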

Empirical Linear Predictor:

Ln(Y | 1, Z = z) = β0n + z′βn.

Remark: All the properties of L (see e.g. assignment 1) are inherited by Ln (see assignment 2). An empirical version of a parametric regression function can also be obtained in a similar way.

## LAD and Quantile Linear Regression Estimators

The parameters of the LAD predictor are estimated by

(β0n, βn′)′ = arg min_{(b0, b′)′ ∈ Θ} En|Y − b0 − Z′b|,

and the parameters of the α-th Quantile predictor are estimated by

(β0n, βn′)′ = arg min_{(b0, b′)′ ∈ Θ} En[|Y − b0 − Z′b| + (2α − 1)(Y − b0 − Z′b)].

When the α-th quantile regression model is linear, (β0n, βn′)′ are the estimators of its parameters; i.e. the curve

QY|Z,α(z) = β0 + z′β

is estimated by

QY|Z,α,n(z) = β0n + z′βn.
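The non-smooth sample objective above can be minimized numerically; a sketch on simulated data, using Nelder-Mead (derivative-free, adequate for this two-parameter illustration) and α = 1/2, which gives the LAD (median regression) estimator:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)

# Simulated data: Y = 1 + 2 Z + U, U ~ N(0,1), so median regression recovers (1, 2)
n = 4000
Z = rng.normal(size=n)
Y = 1.0 + 2.0 * Z + rng.normal(size=n)

def quantile_objective(b, alpha):
    # En[ |Y - b0 - Z*b1| + (2*alpha - 1)*(Y - b0 - Z*b1) ]
    u = Y - b[0] - Z * b[1]
    return np.mean(np.abs(u) + (2 * alpha - 1) * u)

res = minimize(quantile_objective, x0=[0.0, 0.0], args=(0.5,), method="Nelder-Mead")
print(res.x)  # approximately (1, 2)
```

In practice, quantile regression is computed by linear programming rather than by a generic simplex search; this sketch only illustrates the M-estimator definition.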

## Maximum Conditional Quasi-Likelihood Estimation

Consider the linear regression model:

Y = β0 + Z′β + U, with E(U | Z) = 0 a.s.

Assuming a particular parametric conditional density function for U, say fU|Z, depending on a set of parameters γ, the parameter vector (β0, β′)′ can be identified on some occasions as

θ = (β0, β′, γ′)′ = arg max_{(b0, b′, g′)′ ∈ Θ} E[ln fU|Z(Y − b0 − Z′b; g)].

In particular, this holds if the true conditional density function belongs to the linear exponential family of distributions (e.g. Normal, Poisson, Exponential, etc.) and the assumed conditional density also belongs to this family.

The sample analog of θ is the maximum conditional quasi-likelihood (MCQL) estimator:

θn = (β0n, βn′, γn′)′ = arg max_{(b0, b′, g′)′ ∈ Θ} En[ln fU|Z(Y − b0 − Z′b; g)].

The estimators of β are more efficient than OLS when fU|Z is correctly specified, and the resulting estimator is then known as the maximum conditional likelihood (MCL) estimator. Even when fU|Z is incorrectly specified, MCQL may have interesting properties.

Examples: For Gaussian MCQL,

fU|Z(u; σ²) = (1/σ) φ(u/σ),

where φ is the standard normal density. The Gaussian MCQL estimates of β0 and β are identical to the OLS estimates, and σ² is estimated by:

σ²n = En[(Y − β0n − Z′βn)²].

Another interesting choice, for robustness purposes, is the Laplace density,

fU|Z(u) = (1/2) exp(−|u|).

The Laplace MCQL estimates of β0 and β are identical to the LAD estimates.
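The equivalence between Gaussian MCQL and OLS can be verified numerically; a sketch on simulated data (`neg_gaussian_qll` is an illustrative name, and σ is parameterized through its log to keep it positive):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)
n = 2000
Z = rng.normal(size=n)
Y = 1.0 + 2.0 * Z + rng.normal(size=n)

def neg_gaussian_qll(params):
    # minus En[ ln (1/sigma) * phi((Y - b0 - Z*b)/sigma) ]
    b0, b, log_sigma = params
    sigma = np.exp(log_sigma)
    u = Y - b0 - Z * b
    return np.mean(0.5 * np.log(2 * np.pi) + log_sigma + u**2 / (2 * sigma**2))

res = minimize(neg_gaussian_qll, x0=[0.0, 0.0, 0.0], method="Nelder-Mead")

# OLS via sample moments, for comparison
b_ols = np.cov(Z, Y, bias=True)[0, 1] / np.var(Z)
b0_ols = Y.mean() - b_ols * Z.mean()
print(res.x[:2], (b0_ols, b_ols))  # the coefficient estimates coincide
```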

## Semiparametric Efficiency

One way of justifying the use of the analog principle is that, if the only information about the DGP one has is that

E[q(W, θ)] = 0,

then, under some conditions, the estimator θn defined by the equations

En[q(W, θn)] = 0

is, asymptotically, the most efficient (it achieves a semiparametric efficiency bound). (If very interested, see chapters 8 and 9 in Manski's book Analog Estimation Methods in Econometrics.)

## Frisch-Waugh-Lovell Theorem

Consider the correlation model:

Y = β0 + Z1′β1 + Z2′β2 + U, with E(U) = 0 and E(U Z1) = E(U Z2) = 0.

It is easy to prove from assignment 1 that:

β1 = V(Z1 − L(Z1 | 1, Z2))⁻¹ C(Y − L(Y | 1, Z2), Z1 − L(Z1 | 1, Z2)).

Proof: Since L is a linear operator,

L(Y | 1, Z2) = β0 + L(Z1 | 1, Z2)′β1 + Z2′β2,

and

Y − L(Y | 1, Z2) = [Z1 − L(Z1 | 1, Z2)]′β1 + U.

The Frisch-Waugh-Lovell Theorem says that the OLS estimate of β1 can be written as:

β1n = Vn(Z1 − Ln(Z1 | 1, Z2))⁻¹ Cn(Y − Ln(Y | 1, Z2), Z1 − Ln(Z1 | 1, Z2)).

That is, β1n is the OLS estimator in the artificial linear model (partitioned model).
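The theorem is easy to verify numerically: the coefficient on Z1 from the full regression equals the residual-on-residual coefficient. A sketch on a hypothetical DGP:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 1000
Z1 = rng.normal(size=n)
Z2 = rng.normal(size=n)
Y = 1.0 + 2.0 * Z1 - 1.0 * Z2 + rng.normal(size=n)

def ols(X, y):
    # OLS coefficients with an intercept column prepended
    X1 = np.column_stack([np.ones(len(y)), X])
    return np.linalg.lstsq(X1, y, rcond=None)[0]

# Full regression: coefficient on Z1
b_full = ols(np.column_stack([Z1, Z2]), Y)[1]

# FWL: residualize Y and Z1 on (1, Z2), then regress residual on residual
ry = Y - np.column_stack([np.ones(n), Z2]) @ ols(Z2, Y)
rz = Z1 - np.column_stack([np.ones(n), Z2]) @ ols(Z2, Z1)
b_fwl = (rz @ ry) / (rz @ rz)

print(np.isclose(b_full, b_fwl))  # True
```

No intercept is needed in the final residual-on-residual step because both residual series have sample mean zero.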

We call Yni the predicted value of Yi:

Yni = Ln(Y | 1, Z = Zi) = β0n + Zi′βn,

and Uni the corresponding OLS residual:

Uni = Yi − Ln(Y | 1, Z = Zi) = Yi − Yni.

The partitioned linear model can be written in terms of residuals as:

Uni(Y) = Uni(Z1)′ β1 + error_i,

with

Uni(Y) = Yi − Ln(Y | 1, Z2 = Z2i) and Uni(Z1) = Z1i − Ln(Z1 | 1, Z2 = Z2i).

## Analysis of Variance

In assignment 1 we proved that:

V(Y) = V(L(Y | 1, Z)) + V(Y − L(Y | 1, Z)).

The empirical analog of this relation is:

Vn(Y) = Vn(Ln(Y | 1, Z)) + Vn(Y − Ln(Y | 1, Z)),

and the sample coefficient of determination is:

R²n = Vn(Ln(Y | 1, Z)) / Vn(Y) = 1 − Vn(Y − Ln(Y | 1, Z)) / Vn(Y).

Remark:

En(Ln(Y | 1, Z)) = En(Y) = Ȳn,

and we can write:

R²n = En[(Ln(Y | 1, Z) − Ȳn)²] / Vn(Y) = 1 − En[(Y − Ln(Y | 1, Z))²] / Vn(Y);

remember that σ²n = En[(Y − Ln(Y | 1, Z))²] is the Gaussian MCQL estimate of the error variance.
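Both the empirical variance decomposition and the equality of the two R²n expressions hold exactly (up to floating-point error) for any OLS fit with an intercept; a sketch on simulated data:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 1000
Z = rng.normal(size=n)
Y = 1.0 + 2.0 * Z + rng.normal(size=n)

# OLS fit via sample moments
b = np.cov(Z, Y, bias=True)[0, 1] / np.var(Z)
b0 = Y.mean() - b * Z.mean()
fitted = b0 + b * Z
resid = Y - fitted

# Variance decomposition: Vn(Y) = Vn(fitted) + Vn(resid)
print(np.isclose(np.var(Y), np.var(fitted) + np.var(resid)))  # True

# The two expressions for the sample R^2 coincide
R2a = np.var(fitted) / np.var(Y)
R2b = 1 - np.var(resid) / np.var(Y)
print(np.isclose(R2a, R2b))  # True
```

The decomposition relies on Cn(fitted, resid) = 0, which is exactly the OLS first-order condition.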
