# ESTIMATION IN THE LINEAR MODEL: The Analog Principle

## Macro Econometría Avanzada

## Parameter Identification

Definition: For any parametric model, we say that the parameters are identified if, for a given r × 1 vector of functions

q : ℝ^p × Θ → ℝ^r,

we have that θ is the only value in Θ such that:

E[q(W, θ)] = 0.

(We can also say that the vector of functions q identifies the parameter θ.)

Often there also exists a function Q such that

θ = arg min_{t ∈ Θ} E[Q(W, t)].

Next we see a couple of examples where identification is not straightforward.

Example 1. Let the model be E(Y | X) = X^{θ0}, where θ0 = 4, and X is a symmetric r.v. with E[X⁴] = E[X⁶] (this happens, for instance, for the N(0, 0.2)).

A method of moments estimator would be based on the sample analogues of the conditions

E[(Y − X^θ) · 1] = E[(Y − X^θ) · X] = 0.     (1)

This is an inefficient GMM estimator, since the instruments 1 and X are arbitrary.

Note that these two instruments do not identify θ0, since θ = 6 also satisfies (1).
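Under this reading of the example (the true regression is E(Y | X) = X⁴ and X ~ N(0, 0.2), i.e. variance 0.2), the failure of identification can be checked with the closed-form moments of the normal distribution; a minimal sketch:

```python
# For X ~ N(0, sigma2), E[X^4] = 3*sigma2^2 and E[X^6] = 15*sigma2^3,
# and all odd moments vanish by symmetry.
sigma2 = 0.2
EX4 = 3 * sigma2**2    # = 0.12
EX6 = 15 * sigma2**3   # = 0.12, equal to EX4 precisely when sigma2 = 1/5

# Moment condition E[(Y - X^t) * 1] = E[X^4] - E[X^t] at the candidates t = 4, 6;
# the second condition E[(Y - X^t) * X] is automatically 0 for even t (odd moments).
print(EX4 - EX4)  # t = 4 -> 0.0
print(EX4 - EX6)  # t = 6 -> 0.0: both candidates solve the moment conditions
```

So with instruments (1, X) both θ = 4 and θ = 6 satisfy the population moment conditions exactly.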

Example 2. Let the model be E(Y | X) = θ0² X + θ0 X², with θ0 = 5/4 and V(Y | X) constant. Suppose the researcher properly specifies the model and chooses the optimal instrument

W0 = 2θ0 X + X².

In this case θ = −5/4 also verifies

E[(Y − θ² X − θ X²) W0] = 0 when X follows a N(−1, 1).

W0 is not a feasible instrument because θ0 is unknown. If the researcher had employed W = 2θX + X² instead, θ = −5/4 would no longer satisfy the moment condition, and θ0 = 5/4 would be identified.

One typical solution is to restrict the parametric space Θ where θ moves; another solution is to restrict the behavior of the conditioning variable X.

EXAMPLES:

Quasi-Likelihood Function: Let FW be the d.f. of W and gW : ℝ^p × Θ → ℝ⁺ a parametric specification of the density function (or probability function) of W, which may not be equal to the true density (or probability function) of W.

We can define, under suitable regularity conditions:

θ = arg max_{t ∈ Θ} E[ln gW(W, t)],

or as the solution to

(∂/∂θ) E[ln gW(W, θ)] = 0.

Some motivation for the previous slide: think about a r.v. X with pdf f(x), and now consider any alternative pdf g(x), obviously with the same support. Note that g is a valid pdf, so that

∫ g(x) dx = 1,

and now consider E log[g(X)/f(X)]. By Jensen's inequality (log is concave),

E log h(X) ≤ log E h(X),

so that

E log[g(X)/f(X)] ≤ log E[g(X)/f(X)] = log ∫ [g(x)/f(x)] f(x) dx = log ∫ g(x) dx = 0,

that is,

E log g(X) ≤ E log f(X);

that is, E log g(X) is maximized when you choose g = f.
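This inequality (equivalently, nonnegativity of the Kullback-Leibler divergence) is easy to check by Monte Carlo; a minimal sketch, assuming for illustration that the true density f is N(0, 1) and the alternative g is N(1, 2²):

```python
import math
import random

random.seed(0)

def logpdf_normal(x, mu, sigma):
    # log density of N(mu, sigma^2)
    return -0.5 * math.log(2 * math.pi * sigma**2) - (x - mu)**2 / (2 * sigma**2)

# Draw X from the true density f = N(0, 1)
xs = [random.gauss(0.0, 1.0) for _ in range(100_000)]

# Monte Carlo estimates of E log f(X) and E log g(X), with g = N(1, 2^2)
Elogf = sum(logpdf_normal(x, 0.0, 1.0) for x in xs) / len(xs)
Elogg = sum(logpdf_normal(x, 1.0, 2.0) for x in xs) / len(xs)

print(Elogf > Elogg)  # True: E log g(X) <= E log f(X)
```

Any other choice of g with the same support gives the same ordering, which is what justifies maximizing the expected log quasi-likelihood.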

Motivation, Linear Predictors: Consider the linear predictor L(Y | Z = z) = β0 + z′β; the parameters are identified as the solution of

E[Y − β0 − Z′β] = 0,
E[Z (Y − β0 − Z′β)] = 0,

that is,

β0 = E(Y) − E(Z)′β and β = V(Z)⁻¹ C(Y, Z),

provided that V(Z) is non-singular. We can also write

(β0, β′)′ = arg min_{(b0, b′)′ ∈ Θ} E[(Y − b0 − Z′b)²].

LAD predictor and α-th Quantile predictor: The parameters of the LAD predictor are defined as

(β0, β′)′ = arg min_{(b0, b′)′ ∈ Θ} E|Y − b0 − Z′b|

(or through the FOC), and the parameters of the α-th Quantile predictor are defined as

(β0, β′)′ = arg min_{(b0, b′)′ ∈ Θ} E[|Y − b0 − Z′b| + (2α − 1)(Y − b0 − Z′b)].

In some cases models are not properly specified and hence parameters are not identified. For instance, in the linear model we need V(Z) to be non-singular; that is, if we specify

E(Y | Z1, Z2) = β0 + β1 Z1 + β2 Z2 + β3 (Z1 + Z2),

we cannot identify the β's. Or, if we specify

E(Y | Z1, Z2) = β0 + β1 Z1 + β2 Z2^{β3},

and the true β2 is zero, then β3 is not identified.

These are examples of models that are wrongly specified: there is no way of identifying the parameters. We focus on properly specified models, and the interesting issue is to construct, or to find, the function q or Q that identifies the parameters.

## Sampling Distribution and Moment Estimators

Henceforth, we assume that we observe a random sample W1, ..., Wn from the distribution FW, i.e. W1, ..., Wn are independent copies of W. Recalling that FW(w) = E[1{W ≤ w}], the sample analog of FW is

FWn(w) = (1/n) ∑_{i=1}^{n} 1{Wi ≤ w},

known as the sample, or empirical, distribution function.

FWn is a random variable taking values in the space of functions with jump discontinuities. When W is continuous, it can be interpreted as the cumulative distribution of a discrete uniform r.v. that takes values W1, ..., Wn.
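The empirical distribution function takes only a few lines to compute; a minimal sketch (the `ecdf` helper is an illustrative name, not from the course material):

```python
def ecdf(sample):
    """Return the empirical distribution function F_Wn of a 1-D sample."""
    data = sorted(sample)
    n = len(data)

    def F(w):
        # proportion of observations with W_i <= w
        return sum(1 for x in data if x <= w) / n

    return F

F = ecdf([2.0, 1.0, 3.0, 2.0])
print(F(0.5), F(2.0), F(10.0))  # 0.0 0.75 1.0
```

Note the jumps of size 1/n at each observation: F is exactly the c.d.f. of a discrete uniform r.v. on the sample values, as stated above.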

For a generic function m,

E[m(W)] = ∫_{ℝ^{p+1}} m(w) FW(dw)

is naturally estimated by

En[m(W)] = ∫_{ℝ^{p+1}} m(w) FWn(dw) = (1/n) ∑_{i=1}^{n} m(Wi),

which is known as the sample, or empirical, expectation of m(W). In particular, if m(W) = WW′,

En[WW′] = (1/n) ∑_{i=1}^{n} Wi Wi′.

EXAMPLES:

Sample or empirical mean: The empirical analog of µW = E(W) is:

W̄n = En[W] = (1/n) ∑_{i=1}^{n} Wi.

Sample or empirical variance and covariance matrix: The sample analog of V(W) is:

Vn(W) = En[(W − En(W))(W − En(W))′] = (1/n) ∑_{i=1}^{n} (Wi − W̄n)(Wi − W̄n)′.

The sample or empirical covariances are the elements outside the principal diagonal of Vn(W); i.e. the sample analog of the vector of covariances C(Y, Z) is:

Cn(Y, Z) = En[(Y − En(Y))(Z − En(Z))] = (1/n) ∑_{i=1}^{n} (Yi − Ȳn)(Zi − Z̄n).
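Note that these analog estimators use the 1/n normalization, not the 1/(n − 1) "unbiased" one; a minimal sketch with illustrative helper names:

```python
def sample_mean(xs):
    return sum(xs) / len(xs)

def vn(xs):
    # V_n: empirical variance with 1/n normalization (the analog estimator)
    m = sample_mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def cn(ys, zs):
    # C_n(Y, Z): empirical covariance with 1/n normalization
    my, mz = sample_mean(ys), sample_mean(zs)
    return sum((y - my) * (z - mz) for y, z in zip(ys, zs)) / len(ys)

ys = [1.0, 2.0, 4.0]
zs = [1.0, 3.0, 5.0]
print(vn(ys))      # 14/9 = 1.555...
print(cn(ys, zs))  # 2.0
```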

## The Analog Principle: Method of Moments

If θ ∈ Θ is the only value such that:

E[q(W, θ)] = 0,

the empirical version of θ is θn, the solution to:

En[q(W, θn)] = 0  →  Z-estimator      (2)

(Z from zero). If also

θ = arg min_{t ∈ Θ} E[Q(W, t)],

then

θn = arg min_{t ∈ Θ} En[Q(W, t)]  →  M-estimator.      (3)

Notice that parameters can also be written, for a given functional η, as:

θ = η(FW),

and the estimator as

θn = η(FWn).

This estimation approach, which consists of substituting FW by FWn in expressions defining (identifying) parameters by means of moment restrictions, is known as the analog principle.

EXAMPLES:

Maximum Quasi-Likelihood Estimator:

θn = arg max_{t ∈ Θ} En[ln gW(W, t)],

or as the solution to

(∂/∂θ) En[ln gW(W, θn)] = 0.

Method of Moments and Ordinary Least Squares:

En[Y − β0n − Z′βn] = 0,
En[Z (Y − β0n − Z′βn)] = 0,

that is,

β0n = En(Y) − En(Z)′βn and βn = Vn(Z)⁻¹ Cn(Y, Z),

provided that Vn(Z) is non-singular. We can also write

(β0n, βn′)′ = arg min_{(b0, b′)′ ∈ Θ} En[(Y − b0 − Z′b)²].
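These sample-moment formulas can be checked on simulated data; a sketch assuming a hypothetical DGP Y = 1 + 2 Z1 − Z2 + U:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data from Y = b0 + Z'beta + U, with b0 = 1, beta = (2, -1)'
n = 5000
Z = rng.normal(size=(n, 2))
U = rng.normal(size=n)
Y = 1.0 + Z @ np.array([2.0, -1.0]) + U

# Analog estimators: beta_n = Vn(Z)^{-1} Cn(Y, Z), b0n = En(Y) - En(Z)'beta_n
Vn = np.cov(Z, rowvar=False, bias=True)  # 1/n covariance matrix of Z
Cn = ((Y - Y.mean())[:, None] * (Z - Z.mean(axis=0))).mean(axis=0)
beta_n = np.linalg.solve(Vn, Cn)
b0n = Y.mean() - Z.mean(axis=0) @ beta_n

print(b0n, beta_n)  # close to 1 and (2, -1)
```

This is numerically identical to running OLS of Y on (1, Z1, Z2), which is the point of the method-of-moments characterization.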

Empirical Linear Predictor:

Ln(Y | 1, Z = z) = β0n + z′βn.

Remark: All the properties of L (see e.g. assignment 1) are inherited by Ln (see assignment 2). An empirical version of a parametric regression function can also be obtained in a similar way.

## LAD and Quantile Linear Regression Estimators

The parameters of the LAD predictor are estimated by

(β0n, βn′)′ = arg min_{(b0, b′)′ ∈ Θ} En|Y − b0 − Z′b|,

and the parameters of the α-th Quantile predictor are estimated by

(β0n, βn′)′ = arg min_{(b0, b′)′ ∈ Θ} En[|Y − b0 − Z′b| + (2α − 1)(Y − b0 − Z′b)].

When the α-th quantile regression model is linear, (β0n, βn′)′ are the estimators of its parameters; i.e. the curve

QY|Z,α(z) = β0 + z′β

is estimated by

QY|Z,α,n(z) = β0n + z′βn.
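The non-smooth sample objective above can be minimized numerically; a sketch on simulated data, using Nelder-Mead (derivative-free, adequate for this two-parameter illustration) and α = 1/2, which gives the LAD (median regression) estimator:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)

# Simulated data: Y = 1 + 2 Z + U, U ~ N(0,1), so median regression recovers (1, 2)
n = 4000
Z = rng.normal(size=n)
Y = 1.0 + 2.0 * Z + rng.normal(size=n)

def quantile_objective(b, alpha):
    # En[ |Y - b0 - Z*b1| + (2*alpha - 1)*(Y - b0 - Z*b1) ]
    u = Y - b[0] - Z * b[1]
    return np.mean(np.abs(u) + (2 * alpha - 1) * u)

res = minimize(quantile_objective, x0=[0.0, 0.0], args=(0.5,), method="Nelder-Mead")
print(res.x)  # approximately (1, 2)
```

In practice, quantile regression is computed by linear programming rather than by a generic simplex search; this sketch only illustrates the M-estimator definition.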

## Maximum Conditional Quasi-Likelihood Estimation

Consider the linear regression model:

Y = β0 + Z′β + U, with E(U | Z) = 0 a.s.

Assuming a particular parametric conditional density function for U, say fU|Z, depending on a set of parameters γ, the parameter vector (β0, β′)′ can be identified on some occasions as

θ = (β0, β′, γ′)′ = arg max_{(b0, b′, g′)′ ∈ Θ} E[ln fU|Z(Y − b0 − Z′b; g)].

In particular, this holds if the true conditional density function belongs to the linear exponential family of distributions (e.g. Normal, Poisson, Exponential, etc.) and the assumed conditional density also belongs to this family.

The sample analog of θ is the maximum conditional quasi-likelihood (MCQL) estimator:

θn = (β0n, βn′, γn′)′ = arg max_{(b0, b′, g′)′ ∈ Θ} En[ln fU|Z(Y − b0 − Z′b; g)].

The estimators of β are more efficient than OLS when fU|Z is correctly specified, and the resulting estimator is then known as the maximum conditional likelihood (MCL) estimator. Even when fU|Z is incorrectly specified, MCQL may have interesting properties.

Examples: For Gaussian MCQL,

fU|Z(u; σ²) = (1/σ) φ(u/σ),

where φ is the standard normal density. The Gaussian MCQL estimates of β0 and β are identical to the OLS estimates, and σ² is estimated by:

σ²n = En[(Y − β0n − Z′βn)²].

Another interesting choice, for robustness purposes, is the Laplace density,

fU|Z(u) = (1/2) exp(−|u|).

The Laplace MCQL estimates of β0 and β are identical to the LAD estimates.
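The equivalence between Gaussian MCQL and OLS can be verified numerically; a sketch on simulated data (`neg_gaussian_qll` is an illustrative name, and σ is parameterized through its log to keep it positive):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)
n = 2000
Z = rng.normal(size=n)
Y = 1.0 + 2.0 * Z + rng.normal(size=n)

def neg_gaussian_qll(params):
    # minus En[ ln (1/sigma) * phi((Y - b0 - Z*b)/sigma) ]
    b0, b, log_sigma = params
    sigma = np.exp(log_sigma)
    u = Y - b0 - Z * b
    return np.mean(0.5 * np.log(2 * np.pi) + log_sigma + u**2 / (2 * sigma**2))

res = minimize(neg_gaussian_qll, x0=[0.0, 0.0, 0.0], method="Nelder-Mead")

# OLS via sample moments, for comparison
b_ols = np.cov(Z, Y, bias=True)[0, 1] / np.var(Z)
b0_ols = Y.mean() - b_ols * Z.mean()
print(res.x[:2], (b0_ols, b_ols))  # the coefficient estimates coincide
```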

## Semiparametric Efficiency

One way of justifying the use of the analog principle is that, if the only information about the DGP one has is that

E[q(W, θ)] = 0,

then, under some conditions, the estimator θn defined by the equations

En[q(W, θn)] = 0

is, asymptotically, the most efficient (it achieves a semiparametric efficiency bound). (If very interested, see chapters 8 and 9 in Manski's book Analog Estimation Methods in Econometrics.)

## Frisch-Waugh-Lovell Theorem

Consider the correlation model:

Y = β0 + Z1′β1 + Z2′β2 + U, with E(U) = 0 and E(U Z1) = E(U Z2) = 0.

It is easy to prove from assignment 1 that:

β1 = V(Z1 − L(Z1 | 1, Z2))⁻¹ C(Y − L(Y | 1, Z2), Z1 − L(Z1 | 1, Z2)).

Proof: Since L is a linear operator,

L(Y | 1, Z2) = β0 + L(Z1 | 1, Z2)′β1 + Z2′β2,

and

Y − L(Y | 1, Z2) = [Z1 − L(Z1 | 1, Z2)]′β1 + U.

The Frisch-Waugh-Lovell Theorem says that the OLS estimate of β1 can be written as:

β1n = Vn(Z1 − Ln(Z1 | 1, Z2))⁻¹ Cn(Y − Ln(Y | 1, Z2), Z1 − Ln(Z1 | 1, Z2)).

That is, β1n is the OLS estimator in the artificial linear model (partitioned model).
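The theorem is easy to verify numerically: the coefficient on Z1 from the full regression equals the residual-on-residual coefficient. A sketch on a hypothetical DGP:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 1000
Z1 = rng.normal(size=n)
Z2 = rng.normal(size=n)
Y = 1.0 + 2.0 * Z1 - 1.0 * Z2 + rng.normal(size=n)

def ols(X, y):
    # OLS coefficients with an intercept column prepended
    X1 = np.column_stack([np.ones(len(y)), X])
    return np.linalg.lstsq(X1, y, rcond=None)[0]

# Full regression: coefficient on Z1
b_full = ols(np.column_stack([Z1, Z2]), Y)[1]

# FWL: residualize Y and Z1 on (1, Z2), then regress residual on residual
ry = Y - np.column_stack([np.ones(n), Z2]) @ ols(Z2, Y)
rz = Z1 - np.column_stack([np.ones(n), Z2]) @ ols(Z2, Z1)
b_fwl = (rz @ ry) / (rz @ rz)

print(np.isclose(b_full, b_fwl))  # True
```

No intercept is needed in the final residual-on-residual step because both residual series have sample mean zero.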

We call Yni the predicted value of Yi:

Yni = Ln(Y | 1, Z = Zi) = β0n + Zi′βn,

and Uni the corresponding OLS residual:

Uni = Yi − Ln(Y | 1, Z = Zi) = Yi − Yni.

The partitioned linear model can be written in terms of residuals as:

Uni(Y) = Uni(Z1)′ β1 + error_i,

with

Uni(Y) = Yi − Ln(Y | 1, Z2 = Z2i) and Uni(Z1) = Z1i − Ln(Z1 | 1, Z2 = Z2i).

## Analysis of Variance

In assignment 1 we proved that:

V(Y) = V(L(Y | 1, Z)) + V(Y − L(Y | 1, Z)).

The empirical analog of this relation is:

Vn(Y) = Vn(Ln(Y | 1, Z)) + Vn(Y − Ln(Y | 1, Z)),

and the sample coefficient of determination is:

R²n = Vn(Ln(Y | 1, Z)) / Vn(Y) = 1 − Vn(Y − Ln(Y | 1, Z)) / Vn(Y).

Remark:

En(Ln(Y | 1, Z)) = En(Y) = Ȳn,

and we can write:

R²n = En[(Ln(Y | 1, Z) − Ȳn)²] / Vn(Y) = 1 − En[(Y − Ln(Y | 1, Z))²] / Vn(Y);

remember that σ²n = En[(Y − Ln(Y | 1, Z))²] is the Gaussian MCQL estimate of the error variance.
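Both the empirical variance decomposition and the equality of the two R²n expressions hold exactly (up to floating-point error) for any OLS fit with an intercept; a sketch on simulated data:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 1000
Z = rng.normal(size=n)
Y = 1.0 + 2.0 * Z + rng.normal(size=n)

# OLS fit via sample moments
b = np.cov(Z, Y, bias=True)[0, 1] / np.var(Z)
b0 = Y.mean() - b * Z.mean()
fitted = b0 + b * Z
resid = Y - fitted

# Variance decomposition: Vn(Y) = Vn(fitted) + Vn(resid)
print(np.isclose(np.var(Y), np.var(fitted) + np.var(resid)))  # True

# The two expressions for the sample R^2 coincide
R2a = np.var(fitted) / np.var(Y)
R2b = 1 - np.var(resid) / np.var(Y)
print(np.isclose(R2a, R2b))  # True
```

The decomposition relies on Cn(fitted, resid) = 0, which is exactly the OLS first-order condition.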
