An Approach to Macroeconomic Quasi-Experimentation

Academic year: 2018

(1)

VI. Regression Discontinuity

An Approach to Macroeconomic Quasi-Experimentation

(2)

Universe of Counterfactual Outcome

Probability Space $= (\Omega, \omega, P)$: Sample Space $\Omega$, Event $\omega$, Probability Measure $P$

Objective: To construct $P$ for a given experiment

$$P : \omega \to [0, 1]$$

But... How should we interpret regression estimates?

$$y = \alpha_1 + \beta_1 X_1 + \beta_2 X_2 + \varepsilon$$

$\beta_1 = (X'$ …

(3)

Treatment and Control group Comparability

Ideal World

Conduct Experiments

Causal inference: Random assignment of treatment and control

Interpretation not to be heavily "model dependent"

Real World

Self-selection, measurement error, omitted variable bias, simultaneity

No counterfactuals

Econometrics: "Statistics with bad data"

Selection on Observables

–Mostly assume CIA–

Selection on Unobservables

-Often as hard to justify-

Further Readings:

-Angrist and Pischke (2008) "Mostly Harmless Econometrics"

(4)

Identification Strategy

How can I approximate observational data to an experiment in the absence of random assignment?

Non-Conventional Approach to Macroeconomics

Propensity Score Matching

Diff-in-Diff

Regression Discontinuity Design

Event Study

Conventional Approach to Macroeconomics

VARs (Reduced, structural, factor-augmented, etc.)

Principal components

Hazard Functions

ARs (ARIMA, ARCH models)

Cluster Analysis

(5)

Unidentified Questions

Do children do better at school if they start at 6 or 7?

Counterfactuals: Test score given that I started school at age 6 had I started at age 7

Test score given that I started school at age 7 had I started at age 6

Two groups: one starts at 6, the other at 7; compare test scores in first grade

Bias: One group is older when taking the test

Group that starts at 6 takes the test in 2nd grade, group that starts at 7 takes the test in 1st grade

(6)

Impact on income if parents have 1 vs 2 children?

Use of twins: more comparable with parents that have 1 child

Do hospitals make people healthier or sicker?

National Health Interview Survey 2005

Group         Sample Size   Mean Health Status   St. Error
Hospital      7,774         2.21                 0.014
No Hospital   90,049        2.93                 0.003

(7)

Tennessee STAR Experiment 1985-1989

$12 million USD

11,600 children in kindergarten, 3 treatments:

Small classes 13-17

Regular classes 22-25 with part-time teacher’s aide

(8)

Does smoking lead to lower infant birth weight?

Treatment status: mother’s smoking status

Outcome: Birth weight of infants

Problems: age is related to both treatment status and outcome

older mothers have heavier infants

(9)

Counterfactuals of interest

Outcome (infants’ birth weight) of mothers who smoked if they had chosen not to smoke

(10)

Hospital Example:

Group         Sample Size   Mean Health Status   St. Error
Hospital      7,774         2.21                 0.014
No Hospital   90,049        2.93                 0.003

Trivial example... but resembles other non-trivial questions

What is the variable of interest?

$$[y_{1i} \mid D_i = 1] - [y_{0i} \mid D_i = 1]$$

Problem: only observe one outcome per person

$$y_i = D_i\, y_{1i} + (1 - D_i)\, y_{0i}$$

$y_{1i} \mid D_i = 1$: Observed
$y_{1i} \mid D_i = 0$: Hypothetical
$y_{0i} \mid D_i = 0$: Observed
$y_{0i} \mid D_i = 1$: Hypothetical

Where $y_i$ = Health Status and $D_i = 1$: hospital

(11)

Group         Mean Health Status
Hospital      2.21
No Hospital   2.93

What I observe:

$$E[y_{1i} \mid D_i = 1] - E[y_{0i} \mid D_i = 0] = (2.21 - 2.93)$$

$$(2.21 - 2.93) = \underbrace{E[y_{1i} \mid D_i = 1] - E[y_{0i} \mid D_i = 1]}_{E[y_{1i} - y_{0i} \mid D_i = 1]} + \underbrace{E[y_{0i} \mid D_i = 1] - E[y_{0i} \mid D_i = 0]}_{\text{Bias} < 0}$$

Potential Outcomes framework -vs- Regression framework

$$y_i = D_i\, y_{1i} + (1 - D_i)\, y_{0i} \quad \text{-vs-} \quad y_i = \alpha + \rho D_i + \eta_i$$

$$y_i = y_{0i} + (y_{1i} - y_{0i})\, D_i$$

(12)

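$$y_i = \underbrace{\alpha}_{E[y_{0i}]} + \rho D_i + \underbrace{\eta_i}_{y_{0i} - E[y_{0i}]}$$

$$E[y_i \mid D_i = 1] = \alpha + \rho + E[\eta_i \mid D_i = 1]$$

$$E[y_i \mid D_i = 0] = \alpha + E[\eta_i \mid D_i = 0]$$

$$E[y_i \mid D_i = 1] - E[y_i \mid D_i = 0] = \rho + \underbrace{E[\eta_i \mid D_i = 1] - E[\eta_i \mid D_i = 0]}_{E[y_{0i} \mid D_i = 1] - E[y_{0i} \mid D_i = 0]}$$

BIAS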
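The decomposition of the observed difference into an effect on the treated plus a bias term can be checked numerically. The sketch below uses made-up data, not the National Health Interview Survey figures: a hypothetical unobserved `frailty` variable drives both hospital visits and health, health is coded so that higher is better, and the true effect of hospital is fixed at 0.5 for everyone. All names and numbers are assumptions for illustration only.

```python
import random

random.seed(0)  # fixed seed so the illustration is reproducible


def mean(values):
    return sum(values) / len(values)


def simulate(n=20000, effect=0.5):
    """Return (d, y0, y1) triples from a hypothetical selection process."""
    rows = []
    for _ in range(n):
        frailty = random.gauss(0.0, 1.0)   # unobserved; lowers health
        y0 = 3.0 - frailty                 # potential outcome without hospital
        y1 = y0 + effect                   # potential outcome with hospital
        # Frail people are more likely to end up in hospital (self-selection).
        d = 1 if frailty + random.gauss(0.0, 1.0) > 1.0 else 0
        rows.append((d, y0, y1))
    return rows


rows = simulate()
treated = [r for r in rows if r[0] == 1]
control = [r for r in rows if r[0] == 0]

# What we observe: E[y1 | D=1] - E[y0 | D=0]
observed_diff = (mean([y1 for _, _, y1 in treated])
                 - mean([y0 for _, y0, _ in control]))
# Effect on the treated: E[y1 - y0 | D=1] (uses the hypothetical y0 of the treated)
att = mean([y1 - y0 for _, y0, y1 in treated])
# Selection bias: E[y0 | D=1] - E[y0 | D=0]
bias = (mean([y0 for _, y0, _ in treated])
        - mean([y0 for _, y0, _ in control]))

print(f"observed = {observed_diff:.3f}, ATT = {att:.3f}, bias = {bias:.3f}")
```

In this simulated setting the observed difference is negative even though the treatment helps, because the negative bias term outweighs the positive effect on the treated.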

(13)

$$E[y_i \mid D_i = 1] - E[y_i \mid D_i = 0] = \rho + \underbrace{E[\eta_i \mid D_i = 1] - E[\eta_i \mid D_i = 0]}_{E[y_{0i} \mid D_i = 1] - E[y_{0i} \mid D_i = 0]}$$

BIAS

Solution:

-Controlled Experiment

-Bouncer in front of Hospital

Optimistic view

Conditional Independence Assumption (CIA)

(14)

$$E[y_i \mid X_i, D_i = 1] - E[y_i \mid X_i, D_i = 0] = (\text{what I want}) + \underbrace{E[y_{0i} \mid X_i, D_i = 1] - E[y_{0i} \mid X_i, D_i = 0]}_{0}$$

Non-Binary Treatment:

# years of school $S_i$

What individual "$i$" would earn for any value of $s$:

$$Y_{si} \equiv f(s)$$

CIA:

$$y_{si} \perp S_i \mid X_i \quad \forall s$$

(15)

Example 2: Going to College

$y_{1i}$ - earnings had "$i$" gone to college
$y_{0i}$ - earnings had "$i$" not gone to college

$C_i = 1$: go to college
$C_i = 0$: don’t go to college

I observe:

$$E[y_i \mid C_i = 1] - E[y_i \mid C_i = 0] = E[y_{1i} - y_{0i} \mid C_i = 1] + \underbrace{E[y_{0i} \mid C_i = 1] - E[y_{0i} \mid C_i = 0]}$$

(16)

Bad Control Problem

College -vs- no college / Blue -vs- white collar

$$y_i = C_i\, y_{1i} + (1 - C_i)\, y_{0i}$$

$w_i = 1$ - white collar

$$w_i = C_i\, w_{1i} + (1 - C_i)\, w_{0i}$$

$w_{1i}$ - white collar & $C = 1$
$w_{0i}$ - white collar & $C = 0$

CIA:

$$E[y_i \mid C_i = 1] - E[y_i \mid C_i = 0] = E[y_{1i} - y_{0i}]$$

$$E[w_i \mid C_i = 1] - E[w_i \mid C_i = 0] = E[w_{1i} - w_{0i}]$$

(17)

Bad Control Problem cont.

Difference in $y_i$ between college graduates and others (without college), given that they are white collar, i.e. want: $y_{1i} - y_{0i} \mid w_{1i}$

$$E[y_i \mid w_i = 1, C_i = 1] - E[y_i \mid w_i = 1, C_i = 0]$$

$$= E[y_{1i} \mid w_{1i} = 1, C_i = 1] - E[y_{0i} \mid w_{0i} = 1, C_i = 0]$$

$$= E[y_{1i} \mid w_{1i} = 1] - E[y_{0i} \mid w_{0i} = 1] \quad (\text{+ CIA})$$

$$= \underbrace{E[y_{1i} \mid w_{1i} = 1] - E[y_{0i} \mid w_{1i} = 1]}_{E[y_{1i} - y_{0i} \mid w_{1i}]} + \underbrace{E[y_{0i} \mid w_{1i} = 1] - E[y_{0i} \mid w_{0i} = 1]}_{\text{BIAS}}$$

First term: causal effect of college on those who work in a white-collar job when they have a college degree.

BIAS: $E[y_{0i}]$ of any college student who gets a white-collar job, minus $E[y_{0i}]$ of someone who gets a white-collar job without college (e.g. Bill Gates).
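A small simulation may help make the bad control problem concrete. This is only a sketch under assumed numbers: college is assigned at random (so the CIA holds), the true effect of college on earnings is 1 for everyone, and a hypothetical `ability` variable determines who can get a white-collar job with and without college. None of these values come from the slides.

```python
import random

random.seed(1)


def mean(values):
    return sum(values) / len(values)


people = []
for _ in range(40000):
    ability = random.gauss(0.0, 1.0)
    college = random.random() < 0.5     # randomly assigned, so CIA holds
    y0 = 10.0 + 2.0 * ability           # earnings without college
    y1 = y0 + 1.0                       # true college effect is 1 for everyone
    w0 = ability > 1.0                  # white collar without college: high ability only
    w1 = ability > -1.0                 # white collar with college: much easier
    y = y1 if college else y0           # observed earnings
    w = w1 if college else w0           # observed occupation, itself an outcome of college
    people.append((college, w, y))

# Unconditional comparison: recovers the causal effect under random assignment
raw_diff = (mean([y for c, w, y in people if c])
            - mean([y for c, w, y in people if not c]))

# Comparison within white-collar workers: conditions on a bad control
white_collar_diff = (mean([y for c, w, y in people if c and w])
                     - mean([y for c, w, y in people if not c and w]))

print(f"raw = {raw_diff:.2f}, within white collar = {white_collar_diff:.2f}")
```

Here the non-college white-collar group consists only of very high-ability people, so the within-occupation comparison is pushed below the true effect even though college was randomly assigned.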

(18)

Matching & Propensity Score Functions

Estimates the effects of a treatment by accounting for covariates that predict receiving treatment

-Rosenbaum and Rubin (1983)

Easier with categorical variables (Matching)

Harder with continuous variables (Propensity Score Matching)

(19)

Example: Training Program

Variables (All binary except income):

Treatment

Black

Hispanic

Married

Degree

Income

$$\sum_s \varpi_s\, (\bar{y}_{1s} - \bar{y}_{0s})$$

where $\varpi_s =$
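With only binary covariates, the matching estimator is a weighted average of within-cell differences in mean income. The slide's definition of the cell weights is cut off, so the sketch below assumes one common choice: each cell is weighted by its share of treated units, which targets the effect on the treated. The tiny data set and the function name `exact_matching_att` are hypothetical.

```python
from collections import defaultdict


def exact_matching_att(rows):
    """rows: (treatment, black, hispanic, married, degree, income) tuples.

    Assumption: cell weights are the share of treated units in each cell.
    Cells lacking either treated or control units are dropped.
    """
    cells = defaultdict(lambda: {1: [], 0: []})
    for treat, black, hispanic, married, degree, income in rows:
        cells[(black, hispanic, married, degree)][treat].append(income)

    usable = [g for g in cells.values() if g[1] and g[0]]
    n_treated = sum(len(g[1]) for g in usable)

    att = 0.0
    for g in usable:
        weight = len(g[1]) / n_treated
        diff = sum(g[1]) / len(g[1]) - sum(g[0]) / len(g[0])
        att += weight * diff
    return att


rows = [
    # cell (1, 0, 0, 0): two treated, three controls
    (1, 1, 0, 0, 0, 10.0), (1, 1, 0, 0, 0, 12.0),
    (0, 1, 0, 0, 0, 8.0), (0, 1, 0, 0, 0, 9.0), (0, 1, 0, 0, 0, 10.0),
    # cell (0, 0, 1, 1): one treated, two controls
    (1, 0, 0, 1, 1, 20.0),
    (0, 0, 0, 1, 1, 15.0), (0, 0, 0, 1, 1, 17.0),
    # cell (0, 1, 0, 0): controls only, so it cannot be matched
    (0, 0, 1, 0, 0, 5.0),
]
att = exact_matching_att(rows)
print(round(att, 4))
```

The first cell contributes a difference of 2 with weight 2/3 and the second a difference of 4 with weight 1/3; the control-only cell is ignored.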

(20)

Key Assumption

Propensity Score Theorem:

Corollary of CIA

$$\text{(CIA)}\quad y_{0i},\, y_{1i} \perp D_i \mid X_i \;\longrightarrow\; \text{(PST)}\quad y_{0i},\, y_{1i} \perp D_i \mid P(X_i)$$

4 Steps to Propensity Score Matching:

1. Estimate Propensity Score
2. Matching
3. Stratification

(21)

Propensity score matching methodology

Propensity Score: conditional probability of receiving treatment given $X_i$

Through the use of a logistic model or through generalized boosted modeling

$$P(X_i) = \Pr(D_i = 1 \mid X_i) \quad \text{for each } i \text{ within the sample}$$

$$\Pr(D_i = 1 \mid X_i) = \Phi(X_i' \delta)$$
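The first step, estimating the propensity score, can be sketched as follows. The example assumes a logistic link, as named in the text; $\Phi$ conventionally denotes the normal CDF, which would give a probit instead. It fits the coefficients by plain gradient ascent on simulated data. The function `fit_logit`, the step size, and the data-generating values are all illustrative assumptions, not the author's procedure.

```python
import math
import random

random.seed(2)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logit(X, d, steps=1500, lr=0.5):
    """Maximize the average logit log-likelihood by gradient ascent."""
    n, k = len(X), len(X[0])
    delta = [0.0] * k
    for _ in range(steps):
        grad = [0.0] * k
        for x_i, d_i in zip(X, d):
            p_i = sigmoid(sum(a * b for a, b in zip(x_i, delta)))
            for j in range(k):
                grad[j] += (d_i - p_i) * x_i[j]
        delta = [dj + lr * gj / n for dj, gj in zip(delta, grad)]
    return delta


# Hypothetical sample: intercept plus one covariate, true delta = (-0.5, 1.0)
X = [[1.0, random.gauss(0.0, 1.0)] for _ in range(400)]
d = [1 if random.random() < sigmoid(-0.5 + x[1]) else 0 for x in X]

delta_hat = fit_logit(X, d)
# Estimated propensity score P(X_i) for each i within the sample
pscore = [sigmoid(sum(a * b for a, b in zip(x, delta_hat))) for x in X]
```

Each entry of `pscore` is the fitted probability of treatment for one unit; these are the values that the matching step then compares between treated and untreated units.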

(22)

Propensity score matching methodology

Matching: Find untreated individuals whose propensity scores are similar to those of treated individuals

Stratification:

(23)

Matching methods explained

Propensity scores for treated and control groups

Matching methods: for each treated observation i, we need to find matching control observation(s) j with similar characteristics

Matching with or without replacement

Matching without replacement: each control observation is used no more than one time as a match for a treated observation.

(24)
(25)

Nearest neighbor matching

For each treated observation $i$, select a control observation $j$ that has the closest $x$.

$$\min \lVert p_i - p_j \rVert$$

Radius matching

Each treated observation i is matched with the control observations j that fall within a specified radius.

$$\lVert p_i - p_j \rVert < r$$

Kernel matching

Each treated observation i is matched with several control observations, with weights inversely proportional to the distance between treated and control observations.

With matching based on propensity scores, the weights are defined as:

$$w(i, j) = \frac{K\!\left(\frac{p_j - p_i}{h}\right)}{\sum_{j=1}^{n_0} K\!\left(\frac{p_j - p_i}{h}\right)}$$

Here h is the bandwidth parameter.
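The three matching rules above can be sketched for a single treated observation with propensity score `p_i`. The Gaussian kernel is an assumed choice of $K$ (the slides do not specify one), and the scores in the example are invented.

```python
import math


def nearest_neighbor(p_i, control_scores):
    """Index j of the control minimizing |p_i - p_j|."""
    return min(range(len(control_scores)),
               key=lambda j: abs(control_scores[j] - p_i))


def radius_matches(p_i, control_scores, r):
    """Indices of all controls with |p_i - p_j| < r."""
    return [j for j, p_j in enumerate(control_scores) if abs(p_j - p_i) < r]


def gaussian_kernel(u):
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def kernel_weights(p_i, control_scores, h):
    """w(i, j) = K((p_j - p_i) / h) / sum_j K((p_j - p_i) / h)."""
    raw = [gaussian_kernel((p_j - p_i) / h) for p_j in control_scores]
    total = sum(raw)
    return [value / total for value in raw]


control_scores = [0.10, 0.30, 0.35, 0.80]
p_i = 0.32

nn = nearest_neighbor(p_i, control_scores)
within = radius_matches(p_i, control_scores, r=0.05)
weights = kernel_weights(p_i, control_scores, h=0.1)
```

A smaller bandwidth `h` concentrates the kernel weights on the closest controls, so kernel matching approaches nearest neighbor matching as `h` shrinks.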

Stratification or interval matching

(26)

To Recap

Average treatment effect on the treated (ATET)

ATET is the difference between the outcomes of the treated observations and the outcomes they would have had if they had not been treated.

$$ATET = E(\Delta \mid D = 1) = E(y_1 \mid x, D = 1) - E(y_0 \mid x, D = 1)$$

The second term is a counterfactual, so it is not observable and needs to be estimated.

Propensity score method

After matching on propensity scores we can compare the outcomes of treated and control observations

$$ATET = E(\Delta \mid p(x), D = 1) = E(y_1 \mid p(x), D = 1) - E(y_0 \mid p(x), D = 0)$$

Empirical estimation

Each treated observation $i$ is matched to $j$ control observations, and their outcomes $y_0$ are weighted by $w$.

$$ATET = \frac{1}{n_1} \sum_{i \in (D = 1)} \Big[\, y_{1,i} - \sum_j \ldots$$
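A minimal sketch of the empirical estimator follows. The formula is truncated in the source; the code assumes the bracketed term is the treated outcome minus the $w$-weighted sum of matched control outcomes, which is what the preceding sentence describes. Nearest neighbor matching with replacement is used as the default weighting rule, and the data are invented.

```python
def nearest_neighbor_weights(p_i, control_scores):
    """Weight 1 on the closest control, 0 elsewhere (matching with replacement)."""
    best = min(range(len(control_scores)),
               key=lambda j: abs(control_scores[j] - p_i))
    return [1.0 if j == best else 0.0 for j in range(len(control_scores))]


def atet(treated, controls, weight_fn=nearest_neighbor_weights):
    """treated, controls: lists of (propensity_score, outcome) pairs.

    Assumed estimator: average over treated units of
    y_1i - sum_j w(i, j) * y_0j.
    """
    control_scores = [p for p, _ in controls]
    control_outcomes = [y for _, y in controls]
    total = 0.0
    for p_i, y1_i in treated:
        w = weight_fn(p_i, control_scores)
        counterfactual = sum(w_j * y0_j for w_j, y0_j in zip(w, control_outcomes))
        total += y1_i - counterfactual
    return total / len(treated)


treated = [(0.80, 10.0), (0.60, 8.0)]
controls = [(0.75, 7.0), (0.55, 6.0), (0.20, 3.0)]
estimate = atet(treated, controls)
print(estimate)
```

Any other rule that returns one weight per control, such as kernel weights, can be passed as `weight_fn` without changing the estimator.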

(27)

Diff-in-Diff

Scale problem: Non-linearities in Outcome. What if control group had been higher?

(28)

Good

Better

(29)

Key Assumption

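The trend in the control group approximates what would have occurred in the treatment group in the absence of treatment

"weaker version of CIA"

$$DD = E(\Delta_{treated} - \Delta_{control} \mid D = 1) = E[(y_{1,t+1} - y_{1,t}) - (y_{0,t+1} - y_{0,t}) \mid x, D = 1]$$

Regression Framework:

$$y_i = \beta_0 + \beta_1 D^{Post} + \beta_2 D^{Treat} + \beta_3 D^{Post} D^{Treat} + (\beta_4 X_i) + \varepsilon_i$$

$$E[y \mid X_i, D^{Post} = 1, D^{Treat} = 1] = \beta_0 + \beta_1 + \beta_2 + \beta_3$$

$$E[y \mid X_i, D^{Post} = 0, D^{Treat} = 1] = \beta_0 + \beta_2$$

$$E[y \mid X_i, D^{Post} = 1, D^{Treat} = 0] = \beta_0 + \beta_1$$

$$E[y \mid X_i, D^{Post} = 0, D^{Treat} = 0] = \beta_0$$

Post-minus-pre difference for the treated group: $\beta_1 + \beta_3$; for the control group: $\beta_1$. The difference between the two is $\beta_3$.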
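The mapping between the four conditional means and the regression coefficients can be illustrated with a few lines of code. The sketch assumes no covariates $X_i$; in that saturated case the least-squares coefficients coincide with the differences in cell means computed here. The eight observations are invented.

```python
def mean(values):
    return sum(values) / len(values)


def did_coefficients(rows):
    """rows: (post, treat, y) triples with post and treat in {0, 1}.

    Returns (beta0, beta1, beta2, beta3) of the saturated regression
    y = b0 + b1*Post + b2*Treat + b3*Post*Treat, computed from cell means.
    """
    cell = {
        (p, t): mean([y for post, treat, y in rows if (post, treat) == (p, t)])
        for p in (0, 1) for t in (0, 1)
    }
    beta0 = cell[(0, 0)]
    beta1 = cell[(1, 0)] - cell[(0, 0)]    # time trend in the control group
    beta2 = cell[(0, 1)] - cell[(0, 0)]    # pre-period gap between groups
    beta3 = ((cell[(1, 1)] - cell[(0, 1)])
             - (cell[(1, 0)] - cell[(0, 0)]))  # difference-in-differences
    return beta0, beta1, beta2, beta3


rows = [
    (0, 0, 10.0), (0, 0, 12.0),   # control, pre
    (1, 0, 13.0), (1, 0, 15.0),   # control, post
    (0, 1, 20.0), (0, 1, 22.0),   # treated, pre
    (1, 1, 28.0), (1, 1, 30.0),   # treated, post
]
beta0, beta1, beta2, beta3 = did_coefficients(rows)
print(beta0, beta1, beta2, beta3)
```

The treated group rises by 8 and the control group by 3, so the difference-in-differences estimate is 5, and the four coefficients add up to the treated post-period mean of 29.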

(30)

Event Study

Event Date

(31)

Intro

Hahn et al. (2001): “RDD require seemingly mild assumptions compared to those needed for other non-experimental approaches"

Lee (2008): “one need not assume the RDD isolates treatment variation that is ‘as good as randomized’; instead, such randomized variation is a consequence of agents’ inability to precisely control the assignment variable near the known cutoff"

Precise sorting around the cutoff is a sign of self-selection

Non-Parametric (i.e. local linear regression, using only data close to the cutoff) and Parametric (i.e. a functional form like a low-order polynomial) estimation should be seen as complementary. In practice, they lead to the computation of the exact same statistic.

Disadvantages

Statistical power is lower than in randomized experiments of equal sample size (higher Type-II error)

(32)

Regression Discontinuity Design

Closest cousin of a randomized experiment

Deterministic rule that assigns treatment in a discontinuous fashion

$$D_i = \begin{cases} 1 & \text{if } x_i \geq x_0 \\ 0 & \text{if } x_i < x_0 \end{cases}$$

RD Scatterplot: Positive Treatment Effect

RD Scatterplot: No Treatment Effect

(33)

(34)

Randomized Experiments: Treatment and control groups are divided on the basis of a randomly generated number.

For example, let $\mu$ follow a uniform distribution with range [0,4]. Units with $\mu \geq 2$ receive treatment, units with $\mu < 2$ get placebo

Think of RDD where the assignment variable is $X = \mu$ and cutoff = 2

Only difference: $X$ is independent of $Y_i(1)$ and $Y_i(0)$

so, $E[Y_i(1) \mid X = c]$, $E[Y_i(0) \mid X = c]$

(35)

Examples

PSAT/NMSQT: Top 16,000 test-takers get a scholarship

A small difference in test score can mean a discontinuous jump in scholarship amount (Thistlethwaite & Campbell 1960)

School Class Size: Maimonides’ Rule - no more than 40 kids in a class in Israel

40 kids in school means 40 kids per class. 41 kids means two classes with 20 and 21. (Angrist & Lavy 1999)

Union Elections: If employees want to unionize, NLRB holds election

50%: the employer doesn’t recognize the union, and 50% + 1 means the employer is required to "bargain in good faith" (DiNardo & Lee 2004)

Air Pollution and Home Values: Clean Air Act’s National Ambient Air Quality Standards

(36)

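Thistlethwaite & Campbell 1960

A small difference in test score → discontinuous jump in scholarship amount

$$Y_i = \alpha + \tau D_i + X_i' \beta + \varepsilon_i$$

$$B' - A'' = \lim_{\varepsilon \downarrow 0} E[Y_i \mid X_i = c + \varepsilon] - \lim_{\varepsilon \uparrow 0} \ldots$$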
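The jump at the cutoff can be estimated by fitting a separate line on each side and comparing the two fitted values at the cutoff. The sketch below is one simple version of local linear regression with a rectangular window; the bandwidth, the simulated data, and the true jump of 3 are assumptions chosen for illustration.

```python
import random

random.seed(3)


def ols_line(xs, ys):
    """Intercept and slope of a simple least-squares line."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def sharp_rd(x, y, cutoff, bandwidth):
    """Right-limit minus left-limit of E[Y | X] at the cutoff.

    X is centered at the cutoff, so each intercept is the fitted value there.
    """
    left = [(xi - cutoff, yi) for xi, yi in zip(x, y)
            if cutoff - bandwidth <= xi < cutoff]
    right = [(xi - cutoff, yi) for xi, yi in zip(x, y)
             if cutoff <= xi <= cutoff + bandwidth]
    a_left, _ = ols_line(*zip(*left))
    a_right, _ = ols_line(*zip(*right))
    return a_right - a_left


# Hypothetical data: Y = 1 + 2X + tau*D + noise, with tau = 3 and cutoff c = 0
x = [random.uniform(-1.0, 1.0) for _ in range(4000)]
y = [1.0 + 2.0 * xi + 3.0 * (xi >= 0.0) + random.gauss(0.0, 0.5) for xi in x]

tau_hat = sharp_rd(x, y, cutoff=0.0, bandwidth=0.5)
print(f"estimated jump: {tau_hat:.2f}")
```

Narrowing `bandwidth` relies less on the functional form away from the cutoff but leaves fewer observations on each side, so the estimate becomes noisier.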

(37)

Nonlinear RD

(38)

Issues with Causal Inference

Causal inference is possible because of the continuity of the underlying functions $E[Y_1 \mid X = c]$ and $E[Y_0 \mid X = c]$

Can use average outcome right below cutoff (denied treatment) as counterfactual for those right above cutoff (treated)

Limitation: data closer than c′ and c″ yield no observations. RDD is fundamentally an extrapolation-based approach

Since data is required away from the cutoff, estimates will depend on the chosen functional form (i.e. suppose $\beta = 0$)

(39)

Assumptions

CIA is trivially met: conditional on $x$, the treatment dummy is a constant (recall CIA: $y_{0i},\, y_{1i} \perp D_i \mid X_i$)

But the overlap assumption is violated: it is not possible to observe units with both $D = 0$ and $D = 1$ for a given value of $X$

(40)

RDD Testable Implications

McCrary’s Test

Tests whether the density of the running variable exhibits a discontinuity at the cutoff. The test estimates the density separately on either side of the cutoff and provides a Wald estimate (H0: no discontinuity)

Locally Balanced Covariates

Values of baseline covariates should not differ for observations above and below the cutoff

Treatment should be unrelated to past values of outcome variables
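The idea behind the density check can be illustrated with a deliberately crude stand-in: count observations in equal-width windows just below and just above the cutoff and compare the split with 50/50. This is not McCrary's estimator, which fits local linear regressions to a binned histogram; the function name, window width, and integer test scores are hypothetical.

```python
import math


def density_jump_z(running, cutoff, window):
    """z-statistic for 'too many observations just above the cutoff'.

    Under no sorting, an observation inside the window is roughly equally
    likely to fall on either side, so `above` is approximately
    Binomial(n, 1/2).
    """
    below = sum(1 for v in running if cutoff - window <= v < cutoff)
    above = sum(1 for v in running if cutoff <= v < cutoff + window)
    n = below + above
    if n == 0:
        raise ValueError("no observations inside the window")
    return (above - n / 2) / math.sqrt(n / 4)


scores = list(range(1000))                 # evenly spread, no sorting
z_clean = density_jump_z(scores, cutoff=500, window=100)

# Units just below the cutoff nudge themselves over it
manipulated = [v + 20 if 480 <= v < 500 else v for v in scores]
z_manipulated = density_jump_z(manipulated, cutoff=500, window=100)

print(z_clean, round(z_manipulated, 3))
```

A large positive statistic suggests bunching just above the cutoff, which is the pattern expected when agents can precisely manipulate the assignment variable.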

(41)

Example of Treatment Manipulation

D. Density of Income

With Pre-Announcement and Manipulation

(42)

RDD: Valid or Invalid?

Example

Suppose there are 2 types of students: A and B

A are aware of benefits when exceeding 50% threshold on test

B are ignorant and less able

Suppose 50% of questions are trivial (but students make careless errors which can be avoided by double-checking)

Only A would double-check in order to guarantee benefit

Results: Those who barely pass: combination of A and B. Those who barely failed: only type B. Not a valid counterfactual!

If questions are not trivial (no guarantee of benefit regardless of

(43)

Challenges

The existence of a treatment being a discontinuous function of an assignment variable is not sufficient to justify the validity of an RD design

(44)

Literature

Angrist and Pischke (2008): "Mostly Harmless Econometrics"

Lee and Lemieux (2013): "Regression Discontinuity Designs in Social Sciences"

Lee and Lemieux (2010): "Regression Discontinuity Designs in Economics"
