VI. Regression Discontinuity
An Approach to Macroeconomic
Quasi-Experimentation
Universe of Counterfactual Outcome
Probability Space = $(\Omega, \omega, P)$
$\Omega$: Sample Space, $\omega$: Event, $P$: Probability Measure
Objective: To construct $P$ for a given experiment
$P : \omega \to [0, 1]$
But... How should we interpret regression estimates?
$y = \alpha_1 + \beta_1 X_1 + \beta_2 X_2 + \varepsilon$
$\beta_1 = (X'$ …
Treatment and Control group Comparability
Ideal World
Conduct Experiments
Causal inference: Random assignment of treatment and control
Interpretation not to be heavily "model dependent"
Real World
Self-selection, measurement error, omitted variable bias, simultaneity
No counterfactuals
Econometrics: "Statistics with bad data"
Selection on Observables: mostly assume CIA
Selection on Unobservables: often as hard to justify
Further Readings:
-Angrist and Pischke (2008) "Mostly Harmless Econometrics"
Identification Strategy
How can I approximate observational data to an experiment in the
absence of random assignment?
Non-Conventional Approach to Macroeconomics
Propensity Score Matching
Diff-in-Diff
Regression Discontinuity Design
Event Study
Conventional Approach to Macroeconomics
VARs (Reduced, structural, factor-augmented, etc.)
Principal components
Hazard Functions
ARs (ARIMA, ARCH models)
Cluster Analysis
Unidentified Questions
Do children do better at school if they start at 6 or 7?
Counterfactuals:
Test score given that I started school at age 6, had I started at age 7
Test score given that I started school at age 7, had I started at age 6
Two groups: one starts at 6, the other at 7, compare test scores in first
grade
Bias: One group is older when taking the test
Group that starts at 6 takes the test in 2nd grade, group that starts at 7 takes the
test in 1st grade
Impact on income if parents have 1 vs 2 children?
Use of twins: more comparable with parents that have 1 child
Do hospitals make people healthier or sicker?
National Health Interview Survey 2005
Group         Sample Size   Mean Health Status   St. Error
Hospital      7,774         2.21                 0.014
No Hospital   90,049        2.93                 0.003
Tennessee STAR Experiment 1985-1989
$12 million USD
11,600 children in kindergarten, 3 treatments:
Small classes 13-17
Regular classes 22-25 with part-time teacher's aide
Does smoking lead to lower infant birth weight?
Treatment status: mother’s smoking status
Outcome: Birth weight of infants
Problems: age is related to both treatment status and outcome
older mothers have heavier infants
Counterfactuals of interest
Outcome (infants’ birth-weight) of mothers who smoked if they had chosen not to
smoke
Hospital Example:
Group         Sample Size   Mean Health Status   St. Error
Hospital      7,774         2.21                 0.014
No Hospital   90,049        2.93                 0.003
Trivial example... but resembles
other non-trivial questions
What is the variable of interest?
$[y_{1i} \mid D_i = 1] - [y_{0i} \mid D_i = 1]$
Problem: only observe one outcome per person
$y_i = D_i y_{1i} + (1 - D_i) y_{0i}$
$y_{1i} \mid D_i = 1$: Observed
$y_{1i} \mid D_i = 0$: Hypothetical
$y_{0i} \mid D_i = 0$: Observed
$y_{0i} \mid D_i = 1$: Hypothetical
Where
$y_i$ = Health Status
$D_i = 1$: hospital
Group Mean Health Status
Hospital 2.21
No Hospital 2.93
$$\underbrace{E[y_{1i} \mid D_i = 1] - E[y_{0i} \mid D_i = 0]}_{\text{What I observe: } (2.21 - 2.93)} = \underbrace{E[y_{1i} \mid D_i = 1] - E[y_{0i} \mid D_i = 1]}_{E[y_{1i} - y_{0i} \mid D_i = 1]} + \underbrace{E[y_{0i} \mid D_i = 1] - E[y_{0i} \mid D_i = 0]}_{\text{Bias} < 0}$$
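The decomposition of the observed difference into an effect on the treated plus a bias term can be checked numerically. The sketch below is only an illustration: the six people and their potential outcomes are made-up numbers (not the survey data), and the function name `decompose` is hypothetical. In real data only one potential outcome per person is observed, so the second and third quantities could not be computed directly.

```python
from statistics import mean

# Hypothetical potential outcomes (illustrative numbers only).
# Each tuple is (D, y0, y1): treatment status, outcome without
# treatment, outcome with treatment.
people = [
    (1, 2.0, 2.5),
    (1, 1.5, 2.0),
    (1, 2.5, 3.0),
    (0, 3.0, 3.5),
    (0, 3.5, 4.0),
    (0, 2.5, 3.0),
]

def decompose(rows):
    """Return (observed difference, effect on the treated, selection bias)."""
    treated = [r for r in rows if r[0] == 1]
    control = [r for r in rows if r[0] == 0]
    # E[y1 | D=1] - E[y0 | D=0]: the only contrast the data reveal
    observed = mean(y1 for _, _, y1 in treated) - mean(y0 for _, y0, _ in control)
    # E[y1 - y0 | D=1]: average effect on the treated
    att = mean(y1 - y0 for _, y0, y1 in treated)
    # E[y0 | D=1] - E[y0 | D=0]: selection bias
    bias = mean(y0 for _, y0, _ in treated) - mean(y0 for _, y0, _ in control)
    return observed, att, bias

observed, att, bias = decompose(people)
print("observed:", observed, "effect on treated:", att, "bias:", bias)
```

In this toy example the treatment helps everyone, yet the observed difference is negative because the treated group starts from a lower untreated outcome.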
Potential Outcomes framework
Regression framework
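$y_i = D_i y_{1i} + (1 - D_i) y_{0i}$ -vs- $y_i = \alpha + \rho D_i + \eta_i$
$y_i = y_{0i} + (y_{1i} - y_{0i}) D_i$
$y_i = \alpha + \rho D_i + \eta_i$, with $\alpha = E[y_{0i}]$, $\rho = y_{1i} - y_{0i}$, $\eta_i = y_{0i} - E[y_{0i}]$
$E[y_i \mid D_i = 1] = \alpha + \rho + E[\eta_i \mid D_i = 1]$
$E[y_i \mid D_i = 0] = \alpha + E[\eta_i \mid D_i = 0]$
$$E[y_i \mid D_i = 1] - E[y_i \mid D_i = 0] = \rho + \underbrace{E[\eta_i \mid D_i = 1] - E[\eta_i \mid D_i = 0]}_{E[y_{0i} \mid D_i = 1] - E[y_{0i} \mid D_i = 0]\ \text{(BIAS)}}$$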
$$E[y_i \mid D_i = 1] - E[y_i \mid D_i = 0] = \rho + \underbrace{E[\eta_i \mid D_i = 1] - E[\eta_i \mid D_i = 0]}_{E[y_{0i} \mid D_i = 1] - E[y_{0i} \mid D_i = 0]\ \text{(BIAS)}}$$
Solution:
-Controlled Experiment
-Bouncer in front of Hospital
Optimistic view
Conditional Independence Assumption (CIA)
$$E[y_i \mid X_i, D_i = 1] - E[y_i \mid X_i, D_i = 0] = (\text{what I want}) + \underbrace{E[y_{0i} \mid X_i, D_i = 1] - E[y_{0i} \mid X_i, D_i = 0]}_{0}$$
Non-Binary Treatment:
# years of school $\Rightarrow S_i$
What individual "$i$" would earn for any value of $S$ $\Rightarrow Y_{si} \equiv f(s)$
CIA $\Rightarrow Y_{si} \perp S_i \mid X_i \ \ \forall s$
Example 2: Going to College
$y_{1i}$ - earnings had "i" gone to college
$y_{0i}$ - earnings had "i" not gone to college
$C_i = 1$: go to college
$C_i = 0$: don't go to college
I observe:
$$E[y_i \mid C_i = 1] - E[y_i \mid C_i = 0] = E[y_{1i} - y_{0i} \mid C_i = 1] + \underbrace{E[y_{0i} \mid C_i = 1] - E[y_{0i} \mid C_i = 0]}_{\text{Bias}}$$
Bad Control Problem
College -vs- no college / Blue -vs- white collar
$y_i = C_i y_{1i} + (1 - C_i) y_{0i}$
$w_i = 1$ - white collar
$w_i = C_i w_{1i} + (1 - C_i) w_{0i}$
$w_{1i}$ - white collar & $C_i = 1$
$w_{0i}$ - white collar & $C_i = 0$
CIA $\Rightarrow$
$E[y_i \mid C_i = 1] - E[y_i \mid C_i = 0] = E[y_{1i} - y_{0i}]$
$E[w_i \mid C_i = 1] - E[w_i \mid C_i = 0] = E[w_{1i} - w_{0i}]$
Bad Control Problem cont.
Difference in $y_i$ between college graduates and others (without college), given that they are white collar, i.e. want: $y_{1i} - y_{0i} \mid w_{1i} = 1$
$$E[y_i \mid w_i = 1, C_i = 1] - E[y_i \mid w_i = 1, C_i = 0] = E[y_{1i} \mid w_{1i} = 1, C_i = 1] - E[y_{0i} \mid w_{0i} = 1, C_i = 0]$$
With CIA:
$$= E[y_{1i} \mid w_{1i} = 1] - E[y_{0i} \mid w_{0i} = 1]$$
$$= \underbrace{E[y_{1i} \mid w_{1i} = 1] - E[y_{0i} \mid w_{1i} = 1]}_{E[y_{1i} - y_{0i} \mid w_{1i} = 1]} + \underbrace{E[y_{0i} \mid w_{1i} = 1] - E[y_{0i} \mid w_{0i} = 1]}_{\text{BIAS}}$$
$E[y_{1i} - y_{0i} \mid w_{1i} = 1]$: Causal effect of college on those that work in a white collar job when they have a college degree
$E[y_{0i} \mid w_{1i} = 1]$: Any college student who gets a white collar job, $\approx E[y_{0i}]$
$E[y_{0i} \mid w_{0i} = 1]$: Gets a white collar job without college (e.g. Bill Gates)
Matching & Propensity Score Functions
Estimates the effects of a treatment by accounting for covariates that predict receiving treatment
-Rosenbaum and Rubin (1983)
Easier with categorical variables (Matching)
Harder with continuous variables (Propensity Score Matching)
Example: Training Program
Variables (All binary except income):
Treatment
Black
Hispanic
Married
Degree
Income
$\sum_s \omega_s (\bar{y}_{1s} - \bar{y}_{0s})$ where $\omega_s =$ …
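With binary covariates, matching amounts to comparing treated and untreated means within each covariate cell and averaging the cell differences. The sketch below illustrates that idea on made-up records. The weight definition is not recoverable from the slide, so the code assumes, purely for illustration, that each cell is weighted by its share of treated units; the cell labels, the data, and the name `cell_matching_estimate` are all hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (cell label, treatment dummy, income).
# A cell is one combination of the binary covariates.
rows = [
    ("married_degree", 1, 12.0),
    ("married_degree", 1, 14.0),
    ("married_degree", 0, 10.0),
    ("single_no_degree", 1, 9.0),
    ("single_no_degree", 0, 8.0),
    ("single_no_degree", 0, 6.0),
]

def cell_matching_estimate(records):
    """Weighted average of within-cell treated-minus-control mean gaps."""
    cells = defaultdict(lambda: {0: [], 1: []})
    for cell, d, y in records:
        cells[cell][d].append(y)
    # Keep only cells that contain both treated and control units.
    usable = [g for g in cells.values() if g[0] and g[1]]
    n_treated = sum(len(g[1]) for g in usable)
    estimate = 0.0
    for g in usable:
        weight = len(g[1]) / n_treated  # ASSUMPTION: treated share of the cell
        estimate += weight * (mean(g[1]) - mean(g[0]))
    return estimate

estimate = cell_matching_estimate(rows)
print(estimate)
```

Cells without both treated and control units are dropped, which is the discrete-covariate version of requiring overlap.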
Key Assumption
Propensity Score Theorem: Corollary of CIA
(CIA) $y_{0i}, y_{1i} \perp D_i \mid X_i \ \longrightarrow\ $ (PST) $y_{0i}, y_{1i} \perp D_i \mid P(X_i)$
4 Steps to Propensity Score Matching:
1. Estimate Propensity Score
2. Matching
3. Stratification
Propensity score matching methodology
Propensity Score: conditional probability of receiving treatment given $X_i$
Through the use of a logistic model or through generalized boosted modeling
$P(X_i) = \Pr(D_i = 1 \mid X_i)$ for each $i$ within the sample
$\Pr(D_i = 1 \mid X_i) = \Phi(X_i'\delta)$
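The first step, estimating the propensity score, can be sketched with a small logistic regression fitted by Newton-Raphson. This is a minimal illustration under stated assumptions: it uses the logistic link mentioned in the text (the formula above with $\Phi$ would correspond to a probit instead), a single covariate, and made-up data; the names `fit_logit` and `propensity` are hypothetical, not part of any library.

```python
import math

def fit_logit(x, d, iterations=25):
    """Fit Pr(D=1 | x) = 1 / (1 + exp(-(a + b*x))) by Newton-Raphson."""
    a, b = 0.0, 0.0
    for _ in range(iterations):
        p = [1.0 / (1.0 + math.exp(-(a + b * xi))) for xi in x]
        # Gradient of the log-likelihood with respect to (a, b)
        g0 = sum(di - pi for di, pi in zip(d, p))
        g1 = sum((di - pi) * xi for di, pi, xi in zip(d, p, x))
        # Negative Hessian: a symmetric 2x2 matrix
        w = [pi * (1.0 - pi) for pi in p]
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 * h01
        # Newton step: solve (negative Hessian) * step = gradient
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b

def propensity(x_value, a, b):
    """Estimated probability of treatment for covariate value x_value."""
    return 1.0 / (1.0 + math.exp(-(a + b * x_value)))

# Hypothetical sample: x is a binary covariate, d the treatment dummy.
x = [0, 0, 0, 0, 1, 1, 1, 1]
d = [1, 0, 0, 0, 1, 1, 1, 0]
a, b = fit_logit(x, d)
scores = [propensity(xi, a, b) for xi in x]
print([round(s, 3) for s in scores])
```

With a single binary covariate the fitted scores reproduce the treated share in each group, which makes the output easy to sanity-check by hand.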
Matching: Find untreated individuals whose propensity scores are similar to
those of treated individuals
Stratification:
Matching methods explained
Propensity scores for treated and control groups
Matching methods: for each treated observation i, we need to find one or more
control observations j with similar characteristics
Matching with or without replacement
Matching without replacement: each control observation is used no
more than one time as a match for a treated observation.
Nearest neighbor matching
For each treated observation $i$, select a control observation $j$ that has the closest $x$.
$\min_j \lVert p_i - p_j \rVert$
Radius matching
Each treated observation $i$ is matched with the control observations $j$ that fall within a
specified radius.
$\lVert p_i - p_j \rVert < r$
Kernel matching
Each treated observation i is matched with several control observations, with
weights inversely proportional to the distance between treated and control
observations.
With matching based on propensity scores, the weights are defined as:
$$w(i, j) = \frac{K\left(\frac{p_j - p_i}{h}\right)}{\sum_{j=1}^{n_0} K\left(\frac{p_j - p_i}{h}\right)}$$
Here h is the bandwidth parameter.
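The three matching rules above differ only in how they pick or weight control observations for a given treated score. The sketch below shows one possible reading of each rule on made-up propensity scores. The Gaussian kernel is an assumption chosen for illustration (the slides do not name a kernel), and all function names are hypothetical.

```python
import math

def nearest_neighbor(p_i, control_scores):
    """Index of the control whose score is closest to p_i."""
    return min(range(len(control_scores)),
               key=lambda j: abs(p_i - control_scores[j]))

def radius_matches(p_i, control_scores, r):
    """Indices of all controls with |p_i - p_j| < r."""
    return [j for j, p_j in enumerate(control_scores) if abs(p_i - p_j) < r]

def kernel_weights(p_i, control_scores, h):
    """Weights w(i, j) = K((p_j - p_i) / h) / sum_j K((p_j - p_i) / h).

    ASSUMPTION: K is a Gaussian kernel; h is the bandwidth.
    """
    k = [math.exp(-0.5 * ((p_j - p_i) / h) ** 2) for p_j in control_scores]
    total = sum(k)
    return [k_j / total for k_j in k]

# Hypothetical propensity scores.
control_scores = [0.10, 0.35, 0.40, 0.80]
p_i = 0.38

nn = nearest_neighbor(p_i, control_scores)
within = radius_matches(p_i, control_scores, r=0.05)
weights = kernel_weights(p_i, control_scores, h=0.1)
print(nn, within, [round(w, 3) for w in weights])
```

A smaller radius or bandwidth uses fewer, closer controls (less bias, more variance), while larger values do the opposite.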
Stratification or interval matching
To Recap
Average treatment effect on the treated (ATET)
ATET is the difference between the outcomes of the treated observations and the outcomes they
would have had if they had not been treated.
$$\text{ATET} = E(\Delta \mid D = 1) = E(y_1 \mid x, D = 1) - E(y_0 \mid x, D = 1)$$
The second term is a counterfactual so it is not observable and needs to be
estimated.
Propensity score method
After matching on propensity scores we can compare the outcomes of treated and
control observations
$$\text{ATET} = E(\Delta \mid p(x), D = 1) = E(y_1 \mid p(x), D = 1) - E(y_0 \mid p(x), D = 0)$$
Empirical estimation
Each treated observation $i$ is matched to $j$ control observations, and their outcomes $y_0$ are weighted by $w$.
$$\text{ATET} = \frac{1}{n_1} \sum_{i \in (D = 1)} \Big[\, y_{1,i} - \sum_j \cdots$$
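A matching estimate of the ATET can be sketched as the average gap between each treated outcome and a weighted average of control outcomes. The formula above is cut off in the source, so the code assumes the missing term is the weighted control outcome $\sum_j w(i,j)\, y_{0,j}$, which is what the preceding sentence describes. The data, the nearest-neighbor weighting, and the function names are illustrative assumptions.

```python
def nearest_neighbor_weights(p_i, control_scores):
    """One-hot weights: all weight on the control closest to p_i."""
    best = min(range(len(control_scores)),
               key=lambda j: abs(control_scores[j] - p_i))
    return [1.0 if j == best else 0.0 for j in range(len(control_scores))]

def atet(treated, controls, weight_fn):
    """Average of y1_i minus the weighted control outcome, over treated units.

    treated and controls are lists of (propensity_score, outcome) pairs.
    ASSUMPTION: the counterfactual for unit i is sum_j w(i, j) * y0_j.
    """
    control_scores = [p for p, _ in controls]
    total = 0.0
    for p_i, y1 in treated:
        w = weight_fn(p_i, control_scores)
        counterfactual = sum(w_j * y0 for w_j, (_, y0) in zip(w, controls))
        total += y1 - counterfactual
    return total / len(treated)

# Hypothetical (propensity score, outcome) pairs.
treated = [(0.30, 10.0), (0.70, 14.0)]
controls = [(0.25, 7.0), (0.50, 9.0), (0.75, 12.0)]

estimate = atet(treated, controls, nearest_neighbor_weights)
print(estimate)
```

Passing a different `weight_fn`, for example radius or kernel weights, changes the matching rule without changing the estimator itself.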
Diff-in-Diff
Scale problem: Non-linearities in Outcome. What if control group
had been higher?
Good
Better
Key Assumption
Trend in control group approximates what would have occurred in treatment group in
the absence of treatment
"weaker version of CIA"
$$\text{DD} = E(\Delta_{treated} - \Delta_{control} \mid D = 1) = E[(y_{1,t+1} - y_{1,t}) - (y_{0,t+1} - y_{0,t}) \mid x, D = 1]$$
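The difference-in-differences contrast can be computed directly from the four group-by-period means. The sketch below uses made-up observations and ignores covariates; the function names are hypothetical.

```python
from statistics import mean

# Hypothetical observations: (treat, post, y).
obs = [
    (1, 0, 10.0), (1, 0, 12.0),   # treated group, before
    (1, 1, 17.0), (1, 1, 19.0),   # treated group, after
    (0, 0, 8.0), (0, 0, 10.0),    # control group, before
    (0, 1, 11.0), (0, 1, 13.0),   # control group, after
]

def cell_mean(rows, treat, post):
    """Mean outcome for one group in one period."""
    return mean(y for t, p, y in rows if t == treat and p == post)

def diff_in_diff(rows):
    """(Change in treated group) minus (change in control group)."""
    change_treated = cell_mean(rows, 1, 1) - cell_mean(rows, 1, 0)
    change_control = cell_mean(rows, 0, 1) - cell_mean(rows, 0, 0)
    return change_treated - change_control

dd = diff_in_diff(obs)
print(dd)
```

Here the control group's change stands in for what the treated group's change would have been without treatment, which is exactly the key assumption stated above.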
Regression Framework:
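$y_i = \beta_0 + \beta_1 D^{Post} + \beta_2 D^{Treat} + \beta_3 D^{Post} D^{Treat} + (\beta_4 X_i) + \varepsilon_i$
$E[y \mid X_i, D^{Post} = 1, D^{Treat} = 1] = \beta_0 + \beta_1 + \beta_2 + \beta_3$
$E[y \mid X_i, D^{Post} = 0, D^{Treat} = 1] = \beta_0 + \beta_2$
$E[y \mid X_i, D^{Post} = 1, D^{Treat} = 0] = \beta_0 + \beta_1$
$E[y \mid X_i, D^{Post} = 0, D^{Treat} = 0] = \beta_0$
Post minus pre, treated group: $\beta_1 + \beta_3$
Post minus pre, control group: $\beta_1$
Difference-in-differences: $\beta_3$
Event Study
Event Date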
Intro
Hahn et al. (2001): "RDD require seemingly mild assumptions compared to those needed for other non-experimental approaches"
Lee (2008): "one need not assume the RDD isolates treatment variation that is 'as good as randomized'; instead, such randomized variation is a consequence of agents' inability to precisely control the assignment variable near the known cutoff"
Precise sorting around the cutoff is a sign of self-selection
Non-Parametric (e.g. local linear regression, using only data close to the cutoff) and
Parametric (e.g. a functional form like a low-order polynomial) estimation should be seen
as complementary. In practice, they lead to the computation of the exact same statistic.
Disadvantages
Statistical power is lower than in randomized experiments of equal sample size
(higher Type-II error)
Regression Discontinuity Design
Closest cousin of a randomized experiment
Deterministic rule that assigns treatment in a discontinuous fashion
$$D_i = \begin{cases} 1 & \text{if } x_i \ge x_0 \\ 0 & \text{if } x_i < x_0 \end{cases}$$
[Figure: RD Scatterplot: Positive Treatment Effect]
[Figure: RD Scatterplot: No Treatment Effect]
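A sharp RD estimate can be sketched as the gap between two regression lines evaluated at the cutoff, each fitted only on observations within a bandwidth of the cutoff. The example below is a minimal illustration on noise-free made-up data with a known jump of 2; the uniform weighting inside the bandwidth, the bandwidth itself, and the function names are assumptions chosen for simplicity.

```python
def ols_line(xs, ys):
    """Intercept and slope of a simple least-squares line."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def sharp_rd(x, y, cutoff, bandwidth):
    """Jump at the cutoff: right-side fit minus left-side fit at x = cutoff."""
    # Center the running variable so each intercept is the fit at the cutoff.
    left = [(xi - cutoff, yi) for xi, yi in zip(x, y)
            if cutoff - bandwidth <= xi < cutoff]
    right = [(xi - cutoff, yi) for xi, yi in zip(x, y)
             if cutoff <= xi <= cutoff + bandwidth]
    intercept_left, _ = ols_line(*zip(*left))
    intercept_right, _ = ols_line(*zip(*right))
    return intercept_right - intercept_left

# Hypothetical noise-free data: y = 1 + 0.5 x, plus a jump of 2 when x >= 50.
x = [float(v) for v in range(30, 71)]
y = [1.0 + 0.5 * xi + (2.0 if xi >= 50 else 0.0) for xi in x]

tau = sharp_rd(x, y, cutoff=50.0, bandwidth=10.0)
print(tau)
```

With real, noisy data the estimate would depend on the bandwidth, so it is common to report results for several bandwidths.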
Randomized Experiments: Treatment and control groups are divided on the
basis of a randomly generated number.
For example, let $\mu$ follow a uniform distribution with range [0,4]. Units with $\mu \ge 2$ receive
treatment, units with $\mu < 2$ get placebo
Think of RDD where assignment variable is $X = \mu$ and cutoff = 2
Only difference: $X$ is independent of $Y_i(1)$ and $Y_i(0)$,
so $E[Y_i(1) \mid X = c]$ and $E[Y_i(0) \mid X = c]$ do not vary with $c$
Examples
PSAT/NMSQT: Top 16,000 test-takers get a scholarship
A small difference in test score can mean a discontinuous jump in
scholarship amount (Thistlethwaite & Campbell 1960)
School Class Size: Maimonides' Rule: no more than 40 kids in a
class in Israel
40 kids in school means 40 kids per class. 41 kids means two classes with
20 and 21. (Angrist & Lavy 1999)
Union Elections: If employees want to unionize, NLRB holds election
50%: the employer doesn't recognize the union; 50% + 1: the
employer is required to "bargain in good faith" (DiNardo & Lee 2004)
Air Pollution and Home Values: Clean Air Act’s National Ambient
Air Quality Standards
Thistlethwaite & Campbell 1960
A small difference in test score $\to$ discontinuous jump in scholarship amount
$Y_i = \alpha + \tau D_i + X_i'\beta + \varepsilon_i$
$B' - A'' = \lim_{\varepsilon \downarrow 0} E[Y_i \mid X_i = c + \varepsilon] - \lim_{\varepsilon \uparrow 0}$ …
Nonlinear RD
Issues with Causal Inference
Causal inference is possible because of the continuity of the underlying functions
$E[Y_1 \mid X = c]$ and $E[Y_0 \mid X = c]$
Can use average outcome right below cutoff (denied treatment) as counterfactual
for those right above cutoff (treated)
Limitation: there are no observations closer to the cutoff than c' and c''. RDD is fundamentally an
extrapolation-based approach
Since data is required away from the cutoff, estimates will depend on the chosen functional form
(e.g. suppose $\beta = 0$)
Assumptions
CIA is trivially met: Conditional on $x$, treatment dummy is a constant
(recall CIA: $y_{0i}, y_{1i} \perp D_i \mid X_i$)
But, overlap assumption is violated: for a given value of $X$ we observe units with
either $D = 0$ or $D = 1$, never both
RDD Testable Implications
McCrary’s Test
Tests whether the density of the running variable exhibits a discontinuity at the cutoff. The test
estimates the density separately on either side of the cutoff and provides a Wald
estimate (H0: no discontinuity)
Locally Balanced Covariates
Values of baseline covariates should not differ for observations above and
below the cutoff
Treatment should be unrelated to past values of outcome variables
Example of Treatment Manipulation
[Figure: D. Density of Income, With Pre-Announcement and Manipulation]
RDD: Valid or Invalid?
Example
Suppose there are 2 types of students: A and B
A are aware of benefits when exceeding 50% threshold on test
B are ignorant and less able
Suppose 50% of questions are trivial (but students make careless errors
which can be avoided by double-checking)
Only A would double-check in order to guarantee benefit
Results: those who barely pass are a combination of A and B; those who
barely fail are only type B: not a valid counterfactual!
If questions are not trivial (no guarantee of benefit regardless of