Model
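The second model to be estimated builds on equation 3.4 and adds the spatial social influence
variables $SN_i^R$ and $SN_i^W$ for the residence and the workplace, respectively. Let
\[
H_i = \begin{bmatrix} X_i & B_i^R & B_i^W & A_i^R & A_i^W & C_i \end{bmatrix}, \qquad
\beta_H = \begin{bmatrix} \beta_X & \beta_{BR} & \beta_{BW} & \beta_{AR} & \beta_{AW} & \beta_C \end{bmatrix}'
\]
$T_i^*$ can now be modeled as:
\[
T_i^* = \alpha_R SN_i^R + \alpha_W SN_i^W + H_i \beta_H + \nu_i + \varepsilon_i \tag{3.5}
\]
where $\alpha_R$ and $\alpha_W$ are hypothesized to be positive, reflecting positive neighborhood peer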
Endogeneity
There are several sources of potential endogeneity. First, unobserved heterogeneity at the
tract level could result in biased estimates if it happens to be correlated with the social
influence measures. A particular unobserved feature of the neighborhood may be responsible
for increasing public transit use of everyone in the neighborhood, which would be reflected
in a higher transit use rate as well as a greater probability of an observed commuter being
a public transit user. In such a case, the estimated coefficients would be partially reflecting
the effect of these unobserved factors rather than the effect of social influence. Second,
neighborhood sorting may also lead to biased estimates. People with similar preferences or
characteristics may choose to live or work in the same neighborhoods, and these preferences
and characteristics may be responsible for their commute mode choices rather than social
influence. Third, exploring social influence gives rise to the reflection problem discussed by
Manski (1993).
The first two sources of potential endogeneity are the ones of concern for this analysis. The
third source, the reflection problem, is not a concern because the social influence variable of
interest is a field variable, estimated over all residents or workers in a given tract.
It is therefore unlikely that the commute mode choice of a single commuter would affect the
choices of everyone living or working in the neighborhood.
Although the analysis controls for a rich set of neighborhood and household characteristics,
the first two sources of potential endogeneity are still a concern because the control variables
may not capture every relevant factor. To address the potential endogeneity, an instrumental
variable (IV) is constructed.
Instrumental Variable
determinant of public transit use. It is reasonable to believe that the commute distance of
an individual’s neighbors or coworkers does not influence the individual’s choice of commute
mode except through its influence on the neighbors’ or coworkers’ choice of commute mode.
An individual’s own commute distance may, however, be correlated with that of the
neighbors or coworkers. This does not violate the exclusion restriction of the IV
because the individual’s commute distance is itself controlled for in the model. Therefore,
the weighted average of the commute distance of neighbors or coworkers is a suitable IV for
their public transit use.
The IV is constructed by using the LEHD Origin-Destination (OD) data. The LEHD OD
files provide the number of workers in each pair of home and work neighborhoods, where
each neighborhood is a census block. To construct the IV, the data are first aggregated
to the census tract level since this is the definition of the neighborhood in this study, and
then the destination tracts for each origin tract are identified. The Euclidean distance is
then computed between the centroid of the origin tract and each of the centroids of all the
associated destination tracts. Therefore, for every origin tract $j$, where $j = 1, \ldots, J$ and
$J$ is the number of origin tracts, a Euclidean distance $ED^O_{j,k_j}$ is computed, where
$k_j = 1, \ldots, K_j$ and $K_j$ is the number of destination tracts paired with tract $j$. Thus,
for each origin tract, there are $K_j$ associated Euclidean distances, $ED^O_{j,k_j}$.
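The aggregation and distance steps described above can be sketched in Python with pandas. This is a minimal illustration rather than the study's actual code: it assumes the block-level OD file follows the LODES column layout (`h_geocode`, `w_geocode`, `S000`), that a 15-digit block identifier begins with the 11-digit tract identifier, and that tract centroids are available in a projected coordinate system measured in meters. The toy rows, the centroid coordinates, and the helper `add_euclidean_distance` are hypothetical.

```python
import numpy as np
import pandas as pd

# Toy block-level OD rows (illustrative values, not real data).
# h_geocode = home block, w_geocode = work block, S000 = total jobs for the pair.
od_blocks = pd.DataFrame({
    "h_geocode": ["060371011101000", "060371011101001",
                  "060371011101000", "060371012001000"],
    "w_geocode": ["060372071001000", "060372071001002",
                  "060372260001000", "060372071001000"],
    "S000": [3, 2, 5, 4],
})

# The first 11 digits of a 15-digit block identifier give the tract identifier.
od_blocks["home_tract"] = od_blocks["h_geocode"].str[:11]
od_blocks["work_tract"] = od_blocks["w_geocode"].str[:11]

# Aggregate block-to-block flows into tract-to-tract flows.
od_tracts = (
    od_blocks.groupby(["home_tract", "work_tract"], as_index=False)["S000"]
    .sum()
    .rename(columns={"S000": "workers"})
)

# Hypothetical tract centroids in projected coordinates (meters).
centroids = pd.DataFrame({
    "tract": ["06037101110", "06037101200", "06037207100", "06037226000"],
    "x": [0.0, 1000.0, 3000.0, 0.0],
    "y": [0.0, 0.0, 4000.0, 12000.0],
}).set_index("tract")


def add_euclidean_distance(od, centroids):
    """Attach the centroid-to-centroid Euclidean distance to each OD pair."""
    home = centroids.loc[od["home_tract"]].to_numpy()
    work = centroids.loc[od["work_tract"]].to_numpy()
    out = od.copy()
    out["distance"] = np.hypot(home[:, 0] - work[:, 0], home[:, 1] - work[:, 1])
    return out


od_dist = add_euclidean_distance(od_tracts, centroids)
print(od_dist)
```

Each row of `od_dist` corresponds to one origin-destination pair $(j, k_j)$, so an origin tract with $K_j$ destinations contributes $K_j$ rows.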
A weighted average of these distances is then computed, with the weights being the
corresponding fraction of total commuters living in the origin tract and commuting to a given
destination tract. That is, for every origin tract $j$, weights $w^O_{j,k_j}$ are computed for
all destination tracts $k_j$, such that
\[
w^O_{j,k_j} = \frac{\mathit{workers}^O_{j,k_j}}{\mathit{workers}^O_j}
\]
where $\mathit{workers}^O_j$ is the total number of workers commuting from tract $j$, and
$\mathit{workers}^O_{j,k_j}$ is the number of workers commuting from tract $j$ to tract $k_j$.
The weighted average Euclidean distance for origin tracts, $ED^O_j$, is then computed as:
\[
ED^O_j = \sum_{k_j=1}^{K_j} w^O_{j,k_j} \times ED^O_{j,k_j}
\]
Each origin tract now has a weighted average distance variable, which is used as an IV for the
residence tract transit use rate. Hence, $Z_i^R$, the IV for $SN_i^R$, is matched to the
corresponding $ED^O_j$, such that
\[
Z_i^R = ED^O_j \quad \text{for } R = O
\]
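As a small worked illustration of the origin-side computation, the sketch below builds the weights, forms the weighted average distance for each origin tract, and matches it to individual commuters by residence tract. The tract labels, worker counts, distances, the `commuters` table, and the helper `weighted_avg_distance` are all hypothetical and chosen only so the arithmetic is easy to follow.

```python
import pandas as pd

# Toy tract-to-tract flows with precomputed centroid distances (illustrative).
od = pd.DataFrame({
    "home_tract": ["A", "A", "B"],
    "work_tract": ["X", "Y", "X"],
    "workers": [30, 10, 20],
    "distance": [2.0, 6.0, 5.0],
})


def weighted_avg_distance(od, by):
    """Worker-weighted average distance for each tract in column `by`."""
    # Total workers per tract, broadcast back to every OD row of that tract.
    total = od.groupby(by)["workers"].transform("sum")
    # Share of the tract's commuters on each OD pair.
    weight = od["workers"] / total
    # Sum of weight times distance within each tract.
    return (weight * od["distance"]).groupby(od[by]).sum()


ed_origin = weighted_avg_distance(od, "home_tract")
# Tract A: (30/40) * 2.0 + (10/40) * 6.0 = 3.0; tract B: (20/20) * 5.0 = 5.0

# Match the tract-level value to each commuter by residence tract.
commuters = pd.DataFrame({"person_id": [1, 2, 3], "home_tract": ["A", "B", "A"]})
commuters["Z_R"] = commuters["home_tract"].map(ed_origin)
print(commuters)
```

Every commuter living in the same tract receives the same value of `Z_R`, which is what makes the instrument a tract-level quantity.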
The algorithm is repeated to construct an IV for the workplace social influence variable:
the origin neighborhoods, $j_k$, for each destination neighborhood, $k$, are identified,
and then the Euclidean distances, $ED^D_{k,j_k}$, between each destination tract centroid and
the associated origin tract centroids are computed. The weighted average of these distances,
$ED^D_k$, is then calculated, where the weights, $w^D_{k,j_k}$, reflect the corresponding
fraction of total commuters working in the destination tract and commuting from a given
origin tract. Therefore,
\[
w^D_{k,j_k} = \frac{\mathit{workers}^D_{k,j_k}}{\mathit{workers}^D_k}
\]
\[
ED^D_k = \sum_{j_k=1}^{J_k} w^D_{k,j_k} \times ED^D_{k,j_k}
\]
\[
Z_i^W = ED^D_k \quad \text{for } W = D
\]
where $\mathit{workers}^D_k$ is the total number of workers commuting to tract $k$,
$\mathit{workers}^D_{k,j_k}$ is the number of workers commuting from tract $j_k$ to tract $k$,
and $Z_i^W$ is the IV for $SN_i^W$.
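The destination-side calculation mirrors the origin-side one, with flows grouped by workplace tract instead of residence tract. The sketch below makes the weights explicit so that they can be checked to sum to one within each destination tract. All values, labels, and column names (`w_D`, `Z_W`) are hypothetical.

```python
import pandas as pd

# Toy tract-to-tract flows with precomputed centroid distances (illustrative).
od = pd.DataFrame({
    "home_tract": ["A", "A", "B", "C"],
    "work_tract": ["X", "Y", "X", "X"],
    "workers": [30, 10, 20, 50],
    "distance": [2.0, 6.0, 5.0, 8.0],
})

# Weight: share of the destination tract's workers arriving from each origin.
dest_total = od.groupby("work_tract")["workers"].transform("sum")
od["w_D"] = od["workers"] / dest_total

# Weights should sum to one within every destination tract.
weight_sums = od.groupby("work_tract")["w_D"].sum()

# Weighted average distance per destination tract.
od["w_D_times_dist"] = od["w_D"] * od["distance"]
ed_dest = od.groupby("work_tract")["w_D_times_dist"].sum()
# Tract X: 0.3 * 2.0 + 0.2 * 5.0 + 0.5 * 8.0 = 5.6; tract Y: 1.0 * 6.0 = 6.0

# Match the tract-level value to each commuter by workplace tract.
commuters = pd.DataFrame({"person_id": [1, 2], "work_tract": ["X", "Y"]})
commuters["Z_W"] = commuters["work_tract"].map(ed_dest)
print(commuters)
```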
The model to be estimated is now:
where $SN_i = \begin{bmatrix} SN_i^R & SN_i^W \end{bmatrix}$,
$\alpha = \begin{bmatrix} \alpha_R & \alpha_W \end{bmatrix}'$,
$Z_i = \begin{bmatrix} Z_i^R & Z_i^W \end{bmatrix}'$, $Z_i^R$ and $Z_i^W$ represent the
exogenous variables (the constructed average Euclidean commute distances) for the residence
and workplace tracts, respectively, and $\Pi$ represents the reduced-form coefficients. The error
terms, $\varepsilon_i$ and $\mu_i$, are assumed to be jointly normally distributed, and the model
is estimated using Maximum Likelihood Estimation.
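For orientation, a two-equation system consistent with equation 3.5 and with the symbols defined above can be written as follows. The first line is equation 3.5 in stacked notation. The second line is an assumption: it takes the usual reduced form for endogenous regressors, in which the instruments and the exogenous controls enter together, and the exact arrangement of regressors and of $\Pi$ is illustrative rather than taken from the text.

```latex
\[
\begin{aligned}
T_i^* &= SN_i\,\alpha + H_i \beta_H + \nu_i + \varepsilon_i \\
SN_i  &= \begin{bmatrix} Z_i' & H_i \end{bmatrix} \Pi + \mu_i
\end{aligned}
\]
```

Under this reading, correlation between $\varepsilon_i$ and $\mu_i$ is what captures the endogeneity of the social influence variables, and joint normality of the two error terms is what allows both equations to be estimated together by maximum likelihood.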