Table 1.1 provides an overview of the data used in this paper. The primary data source is a set of merged transaction and assessment records for every home sale in Massachusetts from 1993 through 2008. These data come from DataQuick, a large real estate data firm, and include detailed information on a host of house and transaction characteristics, including the sales price, house size, lot size and precise location.8 Most importantly for my purposes, the data include the year the house was built, as recorded in the assessment data. This allows me to pinpoint the time and location of all new units built.
A comparison with published Census data on building permits suggests that the year the house is listed as being built corresponds most closely to the year permits are issued, after which it takes almost a year to start and finish the house. Therefore, I add one to the year of construction in the data to correspond to the year a house is finished and sold, which is what my theory incorporates. The comparison with permits data also indicates that the DataQuick data capture about half of all new units built in Massachusetts, with the fraction captured decreasing somewhat towards the end of the sample because a house must be sold at least once to show up in the data at all. I drop the final year of the DataQuick construction sample, 2008, and include year fixed effects in all of my estimates to capture the changing coverage over time.
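The one-year adjustment described above can be illustrated with a minimal sketch. It assumes the records have been reduced to hypothetical `(tract_id, year_built)` pairs; the function name, the `lag` parameter, and the sample values are illustrative and do not come from the DataQuick files.

```python
from collections import Counter


def new_units_by_tract_year(records, lag=1):
    """Count new units per (tract, completion year).

    records: iterable of (tract_id, year_built) pairs, where year_built is
             the assessment-data value, read here as the permit year.
    lag:     years added to year_built to approximate the year the unit
             is finished and sold (one year, following the text).
    """
    counts = Counter()
    for tract_id, year_built in records:
        counts[(tract_id, year_built + lag)] += 1
    return counts


# Hypothetical sample: three units in tract "A", one in tract "B".
sample = [("A", 1995), ("A", 1995), ("A", 1996), ("B", 1995)]
units = new_units_by_tract_year(sample)
```

The resulting counts play the role of the new-unit flows by tract and year used in the rest of the section.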
Even so, it is likely that the DataQuick data are not fully representative of new construction in Massachusetts; that is, the new units included are not a random sample. In particular, large apartment buildings with many jointly owned units will not show up in my data, which means that the sample consists entirely of single-family homes and condominiums. This is an important limitation of my analysis, and the results should be considered as primarily applying to the construction of owner-occupied units.
I calculate the existing stock of housing in each tract-year by taking the stock as reported in the 2000 Census and extrapolating forward and backward using the number of new units reported in the DataQuick data. I assume that there is no depreciation, so that $K_{j,t} = K_{j,t-1} + I_{j,t}$. My estimate of depreciation for the Boston metropolitan area from Paciorek (2011a) was less than 0.003.
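A minimal sketch of this extrapolation for a single tract follows, under the no-depreciation identity above. The function and argument names are hypothetical, and the timing convention (the base-year stock already includes that year's new units) is an assumption made for illustration, not something stated in the text.

```python
def extrapolate_stock(base_stock, new_units, base_year=2000,
                      first_year=1993, last_year=2008):
    """Return {year: stock} for one tract under K_t = K_{t-1} + I_t.

    base_stock: housing stock in base_year (assumed here to already
                include that year's new units).
    new_units:  {year: I_t}; years missing from the dict are treated
                as having zero new construction.
    """
    stock = {base_year: base_stock}
    # Forward: add each later year's new units to the previous stock.
    for year in range(base_year + 1, last_year + 1):
        stock[year] = stock[year - 1] + new_units.get(year, 0)
    # Backward: invert the identity, K_{t-1} = K_t - I_t.
    for year in range(base_year, first_year, -1):
        stock[year - 1] = stock[year] - new_units.get(year, 0)
    return dict(sorted(stock.items()))


# Hypothetical flows for one tract with a base-year stock of 1,000 units.
flows = {1999: 10, 2000: 5, 2001: 8, 2002: 12}
stock = extrapolate_stock(1000, flows, first_year=1998, last_year=2002)
```

Because depreciation is set to zero, the backward step is an exact inversion of the forward step, so the series is pinned down by the base-year stock and the flows alone.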
To explore the effects of zoning, I focus on Massachusetts, particularly the Boston area, for two reasons. First, it is generally seen as having a housing market that is highly regulated, with new development made both very difficult and very expensive by local governments and community opposition (Glaeser and Ward 2009, Gyourko et al. 2008). Consequently, over the last few decades, rising demand has led to substantial increases in house prices with relatively little expansion of the housing stock. This contrasts sharply with the massive population growth but generally subdued house prices in major metropolitan areas in the South, such as Atlanta or Dallas.
Second, as in much of New England, zoning and housing supply regulation in Massachusetts are controlled primarily at the town or city level, with some oversight by the state government. Because county governments are vestigial or even nonexistent, I do not have to worry about overlapping zoning regimes or multiple authorities over new construction.9 Perhaps as a result of this relatively simple governmental structure, Massachusetts has two unique and publicly available data sets on zoning and other regulatory constraints on housing supply.
The first is a geographic information system (GIS) data layer produced by MassGIS, a state agency, that details the primary use zoning at every point in Massachusetts. In addition to providing detailed information on zoning codes, MassGIS also assigns a primary use variable that enables direct comparison of zoning across different municipal regimes, at the cost of losing substantial detail. Using GIS software, I calculate the fraction of land in each tract that is assigned a given primary use code, such as “R1”, which is single-family residential with a minimum lot size of at least 80,000 square feet, or about 1.8 acres. The full list of codes and their descriptions is provided in Table 2.2. I employ other MassGIS data layers to calculate the total land area of each Census tract as well as the fraction of that area that is covered by water or protected from development due to public ownership or other legal constraint. I also exclude areas that are steeply sloped and therefore unsuitable for development, following Saiz (2010).10
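The share calculation can be sketched as follows once the GIS overlay has produced areas by primary use code for a tract. The input format, the function name, and the choice to net out water, protected, and steeply sloped land before computing shares are assumptions made for illustration; the codes shown are a subset of those in Table 2.2, and the areas are invented.

```python
def zoning_shares(area_by_code):
    """Convert {primary use code: area} for one tract into land shares.

    area_by_code is assumed to hold developable area only, i.e. after
    removing water, protected land, and steeply sloped land.
    """
    total = sum(area_by_code.values())
    if total <= 0:
        raise ValueError("tract has no developable area")
    return {code: area / total for code, area in area_by_code.items()}


# Hypothetical tract with 100 units of developable area.
tract = {"R1": 50.0, "R3": 30.0, "OTH": 20.0}
shares = zoning_shares(tract)
```

By construction the shares in a tract sum to one, so one code is redundant whenever the full set of shares enters a regression alongside a constant.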
Figure 2.2 shows each zoning code’s share of new housing built from 1994 to 2009, as well as its share of Massachusetts land area that is not steeply sloped, covered by water or protected from development. While these values do not condition on any characteristics, such as prices, it is informative to compare the two sets of bars. Particularly restrictive single-family zoning, such as that represented by codes RA or R1, has a much lower share of new construction than of land area, while the reverse is true for codes R3 through R5. Multifamily housing is only permitted in very small parts of the state, although it accounts for a larger share of housing than of land. Interestingly, some 10 percent of new housing is built on land that is, by this measure, not zoned for residential development. This may be because of miscoding, because a variance was issued, or because the land was rezoned after it was coded by MassGIS. Regardless, I include the OTH category in my estimates to account for this phenomenon.
9 One exception to this rule is the Cape Cod Commission, a regional planning authority with substantial control over development in Barnstable County. Cape Cod lies outside my area of analysis, however.
The second data source on housing regulation is a database produced by the Pioneer Institute, a local public policy research group, in cooperation with Harvard’s Rappaport Institute for Greater Boston.11 The Pioneer database includes a large number of variables that detail the regulatory environment as of 2004 in 187 towns and cities in eastern Massachusetts (Dain 2005, Glaeser and Ward 2009).12 The information was compiled from interviews with local government officials and reviews of legal documents.
The Pioneer data consist of more than 150 different variables, which is far too many to analyze individually. I keep five indicator variables that vary over time within the sample as municipalities were observed to change their zoning codes. These variables include whether the municipality has provisions for cluster zoning (“Cluster”) or inclusionary zoning (“Include”), limitations on the number of building permits issued per year in the town or for a given project (“Growphase”), bylaws limiting the development of wetlands (“Wetbylaw”), or sewer regulations more stringent than those of the state (“Septrule”).13
I take all other variables for which there is nearly full coverage of the municipalities in the sample and use factor analysis to narrow the dimensionality. I keep the first four factors, which correspond roughly to strict frontage requirements (“Front”), limitations on multifamily housing (“Mult”), pavement width of new subdivision roads (“Pave”), and the stringency of minimum land area requirements (“Land”). In all four cases a higher value corresponds to a more restrictive zoning regime by that dimension. All of the Pioneer variables, including my constructed factors, are listed in Table 2.3.
11 The Pioneer data are somewhat similar to the Wharton Residential Land Use Regulation Index (Gyourko et al. 2008), which I employ in Paciorek (2011a). Unfortunately for the purposes of comparison with that paper, the coverage of the Wharton index within Massachusetts is not sufficient for direct use here. The variables I construct from the Pioneer database are collectively quite highly correlated with the Wharton index, however.
12 A map of the coverage is provided in Figure 2.1. The Pioneer data cover more than half of the municipalities in Massachusetts.
13 For most of these variables, a higher value means less permissive zoning. The possible exceptions are cluster zoning, which allows for more construction if new units are “clustered” on a portion of the lot, and inclusionary zoning, which provides incentives for building affordable housing.
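For reference, the factor-analysis step can be written in the textbook form of an orthogonal four-factor model. This is a generic sketch rather than a statement of the exact specification, extraction method, or rotation used here, and all symbols ($x_m$, $\mu$, $\Lambda$, $f_m$, $\varepsilon_m$, $\Psi$, $p$) are notation introduced only for this illustration.

```latex
% x_m: p-vector of Pioneer regulatory variables for municipality m
% f_m: 4-vector of latent factors ("Front", "Mult", "Pave", "Land")
\begin{aligned}
x_m &= \mu + \Lambda f_m + \varepsilon_m, \qquad \Lambda \in \mathbb{R}^{p \times 4}, \\
\operatorname{Cov}(f_m) &= I_4, \qquad \operatorname{Cov}(\varepsilon_m) = \Psi \ \text{(diagonal)}, \\
\operatorname{Cov}(x_m) &= \Lambda \Lambda^{\top} + \Psi .
\end{aligned}
```

In this form, each factor is labeled by inspecting which rows of $\Lambda$ carry the largest loadings, which is presumably how names such as “Front” and “Mult” attach to the retained factors.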
With the exception of the few Pioneer variables that explicitly vary over time, I assume throughout this paper that the zoning code and other regulatory variables are constant and exogenous over my time frame, from 1993 to 2008. While not ideal, this is likely to be a reasonable approximation of reality, particularly with respect to the zoning code, which seems to be written to perpetuate the characteristics of the preexisting housing stock (Glaeser and Ward 2009). Endogenizing the zoning code is a potentially interesting extension, but it is likely to be more relevant in long-run studies of urban growth, rather than over the relatively short horizon in this paper.