The dependent variable of interest is change in policy. Here this is operationalized as change in the passage of a bill between two Congresses. More specifically, I analyze whether the outcome of the final version of a bill considered in a given chamber at time $t-1$ changes when the bill is considered again at time $t$. By final version of a bill, I mean the official version of the bill at its latest point of consideration. To do this, I identify which bills are essentially the same between adjacent Congresses.
To match bills between Congresses, I first scrape the text of the final version of each bill considered by Congress. Using this text, I identify which bills are essentially copies of each other by calculating the amount of content that a bill in the current Congress took from a bill in the previous Congress. In doing so, I identify how much the final version of a bill considered in Congress $t$ copied and pasted from the previous Congress, $t-1$.² The specific calculation I use to estimate the degree of copying is a ratio of matches. A ratio of matches is the ratio of terms, in this case tri-grams or three-word strings, shared between two documents, here bills in adjacent Congresses.³
² A growing body of scholars has used this and similar techniques to capture quantities of interest, from the diffusion of frames in the media (Bail, 2014) to the tracing of policy ideas in legislation (Wilkerson, Smith and Stramp, 2015) to the identification of copy-and-paste legislation (Garrett and Jansa, 2015).
³ To pre-process the text, I lowercase the bills and generate a matrix of the tri-grams used in each bill. Using this matrix, bills are then compared by calculating a ratio of matches. The ratio of matches formula is $|A \cap B| / |B|$. This is the number of terms shared between document A and document B divided by the number of terms in document B. It is a directional formula that tells how much document B shares in common with document A. In these calculations, the bills from the Congress at $t$ are treated as B documents, while the bills from the Congress at $t-1$ are treated as A documents. I do this using the textreuse package (Mullen, 2016).
The result is a single statistic ranging from 0 to 1, where 0 means there is no shared content and 1 indicates that everything is the same.
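As a minimal sketch of this calculation, the Python below lowercases two texts, builds their tri-grams, and computes the directional ratio. It is an illustration only, not the textreuse implementation. It assumes whitespace tokenization and counts every tri-gram occurrence in document B (a set-based variant would deduplicate first). The function names and toy texts are hypothetical.

```python
def ngrams(text, n=3):
    """Lowercase the text, split on whitespace, and return its n-word strings."""
    words = text.lower().split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def ratio_of_matches(doc_a, doc_b, n=3):
    """Share of doc_b's n-grams that also occur in doc_a (directional)."""
    grams_a = set(ngrams(doc_a, n))
    grams_b = ngrams(doc_b, n)
    if not grams_b:
        return 0.0  # assumption: a text too short to form an n-gram shares nothing
    return sum(gram in grams_a for gram in grams_b) / len(grams_b)


# Hypothetical toy texts, not real bills.
bill_prev = "the secretary shall submit a report to congress each year"  # document A (t-1)
bill_curr = "the secretary shall submit a report to congress"            # document B (t)

ratio_forward = ratio_of_matches(bill_prev, bill_curr)  # how much of B is found in A
ratio_reverse = ratio_of_matches(bill_curr, bill_prev)  # roles swapped
print(ratio_forward, ratio_reverse)
```

Swapping the two arguments changes the denominator, which is why the two ratios need not agree: the shorter text is fully contained in the longer one, but not the reverse.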
The final step in the matching process is deciding at what threshold two bills can be said to be copies of each other. Here, if the calculated ratio is greater than or equal to 0.90, the two bills are said to be copies. This means that 90% or more of the content of the bill considered in the current Congress also appears in one or more bills introduced in the previous Congress. As a result, bills that were incorporated into other bills, or that were altered in the amendment process in the more recent Congress, are not included.
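The thresholding step can be sketched as follows. This is a hedged illustration rather than the actual matching pipeline: the bill identifiers, texts, and the `match_bills` helper are hypothetical, and the ratio is computed on whitespace-tokenized tri-grams as an assumption.

```python
def trigrams(text):
    """Lowercased three-word strings, using whitespace tokenization (an assumption)."""
    words = text.lower().split()
    return [tuple(words[i:i + 3]) for i in range(len(words) - 2)]


def ratio_of_matches(doc_a, doc_b):
    """Share of doc_b's tri-grams that also occur in doc_a."""
    grams_a = set(trigrams(doc_a))
    grams_b = trigrams(doc_b)
    return sum(g in grams_a for g in grams_b) / len(grams_b) if grams_b else 0.0


def match_bills(prev_bills, curr_bills, threshold=0.90):
    """Return (prev_id, curr_id, ratio) for every pair at or above the threshold."""
    matches = []
    for curr_id, curr_text in curr_bills.items():
        for prev_id, prev_text in prev_bills.items():
            ratio = ratio_of_matches(prev_text, curr_text)  # A = t-1, B = t
            if ratio >= threshold:
                matches.append((prev_id, curr_id, ratio))
    return matches


# Hypothetical identifiers and texts, not real bills.
prev_bills = {
    "hr100_t-1": "a bill to require the secretary to report annually on rural broadband access",
    "hr200_t-1": "a bill to rename the federal building located in springfield",
}
curr_bills = {
    "hr150_t": "a bill to require the secretary to report annually on rural broadband access",
    "hr250_t": "a bill to rename the post office located in springfield",
}

matched = match_bills(prev_bills, curr_bills)
print(matched)
```

In this toy data only the reintroduced, unchanged bill clears the 0.90 threshold; the reworded bill shares too few tri-grams with its predecessor to count as a copy.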
To illustrate why tri-grams rather than single words are used, compare the following two sentences:
• The president may from time to time give to the Congress Information of the State of the Union.
• The president shall from time to time give to the Congress Information of the State of the Union.
While all but one word is the same in the two sentences, their meanings differ because of that one word: may or shall. If uni-grams, or single words, are used to compare the sentences, then the ratio of matches is 0.94. However, if tri-grams, or three-word strings, are used, then the ratio of matches is 0.83. In the first case, the sentences would be said to be copies; in the second, they would not. This is important for bill text because the language used across many bills is very similar, and the injection of a single word, as above, can create large differences in policy.
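The effect of term length can be sketched with the two sentences above. This is an illustration under assumptions (whitespace tokenization, trailing period stripped, every term occurrence in the second sentence counted), not the textreuse computation. Exact values depend on those tokenization and counting choices, so the sketch may not reproduce the reported figures to the second decimal. What it shows is that the same one-word change lowers the tri-gram ratio more than the uni-gram ratio and moves it to the other side of the 0.90 threshold.

```python
def ngrams(text, n):
    """Lowercase, drop the trailing period, and return the n-word strings."""
    words = text.lower().rstrip(".").split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def ratio_of_matches(doc_a, doc_b, n):
    """Share of doc_b's n-grams that also occur in doc_a."""
    grams_a = set(ngrams(doc_a, n))
    grams_b = ngrams(doc_b, n)
    return sum(g in grams_a for g in grams_b) / len(grams_b) if grams_b else 0.0


may_version = ("The president may from time to time give to the Congress "
               "Information of the State of the Union.")
shall_version = ("The president shall from time to time give to the Congress "
                 "Information of the State of the Union.")

unigram_ratio = ratio_of_matches(may_version, shall_version, n=1)
trigram_ratio = ratio_of_matches(may_version, shall_version, n=3)

# A single changed word affects one uni-gram but every tri-gram that contains it.
print(round(unigram_ratio, 2), round(trigram_ratio, 2))
```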
Using these matched pairs, two dependent variables are created that indicate whether and how the bill outcome changed, one for the House and one for the Senate. There are four possibilities: the bill fails to pass both times, the bill passes and then fails, the bill fails and then passes, or the bill passes both times. Because the bill passing both times is an extremely rare event, the few bill pairs that fall into this category are dropped from the analysis. The policy area mainly addressed by each bill is provided by the Congressional Bills Project, which uses the Comparative Agendas Project’s coding scheme. By narrowing the analysis to these few bills, I can better assess how changing strategies does or does not change the outcome of a bill when the bill itself varies minimally. This is summarized in Table 4.1. As can be seen, in the House, 395 bills saw a change in outcome: 205 bills passed when first considered and then failed, and 190 bills failed when first considered and then passed. Additionally, 4620 bills did not see a change in outcome. In the Senate, only 174 bills saw a change in outcome: 96 bills passed when first considered and then failed, and 78 bills went from failing to passing. Additionally, 4848 bills did not see a change in outcome.
Table 4.1: Summary of Bill Outcomes Between Congresses by Chamber
          Failed:Failed   Passed:Failed   Failed:Passed   Excluded
House     4620            205             190             8
Senate    4848            96              78              1
Note: Cells are the number of matched pairs.
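The construction of the outcome categories can be sketched as below. This is a minimal illustration with hypothetical data and helper names. It assumes that the "Excluded" column of Table 4.1 corresponds to pairs that passed both times, which the text describes as dropped but the table does not label explicitly.

```python
from collections import Counter


def classify_pair(passed_prev, passed_curr):
    """Label a matched pair by its outcome at t-1 and at t."""
    if passed_prev and passed_curr:
        return "Excluded"  # assumption: passed-both-times pairs are the excluded ones
    first = "Passed" if passed_prev else "Failed"
    second = "Passed" if passed_curr else "Failed"
    return f"{first}:{second}"


# Hypothetical matched pairs: (chamber, passed at t-1, passed at t).
pairs = [
    ("House", False, False),
    ("House", True, False),
    ("House", False, True),
    ("House", True, True),
    ("Senate", False, False),
    ("Senate", False, True),
]

# Tabulate outcomes by chamber, in the layout of Table 4.1.
counts = Counter((chamber, classify_pair(prev, curr)) for chamber, prev, curr in pairs)

# Keep only the pairs that enter the analysis.
analysis_pairs = [p for p in pairs if classify_pair(p[1], p[2]) != "Excluded"]

for (chamber, outcome), n in sorted(counts.items()):
    print(chamber, outcome, n)
```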