matching subjects (Caliendo & Kopeinig, 2005). Caliendo and Kopeinig (2005) identify five different algorithms, each with multiple variations, and all of them involve some trade-off between bias and efficiency; thus, no single method is superior.
Theoretically, with large enough samples, the various matching methods should ultimately produce the same results. With smaller samples, however, the performance of the estimator depends on the structure of the data; for example, the overall proportion of treatment cases in the sample is an important factor. Because each algorithm identifies the appropriate match or matches in a different way, the structure of the data and the available controls should guide the choice of algorithm.
The most commonly used matching method, nearest neighbor matching, has several known shortcomings, most notably bad matches when the nearest neighbor is far away. It is also the default method of the Stata™ command “psmatch2.” Nearest neighbor matching also raises the question of whether to allow replacement, that is, whether to permit a control case to be matched with multiple treatment cases when it is the best available match for each. Allowing replacement increases the quality of the matches but also increases the variance of the estimator. Without replacement, the order in which treatment cases are matched determines which controls remain available, so the data must be sorted randomly before matching.
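To make the replacement trade-off concrete, the following is a minimal Python sketch of one-to-one nearest neighbor matching on the propensity score, assuming the scores have already been estimated. It is an illustration of the general technique, not the psmatch2 implementation; the function name and its parameters are hypothetical.

```python
import numpy as np

def nearest_neighbor_match(ps_treated, ps_control, replace=True, seed=None):
    """Return, for each treated case, the index of its matched control
    (hypothetical helper; matches on absolute propensity-score distance).

    With replacement, a control may serve many treated cases. Without
    replacement, each control is used at most once, so treated cases are
    processed in a random order and the result depends on that order.
    """
    ps_treated = np.asarray(ps_treated, dtype=float)
    ps_control = np.asarray(ps_control, dtype=float)
    matches = np.full(len(ps_treated), -1)  # -1 means no control was left to match
    if replace:
        for i, p in enumerate(ps_treated):
            matches[i] = int(np.argmin(np.abs(ps_control - p)))
        return matches
    rng = np.random.default_rng(seed)
    available = np.ones(len(ps_control), dtype=bool)
    for i in rng.permutation(len(ps_treated)):  # random matching order
        if not available.any():
            break
        # Used controls get an infinite distance so they cannot be chosen again
        dist = np.where(available, np.abs(ps_control - ps_treated[i]), np.inf)
        j = int(np.argmin(dist))
        matches[i] = j
        available[j] = False
    return matches
```

With treated scores 0.30 and 0.33 and control scores 0.31 and 0.90, matching with replacement pairs both treated cases with the 0.31 control, whereas matching without replacement forces one of them onto the distant 0.90 control, and which one depends on the random order.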
Another approach, caliper matching, sets a maximum distance in propensity score between a control and the treatment case it is matched to. The appropriate caliper width is difficult to know without a trial-and-error process, which Caliendo and Kopeinig (2005) present as a potential downside. Radius matching, used in conjunction with caliper matching, uses all controls within the caliper rather than only the nearest neighbor(s). Caliendo and Kopeinig (2005) highlight the benefits of radius matching, explaining that it “uses only as many comparison units as are available within the caliper and therefore allows for usage of extra (fewer) units when good matches are (not) available” (p. 10). They argue that this approach allows for oversampling without forcing bad matches, although restricting matches to the region of common support remains important.
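The radius idea can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not psmatch2's code: every control whose propensity score lies within the caliper of a treated case is used with equal weight, treated cases with no control inside the caliper are simply skipped, and the function names are hypothetical.

```python
import numpy as np

def radius_match(ps_treated, ps_control, caliper=0.01):
    """For each treated case, return the indices of ALL controls whose
    propensity score is within `caliper` (controls may be reused).
    An empty array means no acceptable match exists for that case."""
    ps_control = np.asarray(ps_control, dtype=float)
    return [np.flatnonzero(np.abs(ps_control - p) <= caliper) for p in ps_treated]

def radius_att(y_treated, y_control, matches):
    """Average of (treated outcome - mean outcome of its matched controls),
    skipping treated cases that found no control inside the caliper."""
    y_control = np.asarray(y_control, dtype=float)
    diffs = [y_t - y_control[m].mean()
             for y_t, m in zip(y_treated, matches) if m.size > 0]
    return float(np.mean(diffs)) if diffs else float("nan")
```

A treated case with many close controls draws on all of them, while a treated case with none is left unmatched rather than being paired with a distant control, which is the behavior Caliendo and Kopeinig describe.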
To select the matching method most appropriate for these data, I followed Arpino’s (2013) suggestion to explore several methods iteratively, with the goal of minimizing the percent standardized bias (%SB). The %SB measures the average imbalance in the covariates between the treatment and control groups, and it is related to the checks for overlap, common support, and balance described in Step 3 below. Following this suggestion, I used “psmatch2” to explore seven matching methods for the treatment of delay (in general) versus the control of not delaying, matching on students’ demographic characteristics (gender, race, income, parents born in the U.S., parents’ marital status, parents’ education) and measures of academic preparation and achievement (high school type, highest level of high school math, high school GPA, admissions test score). The results of these trials are shown in Table 10.
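For readers unfamiliar with the %SB, the sketch below computes it for each covariate with the commonly used standardized-difference formula, 100 × (treated mean − control mean) / √((treated variance + control variance) / 2), and summarizes across covariates with the mean absolute value. This is an assumption-laden illustration: the function names are hypothetical, control means may optionally be weighted by matching weights, and conventions differ on whether the denominator uses pre- or post-matching variances, so the exact figures psmatch2 reports may be computed differently.

```python
import numpy as np

def percent_standardized_bias(x_treated, x_control, w_control=None):
    """%SB for each covariate (column):
    100 * (mean_T - mean_C) / sqrt((var_T + var_C) / 2).

    Treated means are unweighted; control means may be weighted by matching
    weights (e.g., how often each control was used). The denominator uses the
    unweighted sample variances of the arrays passed in."""
    xt = np.asarray(x_treated, dtype=float)
    xc = np.asarray(x_control, dtype=float)
    mean_t = xt.mean(axis=0)
    mean_c = np.average(xc, axis=0, weights=w_control)
    pooled_sd = np.sqrt((xt.var(axis=0, ddof=1) + xc.var(axis=0, ddof=1)) / 2)
    return 100 * (mean_t - mean_c) / pooled_sd

def mean_abs_sb(sb):
    """One-number summary across covariates: the average absolute %SB."""
    return float(np.mean(np.abs(sb)))
```

Taking absolute values before averaging keeps positive and negative imbalances on different covariates from cancelling each other out.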
Table 10
Exploring Matching Methods to Improve Balance
Matching method               Replacement  %SB
NN (1)                        No           4.52
NN (1)                        Yes          2.47
NN (3)                        Yes          2.09
NN (5)                        Yes          2.25
NN with caliper (0.01)        Yes          2.47
NN (5) with caliper (0.01)    Yes          2.20
Radius caliper (0.01)         Yes          1.56
Source: U.S. Department of Education, National Center for Education Statistics, 2003-04 Beginning Postsecondary Students Longitudinal Study, Second Follow-up (BPS:04/09). Notes: NN = nearest neighbor, with the number of neighbors in parentheses; caliper widths are distances in propensity score; %SB = percent standardized bias.
Ultimately, I selected the radius caliper (0.01) method because it yielded the lowest %SB with my variables. Caliendo and Kopeinig (2005) also identify the standardized bias as an appropriate indicator of match quality; specifically, they note that a standardized bias below 3% or 5% is generally seen as sufficient. Because my study involved several comparisons between different groups of delayers, I repeated this exercise for the other treatment conditions (i.e., matches based on different treatment and control groups) and consistently found radius caliper (0.01) matching to be the most favorable method; in every case, the %SB was below 5%. As a robustness check, I also estimated the effects of delaying with other matching methods to gauge how sensitive the estimates were to the choice of method. Appendix D compares the estimated effect of delaying on earning a bachelor’s degree under the radius caliper (0.01) method and the nearest neighbor (3) method. The magnitudes of the odds ratios changed only slightly, and their directions and levels of significance remained the same.
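The selection rule described above (choose the method with the lowest %SB, provided it falls below the 3% or 5% benchmark) can be restated as a short Python sketch using the values reported in Table 10. The dictionary labels and the `select_method` helper are hypothetical; the actual process also involved repeating the comparison for the other treatment conditions.

```python
# %SB values copied from Table 10 (treatment: any delay; control: no delay).
# All methods except the first used replacement.
TABLE_10_SB = {
    "NN (1), without replacement": 4.52,
    "NN (1)": 2.47,
    "NN (3)": 2.09,
    "NN (5)": 2.25,
    "NN with caliper (0.01)": 2.47,
    "NN (5) with caliper (0.01)": 2.20,
    "Radius caliper (0.01)": 1.56,
}

def select_method(sb_by_method, threshold=5.0):
    """Return the method with the lowest %SB, or None if even the best
    method fails to reach the balance threshold."""
    best = min(sb_by_method, key=sb_by_method.get)
    return best if sb_by_method[best] < threshold else None

best = select_method(TABLE_10_SB)
print(best)  # Radius caliper (0.01)

# Every method meets the 5% benchmark; all but NN (1) without replacement meet 3%
under_3 = [m for m, sb in TABLE_10_SB.items() if sb < 3.0]
print(len(under_3))  # 6
```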
Next, to estimate the effects of postsecondary delay on the specified outcomes, I estimated two different matching models (using two different sets of covariates), because the data do not reveal whether students’ delay decisions preceded or followed their institutional and enrollment choices. First, students were matched only on their pre-college characteristics, which included demographic information and measures of academic preparation and achievement. Estimating the models in this way assumed that students decided to delay before deciding which type of institution to attend and how to enroll, and that the delay experience influenced these institutional and enrollment decisions. However, because it was not known whether students in fact decided to delay before deciding where and how to enroll, I also estimated models that assumed students made their delay decision after their enrollment decisions, as is typical of a gap year (Bull, 2006). The second model matched students on their pre-college characteristics as well as on their first-year enrollment choices.
Table 11 reports the matching statistics, including the %SB, for each matched pair examined in this study under both sets of covariates. Including different covariates in the models changed the %SB, though not substantially, and the %SB never exceeded 3.22. The table also reports the number and percentage of cases “on support,” which are described in the next section.
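The two-model design amounts to estimating the propensity score twice, once from pre-college covariates alone and once with first-year enrollment choices added, and then matching on each set of scores. The Python sketch below illustrates this under loud assumptions: the text does not say which link function was used, so a logit fit by Newton-Raphson is assumed; the covariate names in `PRE_COLLEGE` and `ENROLLMENT` are hypothetical placeholders rather than the study's variable names; and the helper functions are illustrative only.

```python
import numpy as np

def fit_logit_pscore(X, d, n_iter=25):
    """Propensity scores from a logit of treatment d (0/1) on X, fit by
    Newton-Raphson with an added intercept column (assumed model)."""
    X = np.column_stack([np.ones(len(X)), np.asarray(X, dtype=float)])
    d = np.asarray(d, dtype=float)
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1 / (1 + np.exp(-X @ beta))
        grad = X.T @ (d - p)                         # score vector
        hess = X.T @ (X * (p * (1 - p))[:, None])    # observed information
        beta += np.linalg.solve(hess, grad)
    return 1 / (1 + np.exp(-X @ beta))

# Hypothetical covariate names, NOT the study's actual variable list
PRE_COLLEGE = ["female", "low_income", "parent_ba", "hs_gpa", "test_score"]
ENROLLMENT = ["four_year_inst", "full_time"]

def pscores_for_both_models(data, d):
    """data: dict mapping column name -> 1-D array.
    Returns scores for (1) pre-college only and (2) pre-college + enrollment."""
    m1 = np.column_stack([data[c] for c in PRE_COLLEGE])
    m2 = np.column_stack([data[c] for c in PRE_COLLEGE + ENROLLMENT])
    return fit_logit_pscore(m1, d), fit_logit_pscore(m2, d)
```

Each set of scores would then be passed to the same matching routine, so differences between the two models reflect only the change in covariates.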
Table 11
Caliper Radius (.01) Matching Statistics for All Match Pairs
                                        Pre-college Experiences Only  Pre-college Experiences + Enrollment Choices
Reference       Comparison      N       %SB     OS        %OS         %SB     OS        %OS
No Delay        Delayed         12,990  1.56    12,991    99.98       1.90    12,983    99.98
No Delay        1-Year Delay    12,120  0.89    12,111    99.89       1.09    12,116    99.89
2+ Year Delay   1-Year Delay    1,690   1.33    1,687     99.65       1.69    1,684     99.65
No Delay        2+ Year Delay   12,170  1.68    12,169    99.98       2.16    12,161    99.98
No Delay        Work Delay      12,750  1.53    12,743    99.98       1.88    12,736    99.98
Other Delay     Work Delay      1,690   2.45    1,683     99.41       2.89    1,677     99.41
No Delay        Other Delay     11,550  3.22    11,543    99.94       2.54    11,545    99.94
No Delay        Travel Delay    11,810  1.15    11,810    99.97       0.98    11,797    99.97
Other Delay     Travel Delay    1,690   2.05    1,673     98.82       1.21    1,690     98.82
No Delay        Other Delay     12,480  1.62    12,478    99.97       1.66    12,479    99.97
No Delay        "Gap Year"      11,782  0.99    11,782    100.0       1.01    11,777    100.0
Non-"Gap Year"  "Gap Year"      1,690   1.24    1,673     98.82       1.20    1,669     98.82
No Delay        Non-"Gap Year"  12,510  1.42    12,511    99.98       2.22    12,509    99.98
Source: U.S. Department of Education, National Center for Education Statistics, 2003-04 Beginning Postsecondary Students Longitudinal Study, Second Follow-up (BPS:04/09). Notes: N is unweighted; %SB = percent standardized bias; OS = number of cases on support; %OS = percentage of cases on support.
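Table 11 reports how many cases fall "on support," a concept the text defines in Step 3. As a hedged illustration only, the Python sketch below applies a min-max rule, one widely used definition, in which treated cases whose propensity score lies outside the range of control scores are considered off support and all controls are kept. Whether this study used exactly this rule is an assumption, and the function names are hypothetical.

```python
import numpy as np

def on_support(ps, d):
    """Flag cases on common support under a min-max rule: a treated case is
    off support if its score lies outside [min, max] of the control scores;
    control cases are always kept."""
    ps = np.asarray(ps, dtype=float)
    d = np.asarray(d).astype(bool)
    lo, hi = ps[~d].min(), ps[~d].max()
    return ~d | ((ps >= lo) & (ps <= hi))

def support_summary(ps, d):
    """Return (number on support, percentage on support)."""
    flags = on_support(ps, d)
    return int(flags.sum()), float(100 * flags.mean())
```

For example, with control scores 0.1, 0.5, and 0.9 and treated scores 0.2 and 0.95, the treated case at 0.95 exceeds the largest control score and is dropped, leaving 4 of 5 cases (80%) on support.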
Step 3: Checking for overlap, common support, and balance. After matching,