Although the empirical strategy presented thus far attempts to account for omitted variables as well as possible, we can never be completely sure that the MRP estimates presented above have a causal interpretation. When thinking about methods for inferring causality, one immediately looks for sources of exogenous variation in the number of star players that are unrelated to team revenues. In the current context, it is natural to consider injuries and suspensions as potential sources of exogenous variation, as these are plausibly random events that affect the number of star players in a given season.
While injuries and suspensions are not identified directly in my data, I can use the percentage of games played by each player in my individual-level data to narrow down the search for injuries and suspensions. To this end, for each measure of ex-post star football and basketball player, I collected a list of players who were designated as stars in year t and played no more than half of their team’s games in year t+1. Then, for each of these players, I manually searched their player biographies on the websites of their respective athletic programs to determine whether the low games-played percentage was due to injury or suspension. Having identified star players who played no more than 50% of games in the subsequent season due to injury or suspension, I then determined which of these injuries or suspensions were “season ending.” I define a season-ending injury or suspension as one that causes a player to miss at least the last half of the season, and I rule out players who were “plagued by injury.” For instance, I do not count a player who was injured for part of the season and then returned to play the remaining games, even if they ended up playing only 40% of games that season. The reason is that I want the injury or suspension to cleanly end a star player’s contribution to team output. For example, I do not want to count players who were injured and then returned to help their team reach the NCAA tournament or a bowl game, as this would muddy any identification strategy that uses injured players to estimate a star player’s effect on revenues.
outliers. That is, the fact that expected stars in this category are statistically different from unexpected stars, and much larger than the composite estimate in Table 15, might really be due to the “Harden” and “Humphries” effect if these two players happen to be extremely valuable to their college teams relative to the average unexpected star in that category. The reason we do not see this same pattern in the top 10 points scorers measure is likely because these two players are not included in that measure, which suggests that top points scorer may be a fairly noisy measure of quality as it pertains to a player’s ability to generate revenues.
Table 29 reports the number of injured star players I was able to identify in the data. As seen in the table, there are remarkably few season-ending injuries.89 This low number of identified injuries does not appear to be due to missing data. Missing games-played information would cause my initial screening to miss injuries, but checking my team and individual-level data reveals that very few players and no teams are missing the number of games played. Furthermore, in manually searching for the biographies of the screened list of players, I did not encounter anyone for whom I could not find a biography, which makes sense given that these are all star players. The likely explanation for the low number of identified injuries is simply that season-ending injuries are not especially common events, which is compounded by the fact that star players are very rare in the first place. Despite the limited number of injuries and suspensions in the data, I use this source of exogenous variation to supplement my previous estimates of star player MRP using instrumental variables and a generalized difference-in-differences approach.
89 Including players who had seasons “plagued by injury” does not add a significant number of injured stars to the data. Therefore, I continue with season-ending injuries and suspensions because these potentially provide cleaner identification of the effect of interest.
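The two-step screen described above (a games-played cutoff, then a check for a season-ending absence) can be sketched in Python. This is a minimal illustration under assumptions: the record type `NextSeason`, its field names, and the use of per-game appearance lists are hypothetical, since the underlying data layout is not described here, and the injury-or-suspension label is taken as given from the manual biography search.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class NextSeason:
    """Hypothetical record for a player who was a star in year t, observed in year t+1."""
    player: str
    team_games: int                                          # games the team played in t+1
    games_appeared: List[int] = field(default_factory=list)  # 1-based game numbers the player appeared in
    absence_reason: Optional[str] = None                     # "injury", "suspension", or None (manual bio search)


def passes_screen(s: NextSeason) -> bool:
    """Step 1: the star played no more than half of the team's games in t+1."""
    return len(s.games_appeared) <= 0.5 * s.team_games


def is_season_ending(s: NextSeason) -> bool:
    """Step 2: an injury or suspension that keeps the player out of at least the last half of the season.

    A player who returns after the midpoint ("plagued by injury") is excluded,
    even if they appeared in fewer than half of the team's games.
    """
    if s.absence_reason not in ("injury", "suspension"):
        return False
    last_game = max(s.games_appeared, default=0)  # 0 if the player never appeared
    return last_game <= s.team_games / 2


def season_ending_losses(seasons: List[NextSeason]) -> List[str]:
    """Names of year-t stars whose year-t+1 season ended early due to injury or suspension."""
    return [s.player for s in seasons if passes_screen(s) and is_season_ending(s)]


# Hypothetical example records.
EXAMPLE = [
    NextSeason("A", 12, [1, 2, 3, 4], "injury"),        # out after game 4: counted
    NextSeason("B", 12, [1, 2, 10, 11, 12], "injury"),  # returned late: excluded
    NextSeason("C", 12, list(range(1, 13)), None),      # played every game: fails the screen
]

if __name__ == "__main__":
    print(season_ending_losses(EXAMPLE))  # ['A']
```

Keeping the screen and the season-ending check as separate functions mirrors the order of the manual procedure, even though any absence that covers the last half of the season automatically passes the 50% screen.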
2.8.1 Instrumental Variables
An instrumental variables framework appears to be the method of choice for dealing with potential endogeneity in the literature that attempts to estimate the MRP of star college athletes. However, as already discussed, there are issues with the validity of the instruments used in the literature, particularly in satisfying the exclusion restriction, as well as the problem of weak instruments. Although there is no guarantee that using injuries will provide a better instrument, I use the number of injured star players last season as an instrument for the number of star players in the current season.90 This instrument should satisfy the exclusion restriction required for the instrumental variables estimator, as it is plausible to assume that the number of injured stars last year affects the team’s current revenues only through its effect on the number of star players in the current year.
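As a rough illustration of how the instrument and a just-identified IV estimate fit together, the sketch below builds a lagged injury count for each team-year and computes the univariate IV slope cov(z, y) / cov(z, x). It is a simplification, not the estimation actually used: it omits the controls and the team, conference, and year fixed effects. The dictionary layout, the timing convention (injuries recorded for year t-1 instrument stars in year t), and treating team-years with no record as zero injuries are all assumptions.

```python
from statistics import mean
from typing import Dict, List, Sequence, Tuple


def lagged_instrument(injured: Dict[Tuple[str, int], int],
                      panel: Sequence[Tuple[str, int]]) -> List[int]:
    """z_{i,t}: number of team i's stars lost to season-ending injury in year t-1.

    Team-years absent from `injured` are treated as zero injuries (an assumption).
    """
    return [injured.get((team, year - 1), 0) for team, year in panel]


def sample_cov(a: Sequence[float], b: Sequence[float]) -> float:
    """Sample covariance with an n-1 denominator."""
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)


def iv_slope(y: Sequence[float], x: Sequence[float], z: Sequence[float]) -> float:
    """Just-identified IV slope with one instrument and no controls: cov(z, y) / cov(z, x)."""
    czx = sample_cov(z, x)
    if czx == 0:
        raise ValueError("instrument has no sample covariance with the endogenous variable")
    return sample_cov(z, y) / czx


if __name__ == "__main__":
    panel = [("A", 2001), ("A", 2002), ("B", 2001), ("B", 2002)]
    z = lagged_instrument({("A", 2001): 1}, panel)  # [0, 1, 0, 0]
    stars = [1, 0, 1, 1]                             # hypothetical Star_{i,t}
    revenue = [5.0, 2.0, 5.0, 5.0]                   # hypothetical revenues, equal to 2 + 3 * stars
    print(z, iv_slope(revenue, stars, z))            # [0, 1, 0, 0] 3.0
```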
I use this instrument to exactly identify the number of star players ($Star_{i,t}$) in an instrumental variables estimation of star player MRP in Equation (2.1). All the control variables used in the fixed effects analysis described in Section 2.4 are included, as well as team, conference, and year fixed effects. The instrumental variable regression results for football are reported in Table 30. Immediately we see that none of the estimates are statistically significant, with the exception of the All-American estimates, which are significant at the 10% level. If one believes that the instrumental variables estimates capture the “true effect” in a way that the OLS fixed effects estimates do not, then one might be tempted to conclude that star football players do not generate revenues for their teams. However, I would argue that these results should not be taken very seriously, for two reasons.
First, the instrument is, for the most part, very weak, since almost every F-statistic from the first-stage regressions (reported at the bottom of the table) is less than ten.91 Furthermore, even though the F-statistic for the instrument in the case of All-Americans appears quite strong, in this context it might be misleading. Because there are so few injured stars, the instrument contains mostly zeros. Likewise, there are very few star players, so the variable $Star_{i,t}$ also contains mostly zeros. Therefore, the instrument might be “highly predictive” overall because it correctly “predicts” when there are no star players, even though it might almost never predict the number of star players for non-zero values of injured players and star players.
90 In the remainder of the paper, when I refer to injuries, I am referring to injuries and suspensions.
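To make the first-stage diagnostic concrete, the sketch below computes the first-stage F-statistic for a single instrument in the simplest case (no controls, homoskedastic errors), where it equals the squared t-statistic on the instrument. The function name and the toy data are illustrative assumptions; the F-statistics in Table 30 come from regressions that also include the controls and fixed effects. In the toy data both the instrument and the endogenous variable are mostly zeros.

```python
from typing import Sequence


def first_stage_f(x: Sequence[float], z: Sequence[float]) -> float:
    """First-stage F for one instrument, no controls, homoskedastic errors.

    Regress x on a constant and z; with a single excluded instrument the
    F-statistic equals the squared t-statistic on z.
    """
    n = len(x)
    if n != len(z) or n < 3:
        raise ValueError("need equal-length samples with at least 3 observations")
    xb, zb = sum(x) / n, sum(z) / n
    szz = sum((zi - zb) ** 2 for zi in z)
    sxz = sum((zi - zb) * (xi - xb) for xi, zi in zip(x, z))
    sxx = sum((xi - xb) ** 2 for xi in x)
    b = sxz / szz            # first-stage slope on the instrument
    ssr = sxx - b * sxz      # residual sum of squares
    s2 = ssr / (n - 2)       # error variance estimate (constant + slope)
    se2 = s2 / szz           # squared standard error of b
    return b * b / se2


if __name__ == "__main__":
    # Hypothetical team-seasons: most have no injured stars (z) and no stars (x).
    z = [0] * 8 + [1, 1]
    x = [0] * 7 + [1, 1, 1]
    print(round(first_stage_f(x, z), 6))  # 11.2
```

In this toy sample the F-statistic clears the conventional threshold of ten even though only three of the ten team-seasons have any star, so a seemingly strong first stage can rest on a handful of non-zero observations.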
Second, since any omitted variables are likely to be positively correlated with both the number of star players and a team’s revenues, we would expect the instrumental variables estimates to be lower than the OLS estimates reported in Table 13. Yet even if we think that the instrument is valid for All-Americans, the MRP estimate using instrumental variables is larger than the MRP estimate for All-Americans using OLS. For the instrumental variables estimator $\hat{\beta}_{IV}$, one can show that in the univariate case
$$\operatorname{plim} \hat{\beta}_{IV} = \beta + \frac{\operatorname{corr}(z,u)}{\operatorname{corr}(z,x)} \cdot \frac{\sigma(u)}{\sigma(x)},$$
where $z$ is the instrument for the endogenous variable $x$ and $u$ is the error term in the regression. Recall that there are very few star players, which means that $Star_{i,t}$ contains mostly zeros. Hence, there is not much variation in the endogenous variable, and because $\sigma(x)$ is very small and enters the denominator, even a slight correlation between the instrument and the error term is magnified into a large bias in $\hat{\beta}_{IV}$, which can push the estimate up.
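The formula above can be checked numerically. In sample moments the same decomposition holds exactly: the IV slope minus $\beta$ equals $\operatorname{corr}(z,u)/\operatorname{corr}(z,x) \cdot \sigma(u)/\sigma(x)$, so shrinking $\sigma(x)$ while holding the correlations fixed scales the gap up proportionally. The sketch below uses simulated data; the data-generating process, the coefficient values, and the function names are assumptions chosen only for illustration.

```python
import random
from statistics import mean, pstdev
from typing import Sequence, Tuple


def pcov(a: Sequence[float], b: Sequence[float]) -> float:
    """Population (divide-by-n) covariance."""
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)


def pcorr(a: Sequence[float], b: Sequence[float]) -> float:
    return pcov(a, b) / (pstdev(a) * pstdev(b))


def iv_gap(corr_zu: float, corr_zx: float, sd_u: float, sd_x: float) -> float:
    """Inconsistency term of the univariate IV estimator: plim(beta_IV) - beta."""
    return (corr_zu / corr_zx) * (sd_u / sd_x)


def simulate(beta: float = 1.0, x_scale: float = 1.0, n: int = 2000,
             seed: int = 0) -> Tuple[float, float]:
    """Simulate a slightly invalid instrument (z leaks into u).

    Returns (IV slope, beta + formula gap) computed on the same sample.
    Population values: gap = 0.1 / 0.5 = 0.2 when x_scale = 1, and 0.4 when x_scale = 0.5.
    """
    rng = random.Random(seed)
    z = [rng.gauss(0, 1) for _ in range(n)]
    x = [x_scale * (0.5 * zi + rng.gauss(0, 1)) for zi in z]  # x_scale rescales sigma(x) only
    u = [0.1 * zi + rng.gauss(0, 1) for zi in z]             # corr(z, u) > 0 violates exclusion
    y = [beta * xi + ui for xi, ui in zip(x, u)]
    beta_iv = pcov(z, y) / pcov(z, x)
    formula = beta + iv_gap(pcorr(z, u), pcorr(z, x), pstdev(u), pstdev(x))
    return beta_iv, formula


if __name__ == "__main__":
    for scale in (1.0, 0.5):
        b_iv, f = simulate(x_scale=scale)
        print(f"x_scale={scale}: IV slope={b_iv:.4f}, beta + gap={f:.4f}")
```

With the same seed, halving `x_scale` halves $\sigma(x)$ without changing either correlation, so the gap between the IV slope and $\beta$ doubles.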
For completeness, the instrumental variable results for basketball are reported in Table 31 for the only two measures of star player that had any injuries.92 Recall that there was only one injured star player among the top 10 and top 20 points scorers; in the former case, the instrument happens to be collinear with some of the control variables in the regression, which is why there is no estimate for this measure. The instrumental variable estimates for basketball should not be taken seriously for the same reasons that the football estimates are suspect. Overall, the instrumental variables approach does not seem feasible given the limited number of injuries and suspensions in the data. We would likely need to observe many more seasons of player data to obtain enough star players and injuries or suspensions to reliably identify the true effect when using them as instrumental variables.
2.8.2 Difference In Difference
Although the instrumental variables approach was not entirely feasible, I can use the loss of a star player due to injury or suspension to estimate star player MRP in a generalized difference-in-differences framework for multiple events. Consider the following model for a single event:
$$y_{i,t} = \beta (d_i \times p_t) + \alpha_i + \delta_t + \varepsilon_{i,t}$$
where $y_{i,t}$ are revenues for team $i$ in year $t$, $d_i$ is an indicator for team $i$ being treated in that particular event, and $p_t$ is an indicator for treatment having occurred by period $t$. Also included are team ($\alpha_i$) and year ($\delta_t$) fixed effects. Losing a star player due to injury from one season to the next is one event, and such events can affect different teams at different times. Following ?, I create a sample for each event and stack these samples, identifying each separate season (two adjacent years) as a cohort. This stacked data can then be represented by the following model:
$$y_{ict} = \beta d_{ict} + \alpha_{ic} + \delta_{tc} + \varepsilon_{ict} \qquad (2.5)$$
where $c$ denotes the cohort, $d_{ict}$ is the interaction between treatment and the post-treatment period, and $\beta$ is the difference-in-differences estimator. The team-by-cohort fixed effect $\alpha_{ic}$ controls for treatment status within each cohort, while the year-by-cohort fixed effect $\delta_{tc}$ controls for the post-treatment period within each cohort.
must be random, which seems plausible in this context. However, the additional parallel trends assumption needed for identification is that, absent treatment, the change in revenues for a team that loses a star due to injury would not have been different from the change in revenues for teams that do not lose a star due to injury. The advantage of the multiple events framework is that it allows different teams to be treated at different points in time, and the more events we have, the harder it is to argue that a particular set of treated teams is driving the result. That is, we would need to come up with a compelling story for why the parallel trends assumption is violated for each unique event.
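Because each cohort in Equation (2.5) spans only two adjacent years, the team-by-cohort effects difference out, and the OLS estimate of $\beta$ reduces to a within-cohort regression of each team’s revenue change on its treatment indicator. Equivalently, $\hat{\beta}$ is a weighted average of the cohort-specific 2×2 difference-in-differences, with weights $n_c p_c (1 - p_c)$, where $n_c$ is the number of teams in cohort $c$ and $p_c$ is the treated share. The sketch below illustrates this under the assumption of a balanced two-year stack with no additional controls; the tuple layout, the function names, and the example numbers are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# One row per team in a cohort: (cohort_id, treated, revenue_year_t, revenue_year_t_plus_1).
Row = Tuple[str, bool, float, float]


def stacked_did(rows: List[Row]) -> float:
    """OLS beta from y_ict = beta*d_ict + alpha_ic + delta_tc + e_ict with two years per cohort.

    Differencing within team-cohort removes alpha_ic; the delta_tc terms become a
    cohort-specific intercept, so beta is the slope from regressing the revenue change
    on the treatment dummy after demeaning the dummy within each cohort.
    """
    by_cohort: Dict[str, List[Tuple[int, float]]] = defaultdict(list)
    for cohort, treated, y0, y1 in rows:
        by_cohort[cohort].append((int(treated), y1 - y0))
    num = den = 0.0
    for obs in by_cohort.values():
        d_bar = sum(d for d, _ in obs) / len(obs)
        for d, dy in obs:
            num += (d - d_bar) * dy
            den += (d - d_bar) ** 2
    if den == 0:
        raise ValueError("no cohort contains both treated and control teams")
    return num / den


def cohort_dids(rows: List[Row]) -> Dict[str, Tuple[float, float]]:
    """Per-cohort 2x2 DiD and its implicit OLS weight n_c * p_c * (1 - p_c)."""
    by_cohort: Dict[str, List[Tuple[bool, float]]] = defaultdict(list)
    for cohort, treated, y0, y1 in rows:
        by_cohort[cohort].append((treated, y1 - y0))
    out: Dict[str, Tuple[float, float]] = {}
    for cohort, obs in by_cohort.items():
        treated = [dy for t, dy in obs if t]
        control = [dy for t, dy in obs if not t]
        if not treated or not control:
            continue  # such a cohort gets zero weight in the stacked regression
        n, p = len(obs), len(treated) / len(obs)
        did = sum(treated) / len(treated) - sum(control) / len(control)
        out[cohort] = (did, n * p * (1 - p))
    return out


# Hypothetical revenues for two cohorts.
EXAMPLE_ROWS: List[Row] = [
    ("2003", True, 100.0, 90.0), ("2003", False, 50.0, 52.0), ("2003", False, 70.0, 74.0),
    ("2007", True, 80.0, 76.0), ("2007", True, 60.0, 54.0),
    ("2007", False, 40.0, 40.0), ("2007", False, 30.0, 32.0),
]

if __name__ == "__main__":
    parts = cohort_dids(EXAMPLE_ROWS)
    weighted = sum(d * w for d, w in parts.values()) / sum(w for _, w in parts.values())
    print(stacked_did(EXAMPLE_ROWS), weighted)  # both -8.8 up to floating-point rounding
```

The weighting shows why a cohort with a single treated team contributes relatively little to $\hat{\beta}$, which matters when the number of treatment events is small.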
I define a team as being treated in two ways. Under the first definition, a team is treated if it had exactly one star player in year t, lost that player due to injury in t+1, and had zero star players in year t+1. The second definition is identical to the first, except that the treated team can have zero or one star players in year t+1; that is, I allow the star to be replaced by exactly one other star player. The control group consists of teams that had exactly one star player in year t, did not lose a star player due to injury, and had exactly one star player in year t+1. These definitions of treatment and control groups are very restrictive. As a consequence, only three measures of star football players (TDYds, PERTDsYDs, and TopPER) and two measures of star basketball players (Top 10 and 20 points scorers) have any treated teams in the sample. Furthermore, the number of treatment events is quite small, which attenuates some of the strengths of this multiple-event approach. However, even though these definitions are restrictive, they help the identification strategy in two ways. First, the parallel trends assumption is more likely to hold given how I have defined treatment and control teams. As currently defined, the difference-in-differences regression only compares teams that had exactly one star player in a season. Teams that have many star players are likely very different from teams that have few star players, particularly in terms of revenues. Hence, with this restriction it is more plausible that revenue trends before and after treatment are similar for treated and untreated teams, since I will not be comparing teams that have many star players with those that have few. Second, these definitions of treatment and control provide the cleanest possible identification of the impact on revenues caused by the loss of a star player, which will not be confounded by the team having many other star players, gaining star players, or losing multiple star players in a year.
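The treatment and control rules above amount to a small classification function. In this sketch the inputs (`stars_t`, `stars_t1`, `lost_star_to_injury`) are hypothetical team-level fields: star counts under a given star measure in years t and t+1, and a flag for losing the year-t star to a season-ending injury or suspension in t+1.

```python
from typing import Optional


def assign_group(stars_t: int, stars_t1: int, lost_star_to_injury: bool,
                 definition: int = 1) -> Optional[str]:
    """Return "treated", "control", or None (excluded) for one team across years t and t+1.

    definition=1: a treated team ends year t+1 with zero stars.
    definition=2: a treated team may end year t+1 with zero or one star
                  (the injured star replaced by exactly one other star).
    """
    if definition not in (1, 2):
        raise ValueError("definition must be 1 or 2")
    if stars_t != 1:
        return None  # both groups require exactly one star in year t
    if lost_star_to_injury:
        allowed = (0,) if definition == 1 else (0, 1)
        return "treated" if stars_t1 in allowed else None
    return "control" if stars_t1 == 1 else None


if __name__ == "__main__":
    print(assign_group(1, 0, True))                # treated
    print(assign_group(1, 1, True))                # None under definition 1
    print(assign_group(1, 1, True, definition=2))  # treated
    print(assign_group(1, 1, False))               # control
```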
Gaining Versus Losing a Star Player
Since the difference-in-differences estimator is identified in a completely different way from the OLS estimates using fixed effects, it might be useful to compare the estimates of a star player’s MRP under each method. However, the two methods produce MRP estimates with slightly different interpretations. The difference-in-differences method estimates the team revenue lost when a team loses a star player due to injury, while the OLS method using fixed effects or first differences assumes that gaining and losing a star player have symmetric effects on revenues. Therefore, if we want to compare the estimates from these two methods, it will be useful to know whether there is any difference in the impact on team revenues between gaining and losing a star player under the OLS framework. To answer this, consider the following model:
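$$\Delta y_{i,t} = \eta_1 \Delta Star_{i,t} \times \mathbf{1}\{\Delta \geq 0\} + \eta_2 |\Delta Star_{i,t}| \times \mathbf{1}\{\Delta < 0\} + \Delta X_{i,t}\gamma + \theta_i + \Delta\delta_t + \Delta\delta_c + \Delta\varepsilon_{i,t} \qquad (2.6)$$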
which is identical to the first-difference model of Equation (2.3), except that the change in the number of star players ($\Delta Star_{i,t}$) is decomposed into its non-negative and negative domains with an indicator function.93
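Constructing the two star-player regressors in Equation (2.6) is mechanical. The sketch below is one reading of footnote 93: the domain is chosen by the change in the discrete star count (passed as `sign_delta`), while the value comes from the measure entering the regression. The function and argument names are hypothetical.

```python
from typing import Optional, Tuple


def split_change(delta: float, sign_delta: Optional[float] = None) -> Tuple[float, float]:
    """Return (gain_term, loss_term) = (delta * 1{D >= 0}, |delta| * 1{D < 0}).

    D is `sign_delta` when given (e.g. the change in a discrete star count),
    otherwise `delta` itself.
    """
    d = delta if sign_delta is None else sign_delta
    if d >= 0:
        return delta, 0.0
    return 0.0, abs(delta)


if __name__ == "__main__":
    print(split_change(2))    # (2, 0.0): gained two stars
    print(split_change(-1))   # (0.0, 1): lost one star
```

Because the loss term is an absolute value, $\eta_2$ measures the revenue change per star lost, so symmetric gains and losses correspond to $\eta_2 = -\eta_1$.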
The estimation results of Equation (2.6) for football are reported in Table 32, along with F-statistics for the null hypothesis that the MRPs of gaining and losing a star player are the same in absolute value. For all star player measures aside from Heisman finalists, the MRPs of gaining and losing a star player are of similar magnitude (in absolute value), and the F-statistics imply that they are not statistically different from each other. The asymmetry in the MRP estimates for Heisman finalists is interesting, particularly because the MRP of losing a Heisman finalist is not negative (though it is statistically insignificant). This makes sense if gaining a Heisman finalist is extremely valuable for the team and if these players tend to have an effect on revenues after they leave the school. This idea is supported by the fact that lagged Heisman finalists have large, highly statistically significant MRP estimates in the previous first-difference and fixed effects regressions. Table 33 reports the estimation results of Equation (2.6) for basketball. For all eight star player measures, the MRP estimates for gaining and losing a star are not statistically different from each other. Since these results indicate that the impact on revenues of gaining and losing a star football or basketball player is symmetric, we can reasonably compare the difference-in-differences MRP estimates in the next section with our previous OLS estimates of star player MRP.94
93 Note that the indicator function splits the number of star players into its non-negative and negative domains using the discrete measure of star player rather than the continuous measure. Also, the absolute value is taken over the negative domain to make the coefficient estimates in Tables 32 and 33 easier to interpret.
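Because the loss regressor enters in absolute value, equal magnitudes with the expected opposite signs correspond to the linear restriction $\eta_1 + \eta_2 = 0$. A minimal sketch of the corresponding Wald test follows; it assumes the tested restriction takes this linear form and that the coefficient variances and covariance (for example, cluster-robust ones) are available. The p-value uses the large-sample $\chi^2(1)$ distribution rather than an exact F reference distribution, and the function and argument names are hypothetical.

```python
import math
from typing import Tuple


def symmetry_test(eta1: float, eta2: float, var1: float, var2: float,
                  cov12: float) -> Tuple[float, float]:
    """Wald test of H0: eta1 + eta2 = 0 (gain and loss effects equal in size, opposite in sign).

    Returns (F, p): the single-restriction Wald/F statistic and its
    large-sample chi-square(1) p-value.
    """
    diff = eta1 + eta2
    var = var1 + var2 + 2.0 * cov12  # Var(eta1_hat + eta2_hat)
    if var <= 0:
        raise ValueError("variance of eta1 + eta2 must be positive")
    f_stat = diff * diff / var
    p_value = math.erfc(math.sqrt(f_stat / 2.0))  # P(chi2_1 > f_stat) = P(|N(0,1)| > sqrt(f_stat))
    return f_stat, p_value


if __name__ == "__main__":
    # Hypothetical estimates: eta1 = 2.0, eta2 = -1.0 with an assumed covariance matrix.
    print(symmetry_test(2.0, -1.0, 0.5, 0.3, 0.1))  # F = 1.0 (up to rounding), p ≈ 0.317
```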
Difference In Difference Results
The difference-in-differences estimates for losing a star football player due to injury from Equation (2.5) are reported in Table 34.95 Standard errors are clustered by cohort-athletic conference because teams more often play teams in their own conference, and if one team has a star player injured, it will likely enhance the prospects of its competitors, which could lead to them winning more often and generating higher revenues. Panel A of the table reports the results under the definition of treatment that restricts treated teams to having zero star players in year t+1, while Panel B contains the results from the more permissive definition of treatment, which allows zero or one star players in year t+1.
94 The exception is Heisman finalists; however, this measure is not included in the difference-in-differences analysis, since there were no treated teams in the sample under this definition of star football player.
95 The difference-in-differences regressions were run without additional control variables, which allows me to use a slightly larger sample from 2000-2012. In theory, if the treatment is truly random, adding controls would only increase the precision of the estimates and should not change the point estimates. In unreported results, adding controls does tend to change the estimates, though not drastically in most cases. Although the treatment is plausibly random, the likely reason for this is that there are so few observations that adding several controls makes it more difficult for the regressions to estimate the coefficients of interest precisely. Therefore, the limited statistical power in these regressions makes it difficult to distinguish between point estimates changing due to concerns over the random treatment assumption and changes due to the demands that additional controls place on the estimates. Versions of Tables 34 and 35 with controls included are available upon request.
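For intuition about clustering by cohort-athletic conference, the sketch below computes a cluster-robust (CR0, no small-sample correction) standard error for the slope in a bivariate regression: the squared within-cluster sums of score contributions replace the usual squared residuals. It is a simplification that ignores the cohort fixed effects in Equation (2.5); the cluster labels, such as `(cohort, conference)` tuples, the function name, and the example data are hypothetical.

```python
import math
from collections import defaultdict
from typing import Hashable, Sequence, Tuple


def cluster_robust_slope(y: Sequence[float], x: Sequence[float],
                         clusters: Sequence[Hashable]) -> Tuple[float, float]:
    """OLS slope of y on a constant and x, with a CR0 cluster-robust standard error."""
    n = len(y)
    xb, yb = sum(x) / n, sum(y) / n
    xd = [xi - xb for xi in x]                      # demeaned regressor (FWL with the constant)
    sxx = sum(v * v for v in xd)
    beta = sum(a * (yi - yb) for a, yi in zip(xd, y)) / sxx
    alpha = yb - beta * xb
    resid = [yi - alpha - beta * xi for xi, yi in zip(x, y)]
    scores = defaultdict(float)                     # sum of x_tilde * e within each cluster
    for g, a, e in zip(clusters, xd, resid):
        scores[g] += a * e
    var = sum(s * s for s in scores.values()) / sxx ** 2
    return beta, math.sqrt(var)


if __name__ == "__main__":
    y = [1.0, 3.0, 2.0, 5.0, 4.0]
    x = [0.0, 1.0, 2.0, 3.0, 4.0]
    clusters = [("2003", "SEC"), ("2003", "Big Ten"), ("2003", "SEC"),
                ("2003", "Big Ten"), ("2003", "SEC")]
    print(cluster_robust_slope(y, x, clusters))  # roughly (0.8, 0.0566)
```

With every observation in its own cluster this reduces to the heteroskedasticity-robust (HC0) standard error; with very few clusters, as in the stacked regressions here, CR0 standard errors are known to be unreliable.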
The first thing to notice is that these regressions contain very few observations, leading to