When heteroscedasticity is detected, two approaches can be taken to make the disturbance variances $\sigma_i^2$ approximately equal. The first is the weighted least squares approach. The second is the variance stabilizing transformations approach, which transforms the dependent variable in a way that removes heteroscedasticity. The second approach is applicable when the variance of $Y_i$ is a function of its mean (see Appendix D for more information on variable transformations).
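The weighted least squares (WLS) approach to addressing heteroscedasticity in a regression model is a common procedure. Suppose, for example, that $\mathrm{VAR}(\varepsilon_i) = \sigma_i^2 = c_i\sigma^2$, where the $c_i$ are known constants. Such heteroscedasticity models, in which the variance of the disturbances is assumed to be proportional to one of the regressors raised to a power, are extensively reviewed by Geary (1966), Goldfeld and Quandt (1972), Kmenta (1971), Lancaster (1968), Park (1966), and Harvey (1976). Constancy of variance is achieved by dividing both sides of the regression model

$$Y_i = \beta_0 + \beta_1 X_{i1} + \cdots + \beta_k X_{ik} + \varepsilon_i, \quad i = 1, \ldots, n \tag{4.11}$$

by $\sqrt{c_i}$, yielding

$$\frac{Y_i}{\sqrt{c_i}} = \frac{\beta_0}{\sqrt{c_i}} + \beta_1\frac{X_{i1}}{\sqrt{c_i}} + \cdots + \beta_k\frac{X_{ik}}{\sqrt{c_i}} + \frac{\varepsilon_i}{\sqrt{c_i}}, \quad i = 1, \ldots, n, \tag{4.12}$$

whose disturbance $\varepsilon_i/\sqrt{c_i}$ has the constant variance $\sigma^2$. Each $\omega_i = 1/c_i$ is called a weight. Applying OLS to Equation 4.12 is equivalent to minimizing the weighted sum of squares

$$\sum_{i=1}^{n} \omega_i \left(Y_i - \beta_0 - \beta_1 X_{i1} - \cdots - \beta_k X_{ik}\right)^2. \tag{4.13}$$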
When the $\sigma_i^2$, or a proportional quantity such as the $c_i$, are known, the weights are not difficult to compute. Estimates for the $\beta$ parameters of Equation 4.12 are obtained by minimizing Equation 4.13. They are called the WLS estimates because each $Y$ and $X$ observation is weighted (divided) by its own (heteroscedastic) standard deviation, $\sigma_i$, or by a quantity proportional to it, $\sqrt{c_i}$.
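The equivalence between dividing the model by $\sqrt{c_i}$ and minimizing the weighted sum of squares can be checked numerically. The sketch below is an illustration only. It assumes NumPy, uses simulated data with the hypothetical choice $c_i = X_i^2$, and uses the made-up function names `wls_by_transformation` and `wls_weighted_sse`. Both routes should return the same $\beta$ estimates.

```python
import numpy as np


def wls_by_transformation(X, y, c):
    """OLS on the model divided through by sqrt(c_i), as in Equation 4.12."""
    s = np.sqrt(c)
    # Every column of X is divided, including the column of ones for beta_0.
    beta, *_ = np.linalg.lstsq(X / s[:, None], y / s, rcond=None)
    return beta


def wls_weighted_sse(X, y, c):
    """Minimizer of sum_i w_i (y_i - x_i'b)^2 with w_i = 1/c_i (Equation 4.13)."""
    w = 1.0 / c
    XtW = X.T * w  # same as X.T @ diag(w), without building an n x n matrix
    return np.linalg.solve(XtW @ X, XtW @ y)


# Hypothetical simulated data: VAR(eps_i) = c_i * sigma^2 with c_i = x_i**2, sigma = 1.
rng = np.random.default_rng(0)
n = 200
x = rng.uniform(1.0, 10.0, n)
X = np.column_stack([np.ones(n), x])
c = x**2
y = X @ np.array([2.0, 0.5]) + rng.normal(0.0, 1.0, n) * np.sqrt(c)

b_transformed = wls_by_transformation(X, y, c)
b_weighted = wls_weighted_sse(X, y, c)
print(b_transformed, b_weighted)  # the two estimates agree up to rounding error
```

The normal-equation form $(\mathbf{X}'\mathbf{W}\mathbf{X})^{-1}\mathbf{X}'\mathbf{W}\mathbf{Y}$, with $\mathbf{W} = \mathrm{diag}(\omega_1, \ldots, \omega_n)$, and the transformed-data form are algebraically identical. The transformed-data form is convenient when only an OLS routine is available.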
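The variance stabilizing transformations approach has the following basis. For any function $f(Y)$ of $Y$ with a continuous first derivative $f'(Y)$ and a finite second derivative $f''(Y)$, and with $\mu_i = E(Y_i)$, it follows that

$$f(Y_i) = f(\mu_i) + (Y_i - \mu_i)\,f'(\mu_i) + \tfrac{1}{2}(Y_i - \mu_i)^2 f''(\xi_i), \tag{4.14}$$

where $\xi_i$ lies between $Y_i$ and $\mu_i$. To obtain the next result, move $f(\mu_i)$ to the left-hand side, square both sides of Equation 4.14, and take expectations. Neglecting the second-order term when $(Y_i - \mu_i)$ is small then yields

$$\mathrm{VAR}[f(Y_i)] \approx [f'(\mu_i)]^2\,\sigma^2(\mu_i), \tag{4.15}$$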
where $\sigma^2(\mu_i)$ is the variance of $Y_i$ with mean $\mu_i$. To find a suitable transformation $f$ of $Y_i$ that would make $\mathrm{VAR}[f(Y_i)]$ approximately constant, the following equation needs to be solved:

$$f'(\mu_i) = \frac{c}{\sigma(\mu_i)}, \tag{4.16}$$
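where $c$ is a constant. This transformation $f$ is called a variance stabilizing transformation. For example, if $\sigma(\mu_i) = \mu_i$, then Equation 4.16 yields $f(\mu_i) = \mathrm{LN}(\mu_i)$ (taking $c = 1$). Frequently used transformations include two cases. The square root transformation (dividing by $\sqrt{X_i}$) is used when the error variance is proportional to an independent variable $X_i$. The reciprocal transformation ($1/X_i$) is used when the error variance is proportional to $X_i^2$.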
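Equation 4.15 and the logarithmic example can be illustrated by simulation. The sketch below assumes NumPy and a hypothetical gamma-distributed $Y_i$ whose standard deviation is proportional to its mean, with $\sigma(\mu_i) = \mu_i/10$. The raw variance grows with $\mu_i^2$. The variance of $\mathrm{LN}(Y_i)$ should stay close to the constant that Equation 4.15 predicts, $(1/\mu_i)^2(\mu_i/10)^2 = 0.01$.

```python
import numpy as np

# Hypothetical setup: Y ~ Gamma(shape=k, scale=mu/k), so E(Y) = mu and
# sigma(mu) = mu / sqrt(k); the standard deviation is proportional to the mean.
k = 100.0
rng = np.random.default_rng(1)

rows = []
for mu in (5.0, 50.0, 500.0):
    y = rng.gamma(shape=k, scale=mu / k, size=200_000)
    sigma_mu = mu / np.sqrt(k)
    # Equation 4.15 with f = LN, so f'(mu) = 1/mu.
    approx = (1.0 / mu) ** 2 * sigma_mu**2
    rows.append((mu, y.var(), np.log(y).var(), approx))

for mu, var_y, var_log_y, approx in rows:
    print(f"mu={mu:6.1f}  VAR(Y)={var_y:10.2f}  "
          f"VAR(LN Y)={var_log_y:.5f}  Eq. 4.15: {approx:.5f}")
```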
Example 4.1
A study of the effects of operating subsidies on transit system performance was undertaken for transit systems in the State of Indiana (see Karlaftis and McCarthy, 1998, for additional details and a review of the empirical literature). Data were collected for 17 transit systems over a 2-year period (1993 and 1994); see Table 4.1 for the available variables. These are panel data, since they contain time-series data for a cross section of transit systems. However, they are of very “short” duration (only 2 years of data exist), so they will be analyzed as if they were cross sectional. A brief overview of the data set suggests possible heteroscedasticity, because systems of vastly different sizes and operating characteristics are pooled together. Performance is measured with three indicators: operating expenses per revenue vehicle mile, passengers per revenue vehicle mile, and operating expenses per passenger. To determine the significant factors that affect performance, a linear regression model was estimated for each indicator. The estimation results are shown in Tables 4.2 through 4.4.
For each of these indicators, both plots of the disturbances and the Goldfeld–Quandt test (at the 5% significance level) indicate heteroscedasticity. As a result, a series of WLS estimations was undertaken, and the results appear in the same tables.
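The Goldfeld–Quandt test used in this example can be sketched as follows, assuming NumPy and SciPy. The test proceeds in four steps:

1. Sort the observations by the variable suspected of driving the heteroscedasticity.
2. Drop $d$ central observations.
3. Fit separate OLS regressions to the low-variance and high-variance groups.
4. Compare $F = \mathrm{SSE}_{\text{high}}/\mathrm{SSE}_{\text{low}}$ with an $F$ distribution that has $(n - d - 2k)/2$ degrees of freedom in both the numerator and the denominator.

Here $k$ counts all estimated parameters, including the constant. The function `goldfeld_quandt`, the choice $d = 40$, and the simulated data are hypothetical.

```python
import numpy as np
from scipy import stats


def goldfeld_quandt(X, y, sort_by, d):
    """Goldfeld-Quandt test for disturbance variance increasing with `sort_by`.

    X includes the column of ones, so k = X.shape[1] parameters are estimated
    in each subsample; d central observations are omitted.
    """
    order = np.argsort(sort_by)
    X, y = X[order], y[order]
    n, k = X.shape
    m = (n - d) // 2  # observations in each of the low- and high-variance groups

    def sse(Xs, ys):
        beta, *_ = np.linalg.lstsq(Xs, ys, rcond=None)
        resid = ys - Xs @ beta
        return float(resid @ resid)

    df = m - k  # equals (n - d - 2k) / 2 when n - d is even
    # Both groups have the same df, so the df terms cancel in the ratio.
    F = sse(X[-m:], y[-m:]) / sse(X[:m], y[:m])
    p_value = stats.f.sf(F, df, df)  # upper-tail probability
    return F, p_value


# Hypothetical data whose disturbance standard deviation grows with x.
rng = np.random.default_rng(2)
n = 200
x = rng.uniform(1.0, 10.0, n)
X = np.column_stack([np.ones(n), x])
y = 1.0 + 0.5 * x + rng.normal(0.0, 1.0, n) * x

F, p = goldfeld_quandt(X, y, sort_by=x, d=40)
print(f"F = {F:.2f}, p = {p:.4g}")  # a small p rejects homoscedasticity at the 5% level
```

For the transit data, `sort_by` would be a candidate variable (for example, FUEL or VEH) and `X` the regressors of Tables 4.2 through 4.4. That mapping is an assumption, since the sorting variable and the value of $d$ used in the study are not reported here.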
In the absence of a valid hypothesis regarding the variable(s) causing heteroscedasticity, a “wild goose chase” might ensue. This happened in this example, where several candidate weighting variables (FUEL, VEH, EMP, and POP) were tried. The most striking result, which applies to all three estimations and all weights, is that the slope parameters ($\beta$) are (more) significant in the WLS estimation than in the OLS estimation. As previously noted, in the presence of heteroscedasticity the OLS estimators of the standard errors are biased, and the direction of this bias cannot be predicted in advance. In the present case study the bias is upward, implying that OLS overestimates the standard errors. The WLS estimates should therefore be regarded as more trustworthy, because they explicitly account for the heteroscedasticity problem.

TABLE 4.1
Variables Available for the Analysis

| Variable Name | Explanation |
|---|---|
| SYSTEM | Transit system name |
| YEAR | Year when data were collected |
| OE_RVM | Operating expenses (U.S. dollars) per revenue vehicle mile |
| PAX_RVM | Passengers per revenue vehicle mile |
| OE_PAX | Operating expenses per passenger in U.S. dollars |
| POP | Transit system catchment area population |
| VEH | Number of vehicles available |
| EMP | Total number of employees |
| FUEL | Total annual gallons of fuel |
| C_LOC | Total subsidies from local sources in U.S. dollars |
| C_STAT | Total subsidies from state sources in U.S. dollars |
| C_FED | Total subsidies from the federal government in U.S. dollars |

TABLE 4.2
Regression of Operating Expenses per Revenue Vehicle Mile on Selected Independent Variables (t statistics in parentheses)

| Explanatory Variable | OLS | WLS1 | WLS2 | WLS3 |
|---|---|---|---|---|
| Constant | 1.572 (8.554) | 1.579 (11.597) | 1.578 (9.728) | 1.718 (11.221) |
| POP | –2.168E-06 (0.535) | 8.476E-07 (0.562) | –9.407E-07 (–0.582) | –4.698E-07 (–0.319) |
| VEH | –8.762E-04 (0.025) | 0.031 (2.242) | 0.012 (1.191) | 0.003 (0.466) |
| EMP | 3.103E-03 (0.198) | –0.011 (1.814) | 0.003 (0.603) | 0.001 (0.308) |
| FUEL | –1.474E-05 (–3.570) | –1.443E-05 (–9.173) | –1.334E-05 (–7.741) | –1.367E-05 (–8.501) |
| C_LOC | 1.446E-06 (1.075) | 1.846E-06 (4.818) | 1.350E-06 (3.965) | 1.320E-06 (5.052) |
| C_STAT | 3.897E-06 (1.305) | 1.352E-06 (1.390) | 2.647E-06 (3.050) | 2.961E-06 (4.183) |
| C_FED | 2.998E-06 (1.990) | 3.794E-06 (6.777) | 3.221E-06 (5.799) | 2.938E-06 (6.958) |
| $R^2$ | 0.555 | 0.868 | 0.917 | 0.942 |

Note: WLS1 = weighted least squares (weight: $\mathrm{FUEL}^{1.1}$); WLS2 = weighted least squares (weight: $\mathrm{VEH}^{1.3}$); WLS3 = weighted least squares (weight: $\mathrm{EMP}^{1.6}$).

TABLE 4.3
Regression of Passengers per Revenue Vehicle Mile on Selected Independent Variables (t statistics in parentheses)

| Explanatory Variable | OLS | WLS1 | WLS2 |
|---|---|---|---|
| Constant | 0.736 (7.888) | 0.771 (9.901) | 0.756 (11.689) |
| POP | –3.857E-06 (–1.877) | –1.923E-06 (–2.583) | –1.931E-06 (–3.729) |
| VEH | 5.318E-03 (0.299) | 0.023 (3.336) | 0.025 (13.193) |
| EMP | –1.578E-03 (–0.198) | –0.009 (–3.162) | –0.010 (–13.026) |
| FUEL | –2.698E-07 (–0.129) | –7.069E-07 (–1.120) | –4.368E-07 (–1.041) |
| C_LOC | –3.604E-07 (–0.528) | 4.982E-08 (0.341) | 5.753E-08 (0.807) |
| C_STAT | 1.164E-06 (0.768) | –2.740E-07 (–0.620) | –4.368E-07 (–2.089) |
| C_FED | 1.276E-06 (1.669) | 1.691E-06 (5.906) | 1.726E-06 (11.941) |
| $R^2$ | 0.558 | 0.859 | 0.982 |

Note: WLS1 = weighted least squares (weight: $\mathrm{POP}^{2.1}$); WLS2 = weighted least squares (weight: $\mathrm{VEH}^{2.1}$).

TABLE 4.4
Regression of Operating Expenses per Passenger on Selected Independent Variables (t statistics in parentheses)

| Explanatory Variable | OLS | WLS1 |
|---|---|---|
| Constant | 2.362 (12.899) | 1.965 (17.144) |
| POP | 5.305E-06 (1.315) | 3.532E-06 (3.123) |
| VEH | –1.480E-02 (–0.425) | –0.007 (–0.677) |
| EMP | 7.339E-03 (0.470) | 0.005 (1.018) |
| FUEL | –1.015E-05 (–2.467) | –1.028E-05 (–10.620) |
| C_LOC | 1.750E-06 (1.307) | 1.273E-06 (5.700) |
| C_STAT | 5.460E-07 (0.183) | 1.335E-06 (1.982) |
| C_FED | 4.323E-08 (0.029) | 7.767E-07 (1.794) |
| $R^2$ | 0.460 | 0.894 |
From a policy perspective, the OLS results suggest that operating subsidies have no direct effect on performance. The WLS results, by contrast, suggest that subsidies generally degrade performance, with the exception of state-level subsidies on passengers per revenue vehicle mile. Finally, the WLS estimation does not completely alleviate heteroscedasticity, so it may be appropriate to use variance stabilizing transformations to account fully for this problem.