
The least squares estimators are less useful for data sets with severe heteroscedasticity. One strategy is to use a variation of least squares estimation that weights the observations. The idea is that, when minimizing the sum of squared errors using heteroscedastic data, the expected variability of some observations is smaller than that of others. Intuitively, it seems reasonable that the smaller the variability of the response, the more reliable that response and the greater the weight that it should receive in the minimization procedure. Weighted least squares is a technique that accounts for this “varying variability.”

Specifically, we use Section 3.2.3 assumptions E1, E2 and E4, with E3 replaced by $\mathrm{E}\,\varepsilon_i = 0$ and $\mathrm{Var}\,\varepsilon_i = \sigma^2 / w_i$, so that the variability is inversely proportional to a known weight $w_i$. For example, if unit of analysis $i$ represents a geographical entity such as a state, you might use the number of people in the state as a weight. Or if $i$ represents a firm, you might use firm assets for the weighting variable. Larger values of $w_i$ indicate a more precise response variable through the smaller variability. In actuarial applications, weights are used to account for an exposure such as the amount of insurance premium, number of employees, size of the payroll, number of insured vehicles and so forth (further discussion is in Chapter 18).

This model can be readily converted to the “ordinary” least squares problem by multiplying all regression variables by $\sqrt{w_i}$. That is, if we define $y_i^* = y_i \times \sqrt{w_i}$ and $x_{ij}^* = x_{ij} \times \sqrt{w_i}$, then from assumption E1 we have

$$y_i^* = y_i \times \sqrt{w_i} = (\beta_0 x_{i0} + \beta_1 x_{i1} + \cdots + \beta_k x_{ik} + \varepsilon_i)\sqrt{w_i}$$

$$= \beta_0 x_{i0}^* + \beta_1 x_{i1}^* + \cdots + \beta_k x_{ik}^* + \varepsilon_i^*,$$

where $\varepsilon_i^* = \varepsilon_i \times \sqrt{w_i}$ has homoscedastic variance $\sigma^2$. Thus, with the rescaled variables, all inference can proceed as earlier.
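The rescaling argument can be checked numerically. The following is a minimal sketch in Python with NumPy, not part of the original text; the weights, the value of $\sigma$, the regression coefficients and the random seed are all hypothetical choices made for illustration. It simulates errors with variance $\sigma^2 / w_i$, rescales every variable by $\sqrt{w_i}$ (including the intercept column $x_{i0} = 1$), and compares the error variance at each weight level before and after rescaling.

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed (arbitrary) for reproducibility
n = 20_000
sigma = 2.0                      # hypothetical value of sigma

# Hypothetical known weights: three "exposure" levels.
w = rng.choice([1.0, 4.0, 25.0], size=n)
x = rng.uniform(0.0, 10.0, size=n)

# Errors with E(eps_i) = 0 and Var(eps_i) = sigma^2 / w_i.
eps = rng.normal(0.0, sigma / np.sqrt(w))
y = 1.0 + 0.5 * x + eps          # hypothetical beta0 = 1, beta1 = 0.5

# Rescale every regression variable, including the intercept column x_i0 = 1.
root_w = np.sqrt(w)
X = np.column_stack([np.ones(n), x])
X_star = X * root_w[:, None]
y_star = y * root_w

# Ordinary least squares on the rescaled variables.
b_star, *_ = np.linalg.lstsq(X_star, y_star, rcond=None)

# Compare the error variance at each weight level before and after rescaling.
eps_star = eps * root_w
levels = (1.0, 4.0, 25.0)
var_raw = {wt: eps[w == wt].var() for wt in levels}
var_star = {wt: eps_star[w == wt].var() for wt in levels}
print("raw error variance by weight:     ", var_raw)
print("rescaled error variance by weight:", var_star)
print("estimates from rescaled OLS:      ", b_star)
```

Under these assumptions the raw error variances should differ markedly across weight levels, whereas the rescaled error variances should all be close to $\sigma^2$.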

This work has been automated in statistical packages where the user merely specifies the weights $w_i$ and the package does the rest. In terms of matrix algebra, this procedure can be accomplished by defining an $n \times n$ weight matrix $\mathbf{W} = \mathrm{diag}(w_i)$ so that the $i$th diagonal element of $\mathbf{W}$ is $w_i$. Extending equation (3.14), for example, the weighted least squares estimates can be expressed as

$$\mathbf{b}_{WLS} = (\mathbf{X}'\mathbf{W}\mathbf{X})^{-1}\mathbf{X}'\mathbf{W}\mathbf{y}. \qquad (5.13)$$

Additional discussions of weighted least squares estimation will be presented in Section 15.1.1.
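As an illustration of the matrix expression for the weighted least squares estimates, here is a short Python sketch using NumPy. The function name `wls_estimates` and the small data set are hypothetical, introduced only for this example. The sketch avoids forming the $n \times n$ matrix $\mathbf{W}$ explicitly and solves the weighted normal equations instead of inverting $\mathbf{X}'\mathbf{W}\mathbf{X}$; this is one reasonable way to compute the estimates, not necessarily what any particular statistical package does.

```python
import numpy as np

def wls_estimates(X, y, w):
    """Return (X'WX)^{-1} X'Wy for W = diag(w), without forming the n x n matrix W."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    XtW = X.T * w                      # same result as X.T @ np.diag(w)
    # Solve the weighted normal equations rather than inverting X'WX explicitly.
    return np.linalg.solve(XtW @ X, XtW @ y)

# Small hypothetical data set: intercept plus one explanatory variable.
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0], [1.0, 5.0]])
y = np.array([1.1, 1.9, 3.2, 3.8, 5.3])
w = np.array([10.0, 5.0, 1.0, 5.0, 10.0])

b_wls = wls_estimates(X, y, w)

# Cross-check against ordinary least squares on variables rescaled by sqrt(w_i).
root_w = np.sqrt(w)
b_rescaled, *_ = np.linalg.lstsq(X * root_w[:, None], y * root_w, rcond=None)
print(b_wls, b_rescaled)
```

The two sets of estimates should agree up to floating-point error, and setting all weights to one should reproduce the ordinary least squares estimates.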

5.7.4 Transformations

Another approach that handles severe heteroscedasticity, introduced in Section 1.3, is to transform the dependent variable, typically with a logarithmic transformation of the form $y^* = \ln y$. As we saw in Section 1.3, transformations can serve to “shrink” spread-out data and symmetrize a distribution. Through a change of scale, a transformation also changes the variability, potentially altering a heteroscedastic dataset into a homoscedastic one. This is both a strength and a limitation of the transformation approach: a transformation of the dependent variable simultaneously affects both the skewness of the distribution and the heteroscedasticity. Power transformations, such as the logarithmic transform, are most useful when the variability of the data grows with the mean. In this case, the transform will serve to “shrink” the data to a scale that appears to be homoscedastic. Conversely, because transformations are monotonic functions, they will not help with patterns of variability that are nonmonotonic. Further, if your data are reasonably symmetric but heteroscedastic, a transformation will not be useful because any choice that mitigates the heteroscedasticity will skew the distribution.
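A small simulation can illustrate the case where variability grows with the mean. The sketch below, in Python with NumPy, is not from the original text; it assumes multiplicative errors, and the group means, noise level and seed are hypothetical. It compares the standard deviation of $y$ and of $\ln y$ across three groups.

```python
import numpy as np

rng = np.random.default_rng(1)                  # arbitrary seed
group_means = np.array([10.0, 100.0, 1000.0])   # hypothetical mean levels
n_per_group = 5_000

sd_raw, sd_log = [], []
for m in group_means:
    # Multiplicative errors: y = m * exp(noise), so the spread of y scales with m.
    y = m * np.exp(rng.normal(0.0, 0.3, size=n_per_group))
    sd_raw.append(y.std())
    sd_log.append(np.log(y).std())

print("sd of y by group:   ", np.round(sd_raw, 2))
print("sd of ln y by group:", np.round(sd_log, 3))
```

On the original scale the standard deviation should increase roughly in proportion to the group mean, whereas on the logarithmic scale it should be about the same in every group.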

When data are nonpositive, it is common to add a constant to each observation so that all observations are positive prior to transformation. For example, the transform $\ln(1 + y)$ accommodates the presence of zeros. One can also multiply by a constant so that the approximate original units are retained. For example, the transform $100 \ln(1 + y/100)$ may be applied to percentage data, where negative percentages sometimes appear.
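The two transforms just mentioned can be written as one function. The sketch below is a Python illustration with NumPy; the name `shifted_log`, its `scale` parameter and the sample values are hypothetical. Note that $\ln(1 + y/c)$ is defined only for $y > -c$, so the function rejects values at or below that bound.

```python
import numpy as np

def shifted_log(y, scale=1.0):
    """Compute scale * ln(1 + y / scale); requires y > -scale."""
    y = np.asarray(y, dtype=float)
    if np.any(y <= -scale):
        raise ValueError("each observation must exceed -scale")
    return scale * np.log1p(y / scale)

counts = np.array([0.0, 1.0, 10.0, 100.0])
print(shifted_log(counts))                 # ln(1 + y); zero maps to zero

pct = np.array([-20.0, -1.0, 0.0, 1.0, 5.0, 50.0])   # hypothetical percentage changes
print(shifted_log(pct, scale=100.0))       # 100 ln(1 + y/100)
```

For percentages near zero the transformed values stay close to the original ones, which is the sense in which the approximate original units are retained; larger values are pulled in more strongly.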

Our discussions of transformations have focussed on transforming dependent variables. As noted in Section 3.5, transformations of explanatory variables are also possible. This is because the regression assumptions condition on explanatory variables (Section 3.2.3). Some analysts prefer to transform variables to approximate normality, thinking of multivariate normal distributions as a foundation for regression analysis. Others are reluctant to transform explanatory variables because of the difficulties in interpreting resulting models. The approach taken here is to use transforms that are readily interpretable, such as those introduced in Section 3.5. Other transforms are certainly candidates to include in a selected model but they should provide substantial dividends in terms of fit or predictive power if they are difficult to communicate.

5.8 Further Reading and References

Long and Ervin (2000) gather compelling evidence for the use of alternative heteroscedasticity-consistent estimators of standard errors that have better finite sample performance than the classic versions. The large sample properties of empirical estimators have been established by Eicker (1967), Huber (1967), and White (1980) in the linear regression case. For the linear regression case, MacKinnon and White (1985) suggest alternatives that provide superior small-sample properties. For small samples, the evidence is based on (1) the biasedness of the estimators, (2) their motivation as jackknife estimators and (3) their performance in simulation studies.
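To make the idea of a heteroscedasticity-consistent standard error concrete, the following Python sketch computes the empirical “sandwich” estimator and one small-sample alternative with NumPy. The labels HC0 and HC3 are the names conventionally given to these two estimators in the literature and are not used in the text above; the function name `robust_standard_errors` and the simulated data are hypothetical. HC3 rescales each squared residual by $1/(1 - h_{ii})^2$, where $h_{ii}$ is the leverage, which connects it to the jackknife.

```python
import numpy as np

def robust_standard_errors(X, y):
    """Return classic, HC0 and HC3 standard errors for an OLS fit of y on X."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y
    e = y - X @ b                                   # residuals
    h = np.einsum("ij,jk,ik->i", X, XtX_inv, X)     # leverages h_ii

    def sandwich(omega):
        # (X'X)^{-1} X' diag(omega) X (X'X)^{-1}
        return XtX_inv @ (X.T * omega) @ X @ XtX_inv

    classic = (e @ e) / (n - p) * XtX_inv
    hc0 = sandwich(e**2)
    hc3 = sandwich(e**2 / (1.0 - h) ** 2)
    return {name: np.sqrt(np.diag(v)) for name, v in
            [("classic", classic), ("HC0", hc0), ("HC3", hc3)]}

rng = np.random.default_rng(2)                  # arbitrary seed
n = 50
x = rng.uniform(1.0, 10.0, size=n)
y = 2.0 + 1.5 * x + rng.normal(0.0, 0.5 * x)    # error sd grows with x
X = np.column_stack([np.ones(n), x])
se = robust_standard_errors(X, y)
for name, values in se.items():
    print(name, values)
```

Because $1/(1 - h_{ii})^2 \ge 1$, the HC3 standard errors are never smaller than the HC0 ones, which partly offsets the tendency of HC0 to understate variability in small samples.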

Other measures of collinearity based on matrix algebra concepts involving eigenvalues, such as condition numbers and condition indices, are used by some analysts. See Belsley, Kuh, and Welsch (1980) for a solid treatment of collinearity and regression diagnostics. Hocking (2003) provides additional background reading on collinearity and principal components. See Carroll and Ruppert (1988) for further discussions of transformations in regression.
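As a rough illustration of eigenvalue-based collinearity measures, the sketch below computes condition indices in Python with NumPy. It assumes one common convention: each column of $\mathbf{X}$ is scaled to unit length, and the condition indices are the ratios of the largest singular value to each singular value (the squared singular values are the eigenvalues of the scaled $\mathbf{X}'\mathbf{X}$). The function name `condition_indices` and the simulated data are hypothetical.

```python
import numpy as np

def condition_indices(X):
    """Condition indices of X after scaling each column to unit length."""
    X = np.asarray(X, dtype=float)
    X_scaled = X / np.linalg.norm(X, axis=0)
    d = np.linalg.svd(X_scaled, compute_uv=False)   # singular values, descending
    return d[0] / d

rng = np.random.default_rng(3)              # arbitrary seed
n = 200
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.01, size=n)    # nearly collinear with x1
x3 = rng.normal(size=n)
X = np.column_stack([np.ones(n), x1, x2, x3])

idx = condition_indices(X)
print("condition indices:", np.round(idx, 1))
print("condition number: ", idx.max())
```

The largest condition index is the condition number; in this constructed example the near-duplicate columns should produce one index that is far larger than the rest.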

Hastie, Tibshirani, and Friedman (2001) give an advanced discussion of model selection issues, focusing on predictive aspects of models in the language of machine learning.

Chapter References

Belsley, David A., Edwin Kuh, and Roy E. Welsch (1980). Regression Diagnostics: Identifying Influential Data and Sources of Collinearity. Wiley, New York.

Bendel, R. B., and A. A. Afifi (1977). Comparison of stopping rules in forward “stepwise” regression. Journal of the American Statistical Association 72, 46–53.

Box, George E. P. (1980). Sampling and Bayes inference in scientific modeling and robustness (with discussion). Journal of the Royal Statistical Society, Ser. A, 143, 383–430.

Breusch, T. S., and A. R. Pagan (1980). The Lagrange multiplier test and its applications to model specification in econometrics. Review of Economic Studies 47, 239–53.

Carroll, Raymond J., and David Ruppert (1988). Transformation and Weighting in Regression. Chapman-Hall, New York.

Eicker, F. (1967). Limit theorems for regressions with unequal and dependent errors. Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, L. M. LeCam and J. Neyman, eds. University of California Press, Berkeley, CA, 1:59–82.

Hadi, A. S. (1988). Diagnosing collinearity-influential observations. Computational Statistics and Data Analysis 7, 143–59.

Hastie, Trevor, Robert Tibshirani, and Jerome Friedman (2001). The Elements of Statistical Learning: Data Mining, Inference and Prediction. Springer-Verlag, New York.

Hocking, Ronald R. (2003). Methods and Applications of Linear Models: Regression and the Analysis of Variance. Wiley, New York.

Huber, P. J. (1967). The behaviour of maximum likelihood estimators under non-standard conditions. Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, L. M. LeCam and J. Neyman, eds. University of California Press, Berkeley, CA, 1:221–33.

Long, J. S., and L. H. Ervin (2000). Using heteroscedasticity consistent standard errors in the linear regression model. American Statistician 54, 217–24.

MacKinnon, J. G., and H. White (1985). Some heteroskedasticity consistent covariance matrix estimators with improved finite sample properties. Journal of Econometrics 29, 53–7.

Mason, R. L., and R. F. Gunst (1985). Outlier-induced collinearities. Technometrics 27, 401–7.

Picard, R. R., and K. N. Berk (1990). Data splitting. American Statistician 44, 140–47.

Rencher, A. C., and F. C. Pun (1980). Inflation of $R^2$ in best subset regression. Technometrics 22, 49–53.

Snee, R. D. (1977). Validation of regression models. Methods and examples. Technometrics 19, 415–28.

5.9 Exercises

5.1. You are doing regression with one explanatory variable and so consider the basic linear regression model $y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$.

a. Show that the $i$th leverage can be simplified to

$$h_{ii} = \frac{1}{n} + \frac{(x_i - \bar{x})^2}{(n - 1) s_x^2}.$$
