Furthermore, the variance of $\hat{y}^*$ from the full model is not less than the variance of $\hat{y}$ from the subset model. In terms of mean square error,
$V(\hat{y}^*) \ge MSE(\hat{y})$
¯r is positive semidefinite.
4.3 Criteria for Evaluating Subset Regression Models
Two key aspects of the variable selection problem are generating the subset models and deciding if one subset is better than another. In this section we discuss criteria for evaluating and comparing subset regression models.
4.3.1 Coefficient of Multiple Determination
A measure of the adequacy of a regression model that has been widely used is the coefficient of multiple determination, $R^2$. Let $R_p^2$ denote the coefficient of multiple determination for a subset regression model with $p$ terms, that is, $p - 1$ regressors and an intercept term $\beta_0$. Computationally,
$$R_p^2 = \frac{SSR(p)}{S_{yy}} = 1 - \frac{SSE(p)}{S_{yy}} \tag{4.11}$$
where $SSR(p)$ and $SSE(p)$ denote the regression sum of squares and the residual sum of squares, respectively, for a $p$-term subset model. There are $\binom{K}{p-1}$ values of $R_p^2$ for each value of $p$, one for each possible subset model of size $p$. Now $R_p^2$ cannot decrease as regressors are added and is a maximum when $p = K + 1$. Therefore the analyst uses this criterion by adding regressors to the model up to the point where an additional variable is not useful, in that it provides only a small increase in $R_p^2$. The general approach is illustrated in Figure 4.1, which represents a hypothetical plot of the maximum value of $R_p^2$ for each subset of size $p$ against $p$. Typically one examines a display such as this and then specifies the number of regressors for the final model as the point at which the "knee" in the curve becomes apparent.
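The enumeration behind a plot like Figure 4.1 can be sketched in a few lines. The following is a minimal illustration, not a production routine: the helper names (`solve`, `sse`, `best_r2_by_size`) and the small data set are invented for this example, and the least-squares fit solves the normal equations directly, which is adequate only for small, well-conditioned problems.

```python
from itertools import combinations


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def sse(columns, y):
    """Residual sum of squares for y regressed on an intercept plus `columns`."""
    n = len(y)
    design = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(design[0])
    xtx = [[sum(row[a] * row[b] for row in design) for b in range(k)] for a in range(k)]
    xty = [sum(row[a] * y[i] for i, row in enumerate(design)) for a in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(bj * xj for bj, xj in zip(beta, row)) for row in design]
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))


def best_r2_by_size(regressors, y):
    """Map p -> (largest R_p^2, regressor indices achieving it, number of subsets)."""
    ybar = sum(y) / len(y)
    syy = sum((v - ybar) ** 2 for v in y)
    K = len(regressors)
    out = {}
    for p in range(1, K + 2):  # p terms = intercept + (p - 1) regressors
        subsets = list(combinations(range(K), p - 1))
        scored = [(1.0 - sse([regressors[j] for j in s], y) / syy, s) for s in subsets]
        r2, chosen = max(scored)
        out[p] = (r2, chosen, len(subsets))
    return out


# Made-up illustrative data (not from the text): K = 3 regressors, n = 8 observations.
x1 = [1, 2, 3, 4, 5, 6, 7, 8]
x2 = [2, 1, 4, 3, 6, 5, 8, 7]
x3 = [5, 3, 8, 1, 9, 2, 7, 4]
y = [3.1, 4.9, 7.2, 8.8, 11.1, 12.7, 15.2, 16.9]

result = best_r2_by_size([x1, x2, x3], y)
for p, (r2, chosen, count) in result.items():
    print(f"p={p}  subsets={count}  best regressors={chosen}  max R2_p={r2:.4f}")
```

Plotting the printed maximum $R_p^2$ values against $p$ gives the kind of display in which one looks for the knee.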
Figure 4.1: Plot of $R_p^2$ against $p$
Since we cannot find an "optimum" value of $R^2$ for a subset regression model, we must look for a "satisfactory" value. Aitkin [1974] has proposed one solution to this problem by providing a test by which all subset regression models that have an $R^2$ not significantly different from the $R^2$ for the full model can be identified. Let
$$R_0^2 = 1 - (1 - R_{K+1}^2)(1 + d_{\alpha,n,K}) \tag{4.12}$$
where
$$d_{\alpha,n,K} = \frac{K F_{\alpha,K,n-K-1}}{n - K - 1}$$
and $R_{K+1}^2$ is the value of $R^2$ for the full model. Aitkin calls any subset of regressor variables producing an $R^2$ greater than $R_0^2$ an $R^2$-adequate ($\alpha$) subset.
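As a small worked illustration of (4.12), the sketch below computes the threshold $R_0^2$ and filters a set of candidate subsets. It assumes the critical value is the upper-$\alpha$ point of the $F$ distribution with $K$ and $n - K - 1$ degrees of freedom and that it is supplied by the user (for example from an $F$ table). The function names and all numbers are hypothetical.

```python
def aitkin_threshold(r2_full, n, K, f_crit):
    """R_0^2 from (4.12); f_crit is an F critical value supplied by the caller."""
    d = K * f_crit / (n - K - 1)
    return 1.0 - (1.0 - r2_full) * (1.0 + d)


def r2_adequate(r2_by_subset, r2_full, n, K, f_crit):
    """Names of the subsets whose R^2 exceeds the threshold R_0^2."""
    r2_0 = aitkin_threshold(r2_full, n, K, f_crit)
    return [name for name, r2 in r2_by_subset.items() if r2 > r2_0]


# Hypothetical values: n = 20 observations, K = 3 candidate regressors,
# full-model R^2 = 0.90, and a tabulated 5% critical value of about 3.24
# for F with (3, 16) degrees of freedom.
r2_0 = aitkin_threshold(r2_full=0.90, n=20, K=3, f_crit=3.24)
subsets = {"x1": 0.80, "x1,x2": 0.86, "x1,x3": 0.84, "x1,x2,x3": 0.90}
adequate = r2_adequate(subsets, r2_full=0.90, n=20, K=3, f_crit=3.24)
print(round(r2_0, 5), adequate)
```

With these numbers the threshold is about 0.839, so the one-regressor subset with $R^2 = 0.80$ is not $R^2$-adequate while the others are.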
Generally it is not straightforward to use $R^2$ as a criterion for choosing the number of regressors to include in the model. However, for a fixed number of terms $p$, $R_p^2$ can be used to compare the $\binom{K}{p-1}$ subset models so generated. Models having large values of $R_p^2$ are preferred.
4.3.2 Adjusted $R^2$
To avoid the difficulties of interpreting $R^2$, some analysts prefer to use the adjusted $R^2$ statistic, defined for a $p$-term equation as
$$\bar{R}_p^2 = 1 - \left(\frac{n-1}{n-p}\right)(1 - R_p^2) \tag{4.13}$$
The $\bar{R}_p^2$ statistic does not necessarily increase as additional regressors are introduced into the model. In fact, Edward [1969], Haitovski [1969], and Seber [1977] showed that if $s$ regressors are added to the model, $\bar{R}_{p+s}^2$ will exceed $\bar{R}_p^2$ if and only if the partial $F$-statistic for testing the significance of the $s$ additional regressors exceeds 1. Therefore one criterion for selecting an optimum subset model is to choose the model with the maximum $\bar{R}_p^2$.
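The link between the partial $F$-statistic and the change in adjusted $R^2$ can be checked numerically. This is a minimal sketch with invented numbers: the function names are hypothetical, and the partial $F$ is taken in its usual form, the drop in $SSE$ per added regressor divided by the residual mean square of the larger model.

```python
def adjusted_r2(sse, syy, n, p):
    """Adjusted R^2 for a p-term model, written in terms of SSE(p) and S_yy."""
    return 1.0 - (n - 1) / (n - p) * (sse / syy)


def partial_f(sse_reduced, sse_augmented, n, p, s):
    """Partial F for adding s regressors to a p-term model."""
    return ((sse_reduced - sse_augmented) / s) / (sse_augmented / (n - p - s))


# Hypothetical setting: n = 20, a p = 3 term model with SSE = 30, S_yy = 100,
# and two possible outcomes after adding s = 2 regressors.
n, p, s, syy = 20, 3, 2, 100.0
sse_p = 30.0
cases = {}
for sse_ps in (25.0, 28.0):
    f = partial_f(sse_p, sse_ps, n, p, s)
    gain = adjusted_r2(sse_ps, syy, n, p + s) - adjusted_r2(sse_p, syy, n, p)
    cases[sse_ps] = (f, gain)
    print(f"SSE(p+s)={sse_ps}: partial F={f:.3f}, change in adjusted R2={gain:+.4f}")
```

In the first case the partial $F$ exceeds 1 and the adjusted $R^2$ rises; in the second it is below 1 and the adjusted $R^2$ falls, as the result quoted above predicts.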
4.3.3 Residual Mean Square
The residual mean square for a $p$-term subset regression model,
$$MSE(p) = \frac{SSE(p)}{n - p} \tag{4.14}$$
can be used as a model evaluation criterion. The general behavior of $MSE(p)$ as $p$ increases is illustrated in Figure 4.2.
Figure 4.2: Plot of $MSE(p)$ against $p$
Because $SSE(p)$ always decreases as $p$ increases, $MSE(p)$ initially decreases, then stabilizes, and eventually may increase. The eventual increase in $MSE(p)$ occurs when the reduction in $SSE(p)$ from adding a regressor to the model is not sufficient to compensate for the loss of one degree of freedom in the denominator of (4.14). That is, adding a regressor to a $p$-term model will cause $MSE(p + 1)$ to be greater than $MSE(p)$ if the decrease in the residual sum of squares is less than $MSE(p)$. Advocates of the $MSE(p)$ criterion will plot $MSE(p)$ against $p$ and base the choice of $p$ on
1. the minimum $MSE(p)$,
2. the value of $p$ such that $MSE(p)$ is approximately equal to $MSE$ for the full model, or
3. a value of $p$ near the point where the smallest $MSE(p)$ turns upward.
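The behavior of $MSE(p)$ and the point at which it turns upward can be traced with a short calculation. The sketch below uses hypothetical $SSE(p)$ values for the best subset of each size; the function names are illustrative only.

```python
def mse_path(sse_by_p, n):
    """MSE(p) = SSE(p) / (n - p) for each model size p, as in (4.14)."""
    return {p: sse / (n - p) for p, sse in sse_by_p.items()}


def mse_rises_after(sse_by_p, n):
    """For each p, report whether moving to p + 1 terms raises the MSE.

    This is true exactly when the drop SSE(p) - SSE(p + 1) is smaller than MSE(p).
    """
    mse = mse_path(sse_by_p, n)
    return {p: (sse_by_p[p] - sse_by_p[p + 1]) < mse[p]
            for p in sse_by_p if p + 1 in sse_by_p}


# Hypothetical SSE(p) for the best subset of each size, with n = 15 observations.
n = 15
sse_by_p = {1: 140.0, 2: 60.0, 3: 36.0, 4: 32.0, 5: 31.5}

mse = mse_path(sse_by_p, n)
rises = mse_rises_after(sse_by_p, n)
p_min = min(mse, key=mse.get)  # criterion 1: the minimum MSE(p)

for p in sorted(mse):
    print(f"p={p}  SSE={sse_by_p[p]:6.1f}  MSE={mse[p]:.3f}")
print("minimum MSE at p =", p_min)
```

Here the fifth term reduces $SSE$ by only 0.5, less than $MSE(4) \approx 2.91$, so $MSE(p)$ turns upward after $p = 4$.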
The subset regression model that minimizes $MSE(p)$ will also maximize $\bar{R}_p^2$. To see this, note that
$$\begin{aligned}
\bar{R}_p^2 &= 1 - \frac{n-1}{n-p}(1 - R_p^2) \\
&= 1 - \frac{n-1}{n-p}\,\frac{SSE(p)}{S_{yy}} \\
&= 1 - \frac{n-1}{S_{yy}}\,\frac{SSE(p)}{n-p} \\
&= 1 - \frac{n-1}{S_{yy}}\,MSE(p)
\end{aligned}$$
Thus the criteria minimum $MSE(p)$ and maximum $\bar{R}_p^2$ are equivalent.
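The equivalence is easy to confirm numerically. The following sketch uses hypothetical values of $SSE(p)$ and $S_{yy}$ (not taken from the text), computes the adjusted $R^2$ once from (4.13) and once from $MSE(p)$, and compares the model sizes selected by the two criteria.

```python
def adjusted_r2(r2_p, n, p):
    """Adjusted R^2 from (4.13)."""
    return 1.0 - (n - 1) / (n - p) * (1.0 - r2_p)


# Hypothetical inputs: n = 15, S_yy = 140, and SSE(p) for the best subset of each size.
n, syy = 15, 140.0
sse_by_p = {1: 140.0, 2: 60.0, 3: 36.0, 4: 32.0, 5: 31.5}

mse = {p: sse / (n - p) for p, sse in sse_by_p.items()}
adj = {p: adjusted_r2(1.0 - sse / syy, n, p) for p, sse in sse_by_p.items()}
adj_via_mse = {p: 1.0 - (n - 1) / syy * mse[p] for p in mse}

best_by_mse = min(mse, key=mse.get)
best_by_adj = max(adj, key=adj.get)
print(best_by_mse, best_by_adj)
```

Because $(n - 1)/S_{yy}$ does not depend on $p$, the two rankings are exact mirror images of each other.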
4.3.4 Mallows' $C_p$ Statistic
Mallows [1964, 1966, 1973] has proposed a criterion that is related to the mean square error of a fitted value, that is,
$$E[\hat{y}_i - E(y_i)]^2 = [E(y_i) - E(\hat{y}_i)]^2 + V(\hat{y}_i) \tag{4.15}$$
where $E(y_i)$ and $E(\hat{y}_i)$ are the expected responses from the true regression model and the $p$-term subset model, respectively. Thus $E(y_i) - E(\hat{y}_i)$ is the bias at the $i$-th data point. Consequently the two terms on the right-hand side of (4.15) are the squared bias and variance components, respectively, of the mean square error. Let the total squared bias for a $p$-term equation be
$$SSB(p) = \sum_{i=1}^{n} [E(y_i) - E(\hat{y}_i)]^2$$
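and define the standardized total mean square error as
$$\begin{aligned}
\Gamma_p &= \frac{1}{\sigma^2}\left\{\sum_{i=1}^{n} [E(y_i) - E(\hat{y}_i)]^2 + \sum_{i=1}^{n} V(\hat{y}_i)\right\} \\
&= \frac{SSB(p)}{\sigma^2} + \frac{1}{\sigma^2}\sum_{i=1}^{n} V(\hat{y}_i)
\end{aligned} \tag{4.16}$$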
It can be shown that
$$\sum_{i=1}^{n} V(\hat{y}_i) = p\sigma^2$$
and that the expected value of the residual sum of squares from a $p$-term equation is
$$E[SSE(p)] = SSB(p) + (n - p)\sigma^2$$
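Both results follow from standard properties of the hat matrix. The sketch below uses notation that is not in the original and is assumed here for illustration: $X_p$ is the $n \times p$ design matrix of the subset model (full column rank), $H_p = X_p(X_p'X_p)^{-1}X_p'$, $\mu = E(y)$, and the errors are uncorrelated with common variance $\sigma^2$.

```latex
\begin{aligned}
\hat{y} &= H_p y, \qquad V(\hat{y}) = \sigma^2 H_p H_p' = \sigma^2 H_p
  && \text{($H_p$ is symmetric and idempotent)} \\
\sum_{i=1}^{n} V(\hat{y}_i) &= \sigma^2 \operatorname{tr}(H_p) = p\sigma^2
  && \text{(since $\operatorname{tr}(H_p) = p$)} \\
SSB(p) &= \lVert \mu - H_p\mu \rVert^2 = \mu'(I - H_p)\mu
  && \text{(since $E(\hat{y}) = H_p\mu$)} \\
E[SSE(p)] &= E\left[y'(I - H_p)y\right]
  = \mu'(I - H_p)\mu + \sigma^2 \operatorname{tr}(I - H_p) \\
&= SSB(p) + (n - p)\sigma^2
\end{aligned}
```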
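Substituting for $\sum_{i=1}^{n} V(\hat{y}_i)$ and $SSB(p)$ in (4.16) gives
$$\begin{aligned}
\Gamma_p &= \frac{1}{\sigma^2}\left\{E[SSE(p)] - (n - p)\sigma^2 + p\sigma^2\right\} \\
&= \frac{E[SSE(p)]}{\sigma^2} - n + 2p
\end{aligned} \tag{4.17}$$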
Suppose that $\hat{\sigma}^2$ is a good estimate of $\sigma^2$. Then replacing $E[SSE(p)]$ by the observed value $SSE(p)$ produces an estimate of $\Gamma_p$, say
$$C_p = \frac{SSE(p)}{\hat{\sigma}^2} - n + 2p \tag{4.18}$$
If the $p$-term model has negligible bias, then $SSB(p) = 0$. Consequently $E[SSE(p)] = (n - p)\sigma^2$, and
$$E[C_p \mid \text{Bias} = 0] = \frac{(n - p)\sigma^2}{\sigma^2} - n + 2p = p$$
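A short calculation makes the zero-bias property concrete. The sketch below evaluates $C_p$ in the form $SSE(p)/\hat{\sigma}^2 - n + 2p$ that follows from (4.17). The function name, the value of $\hat{\sigma}^2$, and the $SSE(p)$ values are all hypothetical.

```python
def mallows_cp(sse_p, sigma2_hat, n, p):
    """C_p = SSE(p) / sigma2_hat - n + 2p for a p-term model."""
    return sse_p / sigma2_hat - n + 2 * p


# Hypothetical inputs: n = 15 observations and an estimate sigma2_hat = 3.0.
n, sigma2_hat = 15, 3.0
sse_by_p = {2: 60.0, 3: 36.0, 4: 32.0}
cp = {p: mallows_cp(sse, sigma2_hat, n, p) for p, sse in sse_by_p.items()}

# If SSE(p) equals its zero-bias expectation (n - p) * sigma^2, then C_p = p.
cp_zero_bias = mallows_cp((n - 3) * sigma2_hat, sigma2_hat, n, 3)

for p in sorted(cp):
    print(f"p={p}  C_p={cp[p]:.3f}")
```

With these numbers the three-term model sits on the line $C_p = p$, while the two-term model, with $C_p = 9$, lies well above it.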
When using the $C_p$ criterion, it is helpful to construct a plot of $C_p$ as a function of $p$ for each regression equation, such as shown in Figure 4.3. Regression equations with little bias will have values of $C_p$ that fall near the line $C_p = p$ (point A in Figure 4.3), while those equations with substantial bias will fall above this line (point B in Figure 4.3).
Figure 4.3: Plot of $C_p$ against $p$
Generally, small values of $C_p$ are desirable. For example, although point C in Figure 4.3 is above the line $C_p = p$, it is below point A and thus represents a model with lower total error. It may be preferable to accept some bias in the equation to reduce the average prediction error.
To calculate $C_p$, we need an unbiased estimate of $\sigma^2$. Generally, we use the residual mean square for the full model. This gives $C_p = p = K + 1$ for the full model. Using $MSE(K + 1)$ from the full model as an estimate of $\sigma^2$ assumes that the full model has negligible bias. If the full model has several regressors that do not contribute significantly to the model (zero regression coefficients), then $MSE(K + 1)$ will often overestimate $\sigma^2$, and consequently the values of $C_p$ will be small. If the $C_p$ statistic is to work properly, a good estimate of $\sigma^2$ must be used.
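The effect of the variance estimate on $C_p$ can be seen in a small sketch. It assumes that the largest model size in the input is the full model, so $\hat{\sigma}^2$ defaults to $MSE(K + 1)$. The function name and all numbers are hypothetical.

```python
def cp_table(sse_by_p, n, sigma2_hat=None):
    """C_p for every model size; sigma2_hat defaults to the full-model MSE."""
    p_full = max(sse_by_p)  # assumed to be K + 1, the number of terms in the full model
    if sigma2_hat is None:
        sigma2_hat = sse_by_p[p_full] / (n - p_full)
    return {p: sse / sigma2_hat - n + 2 * p for p, sse in sse_by_p.items()}


# Hypothetical SSE(p) for the best subset of each size, n = 15, full model has 5 terms.
n = 15
sse_by_p = {1: 140.0, 2: 60.0, 3: 36.0, 4: 32.0, 5: 31.5}

cp_default = cp_table(sse_by_p, n)                   # sigma2_hat = MSE(5) = 3.15
cp_inflated = cp_table(sse_by_p, n, sigma2_hat=4.0)  # a deliberately larger estimate

for p in sorted(sse_by_p):
    print(f"p={p}  C_p={cp_default[p]:7.3f}  C_p with larger sigma2_hat={cp_inflated[p]:7.3f}")
```

The full model returns $C_p = K + 1 = 5$ by construction, and every $C_p$ value shrinks when the larger variance estimate is used, which is why an overestimate of $\sigma^2$ can make biased models look acceptable.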