we could use $W_2 = 0.9 \times \text{fiber}_z + 0.1 \times \text{potassium}_z$. However, the model performance would still be slightly below that of using fiber alone. Other alternatives include performing principal components analysis or simply omitting the variable potassium.
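A weighted composite of this kind can be sketched in a few lines. This is only an illustration: the fiber and potassium values are made up (they are not the cereals data), and the helper `z_scores` is a hypothetical name; the 0.9 and 0.1 weights are the ones given in the text.

```python
from statistics import mean, stdev

def z_scores(values):
    """Standardize a list: subtract the mean, divide by the sample standard deviation."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

# Hypothetical values for illustration only (not taken from the cereals data set).
fiber = [10.0, 2.0, 9.0, 14.0, 1.0, 2.0, 1.5, 5.0]
potassium = [280.0, 135.0, 320.0, 330.0, 70.0, 30.0, 100.0, 125.0]

fiber_z = z_scores(fiber)
potassium_z = z_scores(potassium)

# Weighted composite W2 = 0.9 * fiber_z + 0.1 * potassium_z.
w2 = [0.9 * f + 0.1 * p for f, p in zip(fiber_z, potassium_z)]
print([round(w, 3) for w in w2])
```

Because both inputs are standardized first, the weights express relative emphasis rather than a difference in measurement units.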
Now, depending on the task confronting the analyst, multicollinearity may not in fact present a fatal defect. Weiss [5] notes that multicollinearity “does not adversely affect the ability of the sample regression equation to predict the response variable.” He adds that multicollinearity does not significantly affect point estimates of the target variable, confidence intervals for the mean response value, or prediction intervals for a randomly selected response value. However, the data miner must therefore strictly limit the use of a multicollinear model to estimation and prediction of the target variable. Interpretation of the model would not be appropriate, since the individual coefficients may not make sense in the presence of multicollinearity.
VARIABLE SELECTION METHODS
To assist the data analyst in determining which variables should be included in a multiple regression model, several different variable selection methods have been developed, including (1) forward selection, (2) backward elimination, (3) stepwise selection, and (4) best subsets. These variable selection methods are essentially algorithms to help construct the model with the optimal set of predictors.
Partial F-Test
To discuss variable selection methods, we first need to learn about the partial F-test. Suppose that we already have p variables in the model, $x_1, x_2, \ldots, x_p$, and we are interested in whether or not one extra variable $x^*$ should be included in the model. Recall our earlier discussion of the sequential sums of squares. Here we would calculate the extra (sequential) sum of squares from adding $x^*$ to the model given that $x_1, x_2, \ldots, x_p$ are already in the model. Denote this quantity by $SS_{\text{extra}} = SS(x^* \mid x_1, x_2, \ldots, x_p)$. Now, this extra sum of squares is computed by finding the regression sum of squares for the full model (including $x_1, x_2, \ldots, x_p$ and $x^*$), denoted $SS_{\text{full}} = SS(x_1, x_2, \ldots, x_p, x^*)$, and subtracting the regression sum of squares from the reduced model (including only $x_1, x_2, \ldots, x_p$), denoted $SS_{\text{reduced}} = SS(x_1, x_2, \ldots, x_p)$. In other words,

$$SS_{\text{extra}} = SS_{\text{full}} - SS_{\text{reduced}}$$

that is,

$$SS(x^* \mid x_1, x_2, \ldots, x_p) = SS(x_1, x_2, \ldots, x_p, x^*) - SS(x_1, x_2, \ldots, x_p)$$

The null hypothesis for the partial F-test is as follows:

• $H_0$: No, the $SS_{\text{extra}}$ associated with $x^*$ does not contribute significantly to the regression sum of squares for a model already containing $x_1, x_2, \ldots, x_p$. Therefore, do not include $x^*$ in the model.
JWDD006-03 JWDD006-Larose November 25, 2005 17:26 Char Count= 0
124 CHAPTER 3 MULTIPLE REGRESSION AND MODEL BUILDING
The alternative hypothesis is:
• $H_a$: Yes, the $SS_{\text{extra}}$ associated with $x^*$ does contribute significantly to the regression sum of squares for a model already containing $x_1, x_2, \ldots, x_p$. Therefore, do include $x^*$ in the model.
The test statistic for the partial F-test is

$$F(x^* \mid x_1, x_2, \ldots, x_p) = \frac{SS_{\text{extra}}}{MSE_{\text{full}}}$$

where $MSE_{\text{full}}$ denotes the mean-squared error term from the full model, including $x_1, x_2, \ldots, x_p$ and $x^*$. This is known as the partial F-statistic for $x^*$. When the null hypothesis is true, this test statistic follows an $F_{1,\,n-p-2}$ distribution. We would therefore reject the null hypothesis when $F(x^* \mid x_1, x_2, \ldots, x_p)$ is large or when its associated p-value is small.
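The calculation of the partial F-statistic can be sketched numerically. The following is a minimal illustration, not a definitive implementation: the data are made up, the helper names `solve` and `regression_ss` are hypothetical, and the fit uses plain normal equations so that the block needs only the standard library. It fits the reduced model (two predictors, so p = 2) and the full model (adding `x_star`), then forms SSextra / MSEfull.

```python
def solve(a, b):
    """Solve the linear system a z = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[pivot] = m[pivot], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    z = [0.0] * n
    for r in reversed(range(n)):
        z[r] = (m[r][n] - sum(m[r][k] * z[k] for k in range(r + 1, n))) / m[r][r]
    return z

def regression_ss(columns, y):
    """Least-squares fit with an intercept; return (regression SS, error SS)."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    xtx = [[sum(row[a] * row[b] for row in X) for b in range(k)] for a in range(k)]
    xty = [sum(row[a] * yi for row, yi in zip(X, y)) for a in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(bj * xj for bj, xj in zip(beta, row)) for row in X]
    y_bar = sum(y) / n
    ssr = sum((f - y_bar) ** 2 for f in fitted)
    sse = sum((yi - f) ** 2 for yi, f in zip(y, fitted))
    return ssr, sse

# Hypothetical data for illustration only.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0, 8.0, 7.0, 10.0, 9.0]
x_star = [5.0, 3.0, 8.0, 1.0, 9.0, 2.0, 7.0, 4.0, 10.0, 6.0]
y = [3.1, 3.9, 7.2, 6.8, 11.5, 9.9, 14.8, 13.1, 19.2, 16.7]

n, p = len(y), 2                      # p predictors already in the model
ss_reduced, sse_reduced = regression_ss([x1, x2], y)
ss_full, sse_full = regression_ss([x1, x2, x_star], y)

ss_extra = ss_full - ss_reduced       # SS(x* | x1, x2)
df_error = n - p - 2                  # error degrees of freedom of the full model
mse_full = sse_full / df_error
f_stat = ss_extra / mse_full          # compare with an F(1, n - p - 2) distribution

print(f"SS_extra = {ss_extra:.4f}, MSE_full = {mse_full:.4f}, F = {f_stat:.4f}")
```

Since both models share the same total sum of squares, the gain in regression sum of squares equals the drop in error sum of squares, which gives a convenient cross-check on the arithmetic.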
An alternative to the partial F-test is the t-test. Now an F-test with 1 and $n-p-2$ degrees of freedom is equivalent to a t-test with $n-p-2$ degrees of freedom. This is due to the distributional relationship that $F_{1,\,n-p-2} = (t_{n-p-2})^2$. Thus, either the F-test or the t-test may be performed. Similar to our treatment of the t-test earlier in the chapter, the hypotheses are given by
• $H_0$: $\beta^* = 0$
• $H_a$: $\beta^* \neq 0$
The associated models are:
• Under $H_0$: $y = \beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p + \varepsilon$
• Under $H_a$: $y = \beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p + \beta^* x^* + \varepsilon$
Under the null hypothesis, the test statistic $t = b^*/s_{b^*}$ follows a t-distribution with $n-p-2$ degrees of freedom. Reject the null hypothesis when the two-tailed p-value, $P(|t| > |t_{\text{obs}}|)$, is small.
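The equivalence between the two tests can be checked numerically: the square of the t-statistic for the added variable should match its partial F-statistic. The sketch below assumes made-up data and hypothetical helper names (`solve`, `fit`); the standard error of the coefficient is taken as the square root of MSEfull times the corresponding diagonal element of the inverse of X'X, which is the usual least-squares result.

```python
import math

def solve(a, b):
    """Solve the linear system a z = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[pivot] = m[pivot], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    z = [0.0] * n
    for r in reversed(range(n)):
        z[r] = (m[r][n] - sum(m[r][k] * z[k] for k in range(r + 1, n))) / m[r][r]
    return z

def fit(columns, y):
    """Least-squares fit with an intercept; return (coefficients, error SS, X'X)."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    xtx = [[sum(row[a] * row[b] for row in X) for b in range(k)] for a in range(k)]
    xty = [sum(row[a] * yi for row, yi in zip(X, y)) for a in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(bj * xj for bj, xj in zip(beta, row)) for row in X]
    sse = sum((yi - f) ** 2 for yi, f in zip(y, fitted))
    return beta, sse, xtx

# Hypothetical data for illustration only.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0, 8.0, 7.0, 10.0, 9.0]
x_star = [5.0, 3.0, 8.0, 1.0, 9.0, 2.0, 7.0, 4.0, 10.0, 6.0]
y = [3.1, 3.9, 7.2, 6.8, 11.5, 9.9, 14.8, 13.1, 19.2, 16.7]

n, p = len(y), 2
beta_full, sse_full, xtx_full = fit([x1, x2, x_star], y)
_, sse_reduced, _ = fit([x1, x2], y)
mse_full = sse_full / (n - p - 2)

# Standard error of b*: sqrt(MSE_full * [(X'X)^-1]_jj), where j indexes x* (last column).
j = len(beta_full) - 1
unit = [0.0] * len(beta_full)
unit[j] = 1.0
c_jj = solve(xtx_full, unit)[j]
s_b_star = math.sqrt(mse_full * c_jj)
t_stat = beta_full[j] / s_b_star

# Partial F-statistic; SS_extra equals the drop in error SS between the two models.
f_stat = (sse_reduced - sse_full) / mse_full

print(f"t = {t_stat:.4f}, t^2 = {t_stat ** 2:.4f}, F = {f_stat:.4f}")
```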
Finally, we need to discuss the difference between sequential sums of squares and partial sums of squares. The sequential sums of squares are as described earlier in the chapter. As each variable is entered into the model, the sequential sum of squares represents the additional unique variability in the response explained by that variable, after the variability accounted for by variables entered earlier in the model has been extracted. That is, the ordering of the entry of the variables into the model is germane to the sequential sums of squares.
On the other hand, ordering is not relevant to the partial sums of squares. For a particular variable, the partial sum of squares represents the additional unique variability in the response explained by that variable after the variability accounted for by all the other variables in the model has been extracted. Table 3.14 shows the difference between sequential and partial sums of squares, for a model with four predictors, $x_1, x_2, x_3, x_4$.
TABLE 3.14 Difference Between Sequential and Partial SS

Variable   Sequential SS          Partial SS
x1         SS(x1)                 SS(x1 | x2, x3, x4)
x2         SS(x2 | x1)            SS(x2 | x1, x3, x4)
x3         SS(x3 | x1, x2)        SS(x3 | x1, x2, x4)
x4         SS(x4 | x1, x2, x3)    SS(x4 | x1, x2, x3)
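The two columns of Table 3.14 can be computed side by side as differences in regression sums of squares. The sketch below is a hedged illustration on made-up data with hypothetical helper names (`solve`, `regression_ss`, `ssr`): each sequential SS compares the model before and after a variable enters in the order x1, x2, x3, x4, while each partial SS compares the full model with the model that omits that one variable.

```python
def solve(a, b):
    """Solve the linear system a z = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[pivot] = m[pivot], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    z = [0.0] * n
    for r in reversed(range(n)):
        z[r] = (m[r][n] - sum(m[r][k] * z[k] for k in range(r + 1, n))) / m[r][r]
    return z

def regression_ss(columns, y):
    """Regression sum of squares for a least-squares fit with an intercept."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    xtx = [[sum(row[a] * row[b] for row in X) for b in range(k)] for a in range(k)]
    xty = [sum(row[a] * yi for row, yi in zip(X, y)) for a in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(bj * xj for bj, xj in zip(beta, row)) for row in X]
    y_bar = sum(y) / n
    return sum((f - y_bar) ** 2 for f in fitted)

# Hypothetical data for illustration only.
predictors = {
    "x1": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0],
    "x2": [2.0, 1.0, 4.0, 3.0, 6.0, 5.0, 8.0, 7.0, 10.0, 9.0],
    "x3": [5.0, 3.0, 8.0, 1.0, 9.0, 2.0, 7.0, 4.0, 10.0, 6.0],
    "x4": [3.0, 7.0, 2.0, 9.0, 4.0, 10.0, 1.0, 8.0, 5.0, 6.0],
}
y = [3.1, 3.9, 7.2, 6.8, 11.5, 9.9, 14.8, 13.1, 19.2, 16.7]
names = list(predictors)

def ssr(subset):
    """Regression SS for the model containing only the named predictors."""
    return regression_ss([predictors[name] for name in subset], y)

ssr_full = ssr(names)
sequential, partial = {}, {}
for i, name in enumerate(names):
    # Sequential: gain from adding this variable to those entered before it.
    sequential[name] = ssr(names[: i + 1]) - ssr(names[:i])
    # Partial: gain from adding this variable to all of the others.
    partial[name] = ssr_full - ssr([other for other in names if other != name])

for name in names:
    print(f"{name}: sequential = {sequential[name]:.4f}, partial = {partial[name]:.4f}")
```

Two properties follow from the definitions and serve as checks: the sequential sums of squares add up to the regression sum of squares of the full model, and the last variable entered has identical sequential and partial sums of squares, as in the final row of the table.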