
In this section we describe the methods reviewed by Darlington (1968). These are simple measures that should be used only when the regressors are uncorrelated. For correlated regressors they have serious drawbacks, so they are reviewed here for completeness but not considered further after this section. The methods are illustrated using a real dataset.

Table 4.1: Correlation matrix of the body fat data

Variable   BF      TST     TC      MC
BF         1.000   0.843   0.878   0.142
TST        0.843   1.000   0.924   0.458
TC         0.878   0.924   1.000   0.085
MC         0.142   0.458   0.085   1.000

We assume that the response, $Y$, and the regressors $X_1, \dots, X_p$ are related through the regression equation

$$Y \mid X = \beta_0 + \beta_1 X_1 + \dots + \beta_p X_p + \epsilon. \tag{4.1}$$

If there are $n$ observations, the model can be written in matrix form as

$$\mathbf{y} \mid X = \beta_0 \mathbf{1} + X\boldsymbol{\beta} + \boldsymbol{\epsilon}, \tag{4.2}$$

where $\mathbf{1}$ is an $n \times 1$ vector of 1's, $\mathbf{y}$ is an $n \times 1$ vector of responses, $X = (\mathbf{x}_1, \dots, \mathbf{x}_p)$ is an $n \times p$ matrix of known values of $X_1, \dots, X_p$, $\boldsymbol{\beta}$ is a $p \times 1$ vector of regression coefficients (whose values are unknown), and $\boldsymbol{\epsilon}$ is an $n \times 1$ vector of independent random errors with mean 0 and variance $\sigma^2$. The intercept $\beta_0$ is irrelevant to the regressors' relative importance, so, to simplify notation, throughout this chapter we assume that $Y$ and $X_1, \dots, X_p$ have been centered to have sample means of 0. The least squares estimate of $\boldsymbol{\beta}$ is then

$$\hat{\boldsymbol{\beta}} = (X^\top X)^{-1} X^\top \mathbf{y}, \qquad \operatorname{var}(\hat{\boldsymbol{\beta}}) = \sigma^2 (X^\top X)^{-1}.$$
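A minimal NumPy sketch of these formulas follows. It assumes a small synthetic dataset (the seed, coefficients, and noise level are illustrative assumptions, not the body fat data) and estimates $\sigma^2$ by the residual mean square with $n - p - 1$ degrees of freedom. The helper name `centered_ols` is hypothetical.

```python
import numpy as np

def centered_ols(X, y):
    """Hypothetical helper: least squares on centered data.

    Returns beta_hat = (X'X)^{-1} X'y and the estimated covariance sigma^2 (X'X)^{-1}.
    """
    Xc = X - X.mean(axis=0)                      # center each regressor
    yc = y - y.mean()                            # center the response
    XtX = Xc.T @ Xc
    beta_hat = np.linalg.solve(XtX, Xc.T @ yc)   # solve instead of forming the inverse
    resid = yc - Xc @ beta_hat
    n, p = Xc.shape
    sigma2_hat = resid @ resid / (n - p - 1)     # one df also spent on the removed intercept
    cov_beta = sigma2_hat * np.linalg.inv(XtX)   # sigma^2 (X'X)^{-1}
    return beta_hat, cov_beta

# Synthetic (assumed) data: 20 observations, 3 regressors.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
y = 1.0 + X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.3, size=20)

beta_hat, cov_beta = centered_ols(X, y)
print(beta_hat)
print(np.sqrt(np.diag(cov_beta)))               # standard errors
```

Because centering only removes $\beta_0$, the slopes should agree with an ordinary fit that includes an intercept column.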

4.2.1 Example data

The data used to illustrate the methods were obtained from Neter et al. (1983). Four measurements were collected from 20 healthy women aged between 25 and 34 years: BF (Body Fat), TST (Triceps Skinfold Thickness), TC (Thigh Circumference), and MC (Midarm Circumference). We take BF as the response variable. The correlation matrix of the body fat data is shown in Table 4.1. TST and TC are highly correlated (0.924), and this correlation structure creates problems in allocating relative importance.

The fitted standardized multiple regression model is

$$\widehat{\mathrm{BF}} = 4.264\,\mathrm{TST} - 2.929\,\mathrm{TC} - 1.561\,\mathrm{MC}, \tag{4.3}$$

with $R^2 = 0.801$.
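In standardized form the normal equations reduce to $R_{xx}\hat{\boldsymbol{\beta}} = \mathbf{r}_{xy}$, where $R_{xx}$ is the correlation matrix of the regressors and $\mathbf{r}_{xy}$ holds their correlations with the response, and $R^2 = \mathbf{r}_{xy}^\top \hat{\boldsymbol{\beta}}$. The NumPy sketch below applies this to Table 4.1 (the helper name `standardized_fit` is hypothetical). Because the table is rounded to three decimals and TST and TC are nearly collinear, the recovered weights (roughly 5.04, -3.62 and -1.86, with $R^2 \approx 0.806$) differ noticeably from equation (4.3), which was presumably fitted to the unrounded data; the sign pattern is the same.

```python
import numpy as np

# Table 4.1 correlations; predictor order is TST, TC, MC and the response is BF.
names = ["TST", "TC", "MC"]
R_xx = np.array([[1.000, 0.924, 0.458],
                 [0.924, 1.000, 0.085],
                 [0.458, 0.085, 1.000]])
r_xy = np.array([0.843, 0.878, 0.142])

def standardized_fit(R_xx, r_xy):
    """Hypothetical helper: beta weights and R^2 computed from correlations alone."""
    beta = np.linalg.solve(R_xx, r_xy)   # solves R_xx beta = r_xy
    r2 = float(r_xy @ beta)              # R^2 = r_xy' beta
    return beta, r2

beta, r2 = standardized_fit(R_xx, r_xy)
for name, b in zip(names, beta):
    print(f"{name}: {b:+.3f}")
print(f"R^2 = {r2:.3f}")
```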

4.2.2 Zero-order correlation (validities)

Zero-order correlation is the simplest measure of importance. It measures the degree and direction of the linear relationship between a regressor and the response when all other regressors are ignored, so it is unaffected by the other regressors in the model. When the predictors are uncorrelated, the sum of the squared zero-order correlations equals the model $R^2$, so the squared correlations can be used for an initial rank ordering of the individual contributions of the predictors. For correlated regressors, however, shared variance in the response is counted several times, so the sum of the squared zero-order correlations is often greater than the $R^2$ of the model with all regressors together (Bi, 2012). The example data show this:

Variable                          TST     TC      MC
Squared zero-order correlation    0.711   0.771   0.020

Note: $r^2_{yx_j}$ is the squared zero-order correlation between the response and the $j$th regressor.

These $r^2_{yx_j}$ values are the $R^2$ values obtained by regressing BF on each regressor alone. Their sum is 1.502, nearly twice the model $R^2$ of 0.801. Conversely, the sum can be smaller than the model $R^2$ if some of the regressors are suppressors (Hamilton, 1987). A regressor that has zero or near-zero correlation with the response but is correlated with one or more of the other regressors is a suppressor.
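The relation between the squared zero-order correlations and the model $R^2$ can be checked from correlations alone, using $R^2 = \mathbf{r}_{xy}^\top R_{xx}^{-1} \mathbf{r}_{xy}$. The NumPy sketch below compares the two quantities in three cases: (a) hypothetical uncorrelated predictors, (b) the rounded body fat correlations of Table 4.1, and (c) a hypothetical two-predictor suppressor. The correlations in (a) and (c) are illustrative assumptions, not data from the text, and the helper names are hypothetical.

```python
import numpy as np

def r2_from_correlations(R_xx, r_xy):
    """Hypothetical helper: model R^2 = r' R_xx^{-1} r."""
    return float(r_xy @ np.linalg.solve(R_xx, r_xy))

def compare(label, R_xx, r_xy):
    """Hypothetical helper: sum of squared zero-order correlations vs model R^2."""
    sum_sq = float(np.sum(r_xy ** 2))
    r2 = r2_from_correlations(R_xx, r_xy)
    print(f"{label}: sum of r^2 = {sum_sq:.3f}, model R^2 = {r2:.3f}")
    return sum_sq, r2

# (a) Assumed uncorrelated predictors: the two quantities coincide.
uncorr = compare("uncorrelated", np.eye(3), np.array([0.5, 0.4, 0.3]))

# (b) Body fat correlations (Table 4.1): shared variance is counted more than once.
R_bf = np.array([[1.000, 0.924, 0.458],
                 [0.924, 1.000, 0.085],
                 [0.458, 0.085, 1.000]])
bodyfat = compare("body fat", R_bf, np.array([0.843, 0.878, 0.142]))

# (c) Assumed suppressor: X2 is uncorrelated with Y but correlated (0.6) with X1.
suppressor = compare("suppressor",
                     np.array([[1.0, 0.6], [0.6, 1.0]]),
                     np.array([0.5, 0.0]))
```

In case (c) the second predictor adds nothing on its own, yet including it raises $R^2$ from 0.25 to $0.25 / (1 - 0.6^2) \approx 0.39$.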

4.2.3 Standardized regression coefficients (beta weights)

Standardized regression coefficients are commonly used to evaluate the contribution of each predictor variable. Beta weights are easily computed, and when the predictors are uncorrelated they equal the zero-order correlations. In that case the squared beta weights sum to the full model's $R^2$, so they can be used directly to rank the predictors without calculating more complicated measures. However, predictors are usually correlated, and the beta weight of a particular predictor depends on which other predictors are in the model. When a predictor shares explained variance with one or more other predictors (Pedhazur, 1997), a predictor that has a high positive correlation with the response may have a near-zero beta weight. Conversely, a predictor with a low positive zero-order correlation may have a large positive beta weight. As Darlington (1968) notes, a predictor with a positive zero-order correlation can even have a negative beta weight. The squared beta weights for the example dataset are:

Variable              TST      TC      MC
Squared beta weight   18.179   8.577   2.438

Note: $\hat{\beta}_j^2$ is the squared beta weight for the $j$th predictor.

TST and TC are approximately equally correlated with BF (0.843 and 0.878). However, by this measure TST contributes about twice as much as TC to predicting BF, and the beta weight $\hat{\beta}_j$ of TC is negative even though TC has a positive correlation with the criterion. This happens because of the high collinearity between TST and TC. The beta weight can therefore be an invalid measure of the importance of collinear regressors. Moreover, adding variables to or removing them from the model can change the signs of the beta weights.
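The dependence of a beta weight on the other predictors in the model can be seen by refitting the standardized model on subsets of the Table 4.1 correlations. In this NumPy sketch (the helper `beta_weights` is a hypothetical name), TC's weight is positive on its own (0.878) and still positive alongside TST (about 0.68), but turns negative once MC also enters. Because the table is rounded, the full-model weight does not reproduce equation (4.3) exactly.

```python
import numpy as np

# Table 4.1 correlations; index 0 = TST, 1 = TC, 2 = MC.
R_xx = np.array([[1.000, 0.924, 0.458],
                 [0.924, 1.000, 0.085],
                 [0.458, 0.085, 1.000]])
r_xy = np.array([0.843, 0.878, 0.142])
names = ["TST", "TC", "MC"]

def beta_weights(subset):
    """Hypothetical helper: standardized coefficients using only predictors in `subset`."""
    idx = list(subset)
    return np.linalg.solve(R_xx[np.ix_(idx, idx)], r_xy[idx])

# Track the beta weight of TC (index 1) as other predictors enter the model.
tc_weights = {}
for subset in [(1,), (0, 1), (0, 1, 2)]:
    b = beta_weights(subset)
    tc_weights[subset] = b[subset.index(1)]
    label = " + ".join(names[i] for i in subset)
    print(f"{label:>14}: beta_TC = {tc_weights[subset]:+.3f}")
```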

4.2.4 Product measures

Hoffman (1960) proposed the product measure (a name due to Bring, 1996), which is the product of a predictor's zero-order correlation and its beta weight. Pratt (1987) justified this as a relative importance measure and showed that the product measures sum over all predictors to the model $R^2$, whether or not the predictors are correlated. A major disadvantage of this measure is that it may assign a negative importance value to a predictor even though that predictor contributes substantially to the criterion (Darlington, 1968). Thomas et al. (1998) argued that negative product measures can arise only under high multicollinearity. Essentially, the measure shares the limitations of both the zero-order correlations and the beta weights (Bring, 1996; Darlington, 1968). The example data illustrate this.

Variable          TST     TC       MC
Product measure   3.595   -2.572   -0.222

Note: $\hat{\beta}_j r_{yx_j}$ is the product of the beta weight for the $j$th predictor and the corresponding zero-order correlation.

Because the beta weights of TC and MC are negative while their zero-order correlations are positive, the product measures for these variables are negative. The three values still sum to the model $R^2$ of 0.801, but percentages of importance cannot be calculated. If one or more predictors have a negative product measure, the product measures of all the variables lose their meaningful interpretation. Pratt (1987) noted that the measure is valid only if the zero-order correlation and the beta weight of a predictor have the same sign.
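Pratt's result rests on the identity $\sum_j \hat{\beta}_j r_{yx_j} = \mathbf{r}_{xy}^\top \hat{\boldsymbol{\beta}} = R^2$, which holds because $\hat{\boldsymbol{\beta}} = R_{xx}^{-1}\mathbf{r}_{xy}$. The NumPy sketch below computes the product measures from the rounded Table 4.1 correlations and flags predictors whose beta weight and zero-order correlation disagree in sign (Pratt's validity condition). The magnitudes differ from the table above because of that rounding.

```python
import numpy as np

R_xx = np.array([[1.000, 0.924, 0.458],
                 [0.924, 1.000, 0.085],
                 [0.458, 0.085, 1.000]])
r_xy = np.array([0.843, 0.878, 0.142])      # zero-order correlations of TST, TC, MC with BF

beta = np.linalg.solve(R_xx, r_xy)          # beta weights
r2 = float(r_xy @ beta)                     # model R^2

products = beta * r_xy                      # product measures, one per predictor
same_sign = np.sign(beta) == np.sign(r_xy)  # Pratt's validity condition

for name, pm, ok in zip(["TST", "TC", "MC"], products, same_sign):
    print(f"{name}: product = {pm:+.3f}, signs agree: {ok}")
print(f"sum of products = {products.sum():.3f}, R^2 = {r2:.3f}")
```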

4.2.5 Usefulness

The increase (decrease) in $R^2$ from adding (removing) a predictor to (from) a model that already contains all the other predictors is called the usefulness of that predictor (Darlington, 1968). If the regressors are highly correlated, the usefulness of a predictor can exceed its squared zero-order correlation, and the predictor with the lowest zero-order correlation can be more useful than some of the other predictors. For correlated predictors, the usefulness values typically sum to far less than the model $R^2$ (Grömping, 2006). The table below gives the increase in $R^2$ from adding each variable last for the example dataset:

Variable     TST     TC      MC
Usefulness   0.026   0.015   0.023

Although MC has a much lower zero-order correlation than TC, the usefulness measure assigns TC less importance than MC. MC has a usefulness of 0.023, which is greater than its squared zero-order correlation of 0.020 (see Subsection 4.2.2). Also, the usefulness values of the three predictors sum to 0.064, far less than the model $R^2$ of 0.801.
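Usefulness can be computed either by refitting the model without each predictor, as in the definition, or from the closed form $U_j = \hat{\beta}_j^2 / (R_{xx}^{-1})_{jj} = \hat{\beta}_j^2 (1 - R_j^2)$, where $R_j^2$ is the $R^2$ from regressing $X_j$ on the other predictors. The NumPy sketch below does both using the rounded Table 4.1 correlations (the helper `r2` is a hypothetical name). Its values differ from the table above, but TC still comes out least useful and MC's usefulness still exceeds its squared zero-order correlation.

```python
import numpy as np

R_xx = np.array([[1.000, 0.924, 0.458],
                 [0.924, 1.000, 0.085],
                 [0.458, 0.085, 1.000]])
r_xy = np.array([0.843, 0.878, 0.142])   # TST, TC, MC
names = ["TST", "TC", "MC"]

def r2(idx):
    """Hypothetical helper: R^2 of the standardized model using only predictors `idx`."""
    idx = list(idx)
    sub = r_xy[idx]
    return float(sub @ np.linalg.solve(R_xx[np.ix_(idx, idx)], sub))

p = len(r_xy)
full_r2 = r2(range(p))

# Definition: drop in R^2 when predictor j is removed from the full model.
drop = np.array([full_r2 - r2([k for k in range(p) if k != j]) for j in range(p)])

# Closed form: beta_j^2 / (R_xx^{-1})_jj, i.e. beta_j^2 (1 - R_j^2).
beta = np.linalg.solve(R_xx, r_xy)
closed = beta ** 2 / np.diag(np.linalg.inv(R_xx))

for name, u in zip(names, drop):
    print(f"{name}: usefulness = {u:.3f}")
print(f"sum = {drop.sum():.3f} vs R^2 = {full_r2:.3f}")
```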

4.2.6 Engelhart's measure

Engelhart's measure assigns an individual contribution to each predictor variable and also a joint effect to each pair of predictor variables. The contributions (individual and joint) sum to the model $R^2$ whether or not the predictors are correlated. Engelhart expressed the model $R^2$ as

$$R^2 = \hat{\beta}_1^2 + \dots + \hat{\beta}_p^2 + 2\hat{\beta}_1\hat{\beta}_2 r_{12} + \dots + 2\hat{\beta}_{p-1}\hat{\beta}_p r_{(p-1)p}, \tag{4.4}$$

where $r_{jk}$ is the correlation between $X_j$ and $X_k$. A regression model with $p$ predictors therefore produces $p$ individual contributions and $p(p-1)/2$ joint effects, so the number of joint contributions grows rapidly as predictors are added. Under high multicollinearity the joint effects can be negative (as with the product measure) and hence have no meaningful interpretation (Darlington, 1968). The example dataset shows this:

Term           TST      TC      MC      TST*TC    TST*MC   TC*MC
Contribution   18.179   8.577   2.438   -23.072   -6.095   0.774

Because of the high multicollinearity between TST and TC, their joint effect is negative, even though both are highly correlated with the response, BF. The contributions sum to the model $R^2$ of 0.801, but since some of the joint contributions are negative, percentage contributions cannot be calculated.
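Equation (4.4) is the expansion of the quadratic form $R^2 = \hat{\boldsymbol{\beta}}^\top R_{xx} \hat{\boldsymbol{\beta}}$, which equals $\mathbf{r}_{xy}^\top \hat{\boldsymbol{\beta}}$ because $R_{xx}\hat{\boldsymbol{\beta}} = \mathbf{r}_{xy}$. The following NumPy sketch builds Engelhart's individual and joint terms from the rounded Table 4.1 correlations. The magnitudes differ from the table above because of the rounding, but the TST*TC and TST*MC joint effects are again negative and all six terms sum to the model $R^2$.

```python
import itertools
import numpy as np

R_xx = np.array([[1.000, 0.924, 0.458],
                 [0.924, 1.000, 0.085],
                 [0.458, 0.085, 1.000]])
r_xy = np.array([0.843, 0.878, 0.142])
names = ["TST", "TC", "MC"]

beta = np.linalg.solve(R_xx, r_xy)
r2 = float(r_xy @ beta)

# Individual contributions: squared beta weights.
contrib = {name: b ** 2 for name, b in zip(names, beta)}
# Joint effects: 2 * beta_j * beta_k * r_jk for every pair j < k.
for j, k in itertools.combinations(range(len(names)), 2):
    contrib[f"{names[j]}*{names[k]}"] = 2 * beta[j] * beta[k] * R_xx[j, k]

for term, value in contrib.items():
    print(f"{term:>7}: {value:+.3f}")
print(f"total = {sum(contrib.values()):.3f}, R^2 = {r2:.3f}")
```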

4.3 Relative importance based on sequential sums