
Multicollinearity is a situation in which the regressors in a linear regression model are highly correlated with each other. It can be either perfect or imperfect. If multicollinearity is perfect (where the regressors are perfectly interrelated), the regression coefficients of the independent variables are indeterminate, and their standard errors are infinite (Gujarati, 2003). If multicollinearity is less than perfect (where the regressors are imperfectly interrelated), the regression coefficients, though determinate, have large standard errors relative to the coefficients themselves. This implies that the coefficients cannot be estimated with great accuracy or precision (ibid).
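The indeterminacy under perfect multicollinearity can be illustrated with a minimal sketch. It assumes a model with only two regressors written in deviation-from-mean form. The data and the function name `xtx_determinant` are hypothetical and chosen only for illustration. When one regressor is an exact linear function of the other, the determinant of the cross-product matrix is zero, so the normal equations have no unique solution.

```python
def xtx_determinant(x1, x2):
    """Determinant of X'X for two regressors in deviation-from-mean form."""
    m1 = sum(x1) / len(x1)
    m2 = sum(x2) / len(x2)
    d1 = [v - m1 for v in x1]
    d2 = [v - m2 for v in x2]
    s11 = sum(a * a for a in d1)               # sum of squares of x1
    s22 = sum(b * b for b in d2)               # sum of squares of x2
    s12 = sum(a * b for a, b in zip(d1, d2))   # cross-product of x1 and x2
    return s11 * s22 - s12 * s12


# Hypothetical data, not taken from the study
x1 = [1, 2, 3, 4, 5]
x2_perfect = [2 * v for v in x1]    # exact linear function of x1
x2_imperfect = [2, 4, 7, 8, 9]      # close to 2 * x1, but not exact

det_perfect = xtx_determinant(x1, x2_perfect)
det_imperfect = xtx_determinant(x1, x2_imperfect)

print("perfect collinearity, det(X'X):  ", det_perfect)
print("imperfect collinearity, det(X'X):", det_imperfect)
```

In the imperfect case the determinant is positive but small relative to the sums of squares, which is what inflates the standard errors.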

5.2.4.1. Causes of Multicollinearity

There are numerous sources of multicollinearity. As noted by Montgomery and Peck (1982:289–290), multicollinearity may be caused by the following factors: (1) Inadequacies in the data collection method, such as sampling over a narrow range of the values assumed by the regressors in the population. (2) Constraints on the specified model or in the population being sampled. For example, in the regression of electricity consumption on income and household size as explanatory variables, there is a physical constraint in the population in the sense that families with lower incomes generally have smaller homes than families with higher incomes. (3) An overdetermined model, which occurs where the model contains more explanatory variables than observations. Another reason for multicollinearity, particularly in time series data, is that the regressors in the model may share a common trend, meaning that they all increase or decrease over time. For instance, in the regression of consumption expenditure on population, income and wealth, the three regressors may all be rising over time at more or less the same rate, which leads to collinearity among them.

5.2.4.2. Consequences of Multicollinearity

Estimating a regression in the presence of multicollinearity may be misleading, because the standard errors increase in tandem with multicollinearity. As a result, the confidence intervals for the coefficients become very wide and the t-statistics tend to be very small (Williams, 2015). Coefficients have to be larger in order to be statistically significant, so it is more difficult to reject the null hypothesis in the presence of multicollinearity. It is important to note, however, that large standard errors can be caused by things other than multicollinearity. When there is a high and positive correlation between two independent variables, there will tend to be a high and negative correlation between their slope coefficient estimators. For example, when b1 is greater than β1, b2 will tend to be less than β2, and a different sample will likely produce the contrary result. The implication is that if one overestimates the effect of one parameter, one will probably underestimate the effect of the other. Thus, coefficient estimates tend to be very unstable from one sample to another (Williams, 2015).
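Both effects can be made explicit for the simplest case. The expressions below are the standard textbook results for a model with two regressors and are offered only as an illustration. The notation is an assumption, not taken from the text: x1i and x2i are the regressors in deviation-from-mean form, r12 is their sample correlation, and σ² is the error variance.

```latex
\operatorname{Var}(b_1) = \frac{\sigma^2}{\sum x_{1i}^2 \,\bigl(1 - r_{12}^2\bigr)},
\qquad
\operatorname{Var}(b_2) = \frac{\sigma^2}{\sum x_{2i}^2 \,\bigl(1 - r_{12}^2\bigr)}

\operatorname{Cov}(b_1, b_2) =
  \frac{-\,r_{12}\,\sigma^2}
       {\bigl(1 - r_{12}^2\bigr)\sqrt{\sum x_{1i}^2}\,\sqrt{\sum x_{2i}^2}}
\quad\Longrightarrow\quad
\operatorname{Corr}(b_1, b_2) = -\,r_{12}
```

As r12 approaches 1, both variances grow without bound, and the correlation between the two slope estimators approaches −1, which is why an overestimate of one coefficient tends to be paired with an underestimate of the other.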

5.2.4.3. Detecting Multicollinearity

Multicollinearity can be detected in several ways, according to Gujarati (2003). The first is a high R² but few significant t ratios in the regression output, which is one of the main symptoms of multicollinearity. If R² is high, such as in excess of 0.8, the F-test will in most cases reject the hypothesis that the partial slope coefficients are simultaneously equal to zero, but the individual t-tests will indicate that none or very few of the partial slope coefficients are statistically different from zero. A second way to detect multicollinearity is to look for high pairwise correlations among regressors. The rule of thumb is that if the pairwise or zero-order correlation coefficient between two regressors is high, such as in excess of 0.8, then multicollinearity is a serious problem (Gujarati, 2003). See the correlation matrix of all measurement variables in Table 5.7 in section 5.3. Another useful way to detect multicollinearity is to compute the variance inflation factor (VIF). The larger the value of VIFj, the more “troublesome” or collinear the independent variable. As a rule of thumb, if the VIFj of a variable exceeds 10, which will happen if R²j (the R² from regressing variable j on the other regressors) exceeds 0.90, that variable is said to be highly collinear (Kleinbaum et al., 1988:210). The initial VIFs for the regressors are shown in Table 5.4 below.

Table 5.4: Variance Inflation Factor (VIF) – Initial Results

Variable                           VIF      R-squared
Real FDI                           14.96    0.9331
FDI/GDP                             4.06    0.7538
M2/GDP                             20.58    0.9514
Private Credit/GDP                 24.24    0.9587
Loan-Deposit ratio                  2.61    0.6162
Market Capitalisation/GDP           4.64    0.7847
Trading Volume/GDP                  8.13    0.8770
Market Turnover                     4.90    0.7959
Trade Openness                      2.73    0.6334
Population Growth                   2.94    0.6602
Government Consumption/GDP          3.52    0.7157
Electric Consumption per capita    14.44    0.9308
Enrolment per capita               16.18    0.9382
Inflation                           2.26    0.5581

Mean VIF                            8.94
Condition Number                   18.07

Source: Stata Output for Collinearity Diagnostics
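The two numeric columns of Table 5.4 are linked by the standard definition VIFj = 1 / (1 − R²j). A minimal sketch of that relationship is given here, using three rows of the table as input. The function name `vif_from_r2` is hypothetical, and the published VIFs come from Stata rather than from this code.

```python
def vif_from_r2(r2_j):
    """Variance inflation factor implied by the auxiliary R-squared of regressor j."""
    if not 0.0 <= r2_j < 1.0:
        raise ValueError("R-squared must lie in [0, 1)")
    return 1.0 / (1.0 - r2_j)


# R-squared values copied from Table 5.4
aux_r2 = {"Real FDI": 0.9331, "M2/GDP": 0.9514, "Inflation": 0.5581}

for name, r2 in aux_r2.items():
    print(f"{name}: VIF = {vif_from_r2(r2):.2f}")

# The rule-of-thumb threshold: an R-squared of 0.90 corresponds to a VIF of 10
threshold_vif = vif_from_r2(0.90)
```

The results agree with the VIF column up to rounding, since the R² values in the table are reported to only four decimal places.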

As can be seen from Table 5.4, ten of the independent variables have a VIF of less than 10, which is below the threshold. This implies that these variables are not highly collinear. However, five variables have a VIF above 10, which indicates that multicollinearity is likely to be a problem if these variables are included in the regression estimation. One possible reason for the relatively high collinearity in the affected variables, as mentioned earlier, is the likelihood of joint movement over time in variables like M2/GDP and Private sector credit/GDP (both of which are indicators of financial development measured against GDP). The pairwise correlation test shows that the correlation between these two variables is quite high (0.74). Not much can be said, however, about enrolment per capita, electricity consumption per capita and real FDI. Overall, the mean VIF for all variables is 8.94, which is less than the threshold.

Condition indices, the condition number and eigenvalues are sometimes also referred to when examining multicollinearity. Of these, the condition number gives an overall sense of the extent of multicollinearity. The condition number (κ) is the largest of the condition indices. It is equal to the square root of the ratio of the largest eigenvalue (λmax) to the smallest eigenvalue (λmin). When there is no collinearity at all, the condition indices, condition number and eigenvalues all equal one. As collinearity increases, the eigenvalues become both greater and smaller than 1 (eigenvalues close to zero indicate a multicollinearity problem), while the condition number and the condition indices increase. An informal rule of thumb is that if the condition number is 15 or more, one should be concerned about multicollinearity. If it is greater than 30, then multicollinearity becomes a very serious concern (Belsley et al., 1980). The condition number for the collinearity test conducted above is 18.07, which indicates some level of concern.
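The definition κ = √(λmax / λmin) can be sketched for the simplest case of two standardised regressors, whose correlation matrix [[1, r], [r, 1]] has eigenvalues 1 + |r| and 1 − |r|. The function name is hypothetical. The values produced are not comparable with the condition numbers reported from Stata, which are computed from all regressors (and, depending on the routine, the intercept) rather than from a single pair.

```python
import math


def condition_number_two_regressors(r):
    """Condition number of the 2x2 correlation matrix [[1, r], [r, 1]]."""
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    lam_max = 1.0 + abs(r)   # largest eigenvalue
    lam_min = 1.0 - abs(r)   # smallest eigenvalue
    return math.sqrt(lam_max / lam_min)


# 0.74 is the pairwise correlation quoted in the text; the other values are illustrative
for r in (0.0, 0.74, 0.99):
    kappa = condition_number_two_regressors(r)
    print(f"r = {r:.2f}: condition number = {kappa:.2f}")
```

With no correlation the condition number is exactly one, and it rises steeply as the correlation approaches one because the smallest eigenvalue approaches zero.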

5.2.4.4. Dealing with Multicollinearity

According to Williams (2015), there are several ways to deal with multicollinearity. One is to increase the sample size in order to reduce standard errors and make it less likely that the results are the effect of sampling bias. A second way is to use information from prior research to create new variables from the existing ones that may serve as proxies for the collinear variables. A third way is to use factor analysis or some other means to create a scale from the independent variables. In Stata, relevant commands include factor and alpha. It is sometimes recommended that the researcher “drops” the affected variable(s). However, if the variable is a key component of the model, this could lead to a specification error, which can be even more of a problem than multicollinearity.

In response, it has been deemed necessary to drop two financial development variables (Private Sector Credit/GDP and Market Turnover) from any OLS regression, since they duplicate other measures of financial deepening and market liquidity respectively, namely M2/GDP and Trading Volume. The removal of these two variables produces a drastic reduction in the VIF for M2/GDP, which is now 4.86, and in the mean VIF, which is now 5.40 (see the new VIF results in Table 5.5). The condition number also falls to 11.16, implying that multicollinearity is now less likely to be a problem in the econometric estimation, although two seemingly unrelated variables, electric consumption per capita and enrolment per capita, are still highly collinear.

Table 5.5: Variance Inflation Factor (VIF) – Final Results

Variable                           VIF      R-squared
Real FDI                            7.98    0.8746
FDI/GDP                             3.68    0.7283
M2/GDP                              4.86    0.7944
Loan-Deposit ratio                  1.95    0.4862
Market Capitalisation/GDP           4.07    0.7542
Trading Volume/GDP                  4.55    0.7802
Trade Openness                      2.48    0.5965
Population Growth                   2.86    0.6500
Government Consumption/GDP          3.43    0.7085
Electric Consumption per capita    12.78    0.9218
Enrolment per capita               14.44    0.9307
Inflation                           2.12    0.5292

Mean VIF                            5.40
Condition Number                   11.16

Source: Stata Output for Collinearity Diagnostics
