SOLTANDO SORBOS DE VIDA. Entrevistas. Cuba en el exilio

Prior to the main quantitative analysis, all instrument items will be examined for data-entry accuracy, missing values and outliers, and checked against the assumptions of the multiple linear regression model: normality, homoskedasticity, linearity and absence of multicollinearity. Finally, statistical remedies for common method bias will also be examined. This section briefly outlines the methods that will be used in this data screening process.

First, descriptive statistics such as frequencies, percentages, mean values, standard deviations and coefficients of skewness and kurtosis will be obtained for all the items. Histograms of all the items will be examined in order to see the structure and the frequencies of the responses for each item. According to Sekaran (2003), this analysis is essential in order to get an initial feel for the data.
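As an illustration of this first step, the following minimal Python sketch tabulates frequencies, percentages, the mean and the standard deviation for a single item. The helper name `describe_item` and the toy responses are hypothetical and are not taken from the study; only the Python standard library is assumed.

```python
from collections import Counter
import statistics

def describe_item(values):
    """Summarise one questionnaire item (hypothetical helper, not from the study)."""
    n = len(values)
    counts = Counter(values)
    # Map each observed score to (frequency, percentage of valid responses).
    frequencies = {score: (count, 100.0 * count / n)
                   for score, count in sorted(counts.items())}
    return {
        "n": n,
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),  # sample standard deviation (n - 1)
        "frequencies": frequencies,
    }

# Toy responses on a 1-5 scale, invented for illustration only.
item = [3, 4, 4, 5, 2, 4, 3, 5, 4, 1]
summary = describe_item(item)
print(summary["n"], summary["mean"], round(summary["sd"], 3))
for score, (count, percent) in summary["frequencies"].items():
    print(score, count, f"{percent:.1f}%")
```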

The next step is to check for possible missing values in the obtained data. Missing values are a very common problem in research conducted through self-administered questionnaires. Hair et al. (2010) note that missing values can cause two main problems: (a) they reduce the ability of a statistical test to detect a relationship in the data set, and (b) they create biased parameter estimates. The severity of these effects depends on the frequency of occurrence, the pattern of missing observations, and the reasons for the missing values (Tabachnick and Fidell, 2007). Hair et al. (2010) suggest that if the pattern of missing data is systematic, any technique used to treat the missing data could generate biased results. On the other hand, if the missing data are scattered randomly with no distinct pattern, any remedy is assumed to yield acceptable results. There are no clear guidelines regarding what constitutes a large amount of missing data. Hair et al. (2010) suggest that an acceptable level is below 5% of the total. This threshold is adopted throughout this research.
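The 5% criterion can be checked with a short calculation. The sketch below is only illustrative: it assumes that responses are stored as a cases-by-items table in which a skipped item is coded as `None`, and the function names and toy data are hypothetical. Counting missing values per item gives a first, rough impression of whether they cluster on particular items.

```python
def missing_share(responses):
    """Fraction of missing cells (None) in a cases-by-items table."""
    total = sum(len(row) for row in responses)
    missing = sum(1 for row in responses for value in row if value is None)
    return missing / total if total else 0.0

def missing_by_item(responses):
    """Number of missing values in each item (column)."""
    return [sum(1 for value in column if value is None)
            for column in zip(*responses)]

# Toy data, invented for illustration: 4 respondents x 6 items.
responses = [
    [4, 5, 3, 4, 2, 3],
    [3, None, 4, 4, 5, 4],
    [5, 5, 5, 4, 4, 5],
    [2, 3, 3, 3, 1, 2],
]
share = missing_share(responses)
print(f"missing share: {share:.1%}")
print("below the 5% level:", share < 0.05)
print("missing per item:", missing_by_item(responses))
```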

Regarding outliers, Hair et al. (2010) describe them as cases with scores that are distinctly different from the rest of the observations in a dataset. Problematic outliers can have serious effects on the results of a statistical analysis since they affect both model fit estimates and parameter estimates. To detect possible outliers, box-and-whisker plots of all items will be obtained and examined. Cases with extreme outliers that appear problematic will be excluded from the analysis.
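The rule behind a box-and-whisker plot can also be applied numerically. The following sketch is given under stated assumptions: it treats values lying more than 3 interquartile ranges beyond the quartiles as extreme (a common box-plot convention, not a threshold stated in this study), and it uses the "inclusive" quartile definition of Python's `statistics.quantiles`. Other software may compute quartiles slightly differently. The function name and data are hypothetical.

```python
import statistics

def extreme_outliers(values, k=3.0):
    """Return the values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]

# Toy scores, invented for illustration; 40 is an implausible entry.
scores = [1, 2, 2, 3, 3, 3, 4, 4, 5, 40]
print("extreme outliers:", extreme_outliers(scores))
print("mild or extreme (1.5 * IQR):", extreme_outliers(scores, k=1.5))
```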

The next issue is that of normality. Normality concerns the shape of the data distribution and how closely it corresponds to the normal distribution (Hair et al., 2010). Violation of normality might affect the estimation process or the interpretation of results, especially in SEM analysis. Therefore, prior to every analysis it is advisable to check that the data follow the normal distribution as closely as possible.

One approach to diagnosing normality is a visual check of the histograms with the bell-shaped curve of the normal probability distribution superimposed. If the observed distribution largely follows the normal curve (or, in a normal probability plot, the diagonal line), the distribution is considered normal (Hair et al., 2006). Besides the shape of the distribution, normality can also be inspected through two univariate indices, skewness and kurtosis. The skewness coefficient describes the symmetry of the distribution, whereas kurtosis measures the heaviness of its tails (often described as the peakedness or flatness of the distribution). For a perfectly normal distribution, skewness and excess kurtosis are both zero. However, Hair et al. (2006) suggest that skewness scores outside the -1 to +1 range indicate a somewhat skewed distribution, while values below -3 or above +3 indicate extreme skewness. Similar thresholds apply to the coefficient of kurtosis. In this study, the maximum acceptable limit is set at ±3 for both coefficients. Multivariate normality is tested with the same method, this time applied to the residuals obtained from the multiple regression models for the main equations of the theoretical model.
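The ±3 screening rule can be sketched as follows. The example assumes the simple moment-based definitions of skewness and excess kurtosis. Statistical packages often report bias-corrected sample versions, so their values may differ slightly in small samples. The function names and data are hypothetical.

```python
def skewness_kurtosis(values):
    """Moment-based skewness and excess kurtosis of a sample."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    if m2 == 0:
        raise ValueError("constant data: skewness and kurtosis are undefined")
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0

def within_limits(values, limit=3.0):
    """True if both coefficients lie within +/- limit."""
    skew, kurt = skewness_kurtosis(values)
    return abs(skew) <= limit and abs(kurt) <= limit

# Toy items, invented for illustration only.
symmetric_item = [1, 2, 3, 4, 5]
skewed_item = [1] * 20 + [100]
print(skewness_kurtosis(symmetric_item), within_limits(symmetric_item))
print(skewness_kurtosis(skewed_item), within_limits(skewed_item))
```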

The next data screening method concerns the detection of heteroskedasticity in the data. Heteroskedasticity means that the variables have unequal spreads across different sub-populations, that is, unequal variances (Hair et al., 2010). When this is the case, the estimated standard errors, and therefore the significance tests based on them, are unreliable. The homoskedasticity assumption in this research is examined both by visual inspection of the scatter plots and through Levene’s test. Levene’s test assesses the equality of variances across groups of cases, and it is used here to detect possible differences in variance among the demographic groups of the sample.
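To make the logic of Levene’s test concrete, the sketch below computes its W statistic by hand: each score is replaced by its absolute deviation from its own group mean, and a one-way ANOVA F ratio is formed on those deviations. This is an illustrative sketch, not the procedure used in the study. It uses the mean-centred form of the test (the median-centred variant is usually called the Brown-Forsythe test), and it stops at the statistic: the p-value would come from an F distribution with (k - 1, N - k) degrees of freedom, which is not computed here. The function name and group data are hypothetical.

```python
def levene_w(*groups):
    """Levene's W statistic (mean-centred) for two or more groups."""
    k = len(groups)
    if k < 2:
        raise ValueError("at least two groups are required")
    n_total = sum(len(g) for g in groups)
    # Absolute deviation of each observation from its own group mean.
    z = [[abs(y - sum(g) / len(g)) for y in g] for g in groups]
    z_group_means = [sum(zi) / len(zi) for zi in z]
    z_grand_mean = sum(sum(zi) for zi in z) / n_total
    between = sum(len(zi) * (m - z_grand_mean) ** 2
                  for zi, m in zip(z, z_group_means))
    within = sum((v - m) ** 2
                 for zi, m in zip(z, z_group_means) for v in zi)
    if within == 0:
        raise ValueError("no variation in the absolute deviations")
    return (n_total - k) / (k - 1) * between / within

# Toy scores for two demographic groups, invented for illustration only.
group_a = [1, 2, 3]
group_b = [0, 2, 4]
print("W =", round(levene_w(group_a, group_b), 3))
```

A larger W indicates stronger evidence that the group variances differ. Two groups with identical spread give W = 0.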

Another assumption that needs to be checked is linearity. Linearity means that the average values of the outcome variable for each increment of the explanatory variable lie along a straight line (Field, 2009). Following Hair et al. (2010), the most common way to examine the linearity of the relationships is to inspect scatterplots of each independent variable against the dependent variable and look for nonlinear patterns. Furthermore, Probability-to-Probability (P-P) plots of the residuals will be obtained in order to check for multivariate normality.

The final problem that needs to be addressed in the preliminary data screening process is that of problematic multicollinearity. Multicollinearity is present when two or more independent variables are highly correlated. When this is the case, the standard regression results are no longer reliable, because the variances of the coefficient estimates are inflated, which distorts the t-statistics (Hair et al., 2010).

Multicollinearity is examined through the estimation and inspection of the correlation matrix of the independent variables. Correlation coefficients with an absolute value above 0.9 are considered highly problematic, whereas lower correlations are not expected to create problems.
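The inspection of the correlation matrix can be sketched as a scan over all pairs of independent variables, flagging those whose Pearson correlation exceeds 0.9 in absolute value. The variable names, the helper functions and the data below are hypothetical and serve only to illustrate the rule.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def flag_collinear(predictors, threshold=0.9):
    """List (name_a, name_b, r) for every pair with |r| above the threshold."""
    flagged = []
    for (name_a, a), (name_b, b) in combinations(predictors.items(), 2):
        r = pearson(a, b)
        if abs(r) > threshold:
            flagged.append((name_a, name_b, r))
    return flagged

# Toy predictors, invented for illustration; x2 is almost a multiple of x1.
predictors = {
    "x1": [1, 2, 3, 4, 5],
    "x2": [2, 4, 6, 8, 10.5],
    "x3": [5, 1, 4, 2, 3],
}
for name_a, name_b, r in flag_collinear(predictors):
    print(f"{name_a} and {name_b}: r = {r:.3f}")
```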
