

According to Tabachnick and Fidell (2006, p. 72), an outlier is ‘a case with such an extreme value on one variable (a univariate outlier) or such a strange combination of scores on two or more variables (multivariate outlier)’. It is an observation that is distinct from other observations because of an unusually high or low score (Hair et al., 2006). Researchers agree that outliers can result in non-normality of data and distorted statistical results (Kline, 2005; Hair et al., 2006; Tabachnick & Fidell, 2007). Tabachnick and Fidell (2007, p. 73) give four reasons for the presence of outliers in a dataset: 1) incorrect data entry, 2) failure to specify codes for missing values, so that they are treated as real data, 3) entry of an observation that is not part of the population from which the sample is drawn, and 4) inclusion of an observation that is from the population, but where the distribution of the variable in the population has more extreme values than the normal distribution. Kline (2005) distinguished two types of outliers: univariate outliers, which are cases with an extreme value on a single variable, and multivariate outliers, which are cases with an odd combination of extreme values on two or more variables. What counts as an ‘extreme value’, and how much of it can be tolerated, is not explicitly characterised in the literature. However, some widely accepted rules of thumb suggest that a case is a univariate outlier if: 1) its standard score is 2.5 or beyond in a small sample (80 or fewer), while in a large sample standard scores of up to 4 can be tolerated, or 2) its value is more than 3.0 standard deviations away from the mean (Hair et al., 2006, p. 75).

In the current study, to detect univariate outliers, items were grouped together so that each group represented a single variable. Using the descriptive statistics function in SPSS, the data values of each observation were converted to standardised scores, also known as z-scores (Hair et al., 2006; Tabachnick & Fidell, 2007). The results indicate that the data set contains few univariate outliers (see table 5.3).

S.No.  Variable  Case of outlier  Standardised value (z-score beyond ±3.0)
1      NAT       103              -3.64489
                 57               -3.64489
                 124              -3.64489
                 115              -3.43909
                 344              -3.23328
2      BI        No case          ---
3      BU        312              -3.55219
                 430              -3.28631
                 355              -3.02044
4      GS        No case          ---
5      TS        No case          ---
6      VOL       No case          ---
7      PU        449              -4.41476
8      PEOU      No case          ---
9      SN        No case          ---
10     SE        No case          ---
11     TF        58               -4.2893
                 409              -3.50643
12     RF        No case          ---
13     AT        274              -3.41418
14     PD        No case          ---
15     IC        No case          ---
16     UA        No case          ---
17     MF        No case          ---

Table 5.3: Univariate outliers
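The screening rule behind table 5.3 can be sketched in a few lines. This is only an illustration of the z-score rule of thumb, not the SPSS procedure used in the study: the function names and the example scores are hypothetical, and the sample standard deviation (n - 1 denominator) is assumed.

```python
from statistics import mean, stdev


def z_scores(values):
    """Return the standardised score (z-score) of each value."""
    m = mean(values)
    s = stdev(values)  # sample standard deviation, n - 1 denominator (assumed)
    return [(v - m) / s for v in values]


def univariate_outliers(values, threshold=3.0):
    """Return (case index, z-score) pairs whose |z| exceeds the threshold."""
    return [(i, z) for i, z in enumerate(z_scores(values)) if abs(z) > threshold]


# Hypothetical composite scores on a 7-point scale: 29 typical cases, one very low case.
scores = [6.0] * 15 + [5.5] * 14 + [1.0]
flagged = univariate_outliers(scores)
for case, z in flagged:
    print(f"case {case}: z = {z:.5f}")
```

Only the last case is flagged here, and its z-score is negative, mirroring the pattern in table 5.3 where every flagged case lies below the mean.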

Multivariate outliers were detected using the Mahalanobis D² measure, which can be regarded as a multidimensional version of the z-score (Hair et al., 2006; Tabachnick & Fidell, 2007). This method measures each observation's distance in multidimensional space from the mean centre of all observations and provides a single value for each observation (Hair et al., 2006, p. 75).

According to Hair et al. (2006, p. 75), a case is a possible outlier if its D²/df value exceeds 2.5 in a small sample, or 3 to 4 in a large sample. Additionally, a conservative test of statistical significance (p < 0.001 or p < 0.005) is used with the Mahalanobis distance measure: the larger the D² value for a case, the smaller the corresponding probability value and the more likely the case is to be considered an outlier (Hair et al., 2006; Tabachnick & Fidell, 2007).
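To make the D² measure concrete, the following minimal sketch computes it for the two-variable case, where the inverse of the covariance matrix has a simple closed form. The study itself used 13 variables and SPSS; the function name and the data below are hypothetical, and the sample covariance (n - 1 denominator) is assumed.

```python
from statistics import mean


def mahalanobis_d2(rows):
    """Squared Mahalanobis distance of each (x, y) row from the sample centroid."""
    n = len(rows)
    xs = [r[0] for r in rows]
    ys = [r[1] for r in rows]
    mx, my = mean(xs), mean(ys)
    # Sample covariance matrix [[sxx, sxy], [sxy, syy]], n - 1 denominator (assumed).
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in rows) / (n - 1)
    det = sxx * syy - sxy ** 2
    d2 = []
    for x, y in rows:
        dx, dy = x - mx, y - my
        # (dx, dy) S^-1 (dx, dy)^T, using the closed-form inverse of a 2x2 matrix.
        d2.append((syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det)
    return d2


# Hypothetical scores on two variables; the last case combines them unusually.
rows = [(5.0, 6.0), (5.5, 6.2), (6.0, 6.5), (4.5, 5.8), (6.5, 6.9), (5.2, 3.0)]
d2 = mahalanobis_d2(rows)
df = 2  # number of variables
for case, value in enumerate(d2):
    print(f"case {case}: D2 = {value:.3f}, D2/df = {value / df:.3f}")
```

With more than two variables the same quantity requires a general matrix inverse, which is why statistical packages are normally used for this step.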

In the current study, the linear regression method was applied to calculate the Mahalanobis D² values. To obtain the significance value (p-value) for each D², the SPSS V.16 function “1-CDF.CHISQ(quant, df)” was applied, where quant = D² and df = 13. CDF.CHISQ returns the cumulative probability that a value from the chi-square distribution with the given degrees of freedom is less than quant, so subtracting it from 1 gives the upper-tail probability of D². Table 5.4 indicates that there were only seven extreme outliers in the sample of 380 (i.e. p < 0.005). The researcher also applied a graphical method, the box plot, to detect multivariate outliers. Figure 5.1 indicates that twenty-eight observations were mild outliers (i.e. more than 1.5 times the interquartile range (IQR) from the rest of the scores) and only one case was an extreme outlier (i.e. more than 3.0 IQR). According to Hair et al. (2006), outliers should be retained unless there is proof that they are truly aberrant and not representative of any observations in the population. Even if outliers are found to be problematic, they can still be accommodated in a way that will not seriously distort the results (Tabachnick & Fidell, 2007). Therefore, after examining the outliers identified in table 5.3 (univariate) and table 5.4 (multivariate), the researcher decided to retain these observations for the next stage.
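The upper-tail probability that “1-CDF.CHISQ(quant, df)” returns can be reproduced without SPSS. The sketch below is an assumption-laden illustration: `chi2_sf` is a hypothetical helper that handles integer degrees of freedom only, and the D² values are invented rather than taken from table 5.4.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with integer df."""
    z = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-z)              # Q(1, z) = exp(-z)
    else:
        a, q = 0.5, math.erfc(math.sqrt(z))   # Q(1/2, z) = erfc(sqrt(z))
    # Recurrence for the regularised upper incomplete gamma function:
    # Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a + 1).
    while a < df / 2.0:
        q += z ** a * math.exp(-z) / math.gamma(a + 1.0)
        a += 1.0
    return q


# Hypothetical D2 values, evaluated with df = 13 as in the study.
for d2 in (12.4, 31.0, 41.0):
    p = chi2_sf(d2, 13)
    print(f"D2 = {d2:5.1f}  p = {p:.5f}  flagged at p < 0.005: {p < 0.005}")
```

Under this formula, a D² of about 34.5 with 13 degrees of freedom corresponds to p = 0.001, the more conservative of the two thresholds mentioned above.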


Circle = mild outlier: a score more than 1.5 IQR from the rest of the scores
Star = extreme outlier: a score more than 3 IQR from the rest of the scores

Figure 5.1: Box plot representing multivariate outliers
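The circle and star markers in a box plot follow a simple fence rule, sketched below. The quartile method is an assumption (SPSS box plots use their own hinge definition, which can differ slightly), and the function name and data are hypothetical.

```python
from statistics import quantiles


def classify_boxplot_outliers(values):
    """Label values beyond 1.5 IQR (mild) or 3 IQR (extreme) of the quartiles."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")  # quartile method assumed
    iqr = q3 - q1
    labelled = []
    for v in values:
        distance = max(q1 - v, v - q3)  # how far v lies outside the box
        if distance > 3.0 * iqr:
            labelled.append((v, "extreme"))
        elif distance > 1.5 * iqr:
            labelled.append((v, "mild"))
    return labelled


# Hypothetical distances: most are typical, one is mildly and one extremely large.
values = [10, 11, 12, 13, 14, 15, 16, 17, 18, 30, 60]
print(classify_boxplot_outliers(values))
```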

5.2.3. Normality, Homoscedasticity and Non-Response Bias of data

5.2.3.1. Normality

Normality is considered to be a fundamental assumption in multivariate analysis (Hair et al., 2006; Kline, 2005; Tabachnick & Fidell, 2007). It is characterised by the assumption that the data distribution in each item and in all linear combinations of items is normal (Hair et al., 2006; Tabachnick & Fidell, 2007). According to Hair et al. (2006, p. 79), ‘if the variation from the normal distribution is sufficiently large, all resulting statistical tests are invalid, because normality is required to use the F and t statistics’. Furthermore, the authors state that violation of normality within multivariate analysis can cause underestimation of fit indices and standardised residuals of estimates (ibid).

The assumption of normality can be examined at the univariate level (i.e. the distribution of scores at item level) and at the multivariate level (i.e. the distribution of scores within a combination of two or more items). According to Hair et al. (2006, p. 80), if a set of variables satisfies multivariate normality then each variable also satisfies univariate normality, while the reverse is not necessarily true. In other words, univariate normality does not guarantee multivariate normality.

The severity of non-normality is assessed on two dimensions: 1) the shape of the offending distribution, and 2) the sample size (Hair et al., 2006, p. 80). According to Tabachnick and Fidell (2007, p. 79), the shape of a distribution can be ascertained by either graphical or statistical methods. With graphical methods, normality is checked by inspecting the histogram of a variable, which should be a symmetrical, bell-shaped curve with the highest frequency of scores in the middle and lower frequencies towards the extremes (Pallant, 2007, p. 124). Another graphical method for assessing normality, considered easier than the others, is the Q-Q plot (also known as the normal probability plot) (Norusis, 1992). The Q-Q plot graphs observed values against the values expected under a normal distribution. If the points in the Q-Q plot cluster around a straight line, the variable is normally distributed (Field, 2009).
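The pairs plotted in a Q-Q plot can be computed directly, as in the following sketch. The plotting position (i - 0.5)/n is an assumption (statistical packages offer several variants), and the function name and scores are hypothetical.

```python
from statistics import NormalDist, mean, stdev


def qq_points(values):
    """Pair each sorted observation with its expected value under normality."""
    n = len(values)
    observed = sorted(values)
    fitted = NormalDist(mean(values), stdev(values))
    # Plotting position (i - 0.5) / n is assumed; other conventions exist.
    expected = [fitted.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(observed, expected))


# Hypothetical item scores; points near the line observed = expected suggest normality.
scores = [4.8, 5.1, 5.4, 5.6, 5.9, 6.0, 6.2, 6.5, 6.9]
for obs, exp in qq_points(scores):
    print(f"observed {obs:.2f}  expected {exp:.2f}")
```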

Visual inspection of figure A-1 (appendix A) showed that the values of all variables in the current study clustered around the straight line; therefore, the observations within the sample did not require any adjustment through transformation.

Furthermore, the normal probability plot (P-P plot of the regression standardised residual) employed to assess multivariate normality also appeared normal (see figure 5.2). In addition, Kolmogorov-Smirnov (K-S) and Shapiro-Wilk statistics (Shapiro and Wilk, 1965) were calculated for each variable (see table 5.5). The results were significant for all variables, which formally indicates a violation of the assumption of normality. The significance of the K-S test was expected given the large sample size (Pallant, 2007, p. 62). According to Field (2006, p. 93), with a large sample a significant K-S test does not necessarily mean that the data deviate meaningfully from a normal distribution.

Figure 5.2: Multivariate normal P-P plot of regression standardised residual

Table 5.5: K-S test of normality

The other method used to identify the shape of a distribution is to examine skewness and kurtosis (Pallant, 2007). Skewness describes the symmetry of a distribution, whereas kurtosis refers to the ‘peakedness’ or ‘flatness’ of a distribution compared with the normal distribution (Field, 2006; Hair et al., 2006). According to Hair et al. (2006, p. 80), positive skewness denotes a distribution shifted to the left with a tail to the right, whereas a negatively skewed distribution is the reverse. For the normal distribution the value of skewness is zero, which represents a symmetric shape (Curran et al., 2006). A distribution that is peaked is termed ‘leptokurtic’, and a distribution that is flat is termed ‘platykurtic’ (Hair et al., 2006, p. 80). Additionally, a negative kurtosis value indicates a flatter distribution, while a positive value indicates a peaked distribution. Kurtosis values below 1 are considered negligible, values from 1 to 10 indicate moderate non-normality, and values greater than 10 indicate severe non-normality (Holmes-Smith, Cunningham & Coote, 2006).

In this study, as presented in table 5.6, all the variables were within the normal range of skewness and kurtosis (i.e. < 2.58, cf. Hair et al., 2006, p. 82). However, the scores presented in table 5.6 include both positive and negative skewness and kurtosis values.

According to Pallant (2007, p. 56), negative or positive skewness and kurtosis do not present a problem as long as they are within the normal range. Moreover, negative or positive values of skewness and kurtosis reflect the underlying nature of the construct being measured. For example, in this study, the negatively skewed scores for the construct perceived usefulness indicate that individuals in the sample tended to agree rather than disagree with the items on acceptance due to usefulness.
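The standard errors that accompany the skewness and kurtosis statistics in table 5.6 depend only on the sample size. The sketch below uses the small-sample formulas commonly documented for descriptive statistics output; that the study's software used exactly these formulas is an assumption, although for n = 380 they reproduce the tabled values of 0.125 and 0.250. The function names are hypothetical.

```python
import math


def se_skewness(n):
    """Standard error of sample skewness for a sample of size n."""
    return math.sqrt(6.0 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))


def se_kurtosis(n):
    """Standard error of sample (excess) kurtosis for a sample of size n."""
    return 2.0 * se_skewness(n) * math.sqrt((n * n - 1) / ((n - 3) * (n + 5)))


n = 380  # workable sample size reported in the study
print(f"SE of skewness: {se_skewness(n):.3f}")
print(f"SE of kurtosis: {se_kurtosis(n):.3f}")
```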

The severity of non-normality also depends on the sample size (Hair et al., 2006). A larger sample size reduces the negative effects of non-normality (Hair et al., 2006; Pallant, 2007). Moreover, non-normality has a more serious effect in a small sample (fewer than 50 cases) than in a large sample (more than 200 cases). In the current study the workable sample size is 380; therefore, the presence of slight univariate non-normality can be tolerated.

For the test of multivariate normality, Mardia’s (1970) coefficient was used (Brown, 1982). The coefficient was computed with AMOS (Arbuckle, 2006) (see table A-3 in appendix A) and indicated that the assumption of multivariate normality was not tenable (Mardia’s coefficient = 228.527, CR = 39.88). Table A-4 in appendix A lists the observations farthest from the centroid (Mahalanobis distance), that is, the potential multivariate outliers that contributed to non-normality within the sample.

                                                               Skewness                Kurtosis
Variable  N    Minimum  Maximum  Mean    Std. Deviation  Statistic  Std. Error  Statistic  Std. Error
vol       380  1.00     7.00     3.1897  1.51255          0.516     0.125       -0.628     0.250
PU        380  3.40     7.00     6.1463  0.62208         -0.433     0.125        0.022     0.250
PEOU      380  3.75     7.00     5.9480  0.73350         -0.319     0.125       -0.611     0.250
SN        380  3.75     7.00     5.9007  0.72609         -0.502     0.125       -0.234     0.250
SE        380  1.00     7.00     4.5561  1.24208         -0.141     0.125       -0.153     0.250
TF        380  1.50     7.00     5.6092  0.95801         -0.772     0.125        0.788     0.250
RF        380  3.00     7.00     5.7428  0.95392         -0.679     0.125       -0.179     0.250
AT        380  4.00     7.00     6.1321  0.62448         -0.448     0.125       -0.405     0.250
NAT       380  1.00     7.00     5.4276  1.21475         -0.975     0.125        1.124     0.250
BI        380  4.25     7.00     6.0132  0.69665         -0.396     0.125       -0.688     0.250
BU        380  2.50     7.00     5.8401  0.94030         -0.843     0.125        0.357     0.250
GS        380  3.60     7.00     5.7359  0.73117         -0.244     0.125       -0.219     0.250
TS        380  4.20     7.00     6.1163  0.62831         -0.419     0.125       -0.479     0.250
PD        380  1.00     6.50     2.9868  1.05344          0.568     0.125       -0.037     0.250
IC        380  2.83     7.00     5.4097  0.95252         -0.437     0.125       -0.177     0.250
UA        380  4.75     7.00     6.3513  0.54507         -0.546     0.125       -0.455     0.250
MF        380  1.00     7.00     3.1506  1.66060          0.532     0.125       -0.788     0.250
Valid N (listwise)  380

Table 5.6: The shape of data distribution based on Skewness and Kurtosis values

5.2.3.2. Homoscedasticity

According to Hair et al. (2006, p. 83), homoscedasticity is an assumption, related to normality, that the dependent variable(s) display equal variance across the range of the independent variable(s). Tabachnick and Fidell (2007, p. 85) define homoscedasticity as the variability in scores for one variable being roughly the same at all values of another variable. The assumption of equal variance between variables is a prerequisite in multiple regression (Field, 2006). Within multivariate analysis, the failure of homoscedasticity is known as heteroscedasticity and can create serious problems (Hair et al., 2006). Heteroscedasticity is caused either by the presence of non-normality or by greater measurement error at some levels of the independent variable(s) (Hair et al., 2006; Tabachnick & Fidell, 2007). In analyses where data are grouped, homoscedasticity is known as homogeneity of variance (Tabachnick & Fidell, 2007, p. 86). The most common method for assessing homoscedasticity is Levene’s test of equal variance (Hair et al., 2006; Field, 2006; Pallant, 2007).

In this study, Levene’s test for the metric variables was computed across a non-metric variable (gender) as part of the t-test. Most of the significance values obtained (see table 5.7), with the exception of PU, PEOU, NAT, PD and MF, were above the 0.05 significance level (i.e. p > 0.05), which suggests that the variance of these variables was equal across the male and female groups and that they did not violate the assumption of homogeneity of variance. Like the Kolmogorov-Smirnov and Shapiro-Wilk tests, Levene’s test is considered to be sensitive to sample size and can be significant in large samples (Field, 2006, p. 98). Therefore, for the current study, which has a sample of 380, the significance of a few constructs in Levene’s test does not represent the presence of substantial non-normality within the sample.
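Levene’s statistic is a one-way analysis of variance carried out on the absolute deviations of each score from its group mean. The sketch below assumes the mean-centred form of the test; the function name and the two small groups are hypothetical and are not the study's data.

```python
from statistics import mean


def levene_statistic(*groups):
    """Levene's W, using absolute deviations from each group's mean (assumed form)."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    group_means = [mean(g) for g in groups]
    # Absolute deviation of every score from its own group mean.
    z = [[abs(v - m) for v in g] for g, m in zip(groups, group_means)]
    z_means = [mean(zi) for zi in z]
    z_grand = sum(sum(zi) for zi in z) / n_total
    between = sum(len(zi) * (zm - z_grand) ** 2 for zi, zm in zip(z, z_means))
    within = sum((v - zm) ** 2 for zi, zm in zip(z, z_means) for v in zi)
    return (n_total - k) / (k - 1) * between / within


# Hypothetical scores for two groups with different spread.
group_a = [1.0, 2.0, 3.0]
group_b = [0.0, 2.0, 4.0]
w = levene_statistic(group_a, group_b)
print(f"W = {w:.3f} with df1 = 1 and df2 = {len(group_a) + len(group_b) - 2}")
```

W is referred to an F distribution with k - 1 and N - k degrees of freedom, which for two groups and 380 cases gives the df1 = 1 and df2 = 378 shown in table 5.7.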

Test of Homogeneity of Variances

Variable  Levene Statistic  df1  df2  Sig.
vol       0.581             1    378  0.447
PU        4.347             1    378  0.038
PEOU      4.117             1    378  0.043
SN        2.760             1    378  0.097
SE        0.857             1    378  0.355
TF        0.960             1    378  0.328
RF        3.228             1    378  0.073
AT        0.677             1    378  0.411
NAT       4.230             1    378  0.040
BI        2.404             1    378  0.122
GS        0.377             1    378  0.540
TS        0.513             1    378  0.474
PD        4.463             1    378  0.035
IC        3.306             1    378  0.070
UA        0.233             1    378  0.630
MF        7.041             1    378  0.008
BU        0.536             1    378  0.464

Table 5.7: Levene’s test of homogeneity of variances

5.2.3.3. Multicollinearity

Multicollinearity is a problem in the correlation matrix in which three or more independent variables are highly correlated with each other (say, .90 or above) (Tabachnick & Fidell, 2007; Hair et al., 2006). A high level of multicollinearity lowers the unique variance explained by each independent variable (β-value) and increases the shared prediction percentage (Hair et al., 2006, p. 186). In other words, the presence of multicollinearity limits the size of the regression (R) value and makes it difficult to understand the contribution of each individual independent variable (Field, 2006). To improve prediction, it is suggested to inspect the highly correlated variables and delete one of them (Hair et al., 2006; Tabachnick & Fidell, 2007).

Of the several methods for detecting the severity of multicollinearity, two are very common: inspecting the bivariate and multivariate correlation matrix, and calculating the variance inflation factor (VIF) and tolerance (Pallant, 2007; Tabachnick & Fidell, 2007; Temme et al., 2010). According to Pallant (2007, p. 156), tolerance indicates how much of the variability of a given independent variable is unique (not explained by the other independent variables), whereas VIF is the inverse of tolerance. A large VIF (say, above 10) and a low tolerance (say, below 0.1) indicate the presence of multicollinearity (Myer, 1997; Menard, 1995; Pallant, 2007).
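The relationship between tolerance and VIF can be written out in a few lines. In this sketch, `r_squared` stands for the R² obtained when one independent variable is regressed on all the others; the function names and the R² values are hypothetical, and the cut-offs are the rules of thumb quoted above.

```python
def tolerance(r_squared):
    """Share of a predictor's variance not explained by the other predictors."""
    return 1.0 - r_squared


def vif(r_squared):
    """Variance inflation factor, the inverse of tolerance."""
    return 1.0 / tolerance(r_squared)


def flags_multicollinearity(r_squared, vif_cutoff=10.0, tolerance_cutoff=0.1):
    """Apply the rule-of-thumb cut-offs (VIF above 10, tolerance below 0.1)."""
    return vif(r_squared) > vif_cutoff or tolerance(r_squared) < tolerance_cutoff


# Hypothetical R-squared values from regressing each predictor on the others.
for name, r2 in {"X1": 0.50, "X2": 0.80, "X3": 0.95}.items():
    print(f"{name}: tolerance = {tolerance(r2):.2f}, VIF = {vif(r2):.1f}, "
          f"flagged = {flags_multicollinearity(r2)}")
```

Because VIF is the reciprocal of tolerance, the two cut-offs describe the same boundary, which corresponds to an R² of .90 among the predictors.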
