4. ANALYSIS OF INVESTMENT DECISIONS IN A SYSTEM
4.2 Historical analysis of the actual works plan
Correlations were used to test for associations among the H, length, type and frequency variables, and to assess the relative strength and direction of any such associations. Correlations between the predictor variables of length and frequency were used to examine how they related to each other and to check for collinearity, while correlations between predictor and outcome variables (the alternate measures of name agreement) were also examined.
Table 8 Summary of Pearson’s r bivariate correlations of item variables (N = 1,474)
                 H          Frequency   Length     Type
Agreement   r    -.969**    -.075**     .495**     -.838**
            p    < .001     .004        < .001     < .001
H           r               .076**      -.517**    .929**
            p               .004        < .001     < .001
Frequency   r                           -.148**    .072**
            p                           < .001     .006
Length      r                                      -.504**
            p                                      < .001
Notes:
1. Agreement = percentage name agreement
2. H = H-statistic
3. Frequency = frequency of unvowelised letter string from Aralex (Boudelaa & Marslen-Wilson, 2010)
4. Length = length in letters of unvowelised letter string
5. Type = number of different correctly diacritised forms of letter string
There was a strong negative correlation between agreement and H (r = -.969, p < .001), with H approaching 0 as agreement approached 100%. An increase in percentage agreement was associated with a decrease in the number of alternative forms of diacritisation and in the number of participants agreeing on them. A weak negative correlation between agreement and frequency (r = -.075, p = .004) indicated that an increase in frequency was associated with a decrease in percentage name agreement, while a moderate positive correlation between agreement and length (r = .495, p < .001) indicated that longer letter strings elicited greater name agreement. A strong negative correlation between agreement and type (r = -.838, p < .001) indicated that agreement decreased as the number of correctly diacritised alternatives for a homographic letter string increased.
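The inverse relationship between agreement and H can be illustrated with a small sketch. It assumes that H is the information statistic conventionally used in name-agreement norming, H = sum of p_i * log2(1 / p_i) over the distinct responses to an item, where p_i is the proportion of participants giving response i; the formula is not restated in this section, and the function name and response labels below are hypothetical.

```python
import math
from collections import Counter


def name_agreement_stats(responses):
    """Return (percentage agreement, H) for one item's list of responses."""
    counts = Counter(responses)
    total = sum(counts.values())
    # Percentage agreement: share of participants giving the modal response.
    agreement = 100.0 * max(counts.values()) / total
    # H = sum over distinct responses of p_i * log2(1 / p_i), with p_i = c / total.
    h = sum((c / total) * math.log2(total / c) for c in counts.values())
    return agreement, h


# Hypothetical items: full agreement, and an even split between two forms.
unanimous = name_agreement_stats(["form_a"] * 10)
even_split = name_agreement_stats(["form_a"] * 5 + ["form_b"] * 5)
```

Under this definition an item on which every participant gives the same form has 100% agreement and H = 0, while an item split evenly between two forms has 50% agreement and H = 1, which is the pattern behind the strong negative correlation.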
For the H statistic, there was a weak positive correlation with frequency (r = .076, p = .004): higher-frequency words were associated with higher values of H, that is, with responses spread across a greater number of possible alternatives. A moderate negative correlation between the H statistic and length (r = -.517, p < .001) indicated that shorter words were associated with more possible alternatives. There was a strong positive correlation between H and type (r = .929, p < .001), with an increasing value of H associated with a higher number of possible alternative diacritised forms of the letter string.
A very weak negative correlation between frequency and length (r = -.148, p < .001) showed that an increase in length was associated with a decrease in frequency, while a very weak positive correlation between frequency and type (r = .072, p = .006) indicated that an increase in frequency was associated with an increase in the number of correctly diacritised forms of the letter string. A moderate negative correlation between length and type (r = -.504, p < .001) indicated that an increase in length was associated with a decrease in the number of correctly diacritised forms of the letter string.
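As a rough illustration of how the values in Table 8 relate to one another, the sketch below computes Pearson's r from paired observations and converts a given r and sample size into an approximate two-sided p-value. The p-value uses a normal approximation to the t distribution, which is an assumption that is only reasonable for large samples such as N = 1,474; statistical packages use the exact t distribution. The function names are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def approx_two_sided_p(r, n):
    """Approximate two-sided p-value for H0: rho = 0 (large-sample assumption)."""
    # t statistic with n - 2 degrees of freedom.
    t = r * math.sqrt((n - 2) / (1.0 - r * r))
    # Normal approximation to the t distribution: p = erfc(|t| / sqrt(2)).
    return math.erfc(abs(t) / math.sqrt(2.0))


# Weak correlations from Table 8, evaluated at N = 1,474.
p_agreement_frequency = approx_two_sided_p(-0.075, 1474)
p_frequency_type = approx_two_sided_p(0.072, 1474)
```

With a sample of this size even very weak correlations reach significance: the two calls above give p-values of roughly .004 and .006, in line with the values reported in Table 8.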
4.6.6 Regression
Multiple linear regression was carried out in order to predict the value of a dependent variable from the values of two or more independent variables. Multiple regression allows assessment of the overall fit between data and model, that is, the proportion of variance explained, and identifies the relative contribution of each predictor to the total variance explained. All variables were standardised as z scores before a range of regression models was run, following the earlier elimination of variables which made no contribution to explaining the variance.
Two different regression models were run: (1) Name agreement as dependent variable, with zscore(Length) and zscore(Frequency) as predictor variables; (2) H as dependent variable, with zscore(Length) and zscore(Frequency) as predictor
variables.
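The structure shared by both models can be sketched as an ordinary least squares fit of one outcome on two z-scored predictors. This is a minimal illustration using only the standard library, not the software used for the analysis; the function names and the toy data are hypothetical, and the z scores here use the sample standard deviation (n - 1), which is an assumption.

```python
import math


def zscore(values):
    """Standardise values to mean 0 and sample standard deviation 1."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]


def fit_two_predictor_ols(y, x1, x2):
    """OLS of y on z-scored x1 and x2; returns (intercept, b1, b2, r_squared)."""
    z1, z2 = zscore(x1), zscore(x2)
    mean_y = sum(y) / len(y)
    yc = [v - mean_y for v in y]
    # z-scored predictors have mean zero, so the intercept equals mean(y)
    # and the slopes solve a 2x2 system of normal equations.
    s11 = sum(a * a for a in z1)
    s22 = sum(b * b for b in z2)
    s12 = sum(a * b for a, b in zip(z1, z2))
    s1y = sum(a * v for a, v in zip(z1, yc))
    s2y = sum(b * v for b, v in zip(z2, yc))
    det = s11 * s22 - s12 * s12
    b1 = (s1y * s22 - s2y * s12) / det
    b2 = (s2y * s11 - s1y * s12) / det
    ss_total = sum(v * v for v in yc)
    ss_resid = sum((v - b1 * a - b2 * b) ** 2 for v, a, b in zip(yc, z1, z2))
    return mean_y, b1, b2, 1.0 - ss_resid / ss_total


# Hypothetical toy data: the outcome depends on length only.
length = [2, 3, 4, 5, 6, 7]
frequency = [5, 1, 4, 2, 6, 3]
outcome = [0.5 + 0.1 * x for x in length]
intercept, b_length, b_frequency, r_squared = fit_two_predictor_ols(
    outcome, length, frequency
)
```

In the toy data the coefficient for frequency comes out at essentially zero and the coefficient for length carries all of the explained variance, which mirrors the qualitative pattern reported for both models.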
4.6.6.1 Regression model 1
A multiple regression analysis was calculated to predict percentage name agreement based on target item length and frequency. A significant regression equation was found (F(2, 1471) = 238.414, p < .001), with an R2 of .245. Output from the first regression is shown in Table 9.
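As a rough consistency check, the overall F statistic of a linear regression can be recovered from R2, the number of observations and the number of predictors, using F = (R2 / k) / ((1 - R2) / (n - k - 1)). The sketch below applies this to model 1, assuming k = 2 predictors (length and frequency) and n = 1,474 items; the function name is hypothetical, and because R2 is only reported to three decimals the result can only approximate the reported F.

```python
def f_from_r_squared(r_squared, n_obs, n_predictors):
    """Return (F, df_model, df_residual) for an OLS model with an intercept."""
    df_model = n_predictors
    df_resid = n_obs - n_predictors - 1
    f_value = (r_squared / df_model) / ((1.0 - r_squared) / df_resid)
    return f_value, df_model, df_resid


# Model 1: R2 = .245, N = 1,474, two predictors (assumed).
f_value, df_model, df_resid = f_from_r_squared(0.245, 1474, 2)
```

This gives an F of approximately 238.7 on 2 and 1,471 degrees of freedom, close to the reported value of 238.414; the small difference is attributable to rounding of R2.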
Coefficients of effects of variables (Table 9) show that, within this model, item length contributed to explaining the variance in name agreement, and the relationship was positive. The estimated coefficient implies that agreement changes by .117 for each unit increase in standardized length: longer unvowelised letter strings tend to elicit vowelised forms with higher name agreement. The coefficient for frequency was not significant (p = .928), so no change in name agreement is implied by a change in frequency.
Table 9 Survey regression model 1: Coefficients of variables
Model                     Unstandardized Coefficients   Standardized Coefficients   t          Sig.
                          B         Std. Error          Beta
1    (Constant)           .763      .005                                            142.841    <.001
     zscore(Frequency)    <.001     .005                -.002                       -.090      .928
     zscore(Length)       .117      .005                .494                        21.582     <.001
4.6.6.2 Regression model 2
A second regression analysis was carried out to predict H based on target item length and frequency, in order to determine which of the two measures (name agreement and H) would give a better fit between model and data. A significant regression equation was found (F(3, 1470) = 178.795, p < .001), with an R2 of .267.
Coefficient estimates from the second regression are shown in Table 10.
Table 10 Survey regression model 2: Coefficients of variables
Model                     Unstandardized Coefficients   Standardized Coefficients   t          Sig.
                          B         Std. Error          Beta
2    (Constant)           .771      .017                                            45.080     <.001
     zscore(Frequency)    -.001     .017                -.001                       -.030      .976
     zscore(Length)       -.396     .017                -.517                       -22.891    <.001
Zscore(Length) significantly predicted H, B = -.396, β = -.517, t = -22.891, p < .001. The negative relationship means that for each unit increase in standardized length, H is expected to decrease by about .4. Zscore(Frequency) was not a significant predictor, B = -.001, β = -.001, t = -.030, p = .976.
The adjusted R square figure indicates that this second model accounts for almost 27% of the variance in the H statistic. The value of R2 is slightly higher than in model 1, suggesting that the same predictors give a better prediction for H than for agreement. However, the higher standard error of the estimate suggests that there is wider variation in the values predicted from this model.