As the purpose of these analyses was to investigate factors that influence response latencies and accuracy for word targets, responses to nonword targets were excluded. For the RT analyses, because RT data are typically positively skewed, the cleaned RT data were log-transformed⁴ to normalise the RT distribution and thereby avoid violating the assumptions of normality and linearity of residuals required for linear mixed effects regression analyses. All data were used for the accuracy analyses. Mixed effects regression analyses of the two main dependent variables (RT and accuracy) for word targets were then conducted separately using R (R Core Team, 2016) and lmerTest (Kuznetsova, Brockhoff, & Christensen, 2017) with maximum likelihood estimation.
⁴ Only the analyses from the log(RT) models are reported in this chapter, as Q-Q plots indicated that the log transformation reduced the skew in the raw RT distribution more than the other transformations, rendering the distribution closest to a normal distribution. However, models with inverse-transformed RT data were also fitted for parity with the speeded pronunciation data analyses in
R², the coefficient of determination, is traditionally used in regression modelling to represent the proportion of variance in a dependent variable explained by the fixed effects in a model with a single random effect. However, R² cannot simply be generalised to mixed-effects models with multiple random effects, and thus multiple sources of error or residual variance, which makes it challenging to compute R² via the traditional calculation (see Nakagawa & Schielzeth, 2013). A pseudo-R² is instead calculated to provide an absolute value for the goodness-of-fit of a mixed-effects model and a summary statistic that describes the amount of variance explained by the model. Pseudo-R² values for all mixed-effects models in this chapter and subsequent chapters were calculated using the ‘r.squaredGLMM’ function in the ‘MuMIn’ package (Barton, 2017), which is based on R code by Nakagawa and Schielzeth (2013) for models with random intercepts and by Johnson (2014) for an extension to models with random slopes. The conditional pseudo-R² represents the variance explained by the entire model (both fixed and random effects) and is calculated as follows, where $\sigma_f^2$ is the variance of the fixed effect components, $\sigma_\alpha^2$ is the variance of the random effect components, and $\sigma_\varepsilon^2$ is the observation-level variance:
\[ R^2_{(c)} = \frac{\sigma_f^2 + \sigma_\alpha^2}{\sigma_f^2 + \sigma_\alpha^2 + \sigma_\varepsilon^2} \]
The marginal pseudo-R² represents the variance explained by the fixed effects in the model and is calculated as follows:
\[ R^2_{(m)} = \frac{\sigma_f^2}{\sigma_f^2 + \sigma_\alpha^2 + \sigma_\varepsilon^2} \]
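The two formulas above can be illustrated with a minimal base-R sketch. The helper name `pseudo_r2` and the three variance values are hypothetical and chosen only to make the arithmetic easy to follow; they are not estimates from the models reported in this chapter, where the variance components were extracted from the fitted models by `r.squaredGLMM`.

```r
# Hypothetical helper: marginal and conditional pseudo-R2 computed directly
# from the three variance components defined in the text.
pseudo_r2 <- function(var_fixed, var_random, var_resid) {
  total <- var_fixed + var_random + var_resid
  c(marginal    = var_fixed / total,                 # fixed effects only
    conditional = (var_fixed + var_random) / total)  # fixed + random effects
}

# Illustrative values only (not taken from the thesis data)
r2 <- pseudo_r2(var_fixed = 0.02, var_random = 0.03, var_resid = 0.05)
print(r2)
```

With these illustrative components the total variance is 0.10, so the marginal value is 0.2 and the conditional value is 0.5. The conditional value can never be smaller than the marginal value, because it adds the random effect variance to the numerator.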
5.7.3.1 Response Latencies
Fitting the random effects structure. The mixed effects model used in analysing response latencies included random intercepts for both participants and stimuli, as well as random slopes for each principal component (length, frequency, phonotactic probability, root, neighbourhood density, and Levenshtein distance) varying by participant, using a maximal random effects structure as recommended by Barr et al. (2013). This is because we expected these effects to vary across individuals. Furthermore, a likelihood ratio test comparing the random-intercepts-only model with the random-intercepts-and-random-slopes model showed that adding the by-participant random slopes for each effect improved the model fit and accounted for a significant amount of the random variance, χ²(27) = 284.13, p < .001.
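The 27 degrees of freedom in this test can be recovered with a short base-R sketch. It assumes that the by-participant random effects have an unstructured covariance matrix, which is what the `(1 + ... | participant)` syntax specifies; the helper name `n_cov_params` is hypothetical.

```r
# Number of by-participant random slopes added (one per principal component)
n_slopes <- 6

# Free parameters in a q x q unstructured covariance matrix:
# q variances plus q(q - 1)/2 covariances
n_cov_params <- function(q) q * (q + 1) / 2

# Intercepts only: 1 x 1 matrix; intercept plus six slopes: 7 x 7 matrix
df_diff <- n_cov_params(1 + n_slopes) - n_cov_params(1)

# Upper-tail chi-square probability for the reported test statistic
p_value <- pchisq(284.13, df = df_diff, lower.tail = FALSE)

print(df_diff)
print(p_value < .001)
```

The difference comes from the six slope variances, the six intercept-slope covariances, and the fifteen slope-slope covariances that the larger model estimates in addition to the intercept variance.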
Covariates. For this analysis, the following covariates were standardised using z-scores: age, trial order number, and display refresh rate. Both onset and sex were sum coded so that the analysis would show effects on RTs averaged across onset and sex respectively.
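As a minimal sketch of these two preprocessing steps in base R, the following uses a small hypothetical data frame; the column names and values are illustrative and are not the study data.

```r
# Hypothetical toy data; names and values are illustrative only
d <- data.frame(
  age = c(19, 22, 25, 31, 40),
  sex = factor(c("F", "M", "F", "M", "F"))
)

# z-score: subtract the mean and divide by the sample standard deviation
d$age_z <- as.numeric(scale(d$age))

# Sum (deviation) coding: the two levels are coded +1 and -1, so other
# effects are estimated at the average over levels rather than at a
# reference level
contrasts(d$sex) <- contr.sum(nlevels(d$sex))

print(round(mean(d$age_z), 10))
print(sd(d$age_z))
print(contrasts(d$sex))
```

After standardisation the covariate has a mean of zero and a standard deviation of one, which puts age, trial order number, and display refresh rate on a common scale.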
Fixed effects. In terms of main effects, the model included all six principal components (length, frequency, phonotactic probability, root, neighbourhood density, and Levenshtein distance), as well as z-scored memorisation and z-scored vocabulary knowledge. In terms of interactions, the model included the three-way interactions between memorisation, vocabulary knowledge, and each principal component, as well as their subsumed two-way interactions, i.e., memorisation × vocabulary knowledge, memorisation × principal component, and vocabulary knowledge × principal component.
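The set of terms subsumed under one three-way interaction can be listed with base R's `terms()`. This is only an illustrative sketch for a single principal component (Length), using the variable names from the model specification; the `*` shorthand is used here for compactness, whereas the model itself writes every term out explicitly.

```r
# Expand a three-way interaction written with `*` into its component terms
f <- ~ MemScore * QVT * Length
term_labels <- attr(terms(f), "term.labels")
print(term_labels)
```

The expansion contains three main effects, three two-way interactions, and one three-way interaction. Repeating it for each of the six principal components gives the full fixed effects structure, with the MemScore:QVT term shared across components.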
A linear mixed effects regression analysis was then conducted using the ‘lmer()’ function in the ‘lmerTest’ package (Kuznetsova et al., 2017) and with maximum likelihood, running the model as follows:
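library(lmerTest)  # provides the lmer() used here

Model_LDT <- lmer(
  log(RT) ~
    # Random effects: by-participant intercepts and slopes, by-stimulus intercepts
    (1 + Length + Freq + N + LD + Root + PP | participant) + (1 | stimuli)
    # Covariates and individual-difference measures
    # (column names containing spaces must be quoted with backticks)
    + Onset + `Trial Order Number` + `Display Refresh Rate` + Sex + Age + MemScore + QVT
    # Principal components and the MemScore x QVT interaction
    + Length + Freq + N + PP + LD + Root + MemScore:QVT
    # Two-way interactions with each principal component
    + MemScore:Length + MemScore:Freq + MemScore:N + MemScore:PP + MemScore:LD + MemScore:Root
    + QVT:Length + QVT:Freq + QVT:N + QVT:PP + QVT:LD + QVT:Root
    # Three-way interactions
    + MemScore:QVT:Length + MemScore:QVT:Freq + MemScore:QVT:N
    + MemScore:QVT:PP + MemScore:QVT:LD + MemScore:QVT:Root,
  data = all,
  REML = FALSE,  # maximum likelihood rather than REML
  control = lmerControl(optCtrl = list(maxfun = 1e6))
)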
In terms of computing p-values in linear mixed-effects modelling, Baayen et al. (2008) recommended using Markov chain Monte Carlo (MCMC) simulation; however, this is currently not possible in lmerTest for models with correlation parameters. Simulations by Barr et al. (2013) suggest that the likelihood-ratio test is appropriate for psycholinguistic datasets where the number of observations usually far outnumbers the number of model parameters, as is the case for the current study. Therefore, p-values were obtained by likelihood ratio tests comparing the full model containing the effect in question against the same model without that effect. This method of computing p-values was also used in subsequent regression analyses in this study as well as in other chapters.
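The logic of this model comparison can be sketched in base R. To keep the example self-contained, it swaps in ordinary linear regression on simulated data rather than a mixed-effects model; all variable names and values are hypothetical. The test itself (twice the difference in maximised log-likelihoods, referred to a chi-square distribution) is the same.

```r
# Simulated data for illustration only
set.seed(1)
n  <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 0.5 * x1 + 0.3 * x2 + rnorm(n)

full    <- lm(y ~ x1 + x2)  # model with the effect in question (x2)
reduced <- lm(y ~ x1)       # same model without that effect

# Likelihood ratio statistic: twice the difference in log-likelihoods
lr_stat <- as.numeric(2 * (logLik(full) - logLik(reduced)))

# Degrees of freedom: difference in the number of estimated parameters
df_diff <- attr(logLik(full), "df") - attr(logLik(reduced), "df")

p_value <- pchisq(lr_stat, df = df_diff, lower.tail = FALSE)
print(c(chisq = lr_stat, df = df_diff, p = p_value))
```

For two nested models fitted with `lmer()` or `glmer()`, calling `anova()` on the pair reports the same kind of chi-square comparison.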
5.7.3.2 Accuracy
Fitting the random effects structure. Similar to response latencies, the mixed effects model for accuracy included random intercepts for both participants and stimuli, as well as random slopes for each principal component varying by participant, using a maximal random effects structure. This is because we expected these effects to vary across individuals. Furthermore, a likelihood ratio test comparing the random-intercepts-only model with the random-intercepts-and-random-slopes model showed that adding the by-participant random slopes for each effect improved the model fit and accounted for a significant amount of the random variance, χ²(27) = 123.46, p < .001.
The same covariates and fixed effects used in the previous model for RTs were also used in fitting this model, except for onsets and age, which were removed as they were not significant (onsets: χ²(27) = 37.199, ns.; age: χ²(27) = .305, ns.). However, as the dependent variable is a binary response, a mixed effects logistic regression analysis with a binomial distribution was conducted instead, using the ‘glmer()’ function in the lme4 package (D. M. Bates, Maechler, Bolker, & Walker, 2015) and running the model as follows:
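library(lme4)  # provides glmer()

Model_Accuracy <- glmer(
  Accuracy ~
    # Random effects: by-participant intercepts and slopes, by-stimulus intercepts
    (1 + Length + Freq + N + LD + Root + PP | participant) + (1 | stimuli)
    # Covariates and individual-difference measures (Onset and Age removed)
    # (column names containing spaces must be quoted with backticks)
    + `Trial Order Number` + `Display Refresh Rate` + Sex + MemScore + QVT
    # Principal components and the MemScore x QVT interaction
    + Length + Freq + N + PP + LD + Root + MemScore:QVT
    # Two-way interactions with each principal component
    + MemScore:Length + MemScore:Freq + MemScore:N + MemScore:PP + MemScore:LD + MemScore:Root
    + QVT:Length + QVT:Freq + QVT:N + QVT:PP + QVT:LD + QVT:Root
    # Three-way interactions
    + MemScore:QVT:Length + MemScore:QVT:Freq + MemScore:QVT:N
    + MemScore:QVT:PP + MemScore:QVT:LD + MemScore:QVT:Root,
  data = all_accuracy,
  family = "binomial",  # logistic link for the binary accuracy response
  control = glmerControl(optimizer = "bobyqa", optCtrl = list(maxfun = 1e6))
)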
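Because a binomial model of this kind estimates its coefficients on the log-odds scale, a short base-R sketch of the conversion to probabilities may help with interpretation. The log-odds values below are hypothetical and are not estimates from the accuracy model.

```r
# Hypothetical predicted log-odds of a correct response
log_odds <- c(-2, 0, 2)

# Inverse logit: convert log-odds to probabilities
prob <- plogis(log_odds)
print(round(prob, 3))

# The logit function maps the probabilities back to log-odds
print(qlogis(prob))
```

A log-odds value of zero corresponds to a probability of .5, and equal steps on the log-odds scale produce smaller changes in probability as accuracy approaches floor or ceiling.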