The multivariable model to describe the relationship between the length of time since an orchard became infected by Psa and orchard productivity was developed in four steps: i) variables were created to represent possible effects of agrichemicals on Psa and productivity, ii) potential confounders and their distribution among orchards were identified, iii) simple linear regression models were developed to examine relationships between potential confounders and the outcome (productivity), and iv) a multivariable model was developed to describe the relationship between time from first detection of Psa until harvest at the end of the 2011/12 season and productivity in ‘Hayward’ kiwifruit, while controlling for the effects of the potential confounders.
4.3.1 Data extraction and management
Data were taken from orchards in all the growing regions in New Zealand (Figure 4-1). The criteria for inclusion of orchards were: i) Zespri-registered orchard, ii) ‘Hayward’ fruit produced in both the 2010/11 and the 2011/12 growing seasons, and iii) complete productivity data for the 2010/11 and 2011/12 growing seasons. Productivity and agrichemical data from Zespri were combined with Psa, orchard location and management data from KVH. Microsoft Access was used to merge the datasets and extract agrichemical data for orchards that met the inclusion criteria and time frame of the study. Both datasets have been described by Froud et al. (2014).

The outcome variable was ‘Hayward’ productivity in 2011/12, measured at harvest (late March to June 2012) in tray equivalents per hectare (te/ha) for each orchard. A tray equivalent is a single-layer packing tray containing 18 to 36 kiwifruit, with an average weight of 3.6 kg/tray for ‘Hayward’ kiwifruit (Mithraratne et al. 2010).

The key factor of interest was the number of weeks between the date on which Psa was confirmed in the orchard and the ‘Hayward’ harvest date in the 2011/12 season. The date of first detection was based on data in the KVH database. The method used to confirm Psa-positive detections changed during the outbreak. Cases were defined as orchards with Psa confirmed either by a diagnostic test or by the visual observation of symptoms. The date of a positive diagnostic test, or the date on which visible signs of disease were reported, was recorded in the database as the date of confirmed infection.
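The exposure and outcome variables described above can be illustrated with a short sketch. The study itself used Microsoft Access and R; the Python below is only an illustration. The function names and example values are hypothetical, and the use of fractional rather than whole weeks is an assumption, since the text does not say how weeks were rounded.

```python
from datetime import date


def weeks_since_detection(first_detection: date, harvest: date) -> float:
    """Weeks from confirmed Psa detection to the 'Hayward' harvest date.

    Assumption: fractional weeks (days / 7); the source does not state
    whether weeks were rounded.
    """
    return (harvest - first_detection).days / 7


def productivity_te_per_ha(tray_equivalents: float, hectares: float) -> float:
    """Orchard productivity in tray equivalents per hectare (te/ha)."""
    return tray_equivalents / hectares


# Hypothetical orchard: Psa confirmed 14 Nov 2011, harvested 30 Apr 2012,
# producing 45,000 tray equivalents from 5 ha of 'Hayward'.
weeks = weeks_since_detection(date(2011, 11, 14), date(2012, 4, 30))
te_ha = productivity_te_per_ha(45_000, 5.0)
print(weeks, te_ha)
```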
Potential confounders were classified as orchard-related, production-related or spray-related. There were four orchard-related variables: i) elevation, ii) orchard size, iii) region, and iv) presence of other kiwifruit cultivars. The four production-related variables were: i) productivity in the 2010/11 season (te/ha), ii) harvest day in the 2010/11 season, iii) harvest day in the 2011/12 season, and iv) organic or conventional production system. Harvest day variables, which gave an indication of early or late harvest for 2011 and 2012, were constructed from the count of days between the start of the New Zealand ‘Hayward’ harvest for the season and the harvest date for each orchard. For orchards with fruit harvested on more than one day, the median harvest date was used in the calculations. Agrichemical data for the 2012 production season (11 March 2011 to 17 June 2012) covered the period from the first spray applied immediately after harvest in 2011 to the last spray applied while fruit were still present in the orchard in 2012 (228,065 spray events). Spray variables were created that grouped active ingredients into those applied for Psa management (Table 4-1) and those applied for other purposes (e.g., insecticides and foliar fertilisers). Spray data pertained to individual ‘Hayward’ blocks within an orchard, productivity data pertained to all ‘Hayward’ blocks, and disease data pertained to a whole orchard, comprising multiple blocks in one locality. Differences in numbers of spray
applications between blocks within orchards were small so the spray data were aggregated by using the median number of applications per block in the analysis. The water source used for agrichemical spraying was categorised as: i) ground water (including water from bores and spring water that was not part of a water scheme), ii) surface water (including dam, tank, rivers and streams), iii) water scheme (including water taken from a rural or urban water scheme), and iv) mixed, where more than one water source category was used in an orchard.
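As a rough illustration of how the harvest day variable and the orchard-level spray counts could be derived, the sketch below uses Python's standard library. It is not the authors' code. The function names, the season start date and the block counts are hypothetical, and taking the median of the day offsets is assumed to be equivalent to using the median harvest date.

```python
from datetime import date
from statistics import median


def harvest_day(season_start: date, harvest_dates: list[date]) -> float:
    """Days from the national harvest start to the orchard's median harvest date."""
    offsets = [(d - season_start).days for d in harvest_dates]
    return median(offsets)


def orchard_spray_count(applications_per_block: dict[str, int]) -> float:
    """Aggregate block-level spray counts to one orchard-level value (median)."""
    return median(applications_per_block.values())


# Hypothetical season start and an orchard harvested on three separate days.
start_2012 = date(2012, 3, 26)
day = harvest_day(start_2012, [date(2012, 4, 20), date(2012, 4, 24), date(2012, 5, 2)])

# Hypothetical copper applications recorded for three 'Hayward' blocks.
copper = orchard_spray_count({"block_A": 6, "block_B": 7, "block_C": 7})
print(day, copper)
```

The median is insensitive to a single atypical block, which is consistent with the observation that between-block differences within orchards were small.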
Table 4-1 Classification of agrichemical and bio-fungicide active ingredients applied to ‘Hayward’ kiwifruit for Psa control during the 2012 growing season. The classification was based on use information contained in the agrichemical database (from Zespri data).

| Spray category | Active ingredient |
|---|---|
| Copper | copper |
| Wound protection | didecyl dimethyl ammonium chloride; tebuconazole |
| Antibiotics | streptomycin |
| Induced resistance (plant defence elicitors) | propiconazole with benzalkonium chloride and salicylic acid; acibenzolar-S-methyl; BioAlexin; Mycorrcin; yeast culture |
| Bio-fungicides | Bacillus subtilis; Bacillus amyloliquefaciens; Pantoea agglomerans; Ulocladium sp. |
| Biocides | benzalkonium chloride and copper sulphate; chlorine dioxide; dodine; hydrogen peroxide; peracetic acid; miscellaneous experimental biocide products |

4.3.2 Data analysis
Statistical analyses and graphics were produced using the freeware statistical package R, version 3.0.1 (R Core Team 2013). The level of statistical significance was set at P < 0.05. Continuous data were summarised using the median and percentiles or the mean and standard deviation. Initially, separate simple linear regression models were used to explore relationships between the outcome, 2012 productivity (te/ha), and each explanatory variable in turn: the time at which Psa was first detected and the other orchard, spray and production variables. A LOWESS smoothing line was fitted to visualise the relationship between 2012 productivity and the time at which Psa was first detected. For categorical variables with more than two levels (e.g., region), statistical significance was assessed using the partial F-test. For several spray groupings, it was necessary to
recode the discrete count variables as categorical variables. Decisions about recoding were based on visual assessment of boxplots and scatterplots and on the simple linear regression results. For agrichemical uses where only a few active ingredients were applied, e.g., bud-break enhancers (max = 2) and leaf drop promoters (max = 3), the data were examined for evidence of a dose response (e.g., did two applications have a greater effect than one?). Where there were no differences in productivity between single and multiple applications, the variables were recoded as binary (Not used/Used). Where there was an obvious or significant “dose effect” on productivity, the variables were either left as continuous variables where there were many applications, e.g., copper (max = 15), or converted to categorical variables where most orchards received only 1–4 applications. For example, herbicide applications were converted to a four-level categorical variable (Not used/1 spray/2 sprays/≥3 sprays).
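The two recoding schemes named in this paragraph can be sketched as follows. This is an illustrative Python version only (the analysis itself was done in R), and the function names are hypothetical. The level labels are taken from the text.

```python
def recode_binary(n_applications: int) -> str:
    """Collapse a spray count to the binary coding Not used/Used."""
    return "Used" if n_applications > 0 else "Not used"


def recode_herbicide(n_applications: int) -> str:
    """Collapse a herbicide spray count to the four-level categorical coding."""
    if n_applications <= 0:
        return "Not used"
    if n_applications == 1:
        return "1 spray"
    if n_applications == 2:
        return "2 sprays"
    return "≥3 sprays"


# Hypothetical counts for five orchards.
counts = [0, 1, 2, 3, 6]
print([recode_binary(n) for n in counts])
print([recode_herbicide(n) for n in counts])
```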
Productivity for 2011 was normally distributed and was scaled to standardised units for inclusion in the multivariable modelling.
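The text does not say exactly how 2011 productivity was standardised. One common approach, assumed here purely for illustration, is the z-score: subtract the mean and divide by the sample standard deviation, so that a one-unit change corresponds to one standard deviation. The function name and values below are hypothetical.

```python
from statistics import mean, stdev


def standardise(values: list[float]) -> list[float]:
    """Return z-scores: (value - mean) / sample standard deviation.

    Assumption: mean-centring and the sample SD (n - 1 denominator);
    the source only says the variable was scaled to a standardised unit.
    """
    m = mean(values)
    s = stdev(values)
    return [(v - m) / s for v in values]


# Hypothetical 2011 productivity values (te/ha) for three orchards.
z = standardise([6000.0, 8000.0, 10000.0])
print(z)
```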
The multivariable model was constructed in a five-step process. The first step was to construct a ‘full’ model that included weeks since Psa was detected and any other variables that were associated with productivity at P < 0.20. The exceptions were harvest day for the 2010/11 and 2011/12 seasons: these two variables were collinear, so only one could be included in the model. Harvest day for the 2010/11 season was included because its P-value was lower and its R² value was higher. The second step used an iterative manual backward elimination procedure. Variables were removed until every remaining variable in the model was either significantly associated with productivity or was one whose exclusion: i) altered the beta coefficient for weeks since Psa was detected by more than 20%, ii) changed the adjusted R² value by more than 5%, or iii) changed the AIC (Akaike information criterion) by more than four points. The significance of each coefficient was assessed using a t-test. For categorical variables with more than two levels, the statistical significance of all the levels in that group was assessed using the partial F-test. The third step was to determine whether the continuous variables in the model had a linear relationship with productivity after accounting for the effects of the other variables in the model. In a model with only one variable this could be done by visual inspection of a scatter plot. However, in a multivariable model the aim is to assess linearity in the presence of other variables, so the assumption of linearity was assessed through the inclusion of a quadratic term. If the quadratic term was significant, the variable was deemed to have a non-linear relationship with the outcome, and either it was converted to a categorical variable or the quadratic term was retained in the model, depending on which produced the higher adjusted R². The fourth step was to ensure that no important factors
were excluded from the multivariable model. Each variable that was not included in the ‘full’ model, or that was removed during model building, was added back into the reduced model on its own and retained if statistically significant. Finally, in step five, all biologically plausible two-way interactions were considered by including an interaction term for each. Any interactions that were significant, as determined by the partial F-test, were retained in the model. The adjusted R² value was used to assess the goodness of fit of the model as a whole.
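The retention rule used during backward elimination (step two) can be written as a small decision function. The sketch below is hypothetical and is not the authors' R code. It assumes that the 20% and 5% thresholds are relative changes measured against the model that still contains the candidate variable, and that the AIC threshold is an absolute difference. The text does not state these details.

```python
def keep_variable(
    p_value: float,
    beta_with: float, beta_without: float,
    adj_r2_with: float, adj_r2_without: float,
    aic_with: float, aic_without: float,
    alpha: float = 0.05,
) -> bool:
    """Decide whether a candidate variable stays in the model.

    'with' / 'without' refer to models fitted with and without the candidate;
    beta is the coefficient for weeks since Psa was detected.
    """
    if p_value < alpha:
        return True  # significantly associated with productivity
    beta_change = abs(beta_without - beta_with) / abs(beta_with)
    r2_change = abs(adj_r2_without - adj_r2_with) / abs(adj_r2_with)
    aic_change = abs(aic_without - aic_with)
    # Retain a non-significant variable if dropping it shifts the model materially.
    return beta_change > 0.20 or r2_change > 0.05 or aic_change > 4


# Hypothetical candidates: the first changes little when dropped, the second
# shifts the Psa coefficient by 24% and so behaves like a confounder.
drop_me = keep_variable(0.30, -50.0, -45.0, 0.400, 0.395, 1000.0, 1001.5)
confounder = keep_variable(0.30, -50.0, -38.0, 0.400, 0.398, 1000.0, 1001.0)
print(drop_me, confounder)
```

Applying the rule to one variable at a time and refitting after each removal mirrors the manual, iterative procedure described in the text.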
No adjustments were made to P-values for the final model, as such adjustments are not recommended where exposure variables are individually selected based on the potential for a biologically plausible association with the outcome (Rothman 1990; Vandenbroucke et al. 2007) and manual selection of model variables was applied rather than automated selection criteria (Dohoo et al. 2009c; Froud et al. 2015).
Standard model diagnostics for multivariable linear regression were performed (Kabacoff 2011). The distribution of the Studentised residuals was plotted and visually assessed. The square root of the standardised residuals was plotted against the fitted values and checked for a horizontal line of best fit, with no apparent funnel or cone shapes formed by the points. Influential observations were assessed by plotting Cook’s distance values against each variable. For Cook’s distance, the cut-off for concern was set to approximately 0.002, calculated as 4/(n − k − 1), where n is the number of observations (2599) and k is the number of coefficients in the final model (29).
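The Cook's distance cut-off can be checked by evaluating the stated formula directly. With n = 2599 and k = 29 it comes to about 0.0016. The sketch below is illustrative Python with hypothetical function names and made-up distance values; it does not reproduce the authors' diagnostics.

```python
def cooks_cutoff(n: int, k: int) -> float:
    """Rule-of-thumb cut-off 4 / (n - k - 1) for Cook's distance."""
    return 4 / (n - k - 1)


def flag_influential(distances: list[float], cutoff: float) -> list[int]:
    """Return the indices of observations whose Cook's distance exceeds the cut-off."""
    return [i for i, d in enumerate(distances) if d > cutoff]


cutoff = cooks_cutoff(n=2599, k=29)
print(round(cutoff, 4))

# Made-up Cook's distances for four observations.
flagged = flag_influential([0.0003, 0.0040, 0.0010, 0.0021], cutoff)
print(flagged)
```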
Predicted productivity plots for some effects were constructed in R using the effects package (Fox 2003).
Figure 4-1. Map of New Zealand kiwifruit growing regions and kiwifruit orchard locations in 2012.