All analyses were conducted on an intention-to-treat basis, including all participants in the groups to which they were randomised. For superiority comparisons, two-sided significance tests at the 5% significance level were used. Analyses were conducted in Stata version 13 (StataCorp LP, College Station, TX, USA).
Overview of analyses
All outcome measures were assessed separately for the following group comparisons:
1. Beating the Blues versus usual GP care alone (superiority)
2. MoodGYM versus usual GP care alone (superiority)
3. MoodGYM versus Beating the Blues (non-inferiority).
For each group comparison, similar analyses were applied depending on the type of comparison (superiority or non-inferiority) and the inclusion of potentially important covariates. We aimed to minimise the number of models applied to each outcome to avoid issues arising from multiple comparisons. The non-inferiority comparison was undertaken only for the dichotomised PHQ-9 scores, with the primary outcome being at 4 months. To assess non-inferiority between Beating the Blues and MoodGYM, we computed two-sided 90% confidence intervals (CIs), the upper boundary of which corresponds to a one-sided test at the 5% level. Using this method, the free-to-use cCBT program MoodGYM was considered non-inferior to the commercial pay-to-use cCBT program Beating the Blues at the 5% level if the upper boundary of the CI was below the pre-specified margin of non-inferiority [0.15 difference in proportions, which translated to 1.44 for the odds ratio (OR)].
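The non-inferiority decision rule above can be illustrated with a minimal Python sketch. This is not the trial's Stata code. The function name, its arguments and the use of a normal approximation on the log-odds scale are assumptions made for illustration, as is the convention that an OR above 1 indicates a worse outcome for MoodGYM. Only the margin of 1.44 and the 90% CI level are taken from the text.

```python
import math
from statistics import NormalDist

# Pre-specified non-inferiority margin on the odds ratio scale (from the text).
NI_MARGIN_OR = 1.44


def noninferiority_check(log_or, se_log_or, margin_or=NI_MARGIN_OR, level=0.90):
    """Return (lower, upper, non_inferior) for a two-sided CI on the OR scale.

    log_or and se_log_or are a hypothetical estimate and standard error on the
    log-odds scale, for example from a logistic regression model.
    """
    # Two-sided 90% CI uses the 95th percentile of the standard normal (about 1.645).
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    lower = math.exp(log_or - z * se_log_or)
    upper = math.exp(log_or + z * se_log_or)
    # Non-inferiority is concluded only if the upper boundary is below the margin.
    return lower, upper, upper < margin_or
```

For example, `noninferiority_check(math.log(1.0), 0.2)` would conclude non-inferiority, whereas the same point estimate with a standard error of 0.3 would not, because the wider interval crosses the margin.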
Baseline data
All baseline data were summarised descriptively by treatment group. No formal statistical comparisons were undertaken. Continuous measures were reported as means and standard deviations (SDs; with medians and minimum and maximum values where appropriate), whereas categorical data were reported as counts and percentages.
Primary analysis
The primary outcome was depression status at 4 months using a cut-off point of 10 on the PHQ-9. The PHQ-9 was scored if at least eight out of the nine questions were completed. If there was a missing response for one item, then the mean of the other eight responses was imputed. Groups were compared using a logistic regression model with adjustments for gender, age, baseline depression severity, depression duration and level of anxiety. From this model, we obtained ORs and corresponding 95% CIs.
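The PHQ-9 scoring and dichotomisation rules described above can be sketched as follows. This is an illustrative Python sketch rather than the trial's own code. The function names are hypothetical, missing items are represented by `None`, and a score of 10 or more is assumed to fall on the depressed side of the cut-off point.

```python
def score_phq9(items):
    """Return the PHQ-9 total, or None if fewer than eight items were completed.

    items is a list of nine responses (each 0 to 3), with None for a missing item.
    """
    if len(items) != 9:
        raise ValueError("PHQ-9 has nine items")
    valid = [x for x in items if x is not None]
    if len(valid) < 8:
        return None  # too many missing responses to score
    total = sum(valid)
    if len(valid) == 8:
        # Impute the single missing item with the mean of the other eight.
        total += sum(valid) / 8
    return total


def is_depressed(score, cutoff=10):
    """Dichotomise a PHQ-9 score; assumes score >= cutoff counts as depressed."""
    return None if score is None else score >= cutoff
```

A participant answering 2 on eight items and leaving one blank would receive an imputed total of 18 and be classified as depressed under this assumed convention.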
Secondary analyses
Patient Health Questionnaire-9 dichotomised
The primary analysis was repeated for the 12- and 24-month dichotomised PHQ-9 data using the same methods as described above. In addition, all time points were analysed in a single model, rather than in individual analyses at each time point, using a repeated measures multilevel logistic regression model. The outcome measures were the values at 4, 12 and 24 months, and baseline PHQ-9 score, age, gender, depression duration, level of anxiety, treatment group and time were included as fixed effects. The model also included an interaction between treatment and time. Participants were treated as random effects (to allow for clustering of data within individual participants). Models were fitted with different covariance patterns for the repeated measurements within participants: unstructured, independent, exchangeable and identity. Models were compared and the model with the smallest Akaike information criterion (AIC) value was selected as the final model for each comparison. Model assumptions for the final models were checked for all comparisons. Overall ORs and corresponding 95% (or 90% for non-inferiority) CIs and individual ORs at each time point (4, 12 and 24 months) were estimated from these models.
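The AIC-based choice between covariance patterns can be illustrated with a short Python sketch. It uses the standard definition AIC = 2k - 2 log L, where k is the number of estimated parameters and L is the maximised likelihood. The function names and the log-likelihood values in the usage note are hypothetical and are not results from the trial.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 log L."""
    return 2 * n_params - 2 * log_likelihood


def select_by_aic(fits):
    """Pick the covariance pattern with the smallest AIC.

    fits maps a pattern name to (log_likelihood, n_params) taken from a
    fitted model; returns (best_name, dict of AIC values).
    """
    scores = {name: aic(ll, k) for name, (ll, k) in fits.items()}
    best = min(scores, key=scores.get)
    return best, scores
```

With made-up inputs such as `{"unstructured": (-500.0, 14), "exchangeable": (-503.0, 10)}`, the exchangeable pattern would be selected: its slightly lower log-likelihood is outweighed by the penalty for the four extra parameters of the unstructured pattern.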
Subgroup analyses
An a priori subgroup analysis was performed for the dichotomised PHQ-9 scores at 4 months only. The primary analysis was repeated as described in the previous section, with the addition of an interaction term between the baseline factor and the treatment comparison. As this study was not powered to detect interactions, a statistical significance level of 10% (p < 0.10) was used. The subgroup analysis was based on baseline pre-randomisation patient preference in relation to cCBT.
Patient Health Questionnaire-9 continuous
The PHQ-9 outcome was also analysed in its continuous form using a repeated measures multilevel linear mixed model following a similar procedure to that outlined above for the dichotomised PHQ-9 scores. The model adjusted for the same covariates.
Clinical Outcomes in Routine Evaluation–Outcome Measure
The CORE-OM was analysed using a repeated measures multilevel linear mixed model following a similar procedure to that outlined above for the dichotomised PHQ-9 scores. The model adjusted for the same covariates but included the baseline CORE-OM score and not the baseline PHQ-9 score. If there was a missing response for no more than three items, then the overall score was still calculated by summing the valid responses and dividing by the number of valid responses.
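The CORE-OM scoring rule for missing items can be sketched in Python as follows. The function name is hypothetical, missing items are represented by `None`, and the default of 34 items reflects the standard CORE-OM questionnaire length, which is an assumption because the text does not state it. Returning `None` when more than three items are missing is also an assumption about how unscorable questionnaires are handled.

```python
def score_core_om(items, n_items=34, max_missing=3):
    """Return the mean of valid CORE-OM responses, or None if too many are missing.

    items is a list of n_items responses, with None for a missing item.
    """
    if len(items) != n_items:
        raise ValueError(f"expected {n_items} items")
    valid = [x for x in items if x is not None]
    if n_items - len(valid) > max_missing:
        return None  # more than three items missing: not scored
    # Sum the valid responses and divide by the number of valid responses.
    return sum(valid) / len(valid)
```

Because the score is a mean over the valid responses, a questionnaire with up to three missing items remains on the same scale as a fully completed one.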
Short Form questionnaire-36 items Health Survey version 2
HRQoL was measured using the SF-36v2 questionnaire at baseline and at 4, 12 and 24 months. The scores for the individual SF-36v2 health components (physical functioning, role-physical, bodily pain, general health, role-emotional, vitality, social functioning and mental health) were summarised at each time point by group. For the statistical analysis, only the physical component summary (PCS) scores and mental component summary (MCS) scores were analysed, to prevent problems caused by multiple testing. The SF-36v2 PCS and MCS scores were analysed using a repeated measures multilevel linear mixed model following a similar procedure to that outlined above for the dichotomised PHQ-9 scores. The model adjusted for the same covariates but included the baseline SF-36v2 score and not the baseline PHQ-9 score.
Adverse events
The number of adverse events and the number of participants experiencing those events were summarised overall and by group for non-serious adverse events (NSAEs) and SAEs separately. No statistical comparisons were undertaken.