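$$y^*_{ij} = x_{ij}\beta + \mu_i + \epsilon_{ij}$$
and
$$y_{ij} = \begin{cases} 1 & \text{if } y^*_{ij} \le \alpha_1 \\ 2 & \text{if } \alpha_1 < y^*_{ij} \le \alpha_2 \\ \vdots & \\ k & \text{if } \alpha_{k-1} < y^*_{ij} \end{cases}$$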
The error terms $\epsilon_{ij}$ are distributed as logistic with mean zero and variance $\pi^2/3$ and are independent of $\mu_i$. Error terms distributed as $N(0, 1)$ correspond to a probit link (see Section 4.1.2).
As noted in Chapter Two, Section 2.2.3, random effects models with discrete outcomes can be difficult to fit via maximum likelihood. The likelihood involves an integral over the random effects that has no closed form, so it must be approximated by methods such as Gauss-Hermite quadrature.
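As a minimal sketch of what such an approximation involves, the Python code below evaluates one cluster's likelihood contribution under the latent variable model above, assuming logistic errors and a cluster random effect $\mu_i \sim N(0, \sigma_b^2)$, and integrating over $\mu_i$ with ordinary (non-adaptive) Gauss-Hermite quadrature from NumPy's `hermgauss`. The function names and example parameter values are hypothetical, and a full fit would also need an optimiser over $\beta$, the cut-points and $\sigma_b$.

```python
import numpy as np
from numpy.polynomial.hermite import hermgauss


def category_probs(eta, alphas):
    """P(y = q), q = 1..k, for latent y* = eta + logistic error and cut-points alphas.

    Since y = q when alpha_{q-1} < y* <= alpha_q, P(y <= q) = expit(alpha_q - eta).
    """
    cum = 1.0 / (1.0 + np.exp(-(np.asarray(alphas, dtype=float) - eta)))
    return np.diff(np.concatenate(([0.0], cum, [1.0])))


def cluster_likelihood(y, x, beta, alphas, sigma_b, n_nodes=20):
    """Gauss-Hermite approximation of one cluster's likelihood contribution,
    L_i = integral of prod_j P(y_ij | x_ij, mu) * N(mu; 0, sigma_b^2) dmu."""
    nodes, weights = hermgauss(n_nodes)
    total = 0.0
    for t, w in zip(nodes, weights):
        # Substituting mu = sqrt(2) * sigma_b * t maps the normal integral onto
        # the Gauss-Hermite form: integral of f(t) * exp(-t^2) dt.
        mu = np.sqrt(2.0) * sigma_b * t
        lik = 1.0
        for y_ij, x_ij in zip(y, x):
            lik *= category_probs(x_ij * beta + mu, alphas)[y_ij - 1]
        total += w * lik
    return total / np.sqrt(np.pi)


# Hypothetical cluster of three individuals on a 3-category outcome.
print(cluster_likelihood(y=[1, 3, 2], x=[0, 1, 1], beta=0.5,
                         alphas=[-1.0, 1.0], sigma_b=0.8))
```

Adaptive quadrature, as used by the software discussed in Section 4.2.5, recentres and rescales the nodes for each cluster, which usually needs fewer nodes for the same accuracy.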
4.2.3
Marginal models
In the random effects (or cluster-specific) model, the focus is on modelling a cluster-specific response, and the regression coefficient for treatment represents the average effect of treatment if an individual stays in the same cluster but moves from the control to the intervention arm. The random effects model uses maximum likelihood estimation and explicitly models the covariance structure by introducing cluster-specific random effects.
In a population averaged, or marginal, model the regression coefficient for treatment represents the effect of treatment if an individual in the population moves from the control to the intervention arm. Unlike the random effects model, the marginal model does not fully specify the joint distribution of the responses. Instead, the marginal expectations are modelled and a variance-covariance structure (referred to as the working or hypothesised correlation) is chosen to describe the correlation between members of a cluster. The model is not fitted via maximum likelihood; instead, the treatment effect is estimated by solving a generalised estimating equation (GEE).
Generalised Estimating Equation methods for the analysis of ordinal outcomes have been described by Lipsitz, Kim and Zhao and are summarised here.68
4.2. ANALYSIS OF CLUSTERED ORDINAL OUTCOMES
For each individual we observe the response on a $k$-level ordinal outcome with categories $q = 1, 2, \ldots, k$. In keeping with the terminology of Lipsitz et al, a higher category here indicates a better outcome. Let $Z_{ij}$ denote the ordinal response of the $j$'th individual in the $i$'th cluster, $j = 1, 2, \ldots, n$ and $i = 1, 2, \ldots, C$. The cluster size is assumed fixed and denoted by $n$. We form $k$ indicator variables $Y_{ijq}$, where $Y_{ijq} = 1$ if subject $j$ has response $q$ and $Y_{ijq} = 0$ otherwise. For each subject we form a $(k-1) \times 1$ response vector of indicator variables, $Y_{ij} = [Y_{ij1}, \ldots, Y_{ij(k-1)}]'$, and for each cluster $Y_i = [Y_{i1}', \ldots, Y_{in}']'$.
The marginal probability is denoted by $\Pr[Z_{ij} = q] = E[Y_{ijq}] = \Pr[Y_{ijq} = 1] = \pi_{ijq}$ and the corresponding marginal cumulative probabilities by $\Pr[Z_{ij} \le q] = P_{ijq}$.
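Lipsitz et al analyse the data using a marginal model based on cumulative logits:
$$\operatorname{logit}(P_{ij}) = X_{ij}\beta \qquad (4.4)$$
where $P_{ij} = [P_{ij1}, \ldots, P_{ij(k-1)}]'$ is the vector of cumulative probabilities, $X_{ij}$ denotes the $(k-1) \times k$ design matrix for the $j$'th individual of the $i$'th cluster and $\beta = [\alpha_1, \ldots, \alpha_{k-1}, \beta]'$ denotes a $k \times 1$ parameter vector, in which $\alpha_q$ is the intercept of the $q$'th cumulative logit and $\beta$ denotes the effect of treatment.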
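The following sketch builds $X_{ij}$ for a single binary treatment indicator, so that each row gives $\operatorname{logit}(P_{ijq}) = \alpha_q + \beta x_{ij}$, and converts equation (4.4) into the cumulative probabilities $P_{ijq}$ and category probabilities $\pi_{ijq}$, together with the indicator vector $Y_{ij}$. It assumes $x_{ij} = 1$ for the intervention arm and 0 for control; the helper names are hypothetical, and the stacked parameter vector is called `theta` in the code only to avoid reusing the name `beta` for both the vector and the treatment effect.

```python
import numpy as np


def design_matrix(x, k):
    """(k-1) x k design matrix X_ij: an identity block for alpha_1..alpha_{k-1}
    followed by a column holding the treatment indicator x for beta."""
    return np.hstack([np.eye(k - 1), np.full((k - 1, 1), float(x))])


def marginal_probs(x, alphas, beta):
    """Cumulative probabilities P_ijq (q = 1..k-1) and category probabilities
    pi_ijq (q = 1..k) implied by logit(P_ij) = X_ij theta (equation 4.4)."""
    k = len(alphas) + 1
    theta = np.append(np.asarray(alphas, dtype=float), beta)  # [alpha_1..alpha_{k-1}, beta]
    cum = 1.0 / (1.0 + np.exp(-(design_matrix(x, k) @ theta)))
    pi = np.diff(np.concatenate(([0.0], cum, [1.0])))
    return cum, pi


def indicator_vector(z, k):
    """Y_ij = [Y_ij1, ..., Y_ij(k-1)]'; category k corresponds to all zeros."""
    return np.array([1.0 if z == q else 0.0 for q in range(1, k)])


# Hypothetical 4-category outcome for an individual in the intervention arm.
cum, pi = marginal_probs(x=1, alphas=[-1.0, 0.0, 1.5], beta=-0.4)
print(cum, pi, indicator_vector(2, 4))
```

The first $k-1$ entries of `pi` are $E[Y_{ij}]$, which is why only $k-1$ indicators are needed per subject.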
In Chapter Two, Section 2.2.2, I presented a worked example of the sample size methodology proposed by Kim et al, which assumes a GEE model. In Step 10 of their sample size procedure, the parameter vector $\beta$ is found by solving the generalised estimating equation:
$$\hat{\beta} = \left[\sum_t D_t' V_t^{-1} D_t\right]^{-1} \left[\sum_t X_t' W_t h(\theta_t)\right]$$
where $t$ indexes treatment group; $D_t = \Delta_t X_t$, where $\Delta_t = \partial \pi / \partial \eta$ is the matrix of partial derivatives of the mean of the outcome with respect to the linear predictor $\eta = X\beta$ (so that $D_t$ holds the derivatives with respect to the regression parameters); $X_t$ is the design matrix; $V_t$ is the working covariance matrix; $W_t = \Delta_t V_t^{-1} \Delta_t$; and $h(\theta_t)$ is the vector of cumulative logits, i.e. $\ln[P/(1-P)]$ for each cumulative probability $P$.
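Because $D_t = \Delta_t X_t$, the left-hand matrix equals $\sum_t X_t' W_t X_t$, so $\hat\beta$ is a weighted least squares fit of the cumulative logits $h(\theta_t)$ on $X_t$. The sketch below computes it directly. It assumes that the mean is taken on the cumulative scale, so that $\Delta_t$ is diagonal with entries $P(1-P)$, and it uses the covariance of the cumulative indicators for a single individual as an illustrative $V_t$; neither choice is taken from Kim et al, and all names and input values are hypothetical.

```python
import numpy as np


def cumulative_cov(P):
    """Covariance of the cumulative indicators 1{Z <= q} for one individual:
    Cov = P_min(q,r) * (1 - P_max(q,r)). Used here only as an illustrative V_t."""
    P = np.asarray(P, dtype=float)
    return np.minimum.outer(P, P) * (1.0 - np.maximum.outer(P, P))


def gee_estimate(X_list, Delta_list, V_list, h_list):
    """beta_hat = [sum_t D_t' V_t^-1 D_t]^-1 [sum_t X_t' W_t h(theta_t)],
    with D_t = Delta_t X_t and W_t = Delta_t V_t^-1 Delta_t."""
    p = X_list[0].shape[1]
    lhs = np.zeros((p, p))
    rhs = np.zeros(p)
    for X, Delta, V, h in zip(X_list, Delta_list, V_list, h_list):
        V_inv = np.linalg.inv(V)
        D = Delta @ X
        W = Delta @ V_inv @ Delta
        lhs += D.T @ V_inv @ D
        rhs += X.T @ W @ h
    return np.linalg.solve(lhs, rhs)


# Hypothetical cumulative probabilities P(Z <= q), q = 1, 2, in each arm (k = 3).
P0, P1 = np.array([0.3, 0.7]), np.array([0.2, 0.6])
X0 = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])  # control arm
X1 = np.array([[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]])  # intervention arm
estimate = gee_estimate(
    [X0, X1],
    [np.diag(P0 * (1 - P0)), np.diag(P1 * (1 - P1))],  # dP/d(eta) for a logit link
    [cumulative_cov(P0), cumulative_cov(P1)],
    [np.log(P0 / (1 - P0)), np.log(P1 / (1 - P1))],
)
print(estimate)  # [alpha_1, alpha_2, beta]
```

If the cumulative logits lie exactly on $X_t\beta$ for some $\beta$, the weighted fit returns that $\beta$ whatever weights are used.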
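The corresponding Wald statistic for the treatment effect is
$$W = \frac{\hat{\beta}^2}{\widehat{\operatorname{var}}(\hat{\beta})} \sim \chi^2_1$$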
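A minimal sketch of the Wald test, using the fact that a $\chi^2_1$ variable is the square of a standard normal variable, so that its upper tail is available through `math.erfc` without a statistics library. The function name and inputs are hypothetical.

```python
import math


def wald_test(beta_hat, var_beta):
    """Wald statistic W = beta_hat^2 / var(beta_hat) and its chi-squared(1) p-value."""
    w = beta_hat ** 2 / var_beta
    # P(chi2_1 > w) = P(|Z| > sqrt(w)) = erfc(sqrt(w / 2)) for standard normal Z.
    p_value = math.erfc(math.sqrt(w / 2.0))
    return w, p_value


# Hypothetical treatment effect estimate and its variance.
print(wald_test(0.45, 0.04))
```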
The variance of the treatment effect can be calculated in two ways. The model-based estimate of the variance provides valid inferences only if the working covariance matrix is correctly specified.
$$\widehat{\operatorname{var}}(\hat{\beta}) = \left[\sum_{t=1}^{T} D_t' V_t^{-1} D_t\right]^{-1} \qquad (4.5)$$
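An alternative estimate that is more robust to misspecification of the working covariance matrix is the robust, or sandwich, estimator:
$$\widehat{\operatorname{var}}_R(\hat{\beta}) = \left[\sum_{t=1}^{T} D_t' V_t^{-1} D_t\right]^{-1} \left[\sum_{t=1}^{T} D_t' V_t^{-1} \operatorname{var}(Y_t) V_t^{-1} D_t\right] \left[\sum_{t=1}^{T} D_t' V_t^{-1} D_t\right]^{-1} \qquad (4.6)$$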
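The two estimators share the outer matrix $\sum_t D_t' V_t^{-1} D_t$ and differ only in the middle term. The sketch below computes both. In an analysis, $\operatorname{var}(Y_t)$ is usually estimated from residual cross-products, whereas here it is simply passed in, and all inputs are hypothetical. When $\operatorname{var}(Y_t) = V_t$ the two estimates coincide, which is the sense in which the model-based estimate relies on a correctly specified working covariance.

```python
import numpy as np


def gee_variances(D_list, V_list, cov_y_list):
    """Return the model-based (4.5) and robust sandwich (4.6) covariance matrices."""
    p = D_list[0].shape[1]
    bread = np.zeros((p, p))
    meat = np.zeros((p, p))
    for D, V, cov_y in zip(D_list, V_list, cov_y_list):
        V_inv = np.linalg.inv(V)
        bread += D.T @ V_inv @ D
        meat += D.T @ V_inv @ cov_y @ V_inv @ D
    model_based = np.linalg.inv(bread)
    robust = model_based @ meat @ model_based
    return model_based, robust


# Hypothetical inputs: two arms, k = 3, and an observed var(Y_t) larger than V_t.
D0 = np.array([[0.2, 0.0, 0.0], [0.0, 0.2, 0.0]])
D1 = np.array([[0.15, 0.0, 0.15], [0.0, 0.25, 0.25]])
V = np.array([[0.25, 0.1], [0.1, 0.2]])
model_based, robust = gee_variances([D0, D1], [V, V], [1.5 * V, 1.5 * V])
print(model_based[2, 2], robust[2, 2])  # here robust = 1.5 * model_based
```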
4.2.4
Comparison of random effects and marginal models
Among the trials with ordinal outcomes identified in Chapter Three, the random effects model was the most popular choice of analysis.
The GEE, or population averaged (PA), method provides consistent estimates of the regression parameters even when the correlation structure is misspecified, and it is computationally simple compared with the random effects (RE) model, whose likelihood function must be approximated. However, because the GEE method does not specify a full multivariate distribution for the responses, it has no likelihood function. Without a likelihood, the likelihood ratio test and other likelihood-based methods cannot be used to check model fit, compare models or make inferences about the model parameters. Inference about the parameters of a GEE model must instead be made via a Wald test, which can give spurious results, particularly in small samples. The empirical (robust) standard errors calculated from a GEE model may also be underestimated unless the sample size is very large.
The two models also differ in the assumptions they make about missing data. Little and Rubin have described missing data mechanisms, and I use their terminology here.142 The GEE model makes the stronger assumption that the data are Missing Completely At Random (MCAR). In simple terms, this means that the probability that an observation is missing depends neither on the value itself nor on any other observed measurement. Maximum likelihood based random effects methods make the weaker assumption that the data are Missing At Random (MAR), that is, the probability that an observation is missing does not depend on the value itself but can be explained by other observed measurements. Violations of these assumptions in either model may lead to biased results.
The relationships between the treatment effect estimated from an RE model and that estimated from a PA model have been described by Agresti.130 They are briefly summarised here because they will be used in the design of the simulation study described in the next chapter.
• The treatment effect from a marginal model will be smaller in magnitude than that from a random effects model, and the difference increases as the level of within-cluster correlation increases.
• When a probit link is used, the treatment effect estimates from the RE and PA models can be directly compared, with the RE estimate being $\sqrt{1 + \sigma_b^2}$ times the PA estimate, where $\sigma_b^2$ is the variance of the cluster random effects.
• For the logit link the relationship between the model estimates is only approximate, with the RE estimate being approximately $\sqrt{1 + 0.346\,\sigma_b^2}$ times the PA estimate.
• Despite the differences in interpretation and magnitude between the random effects and population averaged models, the significance of the treatment effect is likely to be similar.130
• Because the standard normal cumulative distribution function (cdf) at a point $z$ is well approximated by the standard logistic cdf at $1.7z$, estimates from models with a logit link are approximately 1.7 times those from probit models.130
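The conversion rules in the bullet points above can be applied as in the following sketch, which assumes $\sigma_b^2$ is the variance of the cluster random intercepts. The 0.346 constant is the value quoted for the logit link (approximately $(16\sqrt{3}/(15\pi))^2$); the function names and example values are hypothetical.

```python
import math

# Approximately (16 * sqrt(3) / (15 * pi))^2; the constant quoted for the logit link.
LOGIT_CONSTANT = 0.346


def pa_from_re(beta_re, sigma_b2, link="logit"):
    """Approximate population-averaged effect implied by a random-intercept effect."""
    if link == "probit":
        factor = math.sqrt(1.0 + sigma_b2)
    elif link == "logit":
        factor = math.sqrt(1.0 + LOGIT_CONSTANT * sigma_b2)
    else:
        raise ValueError("link must be 'logit' or 'probit'")
    return beta_re / factor


def logit_from_probit(beta_probit):
    """Rough logit-scale equivalent of a probit coefficient (factor of 1.7)."""
    return 1.7 * beta_probit


# Hypothetical RE effect of 0.8 with sigma_b^2 = 0.5 under each link.
print(pa_from_re(0.8, 0.5), pa_from_re(0.8, 0.5, link="probit"), logit_from_probit(0.5))
```

For the same $\sigma_b^2$ the attenuation is smaller on the logit scale, because the latent logistic error already has variance $\pi^2/3 \approx 3.29$ rather than 1.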
4.2.5
Software
In Stata version 13, the xtologit command fits random effects ordered logit models via maximum likelihood, using adaptive Gauss-Hermite quadrature by default to approximate the likelihood. The accuracy of this quadrature can be checked using the quadchk command. The xtologit command assumes that larger values of the ordinal response correspond to better outcomes. The xtoprobit command is available for ordered probit models. The user-written gllamm command can also be used to fit these models and predates the built-in commands. Currently there is no option available in Stata to analyse ordinal outcomes with a GEE model.
The SAS software can accommodate random effects and GEE models using PROC NLMIXED and PROC GENMOD respectively.
4.2.6
Assessment of proportional odds
For individually randomised trials, formal methods have been proposed to test the assumption of proportional odds. When the assumption is not valid, non-proportional or partial proportional odds models have been suggested as alternative analysis methods and have been incorporated into statistical software (see Section 4.1.2).
For the clustered case there has been less development and limited guidance on how to formally assess proportional odds. As in the individually randomised case, to test the assumption of proportional odds one may fit a separate treatment effect for each cumulative logit and compare this model with the proportional odds model via a likelihood ratio test. Hedeker and Mermelstein have extended the random effects proportional odds model to allow for non-proportional or partial proportional odds.143 Their approach extends Peterson and Harrell's partial proportional odds model for the fixed effects case.139 The partial proportional odds method of Hedeker and Mermelstein has been implemented in an extension to the MIXOR package, available in R for mixed effects ordinal regression.144
The partial proportional odds model of Hedeker and Mermelstein was developed in the context of behavioural stage of change data, where participants are categorised according to their readiness to change, ranging from pre-contemplation to action. The authors considered the assumption of proportional odds to be unreasonable for this type of data. Of the 11 CRTs with ordinal outcomes identified in Chapter Three, three used such an outcome. Non-proportional odds may therefore be a significant problem in the design and analysis of trials with behavioural ordinal outcomes.
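For a random effects model fitted by maximum likelihood, the likelihood ratio test of proportional odds described in this section compares twice the difference in maximised log-likelihoods with a chi-squared distribution whose degrees of freedom equal the number of extra parameters: $k-2$ when a single treatment effect is replaced by one for each of the $k-1$ cumulative logits. The sketch below assumes the two log-likelihoods have already been obtained from fitted models (the values shown are hypothetical) and computes the chi-squared tail from the incomplete gamma series so that no statistics library is needed. As discussed in Section 4.2.4, this test is not available for GEE fits.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(chi2_df > x), from the series expansion of the
    regularised lower incomplete gamma function P(df/2, x/2)."""
    if x <= 0.0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    series = term
    n = 1
    while term > 1e-16 * series:
        term *= z / (a + n)
        series += term
        n += 1
    lower = math.exp(a * math.log(z) - z - math.lgamma(a)) * series
    return max(0.0, 1.0 - lower)


def lr_test_proportional_odds(loglik_po, loglik_npo, k):
    """LR test of a proportional odds model (one treatment effect) against a model
    with a separate treatment effect for each of the k-1 cumulative logits."""
    if k < 3:
        raise ValueError("the test needs at least three categories")
    statistic = 2.0 * (loglik_npo - loglik_po)
    df = k - 2  # (k - 1) treatment effects versus 1
    return statistic, chi2_sf(statistic, df)


# Hypothetical maximised log-likelihoods from two random effects fits, k = 4.
print(lr_test_proportional_odds(loglik_po=-512.3, loglik_npo=-509.1, k=4))
```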
4.2.7
Significance testing
As in the individually randomised case, the null hypothesis of no treatment effect can be assessed via a likelihood ratio, Wald or score test for random effects models, or via a Wald test for GEE. However,
4.3. THE ICC
the normal approximation to the distribution of the Wald test statistic is poorer in the clustered case when the number of clusters is small or the cluster size is variable. This tends to inflate the Type I error rate, so that under the null hypothesis more than 5% of calculated P-values are likely to fall below 0.05. A suggested solution is to compare the Wald statistic with a t-distribution, but this may reduce the Type I error rate below the nominal value.
4.3
The ICC
Estimators of the ICC have been extensively described for binary and continuous data.145, 146 The most common interpretation of the ICC is as the proportion of the total variance due to between-cluster variation; for continuous outcomes it is defined as
$$\rho = \frac{\sigma_b^2}{\sigma_b^2 + \sigma_w^2}$$
where $\sigma_b^2$ is the between-cluster variance, $\sigma_w^2$ is the within-cluster variance and their sum is the total variance.
For binary outcomes the definition is slightly different:
$$\rho = \frac{\sigma_b^2}{\pi(1-\pi)}$$
The ICC depends on $\pi$, the prevalence in the population, and the total variance is $\pi(1-\pi)$. The prevalence, and therefore the within-cluster variance, is likely to vary between clusters, so the assumption of constant within-cluster variance does not hold for binary outcomes. This ICC for binary outcomes is referred to as being on the proportions scale, and it is this ICC that should be used in the design effect when calculating sample size for binary outcomes.147 As with ordinal outcomes, the model for binary outcomes can be motivated by the existence of an underlying continuous variable on the logistic scale, and an ICC can also be calculated on this logistic scale. This estimate will differ from the ICC on the proportions scale, and there is no easy formula to convert one to the other; instead, simulation must be used.148 Eldridge, Ukoumunne and Carlin have compared the two estimates through simulation and provide a useful table that shows the relationship between the ICC on the proportions scale and that on the logistic scale for different levels of overall prevalence. The difference between the two values is greatest when the ICC on the proportions scale is large or the prevalence is further from 50%.
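The population quantities behind this comparison can be sketched for a random-intercept logistic model, $\operatorname{logit}(p_i) = \beta_0 + \mu_i$ with $\mu_i \sim N(0, \sigma_b^2)$. The code below computes the prevalence, the ICC on the proportions scale ($\operatorname{var}(p_i)$ divided by $\pi(1-\pi)$) and the ICC on the logistic scale. It swaps the simulation described above for Gauss-Hermite numerical integration, which gives the population values implied by a chosen $\beta_0$ and $\sigma_b^2$ rather than estimates from simulated trials; the function name and parameter values are hypothetical.

```python
import numpy as np
from numpy.polynomial.hermite import hermgauss


def icc_scales(beta0, sigma_b2, n_nodes=60):
    """Prevalence and ICCs for logit(p_i) = beta0 + mu_i, mu_i ~ N(0, sigma_b2).

    Proportions scale: var(p_i) / (prev * (1 - prev)), with prev = E[p_i].
    Logistic scale:    sigma_b2 / (sigma_b2 + pi^2 / 3).
    Expectations over mu_i use Gauss-Hermite quadrature instead of simulation.
    """
    t, w = hermgauss(n_nodes)
    w = w / np.sqrt(np.pi)  # normalised weights for the N(0, sigma_b2) expectation
    p = 1.0 / (1.0 + np.exp(-(beta0 + np.sqrt(2.0 * sigma_b2) * t)))
    prevalence = np.sum(w * p)
    between = np.sum(w * (p - prevalence) ** 2)
    icc_prop = between / (prevalence * (1.0 - prevalence))
    icc_logit = sigma_b2 / (sigma_b2 + np.pi ** 2 / 3.0)
    return prevalence, icc_prop, icc_logit


# Hypothetical settings: same random-effect variance, prevalence near 50% and near 12%.
for beta0 in (0.0, -2.0):
    print(icc_scales(beta0, sigma_b2=0.5))
```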
4.3.1
ICC of the latent response
The underlying latent variable $Y^*_{ij}$ is assumed to be continuous and follows the random effects model
$$Y^*_{ij} = x_{ij}\beta + \mu_i + \epsilon_{ij}$$
with error terms $\epsilon_{ij}$ distributed as logistic with mean zero and variance $\pi^2/3$, independent of $\mu_i$, the random effects for clusters, which are distributed $N(0, \sigma_b^2)$.
The intracluster correlation coefficient on this underlying (logistic) scale is defined as
$$\rho^{(l)} = \frac{\sigma_b^2}{\sigma_b^2 + \pi^2/3} \qquad (4.7)$$
This ICC is relatively straightforward to calculate and is often provided automatically by statistical software, such as Stata, after model fitting. However, because this ICC relates to the underlying variable, it is not clear whether its use in the design effect for ordinal outcomes is appropriate. For the probit link, $\pi^2/3$ is replaced by 1 in the denominator.
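Equation (4.7) and its inverse are simple enough to compute directly. The inverse is handy when a target ICC has to be converted into a random-effect variance, for example when generating data from the latent model. The function names are hypothetical.

```python
import math


def _latent_error_variance(link):
    """Variance of the latent error term: pi^2/3 (logit) or 1 (probit)."""
    if link == "logit":
        return math.pi ** 2 / 3.0
    if link == "probit":
        return 1.0
    raise ValueError("link must be 'logit' or 'probit'")


def latent_icc(sigma_b2, link="logit"):
    """Equation (4.7): rho_l = sigma_b^2 / (sigma_b^2 + latent error variance)."""
    return sigma_b2 / (sigma_b2 + _latent_error_variance(link))


def sigma_b2_for_icc(rho, link="logit"):
    """Invert (4.7) to find the random-intercept variance that gives a target rho_l."""
    return rho * _latent_error_variance(link) / (1.0 - rho)


# Hypothetical values: ICC for sigma_b^2 = 0.2, and the variance giving an ICC of 0.05.
print(latent_icc(0.2), sigma_b2_for_icc(0.05))
```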
4.3.2
Analysis of variance ICC
It is possible to assign numerical values to the ordinal categories and calculate an ICC using a one-way analysis of variance.146
$$\rho = \frac{MSB - MSW}{MSB + (n-1)MSW}$$
where MSB and MSW are the mean squares between and within clusters estimated from the analysis of variance and $n$ is the cluster size.
The estimated ICC will depend on how the numerical scores are assigned. The simplest approach is to assign equally spaced values such as 1, 2, 3 and 4. Alternatively, if the gap between categories 3 and 4 was felt to be twice that between other adjacent categories, the scores could be assigned as 1, 2, 3 and 5. If the categorical variable was formed by grouping scores from a health questionnaire, for example, the scores could be defined using the mid-points of the categories. In this thesis I assume equally spaced scores. Departures from equally spaced scores depend heavily on the nature of the outcome, and the ordinal outcome trials reviewed in Chapter Three provided no evidence against equally spaced scores. The use of equally spaced scores should therefore apply more widely than alternative scoring methods.
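The sketch below computes the analysis of variance ICC for equal cluster sizes, optionally mapping the categories to alternative scores so that the dependence on scoring can be seen. The example data and function name are hypothetical.

```python
import numpy as np


def anova_icc(clusters, scores=None):
    """One-way ANOVA ICC, (MSB - MSW) / (MSB + (n - 1) MSW), for equal cluster size n.

    `clusters` is a C x n array of ordinal categories 1..k; `scores` optionally maps
    category q to the numeric score scores[q - 1] (default: equally spaced scores q).
    """
    y = np.asarray(clusters, dtype=float)
    if scores is not None:
        y = np.asarray(scores, dtype=float)[np.asarray(clusters) - 1]
    C, n = y.shape
    cluster_means = y.mean(axis=1)
    grand_mean = y.mean()
    msb = n * np.sum((cluster_means - grand_mean) ** 2) / (C - 1)
    msw = np.sum((y - cluster_means[:, None]) ** 2) / (C * (n - 1))
    return (msb - msw) / (msb + (n - 1) * msw)


# Hypothetical data: 3 clusters of size 4 on a 4-category outcome.
data = [[1, 2, 2, 3], [3, 4, 4, 4], [1, 1, 2, 2]]
print(anova_icc(data), anova_icc(data, scores=[1, 2, 3, 5]))
```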
4.3.3
Kappa-type ICC
For discrete outcomes the ICC can be interpreted as a version of the kappa statistic. For binary outcomes, kappa is a measure of agreement corrected for chance. It is most often used to measure inter-rater agreement, that is, the agreement among a set of raters recording measurements of a binary trait on the same individual. It is calculated as the observed proportion of agreement minus that expected by chance, divided by the maximum possible agreement beyond chance:
$$\hat{\rho}_k = \frac{\hat{\pi}_O - \hat{\pi}_E}{1 - \hat{\pi}_E} \qquad (4.8)$$
where $\hat{\pi}_O$ and $\hat{\pi}_E$ are the proportions of observed and expected agreement, respectively. A value of 1 for the statistic represents perfect agreement.
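A minimal sketch of equation (4.8) for a $2 \times 2$ table of paired binary ratings. By default the chance agreement uses each rating's own margins (Cohen's form); the `pooled` option uses a common prevalence for both ratings (the Scott/Fleiss-type form), which is the version usually identified with an intraclass correlation. This illustrates (4.8) only, not Gao's estimator, and the table is hypothetical.

```python
def kappa_binary(table, pooled=False):
    """Chance-corrected agreement (4.8) from a 2x2 table of paired binary ratings.

    table[a][b] counts pairs given category a by the first rating and b by the second.
    pooled=False uses each rating's own margins (Cohen's form); pooled=True uses the
    pooled prevalence for both ratings (Scott/Fleiss-type form).
    """
    n = sum(sum(row) for row in table)
    p_obs = (table[0][0] + table[1][1]) / n
    row0 = (table[0][0] + table[0][1]) / n
    col0 = (table[0][0] + table[1][0]) / n
    if pooled:
        p = (row0 + col0) / 2.0
        p_exp = p * p + (1.0 - p) * (1.0 - p)
    else:
        p_exp = row0 * col0 + (1.0 - row0) * (1.0 - col0)
    return (p_obs - p_exp) / (1.0 - p_exp)


# Hypothetical table: 35 of 50 pairs agree.
print(kappa_binary([[20, 5], [10, 15]]), kappa_binary([[20, 5], [10, 15]], pooled=True))
```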
Gao has proposed a kappa-type ICC estimator for ordinal data in the cluster randomised trial context.125 It is constructed similarly, as the proportion of observed agreement among pairs of observations within clusters minus that expected by chance, divided by the maximum agreement beyond chance. However, weights are used to define the distance between ordinal categories. For equally spaced scores, the weight $w_{qq'}$ (termed the square error rate) corresponding to agreement between two categories $q$ and $q'$ is:
$$w_{qq'} = 1 - \frac{(q - q')^2}{(k-1)^2}$$
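The weights can be tabulated as a $k \times k$ matrix with ones on the diagonal and zero for the most distant pair of categories. The sketch below covers only the weights, not the rest of Gao's estimator; the function name is hypothetical.

```python
import numpy as np


def squared_error_weights(k):
    """w_qq' = 1 - (q - q')^2 / (k - 1)^2 for equally spaced categories q, q' = 1..k."""
    q = np.arange(1, k + 1)
    return 1.0 - (q[:, None] - q[None, :]) ** 2 / (k - 1) ** 2


print(squared_error_weights(4))
```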