In a conventional RCT, individuals are randomly allocated between alternatives. However, complex interventions and large-scale transformations often target levels other than the individual participant. For example, whole communities may be targeted by public health programmes; other interventions may be aimed at groups of health professionals, or organisations such as general practices.
Contamination of the control group, which leads to biased estimates of effect size, is a drawback of conventional RCTs of population-level interventions.7 It is likely to occur when different public health promotion interventions are tested within the same community, or when the same clinician delivers different types of health care to patients.8
Cluster randomised controlled trials
Cluster randomised controlled trials (cRCTs) have emerged partly as a means of addressing such problems. The units randomised are pre-existing, natural or self-selected clusters. Cluster members all have an identifiable feature in common (such as being patients in a single general practice), and outcomes are measured in all members or in a representative sample of them. Clusters vary widely depending on the setting, and range in size from families to entire communities.
Methodological discussion of the cRCT appears to have begun in the field of education in the 1940s,8 but the design saw only sparse use before the 1980s. Nevertheless, the number of cRCTs published in the medical literature has increased steadily over the last half-century, from one per year in the 1960s to over 120 in 2008.8
The main implication of the cRCT design is that the outcomes of individuals within the same cluster tend to be correlated, which must be accounted for in the analysis. This correlation is measured by the intracluster correlation coefficient (ICC), defined as the proportion of the total variation in the outcome that is attributable to variation between clusters. Sample size calculations must also account for clustering, taking into account both the size of the clusters and the ICC, so reliable estimates of ICCs are essential for robust calculations. Understanding of the factors that affect clustering, and therefore the likely magnitude of an ICC, has improved. For example, Campbell et al.9 demonstrated that ICCs are sensitive to a number of trial-related factors, particularly the setting and the type of outcome. Analysing a range of data sets, mainly from the UK, they found that ICCs were significantly higher for process variables than for outcome variables, and for outcomes in specialist settings than in primary care. The effects of disease prevalence and trial size were less clear-cut.9 Other studies, meanwhile, suggest that the ICC is associated with prevalence for binary outcome measures.10
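To make the ICC definition concrete, the sketch below estimates it from equal-sized clusters with the one-way analysis of variance (ANOVA) estimator, ICC = (MSB - MSW) / (MSB + (m - 1) MSW), where MSB and MSW are the between-cluster and within-cluster mean squares and m is the common cluster size. It also computes the usual design effect for equal cluster sizes, 1 + (m - 1) × ICC, which is the factor by which an individually randomised sample size is inflated. This is a minimal illustration assuming equal cluster sizes and a continuous outcome; the function names are hypothetical, and in practice ICCs are usually estimated with mixed-effects models in a statistical package.

```python
from statistics import mean


def anova_icc(clusters):
    """One-way ANOVA estimate of the ICC for equal-sized clusters.

    `clusters` is a list of lists: one inner list of outcome values per cluster.
    """
    k = len(clusters)      # number of clusters
    m = len(clusters[0])   # members per cluster (this sketch assumes equal sizes)
    if k < 2 or m < 2 or any(len(c) != m for c in clusters):
        raise ValueError("need at least two equal-sized clusters of two or more members")
    grand_mean = mean(x for c in clusters for x in c)
    cluster_means = [mean(c) for c in clusters]
    # Between-cluster mean square: spread of cluster means around the grand mean
    msb = m * sum((cm - grand_mean) ** 2 for cm in cluster_means) / (k - 1)
    # Within-cluster mean square: spread of members around their own cluster mean
    msw = sum((x - cm) ** 2
              for c, cm in zip(clusters, cluster_means)
              for x in c) / (k * (m - 1))
    return (msb - msw) / (msb + (m - 1) * msw)


def design_effect(cluster_size, icc):
    """Variance inflation due to clustering when all clusters have the same size."""
    return 1 + (cluster_size - 1) * icc


if __name__ == "__main__":
    print(anova_icc([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))  # 26/29, about 0.897
    print(design_effect(20, 0.05))                        # about 1.95
```

For example, with 20 patients per general practice and an ICC of 0.05, the design effect is 1 + 19 × 0.05 = 1.95, so nearly twice as many patients are needed as in an individually randomised trial of the same power. The ANOVA estimator can return a negative value when between-cluster variation is smaller than expected by chance; such estimates are often truncated to zero.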
There is also growing understanding of how to calculate sample sizes for cRCTs, including scenarios in which the size of individual clusters varies.11 This has been incorporated into standard statistical packages, and relevant online tools are also available.12
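One commonly used way of allowing for unequal cluster sizes is to inflate the individually randomised sample size by a design effect of 1 + ((cv² + 1) m̄ - 1) × ICC, where m̄ is the mean cluster size and cv is the coefficient of variation of cluster sizes; with cv = 0 this reduces to the equal-size formula. The sketch below applies that approximation to a two-arm comparison of means. It is an illustrative assumption, not necessarily the method of the work cited above or of any particular package, and the function names and inputs are hypothetical. Dedicated software should be used for a real trial, not least because trials with few clusters usually need further adjustment.

```python
import math
from statistics import NormalDist


def n_per_arm_individual(delta, sd, alpha=0.05, power=0.80):
    """Participants per arm for an individually randomised comparison of two means
    (normal approximation, two-sided test)."""
    z = NormalDist().inv_cdf
    return 2 * (z(1 - alpha / 2) + z(power)) ** 2 * sd ** 2 / delta ** 2


def design_effect_variable(mean_size, cv, icc):
    """Design effect allowing for variable cluster sizes; cv is the coefficient of
    variation of cluster size (cv = 0 gives 1 + (m - 1) * icc)."""
    return 1 + ((cv ** 2 + 1) * mean_size - 1) * icc


def clusters_per_arm(delta, sd, mean_size, icc, cv=0.0, alpha=0.05, power=0.80):
    """Inflate the individual sample size by the design effect, then convert to clusters."""
    n = n_per_arm_individual(delta, sd, alpha, power) * design_effect_variable(mean_size, cv, icc)
    return math.ceil(n / mean_size)


if __name__ == "__main__":
    # Hypothetical inputs: detect half a standard deviation, 20 patients per cluster, ICC 0.05
    print(clusters_per_arm(0.5, 1.0, 20, 0.05))          # equal cluster sizes
    print(clusters_per_arm(0.5, 1.0, 20, 0.05, cv=0.6))  # variable cluster sizes
```

With these hypothetical inputs, about 63 participants per arm would be needed under individual randomisation; allowing for clustering gives 7 clusters per arm when sizes are equal and 8 when the coefficient of variation of cluster size is 0.6, showing how variability in cluster size erodes power.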
A third development has been the publication of reporting guidelines for cRCTs. These guidelines highlight the importance of indicating the level at which the interventions were targeted, hypotheses were generated, randomisation was done and outcomes were measured.13 In 2012, specific guidance was published, based on the 2010 version of the Consolidated Standards of Reporting Trials (CONSORT) statement and the 2008 CONSORT statement for the reporting of abstracts.14 The guidance includes a checklist of items that authors should address when reporting cRCTs, in conjunction with the main CONSORT guidelines.15
An appropriate ethical framework has also been developed: the Ottawa Statement on the Ethical Design and Conduct of Cluster Randomised Trials. It sets out 15 recommendations, providing guidance on the justification for a cRCT; the need for ethical review; the identification of research participants; obtaining informed consent; the assessment of benefits and harms; and the protection of vulnerable participants.16
Stepped-wedge trials
The stepped-wedge trial has emerged as an alternative to the standard parallel-group cRCT and is increasing in popularity for evaluating complex interventions and large-scale transformations.17 The design includes an initial period during which no clusters are exposed to the intervention, so that all act as controls. Subsequently, at regular intervals (the ‘steps’), one cluster (or a group of clusters) is randomised to cross over from control to intervention. This process continues until all clusters have crossed over and are exposed to the intervention (Figure 2.1). The appeal of the design is that, by the end of the study, all clusters are exposed, but the order in which they receive the intervention is determined at random. Data collection continues throughout the study, so that each cluster contributes observations during both control and intervention periods.19
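The crossover pattern in Figure 2.1 can be written as an exposure matrix. The sketch below randomises the order in which clusters cross over and returns, for each cluster, one indicator per period (0 = control, 1 = intervention), starting with an all-control baseline period. It assumes equal-length periods, a fixed number of clusters crossing over at each step, and that exposure continues once started; the function name and cluster labels are hypothetical.

```python
import math
import random


def stepped_wedge_schedule(clusters, clusters_per_step=1, seed=None):
    """Map each cluster to a list of exposure indicators, one per period.

    Period 0 is an all-control baseline. At each later period, the next group of
    randomly ordered clusters crosses over and remains exposed thereafter.
    """
    rng = random.Random(seed)      # seeded generator so the allocation is reproducible
    order = list(clusters)
    rng.shuffle(order)             # random crossover order
    n_steps = math.ceil(len(order) / clusters_per_step)
    n_periods = n_steps + 1        # baseline plus one period per step
    schedule = {}
    for position, cluster in enumerate(order):
        first_exposed = position // clusters_per_step + 1
        schedule[cluster] = [1 if t >= first_exposed else 0 for t in range(n_periods)]
    return schedule


if __name__ == "__main__":
    # Five clusters and six periods, as in Figure 2.1
    plan = stepped_wedge_schedule(["A", "B", "C", "D", "E"], seed=2024)
    for cluster in sorted(plan, key=lambda c: -sum(plan[c])):
        print(cluster, " ".join("X" if x else "." for x in plan[cluster]))
```

Printing the rows in order of first exposure reproduces the staircase shape of the figure; only the mapping of cluster labels to rows changes with the random seed.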
The rationale for using this approach, and specific methodological considerations, have been described in more detail elsewhere.19,20 In brief, the principal advantage is that it may better reconcile the differing needs and priorities of policy-makers and service managers on the one hand, and evaluators on the other. It has intuitive appeal, as each cluster eventually receives the intervention and acts as its own control. As a consequence, it is particularly suited to interventions for which there is a lack of evidence of effectiveness but a strong belief that they will do more good than harm,21 such as efforts to improve hand hygiene. It also supports the phased implementation of novel interventions, allowing time for staff training if this is required. In addition, the researcher may study the effect of time on an intervention, such as whether or not adoption or learning is maintained.
Nevertheless, because data collection usually takes place at each time point in the study, the data burden is potentially significant. It may also take some time before every cluster receives the intervention. In addition, the analysis is more complex than that required for a conventional cRCT: sample size calculations and analysis must make allowance for both the clustered design and the confounding effect of time.20 In a stepped-wedge design, more clusters are exposed to the intervention towards the end of the study than in its early stages. Consequently, the effect of the intervention might be confounded by any underlying temporal trend, for example a general improvement in patient outcomes external to the study.19 Of course, judging whether or not an intervention is likely to do more good than harm is complex, especially when considering wider issues such as opportunity costs. A stepped wedge may mean that all clusters are exposed to an ineffective intervention if initial estimates of effect prove unfounded.
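To illustrate why time must be allowed for, the sketch below builds noise-free toy data for the five-cluster, six-period layout of Figure 2.1, with an assumed secular improvement of one unit per period and an assumed true intervention effect of two units. A naive comparison of all intervention observations with all control observations is inflated by the trend, because intervention periods fall mostly late in the study; comparing exposed and unexposed clusters within the same period removes the shared trend. This within-period comparison is a deliberately simple illustration, not the recommended analysis: published approaches typically use regression or mixed-effects models with period effects and allowance for clustering, and real data would also contain cluster-level differences and random variation. All names and numbers are hypothetical.

```python
from statistics import mean


def simulate_stepped_wedge(n_clusters=5, baseline=10.0, trend=1.0, effect=2.0):
    """Noise-free toy data: outcome = baseline + trend * period + effect * exposed.

    Cluster i (0-indexed) crosses over in period i + 1; period 0 is all control.
    Returns (cluster, period, exposed, outcome) tuples.
    """
    rows = []
    for period in range(n_clusters + 1):
        for cluster in range(n_clusters):
            exposed = 1 if period > cluster else 0
            outcome = baseline + trend * period + effect * exposed
            rows.append((cluster, period, exposed, outcome))
    return rows


def naive_estimate(rows):
    """Pooled intervention mean minus pooled control mean, ignoring period."""
    treated = [y for _, _, x, y in rows if x == 1]
    control = [y for _, _, x, y in rows if x == 0]
    return mean(treated) - mean(control)


def within_period_estimate(rows):
    """Average exposed-minus-control difference across periods containing both,
    so that a trend shared by all clusters cancels out."""
    differences = []
    for period in sorted({p for _, p, _, _ in rows}):
        treated = [y for _, p, x, y in rows if p == period and x == 1]
        control = [y for _, p, x, y in rows if p == period and x == 0]
        if treated and control:
            differences.append(mean(treated) - mean(control))
    return mean(differences)


if __name__ == "__main__":
    data = simulate_stepped_wedge()
    print(round(naive_estimate(data), 2))          # 4.33
    print(round(within_period_estimate(data), 2))  # 2.0
```

With these inputs the naive estimate is 13/3 (about 4.33), more than double the true effect, whereas the within-period estimate recovers 2.0. Setting the trend to zero makes the two estimates agree, which confirms that the bias comes entirely from the secular trend.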
FIGURE 2.1 Stepped-wedge trial schema. Shaded cells represent intervention periods, blank cells represent control periods and each cell represents a data collection point. Reproduced from Brown and Lilford under the terms of the Creative Commons Attribution Licence (CC-BY 2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.18
Figure 2.1 axes: participants/clusters 1 to 5 (rows) by time periods 1 to 6 (columns).
RANDOMISED CONTROLLED TRIALS OF COMPLEX INTERVENTIONS AND LARGE-SCALE TRANSFORMATION OF SERVICES
24
NIHR Journals Library www.journalslibrary.nihr.ac.uk
Selecting a randomised approach
Stepped-wedge trials and cRCTs both offer a means of evaluating complex interventions and large-scale transformations. In which situations should one be preferred over the other? Consider a hypothetical study of a group-based weight loss programme with internet support. Early pilot investigations suggest that the intervention is effective, with a primary outcome of weight loss after 6 months. There is interest locally in implementing the programme, and eight obesity clinics have provisionally agreed to participate.
In this example, we assume a minimum duration of follow-up of 6 months, and monthly introduction of the intervention into new clusters. In the event that the intervention is effective, the final results would be available in 14 months using a stepped-wedge design. By that stage, all the study clusters would have at least 6 months’ experience of using the intervention, and have been reaping the benefits. In contrast, if a parallel-arm cRCT were used, this result would have been known within 6 months. Four of the clusters would have been delivering the programme since the start of the trial. However, it would now be necessary to ‘fast track’ the intervention in the four control clusters.
On the other hand, consider if the intervention were not effective. With a stepped-wedge design, the final results would again be available in 14 months. However, by that point, all of the clusters would have been exposed to an intervention that does not work, and all would have been delivering it to patients for at least 6 months. Consequently, ethical scrutiny of this type of trial is important. In addition, because data collection usually takes place sequentially, the researchers would have collected a full set of trial data on 14 separate occasions (although a smaller number of data collection periods could be considered). If a cRCT were used, the results would be known within 6 months. Only half of the clusters would have been exposed to an intervention that did not work, and trial data would have been collected only once. Key questions to be addressed before commencing such a trial include:
• What is the strength of the evidence that the intervention is likely to do more good than harm? Considerations of costs and opportunity costs may be relevant.
• As well as effectiveness, what other issues are being assessed by the trial, such as learning and possible decay of the intervention effect?
• How quickly could an effective intervention be rolled out to control clusters if a cRCT were conducted?
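The timelines in the weight loss example follow from simple arithmetic. The sketch below reproduces them under the assumptions stated in the text (eight clusters, one cluster crossing over each month, 6 months' follow-up) plus two further assumptions: that the first cluster crosses over at the end of the first month, and that a full round of trial data is collected at every step. The function names and returned fields are hypothetical.

```python
import math


def stepped_wedge_timeline(n_clusters, follow_up, step_interval=1, clusters_per_step=1):
    """Key quantities for a stepped-wedge trial (time in months)."""
    n_steps = math.ceil(n_clusters / clusters_per_step)
    last_crossover = n_steps * step_interval        # when the final group starts
    months_to_result = last_crossover + follow_up   # final group still needs full follow-up
    return {
        "months_to_result": months_to_result,
        "clusters_exposed": n_clusters,             # every cluster eventually crosses over
        "data_collection_rounds": months_to_result // step_interval,
        "minimum_months_delivering": follow_up,     # the last cluster to cross over
    }


def parallel_crct_timeline(n_clusters, follow_up):
    """Same quantities for a two-arm parallel cRCT with 1:1 allocation of clusters."""
    return {
        "months_to_result": follow_up,
        "clusters_exposed": n_clusters // 2,
        "data_collection_rounds": 1,                # outcome assessed once, at follow-up
        "minimum_months_delivering": follow_up,     # intervention-arm clusters only
    }


if __name__ == "__main__":
    print(stepped_wedge_timeline(n_clusters=8, follow_up=6))
    print(parallel_crct_timeline(n_clusters=8, follow_up=6))
```

For eight clinics and 6 months' follow-up this gives 14 months, 8 exposed clusters and 14 data collection rounds for the stepped wedge, against 6 months, 4 exposed clusters and a single round for the cRCT, matching the figures in the example. Increasing `clusters_per_step` shows how the stepped-wedge timeline shortens if several clinics cross over at each step (two per step gives 10 months).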
The optimal conditions for a stepped wedge are likely to arise when there is a lack of evidence of effectiveness but a strong belief that the intervention will do more good than harm.20 It may also be relevant when measuring the effect of the intervention over time is important, including evaluation of staff learning and the effect of the service ‘bedding down’. In contrast, a cRCT may be more appropriate when the evidence of effectiveness is not clear-cut and rapid implementation in control clusters is possible. cRCTs have the added advantages that half of the study sample is ‘protected’ should the intervention not work, and that the data collection burden is minimised.
In many situations, the concerns of those commissioning the evaluation may outweigh methodological issues. The constraints under which policy-makers and service managers operate may mean that a stepped wedge is the only way to gain agreement to randomisation, because all clusters will receive the intervention during the research. As a consequence, stepped-wedge designs may be particularly useful when policy dictates that an initiative will be rolled out widely, and there is scope to do this in a stepped way to enhance evaluation.
There are examples of cRCTs of complex policy interventions that have successfully negotiated issues around randomisation and implementation, such as a cRCT of a scheme to provide free healthy breakfasts in Welsh primary schools.22 Of the 608 schools invited to participate, 111 agreed to be randomised and to take part in data collection activities. Stepped-wedge designs may appear more attractive to stakeholders and so may encourage higher levels of participation.
Pragmatic and explanatory attitudes to trial design
The PRagmatic-Explanatory Continuum Indicator Summary 2 (PRECIS-2) tool is designed to support trial design decisions that are consistent with the intended purpose of a planned RCT. The tool is especially useful when the intention is to design an RCT whose results will be widely applicable within usual care, at whatever level of the health service this may be, from primary to tertiary care settings. PRECIS-2 consists of nine domains: eligibility criteria, recruitment, setting, organisation, flexibility (delivery), flexibility (adherence), follow-up, primary outcome and primary analysis (Figure 2.2). The research team discusses each domain until it reaches consensus that this element of the trial is designed in a way that promotes the goal of the trial, whether that goal is real-world applicability or a purely explanatory understanding of efficacy. The process makes research teams more aware of the range of opinions and can help to ensure that design decisions are matched to the needs of those who will ultimately use the results of the trial.23
FIGURE 2.2 The PRECIS-2 wheel. Reproduced from The PRECIS-2 tool: designing trials that are fit for purpose, Loudon K, Treweek S, Sullivan F, Donnan P, Thorpe KE, Zwarenstein M, vol. 350, p. h2147, 2015,23 with permission from BMJ Publishing Group Ltd.
Figure 2.2 domains and guiding questions (each domain is rated on a scale from 1 to 5): Eligibility: Who is selected to participate in the trial? Recruitment: How are participants recruited into the trial? Setting: Where is the trial being done? Organisation: What expertise and resources are needed to deliver the intervention? Flexibility (delivery): How should the intervention be delivered? Flexibility (adherence): What measures are in place to make sure participants adhere to the intervention? Follow-up: How closely are participants followed up? Primary outcome: How relevant is it to participants? Primary analysis: To what extent are all data included?
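A research team working through PRECIS-2 might record its consensus scores in a simple structure. The sketch below assumes the published 1 to 5 scale (1 = very explanatory, 5 = very pragmatic), checks that all nine domains have been scored, and lists the domains whose score lies further from the team's intended position than a chosen tolerance, as prompts for further discussion. The `target` and `tolerance` parameters, the function name and the example scores are hypothetical; the tolerance check is an illustrative addition, not part of the tool.

```python
PRECIS2_DOMAINS = (
    "eligibility criteria",
    "recruitment",
    "setting",
    "organisation",
    "flexibility (delivery)",
    "flexibility (adherence)",
    "follow-up",
    "primary outcome",
    "primary analysis",
)


def domains_to_discuss(scores, target=5, tolerance=1):
    """Validate PRECIS-2 scores and return domains far from the intended position.

    scores: dict mapping each domain name to an integer from 1 (very explanatory)
    to 5 (very pragmatic). target: the team's intended position on that scale.
    """
    missing = set(PRECIS2_DOMAINS) - set(scores)
    if missing:
        raise ValueError(f"unscored domains: {sorted(missing)}")
    for domain, score in scores.items():
        if domain not in PRECIS2_DOMAINS or score not in range(1, 6):
            raise ValueError(f"invalid entry: {domain!r} = {score!r}")
    # Keep the published domain order so the output reads like the wheel
    return [d for d in PRECIS2_DOMAINS if abs(scores[d] - target) > tolerance]


if __name__ == "__main__":
    # Hypothetical scores for a trial intended to be highly pragmatic
    scores = {domain: 5 for domain in PRECIS2_DOMAINS}
    scores["follow-up"] = 2          # e.g. more intensive follow-up than usual care
    scores["primary analysis"] = 4
    print(domains_to_discuss(scores))  # ['follow-up']
```

Keeping the scores as plain data makes it easy to revisit the wheel as the design evolves; the output is only a prompt for discussion, since the tool itself relies on consensus rather than a mechanical rule.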