We conducted our analysis in two streams: (a) a difference-in-difference analysis of the effect of virtual wards on health and social care use, which is described in sections 2.4-2.9 inclusive; and (b) an economic analysis on the cost and any savings of the intervention from the perspectives of the NHS and the local authority, which is described in sections 2.10-2.12 inclusive.
Our difference-in-difference analysis aimed to test whether the virtual wards had an impact on the utilisation of health and social care, such as emergency admissions to hospital and admissions to care homes. We compared the health and social care utilisation of virtual ward patients with that of a control group that had been chosen retrospectively to match the characteristics of the virtual ward patients as closely as possible in the period leading up to the start of the intervention.
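The arithmetic behind a difference-in-difference estimate can be sketched as follows. This is a minimal illustration on group means only, not the model used in the study; the function name and the admission figures are hypothetical.

```python
def diff_in_diff(treated_pre, treated_post, control_pre, control_post):
    """Change in the intervention group minus change in the control group."""
    treated_change = treated_post - treated_pre
    control_change = control_post - control_pre
    return treated_change - control_change


# Hypothetical mean emergency admissions per patient per year.
estimate = diff_in_diff(
    treated_pre=2.0, treated_post=1.5,
    control_pre=2.0, control_post=1.75,
)
print(estimate)  # -0.25
```

In this made-up example, a simple before-and-after comparison would credit the intervention with a fall of 0.5 admissions, whereas half of that fall also occurs in the controls and so is not attributed to the intervention.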
The use of a control group is essential for estimating what might have happened in the absence of the intervention (the “counterfactual”). It is particularly important in the context of hospital avoidance interventions because typically, many of the patients offered such interventions have previously experienced high levels of hospital use. Such patients have a natural tendency to show reductions in hospital use over time, even in the absence of a specific intervention. This is due to a statistical phenomenon called “regression to the mean”.12 Although the virtual ward design involved selecting patients on the basis of a predictive model that seeks to take account of this phenomenon, reductions in service use over time are nevertheless possible and need to be accounted for.
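Regression to the mean can be illustrated with a small simulation. The sketch below is purely illustrative: the admission model, the rates and the selection threshold are all assumptions made for the example, and no intervention is simulated at all.

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable

N_PATIENTS = 10_000
THRESHOLD = 3  # hypothetical cut-off for "high" hospital use


def admissions(underlying_rate):
    """Admissions in one period: 12 monthly chances, each with probability rate/12."""
    return sum(random.random() < underlying_rate / 12 for _ in range(12))


# Each patient has a stable underlying rate; observed counts vary around it.
rates = [random.uniform(0.2, 3.0) for _ in range(N_PATIENTS)]
period_1 = [admissions(r) for r in rates]
period_2 = [admissions(r) for r in rates]

# Select patients on high observed use in period 1 only.
selected = [i for i in range(N_PATIENTS) if period_1[i] >= THRESHOLD]
baseline_mean = sum(period_1[i] for i in selected) / len(selected)
follow_up_mean = sum(period_2[i] for i in selected) / len(selected)

print(f"selected patients: {len(selected)}")
print(f"mean admissions, period 1: {baseline_mean:.2f}")
print(f"mean admissions, period 2: {follow_up_mean:.2f}")
```

Because patients are selected partly on chance high counts in the first period, their average falls in the second period even though nothing about their underlying rates has changed.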
The gold-standard approach to selecting a control group is often considered to be the randomised controlled trial.47 This is because randomisation has the potential to balance both observed and unobserved characteristics between different groups asymptotically. In the current study, however, we chose to evaluate the effect of the intervention on patients who had already received the intervention, so randomisation was not possible. Instead, we used large administrative data sources to select control groups of patients that appeared similar to the virtual ward patients in the period prior to the start of the intervention, but who did not receive the intervention themselves.48 While this approach ensured that the groups were similar in terms of what we could observe, it is possible that the groups differed systematically in ways that we could not observe, thereby threatening the validity of our findings.
We drew the control groups from two sources, which we refer to as national matching and local matching.
National matching: we drew patients from comparable areas of England, the ONS Corresponding Health Areas,49 having first excluded any areas that had a virtual ward, or equivalent, operational during the study period. We identified patients for inclusion as so-called “national controls” by matching on a range of variables derived from hospital data (HES) and mortality data, as well as an area-level deprivation score called the index of multiple deprivation.50
Local matching: we drew patients from the same PCT area who were not admitted to a virtual ward, our so-called “local controls”. We matched these patients using a combination of variables derived from hospital (SUS) data, GP clinical data, community health services data, social care data, index of multiple deprivation scores, and mortality data.
We used three methods (propensity matching, prognostic matching and genetic matching) to ensure that the control groups were as similar as possible to the intervention group across a distribution of characteristics (see Figure 3).
Figure 3. Methods for selecting controls
A variety of analytical methods exist to select matched control groups. However, the principle is always to select, from a larger population, a subgroup of patients who are similar to the patients receiving the intervention with respect to variables recorded for all individuals. We investigated three methods (propensity matching, prognostic matching and genetic matching) and chose the one that produced the most closely matched control group.
The propensity score is an estimate of the probability that a given individual will be recruited to the intervention.51 It summarises a wide range of variables such as age and prior hospital use into a single quantity. Controls that are selected on the basis of having a similar propensity score are thus expected to be similar in terms of the wider set of variables reflected in the score, if the propensity score model is correctly specified.52 Balance can be further improved by simultaneously matching on key variables predictive of future health and social care utilisation along with the propensity score,51 using a multivariate distance measure such as the Mahalanobis distance.53
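As an illustration of the Mahalanobis distance, the sketch below handles the two-variable case (a propensity score plus one key variable) using only the standard library. The function names, the candidate pool and the choice to estimate the covariance from the combined rows are assumptions made for the example, not details taken from the study.

```python
from math import sqrt


def covariance_2d(rows):
    """Sample variances and covariance (var_x, var_y, cov_xy) for 2-variable rows."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in rows) / (n - 1)
    var_y = sum((y - mean_y) ** 2 for _, y in rows) / (n - 1)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in rows) / (n - 1)
    return var_x, var_y, cov_xy


def mahalanobis_2d(a, b, var_x, var_y, cov_xy):
    """Mahalanobis distance between points a and b for a 2x2 covariance matrix."""
    dx, dy = a[0] - b[0], a[1] - b[1]
    det = var_x * var_y - cov_xy ** 2  # must be non-zero (variables not collinear)
    # Quadratic form with the inverse of [[var_x, cov_xy], [cov_xy, var_y]].
    squared = (var_y * dx * dx - 2 * cov_xy * dx * dy + var_x * dy * dy) / det
    return sqrt(squared)


# Hypothetical rows: (propensity score, prior emergency admissions).
patient = (0.55, 3)
candidates = [(0.10, 0), (0.20, 1), (0.35, 1), (0.50, 3), (0.60, 2), (0.80, 4)]

cov = covariance_2d(candidates + [patient])
closest = min(candidates, key=lambda c: mahalanobis_2d(patient, c, *cov))
print(closest)
```

Unlike a plain Euclidean distance, this measure rescales each variable by its spread and allows for correlation between them, so a variable measured on a large scale does not dominate the match.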
An alternative strategy for finding controls is to match on the estimated probability of experiencing the outcome (for example, an emergency hospital admission), where this is calculated assuming that the intervention is not in place. This score is called the prognostic score, and the approach is called prognostic matching.54 Prognostic matching can be combined with matching on other variables using the Mahalanobis distance.
The final method, genetic matching, is an iterative technique that aims to optimise balance between groups using a genetic search algorithm. It is a generalisation of matching using the propensity and prognostic scores as these scores can be included in the assessment of balance used in the search algorithm.55
When we implemented these approaches, we used matching without replacement so that the control group consisted of distinct individuals. We also chose to calculate propensity and prognostic scores on a monthly basis in order to reflect recent activity. This gave us a choice, for a given virtual ward patient, of whether to use the risk score calculated at the month-end immediately prior to being admitted to the virtual ward, or the score calculated at the month-end immediately following admission. Using the risk score from the month before did not capture very recent events that occurred in the few days before being admitted to the virtual ward.
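Matching without replacement can be sketched with a simple greedy nearest-neighbour rule on a single score. The greedy rule is an assumption chosen for brevity and is not necessarily the algorithm used in the study; the identifiers and scores are hypothetical.

```python
def match_without_replacement(treated_scores, control_scores):
    """Greedy 1:1 nearest-neighbour matching on a score, without replacement.

    Both arguments map an identifier to a score. Returns a dict mapping each
    treated identifier to its matched control identifier.
    """
    available = dict(control_scores)  # copy, so the caller's dict is untouched
    matches = {}
    for treated_id, score in treated_scores.items():
        if not available:
            break  # no controls left to match
        control_id = min(available, key=lambda c: abs(available[c] - score))
        matches[treated_id] = control_id
        del available[control_id]  # each control is used at most once
    return matches


treated = {"T1": 0.50, "T2": 0.52}
controls = {"C1": 0.51, "C2": 0.60, "C3": 0.10}
print(match_without_replacement(treated, controls))
# {'T1': 'C1', 'T2': 'C2'}
```

Here C1 is the nearest control to both patients, but once it has been paired with T1 it is no longer available, so T2 takes the next nearest control. With a greedy rule the result can depend on the order in which patients are processed.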
We assessed the similarity of the matched control group to the group of virtual ward patients by using the standardised difference. This is defined as the difference in means as a proportion of the pooled standard deviation.56 Although the standardised difference would ideally be minimised without limit, Normand and colleagues have suggested that a value greater than 10 per cent is indicative of a meaningful difference between the groups.57 Other metrics, such as formal t-tests, are not recommended for observational data.58 We did not conduct statistical tests to assess the similarity of the matched control group to the virtual ward patients. As argued by Imai and colleagues,58 statistical tests do not form a good stopping rule for matching algorithms because they are a product of the sample size. They argue (1) that a statistical test would therefore favour scenarios in which cases were dropped from a matching analysis, when this was not in fact desirable; and (2) that statistical tests are also inappropriate from a theoretical point of view because in this context, similarity is a property of the samples rather than of some hypothetical population.
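The standardised difference for a continuous variable can be computed as in the sketch below. It assumes the commonly used form of the pooled standard deviation, the square root of the average of the two group variances; the function name and the age values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def standardised_difference(treated, controls):
    """Difference in means as a percentage of the pooled standard deviation."""
    pooled_sd = sqrt((variance(treated) + variance(controls)) / 2)
    return 100 * (mean(treated) - mean(controls)) / pooled_sd


# Hypothetical ages in the intervention and matched control groups.
treated_ages = [70.0, 80.0, 90.0]
control_ages = [68.0, 78.0, 88.0]

print(standardised_difference(treated_ages, control_ages))  # 20.0
```

In this example the two-year gap in mean age equals 20 per cent of the pooled standard deviation of 10 years, which is above the 10 per cent threshold cited above. Unlike a t-test statistic, the value does not shrink or grow simply because the sample size changes.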