All statistical analyses were carried out in SPSS (version 15). Prior to analysis, all data were double entered into a spreadsheet for accuracy and visually checked for plausible values as part of data cleaning (162).
Inter-rater reliability concerns the amount of agreement between raters when they use the assessment tool. Reliability is calculated as the ratio of the variation between subjects to the total variation (the sum of between-subject variation and measurement error). The resultant reliability coefficient expresses the proportion of the variation that is due to ‘true’ differences between subjects. The choice of statistical test to calculate the reliability coefficient in this context has been widely debated in the literature (101). Pearson’s correlation coefficient is based on a regression analysis and is used erroneously in reliability studies using continuous data. It has been criticised for ‘measuring the strength of relation between two variables, not the agreement between them’ (163); as a result, systematic biases between raters can be obscured in the analysis. Where there are multiple observers within a study, Pearson’s correlation coefficient can only be calculated between pairs of observers, which can be useful for identifying outliers. However, as there is no agreed way of combining the coefficients of paired observers (101), and given the previous caveats, the Pearson correlation coefficient was considered unsuitable for use in this study.
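The difference between correlation and agreement can be illustrated with a minimal sketch in Python (this is not the SPSS procedure used in the study). The scores are invented for illustration, and `rater_b` is assumed to score every recording exactly two points higher than `rater_a`.

```python
from statistics import mean

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

# Invented item scores on a 0-6 scale (not study data)
rater_a = [1, 2, 3, 4, 2, 3]
rater_b = [score + 2 for score in rater_a]  # assumed systematic bias of +2

r = pearson_r(rater_a, rater_b)
exact_agreement = sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)
mean_bias = mean(b - a for a, b in zip(rater_a, rater_b))

print(f"r = {r:.2f}, exact agreement = {exact_agreement:.0%}, mean bias = {mean_bias}")
```

In this example the correlation is perfect although the two raters never give the same score, which is the kind of systematic bias that a correlation coefficient obscures.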
The intra-class correlation coefficient (ICC) overcomes these problems and takes varying forms according to the assumptions that are made. In this analysis, an ICC [2,1] was used, where the 2 denotes ‘class 2’, reflecting that all subjects were rated by all raters, and the 1 reflects that it is the reliability of a single rater (164). Raters were treated as a random factor. These tests assume that the data are normally distributed (i.e. parametric). This assumption was checked prior to analysis by visual inspection of distribution graphs and by a Shapiro-Wilk W test, as this statistic is suitable for small samples (162).
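As a hedged illustration of how an ICC [2,1] can be obtained, the sketch below computes it in Python from the mean squares of a two-way analysis of variance (subjects by raters). It is not the SPSS routine used in the study; the function name `icc_2_1` and the ratings are illustrative assumptions, and the ratings are not study data.

```python
from statistics import mean

def icc_2_1(ratings):
    """ICC [2,1]: two-way random effects, single rater.

    `ratings` holds one row per subject and one score per rater;
    every subject must be rated by every rater.
    """
    n = len(ratings)       # number of subjects
    k = len(ratings[0])    # number of raters
    grand = mean(score for row in ratings for score in row)
    row_means = [mean(row) for row in ratings]
    col_means = [mean(row[j] for row in ratings) for j in range(k)]

    ss_subjects = k * sum((m - grand) ** 2 for m in row_means)
    ss_raters = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((score - grand) ** 2 for row in ratings for score in row)
    ss_error = ss_total - ss_subjects - ss_raters

    bms = ss_subjects / (n - 1)            # between-subjects mean square
    jms = ss_raters / (k - 1)              # between-raters mean square
    ems = ss_error / ((n - 1) * (k - 1))   # residual mean square

    return (bms - ems) / (bms + (k - 1) * ems + k * (jms - ems) / n)

# Illustrative ratings: 6 subjects (rows) scored by 4 raters (columns)
illustrative_ratings = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
icc = icc_2_1(illustrative_ratings)
print(f"ICC [2,1] = {icc:.2f}")
```

Because the between-raters mean square appears in the denominator, consistent differences between raters lower this coefficient, unlike a pairwise correlation.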
Internal consistency was assessed using Cronbach’s alpha (98). Cronbach’s alpha compares the variances of the individual items of the assessment tool with the variance of the total score, to gauge how closely the items correlate with one another. Perfect correlation gives α = 1; for clinical applications, scores of α > 0.95, indicating high internal consistency, are desirable (98). The following calculation was used:
\[
\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k} s_i^2}{s_T^2}\right)
\]
Where $k$ is the number of items, $s_i^2$ is the variance of the $i$th item and $s_T^2$ is the variance of the total score formed by summing all the items.
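A minimal Python sketch of the Cronbach's alpha calculation follows, assuming sample variances are used for both the items and the total score. The function name `cronbach_alpha` and the scores are hypothetical and are not taken from the study.

```python
from statistics import variance

def cronbach_alpha(item_scores):
    """Cronbach's alpha = k/(k-1) * (1 - sum of item variances / variance of total).

    `item_scores` holds one list per item, each containing that item's
    score for every rated session (all lists the same length).
    """
    k = len(item_scores)
    item_variances = [variance(scores) for scores in item_scores]
    # Total score per session: sum across items
    totals = [sum(session) for session in zip(*item_scores)]
    return (k / (k - 1)) * (1 - sum(item_variances) / variance(totals))

# Hypothetical scores: 3 items, each rated 0-6 over 5 sessions
hypothetical_items = [
    [2, 3, 4, 4, 5],
    [1, 3, 3, 4, 5],
    [2, 2, 4, 5, 5],
]
alpha = cronbach_alpha(hypothetical_items)
print(f"alpha = {alpha:.3f}")
```

When every item rises and falls together across sessions, the variance of the total is large relative to the sum of the item variances and alpha approaches 1.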
Justification of analysing the data as continuous
Each competency item within the CTS-R-Pain is scored on a 0-6 scale that is broadly comparable for competence levels across items. Scoring on each item is therefore ordinal; however, each step on the scale represents an increment of increasing competency, with ‘anchor points’ at each end. There is some debate as to whether data collected from such scoring systems should be analysed as continuous or as ordinal data. Kinnear and Gray (165) consider that the decision depends on several factors, including the distribution of the data and the number of points on the scale. The CTS-R-Pain has a relatively large number of points (seven), and its scores would therefore be expected to approximate the normal distribution. As a result, it was decided a priori to analyse the data as continuous using the ICC if the data were normally distributed (parametric). If the data were non-parametric, the Kappa statistic would be applied instead.
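For the non-parametric fallback, the text does not state which form of the Kappa statistic was planned. The Python sketch below assumes the simplest case, an unweighted Cohen's kappa for two raters, with invented scores; a weighted kappa, which gives partial credit for near misses on an ordinal scale, would be an alternative.

```python
from collections import Counter

def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters scoring the same sessions."""
    n = len(rater_a)
    # Observed proportion of exact agreement
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's marginal frequencies
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / n ** 2
    return (observed - expected) / (1 - expected)

# Invented item scores on the 0-6 scale (not study data)
scores_a = [1, 1, 2, 2, 3, 3, 4, 4]
scores_b = [1, 1, 2, 3, 3, 3, 4, 5]
kappa = cohen_kappa(scores_a, scores_b)
print(f"kappa = {kappa:.2f}")
```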
Other analyses
Validity evidence for the CTS-R-Pain can be provided by finding relationships between the CTS-R-Pain output and other variables expected to correlate with competence. It could be hypothesised that competency might be related to:
1) Previous biopsychosocial attitudes and beliefs that fit in with the culture of a CB approach (3)
2) Previous relevant experience (68)
Attitudes and beliefs were measured using the Pain Attitudes and Beliefs Scale – Physiotherapists (PABS-PT) (51) prior to the training course for the BeST trial. The scale measures the relative strength of biomedical and biopsychosocial beliefs, with lower scores indicating more biopsychosocial beliefs. The tool is reported to have good validity and reliability (166). We would expect stronger biopsychosocial beliefs to be positively correlated with greater competence.
Previous experience of running similar groups was reported by the physiotherapist as a yes/no answer on a data collection form prior to the training course. We would predict that specific experience is correlated with competence, in line with the theories of deliberate practice (70) outlined in Chapter 1 (section 1.7.1).
Correlational analyses were conducted to explore the hypothesised relationships between competence and physiotherapist beliefs and previous specific experience.
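The text does not name the correlation coefficient used. As one possible approach, the Python sketch below assumes a point-biserial correlation between a yes/no variable (previous experience, coded 1/0) and a continuous competence score. All values are hypothetical and are not study data.

```python
from statistics import mean, pstdev

def point_biserial(binary, scores):
    """Point-biserial correlation between a 0/1 variable and a continuous score."""
    group1 = [s for b, s in zip(binary, scores) if b == 1]
    group0 = [s for b, s in zip(binary, scores) if b == 0]
    n, n1, n0 = len(scores), len(group1), len(group0)
    # Difference in group means, scaled by the population SD of all scores
    # and by the balance between the two groups
    return (mean(group1) - mean(group0)) / pstdev(scores) * (n1 * n0 / n ** 2) ** 0.5

# Hypothetical data: 1 = previous experience of similar groups, 0 = none
experience = [1, 1, 0, 0, 1, 0]
competence = [40, 44, 30, 34, 42, 32]  # hypothetical total competence scores

r_pb = point_biserial(experience, competence)
print(f"point-biserial r = {r_pb:.2f}")
```

The point-biserial coefficient is numerically identical to Pearson's r with the binary variable coded 0/1, so a positive value indicates higher scores among those with previous experience.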
Sensitivity and responsiveness of the tool were investigated using a one-way ANOVA and a one-sample t-test respectively. However, because only a small number of measurements was available for these tests, they were considered at high risk of error and have not been reported.
4.4 Results
Eleven of the fourteen physiotherapists delivering the back skills training intervention provided recorded sessions for the reliability study. The three physiotherapists who did not provide tapes had started delivery of groups before the reliability study had been designed, and had not recorded the sessions. The demographic details of the included physiotherapists are shown in Table 13.
Table 13 Characteristics of the 11 physiotherapists who delivered the Back Skills Training trial intervention (based on data for 2006)
Therapist code  Age  Gender  Years qualified  Grade                Experience of running similar groups  Number of groups run in trial
1               31   F       9                Senior I             Yes                                   12
2               43   M       10               ESP                  Yes                                   5
5               45   M       14               Senior I             No                                    4
6               42   F       9                Senior I             No                                    5
7               52   F       31               Senior I             Yes                                   6
11              33   F       12               Clinical Specialist  Yes                                   3
12              43   F       21               Clinical Specialist  Yes                                   3
15              24   M       2                Senior II            No                                    1
16              29   F       7                Senior I             No                                    2
18              46   F       25               Senior I             No                                    3
19              37   F       12               Senior II            No                                    3
NB: Grade indicates the level at which the therapist is working within the NHS. In order of increasing responsibility and expertise, the grade titles are Junior, Senior II / Band 6, Senior I, Clinical Specialist / Extended Scope Practitioner (ESP) / Band 7. Gender: M = Male, F = Female.
The reliability study commenced approximately half way through the main BeST trial when there were 33 groups still to be delivered. Ten of these groups were felt
number of recordings from one therapist (eight groups excluded on this basis); one physiotherapist was very nervous delivering her first group and recording was felt likely to adversely affect performance; and one group was felt to be non-representative as it ended up with two participants and was condensed into four sessions. A further six groups were not recorded because the therapist forgot to record (4), because of a communication mix-up (1) or for an unknown reason (1).
Of the 33 groups that could potentially have been recorded, 17 were recorded: six physiotherapists recorded two groups each and five physiotherapists recorded one group each.
The results of scoring the 17 session recordings using the CTS-R-Pain are shown in Table 14.
The recordings broadly represented a spread of sessions 1 to 6, with the majority of groups run on a Tuesday and a spread across times and locations. Within the BeST trial, no relationship was found between attendance/competence and the day/venue/time variables (88).