The Kappa statistic is used to measure agreement for dichotomous variables. Kappas range between -1 and +1, with a score of zero representing only random agreement. Scores below 0.4 represent poor agreement and scores above 0.7 represent good agreement. Between these scores, agreement is said to be fair/good.
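As an illustration of how such a score is obtained, the following is a minimal sketch of Cohen's kappa for two examiners scoring a dichotomous sign, together with the interpretation bands quoted above. It is not the computation used in this study, which involved several examiners per group and may have used a multi-rater form of kappa. The function names and the ratings are hypothetical and serve only to show the arithmetic.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters scoring the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: share of items scored identically by both raters.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: derived from each rater's marginal frequencies.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    p_chance = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if p_chance == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_observed - p_chance) / (1.0 - p_chance)


def interpret(kappa):
    """Bands quoted in the text: below 0.4 poor, above 0.7 good, else fair/good."""
    if kappa < 0.4:
        return "poor"
    if kappa > 0.7:
        return "good"
    return "fair/good"


# Hypothetical ratings (not study data): 1 = sign present, 0 = sign absent.
examiner_1 = [1, 1, 0, 1, 0, 0, 1, 1, 0, 1]
examiner_2 = [1, 0, 0, 1, 0, 1, 1, 1, 0, 1]

kappa = cohen_kappa(examiner_1, examiner_2)
print(round(kappa, 2), interpret(kappa))
```

In this made-up example the two examiners agree on 8 of 10 slides, but because half of that agreement would be expected by chance, the kappa score is considerably lower than the raw agreement of 0.8.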
Table 4.4 presents the Kappa scores for the agreement on identification of clinical signs for the study and control groups at both stages of data collection. Variables for which several choices were offered to examiners have been dichotomised (for instance, the data for fiery red, red, normal or other in the appearance of the marginal gingiva).
Generally, the reliability for all variables was fair. In Stage 1 the study group showed greater agreement than the control group in the recognition of increased probing depth. In Stage 2 the study group showed greater reliability for two variables (colour of marginal gingiva and probing depth).
4.4.3 Reliability of diagnosis
The proportion of subjects agreeing on the diagnosis for each slide in the Stage 1 and 2 reviews was tabulated for the study and control groups. The differences in the number of reviewers agreeing before and after the calibration session were determined and analysed by the Wilcoxon rank-sum test. The median change in agreement level between the two reviews was 0.0 in the control group and 0.3 in the study group (p = 0.0113), indicating that the training had improved the level of agreement.
The proportion of subjects in the study and control groups who agreed on the diagnosis of each slide in the Stage 2 review was also analysed by the Wilcoxon test. The median agreement for the control group was 0.33 and for the study group, 0.60 (p = 0.0054), indicating a significantly higher level of agreement in the trained group.
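The comparisons above can be sketched as follows. This is a minimal illustration of a two-sided Wilcoxon rank-sum test using the normal approximation with a tie correction and no continuity correction; whether the original analysis used an exact or an approximate p-value is not stated, so this is an assumption. The per-slide values and the function names are hypothetical and are not the study data.

```python
import math
from collections import Counter


def midranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def rank_sum_test(group_x, group_y):
    """Return (rank sum of group_x, z statistic, two-sided p-value)."""
    n_x, n_y = len(group_x), len(group_y)
    n = n_x + n_y
    combined = list(group_x) + list(group_y)
    ranks = midranks(combined)
    w = sum(ranks[:n_x])
    mean_w = n_x * (n + 1) / 2.0
    # Tie correction reduces the variance when values are repeated.
    tie_term = sum(t ** 3 - t for t in Counter(combined).values())
    var_w = n_x * n_y / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_w == 0:
        raise ValueError("all observations are identical")
    z = (w - mean_w) / math.sqrt(var_w)
    p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))
    return w, z, p_two_sided


# Hypothetical per-slide changes in the proportion of examiners agreeing.
study_change = [0.3, 0.4, 0.1, 0.5, 0.2, 0.3, 0.0, 0.6]
control_change = [0.0, 0.1, -0.1, 0.0, 0.2, -0.2, 0.1, 0.0]

w, z, p = rank_sum_test(study_change, control_change)
print(f"W = {w}, z = {z:.2f}, p = {p:.4f}")
```

With as few slides as in this sketch the normal approximation is rough, and an exact test from a statistics package would normally be preferred.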
The frequencies of diagnoses by rank of diagnosis in the original, control and study groups are shown graphically in Appendix 2.
4.5 Discussion
The findings of this study are compatible with our knowledge of diagnosis. Examiner reliability is often poor, but improves with training and calibration (Smith et al, 1970; Alexander et al, 1971; Cowell et al, 1975). The use of photographs may have limited the information available to the examiners and reduced their ability to diagnose accurately. However, this technique is useful for calibration for acute conditions which are uncommon and painful. Moreover, the use of photographs standardised the examination.
4.5.1 Signs associated with diagnoses
Examiners associated similar clinical signs with chronic marginal gingivitis and HIV-associated gingivitis (Table 4.3). The only signs associated with these conditions with different frequency were punctate erythema of the attached gingiva (12% and 28% of diagnoses, respectively) and a distinct red band of the gingiva (82% and 50% of the diagnoses, respectively). Thus, with differences between the frequencies of 16% and 32%, in only a maximum of 48% of cases can the distinction between these two conditions be attributed to these two variables. This figure is an over-estimate since examiners more commonly associated red bands with conventional gingivitis than with HIV-gingivitis. Distinct red bands are strongly associated with HIV-gingivitis in the literature (Winkler et al, 1989), suggesting that diagnoses were made using other criteria.
Examiners readily distinguished HIV-P with and without ulceration. HIV-P without ulceration lacked the marginal red bands, swelling and spontaneous bleeding which they associated with the ulcerated form. The distinction between HIV-P with and without ulceration is new and was introduced to reflect two different presentations in the literature
Diagnostic criteria: Development and reliability
Table 4.4. Kappa scores for inter-rater agreement on signs of periodontal diseases

                               Stage 1                                       Stage 2
Sign                           Study Gp (95% CI)      Control Gp (95% CI)    Study Gp (95% CI)      Control Gp (95% CI)
Red marginal gingiva           0.39 (0.24, 0.53)      0.64 (0.46, 0.82)      0.57 (0.42, 0.72)†     0.01 (-0.17, 0.19)†
Red attached gingiva           0.48 (0.33, 0.63)      0.50 (0.33, 0.69)      0.53 (0.38, 0.68)      0.46 (0.28, 0.64)
Red band                       0.45 (0.31, 0.63)      0.38 (0.19, 0.56)      0.59 (0.45, 0.74)      0.44 (0.27, 0.62)
Swelling                       0.31 (0.17, 0.44)      0.21 (0.03, 0.39)      0.17 (0.03, 0.31)      0.19 (0.01, 0.37)
Spontaneous bleeding           0.33 (0.18, 0.48)      0.53 (0.34, 0.72)      0.63 (0.44, 0.82)      0.30 (0.11, 0.49)
Increased probing depth        0.75 (0.60, 0.89)*     0.31 (0.12, 0.50)*     1.00 (0.84, 1.00)‡     0.24 (0.06, 0.41)‡
Recession                      0.50 (0.34, 0.67)      0.30 (0.08, 0.52)      0.71 (0.55, 0.88)      0.76 (0.54, 0.98)
Ulceration marginal gingiva    0.51 (0.37, 0.65)      0.71 (0.52, 0.90)      0.95 (0.81, 1.00)      0.64 (0.46, 0.82)
Ulceration attached gingiva    0.38 (0.23, 0.62)      0.27 (0.07, 0.46)      0.57 (0.42, 0.71)      0.59 (0.40, 0.82)
Ulceration mucosa              0.48 (0.34, 0.62)      0.48 (0.27, 0.69)      0.74 (0.59, 0.88)      0.64 (0.45, 0.82)
Exposed bone                   0.08 (-0.06, 0.23)     -0.04 (-0.23, 0.16)    -0.01 (-0.16, 0.13)    -0.01 (-0.20, 0.17)

(Greenspan et al, 1986). Whilst it is possible that the lesions which the examiners termed 'HIV-periodontitis without ulceration' represented a destructive disease, another explanation is that they are areas of arrested destruction. In that case, the term 'periodontitis' is a misnomer and 'localised inter-dental attachment loss' might be more appropriate.
A distinction between these two presentations might account for the microbiological differences between HIV-P and ANUG. Localised areas of interproximal cratering and minimal pockets would be expected to have different microflora from lesions with progressive ulceration, though both might be termed HIV-P from the early descriptions. The same misclassification allows over-estimation of disease prevalence.
Examiners associated all the ulcerative changes with swelling and ulceration of the gingiva. Necrotising Stomatitis and HIV-P with ulceration were associated with more extensive tissue damage (as evidenced by recession in Table 3) but there were no other differences in the features associated with the diagnoses. These data suggest that diagnoses were not made by use of criteria common to all examiners, a hypothesis supported by the poor agreement on diagnoses.
Other signs or a particular combination of signs may have led examiners to a particular diagnosis for each photograph. However, the data do not suggest this and the poor agreement on identification of signs and on diagnoses further suggests that diagnoses were made intuitively.
4.5.2 Reliability of recognising signs
In both stages of this study, examiners showed only fair reliability in identifying clinical signs such as redness of the marginal gingiva or ulceration (Table 4). This emphasises the need for careful calibration before any study involving several examiners.
After calibration, the study group agreed on the colour of the marginal gingiva more than the controls. This improvement is encouraging but constitutes only one variable of eleven studied. Alexander (1971) did not detect any improvement in agreement until there had been three calibration sessions. Calibration in this study involved only reading of the criteria and one two-hour session with photographs; thus, profound improvements were not expected.
4.5.3 Reliability of diagnosis
The intuitive diagnoses and mediocre agreement on the recognition of signs contributed to the poor diagnostic agreement in the initial stage of this study (Table 2). The implications of this are important. Authors who have reported on HIV-associated periodontal conditions might not have been looking at this range of conditions at all. It must be stressed that the examiners in this study were 'experts' who were experienced and well informed. All worked in the same department of a dental school and it is likely that there is greater agreement among this group than among less experienced examiners not working in such close proximity.
There was significant improvement in the inter-examiner reliability of examiners who were trained and calibrated in the use of the diagnostic criteria, with no improvement among non-calibrated controls. This resulted in significantly more agreement among the calibrated group. These findings confirm the value of training and demonstrate the reliability of the criteria.