
shown in section 2.3.1.3. In the following examples, the choice was influenced by the research question and available data.

2.3.2.1 Summary points only

Example: Rapid diagnostic tests for uncomplicated non-falciparum malaria

RDTs give a binary test result based on a colour change (a visible test line) on a strip, indicating the presence of antigens produced by malaria parasites in the blood of infected individuals. Because the outcome is binary, it is reasonable to focus on estimating summary sensitivities and specificities (summary points). Also, because a common threshold for judging a colour change is assumed, the summary estimates are meaningful and clinically applicable. Overall accuracy (measured by the DOR) is not of interest here because the consequences of missed malaria cases outweigh those of false positives.
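To make the quantities named above concrete, the sketch below computes study-level sensitivity, specificity and the DOR from a single 2×2 table. The counts and function names are hypothetical, and zero cells (which would need a continuity correction) are not handled. It also shows why the DOR summarises overall accuracy: on the logit scale, log(DOR) is simply logit(sensitivity) + logit(specificity).

```python
import math


def accuracy_from_2x2(tp, fp, fn, tn):
    """Sensitivity, specificity and diagnostic odds ratio (DOR) for one study.

    Assumes all four cells are non-zero; zero cells would need a continuity
    correction, which this sketch does not apply.
    """
    sensitivity = tp / (tp + fn)  # diseased who test positive
    specificity = tn / (tn + fp)  # non-diseased who test negative
    dor = (tp * tn) / (fp * fn)   # odds of a positive result, diseased vs non-diseased
    return sensitivity, specificity, dor


def logit(p):
    """Log-odds transform used by the hierarchical models."""
    return math.log(p / (1 - p))


# Hypothetical counts for a single study (not taken from the review)
se, sp, dor = accuracy_from_2x2(tp=90, fp=20, fn=10, tn=80)
print(f"sensitivity={se:.2f}, specificity={sp:.2f}, DOR={dor:.1f}")
# On the logit scale, log(DOR) is the sum of logit(Se) and logit(Sp)
print(math.isclose(math.log(dor), logit(se) + logit(sp)))
```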

A bivariate model that included a covariate for test type was used to investigate the association of test type with sensitivity and specificity (equations 1.14 and 1.15 in section 1.5.4.1). The bivariate model was chosen because, unlike the HSROC model, it directly models sensitivity and specificity. Figure 2.5 shows the summary points for the three RDT types on an SROC plot. Each summary point is surrounded by a 95% confidence region, showing the uncertainty around the point estimate, and a 95% prediction region, illustrating the extent of between-study heterogeneity for each test.
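The structure of a bivariate meta-regression with a test-type covariate can be illustrated generatively. The sketch below simulates 2×2 tables under one common form of the bivariate model, assuming a dummy-coded covariate (t = 0 for the referent test, t = 1 for a comparator) and a between-study covariance shared across test types; this may differ from the exact specification in equations 1.14 and 1.15. All parameter values and function names are hypothetical, and fitting the model to real data would require a mixed-model package rather than this simulation.

```python
import math
import random


def expit(x):
    """Inverse logit."""
    return 1 / (1 + math.exp(-x))


def summary_point(mu_a, mu_b, nu_a, nu_b, t):
    """Expected (sensitivity, specificity) for a test with covariate value t."""
    return expit(mu_a + nu_a * t), expit(mu_b + nu_b * t)


def simulate_studies(mu_a, mu_b, nu_a, nu_b, t, sd_a, sd_b, rho,
                     n_studies, n_diseased, n_healthy, seed=1):
    """Simulate 2x2 tables from a bivariate random-effects model.

    logit(Se_i) = mu_a + nu_a * t + u_i
    logit(Sp_i) = mu_b + nu_b * t + w_i
    with (u_i, w_i) bivariate normal: SDs sd_a, sd_b and correlation rho.
    """
    rng = random.Random(seed)
    studies = []
    for _ in range(n_studies):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        u = sd_a * z1  # study-level deviation in logit sensitivity
        # correlated study-level deviation in logit specificity
        w = sd_b * (rho * z1 + math.sqrt(1 - rho ** 2) * z2)
        se = expit(mu_a + nu_a * t + u)
        sp = expit(mu_b + nu_b * t + w)
        # Within-study binomial sampling of true positives and true negatives
        tp = sum(rng.random() < se for _ in range(n_diseased))
        tn = sum(rng.random() < sp for _ in range(n_healthy))
        studies.append({"tp": tp, "fn": n_diseased - tp,
                        "tn": tn, "fp": n_healthy - tn})
    return studies


# Hypothetical parameters: referent test (t = 0) versus a comparator (t = 1)
params = dict(mu_a=1.5, mu_b=2.5, nu_a=-0.5, nu_b=0.3)
for t in (0, 1):
    se, sp = summary_point(t=t, **params)
    tables = simulate_studies(t=t, sd_a=0.6, sd_b=0.8, rho=-0.3,
                              n_studies=5, n_diseased=50, n_healthy=200, **params)
    print(f"t={t}: summary Se={se:.3f}, Sp={sp:.3f}; first study {tables[0]}")
```

The summary point for each test depends only on the fixed effects, while the spread of simulated studies around it (driven by sd_a, sd_b and rho) is what a 95% prediction region visualises.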

Figure 2.5| SROC plot of rapid diagnostic tests for non-falciparum malaria

The solid circles are the summary estimates of sensitivity and specificity for each RDT type, shown with a 95% confidence region (dotted line) and a 95% prediction region (dashed line) around each summary point. The summary point for Type 2 and the 95% confidence region for Type 3 are not visible because Type 2 and Type 3 have identical summary estimates and 95% confidence regions; only their 95% prediction regions differ. The symbols for study-specific estimates were reduced in size to keep the summary points visible. (Adapted from Abba et al 2014^96)


2.3.2.2 Summary points at fixed specificity

Example: First trimester serum tests for Down's syndrome screening

Although studies reported results at different thresholds, it is common in this clinical field for studies to report sensitivity (detection rate) at a fixed specificity (usually 95%, that is, a 5% FPR). The FPR level chosen is the one deemed acceptable in a particular screening programme. Since all specificities then take the same value, there is no need to account for the correlation between sensitivity and specificity across studies in a hierarchical meta-analytic model. The main meta-analysis comparing test accuracy included only studies that used a 5% FPR threshold. A univariate random effects logistic regression model (a bivariate model reduced to two univariate models, as explained in section 1.4.4.1), with a separate variance term for the random effects of logit sensitivity for each test, was used.^97 Equation 1.16 was simplified to a univariate model as follows:

$$\mu_{Aik} \sim N\left(\mu_A + \nu_A t_k,\ \sigma_{Ak}^2\right) \qquad (2.1)$$

where $\mu_{Aik}$ is the logit sensitivity for the $k$th test within the $i$th study; $t_k$ represents the $k$th test; $\mu_A$ estimates the expected logit sensitivity for the index test used as the reference category (the referent test, not the reference standard); $\mu_A + \nu_A t_k$ estimates the expected logit sensitivity for the $k$th test; and $\sigma_{Ak}^2$ is the variance of logit sensitivity for the $k$th test.
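A minimal sketch of pooling logit sensitivities at a fixed 5% FPR with a separate between-study variance for each test, in the spirit of equation 2.1. It swaps the exact binomial likelihood of the random-effects logistic regression for a simpler normal approximation on the logit scale with a DerSimonian-Laird estimate of the between-study variance, so its estimates will not match a fitted logistic model exactly. The counts and test names are hypothetical.

```python
import math


def logit_sensitivity(tp, fn, correction=0.5):
    """Logit sensitivity log(TP/FN) and its approximate within-study variance.

    Adds a continuity correction to both cells only when one of them is zero.
    """
    if tp == 0 or fn == 0:
        tp, fn = tp + correction, fn + correction
    return math.log(tp / fn), 1 / tp + 1 / fn


def dersimonian_laird(estimates, variances):
    """Random-effects pooled estimate, its standard error and tau^2."""
    w = [1 / v for v in variances]
    sum_w = sum(w)
    fixed = sum(wi * y for wi, y in zip(w, estimates)) / sum_w
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, estimates))  # Cochran's Q
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (len(estimates) - 1)) / c) if c > 0 else 0.0
    w_star = [1 / (v + tau2) for v in variances]
    pooled = sum(wi * y for wi, y in zip(w_star, estimates)) / sum(w_star)
    return pooled, math.sqrt(1 / sum(w_star)), tau2


# Hypothetical (TP, FN) counts at a 5% FPR, grouped by test strategy.
# Pooling each test separately gives each its own between-study variance,
# mirroring the test-specific variance term in equation 2.1.
data = {
    "test X": [(40, 20), (55, 30), (25, 15)],
    "test Y": [(60, 15), (70, 25)],
}
for test, counts in data.items():
    ys, vs = zip(*(logit_sensitivity(tp, fn) for tp, fn in counts))
    pooled, se, tau2 = dersimonian_laird(ys, vs)
    detection_rate = 1 / (1 + math.exp(-pooled))
    print(f"{test}: detection rate {detection_rate:.1%}, tau^2 = {tau2:.3f}")
```

Because specificity is fixed by design, only one dimension (logit sensitivity) needs a random effect, which is why a univariate model suffices here.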

Based on all available data for the nine test combinations described above, Figure 2.6 shows the summary detection rates, with 95% confidence intervals, at a 5% FPR. For example, for the double test combining free βhCG, AFP and maternal age (labelled G), the estimated detection rate at a 5% FPR was 49% (95% CI 39% to 60%), based on data from three studies with 157 affected cases out of 2,992 participants. The test combinations in Figure 2.6 are ordered by decreasing detection rate. The single test strategies with and without maternal age (PAPP-A alone; free βhCG alone; PAPP-A and maternal age; free βhCG and maternal age) perform worst, whereas the triple test strategies (ADAM 12, PAPP-A, free βhCG and maternal age; PAPP-A, free βhCG, AFP and maternal age) perform best.
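Estimates such as 49% (95% CI 39% to 60%) are typically computed on the logit scale and back-transformed, which is why the interval need not be symmetric around the point estimate. A minimal sketch, assuming a Wald interval on the logit scale; the function name and inputs are illustrative and are not the review's fitted values.

```python
import math


def expit(x):
    """Inverse logit."""
    return 1 / (1 + math.exp(-x))


def detection_rate_ci(logit_estimate, se, z=1.959964):
    """Back-transform a logit-scale estimate and Wald 95% CI to percentages."""
    lower, upper = logit_estimate - z * se, logit_estimate + z * se
    return tuple(100 * expit(v) for v in (logit_estimate, lower, upper))


# Illustrative logit-scale inputs (not the review's fitted values)
est, lo, hi = detection_rate_ci(logit_estimate=1.0, se=0.5)
print(f"{est:.1f}% (95% CI {lo:.1f}% to {hi:.1f}%)")
# The interval is symmetric on the logit scale but not on the percentage scale
print(f"below: {est - lo:.1f} points, above: {hi - est:.1f} points")
```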

Figure 2.6| Sensitivity (detection rate) at a 5% false positive rate for the 9 selected test strategies

Sensitivity is presented as percentages. Each circle represents the summary sensitivity for a test strategy, and the size of each circle is proportional to the number of Down's syndrome cases. The estimates are shown with 95% confidence intervals. The test strategies are ordered by decreasing detection rate. The numbers of studies, cases and women included for each test strategy are shown on the horizontal axis. A = age, PlGF, PAPP-A and free βhCG; B = age, PAPP-A, free βhCG and AFP; C = age, ADAM 12, PAPP-A and free βhCG; D = age, PAPP-A and free βhCG; E = age and PAPP-A; F = PAPP-A; G = age, free βhCG and AFP; H = age and free βhCG; I = free βhCG.


2.3.2.3 Summary curves and points

Example: Screening tests for detecting any type of bipolar disorder in mental health centre settings

Total scores range from 0 to 25 points for the BSDS, 0 to 15 points for the MDQ and 0 to 32 points for the HCL-32. The cut-offs recommended by the developers of the instruments are 7 for the MDQ,^107 13 for the BSDS^108 and 14 for the HCL-32.^109 However, studies used different cut-offs to define a positive screen for each instrument (Figure 2.7).
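Because each study dichotomises a total score at its own cut-off, sensitivity and specificity move in opposite directions as the cut-off changes, which is what makes results at different cut-offs hard to pool as a single summary point. A sketch with hypothetical scores, assuming that a score at or above the cut-off counts as a positive screen; the function name is illustrative.

```python
def accuracy_at_cutoff(case_scores, noncase_scores, cutoff):
    """Sensitivity and specificity when a score >= cutoff counts as a positive screen."""
    tp = sum(score >= cutoff for score in case_scores)
    tn = sum(score < cutoff for score in noncase_scores)
    return tp / len(case_scores), tn / len(noncase_scores)


# Hypothetical total scores on a 0-15 scale (not data from the review)
cases = [5, 7, 8, 9, 10, 11, 12, 13]
noncases = [1, 2, 3, 4, 5, 6, 7, 8]

for cutoff in (5, 7, 9):
    se, sp = accuracy_at_cutoff(cases, noncases, cutoff)
    print(f"cut-off {cutoff}: sensitivity {se:.3f}, specificity {sp:.3f}")
```

With these made-up scores, raising the cut-off from 5 to 9 lowers sensitivity from 1.000 to 0.625 and raises specificity from 0.500 to 1.000.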


Figure 2.7| Forest plot of screening tests for detection of any type of bipolar disorder (BD type I, BD type II or BD NOS) in mental health centre settings

FN = false negative; FP = false positive; TN = true negative; TP = true positive.

The plot shows study-specific estimates of sensitivity and specificity (with 95% confidence intervals) at a specific cut-off. The studies are ordered by cut-off and sensitivity. (Adapted from Carvalho et al 2015^56)

The diagnostic accuracy of the MDQ, the BSDS and the HCL-32 was compared using an HSROC meta-regression model to assess the effect of test type on the accuracy, threshold and/or shape parameters of the model (see equations 1.17 and 1.18 in section 1.5.4.2).^56 The indirect comparison included 44 studies (Figure 2.7). Based on the relationship between the HSROC and bivariate meta-regression models (section 1.5.4.3), summary points were also estimated by applying the HSROC model only to the studies that used the recommended cut-off for each instrument. The summary estimates are shown in Table 2.5.

Table 2.5| Accuracy of the BSDS, HCL-32 and MDQ for detection of any type of bipolar disorder in mental health centre settings

| Instrument | Cut-off | N (studies) | Cases | Patients | Sensitivity (95% CI) | Specificity (95% CI) |
|---|---|---|---|---|---|---|
| BSDS | 13 | 3 | 474 | 559 | 68.8 (63.3–73.7) | 85.9 (73.9–92.9) |
| HCL-32 | 14 | 9 | 1,845 | 4,807 | 81.2 (76.7–85.0) | 66.7 (46.7–81.9) |
| MDQ | 7 | 19 | 969 | 3,220 | 65.0 (56.8–72.4) | 78.8 (72.5–84.0) |

Sensitivity and specificity are presented as percentages. Summary sensitivity and specificity are shown for each instrument at the recommended cut-off.

(Adapted from Carvalho et al 2015^56)
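For reference, a widely used form of the HSROC model (Rutter and Gatsonis) implies an SROC curve on which expected logit sensitivity is a linear function of logit(1 − specificity), governed by an accuracy parameter Λ and a shape parameter β. The sketch below evaluates that curve; it assumes this standard parameterisation, which may differ in notation from equations 1.17 and 1.18, and the parameter values and test names are hypothetical rather than estimates from the bipolar disorder review.

```python
import math


def logit(p):
    return math.log(p / (1 - p))


def expit(x):
    return 1 / (1 + math.exp(-x))


def sroc_sensitivity(fpr, lam, beta):
    """Expected sensitivity on an HSROC-implied SROC curve at a given FPR.

    Assumed Rutter-Gatsonis form:
        logit(Se) = lam * exp(-beta / 2) + exp(-beta) * logit(FPR)
    lam: accuracy parameter; beta: shape (asymmetry) parameter.
    With beta = 0 the curve is symmetric and lam equals the log DOR.
    """
    return expit(lam * math.exp(-beta / 2) + math.exp(-beta) * logit(fpr))


# Hypothetical parameters for three instruments (not estimates from the review).
# In a meta-regression on test type, each test gets its own lam (and its own
# beta if the shape is allowed to vary).
tests = {"test 1": (2.5, 0.0), "test 2": (2.8, 0.4), "test 3": (2.2, -0.4)}
for fpr in (0.1, 0.2, 0.3):
    row = {name: round(sroc_sensitivity(fpr, lam, beta), 3)
           for name, (lam, beta) in tests.items()}
    print(f"FPR {fpr:.1f}: {row}")
```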

This example shows that a single review can compare tests using both summary points and SROC curves. The feasibility of each type of analysis depends on the data available and on whether common cut-offs are used in practice. In the bipolar disorder review, the common cut-offs were those recommended by the instruments' developers, but the choice may be data-driven in other scenarios. Although summary points can be estimated for each test at each threshold for which data are available, the ranking of the tests by sensitivity and/or specificity will not be consistent across thresholds if accuracy depends on threshold. In such situations, a comparison of SROC curves is more appropriate, provided that the curve for each test is allowed to have its own shape (equation 1.17) so that accuracy can vary with threshold and the curves can cross. This is evident in Figure 2.4 (panel A), where the SROC curves for the three tests cross, indicating that no test is consistently more accurate than the others and that relative accuracy depends on threshold. This analysis is discussed further in section 2.3.3.1, while Chapter 4 focuses on identifying the synthesis methods and test comparison approaches used in recent systematic reviews.
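Whether two SROC curves can cross follows directly from the shape parameter. Under the assumed Rutter and Gatsonis parameterisation, each curve is a straight line on the logit scale with slope exp(−β), so curves with a common β are parallel and never cross (unless identical), whereas curves with different β intersect at exactly one false positive rate. The hypothetical helper below locates that point; on either side of it the ranking of the two tests by sensitivity reverses. Parameter values are illustrative only.

```python
import math


def sroc_logit_se(fpr, lam, beta):
    """logit(Se) on the assumed Rutter-Gatsonis SROC curve at a given FPR."""
    return lam * math.exp(-beta / 2) + math.exp(-beta) * math.log(fpr / (1 - fpr))


def crossing_fpr(lam1, beta1, lam2, beta2):
    """FPR at which two SROC curves intersect, or None if they share a shape.

    On the logit scale each curve is a straight line with slope exp(-beta), so
    curves with a common beta are parallel (they never cross unless identical),
    and curves with different beta cross exactly once.
    """
    slope1, slope2 = math.exp(-beta1), math.exp(-beta2)
    if math.isclose(slope1, slope2):
        return None
    x = (lam2 * math.exp(-beta2 / 2) - lam1 * math.exp(-beta1 / 2)) / (slope1 - slope2)
    return 1 / (1 + math.exp(-x))  # convert logit(FPR) back to FPR


# Hypothetical parameters for two tests with different shapes
fpr_star = crossing_fpr(2.5, 0.5, 2.4, -0.3)
print(f"curves cross at FPR = {fpr_star:.3f}")
for fpr in (fpr_star / 2, min(0.99, fpr_star * 1.5)):
    diff = sroc_logit_se(fpr, 2.5, 0.5) - sroc_logit_se(fpr, 2.4, -0.3)
    better = "test 1" if diff > 0 else "test 2"
    print(f"FPR {fpr:.3f}: higher sensitivity for {better}")
```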

2.3.3 Should a common shape be assumed for SROC curves across tests?