
4.2.1 Included methods for estimating the heterogeneity variance

I present a comparison of seven methods for estimating the heterogeneity variance in a meta-analysis: DerSimonian-Laird (DL) [25], Cochran's ANOVA (CA) [59], Paule-Mandel (PM) [80], Hartung-Makambi (HM) [40], Sidik-Jonkman (SJ) [101], maximum likelihood (ML) [37] and restricted maximum likelihood (REML) [41]. These seven estimators were selected from the comprehensive list in chapter 2 because of their popularity and availability in statistical software. DL is derived from the method of moments approach to heterogeneity variance estimation and is the most frequently used heterogeneity variance estimator in practice. It is the default method in the Stata command metan and is currently the only method implemented in RevMan software [22]. CA assigns equal weights to studies and represents a simple alternative to DL. PM assigns random-effects weights to studies, which are considered the statistically optimal weights in the method of moments approach [24]. REML is the default method in the R package metafor [126]. ML is a widely used approach to statistical parameter estimation and is therefore also included in this analysis. In contrast to these estimators, HM and SJ were selected as non-truncated estimators that always produce a positive estimate of the heterogeneity variance.
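To make the differences between the closed-form estimators concrete, the sketch below writes out DL, CA, HM and SJ in plain Python. It assumes each meta-analysis is supplied as a list of effect estimates `y` and a list of within-study variances `v` (treated as known). The formulas follow commonly published forms (for SJ, the variant that starts from the unweighted crude estimate), which may differ in detail from the definitions in chapter 2; all function names are hypothetical.

```python
# Closed-form heterogeneity variance estimators (illustrative sketch).
# y: study effect estimates; v: within-study variances (treated as known).


def _q_and_c(y, v):
    """Cochran's Q with inverse-variance weights, and the DL scaling constant."""
    w = [1.0 / vi for vi in v]
    sw = sum(w)
    mu_fe = sum(wi * yi for wi, yi in zip(w, y)) / sw  # fixed-effect mean
    q = sum(wi * (yi - mu_fe) ** 2 for wi, yi in zip(w, y))
    c = sw - sum(wi ** 2 for wi in w) / sw
    return q, c


def tau2_dl(y, v):
    """DerSimonian-Laird: moment estimator, truncated at zero."""
    k = len(y)
    q, c = _q_and_c(y, v)
    return max(0.0, (q - (k - 1)) / c)


def tau2_ca(y, v):
    """Cochran's ANOVA: equal study weights, truncated at zero."""
    k = len(y)
    ybar = sum(y) / k
    s2 = sum((yi - ybar) ** 2 for yi in y) / (k - 1)
    return max(0.0, s2 - sum(v) / k)


def tau2_hm(y, v):
    """Hartung-Makambi: positive whenever Q > 0, so no truncation is needed."""
    k = len(y)
    q, c = _q_and_c(y, v)
    return q ** 2 / ((2 * (k - 1) + q) * c)


def tau2_sj(y, v):
    """Sidik-Jonkman: re-weighted residuals around a crude initial estimate."""
    k = len(y)
    ybar = sum(y) / k
    tau2_0 = sum((yi - ybar) ** 2 for yi in y) / k  # crude initial estimate
    # Weights tau2_0 / (v_i + tau2_0); undefined if all y_i are identical.
    w = [tau2_0 / (vi + tau2_0) for vi in v]
    mu = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    return sum(wi * (yi - mu) ** 2 for wi, yi in zip(w, y)) / (k - 1)
```

With equal within-study variances DL and CA coincide: for `y = [0.0, 1.0, 2.0]` and `v = [0.1] * 3` both give 0.9, while HM gives 5/6 and SJ gives 20/23, illustrating that even on identical data the estimators need not agree.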

I stated in chapter 2 that the PM estimator can theoretically be interpreted as a simple approximation of the REML approach in specific situations [94]. The extent of agreement between PM and REML estimates has not been investigated in other empirical studies [117], so both estimators are included and compared here.
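PM and REML have no closed form, so comparing them requires iterative solvers. The sketch below is a minimal illustration, assuming effect estimates `y` and within-study variances `v`: PM is solved by bisection on the generalised Q statistic (which decreases in τ²), and REML uses the widely cited fixed-point iteration τ² = Σw²((y − μ̂)² − v)/Σw² + 1/Σw with w = 1/(v + τ²). Software such as metafor may use other algorithms (e.g. Fisher scoring); the function names are hypothetical. Dropping the 1/Σw term gives the corresponding ML iteration.

```python
# Iterative heterogeneity variance estimators (illustrative sketch).


def _weighted_mean(y, w):
    return sum(wi * yi for wi, yi in zip(w, y)) / sum(w)


def _q_gen(y, v, tau2):
    """Generalised Q statistic with random-effects weights 1 / (v_i + tau2)."""
    w = [1.0 / (vi + tau2) for vi in v]
    mu = _weighted_mean(y, w)
    return sum(wi * (yi - mu) ** 2 for wi, yi in zip(w, y))


def tau2_pm(y, v, tol=1e-12):
    """Paule-Mandel: solve Q_gen(tau2) = k - 1 by bisection, truncated at zero."""
    k = len(y)
    if _q_gen(y, v, 0.0) <= k - 1:
        return 0.0
    lo, hi = 0.0, 1.0
    while _q_gen(y, v, hi) > k - 1:  # expand until the root is bracketed
        hi *= 2.0
    for _ in range(200):  # bounded number of halvings guarantees termination
        mid = (lo + hi) / 2.0
        if _q_gen(y, v, mid) > k - 1:
            lo = mid
        else:
            hi = mid
        if hi - lo <= tol * max(1.0, hi):
            break
    return (lo + hi) / 2.0


def tau2_reml(y, v, tol=1e-10, max_iter=10_000):
    """REML via a fixed-point iteration, truncated at zero at each step."""
    tau2 = 0.0
    for _ in range(max_iter):
        w = [1.0 / (vi + tau2) for vi in v]
        mu = _weighted_mean(y, w)
        w2 = [wi ** 2 for wi in w]
        new = sum(a * ((yi - mu) ** 2 - vi) for a, yi, vi in zip(w2, y, v)) / sum(w2)
        new = max(0.0, new + 1.0 / sum(w))
        if abs(new - tau2) < tol:
            return new
        tau2 = new
    return tau2
```

With equal within-study variances the two estimators agree exactly (both give 0.9 for `y = [0.0, 1.0, 2.0]`, `v = [0.1] * 3`); with unequal variances they generally differ, which is the discrepancy examined empirically here.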

estimators. To demonstrate this, many of the estimators not included in the main text are compared in figure B.1 in the appendix. This figure shows a comparison between two included methods (DL and REML) and several excluded methods (PMCA, PMDL, HS, SJCA and ARML). I also exclude Bayesian methods that rely on a subjective choice of prior distribution because of the difficulty of defining these distributions out of context. Rukhin's estimators [93] are excluded because simulation results later in this thesis show that they have poor properties. The estimator proposed by Malzahn, Böhning and Holling [74] is excluded because it can only be used in meta-analyses with a standardised mean difference outcome measure. I exclude bootstrapping because the approach could theoretically be applied to any heterogeneity variance estimator.

4.2.2 Empirical study dataset

A complete re-analysis of all meta-analyses in the CDSR dataset was possible from the study-level data available. I re-conducted all meta-analyses of dichotomous or continuous outcomes containing at least three studies. Meta-analyses containing two studies were excluded from the results because it is arguably inappropriate to estimate heterogeneity in such cases. Effect estimates and standard errors were calculated for all studies from basic summary statistics. I calculated log odds ratios for all dichotomous outcome meta-analyses and standardised mean differences for all continuous outcome meta-analyses. Standardised mean differences were estimated using Hedges' g, which corrects for bias caused by small sample sizes [8] and is detailed in section 1.3.1.
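The per-study calculations can be sketched as follows, assuming 2×2 counts for dichotomous outcomes and group means, SDs and sizes for continuous outcomes. The 0.5 continuity correction for zero cells and the particular variance approximation for Hedges' g are common conventions chosen here as assumptions; section 1.3.1 may use a different variant. Function names are hypothetical.

```python
import math


def log_odds_ratio(a, b, c, d, cc=0.5):
    """Log odds ratio and its standard error from a 2x2 table.

    a, b: events / non-events in the treatment group
    c, d: events / non-events in the control group
    cc: continuity correction added to every cell if any cell is zero
        (an assumed convention for this sketch)
    """
    if 0 in (a, b, c, d):
        a, b, c, d = a + cc, b + cc, c + cc, d + cc
    lor = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return lor, se


def hedges_g(m1, s1, n1, m2, s2, n2):
    """Hedges' g with the usual approximate small-sample correction J."""
    df = n1 + n2 - 2
    sp = math.sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / df)  # pooled SD
    d = (m1 - m2) / sp  # Cohen's d
    j = 1 - 3 / (4 * df - 1)  # equivalently 1 - 3 / (4(n1 + n2) - 9)
    g = j * d
    # One common large-sample variance approximation; other variants exist.
    se = math.sqrt((n1 + n2) / (n1 * n2) + g ** 2 / (2 * (n1 + n2)))
    return g, se
```

For two groups of 10 with means 1 and 0 and unit SDs, the uncorrected d is 1 and J = 68/71, so g ≈ 0.958: the correction shrinks the estimate by about 4% at this sample size.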

4.2.3 Summary statistics

I used four summary statistics to compare the seven estimation methods: (i) the estimated heterogeneity variance, (ii) the estimated summary effect from a random-effects meta-analysis, (iii) the estimated standard error of the summary effect, and (iv) the p-value for this result. These statistics were chosen because they are the key statistics used to draw inference from a meta-analysis and may be affected by the estimated heterogeneity variance. Furthermore, by comparing standard errors, I can also compare the widths of confidence intervals for the summary effect, because the confidence interval formulae for all included methods depend on τ̂² only through the standard error.

I calculated standard errors and hence p-values for the overall summary effect (i.e. summary statistics (iii) and (iv) above) using both the Wald and Hartung-Knapp methods (i.e. two of the three methods outlined in chapter 3). The Wald method is currently the standard in Cochrane meta-analyses [51] and was introduced in section 1.6. The Hartung-Knapp method uses an alternative weighted standard error of the summary effect and derives a p-value from the t-distribution. This method was introduced in section 3.4 and is derived from the same approach as the Hartung-Knapp confidence interval for the summary effect. I omitted p-values based on the t-distribution method outlined in section 3.3 because they use the same variance formula as the Wald-type method, so the results would be similar.
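The two inference methods can be sketched side by side for a given τ̂², assuming effect estimates `y` and within-study variances `v`. To stay within the standard library, the t-distribution tail probability is computed by Simpson's-rule integration of the t density; in practice a library routine such as `scipy.stats.t.sf` would be used instead. Function names are hypothetical.

```python
import math


def t_two_sided_p(t, df, n=2000):
    """Two-sided p-value P(|T| >= |t|) for Student's t, via Simpson's rule."""
    x = abs(t)
    if x == 0.0:
        return 1.0
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(u):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(u * u / df))

    h = x / n  # n must be even for Simpson's rule
    s = pdf(0.0) + pdf(x)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * pdf(i * h)
    return max(0.0, 1.0 - 2.0 * s * h / 3)  # 1 - 2 * P(0 <= T <= x)


def random_effects_summary(y, v, tau2):
    """Random-effects summary effect with Wald and Hartung-Knapp inference."""
    k = len(y)
    w = [1.0 / (vi + tau2) for vi in v]
    sw = sum(w)
    mu = sum(wi * yi for wi, yi in zip(w, y)) / sw
    # Wald: SE from the inverse sum of weights, p-value from the normal.
    se_wald = math.sqrt(1.0 / sw)
    p_wald = math.erfc(abs(mu / se_wald) / math.sqrt(2.0))
    # Hartung-Knapp: weighted residual SE, t-distribution with k - 1 df.
    # (Undefined if all y_i are identical, since se_hk is then zero.)
    resid = sum(wi * (yi - mu) ** 2 for wi, yi in zip(w, y))
    se_hk = math.sqrt(resid / ((k - 1) * sw))
    p_hk = t_two_sided_p(mu / se_hk, k - 1)
    return {"mu": mu, "se_wald": se_wald, "p_wald": p_wald,
            "se_hk": se_hk, "p_hk": p_hk}
```

For `y = [0.0, 1.0, 2.0]`, `v = [0.1] * 3` and τ̂² = 0.9 both standard errors equal √(1/3), yet the Hartung-Knapp p-value (about 0.23) is much larger than the Wald p-value (about 0.08) because it uses a t-distribution with only two degrees of freedom.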

4.2.4 Data analysis

I illustrate pair-wise agreement between results from different estimation methods using Bland-Altman plots [5], which show how the discrepancy between two methods depends on the underlying value of the parameter (estimated as the average result across the two methods). Pair-wise plots are arranged in a matrix to facilitate simultaneous comparison of each method with all others. Bland-Altman plots are used to examine the first three of the four summary statistics (heterogeneity variance, summary effect and precision of the summary effect). I superimpose non-parametric 80% reference ranges on the same plots to illustrate the spread of agreement. To calculate the 80% reference ranges, I split meta-analyses into groups of 200 according to their order on the x-axis and calculated the 10th and 90th percentiles of the discrepancies within each group. The plotted reference range is a smoothed line through the calculated percentiles. Bland-Altman plots traditionally present raw differences between parameter estimates on the y-axis, but I compare precisions of the summary effect as a ratio for two reasons: (1) they naturally conform to a log-normal distribution; and (2) by including precision of the summary effect in this analysis, I can also compare the widths of summary effect confidence intervals, and these comparisons are more meaningful on the ratio scale. For example, a confidence interval that is half the width of another is half as likely to include the null value, all else being equal. Heterogeneity variance estimates also have a skewed distribution in practice [21], but I apply a transformation as detailed below and present raw differences.
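The binned reference-range calculation can be sketched as below, assuming one result per meta-analysis from each of two methods. It returns the plotted points and, for each consecutive group of 200 along the x-axis, the 10th and 90th percentiles of the discrepancies; the smoothing step and the plotting itself are omitted, and using the group median as each band's x-position is an assumption of this sketch. The function name is hypothetical.

```python
import statistics


def bland_altman_bands(a, b, ratio=False, group_size=200):
    """Pairwise agreement between two methods with binned 80% reference ranges.

    a, b: results from two methods, one value per meta-analysis.
    ratio: compare as a / b (e.g. for standard errors) instead of a - b.
    Returns the sorted (mean, discrepancy) points and, per group of
    `group_size` consecutive points along the x-axis, a tuple
    (median x, 10th percentile, 90th percentile) of the discrepancies.
    """
    points = sorted(
        ((x + y) / 2, x / y if ratio else x - y) for x, y in zip(a, b)
    )
    bands = []
    for start in range(0, len(points), group_size):
        group = points[start:start + group_size]
        if len(group) < 2:  # too few points to estimate percentiles
            continue
        deciles = statistics.quantiles([d for _, d in group], n=10)
        x_mid = statistics.median([m for m, _ in group])
        bands.append((x_mid, deciles[0], deciles[-1]))
    return points, bands
```

`statistics.quantiles(..., n=10)` returns the nine decile cut points, so the first and last entries are the 10th and 90th percentiles that bound the 80% reference range.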

I sought to measure discrepancies between heterogeneity variance (τ²) estimates on an appropriate scale that would maximise the generalisability of the results and be intuitively interpretable. The most obvious option is to present the raw differences of τ² estimates, but the scale of these differences is too dependent on the average τ² estimate (as shown in appendix A.1). Therefore, I transformed τ² estimates to the scale of the I² statistic and present their raw differences (see equation 1.6 in the introduction chapter for the definition of I²). I consider this a transformation because all parameter estimates other than the heterogeneity variance estimate τ̂² remain fixed between methods, so differences in I² statistics reflect only differences in values of τ̂².
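A sketch of the τ² to I² transformation follows. Equation 1.6 is not reproduced in this section, so the code assumes the Higgins-Thompson form I² = τ̂²/(τ̂² + s²), where s² = (k − 1)Σwᵢ/((Σwᵢ)² − Σwᵢ²) is a "typical" within-study variance with wᵢ = 1/vᵢ. Under this form the DL estimate maps exactly to the familiar I² = (Q − (k − 1))/Q, and because s² depends only on the within-study variances, it is shared by all estimators. Function names are hypothetical.

```python
def typical_within_variance(v):
    """Higgins-Thompson 'typical' within-study variance s^2 (assumed form)."""
    k = len(v)
    w = [1.0 / vi for vi in v]
    sw = sum(w)
    return (k - 1) * sw / (sw ** 2 - sum(wi ** 2 for wi in w))


def tau2_to_i2(tau2, v):
    """Map a heterogeneity variance estimate onto the I^2 scale."""
    s2 = typical_within_variance(v)
    return tau2 / (tau2 + s2)


def i2_discrepancy(tau2_a, tau2_b, v):
    """Difference between two estimators on the I^2 scale.

    The within-study variances v are shared, so only tau2 differs.
    """
    return tau2_to_i2(tau2_a, v) - tau2_to_i2(tau2_b, v)
```

With equal within-study variances s² reduces to the common variance, so τ̂² = 0.9 with vᵢ = 0.1 maps to I² = 0.9.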

The summary effect and its standard error depend on the scale of measurement. Therefore, I multiplied standardised mean differences and their standard errors from each continuous meta-analysis by 1.81 to obtain results that are approximately comparable to log odds ratios [15]. The I² statistic and p-values for the summary effect are independent of the scale of measurement and so do not require a transformation. I carried out separate analyses on continuous and binary outcome meta-analyses, but since I found no difference between the results, I present results with all meta-analyses combined.
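The constant 1.81 matches π/√3 ≈ 1.814, the standard deviation of the standard logistic distribution, which is the usual basis for converting a standardised mean difference to an approximate log odds ratio. The sketch below assumes that this is the constant intended; the names are hypothetical.

```python
import math

# pi / sqrt(3) ~= 1.814, the SD of the standard logistic distribution;
# the text uses the rounded value 1.81.
SMD_TO_LOG_OR = math.pi / math.sqrt(3)


def smd_on_log_or_scale(smd, se):
    """Rescale an SMD and its standard error to an approximate log OR scale."""
    return smd * SMD_TO_LOG_OR, se * SMD_TO_LOG_OR
```

Because the effect and its standard error are multiplied by the same constant, the Wald z-statistic (and hence the p-value) is unchanged, and rescaling every yᵢ and vᵢ scales both τ̂² and the within-study variances by the same factor, leaving I² unchanged. This is why only the summary effect and its standard error need the conversion.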

I compared p-values of the summary effect by tabulating categories of statistical significance. First, p-values were dichotomised at the 5% level to explore agreement at the threshold most commonly applied in practice. Second, p-values were categorised to represent a wider range of levels of statistical significance: p ≤ 0.01, 0.01 < p ≤ 0.05, 0.05 < p ≤ 0.1 and p > 0.1. I considered p-values that differ by at least two categories on this finer scale to be sufficiently different to change inference. I recognise the limitations of using statistical significance to draw inferences [110, 117], but also appreciate its widespread use.
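The categorisation and the two agreement counts can be sketched as follows, assuming paired p-values from two methods for the same meta-analyses. Treating p = 0.05 as significant in the dichotomy is an assumption chosen to match the category bounds above; the names are hypothetical.

```python
from collections import Counter

CUTS = (0.01, 0.05, 0.1)  # upper bounds of the first three categories


def p_category(p):
    """0: p <= 0.01, 1: 0.01 < p <= 0.05, 2: 0.05 < p <= 0.1, 3: p > 0.1."""
    for i, cut in enumerate(CUTS):
        if p <= cut:
            return i
    return len(CUTS)


def compare_p_values(p_a, p_b, alpha=0.05):
    """Cross-tabulate categories and count disagreements between two methods."""
    pairs = list(zip(p_a, p_b))
    table = Counter((p_category(a), p_category(b)) for a, b in pairs)
    # Disagreements on the dichotomy at alpha.
    flips_at_alpha = sum((a <= alpha) != (b <= alpha) for a, b in pairs)
    # Differences of at least two categories on the finer scale.
    shifts_2plus = sum(abs(p_category(a) - p_category(b)) >= 2 for a, b in pairs)
    return table, flips_at_alpha, shifts_2plus
```

For example, p-values of 0.04 and 0.06 disagree at the 5% level but fall in adjacent categories, so they count towards the first measure and not the second.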

In a secondary analysis, I explored whether the level of agreement between heterogeneity variance estimates can be explained by two meta-analysis characteristics: the number of studies (k) and the total information (V). Hardy and Thompson [37] define the total information as $V = \sum_{i=1}^{k} \hat{w}_i$, which takes into account the number and sizes of studies. Hardy and Thompson [37] found using simulations that the power to detect heterogeneity (using the Q-statistic from section 1.7.1) depends on these characteristics, so I explored whether they also affect the level of agreement between heterogeneity variance estimators. I illustrate their effects using the same plots of pair-wise agreement as for the main analysis, but with the number of studies and total information on the x-axes.
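Computing the two covariates is straightforward; the sketch below assumes that ŵᵢ are the fixed-effect inverse-variance weights 1/vᵢ and shows one simple way to summarise discrepancies by k (a median per value of k). The resulting (k, discrepancy) or (V, discrepancy) pairs would then replace the method average on the x-axis of the agreement plots. Function names are hypothetical.

```python
import statistics
from collections import defaultdict


def total_information(v):
    """Total information V: sum of inverse-variance weights w_i = 1 / v_i (assumed)."""
    return sum(1.0 / vi for vi in v)


def median_discrepancy_by_k(variance_lists, discrepancies):
    """Median estimator discrepancy for each number of studies k.

    variance_lists: one list of within-study variances per meta-analysis.
    discrepancies: matching discrepancy between two estimators (e.g. on the I^2 scale).
    """
    groups = defaultdict(list)
    for v, d in zip(variance_lists, discrepancies):
        groups[len(v)].append(d)
    return {k: statistics.median(ds) for k, ds in sorted(groups.items())}
```

Because V grows with both the number of studies and their precision, two meta-analyses with the same k can differ widely in V, which is why both characteristics are examined.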
