Multiple prediction models can be aggregated into a meta-model. The different methods for doing so are presented and evaluated here. These methods aggregate models built in different sample populations and combine their evidence in an attempt to create a more robust model. This avoids discarding existing models in favour of a new model based exclusively on the available dataset. These methods can improve model prediction: “Madigan and Raftery note that averaging over all the models in this fashion provides better average predictive ability than using any single model” [151].
Currently, there has not been a large volume of meta-modelling methods reported. This is because of an emphasis on creating new models rather than aggregating existing models. However, there has been an advance in methods to aggregate models in recent years. These methods are designed to either use the models and their standard errors in the model building dataset or use an external dataset to aggregate the models.
The next section of the review will present the methods to aggregate multiple prediction models.
8.9.1 Model Averaging
Model averaging, presented by Debray et al. (2014) [148], combines multiple models based on their calibration in an external dataset. The models are weighted, with better calibrated models assigned a larger weight in the final model.
To apply this method, the original models are first recalibrated in the external dataset, using the method presented in Section 8.5.2. This ensures that models that are poorly calibrated, potentially because they are being applied in a different sample population, are not nullified by being assigned a low weight in the meta-model. Next, the models are applied to the external dataset so that a risk, $p_{im}$, can be generated for each patient $i$ from every model $m$ ($1, \ldots, M$). The final risk for each individual is the weighted sum of the individual model predictions:

\[ \bar{p}_i = \sum_{m=1}^{M} w_m p_{im}, \tag{8.9} \]

where

\[ \sum_{m=1}^{M} w_m = 1. \tag{8.10} \]
The weights for each model ($w_1, \ldots, w_M$) now need to be determined. The simplest solution assigns each model an equal weight ($1/M$). However, this does not reward stronger performing models. Debray et al. [148] proposed assigning a weight to each model as follows:
\[ w_m = \frac{e^{-0.5\,BIC_m}}{\sum_{l=1}^{M} e^{-0.5\,BIC_l}}, \tag{8.11} \]

where

\[ BIC_m = -2 l_m + u_m \ln(N). \tag{8.12} \]
Here, $N$ is the number of patients in the model aggregation IPD and $u_m$ the number of estimated parameters used to recalibrate model $m$. Since all the models will be recalibrated using an intercept and a scaling parameter, $u_m$ will be fixed at 2. However, if models require more extensive updating then this is reflected in a higher $u_m$ as a penalty [148]. Then, for each model, the log-likelihood, $l_m$, is determined as follows:

\[ l_m = \sum_{i=1}^{N} \left[ y_i \ln(p_{im}) + (1 - y_i) \ln(1 - p_{im}) \right], \tag{8.13} \]

where

\[ y_i = \begin{cases} 0 & \text{if disease not present} \\ 1 & \text{if disease present} \end{cases} \tag{8.14} \]
In a hypothetical scenario, a model with perfect calibration would assign a probability of 0 to all disease-free individuals and 1 to all diseased individuals. This would result in a log-likelihood of zero, its maximum value. As model accuracy falls from this perfect scenario, the log-likelihood becomes more negative, the BIC increases and, as a result, the model receives a lower weight in the final meta-model.
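The weighting scheme in equations 8.9 to 8.13 can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation used in [148]: the function names and the toy outcomes and risks are hypothetical, the predicted risks are assumed to lie strictly between 0 and 1 and to come from models already recalibrated in the external dataset, and subtracting the smallest BIC before exponentiating is only a numerical convenience that leaves the weights of equation 8.11 unchanged.

```python
import math

def log_likelihood(y, p):
    """Binomial log-likelihood of outcomes y (0/1) given predicted risks p (eq. 8.13)."""
    return sum(yi * math.log(pi) + (1 - yi) * math.log(1 - pi) for yi, pi in zip(y, p))

def bic(y, p, u=2):
    """BIC of one recalibrated model (eq. 8.12); u = number of recalibration parameters."""
    return -2.0 * log_likelihood(y, p) + u * math.log(len(y))

def bic_weights(bics):
    """Model-averaging weights (eq. 8.11), shifted by the smallest BIC for stability."""
    best = min(bics)
    raw = [math.exp(-0.5 * (b - best)) for b in bics]
    total = sum(raw)
    return [r / total for r in raw]

def averaged_risk(risks, weights):
    """Weighted sum of the model predictions for one patient (eq. 8.9)."""
    return sum(w * p for w, p in zip(weights, risks))

# Hypothetical toy data: 6 patients and 2 recalibrated models.
y = [0, 0, 0, 1, 1, 1]
model_a = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]   # separates cases from non-cases well
model_b = [0.4, 0.5, 0.4, 0.5, 0.6, 0.5]   # close to uninformative

weights = bic_weights([bic(y, model_a), bic(y, model_b)])
print("weights:", weights)
print("averaged risk, patient 1:", averaged_risk([model_a[0], model_b[0]], weights))
```

In this toy example the first model has the higher log-likelihood and therefore the lower BIC, so it receives most of the weight. This also illustrates how quickly the exponential in equation 8.11 concentrates weight on the leading model.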
This method is advantageous because it rewards better calibrated models by assigning them a higher weight, which could create a more robust meta-model. The method is straightforward to apply and applicable to a diverse range of model forms. It has also produced some favourable results. One study found that the meta-model had improved calibration and discriminative ability [141]. This was supported by a further study in which the aggregated meta-model outperformed the existing models [150]. Another study found that, although the improvement from model averaging was minor, it was consistent across the original models when validated [149].

However, there are some disadvantages to model averaging. First, updating the models before aggregating them could create a meta-model that is successful in the sample population but performs poorly if the target population differs from the sample population. Secondly, to use the final meta-model, the original models must each be run separately to generate a participant’s risk and the results then combined. The meta-model can therefore be large, complex and cumbersome, and health professionals and the public may be reluctant to use it [148]. This was highlighted in a study in which the final meta-model was too complicated to present [151], although this can be averted by creating an easy-to-use model interface. Thirdly, the method assigns extreme weights to some models, so the final model is heavily determined by a few models rather than by all the models considered. This is a consequence of the method being similar to Bayesian model selection, which assumes “only one of the models is correct” [148]. By effectively rejecting some of the models, it does not utilise the full wealth of evidence available. Finally, the weightings are based solely upon model calibration. Any improvement in the discriminative ability of the final model is therefore indirect: models with a poorer discriminative ability are minimised in the final model only if they also have a poor calibration. Additionally, other studies noted that meta-model approaches are a relatively new area of research and need to be applied to more clinical prediction models before they can be recommended [141].
8.9.2 Stacked Regression
The next method, developed by Debray et al. [148], called stacked regression, is “a method for forming linear combinations of different predictors to give improved prediction accuracy” [152]. The method aggregates the models’ linear predictors based on their calibration performance in an external dataset by minimising a negative log-likelihood function.
This method does not recalibrate the models using the external dataset before aggregating the models.
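The model weights $\alpha_0, \ldots, \alpha_M$ are estimated by maximum likelihood, which is equivalent to minimising the following negative log-likelihood over the $M + 1$ unknown parameters in the dataset with $N$ participants:

\[ \hat{\alpha} = \arg\min_{\alpha_0, \ldots, \alpha_M} \sum_{i=1}^{N} \left[ y_i \ln\left(1 + e^{-\alpha_0 - \sum_{m=1}^{M} \alpha_m LP_{im}}\right) + (1 - y_i) \ln\left(1 + e^{\alpha_0 + \sum_{m=1}^{M} \alpha_m LP_{im}}\right) \right] \tag{8.15} \]

with the constraint

\[ \alpha_m \geq 0. \tag{8.16} \]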
Here $y_i$ is a binary variable which takes the value 0 if the disease is not present in individual $i$ and 1 if it is present. $LP_{im}$ is the linear predictor of model $m$ ($1, \ldots, M$) for individual $i$, and $\alpha_0$ is the intercept parameter for the “optimal baseline risk for the validation study” [148]. The constraint that the $\alpha_m$ values are non-negative guards against collinearity in the meta-model. It inhibits the inclusion of two similar models, such as a model and a recalibrated version of that model, which would otherwise negate each other.
The coefficients of the variables in the final model are then determined from the estimated $\hat{\alpha}$ values as follows:

\[ \text{Intercept: } \hat{\beta}_0 = \hat{\alpha}_0 + \sum_{m=1}^{M} \hat{\alpha}_m LP_{0m} \tag{8.17} \]

\[ \text{Parameters: } \hat{\beta}_i = \sum_{m=1}^{M} \hat{\alpha}_m LP_{im} \tag{8.18} \]
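The variables $i = 1, \ldots, K$ are the distinct variables across the original models; if a variable is not present in a specific model, that model contributes zero to the corresponding coefficient so that it does not influence the final aggregated model. The final logit model for an individual $j$, with predicted risk $p_j$ and variable values $x_{ji}$, can then be calculated as follows:

\[ \operatorname{logit}(p_j) = \hat{\beta}_0 + \sum_{i=1}^{K} \hat{\beta}_i x_{ji} \tag{8.19} \]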
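The estimation step and the collapse into a single set of coefficients can be sketched as follows. This is an illustrative sketch under stated assumptions, not the procedure used in [148]: projected gradient descent (a gradient step followed by clipping the model weights at zero) is swapped in here as one simple way to respect the non-negativity constraint, the step size and iteration count are arbitrary, and all function names, linear predictors, outcomes and coefficients are hypothetical.

```python
import math

def sigmoid(z):
    """Numerically stable inverse logit."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def softplus(z):
    """Numerically stable ln(1 + e^z)."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))

def neg_log_lik(alpha, lps, y):
    """Objective of eq. 8.15; alpha[0] is the intercept, alpha[1:] the model weights."""
    total = 0.0
    for row, yi in zip(lps, y):
        eta = alpha[0] + sum(a * lp for a, lp in zip(alpha[1:], row))
        total += yi * softplus(-eta) + (1 - yi) * softplus(eta)
    return total

def fit_stacked(lps, y, step=0.1, n_iter=5000):
    """Projected gradient descent: gradient step, then clip alpha_m at zero (eq. 8.16)."""
    n, m = len(y), len(lps[0])
    alpha = [0.0] * (m + 1)
    for _ in range(n_iter):
        grad = [0.0] * (m + 1)
        for row, yi in zip(lps, y):
            eta = alpha[0] + sum(a * lp for a, lp in zip(alpha[1:], row))
            resid = sigmoid(eta) - yi            # derivative of the loss w.r.t. eta
            grad[0] += resid
            for k, lp in enumerate(row):
                grad[k + 1] += resid * lp
        alpha = [a - step * g / n for a, g in zip(alpha, grad)]
        alpha[1:] = [max(0.0, a) for a in alpha[1:]]   # the intercept stays unconstrained
    return alpha

def combine_coefficients(alpha, intercepts, coefs):
    """Eqs. 8.17 and 8.18; coefs is a list of {variable: coefficient} dicts, one per
    model, and a variable missing from a model contributes zero."""
    beta0 = alpha[0] + sum(a * b0 for a, b0 in zip(alpha[1:], intercepts))
    names = sorted({v for c in coefs for v in c})
    betas = {v: sum(a * c.get(v, 0.0) for a, c in zip(alpha[1:], coefs)) for v in names}
    return beta0, betas

# Hypothetical linear predictors of 2 models for 8 participants, with outcomes.
lps = [[-2.0, 0.5], [-1.5, -0.3], [-1.0, 0.8], [-0.5, -0.6],
       [0.0, 0.4], [0.5, -0.2], [1.0, 0.1], [1.5, -0.5]]
y = [0, 0, 1, 0, 0, 1, 1, 1]
alpha_hat = fit_stacked(lps, y)
print("alpha:", alpha_hat)
print(combine_coefficients(alpha_hat, [-1.0, -2.0],
                           [{"age": 0.02}, {"age": 0.04, "smoker": 1.0}]))
```

A fixed step size is adequate for small, well-scaled linear predictors such as these; a general-purpose implementation would more likely rely on an established bound-constrained optimiser.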
There are advantages to the stacked regressions approach for aggregating multiple prediction models. Firstly, this method creates a single final formula that is neater and simpler to apply than running multiple distinct models and weighting their outputs. Additionally, the method does not assign an extreme weight to individual models in the final meta-model, so it is a fairer combination of their respective evidence. In one study, the method produced a more robust model than the original models [141]. This was supported by other independent studies, which found that stacked regressions decreased calibration error rates and yielded improved predictive performance [152, 153].
However, there could be issues when a large number of variables in the model are designed to measure the same exposure. This can be seen in lung cancer prediction models, where smoking history (e.g. pack years, CPD, smoking duration, quit duration) or family history of cancer (e.g. any cancer, lung cancer, present in more than 2 family members) is measured in different ways, and these different variables all need to be incorporated separately in the meta-model. Additionally, the method can be difficult to apply and is more restrictive as to which original models can be incorporated into the meta-model. Different model designs will not have a similar linear predictor or a single coefficient for each variable; some may have cubic terms, splines, or conditional coefficients. For example, the contribution, $i$, of a single variable, age, could be incorporated into two distinct models, $X$ and $Y$, as follows:
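\[ i_x = \alpha \times \text{Age} \tag{8.20} \]

\[ i_y = (\beta_0 \times \text{Age}) + (\beta_1 \times \text{Age}) \tag{8.21} \]

where

\[ \beta_0 = \begin{cases} x, & \text{if age} < 65 \\ 0, & \text{if age} \geq 65 \end{cases} \tag{8.22} \]

and

\[ \beta_1 = \begin{cases} 0, & \text{if age} < 65 \\ y, & \text{if age} \geq 65 \end{cases} \tag{8.23} \]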
These two terms could not be combined because of the different model forms. The systematic review (Section 3) supports these concerns, as distinct model forms were identified. Additionally, some models include scaling factors in the linear predictor; for lung cancer these take the form of age, gender, and smoking status specific incidence rates. Therefore, combining the models using stacked regressions is problematic. In the original article, Debray et al. [148] assume that there is a set of core variables across all the models and that the models have the same form. However, this is not commonly the case, and many distinct models have been designed to predict the same outcome. One study highlighted this concern, noting that the number of models averaged is generally low because of difficulties combining distinct clinical models [152]. A solution is to restrict the pool to models with a similar form. However, this could exclude successful models and their evidence from the final meta-model. Finally, not recalibrating the original models may result in some successful models being unfairly assigned a lower scaling factor if applied in a different sample population. It has been argued that not recalibrating the original models means the meta-model borrows less information from the model aggregation dataset in comparison to model averaging [150]. This is a relatively new field and more research is required before stacked regressions can be recommended [141].
8.9.3 Bayesian Model Averaging
The next approach to aggregate models is Bayesian Model Averaging (BMA), a similar method to the weighted averaging method (Section 8.9.1) [148].
Bayesian model selection methods can be used to identify the strongest model, often from a series of nested models, in an external dataset. However, BMA combines multiple models into a meta-model [154]. The methodology for BMA has been available for a considerable time [155].
The models are weighted based on their calibration in an external dataset, but the method is applied without model recalibration. This may result in successful models being unnecessarily assigned a low final weighting if the dataset is different from the model building dataset. Each model is assigned a weight based on its calibration in comparison to the other models, and the final meta-model is the weighted sum of the predictions from the different models. For example, the aggregated probability for a patient, $j$, using models $1, \ldots, M$ is expressed as:

\[ Y_j = \sum_{i=1}^{M} w_i \hat{Y}_{ij} \tag{8.24} \]

where $\hat{Y}_{ij}$ is the risk from model $i$ for person $j$ (8.25)
The weights for BMA are determined by calculating the posterior model probability, which is expressed as $Pr(M_j \mid \text{Data})$.
This approach then compares pairs of models using the Bayes factor, which is the ratio of the posterior odds to the prior odds. The following equation is used to assess the ‘evidence’ for preferring Model 2 over Model 1:

\[ \text{Bayes Factor: } B_n = \frac{Pr(M_2 \mid \text{Data})}{Pr(M_1 \mid \text{Data})} \div \frac{Pr(M_2)}{Pr(M_1)} \tag{8.26} \]
Setting all prior probabilities as equal, $Pr(M_1) = Pr(M_2) = \cdots = Pr(M_M)$, means that there is no initial bias towards assigning a higher weight to any model. This is appropriate in most instances, where there is no information to determine whether a model should be assigned a higher weight. As a result, the Bayes factor reduces to the posterior odds between the models.
However, a second version of BMA could be applied which introduces a bias towards models based on their discriminative ability. This allows models that are successful at identifying individuals with a disease to be rewarded more highly. Models with a stronger AUC performance can be assigned a higher weight by assigning each model a prior probability as follows:
\[ Pr(M_j) = \frac{Pr(M_j)}{\sum_{i=1}^{K} Pr(M_i)} \tag{8.27} \]

where

\[ \sum_{i=1}^{K} Pr(M_i) = 1 \tag{8.28} \]
The posterior odds can be calculated based on the model calibration, and the final model weights can then be determined:

\[ w_j = Pr(M_j \mid Y_n = y_n) = \frac{m_j \times Pr(M_j)}{\sum_{i=1}^{K} m_i \times Pr(M_i)} \tag{8.29} \]

where

\[ \sum_{j=1}^{K} w_j = 1 \tag{8.30} \]
Here $Pr(M_j)$ is either equal for all models, when considering a non-informative prior, or calculated using equation 8.27 to incorporate the discriminative ability into the weights as an additional weighting. To calculate the weightings, $m_j$ needs to be calculated as follows:
\[ m_j = \int L_j(\theta_j) \, p_j(\theta_j) \, d\theta_j \tag{8.31} \]
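This integral can be approximated, and was presented as follows [155]:

\[ \log(m_j) = \log(L_j(\theta_j)) - \frac{d_j}{2} \log(n) \tag{8.32} \]

where the log-likelihood, $l_j = \log(L_j(\theta_j))$, is

\[ l_j = \sum_{i=1}^{n} \left[ y_i \ln(p_{ij}) + (1 - y_i) \ln(1 - p_{ij}) \right], \tag{8.33} \]

and

\[ y_i = \begin{cases} 0 & \text{if disease not present} \\ 1 & \text{if disease present} \end{cases} \tag{8.34} \]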
This is the log-likelihood minus a penalty of half the dimension of the model, $d_j$, multiplied by $\log(n)$, where $n$ is the number of participants in the dataset. The individual model weights can then be determined.
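The calculation of the BMA weights from equations 8.29 and 8.32 can be sketched as below. This is a hedged illustration: the function names, log-likelihoods and AUC values are hypothetical, and normalising the AUCs so that they sum to one is only one assumed way of turning discriminative ability into prior model probabilities. The marginal likelihoods are rescaled by the largest value before exponentiating, which cancels in the ratio and avoids numerical underflow.

```python
import math

def log_marginal(log_lik, d, n):
    """Approximate log marginal likelihood of one model (eq. 8.32)."""
    return log_lik - 0.5 * d * math.log(n)

def priors_from_scores(scores):
    """Turn non-negative scores (here AUCs, an assumption) into priors that sum to 1."""
    total = sum(scores)
    return [s / total for s in scores]

def bma_weights(log_margs, priors=None):
    """Posterior model weights (eq. 8.29); equal priors are used if none are given."""
    k = len(log_margs)
    if priors is None:
        priors = [1.0 / k] * k
    top = max(log_margs)                      # rescaling cancels in the ratio
    unnorm = [math.exp(lm - top) * pr for lm, pr in zip(log_margs, priors)]
    total = sum(unnorm)
    return [u / total for u in unnorm]

# Hypothetical log-likelihoods of 3 models in a dataset of n = 500 participants,
# each model with dimension d = 2.
log_margs = [log_marginal(ll, d=2, n=500) for ll in (-310.2, -312.0, -318.5)]

flat = bma_weights(log_margs)                                   # non-informative prior
informed = bma_weights(log_margs, priors_from_scores([0.70, 0.78, 0.65]))
print("equal priors:    ", flat)
print("AUC-based priors:", informed)
```

With equal priors the weights depend only on the penalised log-likelihoods, so the best calibrated model dominates. Under the AUC-based priors assumed here, some weight shifts towards the second model, which has the highest AUC.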
This method is easy to apply and allows models of different forms to be included in the meta-model. Additionally, the discriminative ability of the models can be incorporated into the meta-model through an additional weighting, which may allow a more robust selective screening tool to be devised. It may prevent models with a good discriminative ability but poor calibration from being harshly penalised in the meta-model. When previously implemented, BMA has outperformed the original models in terms of both calibration and discriminative ability. While the benefit is typically minor, it has been shown to be consistent [149]. An independent study concluded that BMA improved model predictive performance and that, when prior information is available, it should be included to improve predictive performance further [151]. Finally, BMA was argued to provide a coherent approach for incorporating uncertainty due to variable selection and model form [157].
There are some concerns with BMA. The final meta-model formula can be complex, as each model is run independently, which discourages public application [151]. Bayesian model selection operates under the assumption that one model is the correct model; under this assumption BMA will assign a high weight to one or two leading models and minimise the remaining models. This may not effectively combine the evidence across the models. Additionally, the calculation is more challenging with this approach, as “the number of terms can be enormous rendering exhaustive summation infeasible and the integrals implicit can in general be hard to compute” [151]. Finally, BMA does not initially recalibrate the models, which could penalise otherwise successful models that have a poor calibration in a differing external dataset. Some studies argued that, while BMA marginally improved upon the original models, there was no leading method to aggregate the models [149].
8.9.4 Univariate Meta-Analysis
The next method is univariate meta-analysis, published by Debray et al. [158]. To apply this method, the regression coefficients and standard errors of each model are required, although an external dataset is not. Univariate meta-analysis treats the variable coefficients as sample statistics with sample standard deviations.
A weighted least squares approach is used to combine the variable coefficients [158], with weights as follows:

\[ w_{ij} = \frac{1}{\sigma_{ij}^2 + \tau_j^2} \tag{8.35} \]
“Where $\tau_j^2$ is the between-study variance of $\beta_j$” [158], $\beta_j$ is the coefficient of the $j$th variable, and $\sigma_{ij}$ is its standard error in the $i$th model. These weights are then applied to the model-specific variable coefficients assuming a random effects model. Each variable is calculated separately to devise the final meta-model.
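For a single variable, the pooling step can be sketched as follows. This is an illustration under assumptions rather than the procedure of [158]: the pooled coefficient is taken to be the usual inverse-variance weighted mean using the weights of equation 8.35, the between-study variance is estimated with the DerSimonian-Laird moment estimator (a common choice that is swapped in here because the text does not state how it is obtained), and the function names, coefficients and standard errors are hypothetical.

```python
import math

def dersimonian_laird_tau2(betas, ses):
    """Moment estimate of the between-study variance for one variable (assumed estimator)."""
    w = [1.0 / se ** 2 for se in ses]                 # fixed-effect weights
    sw = sum(w)
    fixed = sum(wi * b for wi, b in zip(w, betas)) / sw
    q = sum(wi * (b - fixed) ** 2 for wi, b in zip(w, betas))   # Cochran's Q
    c = sw - sum(wi ** 2 for wi in w) / sw
    if c <= 0:                                        # e.g. only one model available
        return 0.0
    return max(0.0, (q - (len(betas) - 1)) / c)

def pool_coefficient(betas, ses, tau2):
    """Random-effects pooled coefficient and its standard error, using eq. 8.35 weights."""
    w = [1.0 / (se ** 2 + tau2) for se in ses]
    sw = sum(w)
    pooled = sum(wi * b for wi, b in zip(w, betas)) / sw
    return pooled, math.sqrt(1.0 / sw)

# Hypothetical coefficients for one variable (e.g. age) reported by 3 published models.
betas = [0.030, 0.045, 0.025]
ses = [0.005, 0.010, 0.004]

tau2 = dersimonian_laird_tau2(betas, ses)
pooled, se = pool_coefficient(betas, ses, tau2)
print("tau^2:", tau2)
print("pooled coefficient:", pooled, "standard error:", se)
```

The same two functions would be applied to each variable in turn, which matches the description of each variable being calculated separately.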
The method expects the models to have a large overlap of variables and “that identical model formulations are available for the published prediction models” [158]. This allows the variable coefficients to be combined in the meta-model [158]. As identified in the systematic review (Section 3), distinct prediction models often incorporate a wide range of variables, including different measures for the same exposure (smoking history and family history of cancer), and the model form differs. This could limit the practicality of univariate meta-analysis, as models may have to be removed to satisfy these conditions. This may reduce the pool of applicable models that can be combined and unnecessarily exclude successful models. One study found that implementation can be difficult when the literature models differ greatly in terms of included variables [150]. Additionally, the method does not utilise evidence from external datasets. The weighting is based on the strength of the coefficients in the model building datasets. However, these datasets can have unusual designs, such as sample populations at a high or low risk of developing a disease. With this approach, models are not penalised if they are unsuccessful in new populations. When this approach has been used, the meta-model did not show an improved performance in comparison to the original models [150].
In summary, while univariate meta-analysis may create a presentable final model, there are concerns with this method. The models require the same form and a core set of variables. Models with different forms will be unnecessarily excluded, reducing the pool of evidence included in the meta-model. The evidence is further reduced by not using an external dataset on which to base the meta-model; such a dataset could also identify poorer or stronger performing models that should be penalised or enhanced in the final model respectively.
8.9.5 Multivariate Meta-Analysis
The next method presented by Debray et al. [158] is multivariate meta-analysis. This combines multiple models by estimating the within prediction model variance, to capture correlation between variables within the prediction model, and the between-study covariance, to capture the heterogeneity between the studies [158]. To apply this method the models require the same model form with a core set of variables to combine