LA TRAMPA DE LAS 4P

Different regression techniques were explored to develop mapping functions from two widely used cancer measures, the EORTC QLQ-C30 and the FACT-G, to the EQ-5D. In addition to methods such as OLS, which are widely used in the literature, newer methods that take into account the characteristics typically seen in the distribution of the EQ-5D, the LDVMMs, were also applied for the FACT-G.

Response mapping gave the best predictions for the combined EORTC QLQ-C30 data sets. This model used all dimension scores, age and gender to estimate the EQ-5D index. Compared with the other models fitted to this data set, it was best at predicting the overall MAE and the mean and MAE per health status group. The mapping function is based on pooled data from three data sets, which was necessary to give a large enough sample to produce more reliable and representative mapping estimates. The data were from three different types of cancer and, therefore, could be argued to be more representative for use in other populations of mixed cancer types than other published mapping models. We also explored a range of models not previously examined in other studies.146,147,197–199,216
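The comparison criteria named above (overall MAE and MAE per health status group) can be illustrated with a minimal sketch. The cut points that define the health status groups, the function names and the sample values are assumptions made for illustration only; they are not the groupings or data used in this chapter.

```python
from collections import defaultdict


def mean_absolute_error(observed, predicted):
    """Average absolute gap between observed and predicted EQ-5D index values."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def severity_group(eq5d_index, cut_points=(0.5, 0.75)):
    """Assign a health status group from the observed index.

    The cut points are hypothetical; substitute the grouping used in the study.
    """
    low, high = cut_points
    if eq5d_index < low:
        return "poor"
    if eq5d_index < high:
        return "moderate"
    return "good"


def mae_by_group(observed, predicted):
    """MAE computed separately within each observed health status group."""
    grouped = defaultdict(lambda: ([], []))
    for o, p in zip(observed, predicted):
        obs_list, pred_list = grouped[severity_group(o)]
        obs_list.append(o)
        pred_list.append(p)
    return {g: mean_absolute_error(obs, pred) for g, (obs, pred) in grouped.items()}


# Made-up observed and predicted index values, for illustration only.
observed = [0.20, 0.40, 0.60, 0.70, 0.80, 1.00]
predicted = [0.35, 0.45, 0.55, 0.75, 0.80, 0.90]

overall_mae = mean_absolute_error(observed, predicted)
group_mae = mae_by_group(observed, predicted)
```

Reporting the error by group as well as overall shows whether a model that looks accurate on average is predicting poorly for patients in the worst health states.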

Only one previous study had mapped from FACT-G to EQ-5D and the mapping estimates were not reliable.105 At this stage we do not know whether our findings are generalisable to other studies. Given the small number of patients in the most severe levels of HRQL in the FACT-G data set, the generalisability of those estimates is likely to be limited when compared with other populations containing patients at the severe end of the HRQL scale. Of the models fitted to the FACT-G data set, OLS and the tobit model using significant items gave the best estimates according to the mean predictions for the overall sample and the subgroups defined according to severity, and these models also performed well in the EORTC QLQ-C30 data sets. The model based on splining gave better median predictions and the response mapping model performed best in terms of shrinkage. Only one LDVMM specification was fitted for the FACT-G data set, which included only dimension-level information, gender and age. This model performed better than the equivalent linear model for the FACT-G and was shown to generate the main characteristics of the original distribution of EQ-5D in the data set. Even though the response mapping model results did not fit the data as well as other techniques, it is the only one, with the exception of the LDVMM, that can generate the features observed in the distribution of EQ-5D data. It does, however, ignore the ordinality of the data, and it is possible that more flexible models for response mapping, such as those presented in Hernández Alava et al.,213 or further developments, will increase the predictive ability of this modelling approach.

When developing mapping functions, the size of sample needed to produce reliable functions should be considered. There are no formal rules for sample sizes in predictive modelling, such as prognostic modelling and mapping, but a rule of thumb is to have at least 20 individuals per independent variable.217 For simple models like OLS, this would mean that a model including four dimensions would require a minimum of 80 individuals and a model including 27 items, each with five levels, would require 2160 observations (4 × 27 × 20). For response mapping models, the required sample size would relate to the smallest response category (usually level 3 for EQ-5D dimensions): to work out the sample size, the number required for the linear model is divided by the proportion of the sample in that category. For example, if this was 3%, a model including four dimensions would need 2667 (80/0.03) observations and a model including 27 items with five levels would need 72,000 (2160/0.03) observations.
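The arithmetic behind these figures can be reproduced with a short sketch. The function names are hypothetical, and the calculation simply encodes the rule of thumb of 20 individuals per independent variable as described in the text; it is not a formal sample size method.

```python
import math
from fractions import Fraction


def linear_model_sample_size(n_predictors, per_predictor=20):
    """Rule of thumb: at least `per_predictor` individuals per independent variable."""
    return n_predictors * per_predictor


def response_mapping_sample_size(n_predictors, smallest_category_share, per_predictor=20):
    """Scale the linear-model figure by the share of the rarest response category.

    Exact fractions are used so that a share such as 0.03 is treated as 3/100
    and the result is not disturbed by floating-point rounding.
    """
    if not 0 < smallest_category_share <= 1:
        raise ValueError("share must be in (0, 1]")
    share = Fraction(smallest_category_share).limit_denominator(10_000)
    return math.ceil(Fraction(linear_model_sample_size(n_predictors, per_predictor)) / share)


# Four dimension scores entered as four predictors.
dimensions_ols = linear_model_sample_size(4)
# 27 items with five levels each give 4 dummy variables per item.
items_ols = linear_model_sample_size(4 * 27)

# Response mapping when only 3% of the sample is in the rarest category.
dimensions_rm = response_mapping_sample_size(4, 0.03)
items_rm = response_mapping_sample_size(4 * 27, 0.03)
```

Under these assumptions the four values are 80, 2160, 2667 and 72,000, matching the figures quoted above.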

To our knowledge, this is the first time that uncertainty has been accounted for in parameter (coefficient) estimates from mapping functions. At this stage we do not know what impact allowing for this uncertainty will have on NICE decisions. Future research needs to build on this and allow for uncertainty in the original EQ-5D estimates as well as in the selection of appropriate models.

Generally, both OLS and tobit models using item-level EORTC QLQ-C30 and FACT-G data gave some of the best model estimates, and for FACT-G produced the best models, while for TPMs, domain-level models gave better predictions. Other studies have fitted CLAD and generalised linear models as mapping functions. Like the tobit model, the CLAD model also deals with the limited nature of the data and produces consistent estimates in the presence of heteroscedasticity and non-normality. Median-based models are not usually used for economic evaluation because, particularly when applied to costs, their predictions, when aggregated, may not accurately reflect the total cost or benefit for the population.218 Therefore, this model was not fitted here. Generalised linear models were not fitted either, as they did not improve model fit over OLS models.

In terms of model selection, mapping studies in the literature report different model fit and model selection criteria, some focusing on model goodness of fit, others on the predictive ability of the model. Models should be selected mainly on their predictive ability, but other considerations may also be taken into account. Even so, there are a number of criteria by which a model can be selected, and different choices can result in alternative models being selected. In this chapter, rather than choosing one performance statistic to select the best model, we have given equal weighting to the overall mean, median, MAE, shrinkage and the mean and MAE per health status group. Further work should be undertaken to examine whether the criteria we have included are the optimal criteria to be used when judging mapping functions. For example, measures such as MAE and RMSE are not often used in other analyses of individual-level data because heterogeneity across individuals is considerable, making these measures very insensitive to model improvements. This is an even greater problem when using dependent variables that span an extremely small range, such as EQ-5D. The ranking method used here does not account for the magnitude of the prediction errors and how close the predictions are to the observed data; further work should be undertaken to incorporate this into selecting the best models.
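An equal-weight ranking of this kind can be sketched as follows. This is an illustrative reconstruction, not the procedure used in this chapter: the function names, the treatment of ties and all of the numbers are assumptions, and every criterion is expressed as an error so that lower is better.

```python
def average_ranks(values):
    """Rank values in ascending order (1 = best); ties share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def rank_models(scores):
    """Sum per-criterion ranks with equal weight; return models from best to worst.

    scores: {model_name: {criterion_name: value}}, lower is better throughout.
    """
    models = list(scores)
    criteria = list(next(iter(scores.values())))
    totals = {m: 0.0 for m in models}
    for criterion in criteria:
        column = [scores[m][criterion] for m in models]
        for model, rank in zip(models, average_ranks(column)):
            totals[model] += rank
    return sorted(totals.items(), key=lambda item: item[1])


# Made-up error values for three hypothetical models, for illustration only.
scores = {
    "ols": {"mean_error": 0.002, "median_error": 0.030, "mae": 0.110},
    "tobit": {"mean_error": 0.004, "median_error": 0.025, "mae": 0.112},
    "response_mapping": {"mean_error": 0.010, "median_error": 0.040, "mae": 0.120},
}
ranking = rank_models(scores)
```

Because only the ordering on each criterion enters the total, a model that is marginally worse on one criterion is penalised as much as one that is far worse, which is the limitation of rank-based selection noted above.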

One of the other methodological factors that should be taken into consideration when carrying out mapping is the sample size used when producing the mapping functions. Response mapping produced poor predictions for FACT-G, although it was the best-fitting model for EORTC QLQ-C30. This was a result of the sample not covering the poorer health states, but it is also a function of sample size. With a larger sample, it would be possible to obtain more accurate predictions for, for example, the 3% of the sample in level 3 of an EQ-5D dimension. Further work is needed on sample size recommendations for the more complex models such as response mapping and LDVMMs. However, given the typically small size of cancer studies, it may be difficult to find studies with large enough samples to carry out the analysis. Combining data sets, as carried out for the EORTC QLQ-C30, offers an alternative when available, and using mapping functions based on simpler techniques, such as OLS, may be the only option when these are not available.

Chapter 4

Developing ‘bolt-on’ items to EQ-5D
