In this section we analyse data from the COPD Study in order to identify relevant relationships between the HRQoL of patients and the clinical and sociodemographic variables listed in Section 1.3.1. It has been argued that regression models based on the beta-binomial distribution are appropriate candidates for analysing PRO data and, in particular, the HRQoL measured by the SF-36 Health Survey (Arostegui et al., 2012). However, as mentioned in the previous section, there are two different approaches for implementing this type of model: the marginal (BBreg) and the conditional (BBhglm). We therefore use both regression approaches to analyse data from the COPD Study. In this way, we not only look for relationships between HRQoL and the covariates, but also check the performance of both methodologies in a real data application.
In terms of statistical software, on the one hand, we have implemented the BBreg approach in the R package PROreg, available on CRAN at https://cran.r-project.org/web/packages/PROreg/index.html. The specific function that fits the BBreg approach is discussed further in Chapter 5.
On the other hand, the BBhglm approach is implemented in the R package hglm (Ronnegard et al., 2010), also available on CRAN.
The eight dimensions of the SF-36 Health Survey were the response variables, and the clinical and sociodemographic variables listed in Table 1.2 were considered as independent variables. A separate model was fitted for each health dimension of the SF-36, using only data from the first visit to the outpatient clinic. For variable selection, we retained in the model those covariates whose influence on HRQoL was statistically significant (p-value < 0.05) in at least one of the modelling approaches. For simplicity, clarity and brevity of exposition, we only show results for three of the eight health dimensions of the SF-36. The three selected dimensions (physical functioning, mental health and role emotional) illustrate different distribution shapes (see Figure 1.4) and a wide range of the maximum number of scores (m), from 4 to 20.
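The selection rule just described can be sketched in a few lines of Python. This is a minimal illustration, not the code used for the analysis: the function names are hypothetical, and treating a categorical covariate as retained whenever any of its levels is significant is our assumption. Purely as example input, we reuse the per-level p-values reported in Table 2.3, entering "<0.001" as 0.001.

```python
# Sketch of the selection rule: keep a covariate if it is statistically
# significant (p < 0.05) in at least one of the two modelling approaches.
# Assumption (ours): a categorical covariate is kept if ANY of its levels passes.

ALPHA = 0.05  # significance threshold stated in the text


def keep_covariate(pvalues_bbhglm, pvalues_bbreg, alpha=ALPHA):
    """Return True if any level is significant under either approach."""
    return any(p < alpha for p in pvalues_bbhglm + pvalues_bbreg)


def select_covariates(results, alpha=ALPHA):
    """results maps covariate name -> (BBhglm p-values, BBreg p-values), one per level."""
    return [name for name, (p_hglm, p_reg) in results.items()
            if keep_covariate(p_hglm, p_reg, alpha)]


# Per-level p-values from Table 2.3 (role emotional); "<0.001" entered as 0.001.
role_emotional = {
    "Anxiety": ([0.003], [0.001]),
    "Dyspnea": ([0.619, 0.434, 0.309], [0.142, 0.001, 0.001]),
}

print(select_covariates(role_emotional))  # ['Anxiety', 'Dyspnea']
```

Dyspnea is retained even though none of its levels is significant under BBhglm, because two levels are significant under BBreg; this is exactly the "at least one approach" criterion.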
Tables 2.1-2.3 report the results of analysing these SF-36 dimensions in the COPD Study with both beta-binomial regression approaches.
For each of the three selected health dimensions of the SF-36 Health Survey, they display the estimates of the regression coefficients, their standard deviations and the associated significance tests under the BBreg and BBhglm approaches. We also report, on the logarithmic scale, the estimate of each approach's dispersion parameter: α for BBhglm and φ for BBreg.
The real data application leads to several conclusions. Regarding the effects of the covariates on the SF-36 dimensions, it can be seen that while both algorithms lead to similar estimates in the physical functioning dimension,
Table 2.1: Effect of explanatory variables in the physical functioning dimension measured by both regression approaches based on the beta-binomial distribution.
                        BBhglm                        BBreg
Physical functioning    β̂        SD(β̂)   p-value     β̂        SD(β̂)   p-value
Dyspnea
  Mild                  -0.616   0.111   <0.001      -0.580   0.112   <0.001
  Moderate              -1.339   0.122   <0.001      -1.281   0.120   <0.001
  Severe                -2.317   0.178   <0.001      -2.207   0.176   <0.001
Depression
  Yes                   -0.541   0.139   <0.001      -0.544   0.130   <0.001
Anxiety
  Yes                   -0.416   0.096   <0.001      -0.404   0.090   <0.001
Sex
  Female                 0.469   0.167    0.005       0.461   0.155    0.003
FEV1%                    0.007   0.003    0.011       0.006   0.002    0.012
BMI                     -0.019   0.007    0.011      -0.018   0.007    0.009
Age                      0.013   0.004    0.002       0.012   0.004    0.002
Walking Test             0.004   10⁻⁴    <0.001       0.004   10⁻⁴    <0.001
log(α)                  -2.656   0.084    −           −        −        −
log(φ)                   −        −       −          -2.826   0.115    −
SD: Standard Deviation; BMI: Body Mass Index; FEV1%: Forced Expiratory Volume in one second in percentile.
in the mental health and, especially, the role emotional dimensions the regression parameter estimates and their statistical significance are completely different. For example, for the role emotional dimension, the estimated coefficient for anxiety is −6.145 in the BBhglm approach and −1.649 in BBreg, both being statistically significant in the model. On the other hand, the estimate for moderate dyspnea is statistically significant in the BBreg approach (p < 0.001), but not in BBhglm (p = 0.434).
Because both methodologies use the logit link function, the regression coefficients β in both approaches are interpreted as log odds-ratios, as in a binomial logistic regression model. For instance, the coefficient of depression in the physical functioning model under the BBreg approach is −0.544, which means that, according to this model, the presence of depression multiplies the odds of having a lower physical functioning score by 1/exp(−0.544) = 1.72.
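The arithmetic behind this interpretation is a one-line transformation of the coefficient. The sketch below (function names are ours and purely illustrative) converts a logit-scale coefficient into the multiplicative change in the odds p/(1 − p) and in its reciprocal, reproducing the factor 1.72 obtained for depression.

```python
import math


def odds_multiplier(beta):
    """Multiplicative change in the odds p / (1 - p) per unit of the covariate."""
    return math.exp(beta)


def reciprocal_odds_multiplier(beta):
    """Multiplicative change in (1 - p) / p, i.e. the odds of a lower score."""
    return 1.0 / math.exp(beta)


# Depression coefficient in the physical functioning model, BBreg column of Table 2.1.
beta_depression = -0.544
print(round(reciprocal_odds_multiplier(beta_depression), 2))  # 1.72

# The same transformation applied to the BBreg dyspnea levels of Table 2.1.
for level, beta in [("Mild", -0.580), ("Moderate", -1.281), ("Severe", -2.207)]:
    print(level, round(reciprocal_odds_multiplier(beta), 2))
```

The two functions are reciprocals of each other, so a negative coefficient can be read either as a decrease in the odds of a higher score or as an increase in the odds of a lower one.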
Table 2.2: Effect of explanatory variables in the mental health dimension measured by both regression approaches based on the beta-binomial distribution.
                        BBhglm                        BBreg
Mental health           β̂        SD(β̂)   p-value     β̂        SD(β̂)   p-value
Dyspnea
  Mild                  -0.353   0.234    0.134∗     -0.294   0.141    0.037
  Moderate              -0.853   0.246   <0.001      -0.704   0.145   <0.001
  Severe                -1.132   0.320   <0.001      -0.961   0.181   <0.001
Anxiety
  Yes                   -1.480   0.204   <0.001      -1.290   0.108   <0.001
Depression
  Yes                   -0.966   0.298    0.002      -0.853   0.157   <0.001
log(α)                  -0.7647  0.069    −           −        −        −
log(φ)                   −        −       −          -2.263   0.115    −
SD: Standard Deviation. The symbol ∗ marks regression coefficients that are not statistically significant.
Table 2.3: Effect of explanatory variables in the role emotional dimension measured by both regression approaches based on the beta-binomial distribution.
                        BBhglm                        BBreg
Role emotional          β̂        SD(β̂)   p-value     β̂        SD(β̂)   p-value
Anxiety
  Yes                   -6.145   2.062    0.003      -1.649   0.226   <0.001
Dyspnea
  Mild                  -2.600   5.229    0.619∗     -0.614   0.418    0.142∗
  Moderate              -3.981   5.080    0.434∗     -1.379   0.413   <0.001
  Severe                -5.603   5.496    0.309∗     -2.048   0.467   <0.001
log(α)                   2.735   0.095    −           −        −        −
log(φ)                   −        −       −           0.668   0.150    −
SD: Standard Deviation. The symbol ∗ marks regression coefficients that are not statistically significant.
Therefore, at first sight, both regression approaches seem to lead to completely different conclusions about the effect of the covariates on the HRQoL of patients with COPD. However, care is required when comparing marginal and conditional models. Although the parameters are interpreted in the same way, they refer to different quantities. The BBreg approach should be interpreted in terms of a marginal response, and hence its conclusions apply at the population level. Indeed, the linear predictor of the BBreg approach is built on the marginal expectation of the outcome variable,
$$\operatorname{logit}(E[Y_i]) = \operatorname{logit}(p^m_i) = x_i'\beta.$$
In contrast, the linear predictor of the BBhglm approach is built on the conditional mean,
$$\operatorname{logit}(E[Y_i \mid u_i]) = \operatorname{logit}(p^c_i) = x_i'\beta + v_i.$$
Here $p^m_i$ and $p^c_i$ denote the marginal and conditional means, respectively, with $E[p^c_i] = p^m_i$. The conditional BBhglm approach therefore describes individual responses, and its parameters are interpreted holding the random effect fixed at the particular value corresponding to each individual.
Because the logit and the expectation operator do not commute, that is, $E[\operatorname{logit}(p^c_i)] \neq \operatorname{logit}(E[p^c_i]) = \operatorname{logit}(p^m_i)$, each approach models a different quantity, and hence the two cannot be compared directly.
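A small numerical check makes the non-commutation concrete. As a simplifying assumption, the sketch treats the conditional mean $p^c_i$ of one patient as a Beta(a, b) random variable with hypothetical integer parameters a = 2, b = 5, so that E[p] = a/(a + b) plays the role of $p^m_i$. For integer parameters, E[logit(p)] has the closed form ψ(a) − ψ(b) = H_{a−1} − H_{b−1} (ψ the digamma function, H_n the harmonic numbers), which the code also confirms by midpoint-rule integration.

```python
import math
from fractions import Fraction


def logit(p):
    return math.log(p / (1.0 - p))


def harmonic(n):
    """H_n = 1 + 1/2 + ... + 1/n, with H_0 = 0."""
    return sum(Fraction(1, k) for k in range(1, n + 1))


def mean_of_logit_beta(a, b):
    """E[logit(p)] for p ~ Beta(a, b), integer a, b: psi(a) - psi(b) = H_{a-1} - H_{b-1}."""
    return float(harmonic(a - 1) - harmonic(b - 1))


def logit_of_mean_beta(a, b):
    """logit(E[p]) for p ~ Beta(a, b), using E[p] = a / (a + b)."""
    return logit(a / (a + b))


def beta_pdf(p, a, b):
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(p) + (b - 1) * math.log(1.0 - p))


def midpoint_expectation(f, a, b, n=100_000):
    """Numerical E[f(p)] for p ~ Beta(a, b) via the midpoint rule on (0, 1)."""
    h = 1.0 / n
    return h * sum(f((i + 0.5) * h) * beta_pdf((i + 0.5) * h, a, b) for i in range(n))


a, b = 2, 5  # hypothetical parameters, for illustration only
print(mean_of_logit_beta(a, b))            # about -1.083 (exactly -13/12)
print(logit_of_mean_beta(a, b))            # about -0.916 (exactly log(0.4))
print(midpoint_expectation(logit, a, b))   # numerical check of the first value
```

The two quantities differ by about 0.17 on the logit scale even in this mild example, which is the gap between a conditional and a marginal linear predictor.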
However, some features of the real data application still need explanation. First, as mentioned before, the model definitions imply that regression coefficient estimates from marginal and conditional models may differ; yet the observed differences seem larger than expected. Moreover, if a covariate has no effect on individuals, it has no effect on the population either; the real data application, however, does not reflect this. In Table 2.3, for example, the effects of moderate and severe dyspnea are statistically significant in the marginal approach but not in the conditional approach (the same happens for mild dyspnea in Table 2.2), which is inconsistent with the previous statement. Furthermore, the standard deviations of the estimates differ greatly between the two approaches, which suggests that one of the models may be over- or underestimating the variances.
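The expected direction of the difference between conditional and marginal coefficients can be illustrated numerically. The sketch below is a toy under explicit assumptions, not the BBhglm estimation procedure: it takes hypothetical conditional coefficients β0 = 0 and β1 = −2 for a binary covariate, generates the random effect as v = logit(u) with u ~ Beta(k, k) (a parameterization chosen only for illustration, not necessarily the one used by hglm), averages the conditional probability over u with the midpoint rule, and reports the implied marginal log odds-ratio. For every k the marginal effect is smaller in magnitude than β1, and the gap grows as k decreases, that is, as the random effect becomes more dispersed.

```python
import math


def logit(p):
    return math.log(p / (1.0 - p))


def conditional_prob(eta, u):
    """p^c = expit(eta + logit(u)), rewritten to avoid evaluating logit(u)."""
    e = math.exp(eta)
    return u * e / (1.0 - u + u * e)


def marginal_prob(eta, k, n=20_000):
    """E_u[p^c] for u ~ Beta(k, k) (hypothetical random effect), midpoint rule on (0, 1)."""
    log_norm = math.lgamma(2 * k) - 2 * math.lgamma(k)
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        u = (i + 0.5) * h
        density = math.exp(log_norm + (k - 1) * (math.log(u) + math.log(1.0 - u)))
        total += conditional_prob(eta, u) * density
    return total * h


def implied_marginal_effect(beta0, beta1, k):
    """Marginal log odds-ratio for a binary covariate (x = 1 versus x = 0)."""
    return logit(marginal_prob(beta0 + beta1, k)) - logit(marginal_prob(beta0, k))


beta0, beta1 = 0.0, -2.0  # hypothetical conditional coefficients
for k in (1, 2, 10, 50):  # larger k means a less dispersed random effect
    print(k, round(implied_marginal_effect(beta0, beta1, k), 3))
```

Attenuation of this kind is expected by construction; the point raised in the text is that the differences in Tables 2.2 and 2.3 appear larger than such a mechanism alone would suggest.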
Figure 2.1 shows the distribution of the three analysed SF-36 dimensions together with the fit of the BBhglm approach. The subject-specific nature of the approach is apparent: the inclusion of a beta random effect for each individual accommodates the dispersion of the fitted values. As a result, the distribution of the fitted values matches the observed distribution of the responses, especially in the role emotional dimension, where the results seemed most misleading (see Table 2.3). Consequently, Figure 2.1 suggests that the BBhglm approach appears correct, at least as far as fitted values are concerned, and that it captures the relationship between the HRQoL of the patients and the covariates adequately.
[Figure 2.1 panels: Physical functioning, Mental health and Role emotional. Each panel plots Frequency against the HRQoL score, showing the observed outcome and the BBhglm fit.]
Figure 2.1: Observed distribution and fitted distribution by the BBhglm approach of the analysed SF-36 scores.
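Comparisons such as the one in Figure 2.1 rest on turning a beta-binomial distribution into expected frequencies for each score. A minimal sketch using only the standard library: the beta-binomial probability mass function is evaluated through log-gamma functions, and the expected counts are n·P(Y = y). The sample size n = 100 and the shape parameters a = 0.5, b = 0.3 are hypothetical placeholders, not estimates from the COPD Study, and the sketch does not reproduce how Figure 2.1 was produced.

```python
import math


def log_beta(a, b):
    """log B(a, b) computed from log-gamma functions."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def beta_binomial_pmf(y, m, a, b):
    """P(Y = y) for Y ~ BetaBinomial(m, a, b), y = 0, ..., m."""
    log_choose = math.lgamma(m + 1) - math.lgamma(y + 1) - math.lgamma(m - y + 1)
    return math.exp(log_choose + log_beta(y + a, m - y + b) - log_beta(a, b))


def expected_frequencies(n, m, a, b):
    """Expected number of subjects with each score 0..m among n subjects."""
    return [n * beta_binomial_pmf(y, m, a, b) for y in range(m + 1)]


# Hypothetical values: a 0-3 score (m = 3); a, b < 1 give a U-shaped beta mixing density.
freqs = expected_frequencies(n=100, m=3, a=0.5, b=0.3)
print([round(f, 1) for f in freqs])
```

Working on the log scale keeps the computation stable for the larger m values of dimensions such as physical functioning.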
In summary, the model specifications prevent a direct comparison of the estimates from both approaches. However, in our view, some issues, such as statistical significance and the over- or underestimation of the variance, must be addressed in order to determine the most appropriate approach for measuring the effect of covariates on PROs, such as the HRQoL of patients with COPD. The results of the two approaches appear to diverge more as the dispersion parameter increases. Therefore, in the next section, we compare the two methodological approaches through a comprehensive simulation study, divided into scenarios according to the value of the dispersion parameter.
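The trend mentioned above can be checked informally with the estimates already reported in Tables 2.1-2.3. The sketch below back-transforms the log-scale dispersion estimates and computes, for the two covariate levels shared by the three tables (severe dyspnea and anxiety), the ratio |β̂_BBhglm| / |β̂_BBreg|. Both dispersion parameters and both ratios increase from physical functioning to mental health to role emotional; with only three dimensions this is descriptive rather than a test, which is why a simulation study is needed.

```python
import math

# Estimates copied from Tables 2.1-2.3: log-scale dispersion parameters and
# (BBhglm, BBreg) coefficients for severe dyspnea and anxiety.
tables = {
    "physical functioning": {"log_alpha": -2.656, "log_phi": -2.826,
                             "severe_dyspnea": (-2.317, -2.207), "anxiety": (-0.416, -0.404)},
    "mental health": {"log_alpha": -0.7647, "log_phi": -2.263,
                      "severe_dyspnea": (-1.132, -0.961), "anxiety": (-1.480, -1.290)},
    "role emotional": {"log_alpha": 2.735, "log_phi": 0.668,
                       "severe_dyspnea": (-5.603, -2.048), "anxiety": (-6.145, -1.649)},
}


def coefficient_ratio(pair):
    """|beta_BBhglm| / |beta_BBreg| for one covariate level."""
    hglm, reg = pair
    return abs(hglm) / abs(reg)


for dim, row in tables.items():
    alpha, phi = math.exp(row["log_alpha"]), math.exp(row["log_phi"])
    print(f"{dim:22s} alpha={alpha:7.3f} phi={phi:6.3f} "
          f"ratio(severe dyspnea)={coefficient_ratio(row['severe_dyspnea']):.2f} "
          f"ratio(anxiety)={coefficient_ratio(row['anxiety']):.2f}")
```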