We start by considering possible sensitivities for our MCS example, and select a range for implementation and analysis. Each of our chosen sensitivities varies from our base model, JMD, in a single aspect so that their individual effects can be assessed. A second stage of sensitivity analysis could combine several sensitivities that are shown to have a sizeable impact on results.
8.1.1 Sensitivity to model of interest assumptions
We revisit the assumptions made in setting up the original model of interest, and decide that the areas of chief concern, given our analysis of the complete cases (Section 5.3.1) and the insights gained from the simulations undertaken in Chapter 4, are as follows:
• choice of error distribution;
• choice of explanatory variables;
• choice of transform for the response.
For each of these, there are a number of alternatives to the choice incorporated in JMD, but we limit our sensitivities to the alternative we consider most plausible.
Varying the error distribution
In selecting our initial model of interest we considered using Normal or t4 errors, and chose t4 errors for robustness to outliers. As a sensitivity we run a model, JMG, which is the same as JMD but uses a Normal error distribution instead of a t4 error distribution. (JME and JMF will be introduced shortly.)
Kenward (1998) provides an example where changing from a Normal to a heavy-tailed t distribution removed the evidence for MNAR, because the outliers were better accommodated. We started with a t4 distribution, but found some evidence of MNAR, and are interested in whether the amount of evidence changes if a Normal distribution is used.
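The reason a t4 distribution accommodates outliers better than a Normal can be seen by comparing the two densities in the tails. The following is a minimal sketch, not part of the fitted models: it compares the standard Normal density with the unscaled Student t density on 4 degrees of freedom, and the function names are our own illustrative choices.

```python
import math


def normal_pdf(x: float) -> float:
    """Density of the standard Normal distribution at x."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def student_t_pdf(x: float, df: float = 4.0) -> float:
    """Density of the unscaled Student t distribution with df degrees of freedom."""
    log_norm = (
        math.lgamma((df + 1.0) / 2.0)
        - math.lgamma(df / 2.0)
        - 0.5 * math.log(df * math.pi)
    )
    return math.exp(log_norm - (df + 1.0) / 2.0 * math.log1p(x * x / df))


# Ratio of the two densities at increasing distances from the centre.
for z in (0.0, 2.0, 4.0, 6.0):
    ratio = student_t_pdf(z) / normal_pdf(z)
    print(f"z = {z:.0f}: t4/Normal density ratio = {ratio:.3g}")
```

Near the centre the two densities are similar, but the ratio grows rapidly with distance, so an extreme residual is far less surprising under t4 errors and exerts less pull on the fitted parameters.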
Incorporating additional explanatory variables
In Section 5.3.1 we found evidence for the inclusion of an age² term and age × edu interaction terms, but excluded them to avoid adding complexity at an early stage of model formation. We form JMH by adding both these terms to the model of interest in JMD. Their inclusion is complicated by the need to ensure consistency in the imputation of the missing values of age, age² and age × edu. This is achieved by calculating the missing values of age² and age × edu using the imputations of age. As with age, age² and age × edu are standardised to improve convergence using fixed values of their means and standard deviations based on the observed data. In calculating age², we square the original (unstandardised) age and then standardise.
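The consistency requirement can be sketched as follows. This is an illustrative outline only, not the WinBUGS code used for JMH. It assumes that age is imputed on the standardised scale, that edu is a single binary indicator, and that the interaction is formed from unstandardised age in the same way as the squared term; the data values and all names (`Scaler`, `fit_scaler`, `derived_terms`) are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean, stdev


@dataclass(frozen=True)
class Scaler:
    """Fixed centring and scaling constants computed from observed data."""
    centre: float
    scale: float

    def standardise(self, value: float) -> float:
        return (value - self.centre) / self.scale

    def unstandardise(self, z: float) -> float:
        return z * self.scale + self.centre


def fit_scaler(observed: list[float]) -> Scaler:
    """Fix the mean and standard deviation once, using observed values only."""
    return Scaler(mean(observed), stdev(observed))


# Hypothetical observed data (not MCS values).
observed_age = [22.0, 27.0, 31.0, 35.0, 40.0]
observed_edu = [0, 1, 1, 0, 1]  # assumed binary education indicator

age_scaler = fit_scaler(observed_age)
age_sq_scaler = fit_scaler([a ** 2 for a in observed_age])
age_edu_scaler = fit_scaler([a * e for a, e in zip(observed_age, observed_edu)])


def derived_terms(imputed_age_std: float, edu: int) -> tuple[float, float]:
    """Derive standardised squared age and age x edu from one imputed age."""
    age = age_scaler.unstandardise(imputed_age_std)  # back to original scale
    age_sq_std = age_sq_scaler.standardise(age ** 2)  # square first, then standardise
    age_edu_std = age_edu_scaler.standardise(age * edu)
    return age_sq_std, age_edu_std


print(derived_terms(imputed_age_std=0.5, edu=1))
```

Because both derived terms are deterministic functions of the imputed age, a single draw of age fixes all three covariates, and they cannot take mutually inconsistent values within an iteration.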
An alternative transform
From our exploration of selection models with simulated data in Chapter 4, we know that the choice of response transform is a key assumption, since distributional skewness can be confused with informative missingness. As an alternative to the log transformation, we use a cube root transformation in setting up model JMI, which is otherwise identical to JMD.
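To illustrate why the transform matters, the sketch below compares the sample skewness of a response under no transform, a log transform and a cube root transform. It uses synthetic lognormal data with arbitrary parameters, not the MCS pay data, so the log transform is symmetric here by construction; the point is only that different transforms leave different amounts of skewness for the rest of the model to absorb.

```python
import math
import random


def skewness(values: list[float]) -> float:
    """Sample skewness: third central moment over the cubed standard deviation."""
    n = len(values)
    centre = sum(values) / n
    m2 = sum((v - centre) ** 2 for v in values) / n
    m3 = sum((v - centre) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


random.seed(1)
# Synthetic, strictly positive "pay" values (hypothetical parameters).
pay = [random.lognormvariate(2.0, 0.5) for _ in range(5000)]

log_pay = [math.log(p) for p in pay]
cbrt_pay = [p ** (1.0 / 3.0) for p in pay]  # valid because pay > 0

for label, values in (("raw", pay), ("log", log_pay), ("cube root", cbrt_pay)):
    print(f"{label:>9}: skewness = {skewness(values):.3f}")
```

In a selection model, residual skewness left by an ill-suited transform can be attributed to the missingness mechanism instead, which is why JMI is a useful check on JMD.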
8.1.2 Sensitivity to response model of missingness assumptions
As with the model of interest, we could set up sensitivities in which the explanatory variables in the response model of missingness are varied. However, we restrict our attention to sensitivities involving the functional form and priors of these variables.
In building a joint model which incorporated expert knowledge in Chapter 7, we set up two models, JMB and JMC, which can be regarded as sensitivities to JMD for assessing the impact of incorporating expert knowledge. JMB excludes all aspects of the elicited information, while JMC uses the functional form of the response model of missingness parameters suggested by our expert but non-informative priors on these parameters. As further sensitivities to the choice of response model of missingness priors, we run two other models, JME and JMF. JME has response model of missingness priors generated from the initial elicitation, and in JMF all the response model of missingness parameters are fixed to their final elicitation median values and not estimated.
If the response missingness is MAR then δ1 = δ2 = 0. We know that these two parameters are difficult for the model to estimate, and that our expert found the part of the elicitation involving their associated variable, change in income between sweeps, the most difficult. Verbeke et al. (2001) envisage a sensitivity analysis in which the changes in the parameters or functions of interest are studied for different values of δ. Therefore, we also carry out a sensitivity analysis in which a series of models is run with these two parameters fixed. We refer to this group of models as JMJ, and it contains forty-nine variants which are formed by combining seven values of δ1 with each of seven values of δ2. We use the same set of values, namely {−0.75, −0.5, −0.25, 0, 0.25, 0.5, 0.75}, for both δ1 and δ2. This range encompasses the values elicited from our expert and those estimated by the models we have fitted to date. The design includes seven variants in which the functional form of change is linear, i.e. δ1 = δ2, with the δ1 = δ2 = 0 variant equivalent to assuming the response is MAR.
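The JMJ design can be enumerated directly. The sketch below only builds the grid of fixed (δ1, δ2) pairs and picks out the linear and MAR variants; it does not fit any model, and the dictionary keys are illustrative names.

```python
from itertools import product

# Values used for both delta1 and delta2 in the JMJ group.
DELTA_VALUES = (-0.75, -0.5, -0.25, 0.0, 0.25, 0.5, 0.75)

# Every combination of a delta1 value with a delta2 value: 7 x 7 variants.
jmj_variants = [
    {"delta1": d1, "delta2": d2} for d1, d2 in product(DELTA_VALUES, repeat=2)
]

# Variants where the functional form of change is linear (delta1 == delta2).
linear_variants = [v for v in jmj_variants if v["delta1"] == v["delta2"]]

# The single variant equivalent to assuming the response is MAR.
mar_variants = [
    v for v in jmj_variants if v["delta1"] == 0.0 and v["delta2"] == 0.0
]

print(len(jmj_variants), len(linear_variants), len(mar_variants))
```

Each entry would correspond to one model run with both parameters held at the listed values, so results can be tabulated or plotted over the grid.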
8.1.3 Sensitivity to covariate model of missingness assumptions
A range of sensitivities can also be constructed by varying the assumptions in the covariate model of missingness. Most obviously, we could include additional covariates in the equations for ν (Equation 6.7). One variant was explored in imputing the missing values of two correlated binary variables (Section 6.2.1), and was found to have little impact on the model of interest parameter estimates. We might also consider expanding the covariate model of missingness to include imputing the missing values of reg. However, as only four observed individuals moved between London and the regions between sweeps, this is a very low priority and is not pursued.
A further possibility is to allow the covariates to be MNAR, rather than MAR as assumed to date. This raises a number of questions. For example, should we use separate missingness indicators for the covariates and the response, or an overall missingness indicator for attrition? If we use separate indicators, a new sub-model linked to the existing covariate model of missingness is required, and implementing it would need a different indicator for each covariate pattern of missingness. Alternatively, if we use an overall missingness indicator for attrition, we also require a method for dealing with any item missingness that occurs in the response or covariates. Although in theory a model allowing MNAR covariates could be designed, it may currently be computationally prohibitive in WinBUGS. Attempting to implement such a model is beyond the scope of this thesis.
8.1.4 Sensitivity to dataset
Verbeke et al. (2001) point out that one or a few influential subjects can lead to non-random dropout being found in an analysis, and Kenward (1998) provides an example where discarding two subjects with anomalous profiles removes the evidence for MNAR. In setting up our MCS income dataset, we excluded four individuals with suspicious-looking extreme pay values (Section 3.1.2). A possible sensitivity analysis might investigate whether any further exclusions are justified and assess their impact. However, as we have used t4 errors for robustness to outliers, we consider this unlikely to have a large impact.
One of the criteria used in forming our dataset excluded individuals who stopped working in sweep 2. An additional sensitivity might fit a model to an expanded dataset that includes those who moved out of the workforce by sweep 2, treating their sweep 2 hourly pay as missing. The rationale for this is consistency, in that we do not know whether the individuals with missing sweep 2 pay were working or had moved out of the workforce. However, we do not pursue this idea.