The potential gains from retaining more of the information are offset somewhat by the increased complexity in implementation. Some practical considerations related to the implementation of the method are discussed below.
Starting Values
One important consideration in applying the latent variable method is how to choose the starting values for the likelihood optimisation algorithm. Initially, we set all the starting values to zero to determine whether this is a practical solution. In this setting, the algorithm is extremely slow to converge. Furthermore, giving all the thresholds the same starting value can be problematic because the thresholds must be ordered. That is, if at any point the lower limits of the integration in (3.6) exceed the upper limits, the probability cannot be computed and the optimisation fails.
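To illustrate why identical threshold starting values are problematic, the following minimal sketch computes the category probabilities for a single ordinal latent variable. It is a univariate simplification of the integration in (3.6), not the model itself, and the function name and threshold values are hypothetical.

```python
from math import inf
from statistics import NormalDist


def ordinal_probabilities(thresholds, mean=0.0):
    """Category probabilities P(tau_{w-1} < Y* <= tau_w) for a latent
    normal variable with unit variance (univariate illustration only)."""
    cuts = [-inf] + list(thresholds) + [inf]
    # A lower limit above its upper limit makes the probability undefined.
    if any(lo > hi for lo, hi in zip(cuts, cuts[1:])):
        raise ValueError("thresholds must be non-decreasing")
    latent = NormalDist(mu=mean, sigma=1.0)
    return [latent.cdf(hi) - latent.cdf(lo) for lo, hi in zip(cuts, cuts[1:])]


# All thresholds started at zero: the interior categories get probability 0,
# so the log-likelihood of any observation in those categories is undefined.
equal_start = ordinal_probabilities([0.0, 0.0, 0.0, 0.0])

# Hypothetical ordered thresholds: every category has positive probability.
ordered_start = ordinal_probabilities([-1.5, -0.5, 0.5, 1.5])
```

With equal thresholds the three interior categories have zero probability, which is one way in which a poor start can stall or break the optimisation.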
We are interested in choosing more appropriate starting values to reduce the chance of these computational problems, to reduce the computational time and to increase the chance of converging to the global maximum. One suggestion is to use random restarts, repeatedly starting the algorithm at randomly chosen points [109]. This is computationally intensive and often runs into difficulty because of the restrictions in place, for example that the thresholds must be ordered. Another suggestion is to use the data to inform the choice of starting values. This is more challenging in this context because we are treating the discrete outcomes as latent.
Proceeding by using the data, the parameters related to the observed continuous variables can be set by fitting separate linear models. The variance and correlation parameters related to these can be set to the values estimated from the data. Treating the ordinal variable as continuous and fitting a linear model with no intercept provides a good starting value for the BILAG treatment parameter. Fitting a linear model to the binary outcome provides poor starting values for the binary intercept and treatment parameter; however, the algorithm still performs well if only these values are poorly specified. Treating the discrete variables as continuous in order to determine the correlations performs well and provides good starting values for these parameters. Having identified proposed starting values for the mean, variance and correlation parameters, the latent outcome corresponding to the ordinal BILAG measure can be simulated. Ordinal threshold starting values can be obtained by comparing the distribution of the latent measure with the corresponding observed frequencies in the ordinal measure. However, this technique for selecting the starting values is clearly simplistic. A more elaborate search strategy could be conducted that may substantially speed up the optimisation. This will not be investigated within this thesis and is identified as an area for further research.

Table 3.4: Average execution time in seconds across 1000 runs for the latent variable, augmented binary and standard binary methods to produce log-odds treatment effect estimates and standard errors

Method             Elapsed    User      System
Binary             0.464      0.439     0.020
Augmented Binary   9.591      9.480     0.092
Latent Variable    4925.2     4862.3    50.42
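One way the threshold step described above could be carried out is sketched below. It assumes that the thresholds are taken as empirical quantiles of the simulated latent outcome at the cumulative observed proportions; the function name, the category counts and the latent mean and standard deviation are all hypothetical, and the original implementation may differ.

```python
import random


def threshold_starting_values(observed_counts, latent_mean=0.0, latent_sd=1.0,
                              n_sim=10_000, seed=2024):
    """Starting thresholds chosen so that the simulated latent outcome falls
    into each category with roughly the observed relative frequency."""
    rng = random.Random(seed)
    latent = sorted(rng.gauss(latent_mean, latent_sd) for _ in range(n_sim))
    total = sum(observed_counts)
    thresholds = []
    cumulative = 0
    # One threshold per boundary between adjacent categories.
    for count in observed_counts[:-1]:
        cumulative += count
        index = min(n_sim - 1, max(0, round(cumulative / total * n_sim) - 1))
        thresholds.append(latent[index])
    return thresholds


# Hypothetical frequencies for a five-level ordinal measure.
counts = [20, 30, 25, 15, 10]
starts = threshold_starting_values(counts)
```

Because the cumulative proportions increase, the resulting thresholds are ordered whenever every category has a non-zero count, which avoids the ordering problem at the first iteration.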
Computational Time
Another factor in the application of the latent variable model is the increased computational time. The average execution time required to produce the outcome of interest with its standard error is shown in Table 3.4. The time of interest is the elapsed time, which is the wall clock time taken to fit the model, obtain the maximum likelihood parameter estimates and obtain the probability of interest and its standard error. The standard binary estimate is obtained in less than a second and the augmented binary estimate requires approximately 10 seconds. The latent variable method requires much longer, taking approximately 4925 seconds, or 82 minutes.
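The distinction between elapsed, user and system time can be made concrete with a small timing helper. The sketch below is written in Python for illustration and is only analogous to the three columns of Table 3.4; the helper name `time_call` is hypothetical and this is not the code used to produce the reported timings.

```python
import os
import time


def time_call(func, *args, **kwargs):
    """Run func and return (result, timings), where timings holds the
    elapsed (wall clock), user CPU and system CPU seconds."""
    start_wall = time.perf_counter()
    start_cpu = os.times()
    result = func(*args, **kwargs)
    end_cpu = os.times()
    end_wall = time.perf_counter()
    timings = {
        "elapsed": end_wall - start_wall,
        "user": end_cpu.user - start_cpu.user,
        "system": end_cpu.system - start_cpu.system,
    }
    return result, timings


# Example: time a simple computation standing in for a model fit.
result, timings = time_call(sum, range(1_000_000))
```

Elapsed time is the quantity that matters for the practitioner waiting on a fit, whereas user and system time describe how much processor time was consumed.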
As mentioned previously, the application of these models has been limited in the past by the availability of sufficient computational power, which is now more readily available. However, applications of similar latent variable methods in the literature are still largely limited to modelling two outcomes, because execution time increases non-linearly with the number of outcomes. This is in agreement with our findings: based on the elapsed times in Table 3.4, modelling one outcome using the binary method is approximately 20 times faster than modelling two outcomes using the augmented binary method, whilst modelling these two outcomes is over 500 times faster than modelling four outcomes using the latent variable method. Of course, these timings depend on many factors, in particular the type of outcome and the number of levels in the ordinal variable. In our case, we find the number of ordinal levels to be the most influential factor in computational time. This is because 5 levels in the ordinal variable lead to 10 probability calculations in (3.6), whereas 3 levels would require the computation of only 6 of these joint probabilities. Consequently,
3.3 Latent Variable Model 67
the run time will be substantially increased if there are multiple ordinal levels and decreased if the discrete variables are binary.

Table 3.5: Benchmarked time in seconds of each of the processes required to fit the latent variable method to the systemic lupus erythematosus composite endpoint

Function                  Elapsed time
Likelihood maximisation   3683.1
Hessian                   845.6
Probability of response   3.064
Partial derivatives       266.3
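The counts of joint probabilities quoted for 5 and 3 ordinal levels can be reproduced with a short enumeration. The sketch assumes that one joint probability is needed for each combination of an ordinal level with one of the two levels of the binary outcome; the function name is hypothetical.

```python
from itertools import product


def joint_cells(ordinal_levels, binary_levels=2):
    """All (ordinal level, binary level) combinations, each assumed to
    require one joint probability calculation."""
    return list(product(range(1, ordinal_levels + 1), range(binary_levels)))


cells_five_levels = joint_cells(5)   # 5 ordinal levels x 2 binary levels
cells_three_levels = joint_cells(3)  # 3 ordinal levels x 2 binary levels
```

Under this assumption the number of calculations per likelihood evaluation grows in proportion to the number of ordinal levels.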
The increased complexity means that many factors may be responsible for the much slower fitting of this model. As we program the likelihood ourselves, rather than using an existing package, it may be possible to code the method more efficiently. Another factor is that, owing to the increased number of outcomes, the unstructured covariance matrix and the thresholds, the number of parameters to estimate increases from nine in the augmented binary method to 21 in the latent variable method. Searching over this 21-parameter space to find a global maximum is complex and computationally intensive, and therefore relatively slow. Furthermore, for reasons discussed previously, the Hessian is calculated separately with increased tolerance and the partial derivatives are computed to obtain the standard errors.

Table 3.5 shows the benchmarked times of the processes involved in fitting the latent variable method. Obtaining the maximum likelihood estimates of the parameters accounts for 77% of the required computational time. Benchmarking the optimisation process provides a clearer picture of the bottleneck. Within each iteration, the most time-consuming task is the calculation of the bivariate probabilities in (3.6). It is possible to parallelise this calculation using the ‘parApply’ function in R; however, for our problem we find that this slows down the overall computation. This is because many of these probabilities are calculated repeatedly, but each individual calculation requires little time. In other words, in the time it takes to distribute the calculations to separate cores, the result is already available on one core. The true bottleneck is that the algorithm requires many iterations to converge.

For a problem of this nature it is common to consider coding the model in a lower-level language such as C++. However, because the process requiring the most time is the optimisation itself, and a model written in C++ would still have to be optimised in R using a similar algorithm, we conclude that this is not worthwhile.
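The share of the run time attributable to each process can be checked directly from the elapsed times in Table 3.5. The sketch below simply divides each benchmark by their sum; the variable names are illustrative.

```python
# Elapsed times in seconds, taken from Table 3.5.
benchmarks = {
    "Likelihood maximisation": 3683.1,
    "Hessian": 845.6,
    "Probability of response": 3.064,
    "Partial derivatives": 266.3,
}

total_seconds = sum(benchmarks.values())

# Proportion of the benchmarked time spent in each process.
shares = {name: seconds / total_seconds for name, seconds in benchmarks.items()}

for name, share in shares.items():
    print(f"{name}: {100 * share:.1f}%")
```

The likelihood maximisation dominates, followed by the Hessian, while the probability of response itself is negligible.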
We conclude that when fitting to one dataset in an applied problem, a computation time of 82 minutes is feasible. However, for exploring the performance of the methods through simulation, we require an alternative. The solution we propose, and apply in our case, is to parallelise at the simulation level across many cores on a High Performance Computer (HPC). For 1000 simulated data sets, using 200 cores, the simulation would complete in under 7 hours.
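The estimate of under 7 hours can be reproduced with a rough calculation. The sketch assumes that every fit takes the average elapsed time from Table 3.4, that each core fits one data set at a time and that scheduling overhead is negligible; the function name is hypothetical.

```python
from math import ceil


def simulation_wall_time_hours(n_datasets, n_cores, seconds_per_fit):
    """Approximate wall clock time when data sets are fitted in parallel,
    one per core, in successive rounds."""
    rounds = ceil(n_datasets / n_cores)
    return rounds * seconds_per_fit / 3600


# 1000 data sets on 200 cores at 4925.2 seconds per fit (Table 3.4).
hours = simulation_wall_time_hours(1000, 200, 4925.2)
```

Each core handles five data sets in sequence, giving approximately 6.8 hours in total under these assumptions.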
Model Fit
Goodness-of-fit statistics are well established for univariate models; however, the assessment of multivariate methods is more challenging. Graphical techniques that involve inspecting plots of the residuals to determine the validity of assumptions such as homoscedasticity and normality are limited in their capacity to capture structure in more than two dimensions. Furthermore, solutions providing comparative values must add an appropriate penalty for the additional outcomes, for example a modified Akaike Information Criterion (AIC). These difficulties are exacerbated by the fact that a subset of the outcomes are latent and therefore difficult to visualise or test. One suggestion in the literature for assessing goodness-of-fit is introduced in [92] for the case when there is one continuous and one ordinal variable. This may be extended to allow for two continuous, one ordinal and one binary outcome for application in SLE, as shown below.
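As before, let $Y_i = (Y_{i1}, Y_{i2}, Y_{i3}, Y_{i4})'$ be the vector of observed responses for patient $i$. Partitioning the observed and latent continuous measures, we let $Y_{cts} = (Y_1, Y_2)$ and $Y_{dis} = (Y_3, Y_4)$. Then $\hat{\Sigma}_{11} = \widehat{Var}(Y_{cts})$, $\hat{\Sigma}_{22} = \widehat{Var}(Y_{dis})$ and $\hat{\Sigma}_{12} = \hat{\Sigma}_{21}' = \widehat{Cov}(Y_{cts}, Y_{dis})$. The modified Pearson residuals taking into account the correlation between responses are shown in (3.10).

$$ r^{p}_{i} = \hat{\Sigma}^{-1/2} (Y_i - \hat{\mu}_i) \tag{3.10} $$

where

$$ \hat{\mu}_i = \left( \hat{E}(Y_{i1}, Y_{i2}, Y_{i3}, Y_{i4}) \right)^{T} \tag{3.11} $$

and

$$ \hat{\Sigma} = \begin{pmatrix} \hat{\Sigma}_{11} & \hat{\Sigma}_{12} \\ \hat{\Sigma}_{21} & \hat{\Sigma}_{22} \end{pmatrix} \tag{3.12} $$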
A Cholesky decomposition may be used to obtain $\hat{\Sigma}^{-1/2}$ in (3.10). The covariance between the vector of observed continuous and observed discrete responses is shown below.
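$$
\begin{aligned}
\Sigma_{12} &= E(Y_{cts} Y_{dis}) - E(Y_{cts}) E(Y_{dis}) \\
&= E\left(Y_{cts} E(Y_{dis} \mid Y_{cts})\right) - E(Y_{cts}) E(Y_{dis}) \\
&= E\left(Y_1 Y_2 E(Y_3, Y_4 \mid Y_1, Y_2)\right) - E(Y_{cts}) E(Y_{dis}) \\
&= \int_{y_1} \int_{y_2} y_1 y_2 \sum_{y_3} \sum_{y_4} y_3 y_4 P(Y_3 = w, Y_4 = k \mid Y_1 = y_1, Y_2 = y_2) f_{Y_1, Y_2}(y_1, y_2) \, dy_1 \, dy_2 - E(Y_{cts}) E(Y_{dis})
\end{aligned}
$$

where

$$
\begin{aligned}
P(Y_{i3} = w, Y_{i4} = k \mid Y_{i1} = y_{i1}, Y_{i2} = y_{i2}; \theta)
&= \Phi\left(\tau_{w3} - \mu_{3|1,2}, \tau_{k4} - \mu_{4|1,2}; \Sigma_{3,4|1,2}\right) \\
&\quad - \Phi\left(\tau_{(w-1)3} - \mu_{3|1,2}, \tau_{k4} - \mu_{4|1,2}; \Sigma_{3,4|1,2}\right) \\
&\quad - \Phi\left(\tau_{w3} - \mu_{3|1,2}, \tau_{(k-1)4} - \mu_{4|1,2}; \Sigma_{3,4|1,2}\right) \\
&\quad + \Phi\left(\tau_{(w-1)3} - \mu_{3|1,2}, \tau_{(k-1)4} - \mu_{4|1,2}; \Sigma_{3,4|1,2}\right)
\end{aligned}
$$

$\mu_{3|1,2}$, $\mu_{4|1,2}$ and $\Sigma_{3,4|1,2}$ are defined in (3.7). Furthermore,

$$ E(Y_{cts}) = \int_{y_1} \int_{y_2} y_1 y_2 f_{Y_1, Y_2}(y_1, y_2) \, dy_1 \, dy_2 $$

$$ E(Y_{dis}) = \sum_{y_3} \sum_{y_4} y_3 y_4 P(Y_3 = w, Y_4 = k) $$

and

$$
\begin{aligned}
P(Y_{i3} = w, Y_{i4} = k)
&= \Phi\left(\tau_{w3} - \mu_3, \tau_{k4} - \mu_4; \rho_{3,4}\right)
 - \Phi\left(\tau_{(w-1)3} - \mu_3, \tau_{k4} - \mu_4; \rho_{3,4}\right) \\
&\quad - \Phi\left(\tau_{w3} - \mu_3, \tau_{(k-1)4} - \mu_4; \rho_{3,4}\right)
 + \Phi\left(\tau_{(w-1)3} - \mu_3, \tau_{(k-1)4} - \mu_4; \rho_{3,4}\right)
\end{aligned}
$$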
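The four-term expression for the marginal joint probability is the probability that a bivariate normal variable falls in a rectangle. A minimal sketch is given below, assuming standardised latent variables with correlation $\rho_{3,4}$ and thresholds that have already had the means subtracted. The bivariate normal distribution function is approximated here with Simpson's rule, which is an illustrative choice rather than the routine used in the thesis, and the threshold values and function names are hypothetical.

```python
from math import inf, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def bvn_cdf(a, b, rho, n_steps=2000, lower=-8.0, upper=8.0):
    """Approximate P(Z1 <= a, Z2 <= b) for standard normal Z1, Z2 with
    correlation rho, by integrating phi(x) * Phi((b - rho*x)/sqrt(1 - rho^2))
    over x up to a with Simpson's rule (n_steps must be even)."""
    a = min(a, upper)  # probability mass beyond +/- 8 is negligible
    if a <= lower:
        return 0.0
    scale = sqrt(1.0 - rho * rho)
    step = (a - lower) / n_steps

    def integrand(x):
        return STD_NORMAL.pdf(x) * STD_NORMAL.cdf((b - rho * x) / scale)

    total = integrand(lower) + integrand(a)
    for j in range(1, n_steps):
        total += (4 if j % 2 else 2) * integrand(lower + j * step)
    return total * step / 3.0


def cell_probability(lo3, hi3, lo4, hi4, rho):
    """Rectangle probability P(lo3 < Z3 <= hi3, lo4 < Z4 <= hi4)."""
    return (bvn_cdf(hi3, hi4, rho) - bvn_cdf(lo3, hi4, rho)
            - bvn_cdf(hi3, lo4, rho) + bvn_cdf(lo3, lo4, rho))


# Hypothetical mean-centred thresholds: five ordinal levels, two binary levels.
ordinal_cuts = [-inf, -1.0, -0.3, 0.4, 1.2, inf]
binary_cuts = [-inf, 0.0, inf]
rho_34 = 0.5

cell_probs = {
    (w, k): cell_probability(ordinal_cuts[w - 1], ordinal_cuts[w],
                             binary_cuts[k - 1], binary_cuts[k], rho_34)
    for w in range(1, 6) for k in range(1, 3)
}
```

The ten cell probabilities sum to one, which provides a simple check that the thresholds are ordered and the signs in the four-term expression are correct.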
The goodness-of-fit statistic is then

$$ \chi^2_p = \sum_{i=1}^{N} \chi^2_p(Y_i, \hat{\mu}_i) \tag{3.13} $$

where $p = (w \times k) - 1$, with $i$th component

$$ \chi^2_p(Y_i, \hat{\mu}_i) = (Y_i - \hat{\mu}_i)' \hat{\Sigma}^{-1} (Y_i - \hat{\mu}_i) \tag{3.14} $$

Comparing the residuals to the chi-squared value allows us to identify observations which the model does not fit well, as the residuals should follow a chi-squared distribution with $p$ degrees of freedom if the model fits well. If there are many observations unexplained by the model then it may indicate a poor choice, which may be due to the covariance structure $\hat{\Sigma}$ and its assumed distribution. Otherwise, the joint normality of the errors may be an unreasonable assumption, indicating that the latent variable model may not be appropriate. It is possible to fit latent variable models which assume a different multivariate distribution for the error terms; however, this will not be investigated within this thesis.
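The link between the residuals in (3.10) and the component in (3.14) can be sketched as follows. Writing $\hat{\Sigma} = LL'$ for the Cholesky factor $L$, the vector $L^{-1}(Y_i - \hat{\mu}_i)$ is one valid choice of standardised residual (it is not the symmetric square root), and its sum of squares equals the quadratic form in (3.14). The function names and the two-dimensional example values are hypothetical and chosen only for brevity.

```python
from math import sqrt


def cholesky(sigma):
    """Lower-triangular L with sigma = L L' for a positive definite matrix."""
    n = len(sigma)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = sqrt(sigma[i][i] - s)
            else:
                lower[i][j] = (sigma[i][j] - s) / lower[j][j]
    return lower


def standardised_residual(y, mu, sigma):
    """Solve L r = (y - mu) by forward substitution, so r = L^{-1}(y - mu)."""
    lower = cholesky(sigma)
    diff = [a - b for a, b in zip(y, mu)]
    resid = []
    for i, row in enumerate(lower):
        partial = sum(row[k] * resid[k] for k in range(i))
        resid.append((diff[i] - partial) / row[i])
    return resid


def chi_squared_component(y, mu, sigma):
    """Quadratic form (y - mu)' sigma^{-1} (y - mu), as in (3.14)."""
    return sum(r * r for r in standardised_residual(y, mu, sigma))


# Hypothetical two-dimensional example.
sigma_hat = [[4.0, 2.0], [2.0, 3.0]]
component = chi_squared_component([2.0, 1.0], [0.0, 0.0], sigma_hat)
```

Summing such components over all patients gives the statistic in (3.13), and unusually large individual components point to observations that the model fits poorly.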