the command line, so some learning and use of Linux commands was required to run these simulations. Using the HPC meant my simulations were run in parallel, and hence the computational time was substantially reduced compared with running them directly within Stata on a desktop computer.

5.2.4 Random number generator and starting seeds

Random number generation within Stata is technically pseudo-random: the numbers are not truly random but are produced by a deterministic algorithm. A sequence of numbers can therefore be replicated by specifying a starting value for the algorithm, referred to as the starting seed (a number between 0 and 2,147,483,647 in Stata). In my command I needed to generate variables from a Normal distribution. To do this I used the Stata random number function rnormal(µ, σ), which returns a Normal random variate with mean µ and standard deviation σ.
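The role of the starting seed can be illustrated outside Stata. The sketch below uses Python's standard `random` module rather than Stata's rnormal(), so the generator and the seed values are assumptions made purely for illustration. It shows that re-using a seed reproduces the same Normal draws, while a different seed gives a different sequence.

```python
import random

def normal_draws(seed, n, mu=0.0, sigma=1.0):
    """Return n pseudo-random Normal(mu, sigma) draws from a generator started at `seed`."""
    rng = random.Random(seed)  # independent generator with its own starting seed
    return [rng.gauss(mu, sigma) for _ in range(n)]

# Hypothetical seeds, chosen only for this illustration.
first = normal_draws(seed=20150317, n=5)
second = normal_draws(seed=20150317, n=5)  # same seed as `first`
other = normal_draws(seed=4821, n=5)       # different seed

print(first == second)  # compares the two same-seed sequences
print(first == other)   # compares sequences from different seeds
```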

For each scenario I used a different starting seed. The Stata help guide on random number functions states that it does not matter how the seeds are chosen as long as they do not exhibit any pattern. I chose seeds of varying length, without any systematic selection rule that could introduce a pattern, and checked that all the seeds used were different from each other.
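The check that no seed is repeated, and that each lies within the range Stata permits, is simple to automate. A minimal Python sketch follows; the seed values are made up and stand in for the ones actually used, and the function name is hypothetical.

```python
STATA_MAX_SEED = 2_147_483_647  # upper limit for a Stata seed, as stated in the text

def validate_seeds(seeds):
    """Raise ValueError if any seed is out of range or used more than once."""
    for s in seeds:
        if not (0 <= s <= STATA_MAX_SEED):
            raise ValueError(f"seed {s} is outside 0..{STATA_MAX_SEED}")
    duplicates = sorted({s for s in seeds if seeds.count(s) > 1})
    if duplicates:
        raise ValueError(f"seeds used more than once: {duplicates}")
    return True

# Hypothetical seeds, one per scenario (not the values used in this research).
scenario_seeds = [739, 58213, 4409176, 91, 1620345987]
print(validate_seeds(scenario_seeds))
```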

5.3 Methods for generating the datasets

At the time of my research there was no universally recommended method for generating clustered ordinal data. I explored the 85 papers identified in my review of sample size methods to see how simulation data were generated for binary outcomes, but no single method was consistently used. I then explored the wider literature to identify the best approach for generating clustered ordinal data. The generation of ordinal clustered data has been described in the literature in the multivariate context, i.e. for several dependent ordinal outcomes that are correlated with each other. To translate these methods to the cluster randomised trial context, each dependent outcome can be thought of as representing an individual within a cluster.

In 1995 Gange proposed a method for generating clustered ordinal outcomes.155 This method simulates ordinal data with a specified marginal and pairwise probability structure using an iterative algorithm. A disadvantage of the algorithm, as stated by the author, is that as the cluster size and the number of ordinal categories increase, the required computing power may limit the feasibility of the approach. With the advances in computing since 1995 it is not clear whether this issue is still relevant today. The method was used to evaluate the GEE-based sample size method proposed by Kim et al.65 However, a cluster size of three was assumed and the number of ordinal categories was four.

In 2004 Biswas proposed an algorithm for data generation for specific correlation structures, namely first and second order auto-regressive correlations.156 These correlation structures imply that as the distance between two observations within the same cluster increases, their correlation decreases. This structure is most relevant to longitudinal designs, where a cluster is an individual and measurements are taken at repeated time points, so it may be reasonable to assume that the correlation decreases as the lag between time points increases. It is a less obvious choice for cluster randomised trials, and so this method has limited applicability to my research.

In 2006 Demirtas proposed a data generation procedure which in the first step generates binary data, using what the author describes as well-accepted data generation methods, and in the second step converts these to ordinal outcomes. The author states that this method is more general than the methods suggested by Gange155 and Biswas156 in that there are no restrictions on the marginal distributions and pairwise correlations, and that a large number of ordinal categories does not lead to excessive computational complexity. In 2014 the package "MultiOrd" was developed in R to implement the methods of Demirtas.157 The software makes the method more straightforward to implement and ensures that the data are generated as intended by the author. However, my experience of using this package with large cluster sizes highlighted that the computational time required was still lengthy.
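To make the first-order auto-regressive structure concrete: the correlation between observations j and k in a cluster is usually written as ρ^|j−k|. The Python sketch below builds that matrix for an illustrative ρ = 0.6 and cluster size of 4 (both values are assumptions). For comparison it also builds an exchangeable structure, in which every pair within a cluster shares the same correlation. It illustrates the correlation structures only and is not an implementation of Biswas's generation algorithm.

```python
def ar1_correlation(rho, size):
    """First-order auto-regressive correlation matrix: corr(j, k) = rho ** |j - k|."""
    return [[rho ** abs(j - k) for k in range(size)] for j in range(size)]

def exchangeable_correlation(rho, size):
    """Exchangeable correlation matrix: every distinct pair has correlation rho."""
    return [[1.0 if j == k else rho for k in range(size)] for j in range(size)]

# Illustrative values only.
ar1 = ar1_correlation(0.6, 4)
exch = exchangeable_correlation(0.6, 4)

for row in ar1:
    print([round(v, 3) for v in row])
for row in exch:
    print(row)
```

Under the auto-regressive structure the first and last members of the cluster are much less correlated than neighbouring members, which is rarely a natural assumption for individuals within a cluster in a cluster randomised trial.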

The method by Gange appears to be the most popular, with 80 citations in Google Scholar at the time of writing. A disadvantage common to all of the above methods is the complexity and computational time involved as the cluster size increases. Given the number of simulations to be undertaken and the values of the parameters to be investigated in my research, it was not feasible to use these methods.

Demirtas described an alternative common approach to ordinal data generation: generate a latent continuous variable and convert it to an ordinal outcome by dividing the continuous scale into categories using appropriate threshold values. However, Demirtas states that this latent variable approach is generally inappropriate because correlations between the ordinal variables are not of simple form or interpretation. The same comment was made earlier by Biswas, without any further elaboration.156 I contacted Demirtas by email and he explained that what he meant by this comment was that the correlation of the underlying latent variable will be different to that of the ordinal version of the outcome, and that there is no simple formula to convert one to the other. A similar issue exists for binary data, and simulations have been used to link values of the two correlations on the different scales.147
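The difference between the two scales can be seen in a small simulation. The Python sketch below (not the Stata code used for this research) draws pairs of standard Normal latent values with correlation 0.5, cuts each into three categories at thresholds of −0.5 and 0.5, and compares the Pearson correlation on the latent and ordinal scales. The correlation, thresholds, sample size and seed are all assumed values chosen for illustration.

```python
import math
import random
from bisect import bisect_right

def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def to_ordinal(value, thresholds):
    """Map a latent value to a category 1..len(thresholds)+1."""
    return bisect_right(thresholds, value) + 1

rng = random.Random(12345)  # hypothetical seed
rho, thresholds, n = 0.5, [-0.5, 0.5], 20000

latent_a, latent_b = [], []
for _ in range(n):
    z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
    latent_a.append(z1)
    # Linear combination of independent Normals giving corr(a, b) = rho.
    latent_b.append(rho * z1 + math.sqrt(1 - rho ** 2) * z2)

ordinal_a = [to_ordinal(v, thresholds) for v in latent_a]
ordinal_b = [to_ordinal(v, thresholds) for v in latent_b]

latent_corr = pearson(latent_a, latent_b)
ordinal_corr = pearson(ordinal_a, ordinal_b)
print(round(latent_corr, 3), round(ordinal_corr, 3))
```

The ordinal correlation is expected to be smaller than the latent one because categorising discards information, and the size of the gap depends on the number and position of the thresholds, which is why no single conversion formula applies.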

Although this latent variable approach is described by Demirtas as being common, no source is cited to support this. In fact it is unclear how often any of these data generation methods have been used. The latent variable approach appears to have been used by Jung and Kang in their evaluation of a score test for clustered ordered categorical data, although the published description of their approach lacks sufficient detail to replicate it.158 There is scope for further research looking at the prevalence and evaluation of these data generation methods for ordinal outcomes.

I chose to simulate clustered ordinal data using the latent variable approach, evaluating by simulation the link between the ICC calculated on the latent continuous variable and the ICCs calculated on the ordinal outcome (kappa-type and ANOVA). This method has four advantages: it appears to most closely reflect the proportional odds model, whose derivation is motivated by the existence of an underlying continuous variable; the methodology can be easily described and hence replicated by others; the generation can be easily implemented in any statistical package; and the computational time required to generate a dataset is reasonable.
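A minimal sketch of the latent variable approach for clustered data follows, written in Python rather than Stata. It assumes a random-intercept model with Normal cluster effects and Normal residuals whose variances sum to one, so that the latent ICC equals the cluster-level variance. The residual distribution, thresholds, ICC, number and size of clusters, seed and function names are all illustrative assumptions rather than the specification used in this research; a latent model matching the proportional odds model exactly would use logistic rather than Normal residuals.

```python
import math
import random
from bisect import bisect_right

def simulate_clustered_ordinal(n_clusters, cluster_size, latent_icc, thresholds, seed):
    """Return a list of clusters, each a list of (latent, ordinal) pairs."""
    rng = random.Random(seed)
    sd_between = math.sqrt(latent_icc)       # cluster-level SD (total variance fixed at 1)
    sd_within = math.sqrt(1.0 - latent_icc)  # individual-level SD
    data = []
    for _ in range(n_clusters):
        u = rng.gauss(0.0, sd_between)       # random intercept shared by the cluster
        cluster = []
        for _ in range(cluster_size):
            y_star = u + rng.gauss(0.0, sd_within)           # latent continuous response
            category = bisect_right(thresholds, y_star) + 1  # ordinal category 1..K
            cluster.append((y_star, category))
        data.append(cluster)
    return data

def anova_icc(clusters):
    """One-way ANOVA estimator of the ICC for equal-sized clusters."""
    k, m = len(clusters), len(clusters[0])
    grand = sum(sum(c) for c in clusters) / (k * m)
    means = [sum(c) / m for c in clusters]
    msb = m * sum((mu - grand) ** 2 for mu in means) / (k - 1)
    msw = sum((v - mu) ** 2 for c, mu in zip(clusters, means) for v in c) / (k * (m - 1))
    return (msb - msw) / (msb + (m - 1) * msw)

# Illustrative parameter values only.
data = simulate_clustered_ordinal(n_clusters=400, cluster_size=20, latent_icc=0.1,
                                  thresholds=[-0.5, 0.5], seed=2468)
icc_latent = anova_icc([[y for y, _ in c] for c in data])
icc_ordinal = anova_icc([[cat for _, cat in c] for c in data])
print(round(icc_latent, 3), round(icc_ordinal, 3))
```

Repeating this over a grid of latent ICC values, and averaging over many simulated datasets, is one way to tabulate the correspondence between the ICC on the latent scale and the ANOVA ICC on the ordinal scale.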

5.3.1 Data generating model

Clustered ordinal data was generated using the latent variable approach. Under this approach we think of the ordinal response categories as being a crude measure of some underlying continuous scale. A linear random-intercept model describes this underlying continuous response Y∗
