
One hundred thousand samples are generated for each combination of sample size, coefficients of variation and correlation. Three sample sizes (60, 180 and 600), two sets of coefficients of variation ((5%, 5%, 20%) and (30%, 30%, 60%)) and three correlations (independence, 0.3 and 0.7) are considered in this simulation study. Compositional data are generated by noting that if $\log \dot{Y}_i$ follows a multivariate normal distribution with mean vector $\zeta_i$ and variance-covariance matrix $\Psi$, then $\dot{Y}_i$ follows a multivariate lognormal distribution with parameters $\zeta_i$ and $\Psi$, and taking the closure of $\dot{Y}_i$ leads to the vector of compositional response variables $Y_i$. Knowing the values of $m_i(\dot{\theta}_i, \beta_j)$, the coefficients of variation and the correlations, the values of the parameters $\zeta_i$ and $\Psi$ are calculated through equations (3.7), (3.8), (3.9) and (3.10).
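The simulation design described above can be enumerated in a few lines. The following is only a sketch: the variable names are hypothetical, and independence is encoded as a correlation of 0.0, which is an assumption of this illustration.

```python
from itertools import product

# Design factors as listed in the text (names are illustrative only).
sample_sizes = (60, 180, 600)
cv_sets = ((0.05, 0.05, 0.20), (0.30, 0.30, 0.60))
correlations = (0.0, 0.3, 0.7)   # 0.0 stands in for independence (assumption)
n_replicates = 100_000           # samples generated per combination

# Every combination of sample size, CV set and correlation.
design = list(product(sample_sizes, cv_sets, correlations))
n_scenarios = len(design)        # 3 * 2 * 3 combinations
total_samples = n_scenarios * n_replicates
```

Each element of `design` is a tuple `(n, cv, rho)` that would parameterise one block of 100,000 generated samples.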

The coefficients of variation and correlations are fixed at the values used in the simulation study. The values of $m_i(\dot{\theta}_i, \beta_j) = \exp(\dot{\theta}_i + x_i'\beta_j)$ used for the data generation procedure are obtained by fixing a set of $\beta$ and $\dot{\theta}$ parameters. The $\beta$ parameters are taken to be $\beta_{10} = 0.14$, $\beta_{20} = 0.02$, $\beta_{30} = 0.04$, $\beta_{11} = 0$, $\beta_{21} = 0$, $\beta_{31} = 0$. The vector $\dot{\theta}$ is generated using the standard normal distribution.

By taking the third component as the reference component, the true $\gamma$ parameters are calculated by taking the difference $\beta_j - \beta_3$, $j \neq 3$, leading to the values shown in Table 3.1, where $\gamma_{11}$ and $\gamma_{21}$ are equal to 0.

             Component 1    Component 2
Intercept    0.1            −0.02
x            0              0

Table 3.1: True γ parameters
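As a quick check of Table 3.1, the differences can be computed directly from the β values given in the text. The array layout (rows for intercept and slope, columns for components) is a choice made for this sketch only.

```python
import numpy as np

# Rows: intercept (beta_j0) and slope (beta_j1); columns: components 1, 2, 3.
beta = np.array([[0.14, 0.02, 0.04],
                 [0.00, 0.00, 0.00]])

# gamma_j = beta_j - beta_3 for j = 1, 2 (third component as reference).
gamma = beta[:, :2] - beta[:, [2]]
```

The first row of `gamma` holds the intercepts of Table 3.1 and the second row holds γ11 and γ21.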

Once $\zeta_i$ and $\Psi$ are calculated, the 3-part compositional data are generated. Since the $\dot{Y}_i$ are taken to follow a multivariate lognormal distribution, the compositional response variables will not contain any zeros.
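The generation step can be sketched as follows. Equations (3.7) to (3.10) are not reproduced in this section, so the sketch substitutes the standard lognormal moment identities, $\Psi_{jk} = \log(1 + \rho_{jk}\,cv_j\,cv_k)$ and $\zeta_{ij} = \log m_{ij} - \Psi_{jj}/2$, and treats the stated correlation as a common correlation between the untransformed components. Both are assumptions of this illustration and may differ from the thesis equations. The function name is hypothetical, and the covariate is omitted because all slope parameters are zero in the study.

```python
import numpy as np

def generate_compositions(n, cv, rho, beta0, rng):
    """Draw n compositions via a multivariate lognormal followed by closure."""
    cv = np.asarray(cv, dtype=float)
    J = cv.size
    # Common correlation rho off the diagonal (assumption of this sketch).
    R = np.full((J, J), float(rho))
    np.fill_diagonal(R, 1.0)
    # Assumed moment identity: Psi_jk = log(1 + rho_jk * cv_j * cv_k).
    Psi = np.log1p(R * np.outer(cv, cv))
    # theta-dot is drawn from the standard normal, as in the text.
    theta = rng.standard_normal(n)
    # log m_ij = theta_i + beta_j0 (slopes are zero, so x drops out).
    log_m = theta[:, None] + np.asarray(beta0, dtype=float)[None, :]
    # Assumed moment identity: zeta_ij = log m_ij - Psi_jj / 2.
    zeta = log_m - 0.5 * np.diag(Psi)
    log_y = zeta + rng.multivariate_normal(np.zeros(J), Psi, size=n)
    y_dot = np.exp(log_y)
    # Closure: rescale each row so that its parts sum to 1.
    return y_dot / y_dot.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
Y = generate_compositions(60, (0.05, 0.05, 0.20), 0.3, (0.14, 0.02, 0.04), rng)
```

Because every part is the exponential of a finite number, all entries of `Y` are strictly positive, which is the no-zeros property noted above.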

Once the data generating procedure and the true $\gamma$ parameter values are set, the simulation study may be carried out. For each generated sample of data, estimates using Aitchison’s approach are obtained by fitting the linear model

$$E(\log(Y_{ij})) = \beta^*_{j0} + \beta_{j1} x_i \qquad (3.46)$$

for each of the three components. Estimates $\hat{\gamma}^*_j$ are obtained by taking the difference $\hat{\beta}^*_j - \hat{\beta}^*_3$, $j \neq 3$, and the resulting fitted values are exponentiated and rescaled so that they adhere to the sum-to-1 constraint.
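The Aitchison step just described can be sketched with ordinary least squares. The function name and the use of `numpy.linalg.lstsq` are choices made for this illustration, not taken from the thesis.

```python
import numpy as np

def fit_aitchison(Y, x):
    """Fit (3.46) per component, then form gamma estimates and closed fits."""
    n, J = Y.shape
    X = np.column_stack([np.ones(n), x])          # intercept and covariate
    # One least-squares fit per component; B_star has shape (2, J).
    B_star, *_ = np.linalg.lstsq(X, np.log(Y), rcond=None)
    # Differences against the last (reference) component.
    gamma_star = B_star[:, :-1] - B_star[:, [-1]]
    # Exponentiate the fitted values and rescale rows to sum to 1.
    fitted = np.exp(X @ B_star)
    fitted /= fitted.sum(axis=1, keepdims=True)
    return gamma_star, fitted

# Noise-free check: the differences are recovered exactly.
x = np.linspace(-1.0, 1.0, 20)
B = np.array([[0.14, 0.02, 0.04], [0.5, -0.2, 0.0]])   # hypothetical values
eta = np.column_stack([np.ones(20), x]) @ B
Y = np.exp(eta) / np.exp(eta).sum(axis=1, keepdims=True)
gamma_star, fitted = fit_aitchison(Y, x)
```

Closure subtracts the same quantity from every component's log value in a given row, so it cancels in the differences; this is why the noise-free example recovers `B[:, :2] - B[:, [2]]`.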

For each generated sample, estimates using the generalized Wedderburn approach are obtained through an iterative process. The linear model

$$E\left[ \log m_i(\hat{\theta}_i, \hat{\beta}_j) + \left( \frac{Y_{ij}}{m_i(\hat{\theta}_i, \hat{\beta}_j)} - 1 \right) \right] = \beta_{j0} + \beta_{j1} x_i \qquad (3.47)$$

is used to obtain the $\beta$ estimates. The initial value for each $\theta_i$ is taken to be 0, and the initial values of $m_i(\hat{\theta}_i, \hat{\beta}_j)$, $i = 1, \ldots, n$, $j = 1, \ldots, J$, are taken to be the fitted values obtained from (3.46). The initial estimates of $\theta_1, \ldots, \theta_n$ are updated once the initial values of $m_i(\hat{\theta}_i, \hat{\beta}_j)$ are obtained. The initial values of $m_i(\hat{\theta}_i, \hat{\beta}_j)$ are then rescaled and used in the linear model (3.47) to obtain updated values of $m_i(\hat{\theta}_i, \hat{\beta}_j)$. This leads to another update of the estimates of $\theta_1, \ldots, \theta_n$, which is used to once again obtain rescaled values of $m_i(\hat{\theta}_i, \hat{\beta}_j)$, and the procedure is repeated until convergence is achieved. The convergence criterion used in the simulation study is

$$\left| m_i(\hat{\theta}_i^{t+1}, \hat{\beta}_j^{t+1}) - m_i(\hat{\theta}_i^{t}, \hat{\beta}_j^{t}) \right| < \epsilon \qquad (3.48)$$

where $t$ denotes the iteration number and $\epsilon$ is a predefined level of tolerance. In this study, $\epsilon$ is set equal to $10^{-8}$ and, for convergence to be achieved, the convergence criterion (3.48) has to be satisfied for all $i$ and $j$. Having achieved convergence, estimates $\hat{\gamma}_j$ are obtained by taking the difference $\hat{\beta}_j - \hat{\beta}_3$, $j \neq 3$.
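Two building blocks of the iteration can be written down without committing to the details of the $\theta$ update, which this section does not spell out. The sketch below computes the response on the left-hand side of (3.47) and checks criterion (3.48); it reads the criterion as an absolute difference, which is an assumption, and both function names are hypothetical.

```python
import numpy as np

def working_response(Y, m):
    """Response regressed on x in (3.47): log m + (Y / m - 1), elementwise."""
    Y = np.asarray(Y, dtype=float)
    m = np.asarray(m, dtype=float)
    return np.log(m) + (Y / m - 1.0)

def has_converged(m_new, m_old, tol=1e-8):
    """Criterion (3.48): the change must be below tol for every i and j."""
    diff = np.abs(np.asarray(m_new, dtype=float) - np.asarray(m_old, dtype=float))
    return bool(diff.max() < tol)

m = np.array([[0.5, 0.3, 0.2],
              [0.4, 0.4, 0.2]])
z = working_response(m, m)   # when Y equals m the response reduces to log m
```

Checking the maximum over all entries is equivalent to requiring the inequality to hold for every pair $(i, j)$.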

Under the generalized Wedderburn approach, two different estimates of the variance-covariance matrix $\mathrm{Var}(\hat{\gamma})$ are obtained for each sample: the model-based estimator (2.67), with $\hat{\phi}_{V^{p}_{i},\Omega,W}$ worked out using (2.74), and the robust estimator of Liang and Zeger (1986) described in Section 2.8.3. This is done in order to compare the performance of the two estimators under various sample sizes, coefficients of variation and correlation coefficients. The benchmark for this comparison is the sample variance of each $\gamma$ parameter, computed from the $\gamma$ estimates obtained across the generated samples.

In addition, for every sample and for both the model-based and robust variance estimators, confidence intervals for each of the $\gamma$ parameters are computed using the estimated standard errors. The estimated standard errors are obtained by taking the square root of the diagonal elements of the model-based and robust estimates of $\mathrm{Var}(\hat{\gamma})$. For every sample and every parameter, a record is kept of whether the true parameter value lies within the corresponding confidence interval. At the end of the simulation study, the coverage probability for every parameter is estimated for both the model-based and robust variance estimators as the proportion of samples whose interval contains the true value. The empirical coverage probability is then compared with the nominal 95% level. This exercise is also carried out to investigate the performance of the two variance estimators: the estimator whose coverage probabilities are closest to 95% is regarded as the better performing one.
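The coverage calculation can be sketched as below. The section does not state how the intervals are built, so the sketch assumes Wald-type intervals, estimate ± z × standard error, with the normal quantile z ≈ 1.96. The function name is hypothetical.

```python
import numpy as np

def coverage_probability(estimates, std_errors, true_value, z=1.959964):
    """Share of intervals estimate +/- z * se that contain true_value."""
    estimates = np.asarray(estimates, dtype=float)
    std_errors = np.asarray(std_errors, dtype=float)
    lower = estimates - z * std_errors
    upper = estimates + z * std_errors
    covered = (lower <= true_value) & (true_value <= upper)
    return float(covered.mean())

# Four toy replicates for a parameter whose true value is 0.
cov = coverage_probability([0.0, 0.1, 0.5, -0.05], [0.1, 0.1, 0.1, 0.1], 0.0)
```

In the study this would be evaluated once per parameter and per variance estimator, passing the square roots of the relevant diagonal elements as `std_errors`.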

Summarization of the Simulation Results

The estimates that are obtained at the end of the simulation are:

• the biases achieved under the two approaches together with their standard errors

• the variance of the γ estimates achieved under the two approaches together with the corresponding standard error

• the average of the estimated $\mathrm{Var}(\hat{\gamma})$ using both model-based and robust variance estimators, under the generalized Wedderburn approach, together with their standard errors

• coverage probabilities for every non-intercept γ parameter using both model-based and robust variance estimators under the generalized Wedderburn approach.

Since interest lies in the non-intercept parameters, all the results obtained from the simulation study focus on the coefficients $\gamma_{11}$ and $\gamma_{21}$. To get an idea of the typical simulated datasets used in this study, refer to Appendix E. The ternary diagrams presented in Appendix E have been obtained using the first generated sample for each combination of sample size, correlation coefficient and coefficients of variation.
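Most of the summary quantities listed above can be computed from the per-sample estimates of a single parameter, as in the following sketch. The function name and dictionary keys are hypothetical, and the standard errors shown are the usual Monte Carlo ones (standard deviation divided by the square root of the number of samples), which is an assumption. The standard error of the empirical variance is left out because the section does not say how it is obtained.

```python
import numpy as np

def summarize(estimates, var_estimates, true_value):
    """Monte Carlo summaries for one gamma parameter across samples."""
    estimates = np.asarray(estimates, dtype=float)
    var_estimates = np.asarray(var_estimates, dtype=float)
    R = estimates.size                           # number of generated samples
    emp_var = estimates.var(ddof=1)              # sample variance of estimates
    return {
        "bias": estimates.mean() - true_value,
        "bias_se": np.sqrt(emp_var / R),
        "empirical_variance": emp_var,
        "mean_estimated_variance": var_estimates.mean(),
        "mean_estimated_variance_se": var_estimates.std(ddof=1) / np.sqrt(R),
    }

# Toy input: four estimates of a parameter whose true value is 0.
out = summarize([0.1, -0.1, 0.3, -0.3], [0.06, 0.07, 0.06, 0.07], 0.0)
```

Comparing `mean_estimated_variance` with `empirical_variance` is the check on the model-based and robust estimators described earlier in the section.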
