UNDERGRADUATE THESIS SUMMARY FORMAT
METHODOLOGICAL ASPECT
One hundred thousand samples are generated for each combination of sample size, coefficients of variation and correlation. The simulation study considers three sample sizes (60, 180 and 600), two sets of coefficients of variation ((5%, 5%, 20%) and (30%, 30%, 60%)) and three correlations (independence, 0.3 and 0.7). Compositional data are generated as follows: if $\log \dot{Y}_i$ follows a multivariate normal distribution with mean vector $\zeta_i$ and variance-covariance matrix $\Psi$, then $\dot{Y}_i$ follows a multivariate lognormal distribution with parameters $\zeta_i$ and $\Psi$, and taking the closure of $\dot{Y}_i$ gives the vector of compositional response variables $Y_i$. Given the values of $m_i(\dot{\theta}_i, \beta_j)$, the coefficients of variation and the correlations, the parameters $\zeta_i$ and $\Psi$ are calculated through equations (3.7), (3.8), (3.9) and (3.10).
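The generation step described above can be sketched in Python. Equations (3.7) to (3.10) are not reproduced in this section, so the sketch assumes the standard moment-matching relations for a multivariate lognormal distribution, and it assumes that the stated correlations refer to the unclosed variables $\dot{Y}_i$. The function names (`lognormal_params`, `closure`, `generate_composition`) are hypothetical.

```python
import numpy as np

def lognormal_params(m, cv, rho):
    """Moment-match a multivariate lognormal distribution.

    ASSUMPTION: stands in for equations (3.7)-(3.10), which are not shown here.
    m   : (J,) target means of the unclosed variables
    cv  : (J,) target coefficients of variation
    rho : (J, J) target correlation matrix of the unclosed variables
    Returns (zeta, psi): mean vector and covariance matrix of the log-variables.
    """
    m = np.asarray(m, dtype=float)
    cv = np.asarray(cv, dtype=float)
    rho = np.asarray(rho, dtype=float)
    psi = np.log1p(rho * np.outer(cv, cv))   # Psi_jk = log(1 + rho_jk * cv_j * cv_k)
    zeta = np.log(m) - 0.5 * np.diag(psi)    # zeta_j = log(m_j) - Psi_jj / 2
    return zeta, psi

def closure(y):
    """Rescale the parts so that they sum to 1 along the last axis."""
    y = np.asarray(y, dtype=float)
    return y / y.sum(axis=-1, keepdims=True)

def generate_composition(m, cv, rho, rng):
    """Draw one 3-part (or J-part) composition for a single observation."""
    zeta, psi = lognormal_params(m, cv, rho)
    y_dot = np.exp(rng.multivariate_normal(zeta, psi))  # multivariate lognormal draw
    return closure(y_dot)
```

Because the exponential of a normal draw is strictly positive, every part of the closed vector is strictly positive as well.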
The coefficients of variation and correlations are fixed at the values used in the simulation study. The values of $m_i(\dot{\theta}_i, \beta_j) = \exp(\dot{\theta}_i + x_i'\beta_j)$ used in the data generation procedure are obtained by fixing a set of $\beta$ and $\dot{\theta}$ parameters. The $\beta$ parameters are taken to be $\beta_{10} = 0.14$, $\beta_{20} = 0.02$, $\beta_{30} = 0.04$, $\beta_{11} = 0$, $\beta_{21} = 0$, $\beta_{31} = 0$. The vector $\dot{\theta}$ is generated using the standard normal distribution.
By taking the third component as the reference component, the true $\gamma$ parameters are calculated as the differences $\beta_j - \beta_3$, $j \neq 3$, leading to the values shown in Table 3.1, where $\gamma_{11}$ and $\gamma_{21}$ are equal to 0.
             Component 1    Component 2
Intercept        0.1           −0.02
x                0              0
Table 3.1: True $\gamma$ parameters
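The entries of Table 3.1 follow from the stated $\beta$ values by simple subtraction. A short check in Python, where the array layout (one row per component, columns for intercept and slope) is an assumption made for illustration:

```python
import numpy as np

# Rows: components 1..3; columns: (intercept, slope on x). Values from the text.
beta = np.array([[0.14, 0.0],
                 [0.02, 0.0],
                 [0.04, 0.0]])

# Component 3 is the reference, so gamma_j = beta_j - beta_3 for j = 1, 2.
gamma = beta[:2] - beta[2]
print(gamma)
```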
Once $\zeta_i$ and $\Psi$ are calculated, the 3-part compositional data are generated. Since the $\dot{Y}_i$ are taken to follow a multivariate lognormal distribution, the compositional response variables do not contain any zeros.
Once the data generating procedure and the true $\gamma$ parameter values are set, the simulation study may be carried out. For each generated sample, estimates using Aitchison's approach are obtained by fitting the linear model

$$E(\log(Y_{ij})) = \beta^*_{j0} + \beta^*_{j1} x_i \qquad (3.46)$$

for each of the three components. Estimates $\hat{\gamma}^*_j$ are obtained by taking the difference $\hat{\beta}^*_j - \hat{\beta}^*_3$, $j \neq 3$, and the resulting fitted values are exponentiated and rescaled so that they adhere to the sum-to-1 constraint.
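The Aitchison-type fit described above can be sketched as follows. This is a minimal illustration that assumes a single covariate $x_i$ and ordinary least squares for (3.46); the function name `aitchison_fit` is hypothetical.

```python
import numpy as np

def aitchison_fit(Y, x):
    """Fit E(log Y_ij) = b*_j0 + b*_j1 x_i separately for each component.

    Y : (n, J) compositions with strictly positive parts
    x : (n,) covariate
    Returns (gamma, fitted):
      gamma  : (2, J-1) contrasts b*_j - b*_J, with the last component as reference
      fitted : (n, J) exponentiated fitted values rescaled to sum to 1
    """
    Y = np.asarray(Y, dtype=float)
    x = np.asarray(x, dtype=float)
    X = np.column_stack([np.ones_like(x), x])            # design matrix (1, x)
    B, *_ = np.linalg.lstsq(X, np.log(Y), rcond=None)    # one column of B per component
    gamma = B[:, :-1] - B[:, [-1]]                       # differences against the reference
    fitted = np.exp(X @ B)
    fitted /= fitted.sum(axis=1, keepdims=True)          # enforce the sum-to-1 constraint
    return gamma, fitted
```

Since least squares is linear in the response, the differences of the fitted coefficients equal the coefficients obtained by regressing the log-ratios $\log(Y_{ij}/Y_{iJ})$ directly.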
For each generated sample, estimates using the generalized Wedderburn approach are obtained through an iterative process. The linear model

$$E\left[\log m_i(\hat{\theta}_i, \hat{\beta}_j) + \frac{Y_{ij}}{m_i(\hat{\theta}_i, \hat{\beta}_j)} - 1\right] = \beta_{j0} + \beta_{j1} x_i \qquad (3.47)$$

is used to obtain the $\beta$ estimates. The initial value of each $\theta_i$ is taken to be 0, and the initial values of $m_i(\hat{\theta}_i, \hat{\beta}_j)$ ($i = 1, \ldots, n$, $j = 1, \ldots, J$) are taken to be the fitted values obtained from (3.46). The initial estimates of $\theta_1, \ldots, \theta_n$ are updated once the initial values of $m_i(\hat{\theta}_i, \hat{\beta}_j)$ ($i = 1, \ldots, n$, $j = 1, \ldots, J$) are obtained. The initial values of $m_i(\hat{\theta}_i, \hat{\beta}_j)$ are then rescaled and used in the linear model (3.47) to obtain updated values of $m_i(\hat{\theta}_i, \hat{\beta}_j)$. This leads to another update of the estimates of $\theta_1, \ldots, \theta_n$, which is used to once again obtain rescaled values of $m_i(\hat{\theta}_i, \hat{\beta}_j)$, and the procedure is repeated until convergence is achieved. The convergence criterion used in the simulation study is
$$\left| m_i(\hat{\theta}_i^{t+1}, \hat{\beta}_j^{t+1}) - m_i(\hat{\theta}_i^{t}, \hat{\beta}_j^{t}) \right| < \epsilon \qquad (3.48)$$

where $t$ denotes the iteration number and $\epsilon$ is a predefined tolerance level. In this study, $\epsilon$ is set equal to $10^{-8}$, and for convergence to be achieved the criterion (3.48) has to be satisfied for all $i$ and $j$. Having achieved convergence, estimates $\hat{\gamma}_j$ are obtained by taking the difference $\hat{\beta}_j - \hat{\beta}_3$, $j \neq 3$.
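Two pieces of the iteration can be sketched without further assumptions about the model: the working response on the left-hand side of (3.47) and the stopping rule (3.48). The update of $\theta_1, \ldots, \theta_n$ is not specified in this section and is therefore left out. The use of unweighted least squares in `update_beta` is an assumption, and all function names are hypothetical.

```python
import numpy as np

def working_response(Y, m):
    """Left-hand side of (3.47), elementwise: log(m) + Y / m - 1."""
    Y = np.asarray(Y, dtype=float)
    m = np.asarray(m, dtype=float)
    return np.log(m) + Y / m - 1.0

def update_beta(Y, m, x):
    """Regress each component's working response on (1, x).

    ASSUMPTION: ordinary (unweighted) least squares; the text only says
    that the linear model (3.47) is used to obtain the beta estimates.
    Returns a (2, J) array with intercepts in row 0 and slopes in row 1.
    """
    x = np.asarray(x, dtype=float)
    X = np.column_stack([np.ones_like(x), x])
    B, *_ = np.linalg.lstsq(X, working_response(Y, m), rcond=None)
    return B

def has_converged(m_new, m_old, eps=1e-8):
    """Criterion (3.48): the absolute change is below eps for all i and j."""
    diff = np.abs(np.asarray(m_new, dtype=float) - np.asarray(m_old, dtype=float))
    return bool(np.all(diff < eps))
```

When the current fitted values equal the observations, the working response reduces to $\log m_i$, so a further regression step leaves the fit unchanged.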
Under the generalized Wedderburn approach, two different estimates of the variance-covariance matrix $\mathrm{Var}(\hat{\gamma})$ are obtained for each sample: the model-based estimator (2.67)
with $\hat{\phi}_{V_i,\Omega,W}$ worked out using (2.74), and the robust estimator of Liang and Zeger (1986)
described in Section 2.8.3. This makes it possible to compare the performance of the two estimators under various sample sizes, coefficients of variation and correlation coefficients. The benchmark for this comparison is the sample variance of each $\gamma$ parameter, computed from the $\gamma$ estimates obtained across the generated samples.
Also, for every sample and for both the model-based and robust variance estimators, confidence intervals for each of the $\gamma$ parameters are computed using the estimated standard errors. The estimated standard errors are obtained by taking the square root of the diagonal elements of the model-based and robust estimates of $\mathrm{Var}(\hat{\gamma})$. For every parameter, a count is kept of the number of samples in which the true parameter value lies within the corresponding confidence interval. At the end of the simulation study, the coverage probability for every parameter is estimated, for both the model-based and robust variance estimators, as the proportion of samples whose interval contains the true value. The empirical coverage probability is compared with the nominal 95% level. This exercise is also carried out to investigate the performance of the two variance estimators: the estimator whose coverage probabilities are closest to 95% is the better performing one.
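The coverage calculation can be sketched as follows. The text does not state how the intervals are constructed, so the sketch assumes Wald-type intervals based on the normal quantile 1.959964; the function name `coverage_probability` is hypothetical.

```python
import numpy as np

def coverage_probability(estimates, std_errors, true_value, z=1.959964):
    """Proportion of samples whose interval contains the true value.

    ASSUMPTION: intervals are estimate +/- z * standard error, with z the
    97.5% standard normal quantile for a nominal 95% level.
    estimates, std_errors : (n_samples,) arrays for one gamma parameter
    """
    estimates = np.asarray(estimates, dtype=float)
    std_errors = np.asarray(std_errors, dtype=float)
    lower = estimates - z * std_errors
    upper = estimates + z * std_errors
    return float(np.mean((lower <= true_value) & (true_value <= upper)))
```

The function would be called once per parameter and per variance estimator, for example with the model-based standard errors and then with the robust ones, and the two results compared against 0.95.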
Summarization of the Simulation Results
The estimates that are obtained at the end of the simulation are:
• the biases achieved under the two approaches together with their standard errors
• the variance of the $\gamma$ estimates achieved under the two approaches together with the corresponding standard error
• the average of the estimated $\mathrm{Var}(\hat{\gamma})$ using both model-based and robust variance estimators, under the generalized Wedderburn approach, together with their standard errors
• coverage probabilities for every non-intercept $\gamma$ parameter using both model-based and robust variance estimators under the generalized Wedderburn approach.

Since interest lies in the non-intercept parameters, all the results obtained from the simulation study focus on the coefficients $\gamma_{11}$ and $\gamma_{21}$. For an idea of the typical simulated datasets used in this study, refer to Appendix E. The ternary diagrams presented in Appendix E were obtained using the first generated sample for each combination of sample size, correlation coefficient and coefficients of variation.
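Most of the summary quantities listed above are simple Monte Carlo averages. The sketch below computes them for one $\gamma$ parameter, assuming that each standard error is the sample standard deviation divided by the square root of the number of samples. The standard error of the sample variance is omitted because its formula is not given here. The function name `summarize` and the dictionary keys are hypothetical.

```python
import numpy as np

def summarize(estimates, var_estimates, true_value):
    """Monte Carlo summaries for one gamma parameter.

    estimates     : (n_samples,) estimates of the parameter
    var_estimates : (n_samples,) estimated variances (model-based or robust)
    ASSUMPTION: standard errors are sd / sqrt(n_samples).
    """
    est = np.asarray(estimates, dtype=float)
    var_est = np.asarray(var_estimates, dtype=float)
    n = est.size
    return {
        "bias": est.mean() - true_value,
        "bias_se": est.std(ddof=1) / np.sqrt(n),
        "sample_var": est.var(ddof=1),            # benchmark for the variance estimators
        "mean_var_est": var_est.mean(),
        "mean_var_est_se": var_est.std(ddof=1) / np.sqrt(n),
    }
```

A variance estimator performs well when `mean_var_est` is close to `sample_var`.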