número 18 y variable número 22) Porcentajes.
9. CONCLUSIONES: SOCIEDAD, DEPORTE Y MEDIOS DE COMUNICACIÓN
Combined likelihood for the family and twin studies
To obtain joint estimates for the NIMA and direct offspring allelic effect we maximize the combined likelihood for both data sets, given by the product of the likelihood contribution from family study (2.1), and the likelihood contribution from twin study (2.6):
CJ L(τu, β0, β1, β2) =J Lf(τu, β0, β1, β2)×J Lt(τu, β0, β1, β2). (2.7) Information to estimate the direct allelic effect, the baseline risk and the variance of the random effect comes both from twins and families. On the other hand, the family likelihood allows us to estimate also the NIMA effect. By adding the twins to the families, we borrow information to better estimate the direct allelic effect, which will also improve the estimate of the NIMA parameter through the family likelihood.
2.3
Simulation Study
The primary goal of the simulation study was to test efficiency gain for estimating effects that depend on parental genotype, such as NIMA, when a twin data set, with missing parental information, is combined with a data set comprised of nuclear families. In addition, we wanted to study the finite sample properties of the J L itself and relative to theP L. In particular, we investigated the impact of family size, variance of random effects and ascertainment scheme on the parameter estimates, and compared our method with theP Lused in previous studies, in terms of efficiency and bias of estimates of NIMA effect.
In each scenario, genotype frequencies were selected to mimic the frequency of DERAA alleles in the English population, i.e. .15 [Ann Morgan, personal communi- cation]. To generate genotypes of family members, maternal and paternal genotypes were generated assuming random mating and HWE. Offspring genotypes were gener- ated assuming Mendelian transmission. Disease outcomes of offspring were generated according to the random effects model (2.5). The family-specific random intercept was assumed to be normally distributed with mean zero and variance either 1.5 or 2.5, resembling results from previous literature on heritability of RA [van der Woude et al., 2009]. Two different ascertainment schemes were used, that is, families were included in the study if at least one or two offspring were affected. Twins were generated as families with two offspring, ascertained such that at least one twin per pair is affected. Parental genotype and phenotype information was ignored to mimic the real data set. We set β0 to -3, representing a common disease with population prevalence approximately 5%. The true parameter values for offspring allelic and
NIMA effect, β1 and β2, were fixed at -.5 and -1, corresponding to an odds ratio of .6 and .4 respectively. In total, 16 scenarios were generated, each consisting of
103 simulated data sets, with corresponding family and sample size, ascertainment scheme and variance of the random effect as indicated in Table 2.2.
To study the finite sample properties of theJ L, we applied the likelihood to all scenarios of Table 2.2. Results are summarised in Table 2.3. Effect of different family and sample size on the parameter estimates is reflected by comparing scenarios 1-4. When both sample and family size are small, e.g. scenario 1, τu2 is overestimated
Table 2.2: Simulation scenarios with varying sample and family size, ascertainment scheme and variance of the random effects. fam: family, of: offpsring, P
jYij ≥1: at least one affected offspring, P
jYij ≥ 2: at least two affected offspring, τu2: variance of random effect.
Scenario Nr. fam Nr. of Ascertainment τ2
u 1 100 3 P jYij ≥1 1.5 2 100 5 P jYij ≥1 1.5 3 500 3 P jYij ≥1 1.5 4 500 5 P jYij ≥1 1.5 5 100 3 P jYij ≥1 2.5 6 100 5 P jYij ≥1 2.5 7 500 3 P jYij ≥1 2.5 8 500 5 P jYij ≥1 2.5 9 100 3 P jYij ≥2 1.5 10 100 5 P jYij ≥2 1.5 11 500 3 P jYij ≥2 1.5 12 500 5 P jYij ≥2 1.5 13 100 3 P jYij ≥2 2.5 14 100 5 P jYij ≥2 2.5 15 500 3 P jYij ≥2 2.5 16 500 5 P jYij ≥2 2.5
2.3. SIMULATION STUDY 19 resulting in an underestimatedβ0. However, estimates of the log odds ratios for the offspring allelic and NIMA effect are nearly unbiased, -2.3% and 3.4% respectively.
Increasing family size from 3 to 5, scenario 2, reduces the bias of both effects to .1%
and 2.4% and their standard deviations by 8.5% and 11.43% respectively. On the
other hand, increasing the number of families from 100 to 500, scenario 3, reduces the bias of both effects to -1.4% and -1.0%and their standard deviations by 55.6%and
58.4% respectively. To study the effect of differentτ2
u on the parameter estimates, we compared scenarios 1-4 with scenarios 5-8 or/and scenario’s 9-12 with scenarios 13-14. Whenτu2increases from 1.5 to 2.5, from scenario 1 to scenario 5, bias on the estimate ofβ0andτu2itself increases. However, this does not introduce much bias in the estimation of the offspring allelic and NIMA parameters. Different ascertainment schemes were compared by contrasting scenarios 1-4 with scenarios 9-12. Bias in τ2
u andβ0 estimates increases when ascertainment is PjYij ≥ 2, as compared to P
jYij ≥ 1 while estimates of the offspring allelic and NIMA parameters remain unbiased, e.g. bias in scenario 9, forβ1 andβ2, is 1.9%and 5.7% respectively.
Next, we compare the two different likelihoods to model family/twin data in terms of efficiency, the P Lused in existing methods, with the approach we use in this article, theJ L. We define the percentage of efficiency improvement of likelihood A over B, for estimating a parameterβ, as EI= (1−V ar(βA)
V ar(βB))×100. Positive values
mean that likelihood A performs better. In Figure 2.1.a we plot the EI of theJ Lover theP L, for estimating the log odds ratios of the offspring allelic and NIMA effect. All values are positive, thus the J L is always more efficient. Improvement mainly depends on sample size and less on family size, e.g. EI is approximately the same in scenario 1 and 3 as compared to scenario 2. Moreover, improvement, due toJ L, is higher when information is limited, i.e. when families are small and ascertainment is P
jYij ≥2.
Last, we compared the performance of the J L when different data sources are available: ascertained families-only versus ascertained families and twins. In terms of likelihoods, we compare the J L in (2.1) with the CJ Lin (2.7). Efficiency im- provement of the families-only against the combined approach, with families and 100 twin pairs, is plotted in Figure 2.1.b. TheCJ Lapproach is more efficient under all scenarios studied. The percentage of improvement is similar across different values of variance of the random effects or ascertainment scheme. Nonetheless, improvement is noticeably high when the sample size of the nuclear family data is small. When the twin data set was added, we expected efficiency improvement for the offspring allelic effect, due to increased sample size. Interestingly, there was also efficiency improvement for the NIMA effect, which depends on the maternal genotype. The parameter estimates and their standard deviations, using theCJ L, are listed in Table A.2.1 of the Appendix.
In order to asses the performance of our method when both direct offspring and NIMA effects are under the null,β1=β2= 0, and cases in which there only exists a
direct offspring,β2= 0, or only a NIMA effect,β1 = 0, we simulated the scenarios
presented in Table 2.2 with the corresponding effect sizes. We first estimated the effects optimising the J L using only the families. Later, we added 100 twin pairs and optimized the CJ L. The estimated effect sizes remain unbiased. The results are listed in Tables A.2.2 and A.2.3 of the Appendix for theJ Land in Tables A.2.4, A.2.5 and A.2.6 of the Appendix for theCJ L.
Table 2.3: Summary statistics for parameter estimates of the J L (2.1) under the penetrance model (2.5) for each scenario described in Table 2.2. Each entry lists the mean estimates (standard deviation of estimates) over 1000 simulated data sets. JL: joint likelihood. True Values τu2= 1.5 β0=−3 β1=−.5 β2=−1 1 2.148 (2.538) -3.365 (1.460) -.477 (.349) -1.034 (.507) 2 1.584 (1.038) -3.049 (.579) -.501 (.319) -1.024 (.449) 3 1.571 (.771) -3.052 (.466) -.486 (.155) -.990 (.211) 4 1.543 (.411) -3.022 (.226) -.497 (.140) -1.000 (.197) τ2 u = 2.5 β0=−3 β1=−.5 β2=−1 5 3.517 (3.382) -3.476 (1.612) -.478 (.397) -1.016 (.541) 6 2.724 (1.662) -3.104 (.766) -.503 (.344) -1.022 (.469) 7 2.587 (1.152) -3.045 (.572) -.492 (.169) -1.001 (.231) 8 2.577 (.629) -3.033 (.290) -.504 (.149) -1.003 (.196) τ2 u = 1.5 β0=−3 β1=−.5 β2=−1 9 2.827 (3.001) -4.236 (2.864) -.519 (.265) -1.057 (.407) 10 1.980 (1.765) -3.408 (1.442) -.501 (.258) -1.020 (.386) 11 2.472 (2.290) -3.929 (2.194) -.499 (.120) -.999 (.173) 12 1.607 (.672) -3.091 (.560) -.497 (.112) -.994 (.167) τ2 u = 2.5 β0=−3 β1=−.5 β2=−1 13 3.468 (3.347) -3.673 (2.465) -.540 (.316) -1.056 (.459) 14 2.944 (2.032) -3.308 (1.341) -.501 (.299) -1.024 (.423) 15 3.524 (2.852) -3.778 (2.148) -.500 (.144) -1.016 (.205) 16 2.716 (1.084) -3.141 (.721) -.501 (.129) -.999 (.175)
2.3. SIMULATION STUDY 21 (a) ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 5 10 15 0 50 100 150 200 250 300 Scenarios Efficiency impro vement 1 2 3 4 6 7 8 9 11 12 13 14 16 ● OAE NIMA (b) ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 5 10 15 0 20 40 60 80 100 Scenarios Efficiency impro vement 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 ● OAE NIMA
Figure 2.1: Efficiency improvement (EI) of (a)J LagainstP Land (b)CJ Lof fami- lies and twins, against theJ Lfor families-only, compared for different family/sample size, ascertainment schemes and variance of random effect. (a) Values below zero represent no EI by using theJ Land values above zero represent EI of theJ Lagainst theP L. (b) Values below zero represent no EI by using theCJ Land values above zero represent EI of theCJ Lagainst theJ L. Each point represents the EI in each of the sixteen scenarios presented in Table 2.2. J L: joint likelihood, P L: prospective likelihood, andCJ L: combined joint likelihood, OAE: offspring allelic effect, NIMA: non-inherited maternal antigens effect.
The performance of our approach will vary across different frequencies of the protective allele. All the results presented above concern an allele frequency of .15, in order to mimic the allele frequency in the population we are studying. To study the performance of the method when allele frequency is lower, we also applied the CJ L to samples generated with a protective allele frequency of .05. As expected, the parameter estimates are more biased for small sample sizes. Larger samples are needed to obtain unbiased estimates. Results are listed in Table A.2.7 of the Appendix.