I. THEORETICAL FRAMEWORK
1. Older people in Chile: situation, progress and challenges
2.4. Etiology of the approaches to the construct
In Chapter 2, we consider a two-phase ODS design in a cohort study. A two-phase ODS sample consists of complete observations, drawn under the ODS scheme in the second phase, and incomplete observations in the first phase. To fix notation, let $Y$ denote a continuous outcome variable, $X$ a covariate vector, and $W$ a proxy measure for $X$. Figure 1.3 shows the two-phase ODS design in a cohort study under a linear model. As for the type of auxiliary information, our proposed method considers a continuous auxiliary variable, $W$, for a covariate of interest, whereas Weaver and Zhou (2005) considered a categorical auxiliary variable in their discussion section. We assume that the first phase consists of $N$ independent and identically distributed observations from the population. The domain of $Y$ is partitioned into three mutually exclusive intervals, $C_1 \cup C_2 \cup C_3 = (-\infty, c_1] \cup (c_1, c_3] \cup (c_3, \infty)$, where $c_1$ and $c_3$ are fixed constants. In the second phase, the ODS sample of size $n$ consists of three parts: an SRS sample of size $n_0$, a supplemental ODS sample of size $n_1$ from $C_1$, and another supplemental ODS sample of size $n_3$ from $C_3$. Thus, the two-phase ODS design in our study has the following data structure:

First phase: $\{Y_i, W_i : i = 1, \ldots, N\}$;
Second phase: SRS $\{Y_i, X_i, W_i : i = 1, \ldots, n_0\}$;
ODS from the left tail $\{Y_i, X_i, W_i \mid Y_i \le c_1 : i = 1, \ldots, n_1\}$;
ODS from the right tail $\{Y_i, X_i, W_i \mid Y_i \ge c_3 : i = 1, \ldots, n_3\}$.

The ODS sample in the second phase is a complete sample, whereas the rest of the observations in the population are incomplete, with the covariate missing. In measurement error terminology, $V$ denotes the validation sample set and $\bar{V}$ denotes the nonvalidation sample set.

Figure 1.3: Illustration for a two-phase ODS under a linear regression model
[Figure not reproduced. Labels: population; SRS; ODS; regions $Y \le c_1$, $c_1 < Y < c_3$, $Y \ge c_3$; legend: incomplete, complete.]
Figure 1.4: Conceptual illustration for a general two-phase ODS design
Let $n_V$ be the total size of the ODS sample, which consists of complete observations, and let $n_{\bar{V}} = N - n_V$ be the number of incomplete observations. Here $n_V = n_0 + n_1 + n_3$, where $n_0$ is the size of the SRS sample and $n_k$ denotes the size of the supplemental ODS sample from the $k$th interval. Figure 1.4 gives a graphical representation of the two-phase ODS design: the ellipses represent $V$, of size $n_V$, and the shaded area represents $\bar{V}$, of size $n_{\bar{V}}$.
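The sampling scheme described above can be sketched in a short simulation. This is only an illustration under stated assumptions, not the design used in the study: the data-generating model, the population size, the cutpoints (taken here as the 10th and 90th percentiles of the outcome), the sample sizes, and the function name `draw_two_phase_ods` are all hypothetical. The sketch also assumes that the supplemental tail samples are drawn without replacement from units not already selected by the SRS.

```python
import numpy as np


def draw_two_phase_ods(y, c1, c3, n0, n1, n3, rng):
    """Return index arrays for the SRS part and the two supplemental tail samples.

    Assumption: supplemental samples are drawn without replacement from
    units that were not already selected by the SRS.
    """
    N = y.shape[0]
    srs = rng.choice(N, size=n0, replace=False)
    remaining = np.setdiff1d(np.arange(N), srs)
    left_pool = remaining[y[remaining] <= c1]    # candidates with Y <= c1
    right_pool = remaining[y[remaining] >= c3]   # candidates with Y >= c3
    left = rng.choice(left_pool, size=n1, replace=False)
    right = rng.choice(right_pool, size=n3, replace=False)
    return srs, left, right


rng = np.random.default_rng(0)

# First phase: (Y, W) observed for all N subjects (hypothetical linear model).
N = 2000
x = rng.normal(size=N)                      # covariate of interest
w = x + rng.normal(scale=0.5, size=N)       # continuous proxy for X
y = 1.0 + 0.5 * x + rng.normal(size=N)      # continuous outcome

# Hypothetical cutpoints: 10th and 90th percentiles of Y.
c1, c3 = np.quantile(y, [0.1, 0.9])

# Second phase: X is measured only on the SRS and the two tail samples.
srs, left, right = draw_two_phase_ods(y, c1, c3, n0=200, n1=50, n3=50, rng=rng)
validation = np.concatenate([srs, left, right])

n_V = validation.size        # number of complete observations
n_Vbar = N - n_V             # number of incomplete observations

# X is treated as missing outside the validation set.
x_observed = np.full(N, np.nan)
x_observed[validation] = x[validation]
```

In this sketch the validation set has size n0 + n1 + n3, and every other subject keeps only the pair (Y, W), which mirrors the complete and incomplete parts of the data structure.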
We combine two methods: (1) a semiparametric empirical likelihood method for the complete observations; and (2) the updating method of Chen & Chen (2000) and Jiang & Zhou (2007), used to update the estimates from the ODS sample. With the complete ODS observations from the second phase, we consider two regression models: one for the relationship between the response and the covariates of interest, and one for the relationship between the response and the auxiliary variable. Without loss of generality, we consider a regression model for a covariate of interest and a continuous response variable,
$$Y = \beta_0 + \beta_1 X + e_x, \tag{1.8}$$
where the $\beta$'s denote regression parameters and $e_x \sim N(0, \sigma_x^2)$. On the other hand, we consider a regression model for the auxiliary variable,
$$Y = \gamma_0 + \gamma_1 W + e_w, \tag{1.9}$$
where the $\gamma$'s denote regression parameters and $e_w \sim N(0, \sigma_w^2)$. By applying the likelihood in Zhou et al. (2002) to the two regression models with respect to $\beta = (\beta_0, \beta_1)'$ and $\gamma = (\gamma_0, \gamma_1)'$, respectively, we have two likelihoods for the complete observations in the second phase:
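For the linear model in (1.8), with ODS samples that have the data structure $\{Y_i, X_i\}$, $i = 1, \ldots, n_V$,
$$L(\beta, G_X) = L_{SRS}(\beta, G_X) \cdot L_{ODS}(\beta, G_X) = \left[\prod_{i=1}^{n_0} f(y_{0i} \mid x_{0i})\, g_X(x_{0i})\right] \times \left[\prod_{k=1,3} \prod_{i=1}^{n_k} P(y_{ki}, x_{ki} \mid Y_i \in C_k)\right],$$
where $G_X$ and $g_X$ denote the cumulative distribution and density function of $X$. For the linear model in (1.9), with ODS samples that have the data structure $\{Y_i, W_i\}$, $i = 1, \ldots, n_V$,
$$L(\gamma, H_W) = L_{SRS}(\gamma, H_W) \cdot L_{ODS}(\gamma, H_W) = \left[\prod_{i=1}^{n_0} f(y_{0i} \mid w_{0i})\, h_W(w_{0i})\right] \times \left[\prod_{k=1,3} \prod_{i=1}^{n_k} P(y_{ki}, w_{ki} \mid Y_i \in C_k)\right],$$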
where $H_W$ and $h_W$ denote the cumulative distribution and density function of $W$. By the semiparametric empirical likelihood method, we obtain estimates $(\hat{\beta}, \hat{\gamma})$ of the true values $(\beta^*, \gamma^*)$ under some constraints that will be given in Chapter 3. Multivariate normal distribution theory provides the asymptotic distribution of $\sqrt{n_V}\,(\hat{\beta} - \beta^*, \hat{\gamma} - \gamma^*)$. Since we assume that the auxiliary variable and the response are observed for all subjects in the study population, a regression model for the population dataset is given as
$$Y = \gamma_0 + \gamma_1 W + e,$$
where the $\gamma$'s denote regression parameters and $e \sim N(0, \sigma^2)$. The estimate of $\gamma$ is obtained by maximum likelihood under the SRS scheme for the population sample. We will study how to update $\hat{\beta}$ using the updating algorithm of Chen & Chen (2000) and Jiang & Zhou (2007) under the two-phase ODS design. This approach has three advantages: it uses more of the information available in two-phase sampling, it yields more efficient estimators than those in Weaver and Zhou (2005), and it is computationally easy to apply with multiple covariates and auxiliary variables.
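To give a sense of what an updating step of this kind looks like, the following is a minimal sketch of a generic update of the form "second-phase estimate minus a correction proportional to the discrepancy between the two auxiliary-model estimates". It is an assumption about the general shape of such estimators, not the specific estimator developed in this work or in the cited papers, whose scaling and covariance estimation may differ. In the sketch, `beta_hat` and `gamma_hat` stand for the second-phase estimates of the parameters in (1.8) and (1.9), `gamma_bar` for the population estimate of the auxiliary-model parameters, and `sigma_12`, `sigma_22` for estimated covariance blocks that are assumed to be supplied. The function name `update_estimate` and all numbers are hypothetical.

```python
import numpy as np


def update_estimate(beta_hat, gamma_hat, gamma_bar, sigma_12, sigma_22):
    """Generic update: beta_hat - Sigma12 @ inv(Sigma22) @ (gamma_hat - gamma_bar).

    Assumption: sigma_12 is the estimated covariance between beta_hat and
    gamma_hat, and sigma_22 is the estimated covariance of gamma_hat, both
    from the second-phase sample. This is an illustrative form only.
    """
    discrepancy = gamma_hat - gamma_bar
    # Solve the linear system instead of forming an explicit inverse.
    correction = sigma_12 @ np.linalg.solve(sigma_22, discrepancy)
    return beta_hat - correction


# Hypothetical inputs, chosen only to show the mechanics.
beta_hat = np.array([1.0, 0.5])
gamma_hat = np.array([1.1, 0.4])
gamma_bar = np.array([1.0, 0.45])
sigma_12 = np.array([[0.5, 0.0], [0.0, 0.5]])
sigma_22 = np.eye(2)

beta_bar = update_estimate(beta_hat, gamma_hat, gamma_bar, sigma_12, sigma_22)
```

When the two auxiliary-model estimates agree, or when `sigma_12` is zero, the correction vanishes and the updated estimate equals `beta_hat`; the stronger the association between the two sets of estimates, the more the first-phase information shifts the result.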
Figure 1.5: Illustration for ODS with missing data under a linear regression model
[Figure not reproduced. Labels: population; SRS; ODS; regions $Y \le c_1$, $c_1 < Y < c_3$, $Y \ge c_3$.]
Figure 1.6: Conceptual illustration for the general ODS with missing data
1.5.2 An Estimated likelihood approach to a missing data under an Outcome-dependent