To meet the practical need described above and to handle randomly distributed sensitive points, we propose the contaminated sparse functional GLM. We assume the data consist of pairs $(X_i, Y_i)$, $i = 1, \ldots, n$, which are i.i.d. replicates of $(X, Y)$, where $Y$ is a scalar response that may be non-Gaussian and $X$ is a stochastic process on $[0,1]$. The contaminated sparse functional GLM is structured as follows.

\[ Y \mid X \sim Q_\lambda, \qquad \lambda = \alpha + \beta X(\tau), \tag{6.2.1} \]

\[ \tau \sim \theta + W, \quad \text{given } \theta + W \in [0, 1]. \tag{6.2.2} \]

Here $Y$ is a scalar response, $X$ is a functional predictor, and $Q_\lambda$ is a distribution from the exponential family defined as before. The parameter $\theta$ is the contamination-free sensitive point that we want to estimate, and $W$ is a random contamination that is independent of $X$, with a known density $p_W(\cdot)$ that is smooth, unimodal, and symmetric about 0; the Gaussian density is one example. Since the contaminated sensitive point $\tau$ cannot lie outside $[0,1]$, we assume its distribution is that of $\theta + W$ truncated to $[0,1]$. Thus the conditional density of $Y$ given $X$ and $\tau$ depends only on $\alpha$ and $\beta$ and is denoted $p_{\alpha,\beta}(Y \mid X, \tau)$, the density of $\tau$ depends only on $\theta$ and is denoted $p_\theta(\tau)$, and the contaminated sparse functional GLM is indexed by $\eta = (\alpha, \beta, \theta)$. From these assumptions, we have

\[ p_{\alpha,\beta}(y \mid X, \tau) = \exp(\lambda y - \psi(\lambda)), \tag{6.2.3} \]

\[ p_\theta(x) = \frac{p_W(x - \theta)\, \mathbb{1}_{[0,1]}(x)}{\int_0^1 p_W(x - \theta)\, dx}. \tag{6.2.4} \]
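The truncated density in (6.2.4) can be checked numerically. The following is a minimal sketch, assuming a Gaussian contamination density with a hypothetical standard deviation `sigma`; the function names and the midpoint-rule integrator are illustrative choices, not part of the proposed method.

```python
import math


def gaussian_pdf(w, sigma):
    """Density of N(0, sigma^2); stands in for the known density p_W (assumption)."""
    return math.exp(-0.5 * (w / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def gaussian_cdf(w, sigma):
    """Distribution function of N(0, sigma^2)."""
    return 0.5 * (1.0 + math.erf(w / (sigma * math.sqrt(2.0))))


def p_theta(x, theta, sigma):
    """Density of tau: the law of theta + W truncated to [0, 1], as in (6.2.4)."""
    if x < 0.0 or x > 1.0:
        return 0.0  # indicator of [0, 1]
    # Normalizing constant: integral of p_W(x - theta) over [0, 1].
    mass = gaussian_cdf(1.0 - theta, sigma) - gaussian_cdf(-theta, sigma)
    return gaussian_pdf(x - theta, sigma) / mass


def integrate_unit_interval(f, n_grid=2000):
    """Midpoint rule on [0, 1]."""
    h = 1.0 / n_grid
    return h * sum(f((k + 0.5) * h) for k in range(n_grid))


# The truncated density should integrate to approximately one over [0, 1].
total = integrate_unit_interval(lambda x: p_theta(x, theta=0.3, sigma=0.1))
```

Because $p_W$ is unimodal and symmetric about 0, the truncated density still peaks at $\theta$ whenever $\theta$ lies in $[0,1]$; only the normalizing constant changes with $\theta$.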

Inspired by the estimation procedure for the sparse functional GLM, we propose the maximum likelihood estimator of η in the contaminated sparse functional GLM, given by

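\[ \hat{\eta}_n = (\hat{\alpha}_n, \hat{\beta}_n, \hat{\theta}_n) = \arg\max_{\eta} M_n(\alpha, \beta, \theta), \tag{6.2.5} \]

where the log-likelihood is $M_n(\alpha, \beta, \theta) = \mathbb{P}_n m(\alpha, \beta, \theta)$,

\[ m(\alpha, \beta, \theta) = \log \int_0^1 p_{\alpha,\beta}(Y \mid X, \tau)\, p_\theta(\tau)\, d\tau, \tag{6.2.6} \]

and $\mathbb{P}_n$ is the empirical distribution of the data $(X, Y)$.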
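To make the criterion in (6.2.5) and (6.2.6) concrete, the sketch below evaluates $m$ for a single observation by a midpoint rule and averages it over the sample, which is what applying the empirical measure to $m$ amounts to. It rests on several assumptions that are not taken from the text: a Bernoulli response with $\psi(\lambda) = \log(1 + e^{\lambda})$, a Gaussian $p_W$ with hypothetical standard deviation `sigma`, and fully observed trajectories supplied as Python callables. It is an illustration of the objective, not the estimation procedure itself.

```python
import math


def psi(lam):
    """Cumulant function of the Bernoulli family, log(1 + exp(lam)), computed stably."""
    return max(lam, 0.0) + math.log1p(math.exp(-abs(lam)))


def p_w(w, sigma):
    """Gaussian stand-in for the contamination density p_W (assumption)."""
    return math.exp(-0.5 * (w / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def m(alpha, beta, theta, x_fun, y, sigma=0.1, n_grid=400):
    """Approximate log of the integral in (6.2.6) for one pair (X, Y)."""
    h = 1.0 / n_grid
    taus = [(k + 0.5) * h for k in range(n_grid)]
    weights = [p_w(t - theta, sigma) for t in taus]
    norm = h * sum(weights)  # approximates the denominator of (6.2.4)
    num = 0.0
    for t, w in zip(taus, weights):
        lam = alpha + beta * x_fun(t)              # linear predictor in (6.2.1)
        num += math.exp(lam * y - psi(lam)) * w    # density (6.2.3) times p_W
    return math.log(h * num / norm)


def M_n(alpha, beta, theta, data, **kwargs):
    """Sample average of m over the pairs (x_fun, y) in data."""
    return sum(m(alpha, beta, theta, x, y, **kwargs) for x, y in data) / len(data)


# Two hypothetical trajectories with binary responses.
data = [(lambda t: math.sin(2.0 * math.pi * t), 1), (lambda t: t ** 2, 0)]
value = M_n(0.0, 1.0, 0.3, data)
```

When $\beta = 0$ the integrand no longer depends on $\tau$, so $m$ reduces to $\alpha y - \psi(\alpha)$ for every $\theta$. This gives a quick sanity check on any implementation of the integral.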

6.2.1 Connection to generalized latent variable models

We notice that the contaminated sparse functional GLM has some similarity to latent variable models, since the contaminated sensitive points $\tau_i$, $i = 1, \ldots, n$, are unobservable in reality. In fact, generalized latent variable models have long been established and applied to multiple aspects of health sciences, for example repeated measures, measurement error and multilevel modeling (Skrondal and Rabe-Hesketh, 2003; Huber et al., 2004; Stefanski, 2000). A comprehensive survey can be found in Skrondal and Rabe-Hesketh (2004).

Depending on the context, latent variables can be defined in different ways. In general, they are random variables whose realizations are hidden from us. As Skrondal and Rabe-Hesketh (2004) describe, latent variables can represent many phenomena, such as 'true' variables measured with error, hypothetical constructs, unobserved heterogeneity, missing data, counterfactuals or potential outcomes, and latent responses underlying categorical variables. In particular, measurement error models combined with regression models can be used to avoid diluted regression effects when a covariate has been measured with error. It is well known that the naive estimate of the slope in a simple linear regression is biased when the predictor is measured with error (Stefanski, 2000).
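The diluted regression effect can be illustrated with a small simulation. The sketch below assumes classical additive measurement error, $W = X + U$ with $U$ independent of $X$, in which case the naive least-squares slope converges to $\beta \sigma_X^2 / (\sigma_X^2 + \sigma_U^2)$ rather than to $\beta$. All numerical settings are hypothetical and chosen only for illustration.

```python
import random


def ols_slope(xs, ys):
    """Least-squares slope of ys on xs in a simple linear regression."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


rng = random.Random(0)  # fixed seed so the run is reproducible
n, beta = 20000, 2.0

x = [rng.gauss(0.0, 1.0) for _ in range(n)]         # true predictor, variance 1
w = [xi + rng.gauss(0.0, 1.0) for xi in x]          # observed with error, variance 1
y = [beta * xi + rng.gauss(0.0, 0.5) for xi in x]   # response depends on the true x

slope_true = ols_slope(x, y)    # regression on the error-free predictor
slope_naive = ols_slope(w, y)   # regression on the mismeasured predictor
```

With equal variances for $X$ and $U$, the attenuation factor is $1/2$, so `slope_naive` should be close to half of `beta` while `slope_true` stays close to `beta`.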

Another class of widely used models is mixed effects models or multilevel regression models. Multilevel data arise when units are nested in clusters, for example siblings in the same family. Repeated measurements taken on the same subject can be viewed as clustered data too. The units belonging to the same cluster share the same cluster-specific influences, but these influences cannot all be modeled as covariates in that we often have limited knowledge regarding relevant covariates and our data set may furthermore lack information on these covariates. As a result there is cluster-level unobserved heterogeneity leading to correlation between responses for units in the same cluster, after conditioning on covariates. Unobserved heterogeneity is modeled by including random effects in a multilevel regression model.

In addition to linear regression models, where the outcome variable is assumed to be continuous and often Gaussian, generalized latent variable models have been employed to cope with non-Gaussian variables, including dichotomous, grouped, censored, ordinal, unordered polytomous or nominal, pairwise comparisons, rankings or permutations, counts, and durations or survival responses. For example, generalized linear mixed models combine mixed effects models with generalized linear models to incorporate non-Gaussian responses:

\[ E(Y \mid \nu) = \mu, \qquad g(\mu) = \nu = X'\beta + \sum_{l=2}^{L} z^{(l)\prime} \zeta^{(l)}, \]

where $g(\cdot)$ is a link function, $\beta$ is the vector of fixed effects, and $z^{(l)}$ is an $M_l$-dimensional vector of explanatory variables with random coefficients $\zeta^{(l)}$ at level $l$.
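As a small worked instance of the linear predictor above, the sketch below computes $\nu$ as the fixed part plus one inner product per level and maps it to a mean through an inverse link. The logit link and the two-level random-intercept configuration are assumptions made for illustration, and the function names are hypothetical.

```python
import math


def linear_predictor(x, beta, z_levels, zeta_levels):
    """Fixed part x'beta plus the sum over levels of z'zeta."""
    fixed = sum(xk * bk for xk, bk in zip(x, beta))
    random_part = sum(
        sum(zk * ck for zk, ck in zip(z, zeta))
        for z, zeta in zip(z_levels, zeta_levels)
    )
    return fixed + random_part


def inverse_logit(nu):
    """Inverse of the logit link, mapping nu to a mean in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-nu))


# Two fixed effects and a single random intercept at level 2.
nu = linear_predictor(
    x=[1.0, 2.0],
    beta=[0.5, -1.0],
    z_levels=[[1.0]],        # z at level 2: a constant, so zeta acts as an intercept
    zeta_levels=[[0.3]],     # one realized cluster-specific effect
)
mu = inverse_logit(nu)
```

Units in the same cluster share the same realized $\zeta^{(l)}$, which is what induces correlation between their responses after conditioning on covariates.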

The generalized linear mixed model is clearly similar to our proposed model (6.2.1): both are derived from the GLM framework and have random variables in the linear predictor. However, in our case the latent variable is the argument of a random function, whereas in the latent variable literature the relation between the response and the latent variable is characterized by a fixed functional form. This complicates the theoretical and numerical study of the contaminated sparse functional GLM. For example, we may not be able to directly use a Newton-Raphson type optimization scheme to obtain the maximum likelihood estimate, since we do not know the functional form of the predictor trajectories.