Instructions for use - Diagnosis of malnutrition


Consider $I$ items with feature vectors $\mathbf{x} \in \mathcal{X}$. The single-user approach to preference learning assumes an independent latent function for each of $U$ users, $g_u(\mathbf{x}, \mathbf{x}') : \mathcal{X}^2 \mapsto \mathbb{R}$. We approach the multi-user problem by assuming a common structure in these user latent functions. In particular, we assume a set of $D$ shared latent functions, $h_d(\mathbf{x}, \mathbf{x}') : \mathcal{X}^2 \mapsto \mathbb{R}$, where $D \ll U$. The user latent functions are generated using a linear combination of these shared functions,

\[ g_u(\mathbf{x}_i, \mathbf{x}_j) = \sum_{d=1}^{D} w_{u,d} \, h_d(\mathbf{x}_i, \mathbf{x}_j) , \tag{6.3} \]

where $w_{u,d} \in \mathbb{R}$ is the weight given to function $h_d$ for user $u$. We place a Gaussian process (GP) prior over the shared latent functions $h_d$ using the preference kernel described in the previous section. This allows different users' preferences to share some common structure, represented by the shared latent functions. This assumption results in a matrix-factorization form of dimensionality reduction, as is common in collaborative filtering.

We extend this model to the case where, for each user $u$, there is a feature vector $\mathbf{u}_u$ containing relevant information about the user. We denote the set of all the users' feature vectors as $\mathbf{U} = \{\mathbf{u}_1, \ldots, \mathbf{u}_U\}$. The user features are incorporated by placing a separate GP prior over each user's weights. That is, we replace the scalars $w_{u,d}$ in Equation (6.3) with functions $w'_d(\mathbf{u}_u)$. These weight functions describe the contribution of shared latent function $h_d$ to the user latent function $g_u$ as a function of the user feature vector $\mathbf{u}_u$.

In the multi-user setting we have a set of $P$ pairs of items evaluated by the users, where $P \le I(I - 1)/2$ (the maximum number of item pairs). Denote a preference judgement as $y_{u,i}$, for $i \in \{1, \ldots, P\}$, $u \in \{1, \ldots, U\}$, where $y_{u,i} = 1$ indicates that user $u$ prefers the first item of the $i$-th pair to the second, and $y_{u,i} = -1$ indicates the opposite. Denote the set of user/item-pair indices for which we have observed preference judgements as $\mathcal{D}$. The complete data consists of the set of feature vectors for the users $\mathbf{U}$ (if available), features for the items $\mathbf{X}$, and the preferences $\{y_{u,i}\}_{(u,i) \in \mathcal{D}}$.

6.2.1 Probabilistic Description of the Model

To predict preferences on unseen item pairs we cast the model into a probabilistic framework. Let $\mathbf{G}$ be a real-valued $U \times P$ 'user-function' matrix, where each row corresponds to a particular user's latent function. That is, the entry in the $u$-th row and $i$-th column is $g_{u,i} = g_u(\mathbf{x}_{\alpha(i)}, \mathbf{x}_{\beta(i)})$, where $\alpha(i)$ and $\beta(i)$ denote respectively the first and second item in the $i$-th pair. Let $\mathbf{H}$ be a $D \times P$ 'shared-function' matrix, where each row represents one shared latent function, that is, the entry in the $d$-th row and $i$-th column is $h_{d,i} = h_d(\mathbf{x}_{\alpha(i)}, \mathbf{x}_{\beta(i)})$. Finally, we introduce the $U \times D$ weight matrix $\mathbf{W}$, where each row contains a user's weights. The entry in the $u$-th row and $d$-th column is $w_{u,d} = w'_d(\mathbf{u}_u)$. Equation (6.3) can now be written as a matrix factorization $\mathbf{G} = \mathbf{W}\mathbf{H}$.
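As a minimal numerical sketch of this factorization, the NumPy snippet below builds toy matrices W and H and checks one entry of G against the sum in Equation (6.3). The sizes and the random values are illustrative assumptions and are not taken from the model itself.

```python
import numpy as np

rng = np.random.default_rng(0)
U, D, P = 5, 2, 4  # toy numbers of users, shared functions and item pairs

W = rng.standard_normal((U, D))  # W[u, d]: weight of shared function d for user u
H = rng.standard_normal((D, P))  # H[d, i]: shared function d evaluated on pair i

G = W @ H  # G[u, i]: latent function of user u evaluated on pair i

# Entry-wise check against the explicit sum over d in Equation (6.3).
u, i = 3, 1
g_ui = sum(W[u, d] * H[d, i] for d in range(D))
```

Because G is a product of a U x D and a D x P matrix, its rank is at most D, which is the dimensionality reduction referred to above.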

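Let $\mathbf{Y}$ be the $U \times P$ binary target matrix given by $\mathbf{Y} = \mathrm{sign}[\mathbf{G} + \mathbf{E}]$, where $\mathbf{E}$ is a $U \times P$ noise matrix with entries sampled i.i.d. from a standard Gaussian. The function "sign[·]" retains only the sign of the elements in a matrix. Let $\mathbf{Y}_{\mathcal{D}}$ and $\mathbf{G}_{\mathcal{D}}$ represent the elements of $\mathbf{Y}$ and $\mathbf{G}$ for which we have observed preferences. Then, the likelihood for $\mathbf{G}_{\mathcal{D}}$ given the observations $\mathbf{Y}_{\mathcal{D}}$, and the conditional distribution for $\mathbf{G}_{\mathcal{D}}$ given $\mathbf{H}$ and $\mathbf{W}$, are

\[ p(\mathbf{Y}_{\mathcal{D}} \mid \mathbf{G}_{\mathcal{D}}) = \prod_{(u,i) \in \mathcal{D}} \Phi(y_{u,i} \, g_{u,i}) , \qquad p(\mathbf{G}_{\mathcal{D}} \mid \mathbf{W}, \mathbf{H}) = \prod_{(u,i) \in \mathcal{D}} \delta(g_{u,i} - \mathbf{w}_u \mathbf{h}_{\cdot,i}) \]

respectively, where $\Phi$ is the standard Gaussian cumulative distribution function, $\mathbf{w}_u$ is the $u$-th row in $\mathbf{W}$, $\mathbf{h}_{\cdot,i}$ is the $i$-th column in $\mathbf{H}$ and $\delta$ is the Dirac delta function.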
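The generative step Y = sign[G + E] and the resulting probit likelihood can be sketched as follows. This is an illustrative sketch only: the helper names (`std_normal_cdf`, `sample_preferences`, `log_likelihood`), the toy matrix G and the observed index set are assumptions introduced here.

```python
import math
import numpy as np

def std_normal_cdf(z):
    """Standard Gaussian CDF, written with erfc from the standard library."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def sample_preferences(G, rng):
    """Draw Y = sign[G + E] with i.i.d. standard Gaussian noise E."""
    E = rng.standard_normal(G.shape)
    return np.sign(G + E)

def log_likelihood(Y, G, observed):
    """Sum of log Phi(y_ui * g_ui) over the observed (u, i) index pairs."""
    return sum(math.log(std_normal_cdf(Y[u, i] * G[u, i])) for (u, i) in observed)

rng = np.random.default_rng(0)
G = rng.standard_normal((3, 4))       # toy latent values, 3 users and 4 item pairs
Y = sample_preferences(G, rng)
observed = [(0, 1), (1, 3), (2, 0)]   # hypothetical set of observed (user, pair) indices
ll = log_likelihood(Y, G, observed)
```

Marginalizing the unit-variance noise is what turns the sign function into the Gaussian CDF: the probability that g + e is positive for standard Gaussian e is Phi(g).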

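We now select the priors for $\mathbf{W}$ and $\mathbf{H}$. We put GP priors on each function $w'_1, \ldots, w'_D$ with zero mean and some covariance function. Let $\mathbf{K}_{\text{users}}$ be the $U \times U$ covariance matrix for the entries in each column of $\mathbf{W}$. Then

\[ p(\mathbf{W} \mid \mathbf{U}) = \prod_{d=1}^{D} \mathcal{N}(\mathbf{w}_{\cdot,d}; \mathbf{0}, \mathbf{K}_{\text{users}}) , \tag{6.4} \]

where $\mathbf{w}_{\cdot,d}$ is the $d$-th column in $\mathbf{W}$. If user features are unavailable, we use independent standard Gaussian priors on each element in $\mathbf{W}$, so $\mathbf{K}_{\text{users}}$ becomes the identity matrix.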
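To make the prior in Equation (6.4) concrete, the sketch below draws a weight matrix W whose columns are independent samples from N(0, K_users). The text only asks for "some covariance function"; the squared-exponential kernel, the jitter term and the function names used here are assumptions chosen for illustration.

```python
import numpy as np

def se_kernel(A, B, lengthscale=1.0):
    """Squared-exponential covariance between the rows of A and the rows of B."""
    sq_dist = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * sq_dist / lengthscale**2)

def sample_W(K_users, D, rng, jitter=1e-8):
    """Draw a U x D matrix whose columns are i.i.d. samples from N(0, K_users)."""
    U = K_users.shape[0]
    L = np.linalg.cholesky(K_users + jitter * np.eye(U))  # K_users ~= L @ L.T
    return L @ rng.standard_normal((U, D))

rng = np.random.default_rng(0)
user_features = rng.standard_normal((6, 3))   # toy feature vectors for 6 users
K_users = se_kernel(user_features, user_features)
W = sample_W(K_users, D=2, rng=rng)

# Without user features the prior reduces to independent standard Gaussians.
W_no_features = sample_W(np.eye(6), D=2, rng=rng)
```

Under such a prior, users with similar feature vectors receive correlated weights, which is how information is shared between similar users.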

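Lastly, we put a GP prior on each shared latent function $h_1, \ldots, h_D$ with zero mean and the preference kernel as covariance function. Let $\mathbf{K}_{\text{items}}$ be the resulting $P \times P$ covariance matrix for the observed item pairs. The prior for $\mathbf{H}$ is

\[ p(\mathbf{H} \mid \mathbf{X}) = \prod_{d=1}^{D} \mathcal{N}(\mathbf{h}_d; \mathbf{0}, \mathbf{K}_{\text{items}}) , \tag{6.5} \]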

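where $\mathbf{h}_d$ is the $d$-th row in $\mathbf{H}$. The resulting posterior for the latent variables $\mathbf{W}$, $\mathbf{H}$ and $\mathbf{G}_{\mathcal{D}}$ is

\[ p(\mathbf{W}, \mathbf{H}, \mathbf{G}_{\mathcal{D}} \mid \mathbf{Y}_{\mathcal{D}}, \mathbf{X}, \mathbf{U}) = \frac{p(\mathbf{Y}_{\mathcal{D}} \mid \mathbf{G}_{\mathcal{D}}) \, p(\mathbf{G}_{\mathcal{D}} \mid \mathbf{W}, \mathbf{H}) \, p(\mathbf{W} \mid \mathbf{U}) \, p(\mathbf{H} \mid \mathbf{X})}{p(\mathbf{Y}_{\mathcal{D}} \mid \mathbf{X}, \mathbf{U})} , \tag{6.6} \]

where $p(\mathbf{Y}_{\mathcal{D}} \mid \mathbf{X}, \mathbf{U})$ is the (intractable) marginal likelihood, or model evidence.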

6.2.2 The Predictive Distribution

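Given a new item pair with index $P + 1$, we compute the predictive distribution for the preference of the $u$-th user on this pair by integrating over the posterior on parameters $\mathbf{H}$, $\mathbf{W}$ and $\mathbf{G}_{\mathcal{D}}$ as

\[ p(y_{u,P+1} \mid \mathbf{Y}_{\mathcal{D}}, \mathbf{X}, \mathbf{U}) = \int p(y_{u,P+1} \mid g_{u,P+1}) \, p(g_{u,P+1} \mid \mathbf{w}_u, \mathbf{h}_{\cdot,P+1}) \, p(\mathbf{h}_{\cdot,P+1} \mid \mathbf{H}, \mathbf{X}) \, p(\mathbf{H}, \mathbf{W}, \mathbf{G}_{\mathcal{D}} \mid \mathbf{Y}_{\mathcal{D}}, \mathbf{X}, \mathbf{U}) \, d\mathbf{H} \, d\mathbf{W} \, d\mathbf{G}_{\mathcal{D}} , \tag{6.7} \]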

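where

\[ p(y_{u,P+1} \mid g_{u,P+1}) = \Phi(y_{u,P+1} \, g_{u,P+1}) , \]

\[ p(g_{u,P+1} \mid \mathbf{w}_u, \mathbf{h}_{\cdot,P+1}) = \delta(g_{u,P+1} - \mathbf{w}_u \mathbf{h}_{\cdot,P+1}) , \]

\[ p(\mathbf{h}_{\cdot,P+1} \mid \mathbf{H}, \mathbf{X}) = \prod_{d=1}^{D} \mathcal{N}\left(h_{d,P+1}; \, \mathbf{k}_\star^\top \mathbf{K}_{\text{items}}^{-1} \mathbf{h}_d, \, k_\star - \mathbf{k}_\star^\top \mathbf{K}_{\text{items}}^{-1} \mathbf{k}_\star\right) . \]

Here the scalar $k_\star$ is the prior variance of $h_d(\mathbf{x}_{\alpha(P+1)}, \mathbf{x}_{\beta(P+1)})$ and $\mathbf{k}_\star$ is a $P$-dimensional vector that contains the prior covariances between $h_d(\mathbf{x}_{\alpha(P+1)}, \mathbf{x}_{\beta(P+1)})$ and $h_d(\mathbf{x}_{\alpha(1)}, \mathbf{x}_{\beta(1)}), \ldots, h_d(\mathbf{x}_{\alpha(P)}, \mathbf{x}_{\beta(P)})$. The posterior (6.6) and predictive distribution (6.7) are intractable, so approximations must be used. For this, we use a combination of expectation propagation (EP) and variational Bayes (VB).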
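The Gaussian conditional for a new item pair can be sketched numerically as below. This is not the EP and VB scheme mentioned above: it conditions on fixed, known values of H and of one user's weights (a plug-in simplification). The form of the preference kernel, the squared-exponential base kernel, the jitter and all function names are assumptions made for this illustration, since the kernel is defined outside this section.

```python
import math
import numpy as np

def se_kernel(a, b, lengthscale=1.0):
    """Assumed squared-exponential base kernel between two item feature vectors."""
    return math.exp(-0.5 * float(np.sum((a - b) ** 2)) / lengthscale**2)

def pref_kernel(pair1, pair2, X):
    """Assumed preference kernel between item pairs (i, j) and (k, l)."""
    (i, j), (k, l) = pair1, pair2
    return (se_kernel(X[i], X[k]) + se_kernel(X[j], X[l])
            - se_kernel(X[i], X[l]) - se_kernel(X[j], X[k]))

def conditional_new_pair(H, pairs, new_pair, X, jitter=1e-8):
    """Mean (one value per shared function) and common variance of h_{d,P+1} given H."""
    K_items = np.array([[pref_kernel(p, q, X) for q in pairs] for p in pairs])
    K_items += jitter * np.eye(len(pairs))
    k_vec = np.array([pref_kernel(p, new_pair, X) for p in pairs])
    k_star = pref_kernel(new_pair, new_pair, X)
    alpha = np.linalg.solve(K_items, k_vec)   # K_items^{-1} k_vec
    return H @ alpha, k_star - k_vec @ alpha

def plugin_predictive(w_u, mean, var):
    """P(y = 1) for fixed w_u: g is Gaussian, then the unit noise is marginalized."""
    m = float(w_u @ mean)
    v = max(float(var * (w_u @ w_u)), 0.0)
    return 0.5 * math.erfc(-(m / math.sqrt(1.0 + v)) / math.sqrt(2.0))

X = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]])  # toy item features
pairs = [(0, 1), (2, 3), (1, 2)]                                # observed item pairs
H = np.random.default_rng(0).standard_normal((2, len(pairs)))   # toy values of H
w_u = np.array([0.5, -1.0])                                     # toy weights, one user

mean, var = conditional_new_pair(H, pairs, (0, 3), X)
p_prefer_first = plugin_predictive(w_u, mean, var)
```

With this kernel, reversing the order of a pair flips the sign of the conditional mean, so a preference for one item over another is automatically consistent with the reversed comparison.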

In document Consumo de proteina y su relación con la herramienta de detección del riesgo de deterioro del estado nutricional y crecimiento (STRONGkids) en pacientes pediátricos del Hospital Icaza Bustamante. Guayaquil, 2017. (pages 39-42)