Consider $I$ items with feature vectors $\mathbf{x} \in \mathcal{X}$. The single-user approach to preference learning assumes an independent latent function for each of $U$ users, $g_u(\mathbf{x}, \mathbf{x}') : \mathcal{X}^2 \to \mathbb{R}$. We approach the multi-user problem by assuming a common structure in these user latent functions. In particular, we assume a set of $D$ shared latent functions, $h_d(\mathbf{x}, \mathbf{x}') : \mathcal{X}^2 \to \mathbb{R}$, where $D \ll U$. The user latent functions are generated using a linear combination of these shared functions,

\[
g_u(\mathbf{x}_i, \mathbf{x}_j) = \sum_{d=1}^{D} w_{u,d}\, h_d(\mathbf{x}_i, \mathbf{x}_j) , \tag{6.3}
\]

where $w_{u,d} \in \mathbb{R}$ is the weight given to function $h_d$ for user $u$. We place a GP prior over the shared latent functions $h_d$ using the preference kernel described in the previous section. This allows different users' preferences to share some common structure represented by the shared latent functions. This assumption results in a matrix-factorization form of dimensionality reduction, as is common in collaborative filtering.

We extend this model to the case where, for each user $u$, there is a feature vector $\mathbf{u}_u$ containing relevant information about the user. We denote the set of all the users' feature vectors as $\mathbf{U} = \{\mathbf{u}_1, \ldots, \mathbf{u}_U\}$. The user features are incorporated by placing a separate GP prior over each user's weights. That is, we replace the scalars $w_{u,d}$ in Equation (6.3) with functions $w'_d(\mathbf{u}_u)$. These weight functions describe the contribution of shared latent function $h_d$ to the user latent function $g_u$ as a function of the user feature vector $\mathbf{u}_u$.

In the multi-user setting we have a set of $P$ pairs of items evaluated by the users, where $P \le I(I-1)/2$ (the maximum number of item pairs). Denote a preference judgement as $y_{u,i}$, for $i \in \{1, \ldots, P\}$, $u \in \{1, \ldots, U\}$, where $y_{u,i} = 1$ indicates that user $u$ prefers the first item of pair $i$ to the second, and $y_{u,i} = -1$ indicates the opposite. We denote the set of user/item-pair indices for which we have observed preference judgements as $\mathcal{D}$. The complete data consists of the set of feature vectors for the users $\mathbf{U}$ (if available), features for the items $\mathbf{X}$, and the preferences $\{y_{u,i}\}_{(u,i) \in \mathcal{D}}$.
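The bookkeeping for item pairs and observed judgements can be sketched as follows. This is a minimal illustration with invented toy data, not the authors' data structures: the names `pairs`, `alpha`, `beta` and `observed` are hypothetical, indices are 0-based (the text uses 1-based), and the convention that +1 means "first item of the pair preferred" is an assumption.

```python
from itertools import combinations

I = 4  # number of items (toy value)
U = 3  # number of users (toy value)

# Enumerate all I(I-1)/2 unordered item pairs.
# Pair i consists of items alpha[i] (first) and beta[i] (second).
pairs = list(combinations(range(I), 2))
alpha = [a for a, _ in pairs]
beta = [b for _, b in pairs]
P = len(pairs)

# Observed judgements, keyed by (user u, pair index i).
# Values are +1 or -1; most (u, i) combinations are unobserved.
observed = {
    (0, 0): +1,
    (0, 3): -1,
    (1, 0): -1,
    (2, 5): +1,
}

# The index set of observed user/pair combinations.
index_set = sorted(observed)
```

Storing only the observed entries keeps memory proportional to the number of judgements rather than to $U \times P$.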

6.2.1 Probabilistic Description of the Model

To predict preferences on unseen item pairs we cast the model into a probabilistic framework. Let $\mathbf{G}$ be a real-valued $U \times P$ ‘user-function’ matrix, where each row corresponds to a particular user's latent function. That is, the entry in the $u$-th row and $i$-th column is $g_{u,i} = g_u(\mathbf{x}_{\alpha(i)}, \mathbf{x}_{\beta(i)})$, where $\alpha(i)$ and $\beta(i)$ denote respectively the first and second item in the $i$-th pair. Let $\mathbf{H}$ be a $D \times P$ ‘shared-function’ matrix, where each row corresponds to one shared latent function, that is, the entry in the $d$-th row and $i$-th column is $h_{d,i} = h_d(\mathbf{x}_{\alpha(i)}, \mathbf{x}_{\beta(i)})$. Finally, we introduce the $U \times D$ weight matrix $\mathbf{W}$, where each row contains a user's weights. The entry in the $u$-th row and $d$-th column is $w_{u,d} = w'_d(\mathbf{u}_u)$. Equation (6.3) can now be written as a matrix factorization $\mathbf{G} = \mathbf{W}\mathbf{H}$.
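As a quick numerical check of the factorization, the following sketch builds random stand-ins for $\mathbf{W}$ and $\mathbf{H}$ and confirms that one entry of $\mathbf{W}\mathbf{H}$ equals the sum in Equation (6.3). The sizes and the random values are arbitrary assumptions chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
U, D, P = 5, 2, 6  # toy sizes with D < U

W = rng.normal(size=(U, D))  # row u holds the D weights of user u
H = rng.normal(size=(D, P))  # row d holds shared function d at the P pairs

G = W @ H  # U x P user-function matrix

# Entry (u, i) of G is the linear combination in Equation (6.3).
u, i = 3, 4
g_ui = sum(W[u, d] * H[d, i] for d in range(D))

# The factorization limits the rank of G to at most D.
rank = np.linalg.matrix_rank(G)
```

The rank bound is what makes the model a dimensionality reduction: all $U$ user functions lie in the span of only $D$ shared functions.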

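Let $\mathbf{Y}$ be the $U \times P$ binary target matrix given by $\mathbf{Y} = \operatorname{sign}[\mathbf{G} + \mathbf{E}]$, where $\mathbf{E}$ is a $U \times P$ noise matrix with entries sampled i.i.d. from a standard Gaussian. The function “sign[·]” retains only the sign of the elements in a matrix. Let $\mathbf{Y}_{\mathcal{D}}$ and $\mathbf{G}_{\mathcal{D}}$ represent the elements of $\mathbf{Y}$ and $\mathbf{G}$ for which we have observed preferences. Then the likelihood for $\mathbf{G}_{\mathcal{D}}$ given the observations $\mathbf{Y}_{\mathcal{D}}$, and the conditional distribution for $\mathbf{G}_{\mathcal{D}}$ given $\mathbf{H}$ and $\mathbf{W}$, are

\[
p(\mathbf{Y}_{\mathcal{D}} \mid \mathbf{G}_{\mathcal{D}}) = \prod_{(u,i) \in \mathcal{D}} \Phi(y_{u,i}\, g_{u,i}) , \qquad
p(\mathbf{G}_{\mathcal{D}} \mid \mathbf{W}, \mathbf{H}) = \prod_{(u,i) \in \mathcal{D}} \delta(g_{u,i} - \mathbf{w}_u \mathbf{h}_{\cdot,i})
\]

respectively, where $\mathbf{w}_u$ is the $u$-th row in $\mathbf{W}$, $\mathbf{h}_{\cdot,i}$ is the $i$-th column in $\mathbf{H}$, $\Phi$ is the standard Gaussian cumulative distribution function and $\delta$ is the Dirac delta function.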
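The noise model and the resulting probit likelihood can be illustrated with a short simulation. This is a sketch under stated assumptions: the latent matrix is random rather than produced by the model, the helper names `std_normal_cdf` and `log_likelihood` are hypothetical, and the observed index list is invented.

```python
import math
import numpy as np


def std_normal_cdf(z):
    """Standard Gaussian CDF, computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def log_likelihood(Y, G, observed):
    """Sum of log Phi(y_ui * g_ui) over the observed (u, i) index pairs."""
    return sum(math.log(std_normal_cdf(Y[u, i] * G[u, i])) for u, i in observed)


rng = np.random.default_rng(1)
U, P = 3, 4
G = rng.normal(size=(U, P))           # stand-in latent values
E = rng.standard_normal(size=(U, P))  # i.i.d. standard Gaussian noise

# sign[G + E]; an exact zero has probability zero and is mapped to +1 here.
Y = np.where(G + E >= 0, 1, -1)

observed = [(0, 0), (0, 2), (1, 3), (2, 1)]  # toy index set
ll = log_likelihood(Y, G, observed)
```

Because the noise is standard Gaussian, the probability that $\operatorname{sign}[g + e]$ equals $y$ is $\Phi(y\,g)$, which is why the likelihood factorizes into probit terms.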

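We now select the priors for $\mathbf{W}$ and $\mathbf{H}$. We put GP priors on each weight function $w'_1, \ldots, w'_D$ with zero mean and some covariance function. Let $\mathbf{K}_{\text{users}}$ be the $U \times U$ covariance matrix for the entries in each column of $\mathbf{W}$. Then

\[
p(\mathbf{W} \mid \mathbf{U}) = \prod_{d=1}^{D} \mathcal{N}(\mathbf{w}_{\cdot,d};\, \mathbf{0},\, \mathbf{K}_{\text{users}}) , \tag{6.4}
\]

where $\mathbf{w}_{\cdot,d}$ is the $d$-th column in $\mathbf{W}$. If user features are unavailable, we use independent standard Gaussian priors on each element in $\mathbf{W}$, so $\mathbf{K}_{\text{users}}$ becomes the identity matrix.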

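Lastly, we put a GP prior on each shared latent function $h_1, \ldots, h_D$ with zero mean and the preference kernel as covariance function. Let $\mathbf{K}_{\text{items}}$ be the $P \times P$ covariance matrix for the observed item pairs. The prior for $\mathbf{H}$ is

\[
p(\mathbf{H} \mid \mathbf{X}) = \prod_{d=1}^{D} \mathcal{N}(\mathbf{h}_d;\, \mathbf{0},\, \mathbf{K}_{\text{items}}) , \tag{6.5}
\]

where $\mathbf{h}_d$ is the $d$-th row in $\mathbf{H}$. The resulting posterior for the latent variables $\mathbf{W}$, $\mathbf{H}$ and $\mathbf{G}_{\mathcal{D}}$ is

\[
p(\mathbf{W}, \mathbf{H}, \mathbf{G}_{\mathcal{D}} \mid \mathbf{Y}_{\mathcal{D}}, \mathbf{X}, \mathbf{U}) =
\frac{p(\mathbf{Y}_{\mathcal{D}} \mid \mathbf{G}_{\mathcal{D}})\, p(\mathbf{G}_{\mathcal{D}} \mid \mathbf{W}, \mathbf{H})\, p(\mathbf{W} \mid \mathbf{U})\, p(\mathbf{H} \mid \mathbf{X})}{p(\mathbf{Y}_{\mathcal{D}} \mid \mathbf{X}, \mathbf{U})} , \tag{6.6}
\]

where $p(\mathbf{Y}_{\mathcal{D}} \mid \mathbf{X}, \mathbf{U})$ is the (intractable) marginal likelihood, or model evidence.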
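Drawing from the priors (6.4) and (6.5) gives a feel for the generative side of the model. The sketch below rests on assumptions that this section does not spell out: a squared-exponential base kernel on both item and user features, and a preference kernel of the form $k(\mathbf{x}_i, \mathbf{x}_k) + k(\mathbf{x}_j, \mathbf{x}_l) - k(\mathbf{x}_i, \mathbf{x}_l) - k(\mathbf{x}_j, \mathbf{x}_k)$ for the pairs $(\mathbf{x}_i, \mathbf{x}_j)$ and $(\mathbf{x}_k, \mathbf{x}_l)$. The function names and all feature values are hypothetical.

```python
from itertools import combinations

import numpy as np


def rbf(A, B, lengthscale=1.0):
    """Squared-exponential kernel between the rows of A and the rows of B."""
    sq_dist = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * sq_dist / lengthscale**2)


def preference_kernel(X, alpha, beta):
    """P x P covariance over item pairs (alpha[i], beta[i]) (assumed form)."""
    K = rbf(X, X)
    a, b = np.asarray(alpha), np.asarray(beta)
    return K[np.ix_(a, a)] + K[np.ix_(b, b)] - K[np.ix_(a, b)] - K[np.ix_(b, a)]


rng = np.random.default_rng(2)
I, U, D = 4, 5, 2
X = rng.normal(size=(I, 3))           # toy item features
user_feats = rng.normal(size=(U, 2))  # toy user features

pairs = list(combinations(range(I), 2))
alpha, beta = zip(*pairs)
P = len(pairs)

K_items = preference_kernel(X, alpha, beta)
K_users = rbf(user_feats, user_feats)

# Each row of H is one draw from N(0, K_items), as in (6.5).
H = rng.multivariate_normal(np.zeros(P), K_items, size=D)
# Each column of W is one draw from N(0, K_users), as in (6.4).
W = rng.multivariate_normal(np.zeros(U), K_users, size=D).T

G = W @ H  # latent user functions implied by the sampled W and H
```

With all $I(I-1)/2$ pairs included, this `K_items` is rank-deficient, so the sketch samples with NumPy's default SVD-based method rather than a Cholesky factorization.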

6.2.2 The Predictive Distribution

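Given a new item pair with index $P+1$, we compute the predictive distribution for the preference of the $u$-th user on this pair by integrating over the posterior on parameters $\mathbf{H}$, $\mathbf{W}$ and $\mathbf{G}_{\mathcal{D}}$ as

\[
p(y_{u,P+1} \mid \mathbf{Y}_{\mathcal{D}}, \mathbf{X}, \mathbf{U}) = \int p(y_{u,P+1} \mid g_{u,P+1})\, p(g_{u,P+1} \mid \mathbf{w}_u, \mathbf{h}_{\cdot,P+1})\, p(\mathbf{h}_{\cdot,P+1} \mid \mathbf{H}, \mathbf{X})\, p(\mathbf{H}, \mathbf{W}, \mathbf{G}_{\mathcal{D}} \mid \mathbf{Y}_{\mathcal{D}}, \mathbf{X}, \mathbf{U}) \, d\mathbf{H}\, d\mathbf{W}\, d\mathbf{G}_{\mathcal{D}} , \tag{6.7}
\]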

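where

\[
\begin{aligned}
p(y_{u,P+1} \mid g_{u,P+1}) &= \Phi(y_{u,P+1}\, g_{u,P+1}) , \\
p(g_{u,P+1} \mid \mathbf{w}_u, \mathbf{h}_{\cdot,P+1}) &= \delta(g_{u,P+1} - \mathbf{w}_u \mathbf{h}_{\cdot,P+1}) , \\
p(\mathbf{h}_{\cdot,P+1} \mid \mathbf{H}, \mathbf{X}) &= \prod_{d=1}^{D} \mathcal{N}\!\left(h_{d,P+1};\, \mathbf{k}_\star^\top \mathbf{K}_{\text{items}}^{-1} \mathbf{h}_d,\; k_\star - \mathbf{k}_\star^\top \mathbf{K}_{\text{items}}^{-1} \mathbf{k}_\star\right) .
\end{aligned}
\]

Here $k_\star$ is the prior variance of $h_d(\mathbf{x}_{\alpha(P+1)}, \mathbf{x}_{\beta(P+1)})$ and $\mathbf{k}_\star$ is a $P$-dimensional vector that contains the prior covariances between $h_d(\mathbf{x}_{\alpha(P+1)}, \mathbf{x}_{\beta(P+1)})$ and $h_d(\mathbf{x}_{\alpha(1)}, \mathbf{x}_{\beta(1)}), \ldots, h_d(\mathbf{x}_{\alpha(P)}, \mathbf{x}_{\beta(P)})$. The posterior (6.6) and predictive distribution (6.7) are intractable, so approximations must be used. For this, we use a combination of expectation propagation (EP) and variational Bayes (VB).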
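To make the Gaussian conditional for $h_{d,P+1}$ concrete, the sketch below computes its mean and variance and then forms a simple plug-in prediction that holds $\mathbf{w}_u$ and $\mathbf{H}$ fixed at given values. This plug-in is a simplification introduced here for illustration, not the EP and VB approximation used in the text. It relies on the identity $\int \Phi(g)\, \mathcal{N}(g; m, v)\, dg = \Phi(m / \sqrt{1 + v})$. The function names are hypothetical, and a random positive-definite matrix stands in for the preference-kernel covariance.

```python
import math

import numpy as np


def gp_conditional(K_items, k_vec, k_star, h_d, jitter=1e-8):
    """Mean and variance of h_{d,P+1} given the observed row h_d."""
    K = K_items + jitter * np.eye(len(h_d))  # small jitter for stability
    mean = k_vec @ np.linalg.solve(K, h_d)
    var = k_star - k_vec @ np.linalg.solve(K, k_vec)
    return mean, max(var, 0.0)  # clip tiny negative round-off


def plugin_predictive(w_u, H, K_items, k_vec, k_star):
    """P(y = +1) on the new pair for fixed w_u and H (plug-in, not EP/VB)."""
    stats = [gp_conditional(K_items, k_vec, k_star, h_d) for h_d in H]
    # g = sum_d w_d * h_{d,P+1} is Gaussian with this mean and variance.
    m = sum(w * mu for w, (mu, _) in zip(w_u, stats))
    v = sum(w**2 * var for w, (_, var) in zip(w_u, stats))
    z = m / math.sqrt(1.0 + v)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


rng = np.random.default_rng(3)
P, D = 4, 2

# Toy joint covariance over the P observed pairs plus the new pair.
A = rng.normal(size=(P + 1, P + 3))
K_full = A @ A.T / (P + 3)
K_items, k_vec, k_star = K_full[:P, :P], K_full[:P, P], K_full[P, P]

H = rng.normal(size=(D, P))   # stand-in values for the shared functions
w_u = np.array([0.7, -0.3])   # stand-in weights for user u

p_plus = plugin_predictive(w_u, H, K_items, k_vec, k_star)
```

A full treatment would average this quantity over the approximate posterior on $\mathbf{W}$ and $\mathbf{H}$ instead of fixing them.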
