
The aggregating property of the DP makes it particularly effective for clustering problems. In fact, arguably the most famous application of the DP is the Dirichlet Process Mixture (DPM) model (Lo [1984], Escobar and West [1995]), a class of models that can be expressed hierarchically as follows:

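$$
\begin{aligned}
y_1, \dots, y_n \mid \theta_1, \dots, \theta_n &\overset{\text{ind}}{\sim} p(y_i \mid \theta_i) \\
\theta_1, \dots, \theta_n \mid G &\overset{\text{iid}}{\sim} G \\
G &\sim \mathrm{DP}(\alpha, G_0).
\end{aligned}
\tag{1.3.1}
$$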

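We write $\overset{\text{ind}}{\sim}$ to mean independently distributed. This model assumes individual-level parameters $\theta_i$ for $i = 1, \dots, n$, but the discreteness of the DP-distributed prior $G$ implies that the vector $\theta = (\theta_1, \dots, \theta_n)$ can be rewritten in terms of its unique values $\theta^* = (\theta_1^*, \dots, \theta_k^*)$ and the membership indicators $s = (s_1, \dots, s_n)$ (using the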

1.3. Dirichlet Process Mixture Models 37

notation introduced for the BMU in Section 1.2.1). The latter defines a partition of the observations whose sets can be interpreted as clusters of individuals.
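The clustering induced by the discreteness of G can be illustrated by simulating from the hierarchical model with G integrated out. The following is a minimal sketch, assuming the BMU of Section 1.2.1 has the usual form (an existing value is reused with probability proportional to its count, a fresh draw from G0 is made with probability proportional to α). The Normal kernel, the Normal base measure and the names `sample_dpm_via_urn`, `base_sd` and `kernel_sd` are illustrative assumptions, not taken from the text.

```python
import numpy as np

def sample_dpm_via_urn(n, alpha, rng, base_sd=5.0, kernel_sd=1.0):
    """Draw (y, theta, s) from a DPM with G integrated out.

    Assumptions (not from the text): G0 = N(0, base_sd^2) and
    p(y | theta) = N(theta, kernel_sd^2).
    """
    unique_values = []          # theta*_1, ..., theta*_k
    counts = []                 # n_1, ..., n_k
    s = np.empty(n, dtype=int)  # membership indicators s_1, ..., s_n
    for i in range(n):
        # Existing cluster j has weight n_j, a new cluster has weight alpha.
        weights = np.array(counts + [alpha], dtype=float)
        j = rng.choice(len(weights), p=weights / weights.sum())
        if j == len(unique_values):
            unique_values.append(rng.normal(0.0, base_sd))  # new value from G0
            counts.append(1)
        else:
            counts[j] += 1
        s[i] = j
    theta = np.array(unique_values)[s]  # theta_i = theta*_{s_i}
    y = rng.normal(theta, kernel_sd)    # y_i | theta_i ~ N(theta_i, kernel_sd^2)
    return y, theta, s

rng = np.random.default_rng(0)
y, theta, s = sample_dpm_via_urn(n=200, alpha=1.0, rng=rng)
k = len(np.unique(s))  # number of clusters in this draw
```

The indicators `s` returned here encode the partition of the observations, and `k` is typically much smaller than `n` when α is small.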

In Figure 1.2 we display three sets of (truncated) infinite mixtures of Normal densities (one per panel) generated after sampling five independent realisations of G. We set the latter to be distributed as a DP with α = {1, 10, 100} respectively in each panel. As we would expect, α = 1 gives a priori large probability to a small number of components of the mixture. On the contrary, α = 100 gives high probability to a large number of components.
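One way to quantify the effect of α is the prior expected number of distinct clusters among n observations, which for a DP is the standard sum of α/(α + i − 1) over i = 1, ..., n. The short sketch below evaluates it for the three values of α used in the figure; the function name `expected_num_clusters` and the choice n = 100 are illustrative assumptions.

```python
import numpy as np

def expected_num_clusters(alpha, n):
    """Prior mean of the number of clusters among n draws from a DP."""
    i = np.arange(1, n + 1)
    return float(np.sum(alpha / (alpha + i - 1)))

for alpha in (1.0, 10.0, 100.0):
    print(alpha, round(expected_num_clusters(alpha, n=100), 2))
```

The expected count increases with α, in line with the behaviour described for the three panels.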

In order to highlight that (1.3.1) defines an infinite mixture model we write an equivalent representation of the DPM model given by

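$$
\begin{aligned}
y_1, \dots, y_n \mid G &\overset{\text{iid}}{\sim} p(y \mid G) \\
p(y \mid G) &= \int p(y \mid \theta)\, dG(\theta) \\
G &\sim \mathrm{DP}(\alpha, G_0).
\end{aligned}
\tag{1.3.2}
$$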

Recalling the discrete nature of the DP samples as well as its representation in (1.2.5), we can rewrite the sampling model as an infinite mixture model:

$$
y_1, \dots, y_n \mid G \overset{\text{iid}}{\sim} \sum_{h=1}^{\infty} w_h \, p(y \mid \theta_h).
$$
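The infinite mixture can be approximated numerically by truncating the sum at a finite number of components. The sketch below assumes that (1.2.5) is the usual stick-breaking construction, with weights built from Beta(1, α) variables and atoms drawn from G0; it also assumes a Normal kernel and a Normal base measure. The truncation level `H` and the names `sample_truncated_dp` and `mixture_density` are hypothetical.

```python
import numpy as np

def sample_truncated_dp(alpha, H, rng, base_sd=5.0):
    """Weights and atoms of a DP draw truncated at H components.

    Assumption: G0 = N(0, base_sd^2).
    """
    v = rng.beta(1.0, alpha, size=H)
    v[-1] = 1.0  # last stick takes the remaining mass, so weights sum to one
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))
    w = v * remaining                      # w_h = v_h * prod_{l<h} (1 - v_l)
    atoms = rng.normal(0.0, base_sd, size=H)
    return w, atoms

def mixture_density(grid, w, atoms, kernel_sd=1.0):
    """Evaluate sum_h w_h N(y | theta_h, kernel_sd^2) on a grid of y values."""
    z = (grid[:, None] - atoms[None, :]) / kernel_sd
    kernels = np.exp(-0.5 * z**2) / (kernel_sd * np.sqrt(2.0 * np.pi))
    return kernels @ w

rng = np.random.default_rng(1)
grid = np.linspace(-15.0, 15.0, 601)
w, atoms = sample_truncated_dp(alpha=10.0, H=200, rng=rng)
density = mixture_density(grid, w, atoms)
```

Repeating the last three lines with different seeds and values of α gives curves of the kind shown in Figure 1.2, under the stated assumptions.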

Alternatively, a DPM can be specified using the BMU and CRP, i.e. integrating out $G$ from the joint distribution of $\theta_1, \dots, \theta_n$. In particular, exploiting the probability over partitions of the indexes $\{1, \dots, n\}$ in (1.2.4) implied by a DP-distributed random measure, we can rewrite (1.3.1) as a Random Partition Model (RPM, see Lau and Green [2007] for details). An RPM is characterised by within-cluster submodels and by a prior distribution on the partition. Thus, the DPM in (1.3.1) is equivalent to

$$
p(\rho_n, y, \theta^*) \propto \prod_{j=1}^{k} \left\{ \left[ \prod_{i \in S_j} p(y_i \mid \theta_j^*) \right] g_0(\theta_j^*) \, \alpha \, (n_j - 1)! \right\},
\tag{1.3.3}
$$

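where $g_0$ is the density associated with the distribution $G_0$, while $S_j = \{i : s_i = j\}$ is the set of indexes of the observations in cluster $j$ and $n_j$ is its size. Expression (1.3.3) is the DPM in (1.3.1) with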

38 Chapter 1. BNP and covariate dependent random measures

[Figure 1.2: three stacked panels of density curves, each with horizontal axis from −15 to 15.]

FIGURE 1.2: DPM of Normal densities referring to (1.3.1). Top panel displays five densities corresponding to five independent realisations of G ∼ DP(1, G0). In middle and bottom panels, G is distributed as DP(10, G0) and DP(100, G0), respectively.


$G$ integrated out and parameterised in terms of the partition of the observations and the unique values among individual parameters. The term $\alpha (n_j - 1)!$ is called the cohesion function for the $j$-th group of observations and is often denoted by $c(S_j)$. Since $p(\rho_n)$ is proportional to the product of the cohesion functions for each of the groups, this links the DPM with a specific type of RPM called Product Partition Model (PPM, Barry and Hartigan [1992], Hartigan [1990]), characterised in the same way.
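The product form of the partition prior can be checked numerically. The sketch below computes the log cohesion $\log c(S_j) = \log \alpha + \log (n_j - 1)!$ and normalises the product by $\prod_{i=1}^{n} (\alpha + i - 1)$, which is the standard normalising constant for the DP-induced partition distribution (assumed here to agree with (1.2.4)). The helper names and the value of α are illustrative.

```python
from math import lgamma, log, exp

def log_cohesion(n_j, alpha):
    """log c(S_j) = log(alpha * (n_j - 1)!)."""
    return log(alpha) + lgamma(n_j)

def log_partition_prior(block_sizes, alpha):
    """log p(rho_n) for a partition with the given block sizes."""
    n = sum(block_sizes)
    # log of prod_{i=1}^{n} (alpha + i - 1) = log Gamma(alpha + n) - log Gamma(alpha)
    log_norm = lgamma(alpha + n) - lgamma(alpha)
    return sum(log_cohesion(n_j, alpha) for n_j in block_sizes) - log_norm

def set_partitions(items):
    """Yield every partition of a list as a list of blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in set_partitions(rest):
        # Add `first` to each existing block in turn ...
        for idx in range(len(part)):
            yield part[:idx] + [[first] + part[idx]] + part[idx + 1:]
        # ... or open a new singleton block.
        yield [[first]] + part

alpha = 2.5
total = sum(exp(log_partition_prior([len(b) for b in p], alpha))
            for p in set_partitions([1, 2, 3, 4]))
```

Summing over all partitions of four items gives a total mass of one up to floating-point error, which is a quick sanity check on the cohesion and the normalising constant.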

Extensions of the model in (1.3.1) and (1.3.3) can be obtained by employing more general classes of prior distributions for $G$ (see for example Chapter 4). For a detailed review, see Lijoi and Prünster [2010].

Using (1.2.1), it is possible to specify the conditional posterior distribution of θi for the model in (1.3.3) as follows:

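$$
p(\theta_i \mid \theta_{-i}, y) \propto \sum_{l \neq i} p(y_i \mid \theta_i)\, \delta_{\theta_l}(\theta_i) + \alpha \int p(y_i \mid \theta)\, dG_0(\theta)\; g_0(\theta_i \mid y_i),
\tag{1.3.4}
$$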

where $\theta_{-i}$ is the vector obtained from $\theta$ after removing its $i$-th component and $g_0(\theta_i \mid y_i)$ is given by

$$
g_0(\theta_i \mid y_i) = \frac{p(y_i \mid \theta_i)\, g_0(\theta_i)}{\int p(y_i \mid \theta)\, dG_0(\theta)}.
$$

The latter can be regarded as the posterior distribution of $\theta_i$ when $s_i$ is different from all other indicators in $s$.
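The conditional in (1.3.4) is available in closed form when the kernel and base measure are conjugate. The following is a minimal sketch, assuming $p(y \mid \theta) = N(\theta, \sigma^2)$ and $G_0 = N(m, \tau^2)$, so that the marginal $\int p(y_i \mid \theta)\, dG_0(\theta)$ is $N(y_i \mid m, \sigma^2 + \tau^2)$ and $g_0(\theta_i \mid y_i)$ is Normal. The function `update_theta_i`, the toy data and all hyperparameter values are hypothetical.

```python
import numpy as np

def normal_pdf(x, mean, sd):
    return np.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))

def update_theta_i(i, theta, y, alpha, rng, sigma=1.0, m=0.0, tau=5.0):
    """Draw theta_i from its conditional given theta_{-i} and y.

    Assumptions: p(y | theta) = N(theta, sigma^2), G0 = N(m, tau^2).
    """
    others = np.delete(theta, i)                       # theta_{-i}
    # Weight of setting theta_i equal to an existing theta_l: p(y_i | theta_l).
    w_existing = normal_pdf(y[i], others, sigma)
    # Weight of a fresh value: alpha times the marginal density of y_i under G0.
    w_new = alpha * normal_pdf(y[i], m, np.sqrt(sigma**2 + tau**2))
    probs = np.append(w_existing, w_new)
    probs /= probs.sum()
    choice = rng.choice(len(probs), p=probs)
    if choice < len(others):
        return others[choice]
    # Fresh value drawn from g0(theta_i | y_i), the single-observation posterior.
    post_var = 1.0 / (1.0 / tau**2 + 1.0 / sigma**2)
    post_mean = post_var * (m / tau**2 + y[i] / sigma**2)
    return rng.normal(post_mean, np.sqrt(post_var))

rng = np.random.default_rng(2)
y = np.array([-3.1, -2.9, 0.2, 3.0, 3.2])  # toy data, for illustration only
theta = y.copy()
for sweep in range(50):
    for i in range(len(y)):
        theta[i] = update_theta_i(i, theta, y, alpha=1.0, rng=rng)
```

Ties among the entries of `theta` after the sweeps correspond to observations allocated to the same cluster.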

1.3.1 Computational aspects of DPM models

Posterior inference in DPM models is often performed using Markov Chain Monte Carlo (MCMC) algorithms (for an introductory review of MCMC methods see Andrieu et al. [2003]). Posterior computations involving DPM models include the challenging step of sampling either $G \mid y$ or $\rho_n \mid y$. Both posterior distributions present challenges that have been widely discussed in the literature. In particular, the posterior of the random measure $G$ is composed of an infinite collection of locations and weights, whereas the posterior of the partition of the observations $\rho_n$ is a distribution with support on the space of partitions of $\{1, \dots, n\}$, whose cardinality


grows very fast, at rate $O(n^n)$.

We list below the main MCMC algorithms for posterior inference in DPM models. Exploiting the BMU version of the DPM model in (1.3.4), efficient algorithms were proposed by MacEachern and Müller [1998]. Given that $G$ can be integrated out from the model, a Gibbs algorithm resamples the partition $\rho_n$ from its full conditional by removing one point at a time and assigning it to a cluster at random according to the posterior version of the probability in (1.2.4). A split-and-merge method was proposed by Jain and Neal [2004] as a solution to a recurrent problem of MCMC approaches for partitions: the chain tends to get stuck in configurations with high probability, because the intermediate states needed to move between them often have very low probability. Alternatively, without integrating out $G$, the posterior distribution of interest can be estimated through the slice sampler introduced by Walker [2007] or the retrospective sampler of Papaspiliopoulos and Roberts [2008]. Approximate methods for sampling the posterior $G \mid y$ were proposed by Ishwaran and Zarepour [2000] and Ishwaran and James [2001]. These consist of truncating the infinite mixture model implied by the DPM model in (1.3.2) and using the estimation techniques typical of finite-dimensional mixture models. A general review of available MCMC methods for DPM models is presented by Neal [2000].
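The Gibbs step on the partition described above can be sketched as follows. This is an illustrative version for a conjugate case only, assuming $p(y \mid \theta) = N(\theta, \sigma^2)$ and $G_0 = N(m, \tau^2)$ so that the cluster-specific values can be integrated out analytically; it is not a reproduction of any of the cited algorithms. The names `log_predictive` and `gibbs_sweep`, the simulated data and the hyperparameters are assumptions.

```python
import numpy as np

def log_predictive(y_i, n_j, sum_j, sigma, m, tau):
    """log density of y_i given a cluster with n_j members summing to sum_j.

    With n_j = 0 this is the prior predictive N(m, sigma^2 + tau^2).
    """
    prec = 1.0 / tau**2 + n_j / sigma**2
    mean = (m / tau**2 + sum_j / sigma**2) / prec
    var = 1.0 / prec + sigma**2
    return -0.5 * np.log(2.0 * np.pi * var) - 0.5 * (y_i - mean) ** 2 / var

def gibbs_sweep(s, y, alpha, rng, sigma=1.0, m=0.0, tau=5.0):
    """One pass over the observations, reallocating each in turn."""
    for i in range(len(y)):
        s[i] = -1                                   # remove observation i
        labels = [j for j in np.unique(s) if j >= 0]
        log_w = []
        for j in labels:
            members = (s == j)
            n_j = members.sum()
            # Existing cluster: weight n_j times the posterior predictive of y_i.
            log_w.append(np.log(n_j)
                         + log_predictive(y[i], n_j, y[members].sum(), sigma, m, tau))
        # New cluster: weight alpha times the prior predictive of y_i.
        log_w.append(np.log(alpha) + log_predictive(y[i], 0, 0.0, sigma, m, tau))
        log_w = np.array(log_w)
        probs = np.exp(log_w - log_w.max())         # stabilise before normalising
        probs /= probs.sum()
        choice = rng.choice(len(probs), p=probs)
        if choice < len(labels):
            s[i] = labels[choice]
        else:
            s[i] = max(labels, default=-1) + 1      # fresh label for a new cluster
    return s

rng = np.random.default_rng(3)
y = np.concatenate([rng.normal(-4.0, 1.0, 30), rng.normal(4.0, 1.0, 30)])
s = np.zeros(len(y), dtype=int)                     # start with a single cluster
for _ in range(20):
    s = gibbs_sweep(s, y, alpha=1.0, rng=rng)
n_clusters = len(np.unique(s))
```

Because each update moves a single observation, a sampler of this kind can mix slowly between well-separated partitions, which is the difficulty that split-and-merge proposals are meant to address.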