

Bayesian hierarchical modeling refers to a generic strategy for model building in which unobserved quantities are organized into a small number of discrete levels with logically distinct and scientifically interpretable functions and probabilistic relationships between them. These capture inherent features of the data. The hierarchy of levels makes it particularly suitable for modeling gene expression data, which arises from a number of processes and is affected by many sources of variability. In the Bayesian framework there are many approaches to modeling these different sources of variability using fixed effects, random effects and distributional assumptions.
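As a purely illustrative sketch of such a hierarchy (not a model taken from any of the papers reviewed here), a three-level specification for the expression $y_{gt}$ of gene $g$ at time $t$ could be written as follows. The symbols $f$, $\theta_g$, $\mu$, $\Sigma$, $\sigma^2$ and $\pi$ are assumptions introduced only for this example.

```latex
\begin{align*}
y_{gt} \mid \theta_g, \sigma^2 &\sim \mathcal{N}\big(f(t;\theta_g),\, \sigma^2\big)
  && \text{(data level: measurement noise)}\\
\theta_g \mid \mu, \Sigma &\sim \mathcal{N}(\mu, \Sigma)
  && \text{(process level: gene-specific random effects)}\\
(\mu, \Sigma, \sigma^2) &\sim \pi(\mu, \Sigma, \sigma^2)
  && \text{(prior level: population parameters)}
\end{align*}
```

Each level corresponds to one source of variability: the first to observation noise, the second to variation between genes, and the third to uncertainty about the population parameters.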

We present in this section the main Bayesian contributions to the clustering of time-course data. We should first mention, however, the major contribution of reversible jump MCMC (Green, 1995) to the field of Bayesian clustering: it can be applied to mixture models (Richardson and Green, 1997) to allow the number of clusters to vary. The implementation of this approach can be fully Bayesian, since all parameters of interest can be treated as random variables and their posterior distribution can be approximated with reversible jump MCMC.

The first attempt at a model-based Bayesian time-invariant approach for time-course data is relatively recent. Ramoni et al. (2002) proposed a pseudo-Bayesian method in which an agglomerative clustering is performed with a heuristic search, and a Bayesian approach with improper priors is used to determine the number of clusters and to score each partition, because of the computational effort otherwise necessary for a fully Bayesian model. Four years later the authors revisited their algorithm (Wang et al., 2006) and proposed a polynomial basis function with proper priors, a more appropriate model for short time series. However, the heuristic search using a Euclidean measure is not convincing when used to explore partitions evaluated by a different method.

A significant hurdle in the identification of periodically expressed genes by microarray experiments arises from the substantial amount of noise in the observations. Only when the sampled cells are in good synchrony can time-course readings reflect cell-cycle transcription, and obtaining a purely synchronised population is non-trivial. For example, Lu et al. (2004) present a model for resynchronising time series expression data by assuming that expression profiles follow a specific pattern (sinusoids) and employing an empirical Bayes method to detect periodically expressed genes. Resynchronisation is an important aspect of microarray experiments, but it is beyond the scope of this thesis.

A full Markov chain Monte Carlo (MCMC) approach is used in an early work by Wakefield et al. (2003), later refined by Zhou et al. (2006). The use of a basis function representation with random effects is promising, but the method is very computationally intensive because the marginal likelihood is not available analytically under their model. The size of microarray experiments makes this approach infeasible: the run time needed to obtain reasonably accurate approximations to the marginal likelihood for a full hierarchy would be excessive. To overcome this problem, Zhou et al. (2006) use a filtering technique to reduce the high dimensionality of the data before running the clustering algorithm, although clustering itself should be the tool used for this reduction.

There are few papers on Bayesian clustering of time-course data. Ray and Mallick (2006) proposed a nonparametric Bayesian wavelet model for clustering functional data, relying on a Dirichlet process prior for the distribution of the wavelet coefficients. The model is promising for applications in which the use of wavelets is appropriate, but it is computationally intensive, and the examples in the paper include at most six hundred genes. Quintana and Iglesias (2003), Vogl et al. (2005) and Lau and Green (2007) are excellent papers on Bayesian clustering, but they do not directly address the clustering of time-course data. Note also that our clustering problem is different from the one approached, for example, by Muller et al. (2008), who include a regression on covariates. We do not have any covariate information available.

Finally, it is the paper by Heard et al. (2006) that, in our opinion, stands out in the field of clustering of time-course data. The model they propose is based on the ideas developed in Denison et al. (2002). Heard et al. (2006) propose a fully Bayesian approach with a conjugate family and a hierarchical search, without approximating techniques such as MCMC. This allows the clustering of many thousands of genes without pre-filtering the data. Moreover, it is not necessary to use approximating measures such as BIC, as the exact marginal probabilities are available. For the specific purpose of time-course data, this model has the advantage of being time invariant, and it is also very flexible. For example, instead of observations over time, it could be adapted for observations treated with different doses or exposed to different treatments. In addition, as for any Bayesian analysis, the summary statistics of the posterior distribution of the regression coefficients have a clear interpretation. We therefore review the method proposed by Heard et al. (2006) in the next section.
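To illustrate why a conjugate family makes exact marginal probabilities available, the following minimal sketch computes the closed-form log marginal likelihood of one cluster under a standard normal-inverse-gamma linear regression model. It is an illustration under stated assumptions, not the specification of Heard et al. (2006): the prior form (coefficients with zero mean and covariance `sigma^2 * v * I`, and an inverse-gamma prior on `sigma^2`), the function name `log_marginal`, and the hyperparameters `a`, `b` and `v` are all hypothetical choices made for this example.

```python
import numpy as np
from math import lgamma, log, pi

def log_marginal(Y, X, a=1.0, b=1.0, v=10.0):
    """Exact log marginal likelihood of one cluster (illustrative sketch).

    Assumed model (not taken from the source):
      y_g = X beta + eps,  eps ~ N(0, sigma^2 I)   for every gene g in the cluster
      beta | sigma^2 ~ N(0, sigma^2 * v * I),  sigma^2 ~ Inverse-Gamma(a, b)
    Y: (n_genes, T) array of profiles sharing the same coefficients beta.
    X: (T, p) design matrix of basis functions evaluated at the time points.
    """
    n_genes, T = Y.shape
    p = X.shape[1]
    n = n_genes * T                        # total number of observations
    V_inv = np.eye(p) / v                  # prior precision (up to sigma^2)
    V_star_inv = V_inv + n_genes * (X.T @ X)   # posterior precision
    Xty = X.T @ Y.sum(axis=0)              # stacked X'y over all genes
    m_star = np.linalg.solve(V_star_inv, Xty)  # posterior mean of beta
    a_star = a + n / 2.0
    b_star = b + 0.5 * (np.sum(Y ** 2) - Xty @ m_star)
    _, logdet_post = np.linalg.slogdet(V_star_inv)
    logdet_prior = -p * log(v)             # log |V^{-1}| for V = v * I
    return (-0.5 * n * log(2 * pi)
            + 0.5 * (logdet_prior - logdet_post)
            + a * log(b) - a_star * log(b_star)
            + lgamma(a_star) - lgamma(a))

# Hypothetical data: two groups of three genes with opposite linear trends.
t = np.linspace(0.0, 1.0, 6)
X = np.column_stack([np.ones_like(t), t])  # intercept and linear basis
rng = np.random.default_rng(0)
A = 1 + 2 * t + 0.1 * rng.standard_normal((3, 6))
B = 3 - 2 * t + 0.1 * rng.standard_normal((3, 6))

# Score of merging the two groups relative to keeping them apart.
gain = log_marginal(np.vstack([A, B]), X) - (log_marginal(A, X) + log_marginal(B, X))
print(gain)
```

In an agglomerative search, a quantity such as `gain` could be used to decide which pair of clusters to merge next. A positive value would favour the merge, and no MCMC approximation is needed to evaluate it.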
