2.1 SeasMig
We implemented a tool in Java, SeasMig (http://bitbucket.org/pascualgroup/seasmig), for migration model inference. The tool can also perform stochastic mapping, given an initial distribution of trees and geographic annotations. Alternative migration models, including seasonal and epochal phylogeographic migration models, can be fitted and compared by their marginal likelihoods. An empirical distribution of trees in NEXUS format is given as input. The tool uses MCMC to sample from the posterior distribution of model parameters and of stochastically mapped migration events along branches and trunk lineages.
2.2 Bayesian Inference
In a Bayesian framework, both the data and the model parameters are treated as random variables. Rather than finding the set of parameters that maximizes the likelihood of a particular observation, we estimate the distribution of the model parameters that can lead to the observed data. The probability of a specific set of model parameters $\theta$ conditioned on the observed data $D$ is known as the posterior probability and can be written according to Bayes' rule:
$$P(\theta \mid D) = \frac{P(D \mid \theta)\,P(\theta)}{P(D)} \tag{1}$$
$P(\theta)$ denotes the prior probability of a specific set of parameters, while $P(D \mid \theta)$ denotes the likelihood of observing the data given the model parameters $\theta$. The probability $P(D)$ of observing the data without the context of a model (or models) is most often unknown. As a result, the posterior probability of a sample is often known only relative to that of other samples.
The probability of observing the data in the context of a specific model $M$ used to fit the data can be calculated by summing, across all parameter values, the probability of the specific model parameters (prior) multiplied by the probability of observing the data conditioned on those parameters (likelihood):
$$P(D \mid M) = \sum_{\theta} P(D \mid \theta, M)\,P(\theta \mid M) \tag{2}$$
or in a more general continuous notation:
$$P(D \mid M) = \int P(D \mid \theta, M)\,P(\theta \mid M)\,d\theta \tag{3}$$
where $P(D \mid M)$ is referred to as the marginal likelihood, $P(\theta \mid M)$ is the prior distribution, and $P(D \mid \theta, M)$ is the likelihood function.
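As a minimal sketch of Equations 1 and 2, consider a hypothetical coin-flip model (not part of SeasMig) whose single parameter is restricted to a three-point grid, so that the sum in Equation 2 can be enumerated directly. All names and values below are illustrative assumptions.

```python
from math import comb

def binomial_likelihood(theta, heads, flips):
    """P(D | theta): probability of `heads` successes in `flips` trials."""
    return comb(flips, heads) * theta**heads * (1 - theta)**(flips - heads)

def marginal_and_posterior(thetas, prior, heads, flips):
    """Return P(D | M) (Equation 2) and the posterior over the grid (Equation 1)."""
    joint = [p * binomial_likelihood(t, heads, flips) for t, p in zip(thetas, prior)]
    marginal = sum(joint)                      # sum over all parameter values
    posterior = [j / marginal for j in joint]  # Bayes' rule
    return marginal, posterior

thetas = [0.25, 0.5, 0.75]       # hypothetical parameter grid
prior = [1 / 3, 1 / 3, 1 / 3]    # uniform prior over the grid
marginal, posterior = marginal_and_posterior(thetas, prior, heads=7, flips=10)
```

With seven heads in ten flips, the posterior weight shifts toward the largest grid value, while the marginal likelihood summarizes how well the model as a whole predicts the data.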
2.3 Non-Seasonal Migration Model
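We assume that discrete geographic location diffuses along the branches of the tree following a continuous-time Markov chain (CTMC) process. In this case, non-seasonal migration is characterized by a single rate matrix $Q$:
$$Q = \begin{pmatrix} -\sum_{j \neq 1} q_{1j} & q_{12} & \cdots & q_{1n} \\ q_{21} & -\sum_{j \neq 2} q_{2j} & \cdots & q_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ q_{n1} & q_{n2} & \cdots & -\sum_{j \neq n} q_{nj} \end{pmatrix} \tag{4}$$
where $q_{ij}$ represents the migration rate from location $i$ to location $j$.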
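The structure of the rate matrix can be illustrated with a short sketch. It assumes the standard CTMC convention that each diagonal entry equals minus the sum of the other entries in its row; the function name and the example rates are hypothetical.

```python
def build_rate_matrix(rates):
    """Build a CTMC rate matrix from off-diagonal migration rates.

    `rates[i][j]` is the rate from location i to location j; diagonal
    entries of the input are ignored. Each diagonal entry of the result is
    minus the sum of the other entries in its row, so every row sums to zero.
    """
    n = len(rates)
    q = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                q[i][j] = float(rates[i][j])
        q[i][i] = -sum(q[i])  # the diagonal is still 0.0 at this point
    return q

# Hypothetical rates between three locations.
Q = build_rate_matrix([[0.0, 0.5, 0.1],
                       [0.2, 0.0, 0.3],
                       [0.4, 0.6, 0.0]])
```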
2.4 Non-Seasonal Migration Model Parameterization and Priors
Rates are assumed to have an exponential prior with a rate hyperprior parameter which is shared across all the rates and is itself exponentially distributed with unit mean.
Note: the rate hyperprior was added at a later stage and is not included in the non-seasonal analysis in the main body of the text.
2.5 Matrix Exponentiation
Matrix exponentiation is used to convert migration rates into probability distributions over the states of nodes along the tree. We first focus on processes along individual branches of the phylogenetic tree.
Consider a branch of length $t$ connecting parent node $x$ to child node $y$. We assign node $x$ a vector $p_x$ whose $i$-th entry is its probability of being in state $i$:
$$p_x = \big(P(x = 1),\; P(x = 2),\; \ldots,\; P(x = n)\big) \tag{5}$$
We assume, in the simplest case, that states along branches evolve as a homogeneous Poisson process with rate matrix $Q$. The state distribution of node $y$ can then be written as:
$$p_y = p_x\,P(t) \tag{6}$$
where $P(t)$ is the transition probability matrix, which can be calculated as follows:
$$P(t) = e^{Qt} \tag{7}$$
The matrix exponential can be defined by the Taylor expansion of the exponential function:
$$e^{Qt} = \sum_{k=0}^{\infty} \frac{(Qt)^k}{k!} \tag{8}$$
Multiple alternative algorithms are implemented for matrix exponentiation. Several algorithms were either imported (JBLAS) or implemented directly in the code, including the Taylor series, the Padé approximant (Higham 2009), and eigendecomposition. For matrices of dimension 3 or less, and for specific cases of matrices of dimension 4 (HKY (P. Beerli and Felsenstein 1999), JC69 (Jukes and Cantor 1969)), analytic solutions exist and were implemented. All matrix exponentiation algorithms were cross-validated within the package.
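The kind of cross-validation described above can be sketched by comparing a truncated Taylor series against the closed-form solution for a two-state rate matrix. This is an illustrative sketch rather than the SeasMig code: the function names are hypothetical, and the scaling-and-squaring step is an assumption added here to keep the series numerically well behaved.

```python
from math import exp

def mat_mul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def expm_taylor(q, t, terms=20, squarings=10):
    """Approximate exp(Q t) by a truncated Taylor series (Equation 8).

    Q t is first divided by 2**squarings so the series converges quickly;
    the result is then squared `squarings` times.
    """
    n = len(q)
    scale = t / 2**squarings
    a = [[q[i][j] * scale for j in range(n)] for i in range(n)]
    result = [[float(i == j) for j in range(n)] for i in range(n)]  # identity
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        term = [[v / k for v in row] for row in mat_mul(term, a)]   # A^k / k!
        result = [[r + v for r, v in zip(rr, tr)] for rr, tr in zip(result, term)]
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result

def expm_two_state(a, b, t):
    """Closed-form exp(Q t) for Q = [[-a, a], [b, -b]]."""
    s, e = a + b, exp(-(a + b) * t)
    return [[(b + a * e) / s, (a - a * e) / s],
            [(b - b * e) / s, (a + b * e) / s]]

numeric = expm_taylor([[-0.3, 0.3], [0.7, -0.7]], t=2.5)
analytic = expm_two_state(0.3, 0.7, t=2.5)
```

Each row of the resulting matrix is a probability distribution over destination states, which gives a second, independent consistency check.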
2.6 Two Seasonal Migration Model
A seasonal migration model, which is a variation of (Bielejec et al. 2014), was established by using two different migration rate matrices, $Q_A$ and $Q_B$, for two parts (seasons) of the year:
$$Q(t) = \begin{cases} Q_A & \text{if } t_s \le t - \lfloor t \rfloor < t_e \\ Q_B & \text{otherwise} \end{cases} \tag{9}$$
The exact partitioning of the year is defined by the start $t_s$ and end $t_e$ of season A (without loss of generality), expressed as fractions of a year, where $0 \le t_s < 1$ and $t_s < t_e \le 1$.
For example, for $t_s = 0$ and $t_e = 0.25$ the rate matrix $Q_A$ applies to all branch parts within January-March, while the rate matrix $Q_B$ applies to all branch parts within April-December. To estimate the transition probabilities between two states at different times, the respective transition probability matrices are calculated for the individual year parts through matrix exponentiation. For instance, for the same partitioning of the year, given a branch spanning a known range of years, the state distribution of node $y$ can be calculated as:
(10)
(11)
where the two quantities appearing in these expressions are the fraction of the branch within season A and the fraction of the branch within season B.
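One way to read the per-season calculation described above is to split a branch at every season boundary and multiply the per-segment transition matrices in chronological order. The sketch below makes that assumption explicit; it is not taken from SeasMig, the function names are hypothetical, and times are assumed to be decimal years with season A covering year fractions from `a_start` up to `a_end`.

```python
from math import floor

def mat_mul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def expm(q, t, terms=20, squarings=10):
    """exp(Q t) by a truncated Taylor series with scaling and squaring."""
    n = len(q)
    a = [[q[i][j] * t / 2**squarings for j in range(n)] for i in range(n)]
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        term = [[v / k for v in row] for row in mat_mul(term, a)]
        result = [[r + v for r, v in zip(rr, tr)] for rr, tr in zip(result, term)]
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result

def season_segments(t0, t1, a_start, a_end, eps=1e-9):
    """Split [t0, t1] into chronological (season, duration) pieces."""
    segments, t = [], t0
    while t < t1 - 1e-12:
        year = floor(t)
        frac = t - year
        if frac < a_start - eps:
            season, boundary = "B", year + a_start
        elif frac < a_end - eps:
            season, boundary = "A", year + a_end
        else:
            season, boundary = "B", year + 1.0
        nxt = min(boundary, t1)
        segments.append((season, nxt - t))
        t = nxt
    return segments

def seasonal_transition(t0, t1, q_a, q_b, a_start, a_end):
    """Chain per-season matrices so that p_y = p_x P (row-vector convention)."""
    n = len(q_a)
    p = [[float(i == j) for j in range(n)] for i in range(n)]
    for season, duration in season_segments(t0, t1, a_start, a_end):
        p = mat_mul(p, expm(q_a if season == "A" else q_b, duration))
    return p

q_a = [[-0.2, 0.2], [0.1, -0.1]]   # hypothetical season A rates
q_b = [[-1.0, 1.0], [0.5, -0.5]]   # hypothetical season B rates
segments = season_segments(2000.0, 2001.5, 0.0, 0.25)
P = seasonal_transition(2000.0, 2001.5, q_a, q_b, 0.0, 0.25)
P_same = seasonal_transition(2000.0, 2001.5, q_b, q_b, 0.0, 0.25)
P_plain = expm(q_b, 1.5)
```

When both seasons share the same rate matrix, the chained product reduces to the non-seasonal transition matrix for the whole branch, which is a convenient sanity check.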
2.7 Two Seasonal Migration Model Parameterization and Priors
2.7.1 Migration Seasonality Based on a Specific Source and Destination
Migration rates for the two partitions of the year were parameterized as follows:
(12)
where the parameterization uses a mean migration rate for each source and destination pair together with seasonal scaling parameters.
As is the case in the non-seasonal model, mean rates are assumed to have an exponential prior with a rate hyperprior parameter which is shared across all the rates and is itself exponentially distributed with unit mean. The seasonal scaling parameters are assumed to have a uniform prior. The scaling parameter was used instead of two separate rates in order to separate the inference of mean migration rates from the inference of the seasonality of migration.
2.7.2 Migration Seasonality Based on Source
For source-based migration seasonality, rates (Equation 9) are parameterized in the following way:
(13)
where the seasonal scaling parameters are source based. Mean rates are assumed to have an exponential prior with a rate hyperprior parameter which is shared across all the rates and is itself exponentially distributed with unit mean. The seasonal scaling parameters are assumed to have a uniform prior.
2.7.3 Migration Seasonality Based on Destination
For destination-based migration seasonality, rates (Equation 9) are parameterized in the following way:
(14)
where the seasonal scaling parameters are destination based. Mean rates are assumed to have an exponential prior with a rate hyperprior parameter which is shared across all the rates and is itself exponentially distributed with unit mean. The seasonal scaling parameters are assumed to have a uniform prior.
2.7.4 Migration Seasonality Based on Source and on Destination
For source- and destination-based migration seasonality, rates (Equation 9) are parameterized in the following way:
(15)
where the seasonal scaling parameters are source based and destination based, respectively. Mean rates are assumed to have an exponential prior with a rate hyperprior parameter which is shared across all the rates and is itself exponentially distributed with unit mean. The seasonal scaling parameters are assumed to have uniform priors.
2.8 Tree Likelihood Calculation
Given a tree, a specific and parameterized trait evolutionary (substitution) model, and the state of traits on the tips of the tree, a tree likelihood can be calculated (J. Felsenstein 1981).
In general, this likelihood can be calculated by integrating (enumerating and summing up) the likelihood of all possible internal node states. This is done efficiently by calculating and storing the likelihood of sub-trees, recursively progressing from the tips towards the root of the tree (J. Felsenstein 1981).
The transition probability matrix is defined according to Equation 7 for a non-seasonal model and according to Equation 11 for a two seasonal model. The transition probability matrix is used to calculate the likelihood of node states along individual branches of the tree.
The prior assumption about the state of the root of the tree usually follows either an equal probability of being at each state, an empirical estimate of being at a given state, or the stationary distribution of the substitution model:
$$\pi = \lim_{t \to \infty} \pi_0\, e^{Qt} \tag{16}$$
where $\pi_0$ is the initial state distribution of the system, assumed to assign equal probability to each state. The value of $\pi_0$ is only relevant if isolated populations exist, in which case the stationary conditions depend on their initial population.
Since there is no such stationary distribution for a seasonal model, we used the stationary distribution of the seasonal migration matrix that applies at the time of the root node. This assumes some convergence to the stationary distribution within each season. Alternative estimates can be derived. The inference is not sensitive to the specific root prior assumptions in this case.
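The recursive calculation of sub-tree likelihoods can be sketched on a toy example. The code below is an illustration under stated assumptions, not the SeasMig implementation: it uses a hypothetical two-state model with a closed-form transition matrix, a hypothetical three-tip tree encoded as nested lists of (child, branch length) pairs, and an equal-probability root prior.

```python
from itertools import product
from math import exp

def two_state_transition(a, b):
    """Return t -> exp(Q t) for Q = [[-a, a], [b, -b]] (closed form)."""
    def transition(t):
        s, e = a + b, exp(-(a + b) * t)
        return [[(b + a * e) / s, (a - a * e) / s],
                [(b - b * e) / s, (a + b * e) / s]]
    return transition

def partial_likelihoods(node, transition, n_states):
    """L[i] = P(tip states below `node` | node is in state i).

    A node is either an int (an observed tip state) or a list of
    (child, branch_length) pairs.
    """
    if isinstance(node, int):
        return [1.0 if i == node else 0.0 for i in range(n_states)]
    result = [1.0] * n_states
    for child, length in node:
        p = transition(length)
        below = partial_likelihoods(child, transition, n_states)
        for i in range(n_states):
            result[i] *= sum(p[i][j] * below[j] for j in range(n_states))
    return result

def tree_likelihood(root, transition, root_prior):
    partial = partial_likelihoods(root, transition, len(root_prior))
    return sum(pi * l for pi, l in zip(root_prior, partial))

def make_tree(tips):
    """Hypothetical topology ((tip0, tip1), tip2) with fixed branch lengths."""
    return [([(tips[0], 0.3), (tips[1], 0.5)], 0.2), (tips[2], 0.7)]

transition = two_state_transition(0.4, 0.9)
root_prior = [0.5, 0.5]
likelihood = tree_likelihood(make_tree((0, 1, 0)), transition, root_prior)
# Summing over every possible tip labelling must give a total probability of 1.
total = sum(tree_likelihood(make_tree(tips), transition, root_prior)
            for tips in product(range(2), repeat=3))
```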
2.9 Stochastic Mapping
Stochastic mapping is an additional step following the calculation of tree likelihood and ancestral state reconstruction at the nodes of a tree. This mapping allows us to generate a stochastic realization of the state of branches along the tree, in addition to the state of internal nodes, and in so doing, provides samples of migration and mutation events, and their timing along the tree that lead to the observed tip states. Stochastic mapping of both sequence (nucleotide) and character (e.g. geographic) annotations is available in SeasMig, together with the option of incorporating seasonal migration models. Stochastic mapping is implemented directly in our code based on (J. Bollback 2006). Improved performance could be achieved using (Minin and Suchard 2008).
A given type of event, migration or mutation, is assumed to behave as a Poisson process along a branch with a rate matrix $Q$:
$$Q = \begin{pmatrix} -\sum_{j \neq 1} q_{1j} & q_{12} & \cdots & q_{1n} \\ q_{21} & -\sum_{j \neq 2} q_{2j} & \cdots & q_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ q_{n1} & q_{n2} & \cdots & -\sum_{j \neq n} q_{nj} \end{pmatrix} \tag{17}$$
As such, the waiting time until the next event given the present state follows an exponential distribution with rate parameter $-q_{ii} = \sum_{j \neq i} q_{ij}$, where $i$ is the present character state.
Once the timing of the next event is determined, the event itself is chosen based on its probability relative to the other transition (emigration) events:
$$P(i \to j) = \frac{q_{ij}}{\sum_{k \neq i} q_{ik}} \tag{18}$$
Consider a branch connecting parent node $x$ to child node $y$, spanning from time $t_x$ to time $t_y$, with ancestrally reconstructed states $s_x$ and $s_y$ respectively. Stochastic events are generated starting from $t_x$, repeatedly, until the state of node $y$ is correctly reconstructed; that is, until the last event before the time of node $y$ leaves the lineage in state $s_y$ and the next generated event time falls beyond $t_y$.
Branch reconstructions that span across seasons were performed by stochastically reconstructing the state up to the season boundary using the first season's migration matrix, then continuing the stochastic mapping forward using the second season's matrix, and so forth. This process is reinitialized from node x until the state of node y is correctly mapped. The validity of these processes relies on the memoryless nature of the Poisson process.
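The forward simulation with restarts described in this section can be sketched for a single branch under a single (non-seasonal) rate matrix. The function name, rate matrix, and seed below are hypothetical, and the sketch is a plain rejection sampler written for illustration rather than the SeasMig code.

```python
import random

def simulate_branch(q, start, end, length, rng, max_tries=100_000):
    """Sample a state history along a branch, conditioned on both end states.

    Events are simulated forward from `start`. A history that does not finish
    in `end` is discarded and the simulation restarts from the parent node.
    Returns a list of (time, new_state) events.
    """
    n = len(q)
    for _ in range(max_tries):
        state, t, events = start, 0.0, []
        while True:
            rate = -q[state][state]
            if rate <= 0.0:
                break                                # absorbing state
            t += rng.expovariate(rate)               # exponential waiting time
            if t >= length:
                break                                # next event falls beyond the child node
            others = [j for j in range(n) if j != state]
            weights = [q[state][j] for j in others]  # relative event rates
            state = rng.choices(others, weights=weights)[0]
            events.append((t, state))
        if state == end:
            return events
    raise RuntimeError("no history consistent with the end state was found")

q = [[-0.9, 0.5, 0.4],
     [0.3, -1.0, 0.7],
     [0.6, 0.2, -0.8]]
events = simulate_branch(q, start=0, end=2, length=1.0, rng=random.Random(42))
```

Rejection sampling of this kind becomes slow when the required end state is unlikely given the branch length, which is one reason faster alternatives are of interest.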
2.10 Markov-Chain Monte Carlo (MCMC)
Markov-chain Monte Carlo, or MCMC, is an algorithm that allows sampling from analytically intractable distributions. Such distributions include the distribution of tree likelihoods given a mutational or a migration model.
The general idea of an MCMC method is to set up a sequence of dependent samples that is guaranteed to converge to a target distribution, in this case the posterior distribution of our model. In the Metropolis-Hastings algorithm, a change from the current state $\theta$ to a new state $\theta'$ is proposed, drawn from a proposal distribution $g(\theta' \mid \theta)$ over possible changes. The proposed change is either rejected, in which case the current sample is repeated, or accepted as the new sample. The Metropolis-Hastings acceptance probability (Metropolis et al. 1953; Hastings 1970):
$$\alpha = \min\left(1,\; \frac{P(\theta' \mid D)\,g(\theta \mid \theta')}{P(\theta \mid D)\,g(\theta' \mid \theta)}\right) \tag{19}$$
guarantees that the sequence of samples will converge to the posterior distribution.
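A minimal sketch of the accept/reject step follows, assuming a one-dimensional target and a symmetric Gaussian random-walk proposal, for which the proposal terms of the acceptance ratio cancel. The target (a standard normal density) and all names are illustrative assumptions.

```python
import math
import random

def metropolis_hastings(log_target, start, step, n_samples, rng):
    """Random-walk Metropolis sampler for a one-dimensional target density."""
    x, log_p = start, log_target(start)
    samples = []
    for _ in range(n_samples):
        candidate = x + rng.gauss(0.0, step)   # symmetric proposal
        log_p_new = log_target(candidate)
        # Accept with probability min(1, target(candidate) / target(x)).
        if math.log(1.0 - rng.random()) < log_p_new - log_p:
            x, log_p = candidate, log_p_new
        samples.append(x)                      # a rejection repeats the current sample
    return samples

def log_standard_normal(x):
    """Log density of N(0, 1) up to an additive constant."""
    return -0.5 * x * x

samples = metropolis_hastings(log_standard_normal, start=0.0, step=1.0,
                              n_samples=20000, rng=random.Random(0))
sample_mean = sum(samples) / len(samples)
sample_var = sum((s - sample_mean) ** 2 for s in samples) / len(samples)
```

Working with log densities avoids numerical underflow, which matters for tree likelihoods that are typically extremely small numbers.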
2.11 Metropolis-coupled MCMC (MC3)
We use mc3kit (http://github.com/edbaskerville/mc3kit) for MCMC functionality (Baskerville et al. 2011; Baskerville et al. 2013). Additional functions for sampling and evaluating tree likelihoods were implemented.
Although the Metropolis-Hastings algorithm is guaranteed to converge to the target distribution eventually, local maxima in the likelihood surface can cause a chain to become stuck for long periods of time. One approach to avoiding this problem, known as “Metropolis coupling”, involves running multiple chains in parallel. One chain, the “cold chain”, explores the target distribution, while the other chains, the “hot chains”, explore low-likelihood configurations more freely. Periodically, swaps are proposed between chains, allowing good configurations discovered on hot chains to propagate toward the cold chain.
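Rather than exploring the target distribution $P(\theta \mid D) \propto P(D \mid \theta)\,P(\theta)$, heated chains explore
$$P_{\beta}(\theta \mid D) \propto P(D \mid \theta)^{\beta}\,P(\theta) \tag{20}$$
where $\beta$ is a heating parameter. We use unevenly spaced values of $\beta$ (Friel and Pettitt 2008), with the hottest chain exploring the prior ($\beta = 0$) and the coldest chain exploring the posterior ($\beta = 1$).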
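Swap moves are standard Metropolis-Hastings proposals, but rather than considering a change to a single chain, they consider a change to the joint distribution of two chains. The acceptance probability is thus the ratio of the joint distribution after and before the move:
$$\alpha_{\text{swap}} = \min\left(1,\; \frac{P(D \mid \theta_j)^{\beta_i}\,P(D \mid \theta_i)^{\beta_j}}{P(D \mid \theta_i)^{\beta_i}\,P(D \mid \theta_j)^{\beta_j}}\right) \tag{21}$$
where $\theta_i$ and $\theta_j$ are the configurations that begin in chains $i$ and $j$, and $\beta_i$ and $\beta_j$ are the heat parameters of the two chains.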
The use of multiple heated chains has the side effect of drastically improving estimates of marginal likelihoods for model selection, as described in the next section.
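The swap step can be sketched in log space. The sketch assumes that each heated chain targets the prior multiplied by the likelihood raised to its heat parameter, so the prior terms cancel in the swap ratio; the function names are hypothetical and are not taken from mc3kit.

```python
import math
import random

def swap_log_ratio(log_lik_i, log_lik_j, beta_i, beta_j):
    """Log of the swap acceptance ratio between chains i and j.

    `log_lik_i` and `log_lik_j` are the log-likelihoods of the configurations
    currently held by the chains with heat parameters `beta_i` and `beta_j`.
    """
    return (beta_i - beta_j) * (log_lik_j - log_lik_i)

def accept_swap(log_lik_i, log_lik_j, beta_i, beta_j, rng):
    """Metropolis-Hastings decision for a proposed swap."""
    log_u = math.log(1.0 - rng.random())   # log of a uniform draw in (0, 1]
    return log_u < swap_log_ratio(log_lik_i, log_lik_j, beta_i, beta_j)

rng = random.Random(3)
# The hotter chain (beta = 0.5) holds the better configuration, so the swap
# moves it to the cold chain (beta = 1.0) and is always accepted.
always = all(accept_swap(-10.0, -5.0, 1.0, 0.5, rng) for _ in range(100))
```

A swap that would move a worse configuration onto the colder chain is accepted only with probability equal to the exponential of the (negative) log ratio.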
2.12 Marginal Likelihood Estimation
Enumeration across all possible model parameters is computationally costly and grows exponentially with the number of model parameters. We would like to use MCMC to estimate the marginal likelihood for the sake of comparison among different models. Marginal likelihood estimates derived from a single chain, such as the harmonic mean estimator of Raftery (Kass and Raftery 1995), converge very slowly, because MCMC fails to sample sufficiently from low-likelihood areas. However, it is possible to use the information gathered about low-likelihood areas in heated chains using a technique called thermodynamic integration (Lartillot and Philippe 2006; Peter Beerli and Palczewski 2010), or path sampling (Calderhead and Girolami 2009).
Assuming a continuum of heated chains, the thermodynamic estimator of the log-marginal likelihood is:
$$\log P(D \mid M) \approx \int_0^1 \frac{1}{m} \sum_{k=1}^{m} \log P(D \mid \theta_{\beta,k}, M)\,d\beta \tag{22}$$
where $m$ is the number of samples in the MCMC output, and $\theta_{\beta,k}$ is a single sample from the output of a chain with heat parameter $\beta$ (Peter Beerli and Palczewski 2010). With a finite number of chains, we use the trapezoid rule to evaluate this integral numerically (Figure A.1), using uneven spacing of heats to improve the estimate (Friel and Pettitt 2008).
Figure A.1 Thermodynamic integration of the marginal likelihood
The mean likelihoods of each chain (black dots) are interpolated and used to estimate the marginal likelihood (gray area) (Friel and Pettitt 2008). The mean likelihood asymptotically approaches the maximum likelihood (dotted line) as $\beta \to \infty$.
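The trapezoid-rule step can be sketched as follows. The heat schedule (a power law with exponent 4) and the per-chain mean log-likelihoods are synthetic values chosen only for illustration; in practice the means would come from the samples of each heated chain.

```python
def thermodynamic_integration(betas, mean_log_liks):
    """Trapezoid-rule estimate of the log marginal likelihood.

    `betas` are the heat parameters of the chains (spanning 0 to 1) and
    `mean_log_liks[k]` is the mean log-likelihood sampled by chain k.
    """
    pairs = sorted(zip(betas, mean_log_liks))
    total = 0.0
    for (b0, l0), (b1, l1) in zip(pairs, pairs[1:]):
        total += 0.5 * (l0 + l1) * (b1 - b0)   # area of one trapezoid
    return total

# Hypothetical uneven schedule that places more chains near the prior (beta = 0).
betas = [(i / 10) ** 4 for i in range(11)]
# Synthetic means that are linear in beta, so the exact integral is known: -8.
mean_log_liks = [-10.0 + 4.0 * b for b in betas]
estimate = thermodynamic_integration(betas, mean_log_liks)
```

Concentrating heats near zero is useful because the mean log-likelihood typically changes fastest close to the prior, where a coarse grid would give a poor trapezoid approximation.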
2.13 Model Selection via Marginal Likelihood
The Bayesian framework provides a natural way to make probabilistic inferences based on a particular model. However, we also want to be able to choose between different models by quantifying their relative goodness of fit. One approach to Bayesian model selection can be framed directly in terms of Bayes’ rule, mirroring the process for estimating the posterior distribution over parameters for a single model.
Consider two models, $M_1$ and $M_2$, to which we assign prior weights $P(M_1)$ and $P(M_2)$. After the data have been observed, we can calculate the posterior probability of the models using Bayes’ rule:
$$P(M_i \mid D) = \frac{P(D \mid M_i)\,P(M_i)}{P(D \mid M_1)\,P(M_1) + P(D \mid M_2)\,P(M_2)}, \quad i \in \{1, 2\} \tag{23}$$
where the denominator is equal to the probability of observing the data unconditional of the particular model at play, $P(D)$. The probabilities $P(D \mid M_1)$ and $P(D \mid M_2)$ are the marginal likelihoods of the two models, corresponding to Equation 3. If we give the two models equal prior weight, then the relative posterior weight of the two models is simply given by the marginal likelihoods. This reasoning extends naturally to any number of models.
The ratio of the marginal likelihoods is often called the Bayes factor (Jeffreys 1935; Jeffreys 1961; Kass and Raftery 1995), and is equal to the posterior odds ratio of the two models, assuming equal prior weight:
$$B_{12} = \frac{P(D \mid M_1)}{P(D \mid M_2)} = \frac{P(M_1 \mid D)}{P(M_2 \mid D)} \tag{24}$$
The Bayes factor provides a convenient way to compare models: if $B_{12} = 10$, then we consider support for model $M_1$ to be ten times stronger than support for model $M_2$. The analogous quantity in AIC-based selection is a ratio of Akaike weights (Burnham and Anderson 2002).
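Because marginal likelihoods of trees are usually available only on the log scale, Bayes factors and posterior model probabilities are conveniently computed in log space. The sketch below uses hypothetical function names and made-up log marginal likelihoods for two models.

```python
import math

def log_bayes_factor(log_ml_1, log_ml_2):
    """Log of the Bayes factor of model 1 over model 2."""
    return log_ml_1 - log_ml_2

def posterior_model_probabilities(log_mls, priors=None):
    """Posterior probability of each model from its log marginal likelihood."""
    if priors is None:
        priors = [1.0 / len(log_mls)] * len(log_mls)   # equal prior weight
    log_weights = [l + math.log(p) for l, p in zip(log_mls, priors)]
    shift = max(log_weights)                           # avoids underflow in exp
    weights = [math.exp(w - shift) for w in log_weights]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical values: model 1 has a marginal likelihood ten times that of model 2.
log_mls = [-1000.0, -1000.0 - math.log(10.0)]
log_b12 = log_bayes_factor(log_mls[0], log_mls[1])
probs = posterior_model_probabilities(log_mls)
```

With equal prior weight, a Bayes factor of 10 corresponds to posterior probabilities of 10/11 and 1/11 for the two models.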
The marginal likelihood of a model is the likelihood averaged over the prior distribution. That is, it is the likelihood one would expect by randomly sampling parameters from the prior distribution:
$$P(D \mid M) = \mathbb{E}_{\theta \sim P(\theta \mid M)}\big[P(D \mid \theta, M)\big] = \int P(D \mid \theta, M)\,P(\theta \mid M)\,d\theta \tag{25}$$
This value serves as a useful measure of model fit because it directly incorporates the dependence of the likelihood on uncertainty in parameter values, implicitly penalizing extra degrees of freedom (Bolker 2008). If an additional parameter improves the maximum likelihood but decreases the average likelihood, the model suffers from overfitting relative to the simpler model.
2.14 Convergence
Methods for assessing convergence were not directly implemented within our package. Such tools include (A. Rambaut and Drummond 2003), which can be used to estimate the effective number of samples from an MCMC chain. This is necessary since MCMC chains include autocorrelated samples.
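To illustrate why autocorrelation matters, the sketch below estimates an effective sample size by summing sample autocorrelations up to the first non-positive lag. This simple truncation rule is an assumption made for illustration and is not necessarily the estimator used by the tool cited above; the function name and the synthetic chains are hypothetical.

```python
import random

def effective_sample_size(samples):
    """Estimate ESS as n / (1 + 2 * sum of leading positive autocorrelations)."""
    n = len(samples)
    mean = sum(samples) / n
    centered = [x - mean for x in samples]
    var = sum(c * c for c in centered) / n
    if var == 0.0:
        return float(n)
    tau = 1.0
    for lag in range(1, n):
        rho = sum(centered[i] * centered[i + lag] for i in range(n - lag)) / (n * var)
        if rho <= 0.0:
            break                     # stop at the first non-positive autocorrelation
        tau += 2.0 * rho
    return n / tau

rng = random.Random(1)
n = 2000
independent = [rng.gauss(0.0, 1.0) for _ in range(n)]
correlated, x = [], 0.0
for _ in range(n):
    x = 0.9 * x + rng.gauss(0.0, 1.0)   # AR(1) chain with strong autocorrelation
    correlated.append(x)
ess_independent = effective_sample_size(independent)
ess_correlated = effective_sample_size(correlated)
```

The strongly autocorrelated chain carries far fewer effective samples than the independent one, even though both contain the same number of draws.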
2.15 Variable Selection
To assess whether the inclusion of migration between different communities is informative, and to establish if rates are seasonal, Bayesian variable selection (O’Hara and Sillanpää 2009) was implemented.
Our implementation is based on (Kuo and Mallick 1998) but differs in that it is implemented within an MC3 framework. Indicator variables which can take a value of either 0 or 1 prefix parameters of interest. Bayes factors for the inclusion of a specific parameter are calculated as:
$$B = \frac{P(D \mid I = 1)}{P(D \mid I = 0)} \tag{26}$$
where $I$ is the indicator of the variable of interest. These Bayes factors represent the ratio of the marginal likelihoods of the two models, with and without the variable of interest parameterized. Symmetric non-informative priors were used for the indicators. Bayes factors are estimated as the ratio of the number of posterior samples of the cold chain in which the indicator was 1 to the number in which it was 0. The use of an MC3 framework reduces the probability of variables getting stuck in a specific configuration (on or off), as heated chains continue to sample from the prior and from flattened likelihood distributions. In theory, it may be possible to use thermodynamic integration to obtain better estimates of Bayes factors.
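The counting estimate described above can be sketched in a few lines. The function name and the sample values are hypothetical; the division by the prior odds is included so that the sketch also covers non-symmetric indicator priors, and it has no effect for the symmetric prior used here.

```python
def indicator_bayes_factor(indicator_samples, prior_inclusion=0.5):
    """Bayes factor for including a variable, from cold-chain indicator samples.

    `indicator_samples` holds the sampled 0/1 values of one indicator.
    The posterior odds of inclusion are divided by the prior odds.
    """
    on = sum(indicator_samples)
    off = len(indicator_samples) - on
    if off == 0:
        return float("inf")    # the indicator was never switched off
    if on == 0:
        return 0.0             # the indicator was never switched on
    posterior_odds = on / off
    prior_odds = prior_inclusion / (1.0 - prior_inclusion)
    return posterior_odds / prior_odds

# Hypothetical cold-chain output: the indicator is on in 90 of 100 samples.
samples = [1] * 90 + [0] * 10
bayes_factor = indicator_bayes_factor(samples)
```

Estimates of this kind become unreliable when one of the two counts is very small, since the ratio is then determined by only a handful of samples.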
2.16 Non-Seasonal Migration Model Parameterization with Variable Selection
Rates (Equation 4) are parameterized with indicator variables, where the rates have an exponential prior with a rate hyperprior parameter which is shared across all the rates and is itself exponentially distributed with unit mean. The indicators are drawn from an equal-probability prior distribution.
Note: the rate hyperprior was added at a later stage and is not included in the non-seasonal analysis in the main body of the text.
2.17 Two-Seasonal Migration Model Parameterization with Variable Selection
Rates (Equation 9) are parameterized in the following way:
(27)
where the parameterization uses a mean migration rate, seasonal scaling parameters, and their indicators. As is the case in the non-seasonal model, mean rates are assumed to have an exponential prior with a rate hyperprior parameter which is shared across all the rates and is itself exponentially distributed with unit mean. The seasonal scaling parameters are assumed to have a uniform prior. The seasonal scaling indicators and the rate indicators are drawn from an equal-probability prior distribution.
2.18 Combining the likelihood of multiple protein trees
A conservative approach was used to combine the information present in multiple protein trees with respect to the model likelihood. The combined log-likelihood is the average of the log-likelihoods of the individual protein trees, to account for the possible lack of independence in the information the trees contain about migration rates and seasonality. This choice does not affect the maximum likelihood parameter values but has the effect of widening confidence intervals when the multiple protein trees provide independent data, while providing the correct confidence interval when the proteins are in complete linkage and have the exact same evolutionary history. Tree weights can be specified as configuration parameters.
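The averaging step can be written down directly. The sketch below assumes that tree weights are normalized to sum to one before averaging, which is an assumption made for illustration; the function name and the log-likelihood values are hypothetical.

```python
def combined_log_likelihood(tree_log_liks, weights=None):
    """Weighted average of per-tree log-likelihoods (equal weights by default)."""
    if weights is None:
        weights = [1.0] * len(tree_log_liks)
    total_weight = sum(weights)
    return sum(w * l for w, l in zip(weights, tree_log_liks)) / total_weight

# Hypothetical log-likelihoods for two protein trees.
equal = combined_log_likelihood([-100.0, -120.0])
weighted = combined_log_likelihood([-100.0, -120.0], weights=[3.0, 1.0])
```

By contrast, summing the two log-likelihoods would treat the trees as fully independent data and would yield narrower intervals.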
A.3 Results
3.1 Inference of non-seasonal and seasonal migration rates
In this analysis we infer seasonal and non-seasonal migration rates from a single tree topology and stochastically generated tip locations based on a known input migration model. A single hemagglutinin tree topology with 2859 tips was used for this analysis. Tip collection dates span from 1981-2009. Non-seasonal and two-seasonal migration