Study of the nicC gene - Analysis of the sequence data

III MATERIALS AND METHODS

11 Analysis of the sequence data

3.2 Study of the nicC gene

We first define the standard conditional composite likelihood function. Writing a cell as $i = (i_v, v \in V)$, let $X^{(1)}, \dots, X^{(N)}$ be a sample of size $N$ from the distribution of $X$, which belongs to a hierarchical log-linear model $\mathcal{M}_\Delta$. We recall that the global log-likelihood function is

\[
l(\theta) \propto \sum_{i=1}^{N} \log p(X^{(i)}) = \langle \theta, t \rangle - N k(\theta). \tag{4.1.1}
\]
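The identity in (4.1.1) can be checked numerically on a toy example. The sketch below assumes a hypothetical binary model on three vertices, in which each parameter is indexed by the support of its interaction and $t$ counts the sample points with ones on that whole support; the values in `theta` and `sample` are made up for illustration.

```python
import itertools
import math

V = [0, 1, 2]
# Hypothetical parameters, indexed by the support S(j) of each interaction.
theta = {
    frozenset({0}): 0.2, frozenset({1}): -0.4, frozenset({2}): 0.7,
    frozenset({0, 1}): 0.5, frozenset({1, 2}): -0.3,
}

def score(cell):
    """Sum of theta_j over the interactions whose support has all ones in `cell`."""
    support = {v for v in V if cell[v] == 1}
    return sum(t for S, t in theta.items() if S <= support)

# Cumulant generating function k(theta): log of the sum over all cells.
cells = itertools.product((0, 1), repeat=len(V))
k_theta = math.log(sum(math.exp(score(c)) for c in cells))

sample = [(1, 0, 1), (0, 0, 0), (1, 1, 0), (1, 1, 1), (0, 1, 1)]  # made-up data
N = len(sample)

# Left-hand side: sum of log p(X^(i)) over the sample.
direct = sum(score(x) - k_theta for x in sample)

# Right-hand side: <theta, t> - N k(theta), with t the canonical statistic.
t = {S: sum(all(x[v] == 1 for v in S) for x in sample) for S in theta}
exp_family = sum(theta[S] * t[S] for S in theta) - N * k_theta
```

In this normalised toy model `direct` and `exp_family` agree up to floating-point error.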

For a given vertex $v \in V$, let $N_v$ be the set of neighbours of $v$ in the given graph $G$. The composite likelihood function based on the local conditional distribution of $X_v$ given $X_{V \setminus \{v\}}$ or equivalently, due to the Markov property, the conditional distribution of $X_v$ given its neighbours $X_{N_v}$ is

\[
L^{PS}(\theta) = \prod_{v \in V} L^{v,PS}(\theta), \quad \text{where} \quad L^{v,PS}(\theta) = \prod_{i=1}^{N} p(X_v^{(i)} \mid X_{N_v}^{(i)}; \theta), \tag{4.1.2}
\]

and the superscript "PS" stands for "pseudo-likelihood", the name often given to the conditional composite likelihood (Besag (1974)). As given by (2.1.4), for a given cell $i$, we have

\[
\log p(i) = \log p(X_v = i_v, v \in V) = \theta_0 + \sum_{j \triangleleft i} \theta_j
= \theta_0 + \sum_{\substack{j \triangleleft i,\ S(j) \subseteq v \cup N_v, \\ S(j) \not\subseteq N_v}} \theta_j
+ \sum_{j \triangleleft i,\ S(j) \subseteq N_v} \theta_j
+ \sum_{j \triangleleft i,\ S(j) \not\subseteq v \cup N_v} \theta_j .
\]

Let

\[
J^{v,PS} = \{ j \in J \mid S(j) \subseteq v \cup N_v,\ S(j) \not\subseteq N_v \} = \{ j \in J \mid v \in S(j) \}.
\]

Next we show that the elements of the set $J^{v,PS}$ index the parameters in the $v$-th component of the conditional likelihood function, i.e. $p(X_v^{(i)} \mid X_{N_v}^{(i)})$. For $i_v \neq 0$, we have

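\[
\begin{aligned}
p(X_v = i_v \mid X_{N_v} = i_{N_v})
&= p(X_v = i_v \mid X_{V \setminus \{v\}} = i_{V \setminus \{v\}})
= \frac{p(X_V = i_V)}{p(X_{V \setminus \{v\}} = i_{V \setminus \{v\}})} \\
&= \frac{\exp\Bigl( \theta_0 + \sum_{j \triangleleft i,\ j \in J^{v,PS}} \theta_j + \sum_{j \triangleleft i,\ S(j) \subseteq N_v} \theta_j + \sum_{j \triangleleft i,\ S(j) \not\subseteq v \cup N_v} \theta_j \Bigr)}
{\sum_{k \in I \mid k_{V \setminus \{v\}} = i_{V \setminus \{v\}}} \exp\Bigl( \theta_0 + \sum_{j \triangleleft k,\ j \in J^{v,PS}} \theta_j + \sum_{j \triangleleft k,\ S(j) \subseteq N_v} \theta_j + \sum_{j \triangleleft k,\ S(j) \not\subseteq v \cup N_v} \theta_j \Bigr)} \\
&= \frac{\exp\Bigl( \sum_{j \triangleleft i,\ j \in J^{v,PS}} \theta_j \Bigr)}
{1 + \sum_{k \in I \mid k_{V \setminus \{v\}} = i_{V \setminus \{v\}},\ k_v \neq 0} \exp\Bigl( \sum_{j \triangleleft k,\ j \in J^{v,PS}} \theta_j \Bigr)}
\end{aligned}
\tag{4.1.3}
\]

and

\[
p(X_v = 0 \mid X_{V \setminus \{v\}} = i_{V \setminus \{v\}})
= \frac{1}{1 + \sum_{k \in I \mid k_{V \setminus \{v\}} = i_{V \setminus \{v\}},\ k_v \neq 0} \exp\Bigl( \sum_{j \triangleleft k,\ j \in J^{v,PS}} \theta_j \Bigr)}. \tag{4.1.4}
\]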

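Equality (4.1.3) is due to the fact that the set of $j \in J$ such that $j \triangleleft k$, $S(j) \not\subseteq v \cup N_v$, is the same whether $k_v = i_v$ or $k_v \neq i_v$, and therefore the term $\exp\bigl( \theta_0 + \sum_{j \triangleleft k,\ S(j) \not\subseteq v \cup N_v} \theta_j \bigr)$ cancels out between the numerator and the denominator. The same goes for the set of $j \in J$ such that $j \triangleleft k$, $S(j) \subseteq N_v$.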
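The cancellation behind (4.1.3) and (4.1.4) can be illustrated with a small numerical check. The sketch below assumes a hypothetical binary model on the path 0 - 1 - 2 - 3, where each $j \in J$ is identified with its support $S(j)$ and the condition $j \triangleleft i$ is taken to reduce to "$S(j)$ is contained in the set of vertices where $i$ equals 1". The parameter values and function names are illustrative only.

```python
import itertools
import math

V = [0, 1, 2, 3]
# Hypothetical parameters on the path 0 - 1 - 2 - 3 (vertices and edges only).
theta = {
    frozenset({0}): 0.3, frozenset({1}): -0.2,
    frozenset({2}): 0.5, frozenset({3}): 0.1,
    frozenset({0, 1}): 0.8, frozenset({1, 2}): -0.6, frozenset({2, 3}): 0.4,
}

def score(cell):
    """Unnormalised log p(cell): sum of theta_j over supports with all ones."""
    support = {u for u in V if cell[u] == 1}
    return sum(t for S, t in theta.items() if S <= support)

def conditional_from_joint(v, cell):
    """p(X_v = cell[v] | all other variables), computed from the full joint."""
    den = 0.0
    for kv in (0, 1):
        other = list(cell)
        other[v] = kv
        den += math.exp(score(tuple(other)))
    return math.exp(score(cell)) / den

def conditional_local(v, cell):
    """Same probability via (4.1.3)/(4.1.4): only theta_j with v in S(j) are used."""
    support_with_v = {u for u in V if cell[u] == 1} | {v}
    s = sum(t for S, t in theta.items() if v in S and S <= support_with_v)
    p_one = math.exp(s) / (1.0 + math.exp(s))
    return p_one if cell[v] == 1 else 1.0 - p_one

# Largest discrepancy between the two computations over all vertices and cells.
max_gap = max(
    abs(conditional_from_joint(v, c) - conditional_local(v, c))
    for v in V
    for c in itertools.product((0, 1), repeat=len(V))
)
```

In the binary case the sum over $k$ with $k_v \neq 0$ has a single term, so the local formula is a logistic function of the parameters whose support contains $v$.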

Remark 4.1.1. In the equation above, we worked with $p(X_v \mid X_{V \setminus \{v\}})$ rather than with $p(X_v \mid X_{N_v})$, though the two are equal; we did this to emphasize that the parameter

\[
\theta^{v,PS} = (\theta_j,\ j \in J^{v,PS}), \quad v \in V, \tag{4.1.5}
\]

of the $v$-th component $L^{v,PS}$ of the conditional composite likelihood is a subvector of $\theta$, the parameter of the global likelihood function.

Besides the pseudo-likelihood, there are other types of conditional composite likelihood. Asuncion et al. (2010) proposed a version of composite likelihood built from the conditional likelihood of one subset of random variables given another subset. Increasing the size of the local components makes the composite likelihood estimate more accurate, at the cost of greater computational complexity. In our research, we modified the pseudo-likelihood based on this idea and proposed the two-hop conditional composite likelihood.

The two-hop conditional composite likelihood function is

\[
L^{PS_2}(\theta) = \prod_{v \in V} L^{v,PS_2}(\theta), \quad \text{where} \quad L^{v,PS_2}(\theta) = \prod_{i=1}^{N} p(X_v^{(i)}, X_{N_v}^{(i)} \mid X_{N_{2v}}^{(i)}). \tag{4.1.6}
\]

The expression of $p(X_v^{(i)}, X_{N_v}^{(i)} \mid X_{N_{2v}}^{(i)})$ is the same as (4.1.3) and (4.1.4) but with $J^{v,PS}$ replaced by $J^{v,PS_2}$, where

\[
J^{v,PS_2} = \{ j \in J \mid S(j) \subseteq M_v,\ S(j) \not\subseteq N_{2v} \}.
\]

In a parallel way to Remark 4.1.1, we note that

\[
\theta^{v,PS_2} = (\theta_j,\ j \in J^{v,PS_2})
\]

is a subvector of $\theta = (\theta_j,\ j \in J)$, the argument of the global likelihood function.
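The index sets of the one-hop and two-hop components can be enumerated directly from the graph. The sketch below makes several assumptions that the text does not spell out: $N_{2v}$ is taken to be the set of vertices at graph distance exactly two from $v$, $M_v$ is taken to be $\{v\} \cup N_v \cup N_{2v}$, and the model is a hypothetical one on the path 0 - 1 - 2 - 3 - 4 with one interaction per vertex and per edge. The function names are illustrative.

```python
# Hypothetical path graph 0 - 1 - 2 - 3 - 4, stored as an adjacency dict.
adj = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}

# Supports S(j) of the interactions: one per vertex and one per edge (assumed).
supports = [frozenset({v}) for v in adj] + [
    frozenset({u, w}) for u in adj for w in adj[u] if u < w
]

def neighbourhoods(v):
    """Return (N_v, N_2v): vertices at distance exactly 1 and exactly 2 from v."""
    one_hop = set(adj[v])
    two_hop = set().union(*(adj[u] for u in one_hop)) - one_hop - {v}
    return one_hop, two_hop

def index_sets(v):
    """Return the supports indexing the one-hop and two-hop components at v."""
    one_hop, two_hop = neighbourhoods(v)
    m_v = {v} | one_hop | two_hop
    j_ps = [S for S in supports if v in S]
    j_ps2 = [S for S in supports if S <= m_v and not S <= two_hop]
    return j_ps, j_ps2

j_ps_0, j_ps2_0 = index_sets(0)
```

For `v = 0` this gives the supports `{0}` and `{0, 1}` for the one-hop component, and `{0}`, `{1}`, `{0, 1}`, `{1, 2}` for the two-hop component, so the two-hop component involves more of the global parameter vector.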

Let $M_v$ be the one-hop or two-hop neighborhood of $v$. The marginal composite likelihood is the product

\[
L^{M}(\theta) = \prod_{v \in V} \prod_{k=1}^{N} p(X_{M_v}^{(k)}) = \prod_{v \in V} L^{M_v}(\theta), \tag{4.1.7}
\]

where $L^{M_v}(\theta) = \prod_{k=1}^{N} p(X_{M_v}^{(k)})$. The $M_v$-marginal model is clearly multinomial and the corresponding data can be read in the $M_v$-marginal contingency table obtained from the full table. The density of the $M_v$-marginal multinomial distribution is of the general exponential form

\[
f(t^{M_v}; \theta^{M_v}) = \exp\{ \langle t^{M_v}, \theta^{M_v} \rangle - N k^{M_v}(\theta^{M_v}) \}, \tag{4.1.8}
\]

where $t^{M_v}$, $\theta^{M_v}$ and $k^{M_v}$ are respectively the $M_v$-marginal canonical statistic, canonical parameter and cumulant generating function.

In order to identify the $M_v$-marginal model, we first establish the relationship between $\theta$ and $\theta^{M_v}$. For the remainder of this thesis, the symbol $j$ is to be understood as an element of $I_{M_v}$ whenever used in the notation $\theta_j^{M_v}$, and it is to be understood as the element of $J$ obtained by padding it with entries $j_{V \setminus M_v} = 0$ whenever used in the notation $\theta_j$. We now give the general relationship between the parameters of the overall model and those of the $M_v$-marginal model. The proof is given in Appendix B.1.

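Lemma 4.1.1. Let $M_v$ be the one-hop or two-hop neighborhood of $v \in V$. For $j \in J$, $S(j) \subset M_v$, the parameter $\theta_j$ of the overall model and the parameter $\theta_j^{M_v}$ of the marginal model are linked by the following:

\[
\theta_j^{M_v} = \theta_j + \sum_{j' \,\mid\, j' \triangleleft_0 j} (-1)^{|S(j) - S(j')|} \log\Bigl( 1 + \sum_{i \in I,\ i_{M_v} = j'} \exp \sum_{k \,\mid\, k \triangleleft i,\ k \not\triangleleft j'} \theta_k \Bigr). \tag{4.1.9}
\]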

We want to identify which of the marginal parameters are equal to the corresponding overall parameter, and in particular which marginal parameters are equal to zero when the global parameter is equal to zero. Let $M_v^c$ denote the complement of $M_v$ in $V$. We define the buffer set at $v$ as follows:

\[
B_v = \{ w \in M_v \mid \exists\, w' \in M_v^c \text{ with } (w, w') \in E \}. \tag{4.1.10}
\]
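Definition (4.1.10) translates into a one-line set comprehension. The sketch below assumes a hypothetical path graph 0 - 1 - 2 - 3 - 4 stored as an adjacency dict; the name `buffer_set` is illustrative.

```python
# Hypothetical path graph 0 - 1 - 2 - 3 - 4, stored as an adjacency dict.
adj = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}

def buffer_set(m_v):
    """B_v as in (4.1.10): vertices of M_v with at least one neighbour outside M_v."""
    return {w for w in m_v if adj[w] - m_v}

one_hop_buffer = buffer_set({0, 1})      # M_v for v = 0, one-hop
two_hop_buffer = buffer_set({0, 1, 2})   # M_v for v = 0, two-hop
interior_buffer = buffer_set({1, 2, 3})  # M_v for v = 2, one-hop
```

On this path, the buffer set of the end vertex 0 is the single vertex on the boundary of its neighborhood, while an interior vertex has a buffer vertex on each side.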

We have the following result.

Lemma 4.1.2. Let $M_v$ be the one-hop or two-hop neighborhood of $v \in V$. For $j \in J$, $S(j) \subset M_v$, the following holds:

(1.) if $S(j) \not\subset B_v$, then $\theta_j^{M_v} = \theta_j$,

(2.) if $S(j) \subset B_v$, then in general $\theta_j^{M_v} \neq \theta_j$, and (4.1.9) holds.

Moreover, for $i \in I$, $S(i) \subset M_v$,

(3.) if $S(i) \not\subset B_v$, then $\theta_i^{M_v} = 0$ whenever $\theta_i = 0$.

The proof is given in Appendix B.2. From the lemma above, we see that, for $j \in J$ such that $S(j) \subset M_v$ and $S(j) \not\subset B_v$, the corresponding global and $M_v$-marginal log-linear parameters are equal. We see also that, for $i \in I$ such that $S(i) \subset M_v$ and $S(i) \not\subset B_v$, if the log-linear parameter is zero in the global model, it remains zero in the $M_v$-marginal model.
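Parts (1.) and (2.) of Lemma 4.1.2 can be illustrated numerically. The sketch below assumes a hypothetical binary model on the path 0 - 1 - 2 - 3 with made-up parameter values, takes $v = 0$ with one-hop neighborhood $M_v = \{0, 1\}$ and buffer set $B_v = \{1\}$, and recovers the marginal log-linear parameters by Moebius inversion of the log marginal cell masses. This is a check on one toy model, not a proof.

```python
import itertools
import math

V = [0, 1, 2, 3]
# Hypothetical parameters on the path 0 - 1 - 2 - 3 (vertices and edges only).
theta = {
    frozenset({0}): 0.3, frozenset({1}): -0.2,
    frozenset({2}): 0.5, frozenset({3}): 0.1,
    frozenset({0, 1}): 0.8, frozenset({1, 2}): -0.6, frozenset({2, 3}): 0.4,
}
M = [0, 1]  # one-hop neighbourhood of v = 0; its buffer set is {1}

def score(cell):
    """Unnormalised log-probability of a cell given as a dict vertex -> 0/1."""
    support = {v for v in V if cell[v] == 1}
    return sum(t for S, t in theta.items() if S <= support)

def log_marginal(ones):
    """Log of the unnormalised M-marginal mass of the cell with ones exactly on `ones`."""
    rest = [v for v in V if v not in M]
    total = 0.0
    for values in itertools.product((0, 1), repeat=len(rest)):
        cell = {v: 0 for v in V}
        cell.update({v: 1 for v in ones})
        cell.update(dict(zip(rest, values)))
        total += math.exp(score(cell))
    return math.log(total)

def marginal_theta(D):
    """Marginal parameter for support D: alternating sum over subsets F of D."""
    D = sorted(D)
    out = 0.0
    for r in range(len(D) + 1):
        for F in itertools.combinations(D, r):
            out += (-1) ** (len(D) - r) * log_marginal(F)
    return out

gap_v = abs(marginal_theta({0}) - theta[frozenset({0})])           # S(j) not in B_v
gap_edge = abs(marginal_theta({0, 1}) - theta[frozenset({0, 1})])  # S(j) not in B_v
gap_buffer = abs(marginal_theta({1}) - theta[frozenset({1})])      # S(j) inside B_v
```

Here `gap_v` and `gap_edge` vanish up to rounding, in line with part (1.), while `gap_buffer` is clearly nonzero because vertex 1 interacts with vertex 2 outside $M_v$, in line with part (2.).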

In document Análisis genómico del catabolismo de compuestos aromáticos en Pseudomonas putida KT2440: Caracterización molecular de la ruta de degradación del ácido nicotínico (pages 127-131)