• Result I. Consistency: convergence of the MLEs and of the scaled log-likelihood.

• Result II. Asymptotic normality of the score function.

• Result III. Contiguity: we prove the asymptotic equivalence of the observed information and the Fisher information matrix. Combined with Result II, this leads directly to the asymptotic distribution of the parameter estimates.

• Result IV. Posterior convergence: we discuss the convergence of the posterior under any continuous prior distribution.

5.3 Consistency

Here, we prove the consistency of the HMM parameter estimates. We employ a martingale approach to centered log-likelihood ratios. In the course of this proof, we exploit the uniform forgetting property of hidden Markov models. This property follows from results on the ignorability of the initial condition, together with the equality of moments of shifted Markov processes. In a later subsection we prove identifiability, which serves as a crucial condition for the final proof. Let

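\[ D_k = \log(L_k) - \log(L_{k-1}) \]

denote the log-likelihood increment. For notational convenience we write $f_i(Y_k)$ simply as $f_{i,k}$. We can write the increment as

\[ \log \frac{a[k,0]\, f_{0,k} + a[k,1]\, f_{1,k}}{a[k-1,0]\, f_{0,k-1} + a[k-1,1]\, f_{1,k-1}} \]

This can be further written as

\[ \log\left[ \frac{a[k,0]}{a[k-1,0]\, f_{0,k-1} + a[k-1,1]\, f_{1,k-1}}\, f_{0,k} + \frac{a[k,1]}{a[k-1,0]\, f_{0,k-1} + a[k-1,1]\, f_{1,k-1}}\, f_{1,k} \right] \]

which is another form of the expression

\[ \log\big( P(X_k = 0 \mid Y_1, \dots, Y_{k-1})\, f_{0,k} + P(X_k = 1 \mid Y_1, \dots, Y_{k-1})\, f_{1,k} \big) \]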
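The last expression says that each increment is the log of a one-step predictive density, so the increments can be accumulated with a forward filter. The following is a minimal sketch under assumptions that are not taken from the text: a discrete-time two-state chain with a hypothetical transition matrix `trans`, Gaussian emissions with hypothetical `means` and `sds`, and the convention $L_0 = 1$. It checks the sum of increments against a brute-force sum over all state paths.

```python
import math
from itertools import product

def normal_pdf(y, mean, sd):
    """Gaussian density, used here as the emission density f_i(y)."""
    return math.exp(-0.5 * ((y - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

def log_likelihood_increments(ys, init, trans, means, sds):
    """Return [D_1, ..., D_n], D_k = log(sum_i P(X_k=i | Y_1..Y_{k-1}) f_i(Y_k))."""
    pred = list(init)  # predictive law of X_1 before any observation
    increments = []
    for y in ys:
        dens = [normal_pdf(y, means[i], sds[i]) for i in range(2)]
        step = pred[0] * dens[0] + pred[1] * dens[1]  # equals L_k / L_{k-1}
        increments.append(math.log(step))
        # Bayes update: P(X_k = i | Y_1..Y_k)
        filt = [pred[i] * dens[i] / step for i in range(2)]
        # One transition step: P(X_{k+1} = j | Y_1..Y_k)
        pred = [filt[0] * trans[0][j] + filt[1] * trans[1][j] for j in range(2)]
    return increments

def brute_force_log_likelihood(ys, init, trans, means, sds):
    """Log-likelihood by summing the joint density over every hidden path."""
    total = 0.0
    for path in product(range(2), repeat=len(ys)):
        p = init[path[0]] * normal_pdf(ys[0], means[path[0]], sds[path[0]])
        for k in range(1, len(ys)):
            p *= trans[path[k - 1]][path[k]]
            p *= normal_pdf(ys[k], means[path[k]], sds[path[k]])
        total += p
    return math.log(total)

# Illustrative (made-up) data and parameters.
ys = [0.3, -1.2, 2.5, 1.9, -0.4]
init = [0.6, 0.4]
trans = [[0.9, 0.1], [0.2, 0.8]]
means, sds = [0.0, 2.0], [1.0, 0.5]
incs = log_likelihood_increments(ys, init, trans, means, sds)
```

The brute-force routine is exponential in the number of observations and is only meant as a check on short sequences.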

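We start by showing that

\[ E\big( P(X_k = 0 \mid Y_1, \dots, Y_{k-1})\, f_{0,k} + P(X_k = 1 \mid Y_1, \dots, Y_{k-1})\, f_{1,k} \big) \]

is a Cauchy sequence in $k$.

For $m > k$ consider the expression

\[ E\Big( \big( P(X_m = 0 \mid Y_1, \dots, Y_{k-1})\, f_{0,m} + P(X_m = 1 \mid Y_1, \dots, Y_{k-1})\, f_{1,m} \big) - \big( P(X_k = 0 \mid Y_1, \dots, Y_{k-1})\, f_{0,k} + P(X_k = 1 \mid Y_1, \dots, Y_{k-1})\, f_{1,k} \big) \Big) \]

We employ the trick of adding and subtracting an intermediate term, writing the above expression as

\[ \begin{aligned}
E\Big( & \big( P(X_m = 0 \mid Y_1, \dots, Y_{k-1})\, f_{0,m} + P(X_m = 1 \mid Y_1, \dots, Y_{k-1})\, f_{1,m} \big) \\
& - \big( P(X_m = 0 \mid Y_1, \dots, Y_{k-1}, X_{m-k} = 0)\, f_{0,m} + P(X_m = 1 \mid Y_1, \dots, Y_{k-1}, X_{m-k} = 1)\, f_{1,m} \big) \\
& + \big( P(X_m = 0 \mid Y_1, \dots, Y_{k-1}, X_{m-k} = 0)\, f_{0,m} + P(X_m = 1 \mid Y_1, \dots, Y_{k-1}, X_{m-k} = 1)\, f_{1,m} \big) \\
& - \big( P(X_k = 0 \mid Y_1, \dots, Y_{k-1})\, f_{0,k} + P(X_k = 1 \mid Y_1, \dots, Y_{k-1})\, f_{1,k} \big) \Big)
\end{aligned} \]

which can be written as the sum of two components:

\[ E\Big( \big( P(X_m = 0 \mid Y_1, \dots, Y_{k-1})\, f_{0,m} + P(X_m = 1 \mid Y_1, \dots, Y_{k-1})\, f_{1,m} \big) - \big( P(X_m = 0 \mid Y_1, \dots, Y_{k-1}, X_{m-k} = 0)\, f_{0,m} + P(X_m = 1 \mid Y_1, \dots, Y_{k-1}, X_{m-k} = 1)\, f_{1,m} \big) \Big) \]

and

\[ E\Big( \big( P(X_m = 0 \mid Y_1, \dots, Y_{k-1}, X_{m-k} = 0)\, f_{0,m} + P(X_m = 1 \mid Y_1, \dots, Y_{k-1}, X_{m-k} = 1)\, f_{1,m} \big) - \big( P(X_k = 0 \mid Y_1, \dots, Y_{k-1})\, f_{0,k} + P(X_k = 1 \mid Y_1, \dots, Y_{k-1})\, f_{1,k} \big) \Big) \]

Note that the second component is 0, since the two random variables whose first moments are taken have the same distribution: one is a shifted version of the other, and, due to the Markov structure, the distribution of the process depends only on the initial condition. For the first component we use the uniform forgetting property. This gives us the bound

\[ \big\| P(X_m = 0 \mid Y_1, \dots, Y_{k-1}) - P(X_m = 0 \mid Y_1, \dots, Y_{k-1}, X_{m-k} = 0) \big\|_{TV} < \left( 1 - \frac{\sigma^{-}}{\sigma^{+}} \right)^{k} = \tau^{k}, \]

where $\tau$ is less than 1.

This shows that $E\big( P(X_k = 0 \mid Y_1, \dots, Y_{k-1})\, f_{0,k} + P(X_k = 1 \mid Y_1, \dots, Y_{k-1})\, f_{1,k} \big)$ is a Cauchy sequence in $k$.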
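The geometric rate in the forgetting bound can be illustrated numerically. The sketch below rests on assumptions that are ours rather than the text's: a discrete-time two-state chain whose transition probabilities all lie in $[\sigma^{-}, \sigma^{+}]$, unit-variance Gaussian emissions, and an arbitrary deterministic observation sequence. It runs the same filter from two very different initial laws and records the total variation gap, which for two states is simply the absolute difference of the probabilities of state 0. The illustration concerns forgetting of the initial condition by the filter, a close relative of the bound used above rather than that exact statement.

```python
import math

def normal_pdf(y, mean, sd):
    """Gaussian emission density."""
    return math.exp(-0.5 * ((y - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

def filter_path(ys, init, trans, means, sds):
    """Return P(X_k = 0 | Y_0..Y_k) for each k, starting from the law `init`."""
    pred = list(init)
    out = []
    for y in ys:
        w = [pred[i] * normal_pdf(y, means[i], sds[i]) for i in range(2)]
        total = w[0] + w[1]
        filt = [w[0] / total, w[1] / total]
        out.append(filt[0])
        pred = [filt[0] * trans[0][j] + filt[1] * trans[1][j] for j in range(2)]
    return out

# Hypothetical transition matrix; its entries give sigma_minus and sigma_plus.
trans = [[0.7, 0.3], [0.4, 0.6]]
sigma_minus = min(min(row) for row in trans)
sigma_plus = max(max(row) for row in trans)
rho = 1.0 - sigma_minus / sigma_plus  # geometric rate, here 1 - 0.3/0.7

ys = [math.sin(k) for k in range(41)]  # arbitrary fixed observations
means, sds = [0.0, 1.0], [1.0, 1.0]

path_a = filter_path(ys, [0.99, 0.01], trans, means, sds)
path_b = filter_path(ys, [0.01, 0.99], trans, means, sds)
gaps = [abs(a - b) for a, b in zip(path_a, path_b)]  # total variation gap at each k
```

Comparing `gaps[k]` with `rho ** k` shows the gap decaying at least geometrically, whatever the two starting laws are.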

Now, for $x, y > 0$,

\[ |\log(x) - \log(y)| \le \frac{|x - y|}{\min(x, y)} \]

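Using the above and the fact that $P(X_k = 0 \mid Y_1, \dots, Y_{k-1})\, f_{0,k} + P(X_k = 1 \mid Y_1, \dots, Y_{k-1})\, f_{1,k}$ is bounded, it is easy to show that

\[ \log\big[ E\big( P(X_k = 0 \mid Y_1, \dots, Y_{k-1})\, f_{0,k} + P(X_k = 1 \mid Y_1, \dots, Y_{k-1})\, f_{1,k} \big) \big] \]

is a Cauchy sequence as well. Jensen's inequality can now be used to show that

\[ E\big[ \log\big( P(X_k = 0 \mid Y_1, \dots, Y_{k-1})\, f_{0,k} + P(X_k = 1 \mid Y_1, \dots, Y_{k-1})\, f_{1,k} \big) \big] \]

is a Cauchy sequence. We now write the entire log-likelihood as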

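\[ L_n = L_0 + \sum_{k=1}^{n} [L_k - L_{k-1}] \]

where $L_k$ here denotes the log-likelihood of the first $k$ observations. Let $E(L_k - L_{k-1})$ be denoted by $U_k$, and let $Z_k = (L_k - L_{k-1}) - U_k$ be the centered increment.

Now

\[ \frac{1}{n} L_n = \frac{1}{n} \Big[ L_0 + \sum_{k=1}^{n} [Z_k + U_k] \Big] \]

We have shown that $U_k$ is Cauchy and hence convergent. The sequence $Z_k$ is a mean-zero martingale. Recall that

\[ Z_k = \log\big( P(X_k = 0 \mid Y_1, \dots, Y_{k-1})\, f_{0,k} + P(X_k = 1 \mid Y_1, \dots, Y_{k-1})\, f_{1,k} \big) - U_k, \]

that $P(X_k = 0 \mid Y_1, \dots, Y_{k-1})\, f_{0,k} + P(X_k = 1 \mid Y_1, \dots, Y_{k-1})\, f_{1,k} < 1$, and that the parameters lie in a bounded set, so any continuous function on this set is bounded. The boundedness of $Z_k$ implies that the Kolmogorov strong law of large numbers for martingales holds, and $\sum_{k} Z_k / n$ converges almost surely. From this, we get that $L_n / n$ is convergent.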

5.3.1 Identifiability

We go on to establish the identifiability conditions for the base and emission models by extending the results of Teicher (1967). For this, we use the following main definition and results.

Definition: Let $f_\varphi(y)$ be a parametric family of densities of $Y$ with respect to a common dominating measure $\mu$, with parameter $\varphi$ in some set $\Phi$. If $\pi$ is a probability measure on $\Phi$, then the density

\[ f_\pi(y) = \int_\Phi f_\varphi(y)\, \pi(d\varphi) \]

is called a mixture density.

We say that the class of (all) mixtures of $(f_\varphi)$ is identifiable if $f_\pi = f_{\pi'}$ $\mu$-a.e. if and only if $\pi = \pi'$.

Furthermore, we say that the class of finite mixtures of $f_\varphi$ is identifiable if, for all measures $\pi$ and $\pi'$ with finite support, $f_\pi = f_{\pi'}$ $\mu$-a.e. if and only if $\pi = \pi'$.
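As a concrete reading of the definition, a finitely supported $\pi$ turns the integral into a weighted sum. The sketch below uses made-up weights and normal components, none of which come from the text. It shows that listing the support points of $\pi$ in a different order leaves $f_\pi$ unchanged, because the mixing measure itself is the same, while changing the weights changes the density. This is why identifiability of finite mixtures pins down component parameters only up to relabeling.

```python
import math

def normal_pdf(y, mean, sd):
    """Gaussian density f_phi(y) with phi = (mean, sd)."""
    return math.exp(-0.5 * ((y - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

def mixture_density(y, components):
    """f_pi(y) for a finitely supported pi given as (weight, mean, sd) triples."""
    return sum(w * normal_pdf(y, m, s) for w, m, s in components)

# Hypothetical mixing measures.
pi_a = [(0.3, -1.0, 1.0), (0.7, 2.0, 0.5)]
pi_b = [(0.7, 2.0, 0.5), (0.3, -1.0, 1.0)]  # same measure, labels swapped
pi_c = [(0.5, -1.0, 1.0), (0.5, 2.0, 0.5)]  # same support, different weights

grid = [x / 10.0 for x in range(-50, 51)]
same = max(abs(mixture_density(y, pi_a) - mixture_density(y, pi_b)) for y in grid)
diff = max(abs(mixture_density(y, pi_a) - mixture_density(y, pi_c)) for y in grid)
```

Here `same` is zero up to rounding, while `diff` is clearly positive.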

Result 1: The class of joint finite mixtures of the normal family is identifiable (Teicher, 1960).

Result 2: Assume that the class of finite mixtures of the family $f_\varphi$ of densities of $Y$ with parameter $\varphi \in \Phi$ is identifiable. Then the class of finite mixtures of the $n$-fold product densities $f^{(n)}_\varphi(y) = f_{\varphi_1}(y_1) \cdots f_{\varphi_n}(y_n)$ with parameter $\varphi = (\varphi_1, \dots, \varphi_n) \in \Phi^n$ is identifiable. The result was proved by induction on $n$ (Teicher, 1967).

We shall now apply the above results to prove the identifiability of the base model. (Note that any hidden Markov model is a finite mixture of $n$-fold product densities, where the weights of the mixture are functions of the transition probabilities.) Now, suppose there are two sets of parameters

\[ \Theta = (\lambda, \mu, \mu_1, \mu_2, \sigma_1, \sigma_2) \]

and

\[ \Theta' = (\lambda', \mu', \mu'_1, \mu'_2, \sigma'_1, \sigma'_2) \]

yielding the same likelihood. We need to show that

\begin{align}
\Theta &= (\lambda, \mu, \mu_1, \mu_2, \sigma_1, \sigma_2) \tag{5.3.1} \\
&= (\lambda', \mu', \mu'_1, \mu'_2, \sigma'_1, \sigma'_2) \tag{5.3.2}
\end{align}

Proof: From Result 1 and the fact that our model is a finite mixture of $n$-fold products of normal densities, we can directly invoke Result 2 to get the following: the finite collection of $n$-dimensional densities and the mixture weights attached to these densities are the same for the two parameter settings.

Now, since the $n$-fold densities arise from the normal family, they are in one-to-one correspondence with the collection of $n$-dimensional parameter sets formed by the $n$-fold convolution of the sets $(\mu_1, \sigma_1)$ and $(\mu_2, \sigma_2)$. Since the finite collection of $n$-dimensional densities is in one-to-one correspondence with the emission parameters, and the equality of the collections has been established via Results 1 and 2, we have

\[ \{ (\mu_1, \sigma_1), (\mu_2, \sigma_2) \} = \{ (\mu'_1, \sigma'_1), (\mu'_2, \sigma'_2) \} \]

and hence the individual parameters are equal up to a permutation or reordering. Now we make use of the fact that the mixture weights are equal. In order to translate this condition into an equality condition on the transition rates, we exploit the relation between the transition rates and the transition probability functions. From the equality of the mixture weight probabilities, it can be inferred that the laws of these two processes are the same, i.e. the transition probabilities $P(X_i = a \mid X_{i-1} = b)$ agree for any sequence position $i$ and for any states $a, b$, up to a permutation of the indices $a$ and $b$. Let the permuted indices be denoted by $a', b'$. Thus, if we denote the transition probability under $\Theta'$ by $P'(X_i = a \mid X_{i-1} = b)$, we have

\[ P = P' \]

up to the relabeling $(a, b) \mapsto (a', b')$.

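Adding up the equations for $P_{00}[t]$ and $P_{11}[t]$ under $\Theta$, and those for $P'_{00}[t]$ and $P'_{11}[t]$ under $\Theta'$, we get

\begin{align}
e^{-(\lambda + \mu) t} &= e^{-(\lambda' + \mu') t} \quad \forall\, t \tag{5.3.3} \\
\Rightarrow\ \lambda - \lambda' &= \mu' - \mu \tag{5.3.4}
\end{align}

Assume $a' = a$ and $b' = b$. Then, from the equation for $P_{00}[t]$ we have:

\[ \frac{\mu}{\lambda + \mu} + \frac{\lambda}{\lambda + \mu}\, e^{-(\lambda + \mu) t} = \frac{\mu'}{\lambda' + \mu'} + \frac{\lambda'}{\lambda' + \mu'}\, e^{-(\lambda' + \mu') t} \]

Subtracting the right-hand side from the left-hand side and using (5.3.3), we get

\[ \left[ \frac{\mu}{\lambda + \mu} - \frac{\mu'}{\lambda' + \mu'} \right] + \left[ \frac{\lambda}{\lambda + \mu} - \frac{\lambda'}{\lambda' + \mu'} \right] e^{-(\lambda' + \mu') t} = 0 \]

Noting that $\lambda - \lambda' = \mu' - \mu$, or equivalently $\lambda + \mu = \lambda' + \mu'$, we get

\begin{align}
(\lambda - \lambda') \big( 1 - e^{-(\lambda + \mu) t} \big) &= 0 \quad \forall\, t \tag{5.3.5} \\
\Rightarrow\ \lambda - \lambda' &= 0 \tag{5.3.6}
\end{align}

Again using $\lambda - \lambda' = \mu' - \mu$, we get $\mu = \mu'$.

If we assume $a' = b$ and $b' = a$, we can repeat the above steps to see that $\lambda = \mu'$ and $\mu = \lambda'$. Thus the identifiability of the model is completely proved.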
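The argument can be checked numerically by inverting the transition probability functions. The sketch below assumes, beyond the formula for $P_{00}[t]$ given above, the companion formula $P_{11}[t] = \frac{\lambda}{\lambda+\mu} + \frac{\mu}{\lambda+\mu} e^{-(\lambda+\mu)t}$ for a two-state chain; the function names and rate values are hypothetical. It recovers $(\lambda, \mu)$ from $(P_{00}[t], P_{11}[t])$ at a single $t > 0$, and shows that exchanging the two state labels exchanges the recovered rates.

```python
import math

def transition_probs(lam, mu, t):
    """Return (P00[t], P11[t]) for a two-state chain with rates lam and mu."""
    s = lam + mu
    decay = math.exp(-s * t)
    p00 = mu / s + (lam / s) * decay
    p11 = lam / s + (mu / s) * decay
    return p00, p11

def recover_rates(p00, p11, t):
    """Invert the map (lam, mu) -> (P00[t], P11[t]) for a fixed t > 0."""
    decay = p00 + p11 - 1.0  # equals exp(-(lam + mu) t), as in (5.3.3)
    s = -math.log(decay) / t  # lam + mu
    lam = s * (1.0 - p00) / (1.0 - decay)  # from 1 - P00 = lam (1 - decay) / s
    mu = s * (1.0 - p11) / (1.0 - decay)   # from 1 - P11 = mu (1 - decay) / s
    return lam, mu

# Illustrative rates and time point.
p00, p11 = transition_probs(0.8, 0.3, 1.5)
lam_hat, mu_hat = recover_rates(p00, p11, 1.5)
# Swapping the state labels corresponds to the case a' = b, b' = a.
swapped = recover_rates(p11, p00, 1.5)
```

Because the inversion is explicit, two parameter pairs with the same transition probability functions must coincide, up to the label swap.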

5.3.2 Identifiability and Asymptotics of log likelihood gives us