
As a simple demonstration that conjugate models might not react reasonably to prior-data conflict, inference on the mean of data from a scaled normal distribution and inference on the category probabilities in multinomial sampling will be described in the following two subsections.

A.1.2.1. Samples from a scaled Normal distribution

The conjugate distribution to an i.i.d. sample $x$ of size $n$ from a scaled normal distribution with mean $\mu$, denoted by $N(\mu, 1)$, is a normal distribution with mean $\mu^{(0)}$ and variance $\sigma^{(0)2}$.⁹

⁷ See also, e.g., the list of applications of the IDM given in Section 3.1.3.

⁸ For more details on the topic of imprecision and prior-data conflict, see Section 3.3.

⁹ Here, and in the following, parameters of a prior distribution will be denoted by an upper index (0),

A.1 Bayesian Linear Regression: Different Conjugate Models and Their (In)Sensitivity to Prior-Data Conflict

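The posterior is then again a normal distribution with the following updated parameters:¹⁰

\begin{align}
\mu^{(n)} &= \frac{\frac{1}{n}}{\frac{1}{n} + \sigma^{(0)2}}\,\mu^{(0)} + \frac{\sigma^{(0)2}}{\frac{1}{n} + \sigma^{(0)2}}\,\bar{x}
 = \frac{\frac{1}{\sigma^{(0)2}}}{\frac{1}{\sigma^{(0)2}} + n}\,\mu^{(0)} + \frac{n}{\frac{1}{\sigma^{(0)2}} + n}\,\bar{x} \tag{A.1} \\
\sigma^{(n)2} &= \sigma^{(0)2} \cdot \frac{\frac{1}{n}}{\sigma^{(0)2} + \frac{1}{n}} = \frac{1}{\frac{1}{\sigma^{(0)2}} + n} \,. \tag{A.2}
\end{align}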

The posterior expectation (and mode) is thus a simple weighted average of the prior mean $\mu^{(0)}$ and the estimate from the data, $\bar{x}$, with weights $1/\sigma^{(0)2}$ and $n$, respectively.¹¹ The variance of the posterior distribution decreases automatically with every additional observation.

Now, in a situation where data are scarce but one is very confident about the prior information, one would choose a low value for $\sigma^{(0)2}$, resulting in a high weight for the prior mean $\mu^{(0)}$ in the calculation of $\mu^{(n)}$. The posterior distribution will be centered around a mean between $\mu^{(0)}$ and $\bar{x}$, and it will be even more pointed than the prior, because $\sigma^{(n)2}$ is considerably smaller than $\sigma^{(0)2}$, the factor multiplying $\sigma^{(0)2}$ in (A.2) being noticeably smaller than one.
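As a purely illustrative calculation (the numbers are assumptions chosen for convenience, not values from the text), take a confident prior with $\sigma^{(0)2} = 0.1$ and a small sample of size $n = 5$. The weights in (A.1) are then $1/\sigma^{(0)2} = 10$ and $n = 5$, so that

\begin{align*}
\mu^{(n)} &= \frac{10}{10 + 5}\,\mu^{(0)} + \frac{5}{10 + 5}\,\bar{x} = \tfrac{2}{3}\,\mu^{(0)} + \tfrac{1}{3}\,\bar{x}, \\
\sigma^{(n)2} &= 0.1 \cdot \frac{\frac{1}{5}}{0.1 + \frac{1}{5}} = 0.1 \cdot \tfrac{2}{3} = \frac{1}{10 + 5} = \tfrac{1}{15} \approx 0.067 \,.
\end{align*}

Under these assumed values the prior mean receives two thirds of the weight, and the variance shrinks by the factor $2/3$ whatever value $\bar{x}$ takes.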

The posterior would thus basically say that one can be quite sure that the mean $\mu$ is around $\mu^{(n)}$, regardless of whether $\mu^{(0)}$ and $\bar{x}$ were near to each other or not, where the latter would be a strong hint for prior-data conflict. The posterior variance does not depend on this; the posterior distribution is thus insensitive to prior-data conflict.
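The insensitivity described here can be checked numerically. The following is a minimal sketch of the update in (A.1) and (A.2); the function name `normal_update` and all numerical values are hypothetical choices made for illustration only.

```python
def normal_update(mu0, var0, xbar, n):
    """Posterior mean and variance for N(mu, 1) data with a N(mu0, var0) prior.

    Implements (A.1) and (A.2): the prior mean gets weight 1/var0,
    the sample mean gets weight n.
    """
    prior_weight = 1.0 / var0
    post_precision = prior_weight + n
    mu_n = (prior_weight * mu0 + n * xbar) / post_precision
    var_n = 1.0 / post_precision
    return mu_n, var_n


# Hypothetical values: a confident prior centered at 0 and n = 5 observations.
mu0, var0, n = 0.0, 0.1, 5

# Sample mean close to the prior mean (no conflict).
post_agree = normal_update(mu0, var0, xbar=0.2, n=n)
# Sample mean far from the prior mean (strong prior-data conflict).
post_conflict = normal_update(mu0, var0, xbar=6.0, n=n)

print("agreement:", post_agree)
print("conflict: ", post_conflict)
```

The two posterior means differ, but the posterior variance is the same in both cases, because `xbar` never enters the computation of `var_n`.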

If one is less confident about one's prior knowledge and thus assigns a relatively large variance to the prior, the posterior mean is less strongly influenced by the prior mean, but the posterior variance still decreases, no matter whether the data support the prior information or not.

The same insensitivity appears also in the widely used Dirichlet-Multinomial model as presented in the following subsection:

A.1.2.2. Samples from a Multinomial distribution

Given a sample of size $n$ from a multinomial distribution, with probabilities $\theta_j$ for categories or classes $j = 1, \ldots, k$, subsumed in the vectorial parameter $\theta$ (with $\sum_{j=1}^{k} \theta_j = 1$), the conjugate prior on $\theta$ is a Dirichlet distribution $\mathrm{Dir}(\alpha)$. Written in terms of the canonical parameters $n^{(0)}$ and $y^{(0)}$ as in Section 1.2.3.5, $\alpha_j = n^{(0)} \cdot y_j^{(0)}$, such that $\sum_{j=1}^{k} y_j^{(0)} = 1$, $(y_1^{(0)}, \ldots, y_k^{(0)})^T =: y^{(0)}$. Recall that the components of $y^{(0)}$ have a direct interpretation as prior class probabilities, whereas $n^{(0)}$ is a parameter indicating the confidence in the values of $y^{(0)}$, similar to the inverse variance as in Section A.1.2.1 (the quantity $n^{(0)}$ will appear also in Section A.1.4).¹²

¹⁰ This is the Normal-Normal model from Section 1.2.3.4, where $\sigma_0^2 = 1$, $y^{(0)} = \mu^{(0)}$, and $n^{(0)} = 1/\sigma^{(0)2}$.

¹¹ The reason for using these seemingly strange weights will become clear later.

¹² If $\theta \sim \mathrm{Dir}(n^{(0)}, y^{(0)})$, then $\mathrm{Var}(\theta_j) = \frac{y_j^{(0)}(1 - y_j^{(0)})}{n^{(0)} + 1}$. If $n^{(0)}$ is high, then the variances of $\theta$ will become

As seen in Section 1.2.3.5, the posterior distribution, obtained after updating via Bayes' Rule with a sample vector $\mathbf{n} = (n_1, \ldots, n_k)$, $\sum_{j=1}^{k} n_j = n$, collecting the observed counts in each category, is a Dirichlet distribution with parameters

\[
y_j^{(n)} = \frac{n^{(0)}}{n^{(0)} + n}\, y_j^{(0)} + \frac{n}{n^{(0)} + n} \cdot \frac{n_j}{n}, \qquad n^{(n)} = n^{(0)} + n \,.
\]

The posterior class probabilities $y^{(n)}$ are calculated as a weighted mean of the prior class probabilities $y^{(0)}$ and $n_j/n$, the proportion in the sample, with weights $n^{(0)}$ and $n$, respectively; the confidence parameter $n^{(0)}$ is incremented by the sample size $n$.

Also here, there is no systematic reaction to prior-data conflict. The posterior variance for each class probability $\theta_j$ is

\[
\mathrm{Var}(\theta_j \mid \mathbf{n}) = \frac{y_j^{(n)} (1 - y_j^{(n)})}{n^{(n)} + 1} = \frac{y_j^{(n)} (1 - y_j^{(n)})}{n^{(0)} + n + 1} \,.
\]
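The update rule and the variance formula can be sketched in a few lines. This is an illustrative sketch only: the function names `dirichlet_update` and `class_variances`, the three-class prior, and the counts are hypothetical and not taken from the text. Exact fractions are used so that the results can be compared without rounding.

```python
from fractions import Fraction


def dirichlet_update(n0, y0, counts):
    """Update the canonical Dirichlet parameters (n0, y0) with observed counts.

    Uses y_j^(n) = (n0 * y_j^(0) + n_j) / (n0 + n), which is the weighted
    mean of prior class probability and sample proportion, and n^(n) = n0 + n.
    """
    n = sum(counts)
    n_post = n0 + n
    y_post = [(n0 * yj + nj) / n_post for yj, nj in zip(y0, counts)]
    return n_post, y_post


def class_variances(n_par, y):
    """Variance of each class probability: y_j (1 - y_j) / (n_par + 1)."""
    return [yj * (1 - yj) / (n_par + 1) for yj in y]


# Hypothetical prior: three classes, confidence parameter n0 = 4.
n0 = 4
y0 = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]

# Hypothetical sample in conflict with the prior: all 8 observations in class 3.
counts = [0, 0, 8]

n_post, y_post = dirichlet_update(n0, y0, counts)
prior_var = class_variances(n0, y0)
post_var = class_variances(n_post, y_post)

for j, (v0, vn) in enumerate(zip(prior_var, post_var), start=1):
    print(f"class {j}: prior variance {v0}, posterior variance {vn}")
```

In this assumed example every class variance decreases after the update, although the sample contradicts the prior class probabilities.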

The posterior variance depends heavily on the numerator $y_j^{(n)} (1 - y_j^{(n)})$, which takes values between $0$ and $1/4$ and does not react specifically to prior-data conflict. The denominator increases from $n^{(0)} + 1$ to $n^{(0)} + n + 1$.

Imagine a situation with strong prior information suggesting a value of $y_j^{(0)} = 0.25$, so one could choose $n^{(0)} = 5$, resulting in a prior class variance of $1/32$. Consider a sample of size $n = 10$ with all observations belonging to class $j$ (thus $n_j = 10$), in clear contrast to the prior information. The posterior class probability is then $y_j^{(n)} = 0.75$, so the numerator of the class variance remains unchanged at $0.25 \cdot 0.75 = 3/16$. Therefore, due to the increasing denominator, the variance decreases to $3/256$, in spite of the clear conflict between prior and sample information. Of course, one can also construct situations where the variance increases, but this happens only in case of an update of $y_j^{(0)}$ towards $1/2$. If $y_j^{(0)} = 1/2$, the variance will decrease for any degree of prior-data conflict.
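The numbers in this example follow directly from the update and variance formulas of this subsection; the following lines only spell out that arithmetic and introduce no further assumptions.

\begin{align*}
\mathrm{Var}(\theta_j) &= \frac{\tfrac{1}{4} \cdot \tfrac{3}{4}}{5 + 1} = \frac{3/16}{6} = \frac{1}{32}, \\
y_j^{(n)} &= \frac{5}{5 + 10} \cdot \frac{1}{4} + \frac{10}{5 + 10} \cdot \frac{10}{10} = \frac{1}{12} + \frac{2}{3} = \frac{3}{4}, \\
\mathrm{Var}(\theta_j \mid \mathbf{n}) &= \frac{\tfrac{3}{4} \cdot \tfrac{1}{4}}{5 + 10 + 1} = \frac{3/16}{16} = \frac{3}{256} \,.
\end{align*}

Since the numerator is $3/16$ both before and after the update, the variance shrinks by exactly the ratio of the denominators, $6/16 = 3/8$.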

A.1.3. The Standard Approach for Bayesian Linear Regression