Net Enrollment Rate - Education in Ecuador


As a simple demonstration that conjugate models might not react reasonably to prior-data conflict, the following two subsections describe inference on the mean of data from a scaled normal distribution and inference on the category probabilities in multinomial sampling.

A.1.2.1. Samples from a scaled Normal distribution

The conjugate distribution to an i.i.d. sample $x$ of size $n$ from a scaled normal distribution with mean $\mu$, denoted by $N(\mu, 1)$, is a normal distribution with mean $\mu^{(0)}$ and variance $\sigma^{(0)2}$.$^{9}$

7 See also, e.g., the list of applications of the IDM given in Section 3.1.3.

8 For more details on the topic of imprecision and prior-data conflict, see Section 3.3.

9 Here, and in the following, parameters of a prior distribution will be denoted by an upper index $(0)$,

A.1 Bayesian Linear Regression: Different Conjugate Models and Their (In)Sensitivity to Prior-Data Conflict

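The posterior is then again a normal distribution with the following updated parameters:$^{10}$

\[
\mu^{(n)} = \frac{\frac{1}{n}}{\frac{1}{n} + \sigma^{(0)2}}\, \mu^{(0)} + \frac{\sigma^{(0)2}}{\frac{1}{n} + \sigma^{(0)2}}\, \bar{x}
= \frac{\frac{1}{\sigma^{(0)2}}}{\frac{1}{\sigma^{(0)2}} + n}\, \mu^{(0)} + \frac{n}{\frac{1}{\sigma^{(0)2}} + n}\, \bar{x} \tag{A.1}
\]

\[
\sigma^{(n)2} = \sigma^{(0)2} \cdot \frac{\frac{1}{n}}{\sigma^{(0)2} + \frac{1}{n}} = \frac{1}{\frac{1}{\sigma^{(0)2}} + n} \tag{A.2}
\]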

The posterior expectation (and mode) is thus a simple weighted average of the prior mean $\mu^{(0)}$ and the estimate from the data $\bar{x}$, with weights $1/\sigma^{(0)2}$ and $n$, respectively.$^{11}$ The variance of the posterior distribution shrinks automatically with every observation, regardless of the observed values.

Now, in a situation where data are scarce but one is very confident about the prior information, one would choose a low value for $\sigma^{(0)2}$, resulting in a high weight for the prior mean $\mu^{(0)}$ in the calculation of $\mu^{(n)}$. The posterior distribution will be centered around a mean between $\mu^{(0)}$ and $\bar{x}$, and it will be even more peaked than the prior, because $\sigma^{(n)2}$ is smaller than $\sigma^{(0)2}$: the factor multiplying $\sigma^{(0)2}$ in (A.2), $1/(1 + n\sigma^{(0)2})$, is below one for every $n \geq 1$.

The posterior would thus essentially say that one can be quite sure that the mean $\mu$ is around $\mu^{(n)}$, regardless of whether $\mu^{(0)}$ and $\bar{x}$ are close to each other or far apart, where the latter would be a strong hint of prior-data conflict. The posterior variance does not depend on this distance; the posterior distribution is thus insensitive to prior-data conflict.

Even if one is less confident about one's prior knowledge and thus assigns a relatively large variance to the prior, so that the posterior mean is less strongly influenced by the prior mean, the posterior variance still decreases, whether or not the data support the prior information.
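The insensitivity described above can be checked numerically. The following is a minimal sketch of the update rules (A.1) and (A.2); the function name `normal_update` and the example numbers are illustrative assumptions, not part of the original text.

```python
def normal_update(mu0, var0, xbar, n):
    """Posterior mean and variance of mu for n observations from N(mu, 1)
    under a N(mu0, var0) prior. Names and signature are hypothetical."""
    prior_precision = 1.0 / var0           # weight of the prior mean in (A.1)
    post_precision = prior_precision + n   # data variance is fixed at 1
    mu_n = (prior_precision * mu0 + n * xbar) / post_precision  # (A.1)
    var_n = 1.0 / post_precision                                # (A.2)
    return mu_n, var_n


# Assumed example values: a confident prior and a small sample.
mu0, var0, n = 0.0, 0.25, 4

agree = normal_update(mu0, var0, xbar=0.1, n=n)      # data close to the prior mean
conflict = normal_update(mu0, var0, xbar=10.0, n=n)  # data far from the prior mean

print("agreement:", agree)
print("conflict: ", conflict)
```

Both calls return the same posterior variance, because `var_n` is computed without reference to `xbar`; only the posterior mean moves.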

The same insensitivity appears also in the widely used Dirichlet-Multinomial model as presented in the following subsection:

A.1.2.2. Samples from a Multinomial distribution

Given a sample of size $n$ from a multinomial distribution, with probabilities $\theta_j$ for categories or classes $j = 1, \dots, k$, subsumed in the vectorial parameter $\theta$ (with $\sum_{j=1}^{k} \theta_j = 1$), the conjugate prior on $\theta$ is a Dirichlet distribution $\mathrm{Dir}(\alpha)$. Written in terms of the canonical parameters $n^{(0)}$ and $y^{(0)}$ as in Section 1.2.3.5, $\alpha_j = n^{(0)} \cdot y_j^{(0)}$, such that $\sum_{j=1}^{k} y_j^{(0)} = 1$, $(y_1^{(0)}, \dots, y_k^{(0)})^T =: y^{(0)}$. Recall that the components of $y^{(0)}$ have a direct interpretation as prior class probabilities, whereas $n^{(0)}$ is a parameter indicating the confidence in the values of $y^{(0)}$, similar to the inverse variance as in Section A.1.2.1 (the quantity $n^{(0)}$ will appear also in Section A.1.4).$^{12}$

10 This is the Normal-Normal model from Section 1.2.3.4, where $\sigma_0^2 = 1$, $y^{(0)} = \mu^{(0)}$, and $n^{(0)} = 1/\sigma^{(0)2}$.

11 The reason for using these seemingly strange weights will become clear later.

12 If $\theta \sim \mathrm{Dir}(n^{(0)}, y^{(0)})$, then $\mathrm{Var}(\theta_j) = \frac{y_j^{(0)}(1 - y_j^{(0)})}{n^{(0)} + 1}$. If $n^{(0)}$ is high, then the variances of $\theta$ will become

As seen in Section 1.2.3.5, the posterior distribution, obtained after updating via Bayes' Rule with a sample vector $\mathbf{n} = (n_1, \dots, n_k)$, $\sum_{j=1}^{k} n_j = n$, collecting the observed counts in each category, is a Dirichlet distribution with parameters

\[
y_j^{(n)} = \frac{n^{(0)}}{n^{(0)} + n}\, y_j^{(0)} + \frac{n}{n^{(0)} + n} \cdot \frac{n_j}{n}, \qquad n^{(n)} = n^{(0)} + n .
\]

The posterior class probabilities $y^{(n)}$ are calculated as a weighted mean of the prior class probabilities $y^{(0)}$ and $n_j/n$, the proportion in the sample, with weights $n^{(0)}$ and $n$, respectively; the confidence parameter $n^{(0)}$ is incremented by the sample size $n$.
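As a rough illustration, the update of the canonical Dirichlet parameters can be sketched in a few lines. The function name `dirichlet_update` and the example prior and counts are assumptions made for this sketch; the weighted mean is written in the algebraically equivalent form $(n^{(0)} y_j^{(0)} + n_j)/(n^{(0)} + n)$.

```python
def dirichlet_update(n0, y0, counts):
    """Update canonical Dirichlet parameters (n0, y0) with observed counts.

    n0     : prior confidence parameter n^(0)
    y0     : list of prior class probabilities y_j^(0), summing to 1
    counts : list of observed counts n_j, one per class
    (Hypothetical helper, not taken from the original text.)
    """
    n = sum(counts)
    n_post = n0 + n  # confidence parameter grows by the sample size
    # Weighted mean of prior probability and sample proportion,
    # with weights n0 and n.
    y_post = [(n0 * yj + nj) / n_post for yj, nj in zip(y0, counts)]
    return n_post, y_post


# Assumed example: three classes, weak prior, six observations.
n_post, y_post = dirichlet_update(n0=2, y0=[0.5, 0.25, 0.25], counts=[1, 2, 3])
print(n_post, y_post)
```

The returned `y_post` still sums to one, since both the prior class probabilities and the sample proportions do.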

Also here, there is no systematic reaction to prior-data conflict. The posterior variance for each class probability $\theta_j$ is

\[
\mathrm{Var}(\theta_j \mid \mathbf{n}) = \frac{y_j^{(n)}\,(1 - y_j^{(n)})}{n^{(n)} + 1} = \frac{y_j^{(n)}\,(1 - y_j^{(n)})}{n^{(0)} + n + 1} .
\]

The posterior variance depends heavily on $y_j^{(n)}(1 - y_j^{(n)})$, which takes values between $0$ and $\tfrac{1}{4}$ and does not change specifically in response to prior-data conflict. The denominator increases from $n^{(0)} + 1$ to $n^{(0)} + n + 1$.

Imagine a situation with strong prior information suggesting a value of $y_j^{(0)} = 0.25$, so one could choose $n^{(0)} = 5$, resulting in a prior class variance of $\tfrac{1}{32}$. Consider a sample of size $n = 10$ with all observations belonging to class $j$ (thus $n_j = 10$), which is in clear contrast to the prior information. The posterior class probability is then $y_j^{(n)} = 0.75$, so the numerator of the class variance remains unchanged, since $0.75 \cdot 0.25 = 0.25 \cdot 0.75$. Therefore, due to the increasing denominator, the variance decreases to $\tfrac{3}{256}$, in spite of the clear conflict between prior and sample information. Of course, one can also construct situations where the variance increases, but this happens only when $y_j^{(0)}$ is updated towards $\tfrac{1}{2}$. If $y_j^{(0)} = \tfrac{1}{2}$, the variance will decrease for any degree of prior-data conflict.
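The numbers in this example can be reproduced with exact rational arithmetic. The sketch below uses Python's `fractions` module; the helper names are hypothetical, and the second scenario (prior probability 1/10, a single observation) is an assumed illustration of an update towards 1/2, not a case taken from the text.

```python
from fractions import Fraction


def class_variance(y, confidence):
    """Variance of theta_j for class probability y and confidence parameter."""
    return y * (1 - y) / (confidence + 1)


def update_class_probability(y0, n0, n_j, n):
    """Posterior class probability as a weighted mean of y0 and n_j / n."""
    return (n0 * y0 + n_j) / (n0 + n)


# Example from the text: y_j^(0) = 0.25, n^(0) = 5, n = n_j = 10.
y0, n0, n_j, n = Fraction(1, 4), 5, 10, 10
prior_var = class_variance(y0, n0)
y_post = update_class_probability(y0, n0, n_j, n)
post_var = class_variance(y_post, n0 + n)
print(prior_var, y_post, post_var)  # 1/32 3/4 3/256

# Assumed extra scenario: the update moves y_j towards 1/2,
# so the variance increases despite the larger denominator.
y0_b, n0_b = Fraction(1, 10), 5
prior_var_b = class_variance(y0_b, n0_b)
y_post_b = update_class_probability(y0_b, n0_b, n_j=1, n=1)
post_var_b = class_variance(y_post_b, n0_b + 1)
print(prior_var_b, y_post_b, post_var_b)
```

In the first scenario the variance falls from 1/32 to 3/256 although every observation contradicts the prior; in the second, the variance rises only because the class probability moves closer to 1/2.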

A.1.3. The Standard Approach for Bayesian Linear Regression

In document La inversión en educación y su incidencia en el crecimiento económico del Ecuador periodo 2000 2014 (pages 76-86)