• No se han encontrado resultados

2. MARCO TEÓRICO

2.1 FUNDAMENTACIÓN TEÓRICA

2.1.3 Inversión Privada

2.1.3.5 Inversión Extranjera Directa

As the special case of the multinomial model (1.2) with only two categories, we will consider the Binomial model

f(x|θ) = n s θs(1−θ)n−s, (1.9)

wherex, the vector ofnobservations, is composed of scalarxi's being either0or1, denoting

`failure' or `success' in an experiment with these two outcomes. s=Pn

i=1xi is the number of successes, and the (unknown) parameter θ ∈ (0,1) is the probability for `success' in a

single trial. (1.9) can be written in the canonical exponential family form (1.4):

f(x|θ)∝exp n log θ 1−θ s−n −log(1−θ) o .

We have thus ψ = log(θ/(1− θ)), b(ψ) = −log(1 −θ), and τ(x) = s. The function log(θ/(1−θ)) is known as the logit, denoted by logit(θ).

From these ingredients, a conjugate prior on ψ can be constructed along (1.5), leading

here to plog θ 1−θ |n(0), y(0)dψ ∝expnn(0)hy(0)log θ 1−θ + log(1−θ)iodψ .

This prior, transformed to the parameter of interest θ,

p(θ|n(0), y(0)) dθ ∝θn(0)y(0)−1(1−θ)n(0)(1−y(0))−1dθ ,

is a Beta distribution with parametersn(0)y(0) and n(0)(1y(0)), in short,

θ∼Beta(n(0)y(0), n(0)(1−y(0))).

10For more details, see, e.g., Robert (2007, Ÿ2), where loss functions typical for statistical settings are

12 1. Introduction ˆ θ θ l(ˆθ−θ) A ˆ θ θ l(ˆθ−θ) B ˆ θ θ l(ˆθ−θ) C

Figure 1.1.: The quadratic (left), absolute (center), and check (right) loss functions. The combination of a Binomial sampling model with this conjugate Beta prior is called Beta-Binomial model. Here, y(0) = E[θ | n(0), y(0)] can be interpreted as prior guess of θ, while n(0) governs the concentration of probability mass around y(0), with large values of

n(0) giving high concentration of probability mass. Due to conjugacy, the posterior onθ is

a Beta(n(n)y(n), n(n)(1y(n))), where the posterior parameters n(n) and y(n) are given by (1.6).

A point estimator for θ can be extracted from the posterior distribution p(θ | x) by

considering Θ as the set of possible acts, and choosing a loss function. The loss function l gives a functional form for the severity of deviations of an estimator to its goal; here, it

values the distance of a point estimatorθˆtoθ.

The quadratic loss function l(ˆθ, θ) = (ˆθ − θ)2 values small deviations relatively low, whereas large deviations are given a high weight (see Figure 1.1 A). As can be shown (see, e.g., Casella and Berger 2002, pp. 352f), the quadratic loss function leads to the posterior expectation as the Bayesian point estimator. Here, E[θ | x, n(0), y(0)] = E[θ | n(n), y(n)] =

y(n), and so the posterior expectation of θ is a weighted average of the prior expectation

E[θ|n(0), y(0)] =y(0) and the sample proportions/n, with weightsn(0) and n, respectively.

Taking the absolute loss function l(ˆθ, θ) = |θˆ−θ| leads to the median of the posterior

distribution as the point estimator (see Figure 1.1 B). Here, med(θ | n(n), y(n)) has no

closed form solution, and must be determined numerically. More generally, taking the check function as the loss function,

l(ˆθ, θ) =

(

2q(ˆθ−θ) if θˆ−θ ≥0 2(q−1)(ˆθ−θ) if θˆ−θ <0 ,

a tilted version of the absolute loss function (see Figure 1.1 C), leads to the quantile

q∈(0,1)of the posterior as point estimate.11

11The check function is usually given without the factor2, as it is not relevant for the optimisation. We

have included it to make the relation to the absolute loss function more clear; for q= 0.5, the check

1.2 Some Fundamentals 13 The indicator loss function

l(ˆθ, θ) =

(

0 |θˆ−θ| ≤ 1 else ,

for→0, leads to the maximum of the posterior, often abbreviated as MAP (maximum a

posteriori) estimator (see, e.g., Bernardo and Smith 2000, Ÿ5.1.5, p. 257, or Robert 2007, Ÿ4.1.2, p. 166). For a Beta(n(n)y(n), n(n)(1y(n))), the mode is

modep(θ |n(n), y(n)) = n

(n)y(n)1

n(n)2 =

n(0)y(0)−1 +s n(0)2 +n ,

and thus is a weighted average of the prior mode n(0)y(0)1

n(0)2 (n

(0) > 2) and the sample proportion s/n, with weights n(0)2 and n, respectively.

Note that asymptotic optimality properties of maximum likelihood estimators (consis- tency, eciency) are usually preserved for these Baysian point estimators (e.g., Robert 2007, Note 1.8.4, pp. 48f).

In the Bayesian approach, interval estimation is rather simple, as the posterior distri- bution delivers a direct measure of probability for arbitrary subsets of the parameter space

Θ. Mostly, so-called highest posterior density (HPD) intervals are considered, where for a

given probability level γ the shortest interval covering this probability mass is calculated.

For unimodal densities, this is equivalent to nding a thresholdα such that the probability

mass for the set of all θ with p(θ | n(n), y(n)) ≥ α equals γ, hence the name.12 For the

Beta posterior, a HPD interval forθ must be determined by numeric optimisation. For ap-

proximately symmetric (around 1

2) Beta posteriors, a good approximation is the symmetric credibility interval, delimited by the 1−γ

2 - and the 1+γ

2 -quantile of the posterior.

The testing of hypotheses concerning the parameter of interest θ can be done by

comparing posterior probabilities of two (disjunct) subsets of the parameter space. Like in frequentist Neyman-Pearson testing, these are often denoted by Θ0 and Θ1, but unlike there, in Bayesian testing the hypotheses H0 : θ ∈ Θ0 and H1 : θ ∈ Θ1 play a symmetric role. Therefore, it is also possible to express evidence in favour of H0, whereas frequentist

tests are constructed such that they can express conclusive evidence only when H0 is rejected.13 However, point hypotheses, where one ofΘ

0 orΘ1 consists of a single element of the parameter space only (usually,Θ0 ={θ0}),14 require a special treatment if the prior on

Θis absolutely continuous, as is the case for the priors (1.5) considered here, because then, P(θ∈Θ0) = 0 for any θ0. Such an inference task can be considered in terms of a problem of model selection, whereθ =θ0 vs.θ6=θ0 decides between two dierent statistical models,

12See, e.g., Bernardo and Smith (2000, Ÿ5.1.5, pp. 259f), or Robert (2007, Def. 5.5.3, p. 260).

13In Neyman-Pearson testing, only the probability for the error of the rst kind, denoted byα, of rejecting

H0although it is true, is set to a low predened level, whereas the probability for the error of the second

kind, denoted by β, of acceptingH0 although it is false, may be very large. 1−β, the probability of

correctly rejectingH0, is also known as the power of a test.

14 1. Introduction to each of which a prior probability is assigned (e.g., Robert 2007, Ÿ5.2.4). An overview on Bayesian testing, including a detailed comparison with classical testing procedures, is given in Robert (2007, Ÿ5.25.4).

Especially in model selection problems, evidence against, or in favor of, a hypothesis is often not expressed in posterior probability for hypotheses, but by means of the so-called Bayes factor, arising when considering odds instead of probability:15

p(H1 |x) p(H0 |x) = f(x|H1) f(x|H0) · p(H1) p(H0) Here, the factor

B10:=

f(x|H1)

f(x|H0)

,

translating prior to posterior odds, is the Bayes factor for comparing H1 toH0.16

Improper priors are problematic in Bayesian testing (Robert 2007, Ÿ5.2.5) and should be avoided for parameters the test decides upon (Kass and Raftery 1995, p. 782).17 We see

this as a strong argument against the use of improper priors. An example where improper priors seem inadequate also for parameter estimation will be given in Section 1.3.4; com- ments on improper priors from the viewpoint of generalised Bayesian inference are given in Section 3.1.2, item V, and in Section 3.1.3.

The posterior predictive distribution, giving the probability for s∗ successes in n∗

future trials after having seen s successes in n trials, is f(s∗ |n(n), y(n)) = n∗ s∗ B s+n(n)y(n), ns+n(n)(1y(n)) B n(n)y(n), n(n)(1y(n)) ,

known as the Beta-Binomial distribution.18

Documento similar