6.2.1 Independent observations and random variables
In a classical estimation problem, we have a parametric family $(P'_\theta)_{\theta\in\Theta}$ of precise probability distributions on a sample space $(\mathcal{X},\mathcal{A}')$. The task is to estimate the true parameter $\theta_0 \in \Theta$. Most often, it is assumed that the estimation can be based on a whole set of data
\[ x_1, \dots, x_n \in \mathcal{X} \]
which are independent identically distributed according to the true distribution $P'_{\theta_0}$. That is, the vector $x = (x_1, \dots, x_n)$ consisting of all observations is distributed according to the product measure $(P'_{\theta_0})^{\otimes n}$.
In a (more realistic) imprecise probability setup, it is natural to replace the precise model $(P'_\theta)_{\theta\in\Theta}$ by an imprecise model $(\overline{P}'_\theta)_{\theta\in\Theta}$ which consists of coherent upper previsions $\overline{P}'_\theta$. Hence, it is assumed that the data
\[ x_1, \dots, x_n \in \mathcal{X} \]
are independent identically distributed according to the true $\overline{P}'_{\theta_0}$ or, in other words, the vector $x = (x_1, \dots, x_n)$ consisting of all observations is distributed according to a coherent upper product prevision $(\overline{P}'_{\theta_0})^{\otimes n}$.
As stated in the introductory Section 6.1, there are several different ways to define such products of coherent upper previsions. In the following, the type-2 product$^{8}$ is used, which corresponds to a strict sensitivity analyst’s point of view. This product prevision is defined to be that coherent upper prevision
\[ (\overline{P}'_\theta)^{\otimes n} : \mathcal{L}_\infty(\mathcal{X}^n, \mathcal{A}'^{\otimes n}) \to \mathbb{R} \]
which has credal set
\[ \mathrm{cl\,co}\,\{ (P'_\theta)^{\otimes n} \mid P'_\theta \in \mathcal{M}'_\theta \} \]
where $\mathcal{M}'_\theta$ denotes the credal set of $\overline{P}'_\theta$.
$^{7}$ See e.g. Parr and Schucany (1980), Millar (1981), Donoho and Liu (1988), Rieder (1994, §6) and Öztürk and Hettmansperger (1998).
Though this definition of the type-2 product is commonly used, it is not elaborated enough for the following investigations. This is because the minimum distance estimator is based on the empirical measure and, therefore, we have to deal with stochastic processes. In this context, a detailed mathematical formulation of the setup is necessary. In classical probability theory and mathematical statistics, this is done by use of random variables and image measures. In the following, it is shown how this formalization can be adapted to imprecise probabilities.
Firstly, let us recall the classical setup: There, a random observation or data point $x'$ in a set $\mathcal{X}$ is mathematically formalized by a map
\[ X' : \Omega \to \mathcal{X}, \quad \omega \mapsto X'(\omega) \]
where $\Omega$ is a fixed set which is rarely specified more closely. There are a fixed σ-algebra $\mathcal{F}$ on $\Omega$ and a fixed σ-algebra $\mathcal{A}'$ on $\mathcal{X}$, and it is assumed that $X'$ is measurable with respect to these σ-algebras. $X'$ is called a random variable.
Next, it is assumed that an unspecified event $\omega$ has randomly happened which, by (deterministic) physical principles, has led to the observation
\[ x' = X'(\omega). \]
The events $\omega \in \Omega$ are distributed according to a (precise) distribution $U$ or a (precise) distribution $U_\theta$ on $(\Omega,\mathcal{F})$ where $\theta$ is an unknown parameter.
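Let $A' \in \mathcal{A}'$ be a measurable subset of $\mathcal{X}$. Then, the probability that the observation $x'$ lies in $A'$ is equal to
\[ U_\theta\big(\{\omega\in\Omega \mid X'(\omega)\in A'\}\big). \]
That is, $x'$ is distributed according to the precise probability measure
\[ P'_\theta : \mathcal{A}' \to [0,1], \quad A' \mapsto U_\theta\big(\{\omega\in\Omega \mid X'(\omega)\in A'\}\big). \qquad (6.2) \]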
This defines a (precise) statistical model $(P'_\theta)_{\theta\in\Theta}$ for the observation $x'$. $P'_\theta$ defined by (6.2) is called the image measure of $U_\theta$ under $X'$ and is denoted by $P'_\theta = X'(U_\theta)$.
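On a finite $\Omega$, the image measure in (6.2) reduces to summing the mass of all $\omega$ that are mapped to the same observation. The following is a minimal sketch under that finiteness assumption; the die example and the names `image_measure`, `u_theta` and `parity` are purely illustrative and not part of the original text.

```python
from collections import defaultdict
from fractions import Fraction

def image_measure(u, x):
    """Push the probability mass function u on Omega forward under the map x.

    u: dict mapping each omega to its probability mass
    x: the random variable, a function from Omega to the sample space
    """
    p = defaultdict(Fraction)  # missing keys start at Fraction(0)
    for omega, mass in u.items():
        p[x(omega)] += mass    # mass of {omega : x(omega) = value}
    return dict(p)

# Hypothetical example: Omega = faces of a fair die, X'(omega) = parity of the face.
u_theta = {omega: Fraction(1, 6) for omega in range(1, 7)}
parity = lambda omega: "even" if omega % 2 == 0 else "odd"

p_theta = image_measure(u_theta, parity)
```

Here `p_theta` assigns mass 1/2 to each of "odd" and "even", which is the probability $U_\theta(\{\omega \mid X'(\omega) \in A'\})$ for the singletons $A'$.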
A whole set of observations/data $x_1, \dots, x_n$ is modeled via several random variables
\[ X_i : \Omega \to \mathcal{X}, \quad i\in\{1,\dots,n\}. \]
Accordingly, it is assumed that the (unspecified) event $\omega \in \Omega$ has led to the observations/data
\[ x_1 = X_1(\omega), \;\dots,\; x_n = X_n(\omega). \]
The random variables $X_1, \dots, X_n$ are called independent identically distributed with respect to $U_\theta$ if their joint image measure is equal to the product of the single image measures and these image measures coincide:
\[ (X_1,\dots,X_n)(U_\theta) = X_1(U_\theta)\otimes\dots\otimes X_n(U_\theta) = (P'_\theta)^{\otimes n}. \]
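For finite spaces, the two conditions in this definition (coinciding image measures, joint image measure equal to their product) can be checked by direct enumeration. The sketch below assumes a finite $\Omega$ and uses hypothetical names (`pushforward`, `is_iid`); it is an illustration only, not a construction from the text.

```python
from fractions import Fraction
from itertools import product

def pushforward(u, f):
    """Image measure of the pmf u under the map f."""
    out = {}
    for omega, mass in u.items():
        out[f(omega)] = out.get(f(omega), Fraction(0)) + mass
    return out

def is_iid(u, variables):
    """Check that the variables are independent identically distributed under u."""
    marginals = [pushforward(u, v) for v in variables]
    if any(m != marginals[0] for m in marginals):
        return False  # the single image measures do not coincide
    joint = pushforward(u, lambda w: tuple(v(w) for v in variables))
    values = list(marginals[0])
    for point in product(values, repeat=len(variables)):
        expected = Fraction(1)
        for coord in point:
            expected *= marginals[0][coord]  # product measure of the rectangle
        if joint.get(point, Fraction(0)) != expected:
            return False
    return True

# Hypothetical example: Omega = {0,1}^2 with a product distribution.
p = {0: Fraction(2, 3), 1: Fraction(1, 3)}
u = {(a, b): p[a] * p[b] for a in p for b in p}

first = lambda w: w[0]
second = lambda w: w[1]

iid_pair = is_iid(u, [first, second])   # coordinate projections
copy_pair = is_iid(u, [first, first])   # same variable twice
```

The coordinate projections pass the check, while using the same variable twice fails it: the marginals coincide, but the joint image measure is concentrated on the diagonal.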
Now, let us turn to imprecise probabilities again: Due to our sensitivity analyst’s point of view, it is assumed in the imprecise probability setup that there is a coherent upper prevision $\overline{U}_\theta$, and the distribution $U_\theta$ of the events $\omega \in \Omega$ is unknown and can be any element of the credal set $\mathcal{U}_\theta$ of $\overline{U}_\theta$.
Analogously to the ordinary image measure, we can define the image of a coherent upper prevision:
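Definition 6.1 The coherent upper prevision $\overline{P}'_\theta$ on $\mathcal{L}_\infty(\mathcal{X},\mathcal{A}')$ which corresponds to the credal set
\[ \mathcal{M}'_\theta = \{ X'(U_\theta) \mid U_\theta \in \mathcal{U}_\theta \} \qquad (6.3) \]
is called the image of $\overline{U}_\theta$ under $X'$ and is denoted by
\[ \overline{P}'_\theta = X'(\overline{U}_\theta). \]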
Lemma 6.2 below shows that this is well defined. That is, the image of a coherent upper prevision is again a coherent upper prevision. This provides a nice generalization of classical probability theory, which is based on the fact that the image of a probability measure is again a probability measure.
In this way, we get an imprecise model $(\overline{P}'_\theta)_{\theta\in\Theta}$. Since $U_\theta$ is any element of the credal set $\mathcal{U}_\theta$, the distribution of the observation $x'$ modeled by the random variable $X'$ is any element of the credal set $\mathcal{M}'_\theta$. The essential difference to the precise setting is that, given $\theta$, the true $U_\theta \in \mathcal{U}_\theta$ and, accordingly, the true $P'_\theta \in \mathcal{M}'_\theta$ are totally unknown.
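For a finite $\Omega$ and a credal set given as the convex hull of finitely many probability mass functions, the image of a coherent upper prevision can be computed directly: the expectation is linear in the distribution, so the supremum over the convex hull is attained at one of the generating points. The sketch below relies on these two assumptions, and all names and numbers in it are hypothetical.

```python
from fractions import Fraction as F

def image_measure(u, x):
    """Image of the pmf u under the map x."""
    p = {}
    for omega, mass in u.items():
        p[x(omega)] = p.get(x(omega), F(0)) + mass
    return p

def expectation(pmf, f):
    return sum(mass * f(point) for point, mass in pmf.items())

def upper_prevision(generators, f):
    """Upper prevision of f for the credal set co(generators)."""
    return max(expectation(pmf, f) for pmf in generators)

# Hypothetical credal set on Omega = {1, 2, 3}, given by two generating pmfs.
credal_omega = [
    {1: F(1, 2), 2: F(1, 4), 3: F(1, 4)},
    {1: F(1, 4), 2: F(1, 4), 3: F(1, 2)},
]
x_prime = lambda omega: 0 if omega <= 2 else 1   # random variable X'
f = lambda x: x                                  # gamble on the sample space

# Image credal set: push every generator forward under X'.
credal_x = [image_measure(u, x_prime) for u in credal_omega]

upper_on_x = upper_prevision(credal_x, f)
upper_on_omega = upper_prevision(credal_omega, lambda w: f(x_prime(w)))
```

Both computations give the same value, reflecting that the upper prevision of $f$ under the image equals the upper prevision of $f \circ X'$ under $\overline{U}_\theta$ in this finite setting.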
Lemma 6.2 $\mathcal{M}'_\theta$ defined by (6.3) is a credal set on $(\mathcal{X},\mathcal{A}')$.
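Proof: The map
\[ \xi : \mathrm{ba}(\Omega,\mathcal{F}) \to \mathrm{ba}(\mathcal{X},\mathcal{A}'), \quad \nu \mapsto \xi(\nu) \]
defined by
\[ \xi(\nu)(A') = \nu\big(X'^{-1}(A')\big) \]
is linear and continuous with respect to the $\mathcal{L}_\infty(\Omega,\mathcal{F})$-topology on $\mathrm{ba}(\Omega,\mathcal{F})$ and the $\mathcal{L}_\infty(\mathcal{X},\mathcal{A}')$-topology on $\mathrm{ba}(\mathcal{X},\mathcal{A}')$. Together with
\[ \xi\big(\mathrm{ba}^+_1(\Omega,\mathcal{F})\big) \subset \mathrm{ba}^+_1(\mathcal{X},\mathcal{A}') \]
this implies that $\xi(\mathcal{U}_\theta)$ is a convex and $\mathcal{L}_\infty(\mathcal{X},\mathcal{A}')$-compact subset of $\mathrm{ba}^+_1(\mathcal{X},\mathcal{A}')$. According to Corollary 2.16, $\xi(\mathcal{U}_\theta)$ is a credal set and the definitions imply
\[ \xi(\mathcal{U}_\theta) = \mathcal{M}'_\theta. \]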
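The two algebraic facts used in the proof can be verified in one line each; continuity of $\xi$ is taken as given here, as in the proof itself. For $\nu_1, \nu_2 \in \mathrm{ba}(\Omega,\mathcal{F})$, $a, b \in \mathbb{R}$ and $A' \in \mathcal{A}'$,
\[ \xi(a\nu_1 + b\nu_2)(A') = (a\nu_1 + b\nu_2)\big(X'^{-1}(A')\big) = a\,\nu_1\big(X'^{-1}(A')\big) + b\,\nu_2\big(X'^{-1}(A')\big) = a\,\xi(\nu_1)(A') + b\,\xi(\nu_2)(A'), \]
which is linearity. For $\nu \in \mathrm{ba}^+_1(\Omega,\mathcal{F})$,
\[ \xi(\nu)(A') = \nu\big(X'^{-1}(A')\big) \ge 0, \qquad \xi(\nu)(\mathcal{X}) = \nu\big(X'^{-1}(\mathcal{X})\big) = \nu(\Omega) = 1, \]
and finite additivity of $\xi(\nu)$ follows because preimages of disjoint sets are disjoint and $X'^{-1}(A' \cup B') = X'^{-1}(A') \cup X'^{-1}(B')$. Measurability of $X'$ guarantees that $X'^{-1}(A') \in \mathcal{F}$, so all expressions are defined.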
Just as in the precise case, it is assumed that the random variables
\[ X_i : \Omega \to \mathcal{X}, \quad i\in\{1,\dots,n\} \]
are independent identically distributed. That is, the joint distribution of the observations is equal to
\[ (X_1,\dots,X_n)(U_\theta) = X_1(U_\theta)\otimes\dots\otimes X_n(U_\theta) = (P'_\theta)^{\otimes n}. \]
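Since $U_\theta$ may be any element of $\mathcal{U}_\theta$, the distribution of the vector $x = (x_1, \dots, x_n)$ containing all observations may be any element of
\[ \mathcal{N}'_\theta := \{ (P'_\theta)^{\otimes n} \mid P'_\theta \in \mathcal{M}'_\theta \}. \]
This set of product probabilities defines a coherent upper prevision
\[ (\overline{P}'_\theta)^{\otimes n} : \mathcal{L}_\infty(\mathcal{X}^n, \mathcal{A}'^{\otimes n}) \to \mathbb{R}, \quad g' \mapsto \sup_{(P'_\theta)^{\otimes n} \in \mathcal{N}'_\theta} (P'_\theta)^{\otimes n}[g']. \]
According to Proposition 2.15, the credal set of this coherent upper prevision is equal to
\[ \mathrm{cl\,co}\,\{ (P'_\theta)^{\otimes n} \mid P'_\theta \in \mathcal{M}'_\theta \} \]
so that, in fact, we end up with the usual type-2 product of coherent upper previsions again.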
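To illustrate the supremum over $\mathcal{N}'_\theta$, consider a toy case that is not taken from the text: a binary sample space whose credal set consists of all Bernoulli distributions with success probability in an interval. Because $P'_\theta \mapsto (P'_\theta)^{\otimes n}[g']$ is a polynomial rather than a linear function of $P'_\theta$, the supremum need not be attained at an extreme point of the credal set, so the sketch below simply approximates it on a grid. The function names and the grid size are assumptions made for this illustration.

```python
from itertools import product

def product_expectation(pmf, g, n):
    """Expectation of g under the n-fold product of the pmf."""
    total = 0.0
    for point in product(pmf, repeat=n):
        weight = 1.0
        for coord in point:
            weight *= pmf[coord]
        total += weight * g(point)
    return total

def type2_upper_prevision(p_low, p_high, g, n, grid=1001):
    """Grid approximation of sup over Bernoulli(p), p in [p_low, p_high]."""
    best = float("-inf")
    for k in range(grid):
        p1 = p_low + (p_high - p_low) * k / (grid - 1)
        pmf = {0: 1.0 - p1, 1: p1}
        best = max(best, product_expectation(pmf, g, n))
    return best

# Gamble on two observations: 1 if they differ, 0 otherwise.
differ = lambda x: 1.0 if x[0] != x[1] else 0.0

upper = type2_upper_prevision(0.2, 0.8, differ, n=2)
at_endpoints = max(
    product_expectation({0: 0.8, 1: 0.2}, differ, 2),
    product_expectation({0: 0.2, 1: 0.8}, differ, 2),
)
```

In this example the expectation equals $2p(1-p)$, which is maximal at $p = 0.5$ in the interior of the interval; evaluating only the two extreme points would give $0.32$ instead of $0.5$.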
Note that the credal sets $\mathcal{M}'_\theta$ may also contain probability charges which are not σ-additive. Products of probability charges such as $(P'_\theta)^{\otimes n}$ are defined according to König (1997, Proposition 20.4). However, these products are not defined on the product σ-algebra $\mathcal{A}'^{\otimes n}$ but on the (usually) smaller product algebra denoted by $\mathcal{A}'^{\hat{\otimes} n}$. This is the smallest algebra on $\mathcal{X}^n$ which contains all rectangles
\[ A'_1 \times \dots \times A'_n \subset \mathcal{X}^n \quad \text{where } A'_1, \dots, A'_n \in \mathcal{A}'. \]
That is, $(\overline{P}'_\theta)^{\otimes n}$ is defined on the product algebra $\mathcal{A}'^{\hat{\otimes} n}$ at first. Next, $(\overline{P}'_\theta)^{\otimes n}$ can be extended to a coherent upper prevision on the usual product σ-algebra $\mathcal{A}'^{\otimes n}$ by natural extension.
6.2.2 Discretizations in estimation problems
As argued in Subsection 5.4.1, discretizing the parameter space Θ may be considered as part of modeling in estimation problems: coarsening Θ also changes the purpose of the estimation problem, and this change of purpose is desirable from the point of view of the theory of imprecise probabilities.
Modelers will nevertheless often produce an infinite parameter space Θ . Therefore, an ad hoc method for discretizing Θ is developed in the following:
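Let $\Theta$ be any index set and $(\overline{P}'_\theta)_{\theta\in\Theta}$ be an imprecise model on a sample space $(\mathcal{X},\mathcal{A}')$. For every $\theta \in \Theta$, let $\mathcal{M}'_\theta$ be the credal set of $\overline{P}'_\theta$ on $(\mathcal{X},\mathcal{A}')$.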
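In order to discretize $\Theta$, let $\mathcal{H}$ be a finite partition of $\Theta$. Now, the parameter set in our estimation problem is $\mathcal{H}$ and we want to estimate the true $H \in \mathcal{H}$. This is the set $H \in \mathcal{H}$ in which the true parameter $\theta$ lies. That is, we do not want to discriminate between different elements $\theta_1$ and $\theta_2$ of one $H$ any more. In this sense, the estimation problem gets coarser. The (upper) risk function depending on $H \in \mathcal{H}$ is canonically defined by
\[ \mathcal{H} \to \mathbb{R}, \quad H \mapsto \sup_{\theta\in H}\, \sup_{P'_\theta\in\mathcal{M}'_\theta} \int_{\mathcal{X}} \int_{\Theta} W_\theta(\hat\theta)\, \tau_x(d\hat\theta)\, P'_\theta(dx) \qquad (6.4) \]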
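where $(W_\theta)_{\theta\in\Theta} \subset \mathcal{L}_\infty(\Theta, 2^\Theta)$ is a loss function and $\tau$ is a (randomized) decision function, i.e. an estimator. Since we do not want to discriminate between different elements $\theta_1$ and $\theta_2$ of one $H$, it is natural to choose a loss function which depends only on $H$ and not on the specific $\theta$; that is, we have a loss function
\[ (W_H)_{H\in\mathcal{H}} \subset \mathcal{L}_\infty(\mathcal{H}, 2^{\mathcal{H}}). \]
Furthermore, the decision space changes from $\Theta$ to $\mathcal{H}$ and the risk function becomes
\[ \mathcal{H} \to \mathbb{R}, \quad H \mapsto \sup_{\theta\in H}\, \sup_{P'_\theta\in\mathcal{M}'_\theta} \int_{\mathcal{X}} \int_{\mathcal{H}} W_H(\hat H)\, \tau_x(d\hat H)\, P'_\theta(dx) \qquad (6.5) \]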
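for an estimator $\tau$. Next, put
\[ \mathcal{M}'_H := \mathrm{cl\,co} \bigcup_{\theta\in H} \mathcal{M}'_\theta, \quad \forall H \in \mathcal{H} \]
where $\mathrm{cl\,co}$ denotes the convex $\mathcal{L}_\infty(\mathcal{X},\mathcal{A}')$-closure. That is, $\mathcal{M}'_H$ is the credal set of the coherent upper prevision $\overline{P}'_H$ defined by
\[ \overline{P}'_H : \mathcal{L}_\infty(\mathcal{X},\mathcal{A}') \to \mathbb{R}, \quad f \mapsto \sup_{\theta\in H} \overline{P}'_\theta[f]. \]
According to Lemma 8.29, the risk function defined by (6.5) is equal to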
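\[ \mathcal{H} \to \mathbb{R}, \quad H \mapsto \sup_{P'_H\in\mathcal{M}'_H} \int_{\mathcal{X}} \int_{\mathcal{H}} W_H(\hat H)\, \tau_x(d\hat H)\, P'_H(dx) \qquad (6.6) \]
and this function exactly coincides with the usual risk function defined in Section 3.2 if $(\overline{P}'_H)_{H\in\mathcal{H}}$ is our imprecise model. That is, discretizing $\Theta$ naturally leads to the imprecise model $(\overline{P}'_H)_{H\in\mathcal{H}}$, where $\mathcal{H}$ is a finite index set.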
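The coarsened risk function can be made concrete in a small finite setting. The sketch below assumes a finite sample space, credal sets given by finitely many extreme points, a non-randomized estimator and a 0-1 loss; none of these choices, nor the numbers, come from the text. Since the risk is linear in the distribution, taking the maximum over the extreme points of all $\mathcal{M}'_\theta$ with $\theta \in H$ gives the supremum in (6.5).

```python
from fractions import Fraction as F

# Hypothetical finite model: sample space {0, 1, 2}, parameters "a", "b", "c".
# Each credal set is represented by its finitely many extreme points.
credal = {
    "a": [{0: F(6, 10), 1: F(3, 10), 2: F(1, 10)},
          {0: F(5, 10), 1: F(3, 10), 2: F(2, 10)}],
    "b": [{0: F(4, 10), 1: F(4, 10), 2: F(2, 10)}],
    "c": [{0: F(1, 10), 1: F(2, 10), 2: F(7, 10)},
          {0: F(2, 10), 1: F(2, 10), 2: F(6, 10)}],
}

# Finite partition of the parameter space into two cells.
partition = {"H1": ["a", "b"], "H2": ["c"]}

def zero_one_loss(true_cell, estimated_cell):
    return 0 if true_cell == estimated_cell else 1

def estimator(x):
    """Non-randomized estimator mapping an observation to a cell label."""
    return "H1" if x <= 1 else "H2"

def upper_risk(label):
    """Upper risk of the estimator when the true parameter lies in the cell."""
    return max(
        sum(mass * zero_one_loss(label, estimator(x)) for x, mass in pmf.items())
        for theta in partition[label]
        for pmf in credal[theta]
    )

risks = {label: upper_risk(label) for label in partition}
```

The same maximum is obtained when the union of the extreme points is treated as a generating set of $\mathcal{M}'_H$, which mirrors the equality of (6.5) and (6.6) in this finite case.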
Of course, a thoughtless application of this discretization may lead to very bad results. Discretizing $\Theta$ means that we do not want to discriminate between different elements $\theta_1$ and $\theta_2$ of one $H$; therefore, it is crucial to choose a sensible partition of $\Theta$ in order to get sensible results, all the more since choosing a partition of $\Theta$ means choosing the statistical purpose.
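So far, this method can be justified well. However, problems arise in applications since it is a necessary assumption for the applications presented in the present book that credal sets are given by a finite number of restrictions; cf. e.g. (5.30). Even if there is a finite set $\mathcal{K} \subset \mathcal{L}_\infty(\mathcal{X},\mathcal{A}')$ such that
\[ \mathcal{M}'_\theta = \{ P'_\theta \in \mathrm{ba}^+_1(\mathcal{X},\mathcal{A}') \mid P'_\theta[f] \le \overline{P}'_\theta[f] \;\; \forall f \in \mathcal{K} \} \]
it does not seem to be clear if assumption (5.30) is fulfilled for $\mathcal{M}'_H$, which would be necessary to successfully work with $\mathcal{M}'_H$ in our applications. An ad hoc solution of this problem is to use the credal set
\[ \hat{\mathcal{M}}'_H = \{ P'_H \in \mathrm{ba}^+_1(\mathcal{X},\mathcal{A}') \mid P'_H[f] \le \overline{P}'_H[f] \;\; \forall f \in \mathcal{K} \} \]
as an “approximation” of $\mathcal{M}'_\theta$. It is easy to see that
\[ \mathcal{M}'_\theta \subset \hat{\mathcal{M}}'_H. \]
After that, $(\mathcal{X},\mathcal{A}')$ may be discretized according to Subsection 5.4.2 where the index set is given by $\mathcal{H}$.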
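The effect of this ad hoc enlargement can be seen in a toy example. The sketch below assumes a three-point sample space, two gambles in $\mathcal{K}$ (indicators of single points) and two parameters in one cell $H$; all names and numbers are hypothetical. The bounds for $H$ are the maxima of the bounds over $\theta \in H$, following the definition of $\overline{P}'_H$.

```python
from fractions import Fraction as F

def expectation(pmf, f):
    return sum(mass * f[x] for x, mass in pmf.items())

def satisfies(pmf, gambles, bounds):
    """Check the finitely many restrictions P[f] <= bound for all gambles f."""
    return all(expectation(pmf, f) <= b for f, b in zip(gambles, bounds))

# Finite set K of gambles: indicators of the points 0 and 1.
gambles = [{0: 1, 1: 0, 2: 0}, {0: 0, 1: 1, 2: 0}]

# Hypothetical upper previsions of these gambles for two parameters in one cell.
bounds = {"a": [F(1, 2), F(1, 10)], "b": [F(1, 10), F(1, 2)]}
cell = ["a", "b"]

# Bounds defining the enlarged credal set: sup over theta in the cell.
bounds_H = [max(bounds[t][k] for t in cell) for k in range(len(gambles))]

p_a = {0: F(1, 2), 1: F(1, 10), 2: F(2, 5)}   # element of the credal set for "a"
p_b = {0: F(1, 10), 1: F(1, 2), 2: F(2, 5)}   # element of the credal set for "b"
candidate = {0: F(1, 2), 1: F(1, 2), 2: F(0)}

a_in_hat = satisfies(p_a, gambles, bounds_H)
b_in_hat = satisfies(p_b, gambles, bounds_H)
candidate_in_hat = satisfies(candidate, gambles, bounds_H)
candidate_in_a = satisfies(candidate, gambles, bounds["a"])
candidate_in_b = satisfies(candidate, gambles, bounds["b"])
```

Elements of the individual credal sets satisfy the enlarged restrictions, as the inclusion states. The distribution `candidate` also satisfies them although it lies in neither individual credal set; in this example it is not even in their convex hull, because every element of either set puts mass at most $0.6$ on $\{0, 1\}$. This suggests that the enlargement can be strict.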