1.2 REFERENCE FRAMEWORK
1.2.6. CHARACTERISTICS OF EDUCATIONAL NEEDS
As stated above, under the nested error model (3.1)-(3.2), $y_{ir} \mid \bar{y}_{is}$ follows exactly the same distribution as $y_{ir} \mid y_{is}$, and the best predictor of $H_{ij} = h(Y_{ij})$, $j \in r_i$, can be expressed as $\tilde{H}_{ij}^{B} = E[h(Y_{ij}) \mid \bar{y}_{is}]$. When the sample selection mechanism is informative, the estimation procedure should incorporate the sampling weights to avoid a bias due to a non-representative sample. Let $w_{ij}$ be the sampling weight of the $j$-th unit within the $i$-th domain and $w_{i\cdot} = \sum_{j \in s_i} w_{ij}$. We use the same conditioning idea as in the EB estimator, but now we condition on the weighted sample mean $\bar{y}_{iw} = w_{i\cdot}^{-1} \sum_{j \in s_i} w_{ij} y_{ij}$ instead of the unweighted sample mean $\bar{y}_{is}$. Thus, we define the pseudo best (PB) estimator of $H_{ij} = h(Y_{ij})$ as
\[
\tilde{H}_{ij}^{PB}(\theta) = E[h(Y_{ij}) \mid \bar{y}_{iw}; \theta]. \tag{3.13}
\]
The PB estimator of the additive area parameter $H_i$ is then
\[
\tilde{H}_{i}^{PB}(\theta) = \frac{1}{N_i} \left\{ \sum_{j \in s_i} h(Y_{ij}) + \sum_{j \in r_i} \tilde{H}_{ij}^{PB}(\theta) \right\}. \tag{3.14}
\]
Jiang and Lahiri (2006) used a similar approach in the special case of area means under the nested error model, and also in the case of a binary response variable with a logit linking model. Their method is applicable only when the unit level model contains area level covariates, for example, when the area mean vector $\bar{X}_i = N_i^{-1} \sum_{j=1}^{N_i} x_{ij}$ is used as the set of area level covariates in the unit level model.
The PB estimator (3.13) depends on the model parameters $\theta = (\beta', \sigma_v^2, \sigma_e^2)'$, which need to be estimated. We define the pseudo EB (PEB) predictor as the PB predictor with $\theta$ replaced by a consistent estimator. The approach of Pfeffermann and Sverchkov (2007), based on the sample likelihood, can be used to obtain correct maximum likelihood (ML) estimates of the regression parameter $\beta$ and of the variances $\sigma_v^2$ and $\sigma_e^2$. Alternatively, $\beta$ can be estimated by the weighted method of moments used in You and Rao (2002), combined with ML (or REML) estimators of $\sigma_v^2$ and $\sigma_e^2$.
For an out-of-sample variable $Y_{ij}$, $j \in r_i$, under the nested error population model (3.1)-(3.2), we have
\[
Y_{ij} \mid \bar{y}_{iw} \overset{ind.}{\sim} N(\mu_{ij|s}^{w}, \sigma_{ij|s}^{2w}), \tag{3.15}
\]
\[
\mu_{ij|s}^{w} = x_{ij}'\beta + \gamma_{iw}(\bar{y}_{iw} - \bar{x}_{iw}'\beta), \qquad \sigma_{ij|s}^{2w} = \sigma_v^2 (1 - \gamma_{iw}) + \sigma_e^2, \tag{3.16}
\]
where $\bar{x}_{iw} = w_{i\cdot}^{-1} \sum_{j \in s_i} w_{ij} x_{ij}$ and $\gamma_{iw} = \sigma_v^2 / (\sigma_v^2 + \sigma_e^2 \delta_i^2)$, for $\delta_i^2 = w_{i\cdot}^{-2} \sum_{j \in s_i} w_{ij}^2$. Observe that the mean $\mu_{ij|s}^{w}$ is obtained from $\mu_{ij|s}$ given in (3.7) by replacing the unweighted best predictor $\tilde{v}_{is} = \gamma_{is}(\bar{y}_{is} - \bar{x}_{is}'\beta)$ of the domain effect $v_i$ by its weighted version, $\tilde{v}_{iw} = \gamma_{iw}(\bar{y}_{iw} - \bar{x}_{iw}'\beta)$. Although the conditional distribution (3.15)-(3.16) is derived assuming that the sample units satisfy the same population model (3.1)-(3.2) (i.e. non-informative sampling), we will see that conditioning on the weighted sample mean $\bar{y}_{iw}$ protects against informative sampling.
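The quantities in (3.15)-(3.16) can be computed directly from the sample data of one domain. The following is a minimal sketch, assuming the model parameters are already available (known or estimated); the function name `weighted_conditional_moments`, its argument names, and the toy inputs are hypothetical and only illustrate the formulas.

```python
def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(ai * bi for ai, bi in zip(a, b))


def weighted_conditional_moments(y_s, x_s, w_s, x_r, beta, sigma2_v, sigma2_e):
    """Mean and variance of Y_ij | ybar_iw for out-of-sample units of one domain.

    y_s, x_s, w_s: responses, covariate rows and sampling weights of sampled units.
    x_r: covariate rows of the out-of-sample units.
    Returns (list of means mu^w_{ij|s}, common variance sigma^{2w}_{ij|s}, gamma_iw).
    """
    w_dot = sum(w_s)                                        # w_i.
    ybar_w = sum(w * y for w, y in zip(w_s, y_s)) / w_dot   # weighted sample mean
    xbar_w = [sum(w * x[k] for w, x in zip(w_s, x_s)) / w_dot
              for k in range(len(beta))]                    # weighted covariate mean
    delta2 = sum(w * w for w in w_s) / w_dot ** 2           # delta_i^2
    gamma_w = sigma2_v / (sigma2_v + sigma2_e * delta2)     # gamma_iw
    v_tilde = gamma_w * (ybar_w - dot(xbar_w, beta))        # weighted domain effect
    mu = [dot(x, beta) + v_tilde for x in x_r]              # conditional means
    var = sigma2_v * (1.0 - gamma_w) + sigma2_e             # conditional variance
    return mu, var, gamma_w


# Toy inputs (illustrative only): intercept-only model, equal weights.
mu, var, gamma_w = weighted_conditional_moments(
    y_s=[1.0, 2.0, 3.0, 4.0],
    x_s=[[1.0], [1.0], [1.0], [1.0]],
    w_s=[1.0, 1.0, 1.0, 1.0],
    x_r=[[1.0], [1.0]],
    beta=[2.0], sigma2_v=1.0, sigma2_e=4.0,
)
```

With equal weights, $\delta_i^2 = 1/n_i$, so $\gamma_{iw}$ reduces to the unweighted $\gamma_{is}$ and the sketch returns the usual unweighted conditional moments.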
For the FGT poverty indicators of order $\alpha = 0, 1$, the PB estimators are given by (3.10) and (3.11) with $\mu_{ij|s}$ and $\sigma_{ij|s}^{2}$ replaced by the weighted versions $\mu_{ij|s}^{w}$ and $\sigma_{ij|s}^{2w}$. For more complex additive parameters, such as the FGT indicators for $\alpha > 1$, we can apply a Monte Carlo procedure to approximate the PEB predictor of $H_{ij} = h(Y_{ij})$, as done for the EB predictor. We generate $L$ replicates $\{Y_{ij}^{(\ell)}; \ell = 1, \ldots, L\}$ of $Y_{ij}$, $j \in r_i$, from the estimated conditional distribution of $Y_{ij} \mid \bar{y}_{iw}$ given in (3.15)-(3.16), calculate $h(Y_{ij}^{(\ell)})$ for each $\ell$, and then average over the $L$ replicates as $\hat{H}_{ij}^{PEB} = L^{-1} \sum_{\ell=1}^{L} h(Y_{ij}^{(\ell)})$.
As with the Census EB estimator given in (3.12), we define the Census PEB estimator as
\[
\hat{H}_{i}^{CPEB} = \frac{1}{N_i} \sum_{j=1}^{N_i} \hat{H}_{ij}^{PEB}. \tag{3.17}
\]
Note that the Census PEB estimator (3.17) is obtained by also predicting the sample values $H_{ij} = h(Y_{ij})$ as if they were out of sample. Under general sampling designs, we show in Appendix B that, for known $\theta$, the Census PEB estimator $\hat{F}_{\alpha i}^{CPEB}$ of the poverty indicator $F_{\alpha i}$, for $\alpha = 0, 1$, is consistent as $n_i \to \infty$ and $N_i \to \infty$, under the joint distribution of the sampling design and the considered model for $Y_{ij} \mid v_i$ given in (3.1), without any assumption on the distribution of $v_i$.
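The Monte Carlo approximation of the unit-level PEB predictor and the Census PEB average (3.17) can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: it assumes that $Y_{ij}$ is the log of a welfare variable $E_{ij}$, that the FGT contribution of a unit is $\{(z - E_{ij})/z\}^{\alpha} I(E_{ij} < z)$ for a poverty line $z$, and that the conditional means and variance from (3.15)-(3.16) have already been estimated. The function names and the toy inputs are hypothetical.

```python
import math
import random


def fgt(welfare, z, alpha):
    """FGT contribution of one unit: ((z - E)/z)^alpha if E < z, else 0."""
    if welfare >= z:
        return 0.0
    return ((z - welfare) / z) ** alpha


def mc_peb_unit(mu, var, z, alpha, L, rng):
    """Average h(Y^(l)) over L draws Y^(l) ~ N(mu, var).

    Assumption: Y is log welfare, so welfare is recovered as exp(Y).
    """
    sd = math.sqrt(var)
    total = 0.0
    for _ in range(L):
        y = rng.gauss(mu, sd)               # one replicate of Y_ij | ybar_iw
        total += fgt(math.exp(y), z, alpha)
    return total / L


def census_peb(mus, var, z, alpha, L=1000, seed=0):
    """Census PEB as in (3.17): mean of unit-level predictors over all N_i units.

    mus holds the estimated conditional means of every unit in the domain,
    sampled or not, since sampled units are predicted as if out of sample.
    """
    rng = random.Random(seed)
    preds = [mc_peb_unit(mu, var, z, alpha, L, rng) for mu in mus]
    return sum(preds) / len(preds)


# Toy inputs (illustrative only): three units, poverty line z = 1.5, alpha = 2.
f2_hat = census_peb([0.0, 0.5, 1.0], var=1.0, z=1.5, alpha=2, L=500)
```

Fixing the seed makes the approximation reproducible; the Monte Carlo error of each unit-level predictor decreases at rate $L^{-1/2}$.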
For the special case of a domain mean $H_i = \bar{Y}_i$, if $\beta$ is estimated by the weighted regression estimator $\hat{\beta}_w$ given in You and Rao (2002), the Census PEB estimator of $H_i = \bar{Y}_i$ equals the pseudo EBLUP of You and Rao (2002). Similarly, the PEB estimator obtained from (3.14) tends to the pseudo EBLUP as the domain sampling fraction $f_i = n_i / N_i$ becomes small. Thus, for a domain mean $\bar{Y}_i$, the Census PEB estimator (and the PEB estimator for a small domain sampling fraction) preserves the good properties of the pseudo EBLUP: a) design consistency as $n_i$ becomes large, and b) automatic benchmarking to the survey regression estimator of the overall population total, provided the sampling weights are calibrated to agree with the known population total, $w_{i\cdot} = N_i$. Stefan et al. (2005) and Verret et al. (2015) showed that the pseudo EBLUP of the area mean $\bar{Y}_i$ performs well under informative sampling in terms of bias and mean squared error (MSE) under the model.
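The reduction to the pseudo EBLUP can be checked directly. For $h(Y_{ij}) = Y_{ij}$, the unit-level predictor is the estimated conditional mean in (3.16), so averaging over the $N_i$ units of the domain as in (3.17) gives
\[
\hat{H}_{i}^{CPEB} = \frac{1}{N_i} \sum_{j=1}^{N_i} \left\{ x_{ij}'\hat{\beta}_w + \hat{\gamma}_{iw}(\bar{y}_{iw} - \bar{x}_{iw}'\hat{\beta}_w) \right\} = \bar{X}_i'\hat{\beta}_w + \hat{\gamma}_{iw}(\bar{y}_{iw} - \bar{x}_{iw}'\hat{\beta}_w),
\]
where $\bar{X}_i = N_i^{-1} \sum_{j=1}^{N_i} x_{ij}$ and $\hat{\gamma}_{iw}$ denotes $\gamma_{iw}$ evaluated at the estimated variances. Applying (3.14) instead, and writing $\bar{X}_{ir}$ for the covariate mean over the out-of-sample units $r_i$ (a symbol introduced here only for this illustration), gives
\[
\hat{H}_{i}^{PEB} = f_i \, \bar{y}_{is} + (1 - f_i) \left\{ \bar{X}_{ir}'\hat{\beta}_w + \hat{\gamma}_{iw}(\bar{y}_{iw} - \bar{x}_{iw}'\hat{\beta}_w) \right\},
\]
which approaches the first expression as $f_i \to 0$, since then $\bar{X}_{ir}$ approaches $\bar{X}_i$.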