coefficient of variation (CV) and coefficient of consistency for small area estimators. The coefficient of variation was obtained using a bootstrap technique analogous to the method proposed by P. J. McCarthy and C. B. Snowden (1985). The coefficient of consistency for the synthetic versus the direct estimator was obtained using the following formula:
$$z_{ws} = \frac{x_{ws} - t_{ws}}{t_{ws}} \qquad (4.1)$$
where $x_{ws}$ is obtained according to (3.1) and $t_{ws}$ is the direct estimator.
Similarly, for the composite estimator:
$$u_{ws} = \frac{y_{ws} - t_{ws}}{t_{ws}} = 0.5\, z_{ws} \qquad (4.2)$$
where $y_{ws}$ is obtained analogously to (3.3) and $t_{ws}$ is the direct estimator. Because of the dependence $u_{ws} = 0.5\, z_{ws}$, only the coefficients of consistency between the direct and synthetic estimators, and between the composite and synthetic estimators, were investigated.
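As a minimal sketch of (4.1) and (4.2), the snippet below computes both coefficients for a single area, reading each formula as the deviation from the direct estimate divided by the direct estimate. The function name and the numeric inputs are hypothetical. The composite estimate is assumed here to be the equal-weight average of the synthetic and direct estimates, which is what the relation u = 0.5 z implies; formula (3.3) itself is not reproduced.

```python
def consistency_coefficient(estimate, direct):
    """Relative deviation of an estimate from the direct estimate, as in (4.1) and (4.2)."""
    if direct == 0:
        raise ValueError("direct estimate must be non-zero")
    return (estimate - direct) / direct


# Hypothetical values for one small area (not taken from the survey).
t_ws = 8.0                       # direct estimate
x_ws = 10.0                      # synthetic estimate
y_ws = 0.5 * x_ws + 0.5 * t_ws   # assumed equal-weight composite estimate

z_ws = consistency_coefficient(x_ws, t_ws)  # synthetic versus direct
u_ws = consistency_coefficient(y_ws, t_ws)  # composite versus direct

print(z_ws, u_ws)
```

Under the equal-weight assumption the second coefficient is exactly half of the first, so reporting both would add no information.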
In the earlier paper, the performance of different small area estimators was compared. The distributions of the CVs (presented both in graphs and in tables of deciles) show that the synthetic estimator has the best precision and the composite estimator has intermediate precision. The direct estimator, as expected, performs worst. Moreover, the estimates are more efficient when the small area considered is larger (regions rather than counties), which is easily explained, since the sample size for regions is much larger than for counties. However, because of the bias of the synthetic estimates, the accuracy of the composite estimator may well be better than that of the synthetic estimator. The distribution of the CVs for regions and subregions is distinctly right-skewed in practically every situation considered.
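The link between sample size and CV can be illustrated with a simplified sketch. It uses a plain with-replacement bootstrap of a sample proportion under simple random sampling; this is not the McCarthy and Snowden (1985) procedure, which adapts the bootstrap to finite-population designs. The function name, the 0/1 data and the number of replicates are all hypothetical.

```python
import random
import statistics


def bootstrap_cv(sample, n_replicates=1000, seed=0):
    """Bootstrap CV (in %) of the sample mean: std. dev. of replicate means / point estimate."""
    rng = random.Random(seed)
    n = len(sample)
    # Each replicate resamples n units with replacement and recomputes the mean.
    replicate_means = [
        statistics.fmean(rng.choices(sample, k=n)) for _ in range(n_replicates)
    ]
    point_estimate = statistics.fmean(sample)
    return 100.0 * statistics.stdev(replicate_means) / point_estimate


# Hypothetical 0/1 unemployment indicators: same rate (10%), different sample sizes.
small_area = [1] * 5 + [0] * 45      # 50 sampled persons
large_area = [1] * 50 + [0] * 450    # 500 sampled persons

cv_small = bootstrap_cv(small_area)
cv_large = bootstrap_cv(large_area)
print(round(cv_small, 1), round(cv_large, 1))
```

With the same underlying rate, the tenfold larger sample gives a CV that is smaller by roughly the square root of ten, which is the effect described above for regions compared with counties.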
The analysis of the coefficient of consistency shows that the consistency of the estimates is poor for unemployment. This is mainly because this population is relatively small in comparison with the other groups (working and non-active persons). The distribution of the coefficient is almost symmetric in all cases (it seldom shows right asymmetry). The concentration of the consistency coefficients is larger for synthetic versus composite than for synthetic versus direct, which means that particular deciles are larger (mostly by a factor of two) for synthetic versus composite than for synthetic versus direct.
In both papers discussed (Bracha et al. (2003, 2004)) the results are presented for regions (voivodships), subregions and counties (poviats). However, the accuracy of the results obtained using the direct, synthetic and composite estimators is limited, particularly because of unacceptable precision (as in the case of the direct estimator) or significant bias (in the case of the synthetic estimator). Also, for some counties (poviats) there are no observed data, or (mostly for poviats with fewer than 10 PSUs selected) there are too few data to make credible estimates of most parameters. Here a model-based approach can be applied, for example using empirical and hierarchical Bayes methods.
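The idea behind the model-based approach can be sketched as follows. The text does not state which model Bracha et al. used, so this is an illustration only: it assumes a basic area-level model of the Fay-Herriot type, in which each direct estimate is shrunk toward a model-based (synthetic) value. The function name and all numbers are hypothetical, and the model variance is treated as already estimated from the data, which is the "empirical" step of empirical Bayes.

```python
def eb_shrinkage(direct, sampling_var, synthetic, model_var):
    """Shrink each direct estimate toward its synthetic (model) value.

    gamma = model_var / (model_var + sampling_var) is the weight given to the
    direct estimate, so areas with noisy direct estimates lean more on the model.
    """
    estimates = []
    for y, d, s in zip(direct, sampling_var, synthetic):
        gamma = model_var / (model_var + d)
        estimates.append(gamma * y + (1.0 - gamma) * s)
    return estimates


# Two hypothetical areas with the same direct and synthetic values but
# different sampling variances of the direct estimate.
eb = eb_shrinkage(
    direct=[10.0, 10.0],
    sampling_var=[1.0, 9.0],
    synthetic=[6.0, 6.0],
    model_var=3.0,  # assumed to have been estimated beforehand
)
print(eb)
```

The area with the larger sampling variance is pulled further toward the synthetic value, which is why such estimators help most where the sample is small.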
The quality of such estimates depends on the size of the particular unit (i.e. county) and on the quality of the model used. The results presented in the second paper (published in 2004) reveal that, although EB estimates are in most cases more precise than direct estimates, the CV characteristics are better for synthetic estimates (most CVs obtained for synthetic estimates are smaller than those for the EB estimator). The distribution of the CV shows strong right asymmetry, and almost 75% of the values fall in the first two class intervals.
The results of HB estimation show that these estimates are slightly less precise than the EB estimates. Similarly, the distribution is highly skewed, with strong right asymmetry. However, as Bracha et al. (2004) pointed out, the characteristics of such estimates may depend on the assumed type of distribution (and particularly on the parameters of that distribution), and also on the implementation of the MCMC procedure in the software that produces the estimates. The author of this paper has also confirmed this behavior. Using different initial parameters for the model and a priori estimates of different quality (for example, obtained using direct and synthetic estimation), it can be shown experimentally that this choice has a significant impact on the quality of the estimates. Such estimates were made for counties in the łódzkie region. The model uses PLFS 2003 estimates from the Bracha et al. (2004) paper together with data from administrative sources that are available on the web pages of Polish Public Statistics. It will be interesting to compare similar estimates based on Census 2002 data, which may reveal the relative usefulness of census versus administrative data. Such a comparison may also reveal the accuracy of a model that uses Census explanatory variables and current data from administrative sources or statistical reports.
The distributions of the empirical and hierarchical Bayes estimates made by Bracha et al. (2004) are presented below.
Figure 1. Distribution of the coefficient of variation for PLFS estimates of the number of unemployed, using 2003 data, estimated by the empirical Bayes procedure
Figure 2. Distribution of the coefficient of variation for PLFS estimates of the number of unemployed, using 2003 data, estimated by the hierarchical Bayes procedure
The results presented in Tables 1 and 2 concern estimates of unemployment obtained using three different estimation methods. These results show that in practically every case the model-based approach gives better precision. However, the comparison of the empirical and hierarchical Bayes estimators is not straightforward. For most units the HB approach is better than the EB approach, with the exception of the City of Łódź, where the relationship is reversed. Similar findings were presented in earlier work of the author (Kubacki 2004); however, those results were based on a different method of variance estimation, the random group technique, which may give a less stable variance estimate.
The precision of the estimates presented in Table 1 reveals relatively small variance values for the model-based estimates. However, the question remains whether such an approach reflects the true variability of the investigated population. These estimates may be helpful when the sample size is relatively small, but if they are used to describe the precision of the survey, such an assessment may be misleading.
Table 1. Comparison of unemployment estimates and their precision from the PLFS using direct, empirical Bayes and hierarchical Bayes estimation for the unemployment model, using data from the 2003 PLFS for the łódzkie voivodship (direct a priori estimates)
| County | Direct est. ('000) | EB est. ('000) | HB est. ('000) | CV direct (%) | CV EB (%) | CV HB (%) |
|---|---|---|---|---|---|---|
| Bełchatów | 11.8 | 11.7 | 11.2 | 13.8 | 10.9 | 11.1 |
| Brzeziny | 0.8 | 1.5 | 2.2 | 63.7 | 28.8 | 15.7 |
| Kutno | 7.0 | 8.4 | 9.1 | 24.4 | 13.1 | 8.9 |
| Łask | 8.7 | 5.9 | 4.3 | 36.8 | 12.6 | 7.0 |
| Łęczyca | 5.4 | 4.3 | 3.2 | 21.5 | 14.5 | 11.8 |
| Łowicz | 6.1 | 6.5 | 6.4 | 23.5 | 12.6 | 8.3 |
| Łódź-wschód | 6.3 | 6.8 | 6.0 | 18.7 | 8.0 | 5.1 |
| Opoczno | 13.0 | 12.2 | 10.9 | 25.9 | 10.3 | 7.2 |
| Pabianice | 11.2 | 12.7 | 11.5 | 17.9 | 6.4 | 5.4 |
| Pajęczno | 9.1 | 4.8 | 3.5 | 16.4 | 11.1 | 9.0 |
| Piotrków | 6.3 | 6.7 | 7.0 | 25.1 | 13.8 | 9.4 |
| Poddębice | 2.6 | 4.3 | 3.5 | 49 | 16.3 | 9.3 |
| Radomsko | 11.7 | 12.0 | 11.8 | 16.1 | 7.0 | 5.9 |
| Rawa Maz. | 2.2 | 3.3 | 3.0 | 38.9 | 15.8 | 9.7 |
| Sieradz | 4.8 | 6.5 | 8.4 | 43.5 | 18.8 | 10.2 |
| Skierniewice | 2.0 | 2.4 | 1.7 | 35.5 | 23.3 | 21.9 |
| Tomaszów M. | 11.0 | 11.0 | 10.9 | 16.7 | 9.0 | 7.2 |
| Wieluń | 6.4 | 5.1 | 4.7 | 27.3 | 15.8 | 10.6 |
| Wieruszów | 5.8 | 3.5 | 2.0 | 25.8 | 17.9 | 15.2 |
| Zduńska Wola | 6.0 | 3.4 | 2.6 | 38.3 | 18.2 | 14.3 |
| Zgierz | 20.1 | 18.8 | 18.1 | 12.9 | 6.6 | 5.4 |
| City of Łódź | 66.0 | 65.8 | 65.5 | 8.4 | 3.2 | 7.6 |
| City of Piotrków Trybunalski | 2.6 | 2.8 | 2.8 | 24.8 | 20.9 | 17.2 |
| City of Skierniewice | 2.3 | 3.6 | 3.0 | 51.4 | 21.1 | 15.4 |
Source: own calculations.
A more detailed analysis of the results presented above also shows that there is a dependency between the size of a region (measured by the size of the working population) and the amount of variance reduction. Such a dependency was also found for the data in Table 1, where a reduction of variance for the EB method relative to the HB method was observed for the City of Łódź. This result, however, may not be the rule. It can be treated as a research proposal: analyzing this dependence may reveal the nature of both the empirical and the hierarchical Bayes estimators.
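The comparison of the three CV columns of Table 1 can be checked mechanically. The sketch below copies the CV values from the table and lists the counties where the HB estimate is less precise than the EB estimate, and those where a model-based estimate is not more precise than the direct one. The variable and function names are hypothetical.

```python
# County -> (CV direct, CV EB, CV HB) in %, copied from Table 1.
TABLE_1_CV = {
    "Bełchatów": (13.8, 10.9, 11.1),
    "Brzeziny": (63.7, 28.8, 15.7),
    "Kutno": (24.4, 13.1, 8.9),
    "Łask": (36.8, 12.6, 7.0),
    "Łęczyca": (21.5, 14.5, 11.8),
    "Łowicz": (23.5, 12.6, 8.3),
    "Łódź-wschód": (18.7, 8.0, 5.1),
    "Opoczno": (25.9, 10.3, 7.2),
    "Pabianice": (17.9, 6.4, 5.4),
    "Pajęczno": (16.4, 11.1, 9.0),
    "Piotrków": (25.1, 13.8, 9.4),
    "Poddębice": (49.0, 16.3, 9.3),
    "Radomsko": (16.1, 7.0, 5.9),
    "Rawa Maz.": (38.9, 15.8, 9.7),
    "Sieradz": (43.5, 18.8, 10.2),
    "Skierniewice": (35.5, 23.3, 21.9),
    "Tomaszów M.": (16.7, 9.0, 7.2),
    "Wieluń": (27.3, 15.8, 10.6),
    "Wieruszów": (25.8, 17.9, 15.2),
    "Zduńska Wola": (38.3, 18.2, 14.3),
    "Zgierz": (12.9, 6.6, 5.4),
    "City of Łódź": (8.4, 3.2, 7.6),
    "City of Piotrków Trybunalski": (24.8, 20.9, 17.2),
    "City of Skierniewice": (51.4, 21.1, 15.4),
}


def relative_cv_change(cv_new, cv_ref):
    """Relative change (in %) of cv_new with respect to cv_ref."""
    return 100.0 * (cv_new - cv_ref) / cv_ref


# Counties where HB is less precise than EB.
hb_worse_than_eb = [c for c, (_, eb, hb) in TABLE_1_CV.items() if hb > eb]
# Counties where at least one model-based CV is not below the direct CV.
model_not_better = [c for c, (d, eb, hb) in TABLE_1_CV.items() if max(eb, hb) >= d]

print(hb_worse_than_eb)
print(model_not_better)
print(relative_cv_change(7.6, 3.2))  # HB against EB for the City of Łódź
```

With the values as printed in Table 1, the first list contains the City of Łódź and, marginally (11.1 against 10.9), Bełchatów; the second list is empty.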
Figure 3. Coefficient of variation for PLFS estimates of the number of unemployed, using 2003 data, for counties in the łódzkie region, obtained using the direct, empirical Bayes (EB) and hierarchical Bayes (HB) estimators
Table 2. Coefficient of variation reduction $(CV_{HB} - CV_{EB})/CV_{EB}$ for estimates using empirical (EB) and hierarchical (HB) Bayes estimation

| Region (voivodship) | CV, direct estimator (%) | CV, EB estimator (%) | CV, HB estimator (%) | CV reduction $(CV_{HB} - CV_{EB})/CV_{EB}$ (%) |
|---|---|---|---|---|
| Dolnośląskie | 6.0 | 2.7 | 2.6 | -3.8 |
| Kujawsko-pomorskie | 6.9 | 2.2 | 2.0 | -9.1 |
| Lubelskie | 7.4 | 3.9 | 3.0 | -23.1 |
| Lubuskie | 7.2 | 4.1 | 3.4 | -17.1 |
| Łódzkie | 5.7 | 2.9 | 2.8 | -3.5 |
| Małopolskie | 7.0 | 3.3 | 3.5 | 6.1 |
| Mazowieckie | 7.8 | 3 | 4.2 | 40.0 |
| Opolskie | 9.6 | 8.2 | 7.0 | -14.7 |
| Podkarpackie | 6.6 | 3.1 | 3.0 | -3.3 |
| Podlaskie | 10.9 | 6.7 | 4.5 | -32.9 |
| Pomorskie | 7.3 | 2.8 | 2.3 | -17.9 |
| Śląskie | 5.8 | 3 | 3.8 | 26.7 |
| Świętokrzyskie | 8.2 | 3.8 | 2.8 | -26.4 |
| Warmińsko-mazursk. | 7.3 | 3.2 | 2.9 | -9.4 |
| Wielkopolskie | 6.8 | 3.2 | 3.5 | 9.4 |
| Zachodnio-pomor. | 6.3 | 3 | 2.7 | -10 |
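The reduction column of Table 2 can be recomputed from the EB and HB columns as (CV_HB − CV_EB)/CV_EB. The sketch below does this with the values copied from the table and compares the result with the printed column; the variable and function names are hypothetical.

```python
# Region -> (CV EB, CV HB, printed reduction), all in %, copied from Table 2.
TABLE_2 = {
    "Dolnośląskie": (2.7, 2.6, -3.8),
    "Kujawsko-pomorskie": (2.2, 2.0, -9.1),
    "Lubelskie": (3.9, 3.0, -23.1),
    "Lubuskie": (4.1, 3.4, -17.1),
    "Łódzkie": (2.9, 2.8, -3.5),
    "Małopolskie": (3.3, 3.5, 6.1),
    "Mazowieckie": (3.0, 4.2, 40.0),
    "Opolskie": (8.2, 7.0, -14.7),
    "Podkarpackie": (3.1, 3.0, -3.3),
    "Podlaskie": (6.7, 4.5, -32.9),
    "Pomorskie": (2.8, 2.3, -17.9),
    "Śląskie": (3.0, 3.8, 26.7),
    "Świętokrzyskie": (3.8, 2.8, -26.4),
    "Warmińsko-mazursk.": (3.2, 2.9, -9.4),
    "Wielkopolskie": (3.2, 3.5, 9.4),
    "Zachodnio-pomor.": (3.0, 2.7, -10.0),
}


def cv_reduction(cv_eb, cv_hb):
    """Relative change of the HB coefficient of variation against EB, in %."""
    return 100.0 * (cv_hb - cv_eb) / cv_eb


recomputed = {r: cv_reduction(eb, hb) for r, (eb, hb, _) in TABLE_2.items()}
# Largest absolute difference between the recomputed and the printed column.
max_gap = max(abs(recomputed[r] - printed) for r, (_, _, printed) in TABLE_2.items())
# A negative value means the HB estimate has the smaller CV.
hb_better = [r for r, value in recomputed.items() if value < 0]

print(f"largest gap to printed column: {max_gap:.2f} percentage points")
print(f"HB more precise than EB in {len(hb_better)} of {len(TABLE_2)} regions")
```

With the rounded CVs from the table, the recomputed values agree with the printed column to within about 0.1 percentage point; the small gaps are presumably due to the printed column having been computed from unrounded CVs.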
Figure 4. Dependency of the coefficient of variation on the size of the region population for LFS estimates of the number of unemployed, using data from the 4th quarter of 2002, for Polish regions, obtained using the direct, empirical Bayes (EB) and hierarchical Bayes (HB) estimators
It should also be mentioned that the comparison of these two methods is not straightforward. Such a conclusion can be found, for example, in the recent paper by Sinha and Ghosh (2004), presented at the IMS/ASA's SRMS Joint Mini Meeting on Current Trends in Survey Sampling and Official Statistics, held from 1 to 3 January 2004 in Calcutta, India. A similar comparison can also be found in Ghosh and Rao (1994).
The comparison of the model for regions that uses Census 2002 results shows that, where the precision of the whole model is better, the EB estimates are slightly more precise, especially for larger regions. This is presented in the table below.