CHAPTER II: METHODOLOGICAL FRAMEWORK
2.1 Justification and Interest of the Research
2.A.1 Standard Errors from a Random Sample of Individuals
This appendix provides formulae for calculating asymptotic standard errors for the various subgroup poverty indices used in section 2.1. All of the following calculations are based on the δ-method, and build on and extend the work of Kakwani (1993), Zheng (1993) and Bishop et al. (1995).
Assume that $N$ incomes $y_{itr}$ are randomly drawn at period $t$ from a population in region $r$, with corresponding individual poverty indices $P_{itr}$, where $P_{itr}(y_{itr}, z) = F(y_{itr}, z) \cdot I(y_{itr} < z)$. Consider the following primary indices
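$P = \frac{1}{N} \sum_{i=1}^{N} P_{itr}(y_{itr}, z)$, Aggregate Poverty,
$Q_\kappa = \frac{1}{N} \sum_{i=1}^{N} I(y_{itr} \in \Omega_\kappa)$, the Population Share of group $\kappa$, and
$C_\kappa = \frac{1}{N} \sum_{i=1}^{N} P_{itr}(y_{itr}, z) \, I(y_{itr} \in \Omega_\kappa)$, the Subgroup Proportional Poverty Contribution.
$I(y_{itr} \in \Omega_\kappa)$ is an indicator function which takes value 1 if individual $i$ receiving income $y_{itr}$ belongs to $\kappa$, and 0 otherwise.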
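The three primary indices are plain sample means, which the following minimal sketch illustrates. The functional form of the individual poverty index is not fixed in the text above; the Foster-Greer-Thorbecke form used in `individual_poverty_index`, as well as all function names and the toy data, are assumptions made for illustration only.

```python
def individual_poverty_index(income, z, alpha=0.0):
    """Individual poverty index F(y, z) * I(y < z).

    Assumption: F is the FGT gap ((z - y) / z) ** alpha; alpha = 0 gives
    the headcount. The appendix itself leaves F unspecified.
    """
    if income >= z:
        return 0.0
    return ((z - income) / z) ** alpha


def primary_indices(incomes, in_group, z, alpha=0.0):
    """Return (P, Q, C): aggregate poverty, population share of the group,
    and the subgroup proportional poverty contribution."""
    n = len(incomes)
    p_i = [individual_poverty_index(y, z, alpha) for y in incomes]
    # P: mean of the individual poverty indices over the whole sample
    P = sum(p_i) / n
    # Q: share of sampled individuals who belong to the group
    Q = sum(1 for g in in_group if g) / n
    # C: mean of (individual index * group indicator) over the whole sample
    C = sum(p for p, g in zip(p_i, in_group) if g) / n
    return P, Q, C


# Hypothetical toy sample: six incomes, poverty line z = 100
incomes = [50.0, 80.0, 120.0, 200.0, 40.0, 150.0]
in_group = [True, True, False, False, True, True]
P, Q, C = primary_indices(incomes, in_group, z=100.0)
```

Note that all three sums run over the full sample of size N, not over group members only; this is what makes the indices simple sample means.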
All primary indices are estimated as simple sample means, which are well known to be consistent and normally distributed in large samples (by the Central Limit Theorem). Asymptotic inference for the further measures, or secondary indices, first requires reformulating them as functions of the primary indices. By the Slutsky Theorem, the respective point estimates are consistent. By the δ-method, these estimates are normally distributed in large samples and their standard errors are consistently estimated as outlined below (see e.g. Rao 1973, pp. 388f. for details).
For the following, it is helpful to distinguish between measures that use one sample only, and those that are based on several samples.
a) Measures Based on One Sample
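These are60
(2.A.1) $P_\kappa = \frac{C_\kappa}{Q_\kappa}$, Subgroup Poverty,
(2.A.2) $PI_\kappa = \frac{P_\kappa}{P} = \frac{C_\kappa}{Q_\kappa P}$, Poverty Intensity, and
(2.A.3) $\Delta PI_\kappa = PI_{\kappa t} - PI_{\kappa t-1}$,61 the Change in Poverty Intensity.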
While (2.A.3) uses observations from two periods, these are merged into one sample (and only individuals observed in both periods are kept in the merged sample). If merging does not cause systematic sample attrition - i.e., if the frequency at which individuals fall out of the sample when merging is not correlated with the poverty measures to be estimated - the merged sample can also be interpreted as a random sample, thus warranting the application of the procedures that follow.
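Denote a secondary index as $X_\kappa$, and label the primary indices that make it up $X_{1\kappa}, \dots, X_{S\kappa}$. For subgroup poverty, e.g., $X_\kappa = P_\kappa$, $X_{1\kappa} = C_\kappa$ and $X_{2\kappa} = Q_\kappa$. By the δ-method, $X_\kappa$'s standard error is
(2.A.4) $SE(X_\kappa) = \left( \frac{D_\kappa \Sigma_\kappa D_\kappa'}{N} \right)^{0.5}$
where $D_\kappa$ is a $1 \times S$ vector
(2.A.5) $D_\kappa = \left( \partial X_\kappa / \partial X_{1\kappa} \;\; \dots \;\; \partial X_\kappa / \partial X_{S\kappa} \right)$
and $\Sigma_\kappa$ is the $S \times S$ matrix
(2.A.6) $\Sigma_\kappa = \begin{pmatrix} \mathrm{Var} X_{1\kappa} & \mathrm{Cov} X_{1\kappa} X_{2\kappa} & \cdots & \mathrm{Cov} X_{1\kappa} X_{S\kappa} \\ \mathrm{Cov} X_{1\kappa} X_{2\kappa} & \mathrm{Var} X_{2\kappa} & \cdots & \mathrm{Cov} X_{2\kappa} X_{S\kappa} \\ \vdots & \vdots & \ddots & \vdots \\ \mathrm{Cov} X_{1\kappa} X_{S\kappa} & \mathrm{Cov} X_{2\kappa} X_{S\kappa} & \cdots & \mathrm{Var} X_{S\kappa} \end{pmatrix}$
The elements of $\Sigma_\kappa$ can be calculated from standard large-sample variance and covariance estimates. For Subgroup Poverty (see also Bishop et al. 1995), for example, $P_\kappa = C_\kappa / Q_\kappa$ implies
$D_\kappa(P_\kappa) = \left( \frac{1}{Q_\kappa} \;\; -\frac{C_\kappa}{Q_\kappa^2} \right)$
and
(2.A.6a) $\Sigma_\kappa(P_\kappa) = \begin{pmatrix} \frac{1}{N} \sum_{i=1}^{N} P_i^2 \, I(y_i \in \Omega_\kappa) - C_\kappa^2 & C_\kappa (1 - Q_\kappa) \\ C_\kappa (1 - Q_\kappa) & Q_\kappa (1 - Q_\kappa) \end{pmatrix}$
with all population quantities replaced by their sample estimates.

60 I drop time or region subscripts where there is no risk of confusion.
61 For consistency with the remainder of the appendix, the characteristic is also denoted as κ, rather than X (as in the main text).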
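The δ-method calculation for Subgroup Poverty can be sketched in a few lines. The sketch assumes that subgroup poverty is the ratio of the subgroup contribution C to the population share Q, that the variance and covariance terms are the usual sample moments of the individual-level terms, and that the quadratic form is divided by the sample size N. The function name and the toy data are hypothetical.

```python
import math


def subgroup_poverty_se(p_i, in_group):
    """Delta-method standard error of subgroup poverty C / Q.

    p_i      : individual poverty indices for the whole sample
    in_group : booleans, True if the individual belongs to the subgroup
    """
    n = len(p_i)
    # Individual-level terms whose sample means are C and Q
    x1 = [p if g else 0.0 for p, g in zip(p_i, in_group)]
    x2 = [1.0 if g else 0.0 for g in in_group]
    C = sum(x1) / n
    Q = sum(x2) / n
    if Q == 0.0:
        raise ValueError("subgroup is empty in this sample")
    # Elements of the 2x2 variance-covariance matrix
    var_c = sum(v * v for v in x1) / n - C * C
    cov_cq = C * (1.0 - Q)
    var_q = Q * (1.0 - Q)
    # Gradient of C / Q with respect to (C, Q)
    d1 = 1.0 / Q
    d2 = -C / (Q * Q)
    # Quadratic form D * Sigma * D'
    quad = d1 * d1 * var_c + 2.0 * d1 * d2 * cov_cq + d2 * d2 * var_q
    return C / Q, math.sqrt(max(quad, 0.0) / n)


# Hypothetical toy data: headcount indicators (1 = poor) for six individuals
p_i = [1.0, 1.0, 0.0, 0.0, 1.0, 0.0]
in_group = [True, True, False, False, True, True]
estimate, se = subgroup_poverty_se(p_i, in_group)
```

For a headcount index the result can be checked by hand: the quadratic form collapses to P(1 - P) / Q for subgroup poverty P, so the standard error equals the familiar binomial expression computed on the subgroup's own sample size.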
b) Multi-Sample Measures
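Matters are slightly more complicated for measures that are based on several samples, such as the Difference in Differences index
(2.A.7) $\Delta\Delta_\kappa = (\ln PI_{\kappa,t+\tau,r1} - \ln PI_{\kappa,t,r1}) - (\ln PI_{\kappa,t+\tau,r2} - \ln PI_{\kappa,t,r2})$
$= [(\ln C_{\kappa,t+\tau,r1} - \ln Q_{\kappa,t+\tau,r1} - \ln P_{t+\tau,r1}) - (\ln C_{\kappa,t,r1} - \ln Q_{\kappa,t,r1} - \ln P_{t,r1})]$
$- [(\ln C_{\kappa,t+\tau,r2} - \ln Q_{\kappa,t+\tau,r2} - \ln P_{t+\tau,r2}) - (\ln C_{\kappa,t,r2} - \ln Q_{\kappa,t,r2} - \ln P_{t,r2})]$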
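As a small worked example of the Difference in Differences index, the sketch below evaluates it from four sets of primary indices, one per region and period. It assumes that the log of poverty intensity equals ln C minus ln Q minus ln P; the function names and the numbers are hypothetical.

```python
import math


def log_poverty_intensity(c, q, p):
    """ln PI = ln C - ln Q - ln P for one region and period."""
    return math.log(c) - math.log(q) - math.log(p)


def diff_in_diff(r1_t, r1_t_tau, r2_t, r2_t_tau):
    """Difference in differences of log poverty intensity.

    Each argument is a (C, Q, P) tuple of primary indices; r1 and r2 are
    the two regions, t and t_tau the earlier and the later period.
    """
    change_r1 = log_poverty_intensity(*r1_t_tau) - log_poverty_intensity(*r1_t)
    change_r2 = log_poverty_intensity(*r2_t_tau) - log_poverty_intensity(*r2_t)
    return change_r1 - change_r2


# Hypothetical numbers: poverty intensity doubles in region 1 and is
# unchanged in region 2, so the index equals ln 2.
dd = diff_in_diff(
    r1_t=(0.10, 0.50, 0.20),
    r1_t_tau=(0.20, 0.50, 0.20),
    r2_t=(0.10, 0.50, 0.20),
    r2_t_tau=(0.10, 0.50, 0.20),
)
```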
Using panel data, some (but not all) individuals that appear in the $t$-sample will typically also appear in the $t+\tau$-sample. If their primary indices are correlated over time, the samples will therefore be partly dependent. This needs to be taken into account when calculating the standard error.
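Formally, consider a multi-sample measure $X_\kappa$ which is made up of $S+T$ primary indices $X_{1\kappa I}, \dots, X_{S\kappa I}, X_{1\kappa II}, \dots, X_{T\kappa II}$, where the first $S$ primary indices stem from subsample I and the other $T$ primary indices stem from subsample II. Subsample I has size $N_1$, subsample II size $N_2$, and $M$ individuals appear in both subsamples. Using a result in Zheng (1998) and extending it to subgroup indices, $X_\kappa$'s standard error is consistently estimated as
(2.A.8) $SE(X_\kappa) = \left( D_\kappa \Sigma^*_\kappa D_\kappa' \right)^{0.5}$, with
(2.A.9) $D_\kappa = \left( \partial X_\kappa / \partial X_{1\kappa I} \;\; \dots \;\; \partial X_\kappa / \partial X_{S\kappa I} \;\; \partial X_\kappa / \partial X_{1\kappa II} \;\; \dots \;\; \partial X_\kappa / \partial X_{T\kappa II} \right)$ and
(2.A.10) $\Sigma^*_\kappa = \begin{pmatrix} \frac{\mathrm{Var} X_{1\kappa I}}{N_1} & \cdots & \frac{\mathrm{Cov} X_{1\kappa I} X_{S\kappa I}}{N_1} & \mathrm{Cov} X_{1\kappa I} X_{1\kappa II} \cdot \frac{M}{N_1 N_2} & \cdots & \mathrm{Cov} X_{1\kappa I} X_{T\kappa II} \cdot \frac{M}{N_1 N_2} \\ \vdots & \ddots & \vdots & \vdots & & \vdots \\ \frac{\mathrm{Cov} X_{1\kappa I} X_{S\kappa I}}{N_1} & \cdots & \frac{\mathrm{Var} X_{S\kappa I}}{N_1} & \mathrm{Cov} X_{S\kappa I} X_{1\kappa II} \cdot \frac{M}{N_1 N_2} & \cdots & \mathrm{Cov} X_{S\kappa I} X_{T\kappa II} \cdot \frac{M}{N_1 N_2} \\ \mathrm{Cov} X_{1\kappa I} X_{1\kappa II} \cdot \frac{M}{N_1 N_2} & \cdots & \mathrm{Cov} X_{S\kappa I} X_{1\kappa II} \cdot \frac{M}{N_1 N_2} & \frac{\mathrm{Var} X_{1\kappa II}}{N_2} & \cdots & \frac{\mathrm{Cov} X_{1\kappa II} X_{T\kappa II}}{N_2} \\ \vdots & & \vdots & \vdots & \ddots & \vdots \\ \mathrm{Cov} X_{1\kappa I} X_{T\kappa II} \cdot \frac{M}{N_1 N_2} & \cdots & \mathrm{Cov} X_{S\kappa I} X_{T\kappa II} \cdot \frac{M}{N_1 N_2} & \frac{\mathrm{Cov} X_{1\kappa II} X_{T\kappa II}}{N_2} & \cdots & \frac{\mathrm{Var} X_{T\kappa II}}{N_2} \end{pmatrix}$
Note that the elements of the shift-share decomposition are multi-sample measures too, and could in principle be dealt with analogously. However, the more detailed the decomposition is, and the more subgroups it therefore contains, the more primary indices are involved in the computation, thus inflating the variance-covariance matrix $\Sigma^*$. It can be shown that its number of unique elements $E$ (note that $\Sigma^*$ is symmetric) relates to the number of subgroups $K$ as $E = 2K^2 + K$. For more than 3 or 4 subgroups, applying the δ-method becomes a tedious exercise. Alternatively, one may bootstrap standard errors. This is what I do throughout section 2.2, applying the same bootstrapping algorithm as in chapter 1.

62 To keep the notation tractable I refer to the two-sample case only, whereas the difference in differences measure uses primary indices from 4 samples. The extension is straightforward.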
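The two-sample formulae (2.A.8) to (2.A.10) can be illustrated with the simplest possible multi-sample measure, the change in an aggregate index between two partly overlapping samples (S = T = 1, so the gradient is (-1, 1)). This is a sketch under stated assumptions, not the estimator used in the paper: the cross-sample covariance is estimated from the M individuals observed in both samples, and the function name, the dictionary layout and the toy data are hypothetical.

```python
import math


def two_sample_change_se(sample_1, sample_2):
    """Change in a mean index between two samples, with its standard error.

    sample_1, sample_2 : dicts mapping a person identifier to that
    person's individual poverty index in the respective sample.
    """
    n1, n2 = len(sample_1), len(sample_2)
    mean_1 = sum(sample_1.values()) / n1
    mean_2 = sum(sample_2.values()) / n2
    var_1 = sum((v - mean_1) ** 2 for v in sample_1.values()) / n1
    var_2 = sum((v - mean_2) ** 2 for v in sample_2.values()) / n2
    # Individuals appearing in both samples drive the cross-sample covariance
    common = set(sample_1) & set(sample_2)
    m = len(common)
    cov = 0.0
    if m > 0:
        cov = sum((sample_1[i] - mean_1) * (sample_2[i] - mean_2)
                  for i in common) / m
    # Variance-covariance matrix with off-diagonal scaled by M / (N1 * N2)
    off_diag = cov * m / (n1 * n2)
    sigma = [[var_1 / n1, off_diag], [off_diag, var_2 / n2]]
    d = (-1.0, 1.0)  # gradient of (mean_2 - mean_1)
    quad = sum(d[a] * sigma[a][b] * d[b] for a in range(2) for b in range(2))
    return mean_2 - mean_1, math.sqrt(max(quad, 0.0))


# Hypothetical balanced panel: the same four people in both periods
period_1 = {1: 1.0, 2: 0.0, 3: 1.0, 4: 0.0}
period_2 = {1: 1.0, 2: 1.0, 3: 0.0, 4: 0.0}
change, se_change = two_sample_change_se(period_1, period_2)
```

Two limiting cases are useful for checking an implementation. With complete overlap the result equals the standard error of the mean of individual changes; with no overlap the covariance term vanishes and the two sampling variances simply add up.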
2.A.2 Accounting for Clustering at the Household Level
The formulae derived above are based on the assumption of a random sample of individuals. However, all poverty indices used in chapter 2 are calculated from disposable household income, such that poverty within a household is identical. To account for this feature (which is often ignored in the applied poverty measurement literature), I use the following standard approximation (see e.g. Kish (1965) pp.
(2.A.11) $SE_c(X) = SE_r(X) \cdot \left[ 1 + (A - 1) \rho \right]^{0.5}$, where $A = \frac{1}{N} \sum_{i=1}^{N} H_i$.
(2.A.11) allows for clustering of observations. $SE_r(X)$ denotes the standard error under random sampling of individuals as in 2.A.1, and $SE_c(X)$ its cluster-adjusted counterpart. $H_i$ is the number of clustered observations per individual $i$ (including $i$ herself), and $\rho$ is the intra-cluster correlation coefficient, which measures the "similarity" of observations within a cluster. In a standard clustering problem $\rho$ needs to be estimated from the data. For disposable income within the household, however, $\rho = 1$ by construction, hence (2.A.11) simplifies to
(2.A.11a) $SE_c(X) = SE_r(X) \cdot A^{0.5}$
This suggests a convenient two-step procedure: calculating standard errors as in 2.A.1, and then multiplying them by the square root of A, the average number of clustered observations. For subgroup measures, the adequate definition of A requires some care: if the subgroup is defined with respect to household characteristics, then A is the average number of household members per individual. If the group is defined with respect to individual characteristics, A is the average number of household members per individual with the same characteristic.64
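The two-step procedure can be sketched as follows for the aggregate case. The sketch assumes that A is the average, over individuals, of the size of the household each individual lives in, and that the intra-cluster correlation equals 1 by default. The function name, the household identifiers and the input standard error are hypothetical.

```python
import math
from collections import Counter


def cluster_adjusted_se(se_random, household_ids, rho=1.0):
    """Inflate a random-sampling standard error for household clustering.

    se_random     : standard error computed as if individuals were
                    sampled independently
    household_ids : one household identifier per sampled individual
    rho           : intra-cluster correlation (1.0 for household income)
    """
    sizes = Counter(household_ids)
    # H_i is the size of individual i's household; A is its average
    # taken over individuals, not over households
    a = sum(sizes[h] for h in household_ids) / len(household_ids)
    return se_random * math.sqrt(1.0 + (a - 1.0) * rho), a


# Hypothetical sample: households of size 2, 3 and 1
ids = ["a", "a", "b", "b", "b", "c"]
se_clustered, A = cluster_adjusted_se(0.02, ids)
```

Averaging over individuals rather than over households gives larger households more weight, so A exceeds the mean household size whenever household sizes differ.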
For the bootstrapped standard errors used in section 2.2, this problem is solved by redrawing clusters (as in chapter 1).
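Redrawing clusters means that whole households, rather than individuals, are resampled with replacement. The bootstrapping algorithm of chapter 1 is not reproduced in this appendix, so the following is only a generic cluster bootstrap under that assumption; the function names, the number of replications and the toy data are hypothetical.

```python
import random
import statistics


def sample_mean(values):
    """Mean of a non-empty list of individual poverty indices."""
    return sum(values) / len(values)


def cluster_bootstrap_se(households, statistic=sample_mean, reps=500, seed=0):
    """Bootstrap standard error obtained by redrawing whole households.

    households : list of non-empty lists, each holding the individual
                 poverty indices of one household's members
    statistic  : function mapping a pooled list of indices to an estimate
    """
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    estimates = []
    for _ in range(reps):
        # Draw as many households as in the sample, with replacement
        draw = [rng.choice(households) for _ in households]
        pooled = [value for members in draw for value in members]
        estimates.append(statistic(pooled))
    # Standard deviation of the bootstrap estimates
    return statistics.pstdev(estimates)


# Hypothetical data: headcount indicators, identical within each household
households = [[1.0, 1.0], [0.0, 0.0, 0.0], [1.0], [0.0, 0.0]]
se_boot = cluster_bootstrap_se(households)
```

Because all members of a drawn household enter the replicate together, the within-household dependence is preserved without any explicit assumption about the intra-cluster correlation.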
63 (2.A.11) holds exactly when the clusters have the same size. Here, household size of course differs. For aggregate poverty, however, where (2.A.11) as well as exact large sample standard errors can easily be calculated (using the formula in Deaton 1994, p. 54), (2.A.11) deviates in no case from the exact standard error by more than 5 percent, with an average difference of 2 percent. Furthermore, the deviations show no systematic pattern. Given that inference derived with the δ-method gives only large sample approximations anyway, this can be considered fairly satisfactory.
64 For multi-period indices, one must take into account that some individuals may change households from period $t$ to $t+\tau$. For those individuals, $H_i$ is not well defined. One may bracket $H_i$, however, between the number of individuals that share $i$'s household in both periods (including $i$ herself), and the size of the larger of both households. As a practical matter, both standard errors never deviated from one another by more than 3 percent. To be on the safe side, the paper reports the large-household based standard errors.