
Composite estimation is a statistical estimation procedure that combines data from several sources, for example, from different surveys or databases or from different periods of time in the same longitudinal survey. It is difficult to describe the method in general, as there is no limit to the ways one might combine data when various useful sources are available. Composite estimation can be used when a survey is conducted using a rotating panel design with the goal of producing population estimates for each point or many points in time. If the design incorporates rotating groups, composite estimation can often reduce the variance estimates of level variables (e.g., totals, means, proportions). In addition, composite estimation can reduce the variance estimates of variables dealing with changes over time, depending on the structure of the sample design, the strength of the correlations between group estimates over time, and other factors.

How a Composite Estimator Works

In a typical rotation design, the sampled groups are phased in and out of the sample in a regular, defined pattern over time. To estimate the level of a characteristic in the time period designated by t, a simple compositing strategy is to take a convex combination of the Horvitz-Thompson estimate of level for period t, Y_t^HT1, with a second estimate for period t, Y_t^HT2. The latter estimate might start with the composite estimate for period t − 1, Y_{t − 1}^CE, brought forward by a measure of change from period t − 1 to period t:

Y_t^HT2 = Y_{t − 1}^CE + D_{t − 1, t}.

This measure of change, D_{t − 1, t}, can be a difference (ratio) estimated using data only from the overlapping rotation groups, which is then added to (multiplied by) the composite estimate for period t − 1. The composite estimate then becomes a recursively defined function of data collected in prior time periods:

Y_t^CE = (1 − k) Y_t^HT1 + k Y_t^HT2, where 0 < k < 1.
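The recursion can be made concrete with a small sketch. It assumes the additive (difference) form of D_{t − 1, t} and, as a starting convention that the entry does not specify, sets the first composite estimate equal to the direct estimate. The function name and the input numbers are hypothetical.

```python
def composite_series(ht_estimates, changes, k):
    """Return the composite estimates Y_t^CE for periods 0..T-1.

    ht_estimates[t] is the direct estimate Y_t^HT1 for period t.
    changes[t] is D_{t-1,t}, the estimated change into period t
    (changes[0] is unused because no earlier period exists).
    """
    if not 0 < k < 1:
        raise ValueError("k must lie strictly between 0 and 1")
    # Assumption: with no prior data, start from the direct estimate.
    composite = [ht_estimates[0]]
    for t in range(1, len(ht_estimates)):
        # Y_t^HT2: last composite brought forward by the estimated change.
        carried_forward = composite[t - 1] + changes[t]
        composite.append((1 - k) * ht_estimates[t] + k * carried_forward)
    return composite


# Illustrative (made-up) inputs.
ht = [100.0, 104.0, 103.0]
d = [0.0, 2.0, 1.0]
series = composite_series(ht, d, k=0.5)
print(series)
```

Because each value depends on the previous composite, reproducing the estimate for one period requires all earlier periods back to the chosen starting point.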

Composite estimators can often be expressed as a linear combination of simple estimates, one formed from each rotation group at each period. A few constraints are usually imposed. First, when estimating levels of a variable at time t, one usually requires that (a) the weighting coefficients of the group estimates at time t add to 1, and (b) for each period before t, the coefficients sum to 0. These restrictions ensure that no bias is introduced through the compositing. Second, to maintain the consistency of estimates, it is customary, at least for statistical agencies, to require that (a) the estimate of changes in a variable equal the difference (or ratio, for multiplicative composite estimators) of the appropriate estimates of levels for that variable, and (b) the estimates of components sum to the estimate of the corresponding total.
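The first pair of constraints can be checked numerically. The sketch below unrolls a simple additive composite into its coefficients on each (period, group) estimate. It assumes that the direct estimate is the plain average of the groups in sample, that the change estimate is the average difference over the overlapping groups, and that consecutive periods always share at least one group. The rotation pattern and function names are hypothetical.

```python
from collections import defaultdict


def composite_coefficients(groups_by_period, k):
    """Coefficient of each (period, group) estimate in the latest composite.

    groups_by_period[t] lists the rotation-group labels in sample at period t.
    """
    first = groups_by_period[0]
    coef = defaultdict(float)
    for g in first:
        coef[(0, g)] = 1.0 / len(first)  # starting composite = direct estimate
    for t in range(1, len(groups_by_period)):
        current = groups_by_period[t]
        overlap = [g for g in current if g in groups_by_period[t - 1]]
        new = defaultdict(float)
        for key, c in coef.items():          # k * previous composite
            new[key] += k * c
        for g in current:                    # (1 - k) * direct estimate
            new[(t, g)] += (1 - k) / len(current)
        for g in overlap:                    # k * average change on the overlap
            new[(t, g)] += k / len(overlap)
            new[(t - 1, g)] -= k / len(overlap)
        coef = new
    return coef


def period_sums(coef):
    """Sum the coefficients within each period."""
    sums = defaultdict(float)
    for (t, _group), c in coef.items():
        sums[t] += c
    return dict(sums)


rotation = [["A", "B", "C", "D"], ["B", "C", "D", "E"], ["C", "D", "E", "F"]]
coef = composite_coefficients(rotation, k=0.4)
print(period_sums(coef))  # latest period sums to 1, earlier periods to 0 (up to rounding)
```

In this example some earlier-period coefficients, such as the one on group C in the middle period, are negative, which is how the sums for past periods can reach 0.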

Composite estimation tries to take advantage of correlations over time. For example, suppose x_{t − 1, g} and x_{t, g} are estimates from the same rotation group, g, for periods t − 1 and t. If, due to sampling variability, x_{t − 1, g} is below its expected value, then x_{t, g} tends to be as well. By assigning coefficients with opposite signs to the two estimates, one can temper the sampling variations while still balancing coefficients to ensure an unbiased estimate overall.

Variances and biases for composite estimators are computed according to the rotating panel design and depend on the variances and correlations of the rotation group estimates, which are often assumed to be nearly stationary over time. Thus, determining an optimal design becomes a problem of choosing the estimator’s coefficients to minimize the expected error function.

However, the problem becomes more complex when one considers the effect of the design on the different variables of interest, and on the several types of estimates to be disseminated: levels at specific points in time, changes across time, or averages over time.

Changing the design or the estimators’ coefficients to lower the expected error for a composite estimator of the level for a variable may induce a corresponding increase in the expected error of the estimator for the change in that variable, and vice versa.

When the survey’s most important estimate is a measure of the change in a variable over consecutive periods, a complete sample overlap is often the most efficient, as it makes the greatest use of the correlations over time. With a complete overlap, composite estimation with information from prior periods is generally not a consideration. However, for estimating the level at each time period, a partial sample overlap is often the most productive. Due to the constraint of consistency (see earlier discussion), when estimates of level and changes are both required, a compromise design may be used whereby a large fraction of the sample, but not all of the sample, is carried over from one period to the next.
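The advantage of overlap for estimating change can be illustrated with a standard textbook result rather than with a composite estimator. The sketch assumes simple random samples of equal size n in both periods, a common unit variance sigma2, a correlation rho between a unit's values in consecutive periods, and independence between units. Under those assumptions the difference of the two period means has variance (2 sigma2 / n)(1 − p rho), where p is the fraction of the sample carried over. All numbers are illustrative.

```python
def variance_of_change(sigma2, n, overlap_fraction, rho):
    """Variance of the difference of two period means under the stated assumptions."""
    if not 0.0 <= overlap_fraction <= 1.0:
        raise ValueError("overlap_fraction must be between 0 and 1")
    return 2.0 * sigma2 / n * (1.0 - overlap_fraction * rho)


# Illustrative values: unit variance 1, n = 100, correlation 0.8.
for p in (0.0, 0.75, 1.0):
    print(p, variance_of_change(sigma2=1.0, n=100, overlap_fraction=p, rho=0.8))
```

With a positive correlation, the variance falls steadily as the overlap grows and is smallest at complete overlap. The sketch says nothing about estimates of level, for which the entry notes that a partial overlap is often more productive.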

Specific Examples of Composite Estimators

A specific example of a composite estimator is the one used in the Current Population Survey, jointly sponsored by the U.S. Bureau of Labor Statistics and the Census Bureau, to measure the U.S. labor force.

In each month, separate estimates of characteristic totals are obtained from the eight rotation groups. Six of these groups contain households that were interviewed the prior month. The composite estimator implemented in 1998 combines the estimates from current and prior months to estimate the number of unemployed using one set of compositing coefficients, and the number of employed using a different set that reflects the higher correlations over time among estimates of employed:

Y_t^CE = (1 − K) Y_t^AVG + K (Y_{t − 1}^CE + D_{t − 1, t}) + A b_t,

where Y_t^AVG is the average of the estimates of total from the eight rotation groups; D_{t − 1, t} is an estimate of change based only on the six rotation groups canvassed at both times t − 1 and t; b_t is an adjustment term inserted to reduce the variance of Y_t^CE and the bias arising from panel conditioning; and (K, A) = (0.4, 0.3) when estimating unemployed, and (0.7, 0.4) when estimating employed.

For researchers, a problem with composite estimates is producing them from public use microdata files, because computing the composite estimate for any period generally requires one to composite recursively over a number of past periods. This problem has been addressed for the Current Population Survey, which now produces and releases a set of “composite weights” with each month’s public use file. First, for any month, composite estimates are determined for the labor force categories broken down into a number of race and ethnicity subgroups. Then, using these composite estimates as controls, the survey weights are raked to guarantee that the corresponding weighted estimates agree with the composite controls. The resulting composite weights can then be used to produce composite estimates simply by summing over the weights of records with the appropriate characteristics.
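The two steps described above, compositing and then raking weights to the composite controls, can be sketched as follows. This is a simplified illustration, not the Census Bureau's production procedure. The adjustment term b_t is taken as a given number, the raking uses only two made-up margins ("status" and "group"), every control category is assumed to have at least one record, and all names and totals are hypothetical.

```python
def cps_composite_step(y_avg, prev_composite, delta, b, K, A):
    """One month of Y_t^CE = (1 - K) Y_t^AVG + K (Y_{t-1}^CE + D_{t-1,t}) + A b_t."""
    return (1 - K) * y_avg + K * (prev_composite + delta) + A * b


def rake(records, controls, n_iter=50):
    """Iteratively scale weights so weighted totals match each control margin.

    records: list of dicts holding "weight" and one key per raking dimension.
    controls: {dimension: {category: control_total}}; margins must share one grand total.
    """
    weights = [r["weight"] for r in records]
    for _ in range(n_iter):
        for dim, targets in controls.items():
            totals = {cat: 0.0 for cat in targets}
            for r, w in zip(records, weights):
                totals[r[dim]] += w
            weights = [w * targets[r[dim]] / totals[r[dim]]
                       for r, w in zip(records, weights)]
    return weights


def weighted_total(records, weights, dim, category):
    """Sum the weights of records with the given characteristic."""
    return sum(w for r, w in zip(records, weights) if r[dim] == category)


# Composite step with the unemployed coefficients (K, A) = (0.4, 0.3); inputs are made up.
unemployed_ce = cps_composite_step(y_avg=7.0, prev_composite=6.8, delta=0.1,
                                   b=0.05, K=0.4, A=0.3)

# Hypothetical records and composite controls.
records = [
    {"status": "employed", "group": "g1", "weight": 25.0},
    {"status": "employed", "group": "g2", "weight": 25.0},
    {"status": "unemployed", "group": "g1", "weight": 25.0},
    {"status": "unemployed", "group": "g2", "weight": 25.0},
]
controls = {"status": {"employed": 90.0, "unemployed": 10.0},
            "group": {"g1": 60.0, "g2": 40.0}}
composite_weights = rake(records, controls)
print(weighted_total(records, composite_weights, "status", "unemployed"))
```

Once the raked weights are stored on the file, a user reproduces a composite estimate with a single weighted sum and needs no data from earlier months.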

In the U.S. monthly surveys of retail and wholesale trade conducted before 1998 by the U.S. Census Bureau, a different rotating panel design led to an interesting set of composite estimators. In each of three consecutive months, one of three rotation groups was canvassed. In month t + 1, businesses in rotation group A provided sales data for the months t and t − 1, yielding estimates x^A_t and x^A_{t − 1}, respectively. A preliminary composite estimate for month t,

P_t = (1 − b) x^A_t + b P_{t − 1} D_{t − 1, t},

was released, where D_{t − 1, t} = x^A_t / x^A_{t − 1}, and b = 0.75 for the retail survey and 0.65 for the wholesale survey. One month later, firms in rotation group B supplied data for months t + 1 and t, providing estimates x^B_{t + 1} and x^B_t, respectively. This led to a final composite estimate for month t,

F_t = (1 − a) x^B_t + a P_t,

where a = 0.80 for the retail survey and 0.70 for the wholesale survey, and an analogous preliminary estimate for month t + 1. The third group was similarly canvassed a month later, and then the sequence was repeated. The difference between the final and preliminary composite estimates for month t, F_t − P_t, was called the revision in the estimate. In 1997 this rotating panel design was replaced by a complete sample overlap, due to problems of panel imbalance and differential response bias (early reporting bias) that led to undesirably large revisions in some months.
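The preliminary estimate, final estimate, and revision for one month can be traced with a short sketch that uses the retail coefficients quoted above (b = 0.75, a = 0.80). The sales figures and the prior preliminary estimate are made up, and the function names are hypothetical.

```python
def preliminary_estimate(x_a_t, x_a_prev, p_prev, b):
    """P_t = (1 - b) x^A_t + b P_{t-1} D_{t-1,t}, with D_{t-1,t} = x^A_t / x^A_{t-1}."""
    ratio = x_a_t / x_a_prev  # month-to-month ratio from rotation group A
    return (1 - b) * x_a_t + b * p_prev * ratio


def final_estimate(x_b_t, p_t, a):
    """F_t = (1 - a) x^B_t + a P_t."""
    return (1 - a) * x_b_t + a * p_t


# Illustrative inputs for the retail survey.
p_t = preliminary_estimate(x_a_t=105.0, x_a_prev=100.0, p_prev=98.0, b=0.75)
f_t = final_estimate(x_b_t=104.0, p_t=p_t, a=0.80)
revision = f_t - p_t
print(p_t, f_t, revision)
```

Because group B's report for month t enters only the final estimate, any systematic difference between group A's and group B's reports feeds directly into the revision.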

Different forms of composite estimators can be used to combine information from a survey and outside sources. In Statistics Canada’s Labour Force Survey, the households in all six rotation groups are interviewed each month, with a new group entering and an old one dropping out each month. In any month, an estimate of total is obtained from each of the six groups. A composite regression estimator uses information from the six group estimates, Y_t^AVG; current population controls, X_t^POP; and composite regression estimates of the labor force from the prior month, Z_{t − 1}^CR:

Y_t^CR = Y_t^AVG + [(X_t^POP, Z_{t − 1}^CR) − (X_t^AVG, Z_t^AVG)] b_t^CR,

where the superscript AVG denotes an estimate based on data from the current survey period, and b_t^CR is the estimated composite regression parameter for month t.
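The arithmetic of the regression form is a direct estimate plus a correction driven by the gap between the control values and their survey counterparts. The sketch below treats the coefficient vector as given, whereas in practice it would be estimated from the survey data. The vectors are flattened to plain lists, and every value is hypothetical.

```python
def regression_composite(y_avg, controls, survey_aux, coefficients):
    """Y^CR = Y^AVG + [(X^POP, Z^CR_{t-1}) - (X^AVG, Z^AVG)] b.

    controls:     population controls followed by last month's composite values.
    survey_aux:   current-survey estimates of the same quantities, in the same order.
    coefficients: regression coefficients, assumed to be already estimated.
    """
    if not (len(controls) == len(survey_aux) == len(coefficients)):
        raise ValueError("controls, survey_aux and coefficients must have equal length")
    adjustment = sum((c - s) * b
                     for c, s, b in zip(controls, survey_aux, coefficients))
    return y_avg + adjustment


# One population control and one prior-month composite value (made-up numbers).
estimate = regression_composite(
    y_avg=500.0,
    controls=[1000.0, 480.0],
    survey_aux=[990.0, 470.0],
    coefficients=[0.5, 0.6],
)
print(estimate)
```

When the survey estimates of the auxiliary quantities already equal the controls, the correction vanishes and the direct estimate is returned unchanged.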

The estimation procedure guarantees accordance with the population controls, while taking advantage of recent labor force data. Using a different approach, Statistics Netherlands combines responses from demographic surveys and administrative data from social registers through regression estimation and a method called “repeated weighting” in order to reduce the variances of the estimators and to maintain numerically consistent tables across all official publications.

Patrick J. Cantwell

See also Current Population Survey (CPS); Panel; Panel Conditioning; Raking; Response Bias; Rotating Panel Design; Variance Estimation


COMPREHENSION

Survey researchers, in developing questions, must bear in mind the respondent’s ability to correctly grasp the question and any response categories associated with the question. Comprehension, which is defined in this context as a respondent’s ability to accurately understand a question and associated response categories, is crucial to reliable measurement of attitudes and behaviors.

Scholars have identified a number of elements in question wording that can interfere with comprehension: ambiguous language, vague wording, complex sentence structures, and presuppositions about the experiences of the respondent. The consequences of comprehension problems can be severe. If respondents’ understanding of the question varies significantly from one respondent to another, the responses could provide a highly distorted picture of an attitude or behavior at the aggregate level.

Researchers have identified a number of techniques and guidelines to reduce the potential effects of question wording on comprehension:

1. Use clear, simple language in questions.

2. Use simple question structures, minimizing the number of clauses in a question.

3. Include a screening question if the survey is measuring attitudes or behaviors that might be unique to a specific group, and thereby skip all other respondents past the measures targeted to that group.

4. Provide definitions or examples in questions that may have terms that are ambiguous or vague.

5. Offer a frame of reference for terms that define a period of time (e.g., “in the past 7 days” as opposed to “recently”).

6. Train interviewers to recognize problems with comprehension, and provide the interviewers with a uniform set of definitions and probes to address the problems.

7. Pretest survey questions not only with survey interviews but also in qualitative settings such as focus groups or in-depth cognitive interviews, if resources permit.

Timothy Vercellotti

See also Cognitive Aspects of Survey Methodology (CASM); Cognitive Interviewing; Focus Groups; Pilot Test; Questionnaire Design; Reliability; Response Alternatives

