The effects of tax systems on income distribution

The aim of coding is to transform open-ended, textual information into categories that can be used in data analysis. In the business survey field, the commonly used coding classification is NACE Rev. 1, but in the UK this is replaced with the comparable Standard Industrial Classification 1992 (CSO, 1992).

A major use of coding in business surveys is on the business register. In the UK, businesses provide a description of their activity, which needs to be coded according to the Standard Industrial Classification. Some business surveys also ask for open-ended descriptions, for example of commodities, which need to be coded according to a product classification.

The accuracy of coding is heavily dependent on the skills of coders, so there is the potential for introducing both bias and variance during the coding process.

Coding has two stages:

• the development of a classification or coding frame. This coding frame is known as a nomenclature or dictionary and is accompanied by a set of coding instructions. Nomenclatures need to be frequently revised so that they represent the full range of possible categories;

• written or verbal responses to survey questions are coded into categories. This coding may be:

  - strictly manual, where the human coder looks up the codes in the dictionary;

  - computer assisted, where responses are available in electronic form or typed into a computer and some purpose-written software suggests a range of possible codes. The human coder either selects one of these codes or edits the verbal description and asks the computer to suggest further possible codes;

  - completely automated. In completely automated coding the survey responses are available in electronic form or entered into a computer and the computer software allocates the code.
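
The difference between completely automated and computer assisted coding can be illustrated with a minimal Python sketch. It is not the software used by any statistical office: the coding frame, its codes, and the function names `normalise`, `auto_code` and `suggest_codes` are hypothetical, and the fuzzy matching uses the standard library's `difflib` purely for illustration.

```python
import difflib

# Hypothetical coding frame (dictionary): description text -> code.
# The entries and codes are invented for illustration only.
CODING_FRAME = {
    "retail sale of bread": "R01",
    "manufacture of bread": "M01",
    "construction of buildings": "C01",
}


def normalise(text):
    """Lower-case and collapse whitespace so trivial differences do not block a match."""
    return " ".join(text.lower().split())


def auto_code(description):
    """Completely automated coding: return a code only on an exact dictionary match."""
    return CODING_FRAME.get(normalise(description))


def suggest_codes(description, limit=3, cutoff=0.6):
    """Computer assisted coding: propose (dictionary entry, code) candidates,
    best match first, for a human coder to accept or reject."""
    matches = difflib.get_close_matches(
        normalise(description), list(CODING_FRAME), n=limit, cutoff=cutoff
    )
    return [(entry, CODING_FRAME[entry]) for entry in matches]
```

In this sketch `auto_code("retail sale of breads")` returns `None` because the text does not match a dictionary entry exactly, whereas `suggest_codes` still offers the closest entries and leaves the decision to the coder.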

7.6.1 Measuring coding error

The impact of different coders on data quality can be assessed in terms of consistency (or reliability) and accuracy compared to a standard.

7.6.1.1 Consistency

A consistent coding system will give the same code for items in the same category. Computer automated systems are by definition completely consistent, since given the same description they will always allocate the same code.

Different human coders implement coding rules differently, whether consciously or subconsciously, so they may allocate different codes to the same description.

The consistency of coding systems can be measured by asking a set of different coders to code a common list of descriptions and calculating the proportion of all paired comparisons of codes where the coders agree (Kalton & Stowell, 1979).
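
As a rough illustration of this measure, the following Python sketch computes the proportion of agreeing paired comparisons. The function name `pairwise_agreement`, the coder labels and the codes are assumptions made for the example; the codes are arbitrary strings, not entries from a real classification.

```python
from itertools import combinations


def pairwise_agreement(codes_by_coder):
    """Proportion of all paired comparisons in which two coders gave the same code.

    codes_by_coder maps each coder to a list of codes, one per description,
    with every list in the same description order.
    """
    coders = list(codes_by_coder)
    n_items = len(codes_by_coder[coders[0]])
    if any(len(codes_by_coder[c]) != n_items for c in coders):
        raise ValueError("every coder must code the same list of descriptions")

    agreements = 0
    comparisons = 0
    for first, second in combinations(coders, 2):
        for i in range(n_items):
            comparisons += 1
            agreements += codes_by_coder[first][i] == codes_by_coder[second][i]
    return agreements / comparisons


# Three hypothetical coders, four common descriptions.
example = {
    "coder_a": ["R01", "M01", "C01", "S01"],
    "coder_b": ["R01", "M01", "C01", "S02"],
    "coder_c": ["R02", "M01", "C01", "S01"],
}
print(round(pairwise_agreement(example), 3))  # 8 agreements in 12 comparisons -> 0.667
```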

7.6.1.2 Accuracy

Although automated systems are completely consistent they have another less desirable feature: they may not allocate the best code to a description, that is, the code may not be an accurate one. Automated coding systems rely on the matching of text strings; if the matching is not exact then the assignment of codes may not be accurate. The accuracy of codes can be measured by comparing codes allocated by standard coders with those allocated by an expert coder, who is presumed to be infallible.
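
The comparison with an expert coder can be sketched in the same way. This is a minimal example under the assumption stated above, namely that the expert's codes are treated as correct; the function name `coding_accuracy` and the codes are hypothetical.

```python
def coding_accuracy(coder_codes, expert_codes):
    """Share of descriptions for which the coder's code equals the expert's code."""
    if len(coder_codes) != len(expert_codes):
        raise ValueError("both lists must cover the same descriptions")
    if not expert_codes:
        raise ValueError("at least one description is needed")
    matches = sum(c == e for c, e in zip(coder_codes, expert_codes))
    return matches / len(expert_codes)


# Hypothetical codes for five descriptions, in the same order.
expert = ["R01", "M01", "C01", "S01", "T01"]
coder = ["R01", "M01", "C02", "S01", "T01"]
print(coding_accuracy(coder, expert))  # 4 of 5 codes match -> 0.8
```

The same function applies to the output of an automated system, which shows how a completely consistent system can still score below 1 on accuracy.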

7.6.1.3 The impact of coder error on the variance of survey estimates

In manual coding and computer assisted coding, different coders may allocate different codes to the same description. In particular, each individual coder may unconsciously over-allocate businesses to some codes and under-allocate them to others. This is known as correlated coder error. The errors in the codes allocated by a particular coder may lead to bias in the estimate of the proportion of businesses in a given industry group among the businesses coded by that coder. However, since for many surveys coding is shared over a number of coders, if the errors made by coders are different the impact of these individual biases on the final survey estimates may cancel out. In this case, although the final survey estimates may not be biased, the variance of the estimates will be increased. The overall bias is reduced as the number of different coders increases, so in some surveys the code list is provided with or as part of the questionnaire, so that each respondent codes their own answer. This minimises correlated coder error at the expense of a potential increase in measurement error (see chapter 6).
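
The effect of sharing the coding over more coders can be illustrated with a small simulation. This is a deliberately simplified sketch, not a model taken from the report: it assumes each coder has a personal tendency, drawn from a normal distribution centred on zero, that shifts the probability of allocating a business to a given industry group, and that workloads are equal. All parameter values and the function name `simulate_estimates` are invented for the example.

```python
import random
import statistics


def simulate_estimates(n_coders, n_businesses=1200, true_share=0.30,
                       coder_sd=0.05, n_reps=400, seed=1):
    """Simulate the estimated share of businesses coded to one industry group.

    Each replicate draws a fresh tendency for every coder (assumption: normal,
    mean 0, standard deviation coder_sd) and splits the workload equally.
    """
    rng = random.Random(seed)
    per_coder = n_businesses // n_coders
    estimates = []
    for _ in range(n_reps):
        coded_in_group = 0
        for _ in range(n_coders):
            # This coder over- or under-allocates businesses to the group.
            p = min(1.0, max(0.0, true_share + rng.gauss(0.0, coder_sd)))
            coded_in_group += sum(rng.random() < p for _ in range(per_coder))
        estimates.append(coded_in_group / (per_coder * n_coders))
    return estimates


few = simulate_estimates(n_coders=2)
many = simulate_estimates(n_coders=24)
var_few = statistics.variance(few)
var_many = statistics.variance(many)
print(f"mean estimate with 2 coders:  {statistics.mean(few):.3f}")
print(f"mean estimate with 24 coders: {statistics.mean(many):.3f}")
print(f"variance with 2 coders:  {var_few:.6f}")
print(f"variance with 24 coders: {var_many:.6f}")
```

Under these assumptions both sets of estimates centre on the true share, but the estimates based on two coders vary more from replicate to replicate than those based on twenty-four, because individual tendencies average out less.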

7.6.1.4 The risk of coder error introducing bias in survey estimates

Bias will be introduced into survey estimates if at least some coders systematically assign incorrect codes to certain descriptions. One scenario where this may occur is in computer assisted coding, where the computer suggests a preferred code which the coder may accept or reject. If there is a tendency for coders to accept the suggested code even when it is incorrect, then the coding error may introduce bias into the survey estimates (Bushnell, 1996).

7.6.2 Minimising coding error

The impact of coder error on data quality can be minimised by:

• using well designed, up-to-date coding systems;

• in manual and computer-assisted coding systems, supervising coders and checking the quality of their coding regularly. In some cases coders may be unsure which code to allocate, and these queries will need to be referred to supervisors, and in some cases researchers, for reconciliation;

• coding information more than once using different coders, as some surveys (or more localised experiments) do, and comparing the resulting classifications to help resolve cases where there is some doubt as to the true code.
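
The comparison of two independent codings can be sketched as follows. The function name `flag_disagreements`, the business identifiers and the codes are hypothetical; the sketch only lists the cases that would need to be referred for reconciliation and does not decide which code is correct.

```python
def flag_disagreements(first_pass, second_pass):
    """Return, in sorted order, the identifiers coded differently in the two passes.

    Each argument maps a business identifier to the code allocated by one coder.
    Only businesses coded in both passes are compared.
    """
    shared = first_pass.keys() & second_pass.keys()
    return sorted(unit for unit in shared if first_pass[unit] != second_pass[unit])


# Two hypothetical coders working independently on the same three businesses.
first = {"b1": "R01", "b2": "M01", "b3": "C01"}
second = {"b1": "R01", "b2": "R01", "b3": "C01"}
print(flag_disagreements(first, second))  # ['b2']
```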

Useful references on coder error include Lyberg & Kasprzyk (1997).

In document La política fiscal para el afianzamiento de las democracias en América Latina (pages 84-87)