3.1 Introduction

The theory underlying many statistical tests assumes that the variable or variables sampled are continuous. In practice, there is no such thing as a continuous variable, and it is often expedient to regard observations as having been rounded from an underlying continuous distribution. To date, there has been very little research into the effect of rounding on statistical tests. This chapter investigates the performance of test statistics under rounding. We are particularly interested in the degree of precision (r) to which a set of data should be recorded before a statistical test is applied, since there is considerable vagueness about what level of precision should be used. Most statisticians know, for example, that tests of means tend to be robust to departures from normality and that chi-squared and F tests of variance are not. However, they know little about what happens to the significance level and power when the data have been rounded. This lack of quantitative knowledge reflects the absence of a careful, accurate study of the effect of rounding on statistical tests, which is primarily due to the following:

(a) the problem of determining the exact distribution of the test statistic for rounded data;

(b) the lack of accuracy of the mathematical approximations that have been studied;

(c) the exorbitant amount of computer time that Monte Carlo studies required in the past to achieve respectable precision in the results.

Because large computers are now available, studies by Monte Carlo methods offer an excellent approach to investigating the effect of rounding on test statistics. However, to date no one has published a study of this type.

Several authors have considered the problem of rounding and test statistics. Student (1908) gives a most interesting discussion of the possible problems that coarse rounding poses for statistical procedures. Student's experimental results suggest that the distribution of the single t statistic will be approximately the same for rounded and unrounded data if the sample size is large. Although Student made no detailed study of the performance of the t statistic under rounding, he was the first to point out the possible implications of rounding. Fisher (1936) advocated that Sheppard's corrections should be used for the purpose of estimation, but not usually for tests of significance. Eisenhart (1947) pointed out that use of a Sheppard's correction can make the t value imaginary, as the corrected estimate of the variance can be negative. Geddeback (1968) advocated that Sheppard's corrections should be avoided in the analysis of variance. Krutchoff (1967) states, "There is no such thing, in practice, as a continuous random variable. It is often expedient for us to consider observations as being rounded from an underlying continuous random variable." He illustrates this point by showing how rounding can give the F statistic a non-zero probability of a zero in the denominator, so that the mean of this statistic does not exist.
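The two difficulties noted by Eisenhart and Krutchoff can be illustrated with a short sketch. It assumes that w denotes the width of the rounding interval, that Sheppard's correction for the variance takes its usual form of subtracting w²/12 from the sample variance, and that the parent distribution is normal with mean 0. The function names, sample values, and parameter choices are hypothetical and serve only as illustration.

```python
import random
import statistics


def round_to_grid(x, w):
    """Round x to the nearest multiple of the rounding interval w."""
    return w * round(x / w)


def sheppard_corrected_variance(sample, w):
    """Sample variance minus Sheppard's correction w**2 / 12."""
    return statistics.variance(sample) - w ** 2 / 12


def prob_zero_variance(n, w, sigma=1.0, trials=20000, seed=1):
    """Monte Carlo estimate of P(all n rounded normal observations coincide)."""
    rng = random.Random(seed)
    zero = 0
    for _ in range(trials):
        # Compare integer grid indices so the equality check is exact.
        indices = {round(rng.gauss(0.0, sigma) / w) for _ in range(n)}
        if len(indices) == 1:
            zero += 1
    return zero / trials


# Made-up values that all fall in the same rounding interval when w = 1.
coarse = [round_to_grid(x, 1.0) for x in (4.8, 5.1, 5.3, 4.9)]
print(coarse)
# The sample variance is zero, so the corrected variance is negative and
# a t statistic built on its square root would be imaginary.
print(sheppard_corrected_variance(coarse, 1.0))
# Very coarse rounding (w = 4 sigma) with n = 3: zero variance is common.
print(prob_zero_variance(n=3, w=4.0))
```

When every rounded observation coincides, the sample variance is exactly zero, which is also the event that places a zero in the denominator of a variance-ratio statistic.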

Eisenhart (1947) was the first to study in any detail how rounding affects statistical tests. He gave a set of rules for judging the suitability of a particular coarseness of rounding when applying the one-sample t-test, the chi-squared test for a variance and the F-test for equality of two variances; the rules address how large the sample size n needs to be for a given w. [Details in literature review]. His study has the following limitations. Eisenhart's recommendations were based on the probability that a sample variance obtained from the rounded data is zero. This gives no indication of the performance of the test statistics with respect to level of

samples as large as n equal to 7. In Preece (1982), textbook examples of the paired t-test are examined with respect to the degree of precision of data recording. From these examples, he concludes that, for coarse rounding, the value obtained for a paired t-statistic depends crucially both on the rounding interval applied and on the position of the rounding grid relative to the origin. As Preece points out, final conclusions cannot be drawn from several examples, and further work on the effect of rounding on test statistics is called for. Riley, Bekele and Shrewsbury (1983) adopt a similar approach to Preece in investigating the possible effect of rounding on test statistics. To examine the effect that different degrees of precision have on the analysis of variance, they present several examples in which the data were initially recorded to a good degree of precision. For each example the analysis of variance is obtained for various degrees of rounding. From this small set of examples they make some general points about the effect of rounding on the mean squares. The main finding can be summarised as follows. As rounding became more and more severe, the mean squares began to behave very erratically. However, data could be rounded appreciably before the loss of information became significant. With respect to the various recommendations on the degree of precision to use for rounded data, they concluded that Dyke's rule (Dyke, 1974, pp. 163-164) gave a safe degree of precision for every set of data they examined.
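Preece's observation about the position of the rounding grid can be reproduced in miniature. The sketch below uses made-up measurements (not data from Preece or any other cited study) and a hypothetical helper, `round_on_grid`, which rounds to a grid of width w shifted by a chosen offset from the origin.

```python
import math
import statistics


def round_on_grid(x, w, offset=0.0):
    """Round x to the nearest point of the grid {offset + k*w : k an integer}."""
    return offset + w * round((x - offset) / w)


def paired_t(xs, ys):
    """Paired t statistic for the differences ys[i] - xs[i]."""
    d = [y - x for x, y in zip(xs, ys)]
    return statistics.mean(d) / (statistics.stdev(d) / math.sqrt(len(d)))


# Made-up illustrative measurements (not data from any cited study).
before = [1.2, 2.6, 3.4, 4.7, 5.3]
after = [1.6, 2.9, 4.1, 4.9, 6.1]

# Same rounding interval (w = 1), two positions of the rounding grid.
t_by_offset = {}
for offset in (0.0, 0.5):
    xs = [round_on_grid(x, 1.0, offset) for x in before]
    ys = [round_on_grid(y, 1.0, offset) for y in after]
    t_by_offset[offset] = paired_t(xs, ys)

print("unrounded:", paired_t(before, after))
print("rounded, by grid offset:", t_by_offset)
```

With these particular numbers the paired t-statistic changes noticeably when the grid is shifted by half an interval, even though the rounding interval itself is unchanged. As with the examples discussed above, a single data set of this kind supports no general conclusion.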

The investigations by Preece (1982) and Riley, Bekele and Shrewsbury (1983) have a major limitation: they looked only at the effect of rounding on specific examples. The actual distribution of the test statistic for rounded data was not obtained, so no general conclusions could be established about the performance of a test statistic for rounded data. A study involving the probability distribution of the test statistic under rounding would enable significance level and power to be considered. Such a study would be of value in showing how robust specific test statistics are for rounded data. A problem in producing such a study, however, is the very large amount of computer time required. One must either find this time or reduce the amount of time required without decreasing the quality of the study. The study undertaken in this chapter does both: purpose-written programs reduce the required computer time to "only a large amount", and the Polytechnic computer service supported the work by providing low-priority computer use over a long period of time.

The objective of the present extensive study is to quantify precisely the significance level and power of statistical tests on rounded data over many distributions. The study has two parts, according to whether the parent population is normal or non-normal. Chapters 3 and 4 consider, respectively, the significance level and the power of these tests when the data come from a rounded normal distribution. Chapter 5 deals with both significance level and power for a selection of non-normal rounded distributions.
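As a rough indication of what such a study involves, the following is a minimal Monte Carlo sketch, not the purpose-written programs described above. It estimates the actual significance level of a nominal 5% two-sided one-sample t-test when standard normal data are rounded to multiples of w. The sample size n = 10, the rounding grid centred on the true mean, the treatment of zero-variance samples as non-rejections, and all function names are assumptions made only for illustration.

```python
import math
import random

# Two-sided 5% point of Student's t with 9 degrees of freedom (valid for n = 10 only).
T_CRIT_9DF = 2.262


def rounded_normal_sample(rng, n, w):
    """Draw n standard normal values and round them to multiples of w (w = 0: no rounding)."""
    values = [rng.gauss(0.0, 1.0) for _ in range(n)]
    if w == 0:
        return values
    return [w * round(v / w) for v in values]


def significance_level(w, n=10, trials=20000, seed=1):
    """Estimate the actual size of the nominal 5% one-sample t-test of mean = 0.

    Returns (rejection rate, rate at which t is undefined because the rounded
    sample has zero variance). Undefined cases are counted as non-rejections,
    which is an assumption of this sketch.
    """
    rng = random.Random(seed)
    rejected = undefined = 0
    for _ in range(trials):
        sample = rounded_normal_sample(rng, n, w)
        if len(set(sample)) == 1:
            undefined += 1
            continue
        mean = sum(sample) / n
        var = sum((x - mean) ** 2 for x in sample) / (n - 1)
        t = mean / math.sqrt(var / n)
        if abs(t) > T_CRIT_9DF:
            rejected += 1
    return rejected / trials, undefined / trials


# Rounding intervals expressed in units of the population standard deviation.
results = {w: significance_level(w) for w in (0.0, 0.5, 1.0, 2.0)}
for w, (size, undef) in results.items():
    print(f"w = {w}: estimated size = {size:.4f}, undefined t = {undef:.4f}")
```

Estimating power in the same way would only require generating the data under a shifted mean. The precision of each estimate is governed by the number of trials, which is why the computer time grows quickly once many distributions, sample sizes and rounding intervals are considered.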