This chapter is concerned with the effect of rounded normal data on the significance level of a test. Many statistical tests could have been investigated. It was decided to investigate test statistics that are frequently used in practice, namely the one-sample t-test, the chi-squared test for variance, the two-sample t-test, the F-test for equality of variances and the F-test in the one- and two-way analysis of variance. Choosing such a selection of tests allowed a wide coverage of the
The main distortion caused by rounding is the discreteness it introduces into the sampling distribution of the test statistic. Although the moments may not be greatly affected, the area in the tails of the sampling distribution may be changed. Examining the moments of the test statistic under rounding indicates the possible effect of rounding. However, a detailed examination of the possible changes in the tails requires the exact sampling distribution of the test statistic under rounding.
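To make the discreteness concrete, the following sketch takes the simplest case: rounding interval w = 1 and lattice position c = 0, so every rounded observation is an integer, and the chi-squared statistic for variance with n = 3 and σ = 1. The enumeration window (-3 to 3) and the name `chi2_stat` are illustrative choices, not taken from the study.

```python
import itertools
from fractions import Fraction

def chi2_stat(sample, sigma2=1):
    """(n - 1) s^2 / sigma^2, computed exactly with Fractions."""
    xs = [Fraction(v) for v in sample]
    mean = sum(xs) / len(xs)
    return sum((v - mean) ** 2 for v in xs) / sigma2

# With w = 1 and c = 0 the rounded data are integers; enumerate every
# integer triple in a small window around the population mean of zero.
values = sorted({chi2_stat(t) for t in itertools.product(range(-3, 4), repeat=3)})

# Every attainable value is a multiple of 2/3, so the statistic moves in steps.
print([str(v) for v in values[:6]])  # ['0', '2/3', '2', '8/3', '14/3', '6']

# The upper 5% point of chi-square(2) is about 5.991; on this lattice the
# event "statistic > 5.991" is the same event as "statistic >= 6".
print(min(v for v in values if v > 5.991))  # 6
```

Because the tail event is defined by a step function of the data, small changes in w or c can move whole lattice values across a critical point, which is exactly what alters the tail areas.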
The following approaches were used to examine the effect of rounding on the significance level of a test:
(i) Approximations to the sampling moments of the test statistics. These theoretical results have some bearing on the distribution of the test statistics when sampling from rounded normal populations. However, they provide only a rough outline of the characteristics to be expected; they do not quantify the effect of rounding on the significance level of a test. For that, the exact sampling distribution of the test statistic is required.
(ii) The exact distribution of the test statistic for rounded data was obtained by enumerating all possible sample configurations. This method was used for small sample sizes, but it became uneconomical for large samples and for the analysis of variance.
(iii) The sampling distribution of the test statistic for rounded data was obtained by Monte Carlo methods. Simulation was used where obtaining the exact distribution as in (ii) was impractical in terms of computer time.
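As an example of the kind of moment approximation listed under (i), Sheppard's correction states that rounding with interval w adds approximately w²/12 to the variance. The sketch below is a swapped-in illustration, not necessarily the approximation used in this study: it computes the exact mean and variance of a single rounded normal variable so the approximation can be checked. It assumes lattice points at (k + c)w and ignores lattice points more than `span` standard deviations from μ; the name `rounded_normal_moments` is hypothetical.

```python
import math

def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def rounded_normal_moments(mu, sigma, w, c=0.0, span=10.0):
    """Exact mean and variance of X ~ N(mu, sigma^2) rounded to lattice (k + c) * w.

    Assumed convention: X is rounded to x whenever x - w/2 <= X < x + w/2.
    """
    k_lo = math.floor((mu - span * sigma) / w - c)
    k_hi = math.ceil((mu + span * sigma) / w - c)
    m1 = m2 = 0.0
    for k in range(k_lo, k_hi + 1):
        x = (k + c) * w
        # Probability that X falls in the rounding cell of lattice point x
        p = normal_cdf((x + w / 2 - mu) / sigma) - normal_cdf((x - w / 2 - mu) / sigma)
        m1 += p * x
        m2 += p * x * x
    return m1, m2 - m1 * m1

mean, var = rounded_normal_moments(0.0, 1.0, w=0.5)
print(mean, var, 1.0 + 0.5 ** 2 / 12.0)  # exact variance vs Sheppard's sigma^2 + w^2/12
```

For w = 0.5 and σ = 1 the exact variance agrees with σ² + w²/12 to many decimal places, which illustrates why moments alone say little about the tail areas that determine the significance level.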
Two Fortran programs were written for the analysis. The program EXACT generated every possible sample of size n from a normal population rounded according to a lattice with rounding interval w and lattice position c. For each sample the required test statistic was calculated, and the percentage of samples in which the statistic fell above or below the α significance level limits for normal theory conditions was recorded.
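The EXACT approach can be sketched in Python for the smallest convenient case, the chi-squared test for variance with n = 3, where the χ² distribution with 2 degrees of freedom has closed-form percentage points (upper point −2 ln α, lower point −2 ln(1 − α)). This is not the original Fortran program: the function name, the lattice convention (k + c)w, the N(0, 1) population and the truncation of the lattice at ±`span` are assumptions made for illustration. Each sample configuration is weighted by its probability, the product of the normal probabilities of its rounding cells.

```python
import itertools
import math

def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def exact_chi2_levels(w, c, alpha, span=6.0):
    """Exact lower/upper tail areas of the n = 3 chi-squared variance test under rounding.

    Assumes X ~ N(0, 1) rounded to lattice points (k + c) * w; lattice points
    beyond +/- span carry negligible probability and are dropped.
    """
    upper = -2.0 * math.log(alpha)        # upper alpha point of chi-square(2)
    lower = -2.0 * math.log(1.0 - alpha)  # lower alpha point of chi-square(2)
    ks = range(math.floor(-span / w - c), math.ceil(span / w - c) + 1)
    # Each lattice point x carries the probability of its cell [x - w/2, x + w/2).
    cells = [((k + c) * w,
              normal_cdf((k + c) * w + w / 2) - normal_cdf((k + c) * w - w / 2))
             for k in ks]
    p_lower = p_upper = 0.0
    for config in itertools.product(cells, repeat=3):
        xs = [x for x, _ in config]
        prob = math.prod(p for _, p in config)
        mean = sum(xs) / 3
        stat = sum((x - mean) ** 2 for x in xs)  # (n - 1) s^2 / sigma^2 with sigma = 1
        if stat < lower:
            p_lower += prob
        elif stat > upper:
            p_upper += prob
    return p_lower, p_upper

print(exact_chi2_levels(w=1.0, c=0.0, alpha=0.05))
```

The number of configurations grows as (number of lattice points)ⁿ, which is consistent with the text's remark that full enumeration becomes uneconomical for large samples.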
The program SIMUL generated N random samples of size n from a normal population rounded with the specified w and c. As in EXACT, the required statistic was calculated from each rounded sample, and the percentage of samples in which it fell above or below the α significance level limits for normal theory conditions was recorded. Both EXACT and SIMUL also gave the mean and variance of the test statistic for rounded data.
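A SIMUL-style Monte Carlo estimate can be sketched in the same way. The function below draws `iterations` samples of size n from N(0, 1), rounds them (again assuming lattice points (k + c)w), and records the proportions of chi-squared statistics below and above supplied critical values, together with the mean and variance of the statistic. Passing `w=None` skips rounding. Critical values are passed in so that any n can be used; for n = 3 they have the closed forms used in the usage line. Names, seed and defaults are illustrative assumptions.

```python
import math
import random

def round_to_lattice(x, w, c):
    # Assumed convention: nearest lattice point (k + c) * w, k an integer.
    return w * (math.floor(x / w - c + 0.5) + c)

def simulate_chi2_levels(n, w, c, lower_crit, upper_crit, iterations=100_000, seed=1):
    """Monte Carlo tail proportions, mean and variance of (n - 1) s^2 / sigma^2.

    Samples come from N(0, 1); w=None means the data are not rounded.
    """
    rng = random.Random(seed)
    below = above = 0
    total = total_sq = 0.0
    for _ in range(iterations):
        xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if w is not None:
            xs = [round_to_lattice(x, w, c) for x in xs]
        mean = sum(xs) / n
        stat = sum((x - mean) ** 2 for x in xs)  # sigma = 1
        if stat < lower_crit:
            below += 1
        elif stat > upper_crit:
            above += 1
        total += stat
        total_sq += stat * stat
    m = total / iterations
    var = total_sq / iterations - m * m  # variance with divisor `iterations`
    return below / iterations, above / iterations, m, var

alpha = 0.05
crit = (-2.0 * math.log(1.0 - alpha), -2.0 * math.log(alpha))  # chi-square(2) points, n = 3
print(simulate_chi2_levels(3, 1.0, 0.0, *crit))
```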
For this study the significance level of the test statistic under rounding was evaluated at the values corresponding to the lower and upper 0.1%, 1.0%, 2.5% and 5% points under normal theory conditions, with no rounding. This range of significance levels covers one-tailed tests at α = 0.001, 0.01 and 0.05 and two-tailed tests at α = 0.05. Sample sizes from 2 to 25 were considered for the one- and two-sample t-tests, the chi-squared test and the F-test. As the sample size increased, the discreteness that rounding introduces into the sampling distribution of the test statistic had less effect. As a result, a sample size of 25 was found in most situations to give a good indication of the effect of rounding for larger samples.
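For the chi-squared test for variance with n = 3 (2 degrees of freedom) the normal-theory percentage points have a closed form, because the χ² distribution with 2 degrees of freedom has distribution function 1 − exp(−x/2). The sketch below lists the lower and upper points for the four tail areas used here; other tests and sample sizes would need tabulated values or a statistics library. The function name is illustrative.

```python
import math

TAIL_AREAS = (0.001, 0.01, 0.025, 0.05)

def chi2_df2_points(p):
    """Lower and upper 100p% points of chi-square with 2 degrees of freedom."""
    return -2.0 * math.log(1.0 - p), -2.0 * math.log(p)

for p in TAIL_AREAS:
    lower, upper = chi2_df2_points(p)
    print(f"{100 * p:4.1f}%  lower {lower:.4f}  upper {upper:.4f}")

# One-tailed tests use a single point (e.g. the upper 5% point for alpha = 0.05).
# The two-tailed test at alpha = 0.05 rejects below the lower 2.5% point or above
# the upper 2.5% point, so its attained level is the sum of those two tail areas.
```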
For the one- and two-way analysis of variance, various numbers of levels of the factor(s) were considered. The degree of precision r ranged up to 2, and lattice positions c = -0.5, -0.4, ..., 0.4, 0.5 were used. A value of r beyond 2 represents extremely coarse rounding and is impractical in most situations.
The simulation results were based on 100,000 iterations; that is, 100,000 values of each test statistic were generated to estimate each significance level under rounding. This number of iterations was necessary for respectable precision, especially at the 0.1% level of significance. The significance levels obtained by simulation are, of course, subject to sampling error, but with 100,000 iterations this error is small. For example, by simple binomial calculations, √(α(1 − α)/N) with N = 100,000, the standard error of the estimated significance level is 6.89 × 10⁻⁴ for α = 0.05 and 3.15 × 10⁻⁴ for α = 0.01.
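The quoted standard errors follow from the binomial formula √(α(1 − α)/N) with N = 100,000. The short sketch below reproduces them and also shows why the 0.1% level is the demanding one: there the standard error is about 10% of α itself. The function name is illustrative.

```python
import math

def binomial_se(alpha, iterations):
    """Standard error of a proportion alpha estimated from independent iterations."""
    return math.sqrt(alpha * (1.0 - alpha) / iterations)

N = 100_000
for alpha in (0.05, 0.01, 0.001):
    se = binomial_se(alpha, N)
    print(f"alpha = {alpha}: SE = {se:.2e}, SE/alpha = {se / alpha:.3f}")
```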
Quality of Results
Both the EXACT and SIMUL programs were tested to check the validity of their results. An independent check on SIMUL was provided by obtaining the significance levels of the test statistics when the normal population was not rounded; these were in very close agreement with the expected (nominal) levels. An independent check on EXACT was provided by comparing its results with those obtained manually, which was possible only for small sample sizes (n = 2, 3). A final check was made by comparing the significance levels obtained from EXACT and SIMUL, which were also in very close agreement.
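The no-rounding check on SIMUL can be made precise by asking whether a simulated level lies within a few binomial standard errors of the nominal α. The sketch below applies that criterion to unrounded samples of size 3 and the upper 5% point of χ² with 2 degrees of freedom. The three-standard-error threshold, the seed and the name `agrees` are illustrative assumptions, not the criterion used in the study; the same function could compare a SIMUL estimate with an EXACT value.

```python
import math
import random

def agrees(estimate, reference, iterations, k=3.0):
    """True if a simulated proportion lies within k binomial standard errors of reference."""
    se = math.sqrt(reference * (1.0 - reference) / iterations)
    return abs(estimate - reference) <= k * se

# No-rounding check: N(0, 1) samples of size 3, chi-squared statistic with
# 2 degrees of freedom compared with its upper 5% point, -2 ln(0.05).
rng = random.Random(2024)
N, alpha = 100_000, 0.05
upper = -2.0 * math.log(alpha)
hits = 0
for _ in range(N):
    xs = [rng.gauss(0.0, 1.0) for _ in range(3)]
    m = sum(xs) / 3
    if sum((x - m) ** 2 for x in xs) > upper:
        hits += 1
print(hits / N, agrees(hits / N, alpha, N))
```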