A parametric test developed by Rosner can be used to detect up to 10 outliers for sample sizes of 25 or more. This test assumes that the data are normally distributed; therefore, it is necessary to perform a test for normality before applying this test. If the data are not normally distributed, either transform the data, apply a different test, or consult a statistician. Note that the test assumes that the data without the outliers are normally distributed; therefore, the test for normality may be performed with the suspected outliers excluded. Directions for Rosner's test are contained in Box 4-19 and an example is contained in Box 4-20.
Rosner's test is not as easy to apply as the preceding tests. To apply Rosner's test, first determine an upper limit $r_0$ on the number of outliers ($r_0 \le 10$), then order the $r_0$ extreme values from most extreme to least extreme. Rosner's test statistic is then based on the sample mean and sample
Box 4-19: Directions for Rosner's Test for Outliers
STEP 1: Let X1, X2, . . . , Xn represent the ordered data points. By inspection, identify the maximum number of possible outliers, r0. Check that the data are normally distributed, using one of the methods of Section 4.2.
STEP 2: Compute the sample mean, $\bar{X}$, and the sample standard deviation, $s$, for all the data. Label these values $\bar{X}^{(0)}$ and $s^{(0)}$, respectively. Determine the observation farthest from $\bar{X}^{(0)}$ and label this observation $y^{(0)}$. Delete $y^{(0)}$ from the data and compute the sample mean, labeled $\bar{X}^{(1)}$, and the sample standard deviation, labeled $s^{(1)}$. Then determine the observation farthest from $\bar{X}^{(1)}$ and label this observation $y^{(1)}$. Delete $y^{(1)}$ and compute $\bar{X}^{(2)}$ and $s^{(2)}$. Continue this process until $r_0$ extreme values have been eliminated.
In summary, after the above process the analyst should have

$$[\bar{X}^{(0)}, s^{(0)}, y^{(0)}];\ [\bar{X}^{(1)}, s^{(1)}, y^{(1)}];\ \ldots;\ [\bar{X}^{(r_0-1)}, s^{(r_0-1)}, y^{(r_0-1)}]$$

where

$$\bar{X}^{(i)} = \frac{1}{n-i} \sum_{j=1}^{n-i} x_j, \qquad s^{(i)} = \left[ \frac{1}{n-i} \sum_{j=1}^{n-i} \left( x_j - \bar{X}^{(i)} \right)^2 \right]^{1/2},$$

and $y^{(i)}$ is the farthest value from $\bar{X}^{(i)}$.
(Note, the above formulas for $\bar{X}^{(i)}$ and $s^{(i)}$ assume that the data have been renumbered after each observation is deleted.)
STEP 3: To test if there are $r$ outliers in the data, compute

$$R_r = \frac{\left| y^{(r-1)} - \bar{X}^{(r-1)} \right|}{s^{(r-1)}}$$

and compare $R_r$ to $\lambda_r$ in Table A-5 of Appendix A. If $R_r \ge \lambda_r$, conclude that there are $r$ outliers.
First, test if there are $r_0$ outliers (compare $R_{r_0}$ to $\lambda_{r_0}$). If not, test if there are $r_0 - 1$ outliers (compare $R_{r_0-1}$ to $\lambda_{r_0-1}$). If not, test if there are $r_0 - 2$ outliers, and continue until either some number of outliers is identified or it is concluded that there are no outliers.
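The deletion procedure of Step 2 and the statistic of Step 3 can be sketched in Python as follows. This is a minimal illustration, not part of the guidance: the function name `rosner_sequence` is hypothetical, ties for the farthest observation are broken by taking the first one found, and the standard deviation is computed with Python's `statistics.stdev` (divisor one less than the number of remaining points), an assumption made because that divisor reproduces the values tabulated in Box 4-20. Swap in `statistics.pstdev` to divide by the number of remaining points instead. Critical values are not included because they come from Table A-5.

```python
from statistics import mean, stdev


def rosner_sequence(data, r0):
    """Return a list of (mean_i, s_i, y_i, R_{i+1}) for i = 0 .. r0 - 1.

    Hypothetical helper: each pass computes the mean and standard deviation
    of the remaining data, finds the observation farthest from the mean,
    records the ratio |y - mean| / s, and then deletes that observation.
    """
    remaining = list(data)
    if r0 < 1 or len(remaining) - r0 < 2:
        raise ValueError("r0 must be at least 1 and leave two or more points")
    rows = []
    for _ in range(r0):
        m = mean(remaining)
        s = stdev(remaining)  # assumption: divisor is (points remaining - 1)
        y = max(remaining, key=lambda x: abs(x - m))  # farthest from the mean
        rows.append((m, s, y, abs(y - m) / s))
        remaining.remove(y)  # delete one copy of y before the next pass
    return rows
```

Applied to the 32 values of Box 4-20 with r0 = 4, this sketch should reproduce the tabulated means, standard deviations, and extreme values to within rounding.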
EPA QA/G-9 Final
QA00 Version 4 - 31 July 2000
Box 4-20: An Example of Rosner's Test for Outliers
STEP 1: Consider the following 32 data points (in ppm) listed in order from smallest to largest: 2.07, 40.55, 84.15, 88.41, 98.84, 100.54, 115.37, 121.19, 122.08, 125.84, 129.47, 131.90, 149.06, 163.89, 166.77, 171.91, 178.23, 181.64, 185.47, 187.64, 193.73, 199.74, 209.43, 213.29, 223.14, 225.12, 232.72, 233.21, 239.97, 251.12, 275.36, and 395.67.
A normal probability plot of the data shows that there is no reason to suspect that the data (without the suspect outliers) are not normally distributed. In addition, this graph identified four potential outliers: 2.07, 40.55, 275.36, and 395.67. Therefore, Rosner's test will be applied to see if there are 4 or fewer (r0 = 4) outliers.
STEP 2: First the sample mean and sample standard deviation were computed for the entire data set ($\bar{X}^{(0)}$ and $s^{(0)}$). Using subtraction, it was found that 395.67 was the farthest data point from $\bar{X}^{(0)}$, so $y^{(0)}$ = 395.67. Then 395.67 was deleted from the data and the sample mean, $\bar{X}^{(1)}$, and the sample standard deviation, $s^{(1)}$, were computed. Using subtraction, it was found that 2.07 was the farthest value from $\bar{X}^{(1)}$, so $y^{(1)}$ = 2.07. This value was then dropped from the data and the process was repeated to yield $\bar{X}^{(2)}$, $s^{(2)}$, and $y^{(2)}$ = 40.55, and then $\bar{X}^{(3)}$, $s^{(3)}$, and $y^{(3)}$ = 275.36. These values are summarized below.

i    $\bar{X}^{(i)}$    $s^{(i)}$    $y^{(i)}$
0    169.923    75.133    395.67
1    162.640    63.872    2.07
2    167.993    57.460    40.55
3    172.387    53.099    275.36
STEP 3: To apply Rosner's test, it is first necessary to test if there are 4 outliers by computing

$$R_4 = \frac{\left| y^{(3)} - \bar{X}^{(3)} \right|}{s^{(3)}} = \frac{\left| 275.36 - 172.387 \right|}{53.099} = 1.939$$

and comparing $R_4$ to $\lambda_4$ in Table A-5 of Appendix A with n = 32. Since $R_4$ = 1.939 < $\lambda_4$ = 2.89, there are not 4 outliers in the data set. Therefore, it will next be tested if there are 3 outliers by computing

$$R_3 = \frac{\left| y^{(2)} - \bar{X}^{(2)} \right|}{s^{(2)}} = \frac{\left| 40.55 - 167.993 \right|}{57.460} = 2.218$$

and comparing $R_3$ to $\lambda_3$ in Table A-5 with n = 32. Since $R_3$ = 2.218 < $\lambda_3$ = 2.91, there are not 3 outliers in the data set. Therefore, it will next be tested if there are 2 outliers by computing

$$R_2 = \frac{\left| y^{(1)} - \bar{X}^{(1)} \right|}{s^{(1)}} = \frac{\left| 2.07 - 162.640 \right|}{63.872} = 2.514$$

and comparing $R_2$ to $\lambda_2$ in Table A-5 with n = 32. Since $R_2$ = 2.514 < $\lambda_2$ = 2.92, there are not 2 outliers in the data set. Therefore, it will next be tested if there is 1 outlier by computing

$$R_1 = \frac{\left| y^{(0)} - \bar{X}^{(0)} \right|}{s^{(0)}} = \frac{\left| 395.67 - 169.923 \right|}{75.133} = 3.005$$

and comparing $R_1$ to $\lambda_1$ in Table A-5 with n = 32. Since $R_1$ = 3.005 > $\lambda_1$ = 2.94, there is evidence at a 5% significance level that there is 1 outlier in the data set. Therefore, observation 395.67 is a statistical outlier and should be further investigated.
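As a quick arithmetic check on Step 3, the four ratios can be recomputed from the summary values tabulated in Step 2. The sketch below is illustrative only: the names `table`, `critical`, and `rosner_ratios` are hypothetical, the inputs are the rounded values printed in this box, and the critical values are the four quoted in the text for n = 32 rather than a reproduction of Table A-5.

```python
# Summary values from the Step 2 table: (mean, s, y) for i = 0, 1, 2, 3.
table = [
    (169.923, 75.133, 395.67),
    (162.640, 63.872, 2.07),
    (167.993, 57.460, 40.55),
    (172.387, 53.099, 275.36),
]

# Critical values quoted in Step 3 for n = 32 (lambda_1 through lambda_4).
critical = [2.94, 2.92, 2.91, 2.89]


def rosner_ratios(rows):
    """Return [R_1, R_2, ...], where R_r = |y - mean| / s from row r - 1."""
    return [abs(y - m) / s for m, s, y in rows]


ratios = rosner_ratios(table)
for r, (R, lam) in enumerate(zip(ratios, critical), start=1):
    verdict = "meets or exceeds" if R >= lam else "is below"
    print(f"R{r} = {R:.3f} {verdict} lambda{r} = {lam}")
```

Because the inputs are already rounded to three decimals, the recomputed ratios may differ from a full-precision calculation in the last digit.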
standard deviation computed without the r = r0 extreme values. If this test statistic is greater than the critical value given in Table A-5 of Appendix A, there are r0 outliers. Otherwise, the test is performed again without the r = r0 - 1 extreme values. This process is repeated until either Rosner's test statistic is greater than the critical value or r = 0.
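The backward stopping rule described above can be sketched as a short function. This is a sketch under stated assumptions: `count_outliers` is a hypothetical name, the caller supplies the ratios and the matching critical values looked up in Table A-5, and the comparison uses "greater than or equal to" as in Box 4-19, whereas the paragraph above says "greater than".

```python
def count_outliers(ratios, critical):
    """Return the number of outliers indicated by Rosner's stopping rule.

    ratios[r - 1] holds R_r and critical[r - 1] holds lambda_r for
    r = 1 .. r0. Testing starts at r = r0 and works down to r = 1;
    the first r whose ratio reaches its critical value is returned,
    and 0 is returned if none does.
    """
    if len(ratios) != len(critical):
        raise ValueError("ratios and critical must have the same length")
    for r in range(len(ratios), 0, -1):  # r = r0, r0 - 1, ..., 1
        if ratios[r - 1] >= critical[r - 1]:
            return r
    return 0
```

With the values from Box 4-20, `count_outliers([3.005, 2.514, 2.218, 1.939], [2.94, 2.92, 2.91, 2.89])` returns 1, matching the conclusion of the worked example.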