
A popular way of analyzing intermediate-size data sets visually is by rank probability statistics. Since all test results represent individual random probabilities, they follow the rules of order statistics.

When test results are ordered by rank based on, e.g., toughness, they can be assigned rank probabilities, which describe the cumulative probability distribution. The rank probability estimates are not measured values but estimates of the cumulative probability based on order statistics. Each data point corresponds to a certain cumulative failure probability with a certain confidence. This can be expressed mathematically using the binomial distribution. The probability distribution of individual rank probability estimates is given by Equation (1).

$$P_{\mathrm{conf}} = 1 - \sum_{j=1}^{i} \frac{n!}{(j-1)!\,(n-j+1)!}\,P_{\mathrm{rank}}^{\,j-1}\,\left(1-P_{\mathrm{rank}}\right)^{n-j+1} \qquad (1)$$

Here P_conf is the probability that the rank estimate corresponds to the cumulative probability P_rank, i.e. that the true cumulative probability of the i-th ranked result does not exceed P_rank. n is the number of data points, i is the rank number and j is the summation index. Equation (1) can be used to calculate, e.g., confidence bounds for the rank estimates. The estimation requires solving Equation (1) for P_rank at a specified P_conf, which makes it somewhat cumbersome. Figure 5 shows an example of rank probability estimates based on Equation (1) for a data set of 10 values, with the estimates corresponding to the 5%, 50% and 95% confidence levels plotted. The figure clearly shows the degree of uncertainty in the rank probability for small data sets. Because Equation (1) is somewhat inconvenient to use, simple approximations of the median (P_conf = 0.5) or the mean rank probability estimate are usually preferred. The most accurate simple analytical median rank estimate has the form given in Equation (2).

[Figure 5 plot: P_rank (0 to 1) versus rank number i for n = 10, with curves for P_conf = 5 %, 50 % and 95 %.]

Figure 5. Example of rank probability estimates based on Equation (1).
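The curves of Figure 5 can be reproduced numerically. The following minimal Python sketch is an illustration only. It evaluates Equation (1) with `math.comb` and solves it for P_rank at a given P_conf by bisection. Bisection works here because P_conf increases monotonically with P_rank. The function names `p_conf` and `rank_probability` are hypothetical.

```python
from math import comb


def p_conf(p_rank: float, i: int, n: int) -> float:
    """Equation (1): probability that the true cumulative probability of the
    i-th ranked of n results does not exceed p_rank."""
    return 1.0 - sum(
        comb(n, j - 1) * p_rank ** (j - 1) * (1.0 - p_rank) ** (n - j + 1)
        for j in range(1, i + 1)
    )


def rank_probability(i: int, n: int, conf: float, tol: float = 1e-12) -> float:
    """Solve Equation (1) for P_rank at a given P_conf by bisection on [0, 1]."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if p_conf(mid, i, n) < conf:
            lo = mid  # confidence too low: the root lies above mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    n = 10
    for i in range(1, n + 1):
        # 5 %, 50 % and 95 % rank probability estimates, as in Figure 5
        bounds = [rank_probability(i, n, c) for c in (0.05, 0.5, 0.95)]
        print(i, " ".join(f"{p:.4f}" for p in bounds))
```

For i = 1 the sum reduces to P_conf = 1 − (1 − P_rank)^n, so the median is 1 − 0.5^(1/n). This gives a convenient check of the solver.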

$$P_{\mathrm{rank}} = \frac{i - 0.3}{n + 0.4} \qquad (2)$$

The median rank corresponds to P_conf = 0.5 in Equation (1) and is therefore well suited to describe the median estimate combined with confidence bands. Other rank probability estimates, such as the mean rank estimate, do not correspond to a specific constant P_conf value.
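The difference between the median and mean rank estimates can be checked by substituting each estimate back into Equation (1). This Python sketch assumes the common mean rank estimate i/(n + 1), which the text does not state explicitly. All function names are illustrative.

```python
from math import comb


def p_conf(p_rank: float, i: int, n: int) -> float:
    """Equation (1): confidence level attached to P_rank for rank i of n."""
    return 1.0 - sum(
        comb(n, j - 1) * p_rank ** (j - 1) * (1.0 - p_rank) ** (n - j + 1)
        for j in range(1, i + 1)
    )


def median_rank(i: int, n: int) -> float:
    """Equation (2): simple approximation of the median rank estimate."""
    return (i - 0.3) / (n + 0.4)


def mean_rank(i: int, n: int) -> float:
    """Assumed mean rank estimate i/(n + 1), included for comparison only."""
    return i / (n + 1)


n = 10
conf_median = [p_conf(median_rank(i, n), i, n) for i in range(1, n + 1)]
conf_mean = [p_conf(mean_rank(i, n), i, n) for i in range(1, n + 1)]
for i, (cm, ca) in enumerate(zip(conf_median, conf_mean), start=1):
    print(f"i={i:2d}  P_conf(median)={cm:.3f}  P_conf(mean)={ca:.3f}")
```

For n = 10, the Equation (2) values stay close to P_conf = 0.5 for every rank. The mean rank drifts from about 0.61 at i = 1 to about 0.39 at i = 10.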

3.1 Censored rank probability

Equation (2) can be used as such only for data sets where all results correspond to failure. It can also be used for data sets where all values above a certain value have been censored, e.g. due to non-failure or exceeding the measuring capacity limit. In this case, however, the data set size, n, must refer to the total data set including the censored data.

If the data set contains uncensored failure results at higher values than the lowest censored value, a method of random censoring (often called the suspended items concept) is needed. In this case the order number, i, used in the rank estimation no longer remains an integer. The increment in the order number after censoring, Δi_j, has the form of Equation (3a). Here j is the position of the result in the combined, rank-ordered data set (failures and censored values), and i_{j−1} is the order number of the preceding result. This increment is applied to all failures following a censored value until the next censored value is reached, at which point a new increment is calculated. This rank probability estimate for censored data sets refers only to the median rank estimate of Equation (2) and should not be used with other expressions.

$$\Delta i_j = \frac{n + 1 - i_{j-1}}{n + 2 - j} \qquad (3a)$$

Instead of calculating the increment, the censoring can also be expressed directly in terms of the effective order number, as in Equation (3b). The censoring parameter δ_j is zero for censored data and one for uncensored data. Although Equation (3b) is applied to all values, only the uncensored values may be used in the resulting analysis.

$$i_j = \frac{(n + 1 - j)\,i_{j-1} + (n + 1)\,\delta_j}{n + 1 - j + \delta_j} \qquad (3b)$$
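Equations (3a) and (3b) can both be written as simple recursions over the combined, rank-ordered results. The Python sketch below is illustrative. It assumes the results are given as a list of booleans (True for a failure, False for a censored value) and starts from i_0 = 0. It recomputes Equation (3a) at every failure. Between two censored values this reproduces the same increment each time, so it agrees with applying one increment until the next censored value. The function names are hypothetical.

```python
def order_numbers_increment(failed: list[bool]) -> list[float]:
    """Effective order numbers i_j from the increment of Equation (3a).

    Censored entries keep the order number of the preceding result.
    """
    n = len(failed)
    orders: list[float] = []
    i_prev = 0.0  # i_{j-1}; zero before the first result
    for j, is_failure in enumerate(failed, start=1):
        if is_failure:
            # Delta i_j = (n + 1 - i_{j-1}) / (n + 2 - j); between two censored
            # values this evaluates to the same increment for every failure.
            i_prev += (n + 1 - i_prev) / (n + 2 - j)
        orders.append(i_prev)
    return orders


def order_numbers_direct(failed: list[bool]) -> list[float]:
    """Effective order numbers i_j from the direct form of Equation (3b)."""
    n = len(failed)
    orders: list[float] = []
    i_prev = 0.0
    for j, is_failure in enumerate(failed, start=1):
        delta = 1.0 if is_failure else 0.0  # censoring parameter delta_j
        i_prev = ((n + 1 - j) * i_prev + (n + 1) * delta) / (n + 1 - j + delta)
        orders.append(i_prev)
    return orders


def censored_median_ranks(failed: list[bool]) -> list[float]:
    """Median rank estimates, Equation (2), for the uncensored results only."""
    n = len(failed)
    return [
        (i - 0.3) / (n + 0.4)
        for i, is_failure in zip(order_numbers_direct(failed), failed)
        if is_failure
    ]


pattern = [True, False, True, True, False, False, True, True]
print(order_numbers_increment(pattern))
print(censored_median_ranks(pattern))
```

Without censoring, both recursions return the integers 1, 2, …, n. Censored values placed only at the top of the data set therefore leave the failure ranks unchanged, as stated for Equation (2).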

Figure 6 gives an example of the use of Equations (3a) and (3b). The first data set consists of 20 random numbers ordered by rank, so all of its results are “valid”. The second data set shows the same data after applying a random censoring criterion, so that part of the data is now censored. To do this, a second set of random numbers was drawn. Whenever the value in this censoring set was lower than the corresponding value in the first set, the censoring value was taken and the result was denoted as “censored”. The combined data were ordered by rank, and Equation (3a) was used to calculate the value i + Δi, which was then used to obtain the median rank probability estimate from Equation (2). Before censoring takes effect, both estimates are of course identical. After censoring takes effect, the estimates deviate, but the deviation only reflects the uncertainty inherent in the rank probability itself, and the trend in both data sets remains the same. Equation (3b) gives an estimate identical to that of Equation (3a), so the choice of form is up to the user. It is important to note that Equations (3a) and (3b) should only be used together with Equation (2), not with other simplified rank probability estimates.
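The censoring experiment described above can be sketched in a few lines of Python. This is a hypothetical reconstruction under stated assumptions. Both the data and the censoring values are taken as uniform random numbers, and a fixed seed is used. The number of uncensored results therefore depends on the seed and will generally differ from the r = 14 shown in Figure 6. The function names are illustrative.

```python
import random


def median_ranks_with_censoring(failed: list[bool]) -> list[float]:
    """Equations (3a) and (2): median rank estimates for the failures only."""
    n = len(failed)
    ranks: list[float] = []
    i_prev = 0.0
    for j, is_failure in enumerate(failed, start=1):
        if is_failure:
            i_prev += (n + 1 - i_prev) / (n + 2 - j)  # i + Delta i, Equation (3a)
            ranks.append((i_prev - 0.3) / (n + 0.4))  # Equation (2)
    return ranks


def censoring_experiment(n: int = 20, seed: int = 1):
    """Return (P_true, P_rank) pairs for all valid data and for censored data."""
    rng = random.Random(seed)
    true_p = [rng.random() for _ in range(n)]  # first set: "true" probabilities
    censor = [rng.random() for _ in range(n)]  # second set: censoring values
    # All valid data: plain median ranks from Equation (2).
    full = [(p, (i - 0.3) / (n + 0.4)) for i, p in enumerate(sorted(true_p), start=1)]
    # Censored data: the lower value is observed; the result is censored
    # when the censoring value is the lower one.
    records = sorted((min(p, c), c >= p, p) for p, c in zip(true_p, censor))
    failed = [is_failure for _, is_failure, _ in records]
    ranks = median_ranks_with_censoring(failed)
    p_true_failures = [p for _, is_failure, p in records if is_failure]
    return full, list(zip(p_true_failures, ranks)), failed


full, censored, failed = censoring_experiment()
print(f"uncensored results r = {sum(failed)} of n = {len(failed)}")
```

Plotting the second element of each pair against the first gives a comparison of the kind shown in Figure 6. Up to the first censored value the two estimates coincide exactly.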

[Figure 6 plot: P_rank versus P_true (both 0 to 1) for all valid data, n = 20, and randomly censored data, r = 14.]

Figure 6. Comparison of rank probability estimates for all valid data and censored data.

3.2 Confidence bounds on rank probability

One disadvantage of the theoretical expression for the confidence bounds, based on binomial theory, is that it can be applied only at specific data points. It does not allow extrapolation to probabilities outside the data point values, which limits its usefulness for graphical presentation. A good graphical presentation requires analytical approximations of the rank probabilities that can be extrapolated outside the data points. The analytical approximation for the median rank estimate provides a continuous presentation, but it gives no information about the confidence limits of the probability estimate; these limits must be obtained from other expressions. Based on a numerical analysis of the theoretical rank estimate, an approximate analytical expression for the 90% confidence limits has been developed. The 90% confidence bounds, corresponding to the lower 5% and upper 95% confidence limits of the rank estimate, are obtained by solving the set of Equations (4)–(7) for P_0.05−0.95. It is important to note that only uncensored data affect the accuracy of the estimate. Thus, for censored data sets, n should be replaced by the number of uncensored data, denoted here as r.

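[Equation (4) gives P_0.05−0.95 as P_0.5 ± A multiplied by a term in P_e; its complete form could not be recovered from the extracted text.] (4)

$$P_e = P_{0.5} - \tfrac{1}{2}\left(P_{0.5} - P_{0.05-0.95}\right) \qquad (5)$$

For P_e ≤ 0.5:

$$A = \min\left\{1.162 - 0.342/n;\; 0.82 + P_e \ln(n)\right\} \qquad (6)$$

For P_e > 0.5:

$$A = \min\left\{1.162 - 0.342/n;\; 0.82 + (1 - P_e)\ln(n)\right\} \qquad (7)$$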

In the analytical expression, the positive sign corresponds to the upper 95% confidence limit and the negative sign to the lower 5% confidence limit. Depending on the value of P_e, the set of equations reduces to either a second-order or a fourth-order equation. When a fourth-order equation is obtained, an iterative solution for P_0.05−0.95 is recommended. The solution of the second-order equation can be used as the starting value for the iteration, so that only a few iterations are required.

Rank order statistics is a powerful way of visualizing data, as long as the confidence bounds are included in the figure. The confidence bounds reveal the uncertainty of the data set with respect to the probability distribution itself. One should remember that any distribution falling inside the rank probability confidence bounds is a possible candidate for the real distribution. The rank probability confidence bounds can thus be used to determine confidence limits for the parameters describing the probability distribution. They may also prevent people from drawing overly far-reaching conclusions about the significance of their data set.
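The idea that any distribution inside the confidence bounds is a candidate can be illustrated as a pointwise check. This Python sketch makes three assumptions. It uses uncensored data, applies the exact 5% and 95% bounds of Equation (1) at the data points only (not the continuous approximation of Equations (4)–(7)), and expresses the candidate distribution as a callable CDF. `inside_bounds` and the other names are hypothetical. Because the bounds are pointwise, the check is a screening aid rather than a formal simultaneous test.

```python
from math import comb
from typing import Callable, Sequence


def p_conf(p_rank: float, i: int, n: int) -> float:
    """Equation (1) for rank i of n."""
    return 1.0 - sum(
        comb(n, j - 1) * p_rank ** (j - 1) * (1.0 - p_rank) ** (n - j + 1)
        for j in range(1, i + 1)
    )


def rank_probability(i: int, n: int, conf: float, tol: float = 1e-10) -> float:
    """Solve Equation (1) for P_rank at a given P_conf by bisection."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if p_conf(mid, i, n) < conf:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def inside_bounds(
    data: Sequence[float],
    cdf: Callable[[float], float],
    lower: float = 0.05,
    upper: float = 0.95,
) -> bool:
    """True if cdf(x) lies within the rank confidence bounds at every data point."""
    values = sorted(data)
    n = len(values)
    for i, x in enumerate(values, start=1):
        p = cdf(x)
        if not rank_probability(i, n, lower) <= p <= rank_probability(i, n, upper):
            return False
    return True
```

A candidate such as a Weibull CDF with trial parameters can be passed as `cdf`. Scanning the parameters over a grid then gives a rough picture of which parameter combinations remain compatible with the data.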

3.3 Distribution comparison

Using rank ordering for a direct distribution comparison is very simple. If two test cases (size, geometry, etc.) have equal numbers of test results, all of them valid, the comparison is as simple as shown in Figure 7. The results are ordered by rank and the data values for matching ranks are compared directly. If the two cases contain unequal numbers of results, or censored values, Equations (2) and (3) must be used to obtain the rank probabilities. This leads to non-matching rank probabilities for the two cases. The smaller data set should therefore be taken as the basis, and a matching value should be estimated from the larger data set by interpolation. A suitable simple linear interpolation can be performed with Equation (8). There, K_M is the matching toughness corresponding to the rank probability P_M of the smaller data set. K_L is the closest lower toughness value in the larger data set, corresponding to rank probability P_L, and K_H is the closest higher toughness value, corresponding to P_H.
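The matching procedure can be sketched in Python. This sketch assumes that Equation (8) is ordinary linear interpolation between (P_L, K_L) and (P_H, K_H). It also assumes uncensored data sets, with rank probabilities taken from Equation (2); for censored sets, the ranks from Equations (2) and (3) would be used instead. The function names are hypothetical.

```python
from bisect import bisect_left


def median_ranks(n: int) -> list[float]:
    """Equation (2) for an uncensored data set of size n."""
    return [(i - 0.3) / (n + 0.4) for i in range(1, n + 1)]


def matching_value(p_m: float, p_large: list[float], k_large: list[float]) -> float:
    """K_M by linear interpolation between (P_L, K_L) and (P_H, K_H)."""
    idx = bisect_left(p_large, p_m)
    if idx < len(p_large) and p_large[idx] == p_m:
        return k_large[idx]  # exact rank match, no interpolation needed
    if idx == 0 or idx == len(p_large):
        raise ValueError("P_M lies outside the rank probabilities of the larger set")
    p_l, p_h = p_large[idx - 1], p_large[idx]
    k_l, k_h = k_large[idx - 1], k_large[idx]
    return k_l + (k_h - k_l) * (p_m - p_l) / (p_h - p_l)


def compare(small: list[float], large: list[float]) -> list[tuple[float, float]]:
    """Pair each value of the smaller set with its matching value from the larger set."""
    if len(small) > len(large):
        small, large = large, small
    k_small, k_large = sorted(small), sorted(large)
    p_small, p_large = median_ranks(len(k_small)), median_ranks(len(k_large))
    return [(k, matching_value(p, p_large, k_large)) for k, p in zip(k_small, p_small)]
```

With Equation (2), the lowest and highest rank probabilities of the smaller set always fall inside the range of the larger set. No extrapolation is therefore needed for uncensored data.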
