Discussion - INVERSE PROBLEM OF PREDICTING STOCHASTIC FATIGUE DAMAGE AND RELIABILITY IN COMPOSI

This information about the probability that a sample statistic (e.g. the mean of a sample of Mexican people on job commitment) is a good estimate of a population parameter (i.e. the mean job commitment of all working Mexicans) is expressed using confidence inter-vals. With confidence intervals, you specify how confident you want to be of knowing the range of values that a population mean might fall within. Most commonly the 95 per

cent confidence interval is specified. That is, based on your sample data, you estimate, with 95 per cent confidence, the lowest and highest values that the population mean might actually have.

To understand the concept of a conﬁdence interval, look back at the data in Table 2.1, giving 12 job-commitment scores.

The mean and the 95 per cent conﬁdence interval for this commitment data turn out to be:

■ mean – 3.58

■ 95 per cent conﬁdence interval – lower: 2.59

– upper: 4.58

How this confidence interval is worked out will be explained in Chapter 3; for the time being it is sufficient to accept that it can be done. The sample mean, which is the best esti-mate you have of the population mean, is 3.58, and the confidence interval tells you that you can be 95 per cent confident that the population mean is actually no lower than 2.59 and no higher than 4.58. Using inferential statistical methods, you can’t be sure that the sample mean is the same as the population mean, but you can be 95 per cent confident that the population mean lies somewhere between 2.59 and 4.58. This is equivalent to saying that, in actual fact, the population mean may be lower than 2.59, or it might be higher than 4.58, but there is only a 5 per cent chance of this.

Obviously, you would like the confidence interval to be as small as possible. It is not much use knowing that you can be 95 per cent confident that the mean level of job commit-ment of nurses is somewhere between 1.5 and 6.5 on a 7-point scale! It would be much better to know that you can be 95 per cent confident that you have a small range of scores – say, ranging only from 3.57 to 3.59 – within which your population mean actu-ally falls.

There are two factors that determine the size of the conﬁdence interval of the mean.

They are:

■ the sample size;

■ the sample variance.

This is actually common sense, if you think about it.

Imagine that you wanted to estimate the mean salary of people working for a large multi-national organization. You take two random samples of people who work for the organization, and ask them their salaries. One sample consists of ﬁve people, and the other of 100 people. The mean salary in the small sample turns out to be $41,000, and in the large sample it is $52,000. Which of the two samples do you think would be more likely to have given you an accurate estimate of the mean salary of everyone who works for the organization?

Your intuition, like formal statistical theory, will tell you that the larger sample is more likely to give you a good estimate of the mean population in the whole organization.

1111

THE PRINCIPLES OF INFERENTIAL STATISTICS

So, the 95 per cent confidence interval for the larger sample will almost certainly cover a smaller range than the 95 per cent confidence interval for the smaller sample. Put slightly differently, you can be more confident that the large sample provides you with an accurate estimate of the population mean than that a small sample does.

The other factor that affects the size of the conﬁdence interval is the amount of varia-tion in the data. Imagine that you go into two organizavaria-tions and collect informavaria-tion from 10 people in each about their job commitment, measured on a 7-point scale. The results are shown in Table 2.2.

As you can see in Table 2.2, the mean job-commitment score is 5.5 for both organiza-tions. But which mean do you think is more likely to give you an accurate estimate of the mean of all the people in each organization? Which would you be more conﬁdent in? Again, statistical theory is probably in line with your common sense or intuition here. The conﬁ-dence intervals for these two sets of data are shown in Table 2.3.

In Table 2.3, you can see that the confidence interval for Organization A, where the variation in the job-performance scores was relatively small, is less wide than the confi-dence interval for Organization B, where the dispersion of the scores was considerably greater. To summarize, therefore, the extent to which you can be confident that a sample statistic such as the mean is an accurate estimate of its equivalent population parameter depends on the size of the sample and the amount of variation in the data. The larger the sample, and the smaller the amount of variation in the scores within the sample, the more confident you can be that your sample statistic (the thing you know about) is a good estimate of the population parameter (the thing you want to know about).

Table 2.2 The job-commitment scores of people in two organizations Organization A Organization B

5 1

5 8

5 10

6 3

6 5

5 6

5 8

6 2

6 9

6 3

Mean⫽ 5.50 Mean ⫽ 5.50 Standard Standard deviation⫽ 0.53 deviation ⫽ 3.17

Table 2.3 The conﬁdence intervals for job commitment in two organizations

95% conﬁdence interval

Lower Upper Organization A 5.12 5.88 Organization B 3.23 7.77

Conﬁdence intervals for categorical data

As well as obtaining the confidence intervals for continuous data, you can also do it for categorical data. Imagine you wish to know the proportion of women in a large organiza-tion who smoke. You take a sample of 20 women, and find that eight of them smoke. This suggests that 40 per cent of women in the organization are smokers. However, because you have only a sample of women, you can’t be sure that the proportion of women in the organization as a whole who smoke is exactly 40 per cent. Nevertheless, you can obtain the 95 per cent confidence interval for this, which turns out to be 19 per cent to 61 per cent. So, given that you have found that 40 per cent of your sample of 20 women smoke, you can be 95 per cent confident that the actual number of women who smoke in the organ-ization is no less than 19 per cent and no more than 61 per cent.

As with continuous data, the range of values in the conﬁdence interval for categorical data is reduced if you increase the sample size. This is because the larger the sample, the more conﬁdent you can be that the proportion of your sample that has a particular charac-teristic is a good estimate of the proportion of the population that has that characcharac-teristic.

If instead of finding that eight out of a sample of 20 women smoke, you find that 40 out of a sample of 100 women smoke, the 95 per cent confidence interval this time works out as 30 per cent to 50 per cent. Whereas the range of the confidence interval based on a sample size of only 20 was 42 points (61 per cent minus 19 per cent), the range of this interval based on a sample size of 100 is reduced to only 20 points (50 per cent minus 30 per cent). Once again you can be more confident in the accuracy of your sample statistic if it is based on a large sample than if it is based on a small one.

ASSOCIATIONS AND DIFFERENCES IN INFERENTIAL

In document INVERSE PROBLEM OF PREDICTING STOCHASTIC FATIGUE DAMAGE AND RELIABILITY IN COMPOSITES MATERIALS (página 41-46)