5.2. Refinement Strategy
5.2.2. Refinement Methods
...
step 3: Decide on a one-tail or two-tail test. If the hypothesis being tested is that the average has or has not increased or decreased, choose a one-tail test. If the hypothesis being tested is that the average has or has not changed, choose a two-tail test.
step 4: Use Table 11.2 or the standard normal table to determine the z-value corresponding to the confidence level and number of tails.
step 5: Calculate the actual standard normal variable, $z_0$.
$$z_0 = \left|\frac{\bar{x} - \mu}{\sigma/\sqrt{n}}\right| \qquad (11.88)$$
step 6: If $z_0 \geq z$, the average can be assumed (with confidence level C) to have come from a different distribution.
Example 11.16
When it is operating properly, a cement plant has a daily production rate that is normally distributed with a mean of 880 tons/day and a standard deviation of 21 tons/day. During an analysis period, the output is measured on 50 consecutive days, and the mean output is found to be 871 tons/day. With a 95% confidence level, determine whether the plant is operating properly.
Solution
step 1: Given.
step 2: C = 95% is given.
step 3: Since a specific direction in the variation is not given (i.e., the example does not ask whether the average has decreased), use a two-tail hypothesis test.
step 4: The population mean and standard deviation are known. The standard normal distribution may be used. From Table 11.2, z = 1.96.
step 5: From Eq. 11.88,
$$z_0 = \left|\frac{\bar{x} - \mu}{\sigma/\sqrt{n}}\right| = \left|\frac{871 - 880}{21/\sqrt{50}}\right| = 3.03$$
Since 3.03 > 1.96, the distributions are not the same.
There is at least a 95% probability that the plant is not operating correctly.
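The six steps above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original text: the function name `z_test` and its parameters are hypothetical, and the critical z-value is taken from the standard library's `statistics.NormalDist` rather than from Table 11.2.

```python
from math import sqrt
from statistics import NormalDist


def z_test(sample_mean, pop_mean, pop_sd, n, confidence=0.95, tails=2):
    """Hypothetical helper: return (z0, z_critical, differs) for a known-sigma test."""
    # Step 4: critical z-value for the confidence level and number of tails.
    alpha = 1.0 - confidence
    z_crit = NormalDist().inv_cdf(1.0 - alpha / tails)
    # Step 5: actual standard normal variable (Eq. 11.88).
    z0 = abs(sample_mean - pop_mean) / (pop_sd / sqrt(n))
    # Step 6: the sample is assumed to come from a different distribution if z0 >= z.
    return z0, z_crit, z0 >= z_crit


# Data from Example 11.16: mean 880, standard deviation 21, 50 days averaging 871.
z0, z_crit, differs = z_test(871, 880, 21, 50)
print(f"z0 = {z0:.2f}, z = {z_crit:.2f}, differs = {differs}")
# prints: z0 = 3.03, z = 1.96, differs = True
```

Passing `tails=1` gives the one-tail critical value instead, which is the only change needed when the hypothesis concerns an increase or a decrease.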
32. APPLICATION: STATISTICAL PROCESS CONTROL
All manufacturing processes contain variation due to random and nonrandom causes. Random variation cannot be eliminated. Statistical process control (SPC) is the act of monitoring and adjusting the performance of a process to detect and eliminate nonrandom variation.
Statistical process control is based on taking regular (hourly, daily, etc.) samples of n items and calculating the mean, $\bar{x}$, and range, R, of the sample. To simplify the calculations, the range is used as a measure of the dispersion. These two parameters are graphed on their respective x-bar and R-control charts, as shown in Fig. 11.7.[7] Confidence limits are drawn at $\pm 3\sigma/\sqrt{n}$. From a statistical standpoint, the control chart tests a hypothesis each time a point is plotted. When a point falls outside these limits, there is approximately a 99.7% probability that the process is out of control. Until a point exceeds the control limits, no action is taken.[8]
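The x-bar chart check described above can be sketched as follows. This is an illustrative sketch under stated assumptions: the process mean and standard deviation are taken as known, the function names are hypothetical, and the sample data are invented purely for demonstration.

```python
from math import sqrt


def xbar_r_points(samples):
    """Return the (mean, range) pair plotted for each sample."""
    return [(sum(s) / len(s), max(s) - min(s)) for s in samples]


def xbar_limits(target_mean, sigma, n):
    """Lower and upper control limits at target_mean -/+ 3*sigma/sqrt(n)."""
    half_width = 3.0 * sigma / sqrt(n)
    return target_mean - half_width, target_mean + half_width


def out_of_control(samples, target_mean, sigma):
    """Indices of samples whose mean falls outside the control limits."""
    n = len(samples[0])
    lcl, ucl = xbar_limits(target_mean, sigma, n)
    return [i for i, (xbar, _) in enumerate(xbar_r_points(samples))
            if xbar < lcl or xbar > ucl]


# Hypothetical data: four samples of n = 4 items, assumed mean 10.0 and sigma 0.2.
samples = [
    [10.1, 9.9, 10.0, 10.2],
    [9.8, 10.0, 10.1, 9.9],
    [10.6, 10.4, 10.5, 10.3],
    [10.0, 10.1, 9.9, 10.0],
]
lcl, ucl = xbar_limits(10.0, 0.2, 4)
flagged = out_of_control(samples, 10.0, 0.2)
print(f"LCL = {lcl:.2f}, UCL = {ucl:.2f}, flagged = {flagged}")
# prints: LCL = 9.70, UCL = 10.30, flagged = [2]
```

Only the third sample (index 2) has a mean outside the limits, so it is the only point that would prompt action. The ranges returned by `xbar_r_points` would be plotted on the R-chart in the same way.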
33. LINEAR REGRESSION
If it is necessary to draw a straight line $(y = mx + b)$ through n data points $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$, the following method based on the method of least squares can be used.
step 1: Calculate the following nine quantities.
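$$\sum x_i \qquad \sum x_i^2 \qquad \left(\sum x_i\right)^2 \qquad \bar{x} = \frac{\sum x_i}{n} \qquad \sum x_i y_i$$
$$\sum y_i \qquad \sum y_i^2 \qquad \left(\sum y_i\right)^2 \qquad \bar{y} = \frac{\sum y_i}{n}$$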
[7] Other charts (e.g., the sigma chart, p-chart, and c-chart) are less common but are used as required.
[8] Other indications that a correction may be required are seven measurements on one side of the average and seven consecutively increasing measurements. Rules such as these detect shifts and trends.
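Figure 11.7 Typical Statistical Process Control Charts (x-bar chart with upper and lower control limits, UCL and LCL, about the mean; R-chart with upper control limit, UCL, about the mean range; both plotted against time, t)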
BackgroundandSupport
step 2: Calculate the slope, m, of the line.
$$m = \frac{n\sum x_i y_i - \sum x_i \sum y_i}{n\sum x_i^2 - \left(\sum x_i\right)^2} \qquad (11.89)$$
step 3: Calculate the y-intercept, b.
$$b = \bar{y} - m\bar{x} \qquad (11.90)$$
step 4: To determine the goodness of fit, calculate the correlation coefficient, r.
$$r = \frac{n\sum x_i y_i - \sum x_i \sum y_i}{\sqrt{\left(n\sum x_i^2 - \left(\sum x_i\right)^2\right)\left(n\sum y_i^2 - \left(\sum y_i\right)^2\right)}} \qquad (11.91)$$
If m is positive, r will be positive; if m is negative, r will be negative. As a general rule, if the absolute value of r exceeds 0.85, the fit is good; otherwise, the fit is poor. The absolute value of r equals 1.0 if the fit is a perfect straight line.
A low value of r does not eliminate the possibility of a nonlinear relationship existing between x and y. It is possible that the data describe a parabolic, logarithmic, or other nonlinear relationship. (Usually this will be apparent if the data are graphed.) It may be necessary to convert one or both variables to new variables by taking squares, square roots, cubes, or logarithms, to name a few of the possibilities, in order to obtain a linear relationship. The apparent shape of the line through the data will give a clue to the type of variable transformation that is required. The curves in Fig. 11.8 may be used as guides to some of the simpler variable transformations.
Figure 11.9 illustrates several common problems encountered in trying to fit and evaluate curves from experimental data. Figure 11.9(a) shows a graph of clustered data with several extreme points. There will be moderate correlation due to the weighting of the extreme points, although there is little actual correlation at low values of the variables. The extreme data should be excluded, or the range should be extended by obtaining more data.
Figure 11.9(b) shows that good correlation exists in general, but extreme points are missed, and the overall correlation is moderate. If the results within the small linear range can be used, the extreme points should be excluded. Otherwise, additional data points are needed, and curvilinear relationships should be investigated.
Figure 11.9(c) illustrates the problem of drawing conclusions of cause and effect. There may be a predictable relationship between variables, but that does not imply a cause and effect relationship. In the case shown, both variables are functions of a third variable, the city population. But there is no direct relationship between the plotted variables.
Figure 11.8 Nonlinear Data Curves (panels plot y against x for the forms $y = ce^{-bx}$, $y = ae^{bx}$, $y = 1/(a + bx)$, $y = a + b\sqrt{x}$, $y = a + bx^2$, $y = a + bx + cx^2$, $y = a + bx + cx^2 + dx^3$, $\log y = a + bx + cx^2$, $\log y = a + bx + cx^2 + dx^3$, and $y = a + b \log x$)
Figure 11.9 Common Regression Difficulties (panels (a), (b), and (c); panel (c) plots the number of elementary school teachers in the city against the amount of whiskey consumed in the city)
P R O B A B I L I T Y A N D S T A T I S T I C A L A N A L Y S I S O F D A T A
Example 11.17
An experiment is performed in which the dependent variable, y, is measured against the independent variable, x. The results are as follows.
x y
1.2 0.602
4.7 5.107
8.3 6.984
20.9 10.031
(a) What is the least squares straight line equation that best represents this data? (b) What is the correlation coefficient?
Solution
(a) Calculate the following quantities.
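$$\sum x_i = 35.1 \qquad \sum y_i = 22.72$$
$$\sum x_i^2 = 529.23 \qquad \sum y_i^2 = 175.84$$
$$\left(\sum x_i\right)^2 = 1232.01 \qquad \left(\sum y_i\right)^2 = 516.38$$
$$\bar{x} = 8.775 \qquad \bar{y} = 5.681$$
$$\sum x_i y_i = 292.34 \qquad n = 4$$
From Eq. 11.89, the slope is
$$m = \frac{n\sum x_i y_i - \sum x_i \sum y_i}{n\sum x_i^2 - \left(\sum x_i\right)^2} = \frac{(4)(292.34) - (35.1)(22.72)}{(4)(529.23) - (35.1)^2} = 0.42$$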
From Eq. 11.90, the y-intercept is
$$b = \bar{y} - m\bar{x} = 5.681 - (0.42)(8.775) = 2.0$$
The equation of the line is
$$y = 0.42x + 2.0$$
(b) From Eq. 11.91, the correlation coefficient is
$$r = \frac{n\sum x_i y_i - \sum x_i \sum y_i}{\sqrt{\left(n\sum x_i^2 - \left(\sum x_i\right)^2\right)\left(n\sum y_i^2 - \left(\sum y_i\right)^2\right)}} = \frac{(4)(292.34) - (35.1)(22.72)}{\sqrt{\left((4)(529.23) - 1232.01\right)\left((4)(175.84) - 516.38\right)}} = 0.914$$
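The calculation in Ex. 11.17 can be checked with a short Python sketch that evaluates Eq. 11.89 through Eq. 11.91 directly from the sums. The function name `least_squares` is hypothetical and chosen only for this illustration.

```python
from math import sqrt


def least_squares(xs, ys):
    """Hypothetical helper: return slope m, intercept b, and correlation r."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    numerator = n * sxy - sx * sy
    m = numerator / (n * sxx - sx ** 2)                      # Eq. 11.89
    b = sy / n - m * (sx / n)                                # Eq. 11.90
    r = numerator / sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))  # Eq. 11.91
    return m, b, r


# Data from Example 11.17.
xs = [1.2, 4.7, 8.3, 20.9]
ys = [0.602, 5.107, 6.984, 10.031]
m, b, r = least_squares(xs, ys)
print(f"m = {m:.2f}, b = {b:.2f}, r = {r:.3f}")
```

Because the code carries full precision instead of rounding m to 0.42 before computing b, the intercept comes out slightly below the hand-calculated 2.0, while the slope and correlation coefficient agree with the values above to the digits shown.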
Example 11.18
Repeat Ex. 11.17 assuming the relationship between the variables is nonlinear.
Solution
The first step is to graph the data. Since the graph has the appearance of the fourth case in Fig. 11.8, it can be assumed that the relationship between the variables has the form of $y = a + b \log x$. Therefore, the variable change z = log x is made, resulting in the following set of data.
z y
0.0792 0.602
0.672 5.107
0.919 6.984
1.32 10.031
If the regression analysis is performed on this set of data, the resulting equation and correlation coefficient are
$$y = 7.599z + 0.000247$$
$$r = 0.999$$
This is a very good fit. The relationship between the variables x and y is approximately
$$y = 7.599 \log x + 0.000247$$
...
...
...