4.9 TYPES OF MODELS
There have been many references to a model to be constructed from experimental data. The word model refers to the mathematical description of how the response behaves as a function of the input variable or variables. A good model explains the systematic behavior of the original data in some concise manner. The specific form of the model depends on the type of design variable used in the experiment. If an experiment contains a single qualitative design variable set to several different treatment levels, then the model consists of the treatment means. There will be as many means for the model as there are treatments in the experiment. If an experiment contains a single quantitative variable that covers a range of values, then the model will consist of an equation that relates the response to the quantitative predictor. Experiments that involve qualitative predictors are usually analyzed by analysis of variance (ANOVA). Experiments that involve quantitative predictors are usually analyzed by regression. Experiments that combine both qualitative and quantitative variables are analyzed using a special regression model called a general linear model.
Any model must be accompanied by a corresponding description of the errors or discrepancies between the observed and predicted values. These quantities are related by:
$$y_i = \hat{y}_i + \epsilon_i \tag{4.1}$$
where $y_i$ represents the ith observed value of the response, $\hat{y}_i$ represents the corresponding predicted value from the model, and $\epsilon_i$ represents the difference between them. The $\epsilon_i$ are usually called the residuals. In general, the relationship between the data, model, and error can be expressed as:
$$\text{Data} = \text{Model} + \text{Error Statement} \tag{4.2}$$
At a minimum, the error statement must include a description of the shape or distribution of the residuals and a summary measure of their variation. The amount of error or residual variation is usually reported as a standard deviation called the standard error of the model, indicated with the symbol $\hat{\sigma}_\epsilon$ or $s_\epsilon$. When the response is measured under several different conditions or treatments, which is the usual case in a designed experiment, then it may be necessary to describe the shape and size of the errors under each condition.
Most of the statistical analysis techniques that we will use to analyze designed experiments demand that the errors meet some very specific requirements. The most common methods that we will use in this book, regression for quantitative predictors and ANOVA for qualitative predictors, require that the distribution of the errors is normal in shape with constant standard deviation under all experimental conditions. When the latter requirement is satisfied, we say that the distribution of the errors is homoscedastic. A complete error statement for a situation that is to be analyzed by regression or ANOVA is, “The distribution of errors is normal and homoscedastic with standard error equal to $s_\epsilon$,” where $s_\epsilon$ is some numerical value. If the distribution of errors does not meet the normality and homoscedasticity requirements, then the models obtained by regression and ANOVA may be incorrect.* Consequently, it is very important to check assumptions about the behavior of the error distribution before accepting a model. When these conditions are not met, special methods may be required to analyze the data.
*When the standard deviations of the errors are different under different conditions (for example, treatments) we say that the error distributions are heteroscedastic.
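The requirement of constant standard deviation under all conditions can be screened informally by comparing the spread of the residuals condition by condition. The following is a minimal sketch, not a method prescribed by the text: the residual values are made up, the function names are hypothetical, and the max-to-min ratio is only a rough screening number. Formal procedures such as Levene's or Bartlett's test exist for this purpose.

```python
import statistics

# Hypothetical residuals grouped by experimental condition (made-up numbers).
residuals_by_condition = {
    "condition 1": [-9.0, -4.0, 0.0, 4.0, 9.0],
    "condition 2": [-11.0, -3.0, 1.0, 2.0, 11.0],
    "condition 3": [-30.0, -12.0, 0.0, 12.0, 30.0],
}


def spread_by_condition(groups):
    """Sample standard deviation of the residuals within each condition."""
    return {name: statistics.stdev(values) for name, values in groups.items()}


def spread_ratio(spreads):
    """Largest divided by smallest standard deviation (1.0 means identical)."""
    return max(spreads.values()) / min(spreads.values())


spreads = spread_by_condition(residuals_by_condition)
ratio = spread_ratio(spreads)

for name, s in spreads.items():
    print(f"{name}: s = {s:.2f}")
print(f"max/min ratio = {ratio:.2f}")
```

In this invented data set the third condition is several times more variable than the other two, which is the kind of pattern that would make a single standard error a poor summary of the errors.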
Example 4.3
A manufacturer wants to study one of the critical quality characteristics of his process. He draws a random sample of n = 12 units from a production lot and measures them, obtaining the distribution of parts shown in Figure 4.5a. The mean of the sample is $\bar{x} = 140$ and the standard deviation is s = 10. A normal plot of the sample data (not shown) indicates that the observations are normally distributed. From this information, identify the data, model, and the error statement.
Solution: The data values are the n = 12 observations, which can be indicated with the symbol $y_i$ where i = 1, 2, . . . , 12. The model is the one number, $\hat{y}_i = \bar{y} = 140$, that best represents all of the observations. The error values are given by $\epsilon_i = y_i - \bar{y}$ and are known to be normally distributed with standard deviation $s_\epsilon \approx 10$. These definitions permit Equation 4.2 to be written:
$$y_i = \bar{y} + \epsilon_i \tag{4.3}$$
Example 4.4
The manufacturer in Example 4.3 presents his data and analysis to his engineering staff and someone comments that there is considerable lot-to-lot variation in the product. To test this claim, he randomly samples n = 12 units from each of three different lots. The data are shown in Figure 4.5b. The three lot means are $\bar{y}_A = 126$, $\bar{y}_B = 165$, and $\bar{y}_C = 123$ and the standard deviations are all about $s_\epsilon = 10$. Normal plots of the errors indicate that each lot is approximately normally distributed. From this information, identify the data, model, and the error statement.
[Figure 4.5: three panels. (a) Response values for the single sample. (b) Response by lot (A, B, C). (c) Response by temperature (20, 25, 30).]
Solution: The data are the n = 12 observations drawn from each of the k = 3 lots, indicated by the symbol $y_{ij}$ where i indicates the lot (A, B, or C) and j indicates the observation (1 to 12) within a lot. The model consists of the three means $\bar{y}_A = 126$, $\bar{y}_B = 165$, and $\bar{y}_C = 123$. The error statement is, “The errors are normally distributed with constant standard deviation $s_\epsilon \approx 10$.”
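The difference between the one-mean model of Example 4.3 and the treatment-means model of Example 4.4 can be illustrated numerically. The sketch below uses invented observations, built so that the lot means equal the values quoted in Example 4.4; they are not the data of Figure 4.5b. The function names are hypothetical, and the pooled standard error uses the usual degrees of freedom (total observations minus number of means), which is an assumption not spelled out in this section.

```python
import math
import statistics

# Hypothetical within-lot deviations (they sum to zero); NOT the Figure 4.5b data.
DEVIATIONS = [-15, -10, -8, -5, -3, -1, 1, 3, 5, 8, 10, 15]
LOT_MEANS = {"A": 126.0, "B": 165.0, "C": 123.0}  # lot means quoted in Example 4.4

# Build n = 12 made-up observations per lot around each quoted mean.
lots = {lot: [mean + d for d in DEVIATIONS] for lot, mean in LOT_MEANS.items()}


def treatment_means(groups):
    """Model for a qualitative predictor: one mean per treatment."""
    return {name: statistics.fmean(values) for name, values in groups.items()}


def pooled_standard_error(groups, means):
    """sqrt(SSE / (N - k)), pooling residuals from all k treatments."""
    sse = sum((y - means[name]) ** 2
              for name, values in groups.items() for y in values)
    n_total = sum(len(values) for values in groups.values())
    return math.sqrt(sse / (n_total - len(groups)))


model = treatment_means(lots)
s_lots = pooled_standard_error(lots, model)

# Ignoring the lots and using one grand mean, as in the model of Example 4.3.
all_values = [y for values in lots.values() for y in values]
s_single = statistics.stdev(all_values)

print("treatment means:", model)
print(f"standard error, three-mean model: {s_lots:.2f}")
print(f"standard error, one-mean model:   {s_single:.2f}")
```

With these invented numbers the one-mean model has a standard error more than twice that of the three-mean model, because the lot-to-lot differences are absorbed into the residuals when the lots are ignored.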
Example 4.5
After reviewing the data and analysis described in Example 4.4, someone realizes that the part temperatures were different for the three lots at a critical point in the process. They decide to run an experiment by making parts at different temperatures. Twelve parts (n = 12) were made at each of 20, 25, and 30°C in completely randomized order and the data are shown in Figure 4.5c. They use linear regression to fit a line to the data and obtain y = 60 + 3T where T is the temperature. The errors, calculated as the differences between the observed values and the predicted values (that is, the fitted line), are approximately normal and have $s_\epsilon \approx 10$. From this information, identify the data, model, and the error statement.
Solution: The data are the 36 sets of paired ($T_i$, $y_i$) observations. The model is given by the line y = 60 + 3T. The error statement is, “The errors are normally distributed with constant standard deviation $s_\epsilon \approx 10$.”
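A least squares line fit of the kind described in Example 4.5 can be sketched in a few lines. The 36 observations below are invented, scattered about the line quoted in the example so that the fit recovers it; they are not the data of Figure 4.5c. The function names are hypothetical, and the standard error divides by n - 2 because two coefficients are estimated, a common convention assumed here rather than taken from the text.

```python
import math

# Hypothetical deviations about the line (sum to zero); NOT the Figure 4.5c data.
DEVIATIONS = [-15, -10, -8, -5, -3, -1, 1, 3, 5, 8, 10, 15]
TEMPERATURES = [20.0, 25.0, 30.0]

# 36 made-up (T, y) pairs scattered about the line quoted in Example 4.5.
pairs = [(t, 60.0 + 3.0 * t + d) for t in TEMPERATURES for d in DEVIATIONS]


def fit_line(points):
    """Ordinary least squares for y = b0 + b1 * x; returns (b0, b1)."""
    n = len(points)
    x_bar = sum(x for x, _ in points) / n
    y_bar = sum(y for _, y in points) / n
    sxx = sum((x - x_bar) ** 2 for x, _ in points)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in points)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def standard_error(points, intercept, slope):
    """sqrt(SSE / (n - 2)); two coefficients were estimated from the data."""
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in points)
    return math.sqrt(sse / (len(points) - 2))


b0, b1 = fit_line(pairs)
s_e = standard_error(pairs, b0, b1)
print(f"y = {b0:.1f} + {b1:.1f} T, standard error = {s_e:.2f}")
```

Here the data are the (T, y) pairs, the model is the fitted pair of coefficients, and the standard error summarizes the residuals about the line.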
Models involving quantitative predictors are written in the form of an equation. These models may be empirical or based on first principles depending on the needs of the analyst. The goal of an empirical model is to provide an accurate description of the response independent of the physical mechanisms that cause the predictors to affect the response. Empirical models tend to be arbitrary. A model based on first principles gets its functional form from the mechanistic theory that relates the predictors to the response. First-principles models may be based on very crude to highly accurate analytical study of the problem. It may be safe to extrapolate a first-principles model, but empirical models should never be extrapolated.
Whether an empirical or first-principles model is fitted to data depends on the motivation of the experimenter. A scientist who wants to demonstrate that data follow some theoretical formula will, of course, have to use the first-principles approach. This may involve some heroics to transform the formula into a form that can be handled by the software. On the other hand, a manufacturing engineer probably doesn’t care about the true form of the relationship between the response and the predictors and is usually willing to settle for an effective empirical model because it gets the job done. He would still be wise to stay conscious of any available first-principles model because it will suggest variables, their ranges of suitable values, and other subtleties that might influence the design of the experiment even if time or model complexity prohibit the use of the first-principles model. When a first-principles model is available, it is almost always preferred over an empirical model, even if the empirical model provides a slightly better fit to the data.
Example 4.6
An experiment is performed to study the pressure of a fixed mass of gas as a function of the gas volume and temperature. Describe empirical and first-principles models that might be fitted to the data.
Solution: In the absence of any knowledge of the form of the relationship between the gas pressure (P) and its volume (V) and temperature (T), an empirical model of the form:
$$P = a + bV + cT + dVT \tag{4.4}$$
might be attempted where a, b, c, and d are coefficients to be determined from the data. For the first-principles model, the kinetic theory of gases suggests that an appropriate model would be:
$$P = \frac{aT}{V} \tag{4.5}$$
where a is a coefficient to be determined from the data. Although both models might fit the data equally well, the second model would be preferred because it is suggested by the theoretical relationship between P, T, and V.
So why are we so concerned about models? What’s the purpose for building them in the first place? These questions are also asking about our motivation for doing designed experiments. The purpose of any designed experiment and its corresponding model is to relate the response to its predictors so that the response can be optimized by better management of the predictors. Some of the reasons to build a model are:
• To determine how to maximize, minimize, or set the response to some target value.
• To learn how to decrease variation in the response.
• To identify which predictor variables are most important.
• To quantify the contribution of predictor variables to the response.
• To learn about interactions between predictor variables.
• To improve the operation of a process by learning how to control it better.
• To simplify complex operating procedures by focusing attention on the most important variables and by taking advantage of previously unrecognized relationships between predictor variables and between them and the response.
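Returning to Example 4.6, the two candidate models can be fitted side by side to see why extrapolating an empirical model is risky. This is only a sketch under stated assumptions: the data are noise-free values generated from P = aT/V with a made-up constant in arbitrary units, so the first-principles form is favored by construction. The helper functions are hypothetical and solve the least squares problem through the normal equations, one of several possible approaches.

```python
def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def least_squares(features, y):
    """Coefficients minimizing squared error, via the normal equations."""
    p = len(features[0])
    xtx = [[sum(f[i] * f[j] for f in features) for j in range(p)]
           for i in range(p)]
    xty = [sum(f[i] * yi for f, yi in zip(features, y)) for i in range(p)]
    return solve(xtx, xty)


A_TRUE = 8.0  # made-up constant, arbitrary units
grid = [(v, t) for v in (1.0, 1.5, 2.0) for t in (300.0, 350.0, 400.0)]
pressure = [A_TRUE * t / v for v, t in grid]  # noise-free hypothetical data

# Empirical model (Equation 4.4): P = a + b V + c T + d V T
a, b, c, d = least_squares([[1.0, v, t, v * t] for v, t in grid], pressure)

# First-principles model (Equation 4.5): P = a T / V, one coefficient
(a_fp,) = least_squares([[t / v] for v, t in grid], pressure)

# Extrapolate both models to a volume outside the range used for fitting.
v_new, t_new = 4.0, 350.0
p_empirical = a + b * v_new + c * t_new + d * v_new * t_new
p_first_principles = a_fp * t_new / v_new

print(f"empirical prediction at V=4, T=350:        {p_empirical:.1f}")
print(f"first-principles prediction at V=4, T=350: {p_first_principles:.1f}")
```

Inside the fitted range (V from 1 to 2) the four-coefficient model tracks the data to within a few percent, but at V = 4 it predicts a negative pressure, while the one-coefficient first-principles form returns the generating value of 700.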