In addition to evaluating the magnitude of the R² values as a criterion of predictive accuracy, researchers should also examine Stone-Geisser's Q² value (Geisser, 1974; Stone, 1974).
This measure is an indicator of the model's predictive relevance. More specifically, when PLS-SEM exhibits predictive relevance, it accurately predicts the data points of indicators in reflective measurement models of endogenous constructs and of endogenous single-item constructs (the procedure does not apply to formative endogenous constructs). In the structural model, Q² values larger than zero for a certain reflective endogenous latent variable indicate the path model's predictive relevance for this particular construct. The Q² value is obtained by using the blindfolding procedure for a certain omission distance D. Blindfolding is a sample reuse technique that omits every dth data point in the endogenous construct's indicators and estimates the parameters with the remaining data points (Chin, 1998; Henseler et al., 2009; Tenenhaus et al., 2005).
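As a rough illustration of how the output of blindfolding is turned into a single number, the sketch below uses the form commonly reported for blindfolding, Q² = 1 - SSE/SSO. Here SSE sums the squared differences between the omitted values and the model's predictions, and SSO sums the squared differences between the omitted values and a trivial prediction (the mean of the remaining data). It is a minimal sketch that assumes the omitted values and both sets of predictions have already been pooled across all D rounds; the function name `q_squared` and the example numbers are hypothetical.

```python
def q_squared(observed, predicted, trivial):
    """Sketch of Stone-Geisser's Q² = 1 - SSE / SSO.

    observed:  omitted (true) data points, pooled over all blindfolding rounds
    predicted: model-based predictions of those data points
    trivial:   trivial predictions (mean of the remaining data in each round)
    """
    if not (len(observed) == len(predicted) == len(trivial)):
        raise ValueError("inputs must have the same length")
    # Sum of squared prediction errors of the path model.
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    # Sum of squared errors of the trivial (mean) prediction.
    sso = sum((o - t) ** 2 for o, t in zip(observed, trivial))
    return 1.0 - sse / sso


# Hypothetical numbers: the model predicts better than the trivial mean,
# so Q² is above zero.
observed = [1.0, -1.0, 0.5]
predicted = [0.8, -0.7, 0.4]
trivial = [0.0, 0.0, 0.0]
print(round(q_squared(observed, predicted, trivial), 3))  # prints 0.938
```

A model whose predictions are no better than the trivial mean yields Q² = 0 or below, which corresponds to the lack of predictive relevance described in this section.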
The omitted data points are considered missing values and treated accordingly when running the PLS-SEM algorithm (e.g., by using mean value replacement). The resulting estimates are then used to predict the omitted data points. The difference between the true (i.e., omitted) data points and the predicted ones is then used as input for the Q² measure. Blindfolding is an iterative process that repeats until each data point has been omitted and the model reestimated. The blindfolding procedure is only applied to endogenous constructs that have a reflective measurement model specification as well as to endogenous single-item constructs.

When applying the blindfolding procedure to the PLS path model in Exhibit 6.5, the data points in the measurement model of the reflective endogenous construct are estimated by means of a two-step approach. First, the information from the structural model is used to predict the scores of latent variable
Y3. More specifically, after running the PLS-SEM algorithm, the scores of the latent variables Y1, Y2, and Y3 are available. Instead of directly using the Y3 scores, the blindfolding procedure predicts these scores by using the available information from the structural model (i.e., the latent variable scores of Y1 and Y2, as well as the structural model coefficients p13 and p23). Specifically, the prediction of Y3 equals the standardized scores of the following equation:

$$\hat{Y}_3 = p_{13} \cdot Y_1 + p_{23} \cdot Y_2,$$

where Ŷ3 represents the structural model's prediction. These scores differ from the Y3 scores obtained by applying the PLS-SEM algorithm, because they result from the structural model estimates rather than from the measurement model (Chapter 3).
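The first step can be sketched in a few lines: the structural model's prediction is formed from the Y1 and Y2 scores and the path coefficients p13 and p23, and is then standardized. The sketch assumes standardization to mean 0 and population standard deviation 1; the function names and all numbers are hypothetical, and an actual PLS-SEM run would take the scores and coefficients from the estimated model.

```python
from statistics import mean, pstdev


def standardize(values):
    """Rescale values to mean 0 and population standard deviation 1."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def structural_prediction(y1, y2, p13, p23):
    """Sketch of step one: Y3_hat = p13 * Y1 + p23 * Y2, then standardized."""
    raw = [p13 * a + p23 * b for a, b in zip(y1, y2)]
    return standardize(raw)


# Hypothetical latent variable scores and path coefficients.
y1 = [-1.2, 0.3, 0.9, -0.4, 0.4]
y2 = [0.5, -0.8, 1.1, 0.2, -1.0]
y3_hat = structural_prediction(y1, y2, p13=0.45, p23=0.30)
print([round(v, 3) for v in y3_hat])
```

Standardizing the weighted sum mirrors the statement that the prediction equals the standardized scores of the equation; without it, the scale of Ŷ3 would depend on the size of the path coefficients.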
In the second step, the predicted scores (Ŷ3) of the reflective endogenous latent variable are used to predict systematically omitted (or eliminated) data points of the indicators x5, x6, and x7 in the measurement model. The systematic pattern of data point elimination and prediction depends on the omission distance (D), which must be determined to run the blindfolding procedure. An omission distance of 3, for example, implies that every third data point of the indicators x5, x6, and x7 is eliminated in a single blindfolding round. Since the blindfolding procedure has to omit and predict every data point of the indicators used in the measurement model of a reflective endogenous latent variable, with D = 3 it has to include three rounds. Hence, the number of blindfolding rounds always equals the omission distance D.

[Exhibit 6.5 Path Model Example of the Blindfolding Procedure]
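A minimal sketch of the second step follows. It assumes that, as in a reflective measurement model, each indicator is modeled as its outer loading times the construct score plus an error term, so the point prediction of an omitted value is the loading multiplied by Ŷ3. The indicator names follow Exhibit 6.5; the loadings, scores, and the function name `predict_omitted` are hypothetical.

```python
def predict_omitted(y3_hat, loadings, omitted_cells):
    """Predict omitted indicator values from the predicted construct scores.

    y3_hat:        predicted (standardized) scores of Y3, one per observation
    loadings:      outer loading per indicator, e.g. {"x5": 0.8, ...}
    omitted_cells: (observation_index, indicator) pairs removed in this round
    Returns a dict mapping each omitted cell to its predicted value.
    """
    return {(i, k): loadings[k] * y3_hat[i] for i, k in omitted_cells}


# Hypothetical values for one blindfolding round.
y3_hat = [1.0, -0.5, 0.2]
loadings = {"x5": 0.8, "x6": 0.6, "x7": 0.7}
round_cells = [(0, "x5"), (1, "x7"), (2, "x6")]
print(predict_omitted(y3_hat, loadings, round_cells))
```

Repeating this for each of the D rounds, with Ŷ3 and the loadings reestimated on the data remaining in that round, yields exactly one prediction for every data point.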
Exhibit 6.6 shows the application of the blindfolding procedure with respect to the reflective endogenous latent variable Y3 shown in Exhibit 6.5. For illustrative purposes, the number of observations for the standardized data of indicators x5, x6, and x7 is reduced to seven in the reflective measurement model of construct Y3. We select an omission distance of 3 in this example, but a higher number between 5 and 10 should be used in most applications (Hair et al., 2012b). Note that [d1], [d2], and [d3] in Exhibit 6.6 are not entries in the data matrix but are used to show how the data elimination pattern is applied to the data points of x5, x6, and x7. For example, the first data point of indicator x5 has a value of -0.452 and is labeled [d1], which indicates that this data point is eliminated in the first blindfolding round. The omission labels (e.g., [d1], [d2], and [d3] in Exhibit 6.6) are assigned column by column. When the assignment of the pattern ends with [d1] in the last observation (i.e., Observation 7) in the first column of indicator x5 (Exhibit 6.6), the procedure continues by assigning [d2] to the first observation in the second column of indicator x6. Exhibit 6.6 displays the assignment of the data omission pattern for the omission distance of 3.
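The column-wise assignment of the omission labels can be reproduced with a short function. This sketch assumes cells are numbered down the first indicator column and then continue at the top of the next column, with labels cycling 1, 2, ..., D; with 7 observations, 3 indicators, and D = 3 it yields the [d1]/[d2]/[d3] pattern of Exhibit 6.6. The function names are hypothetical. The sketch also makes visible why the number of observations divided by D should not be an integer: in that case every indicator of a given observation receives the same label, so each round deletes complete observations from the data matrix.

```python
def omission_pattern(n_obs, n_indicators, d):
    """Round label (1..d) for each cell, assigned column by column (sketch).

    Cell numbering runs down the first indicator column and then continues
    at the top of the next column; labels cycle 1, 2, ..., d, 1, 2, ...
    Returns one list of labels per observation (row).
    """
    return [[(col * n_obs + row) % d + 1 for col in range(n_indicators)]
            for row in range(n_obs)]


def omitted_in_round(pattern, round_no):
    """(row, column) positions deleted in the given blindfolding round."""
    return [(r, c) for r, labels in enumerate(pattern)
            for c, label in enumerate(labels) if label == round_no]


def omission_distance_ok(n_obs, d):
    """True if n_obs / d is not an integer."""
    return n_obs % d != 0


# Exhibit 6.6 setting: 7 observations, indicators x5, x6, x7, D = 3.
pattern = omission_pattern(7, 3, 3)
for row_no, labels in enumerate(pattern, start=1):
    print(row_no, [f"[d{label}]" for label in labels])
```

Calling `omission_pattern(90, 3, 9)` gives every row a single repeated label, which is the situation the 90-observation example warns against; `omission_distance_ok` accepts D = 7 or 8 for 90 observations and rejects D = 9 or 10.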
It is important to note that the omission distance D has to be chosen so that the number of observations used in the model estimation divided by D is not an integer. If the number of observations divided by D results in an integer, you would always delete the same set of observations in each round from the data matrix. For example, if you have 90 observations, the omission distance must not be 9 or 10, because 90/9 = 10 and 90/10 = 9 are integers. Rather, you should use omission distances of 7 or 8.

Exhibit 6.6

Standardized Indicator Data (indicators of the reflective construct Y3)
Observation   x5            x6            x7
1             -0.452 [d1]   -0.309 [d2]   -0.152 [d3]
2              0.943 [d2]    1.146 [d3]    0.534 [d1]
3             -0.452 [d3]   -0.309 [d1]   -2.209 [d2]
4              0.943 [d1]   -1.036 [d2]   -0.837 [d3]
5              0.943 [d2]   -1.036 [d3]    0.534 [d1]
6             -1.150 [d3]   -1.036 [d1]   -0.837 [d2]
7              1.641 [d1]   -0.309 [d2]    1.220 [d3]

First Blindfolding Round: Omission of Data Points [d1]
Observation   x5            x6            x7
1              omitted      -0.309 [d2]   -0.152 [d3]
2              0.943 [d2]    1.146 [d3]    omitted
3             -0.452 [d3]    omitted      -2.209 [d2]
4              omitted      -1.036 [d2]   -0.837 [d3]
5              0.943 [d2]   -1.036 [d3]    omitted
6             -1.150 [d3]    omitted      -0.837 [d2]
7              omitted      -0.309 [d2]    1.220 [d3]

Second Blindfolding Round: Omission of Data Points [d2]
Observation   x5            x6            x7
1             -0.452 [d1]    omitted      -0.152 [d3]
2              omitted       1.146 [d3]    0.534 [d1]
3             -0.452 [d3]   -0.309 [d1]    omitted
4              0.943 [d1]    omitted      -0.837 [d3]
5              omitted      -1.036 [d3]    0.534 [d1]
6             -1.150 [d3]   -1.036 [d1]    omitted
7              1.641 [d1]    omitted       1.220 [d3]

Third Blindfolding Round: Omission of Data Points [d3]
Observation   x5            x6            x7
1             -0.452 [d1]   -0.309 [d2]    omitted
2              0.943 [d2]    omitted       0.534 [d1]
3              omitted      -0.309 [d1]   -2.209 [d2]
4              0.943 [d1]   -1.036 [d2]    omitted
5              0.943 [d2]    omitted       0.534 [d1]
6              omitted      -1.036 [d1]   -0.837 [d2]
7              1.641 [d1]   -0.309 [d2]    omitted

As shown in Exhibit 6.6, the data points [d1] are eliminated in the first blindfolding round. The remaining data points are now used to estimate the path model in Exhibit 6.5. A missing value treatment function (e.g., mean value replacement) is used for the deleted data points when running the PLS-SEM algorithm. These PLS-SEM estimates differ from the original model estimation and from the results of the two following blindfolding rounds. The outcomes of the first blindfolding round are used to first predict the Ŷ3 scores of the selected reflective endogenous latent variable. Thereafter, the predicted values of Ŷ3 and the estimated outer loadings (i.e., l35, l36, and l37; Exhibit 6.6) in the first blindfolding round allow every single eliminated data point to be predicted in this first round. The second and the third blindfolding rounds follow a similar process. After the last blindfolding round, each data point of the indicators of a selected reflective endogenous latent variable has been removed and then predicted. Thus, the blindfolding procedure can compare the original values with the predicted values. If the prediction is close to the original value (i.e., there is a small prediction
error), the path model has high predictive accuracy. The prediction errors (calculated as the difference between the true values [i.e., the omitted values] and the predicted values), along with a trivial prediction error (defined as the mean of the remaining data), are then used to estimate the Q² value (Chin, 1998). Q² values larger than 0 suggest that the model has predictive relevance for a certain endogenous construct. In contrast, values of 0 and below indicate a lack of predictive relevance.

It is important to note that the Q² value can be calculated by using two different approaches. The cross-validated redundancy approach, as described in this section, builds on the path model estimates of both the structural model (scores of the antecedent constructs) and the measurement model (target endogenous construct) for data prediction. Therefore, prediction by means of cross-validated redundancy fits the PLS-SEM approach perfectly. An alternative method, the cross-validated communality approach, uses only the construct scores estimated for the target endogenous construct (without including the structural model information) to predict the omitted data points. We recommend using the cross-validated redundancy as a measure of Q² since it includes the key element of the path model, the structural model, to predict eliminated data points.

The Q² values estimated by the blindfolding procedure represent a measure of how well the path model can predict the originally observed values. Similar to the f² effect size approach for assessing R² values, the relative impact of predictive relevance can be compared by means of the q² effect size, formally defined as follows:
q