predictivities and adequacies: an illustrative example
Consider the National Track data set provided in Table 8.6 of Johnson and Wichern (2002). This data set contains the national track records for men in 55 countries. For each country, measurements on eight variables are reported, namely 100m, 200m, 400m, 800m, 1500m, 5000m, 10000m and Marathon. The first three variables (100m, 200m and 400m) are measured in seconds, while the other five variables are measured in minutes.
Table 3.6 contains the standard deviations of the eight measured variables. Because the variables have widely differing standard deviations, the PCA biplots constructed from the standardised and unstandardised measurements of the data set are expected to differ greatly. As these differences can be expected to be more substantial for the lower dimensional biplots, the axis predictivities, adequacies and overall qualities of the one-dimensional PCA biplots constructed from the standardised and unstandardised measurements of the National Track data, respectively, will be compared to illustrate some of the concepts discussed thus far.
Table 3.6: The standard deviations of the eight measured variables of the National Track data set.
100m 200m 400m 800m 1500m 5000m 10000m Marathon
0.3514 0.6446 1.4570 0.0637 0.1559 0.8012 1.8077 9.2270
Table 3.7: The axis predictivities corresponding to the one-dimensional PCA biplot constructed from the unstandardised measurements of the National Track data set.
100m 200m 400m 800m 1500m 5000m 10000m Marathon
0.2873 0.3736 0.5184 0.6676 0.7656 0.8811 0.9024 0.9994
Table 3.7 contains the axis predictivities of the biplot axes of the one-dimensional PCA biplot constructed from the unstandardised measurements of the National Track data set. The very high axis predictivity of the biplot axis representing the variable Marathon is attributable to the very large relative magnitude of that variable's standard deviation. Because the standard deviation of Marathon is extremely large relative to those of the other variables, Marathon dominates the first principal component; that is, the coefficient vector of the first principal component lies very close to the Cartesian axis which represents Marathon in the eight-dimensional measurement space. This is evident from the very large relative magnitude of the coefficient associated with Marathon in the first principal component (Table 3.8) as well as from the very high adequacy of the biplot axis representing Marathon relative to those of the other biplot axes (Table 3.9). Because the first principal component lies so close to the Cartesian axis which represents Marathon in the eight-dimensional measurement space, the representation of the measurements on Marathon in the one-dimensional PCA biplot will be very similar to the exact one-dimensional representation of those measurements.
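The effect described above can be sketched in code. The following is a minimal illustration, not the computation used to produce the tables: the function name `pca_biplot_measures` and the synthetic data are hypothetical, and the measures are computed under the usual definitions, which are assumed to agree with those given in the preceding sections (axis predictivity as the ratio of fitted to total sum of squares per variable, adequacy as the diagonal of V_r V_r', and overall quality as the proportion of total variation accounted for).

```python
import numpy as np


def pca_biplot_measures(X, r=1, standardise=False):
    """Return axis predictivities, adequacies, weights and overall quality
    of an r-dimensional PCA biplot (assumed standard definitions)."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                  # centre each column
    if standardise:
        Xc = Xc / Xc.std(axis=0, ddof=1)     # scale each column to unit variance
    # Rows of Vt are the principal component coefficient vectors.
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    Vr = Vt[:r].T                            # p x r matrix of the first r coefficient vectors
    X_hat = Xc @ Vr @ Vr.T                   # rank-r approximation of the centred data
    ss_total = (Xc ** 2).sum(axis=0)         # total sum of squares per variable
    axis_predictivity = (X_hat ** 2).sum(axis=0) / ss_total
    adequacy = (Vr ** 2).sum(axis=1)         # diagonal of Vr Vr'
    weights = ss_total / ss_total.sum()      # share of total variation per variable
    overall_quality = (s[:r] ** 2).sum() / (s ** 2).sum()
    return axis_predictivity, adequacy, weights, overall_quality


# Synthetic data (NOT the National Track data): three correlated variables,
# the third with a much larger standard deviation than the other two.
rng = np.random.default_rng(0)
n = 55
common = rng.normal(size=n)
hypothetical_sds = np.array([0.4, 1.5, 9.0])
X = (0.8 * common[:, None] + 0.6 * rng.normal(size=(n, 3))) * hypothetical_sds

pred_u, adeq_u, w_u, q_u = pca_biplot_measures(X, r=1)
pred_s, adeq_s, w_s, q_s = pca_biplot_measures(X, r=1, standardise=True)

print("unstandardised:", pred_u.round(3), adeq_u.round(3), round(q_u, 3))
print("standardised:  ", pred_s.round(3), adeq_s.round(3), round(q_s, 3))
```

In the unstandardised case the variable with the largest standard deviation should receive by far the largest adequacy and weight, mirroring the role of Marathon here; after standardisation the weights are equal.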
Table 3.8: The coefficients of the first principal component of the unstandardised National Track data set.
100m 200m 400m 800m 1500m 5000m 10000m Marathon
−0.02 −0.042 −0.111 −0.006 −0.014 −0.079 −0.181 −0.973
Table 3.9: The adequacies of the eight biplot axes in the one-dimensional PCA biplot constructed from the unstandardised measurements of the National Track data set.
100m 200m 400m 800m 1500m 5000m 10000m Marathon
0.000 0.002 0.012 0.000 0.000 0.006 0.033 0.946
Note that the zero adequacies in Table 3.9 are not truly zero; they only appear to be zero due to rounding. Recall that a truly zero adequacy implies a zero axis predictivity (see Section 3.4.1). The fact that none of the axis predictivities in Table 3.7 are equal to zero confirms that none of the adequacies are exactly zero. If the first principal axis and the Cartesian axis representing the variable Marathon in the eight-dimensional measurement space were collinear, then all seven of the other Cartesian axes would have been orthogonal to the one-dimensional PCA biplot space, and hence the adequacies of the corresponding seven biplot axes would have been zero. Since the first principal axis does not lie exactly in the direction of the Cartesian axis representing Marathon, the other seven Cartesian axes lie at angles close to, but not exactly, 90° to the one-dimensional biplot space and hence have adequacies close to, but not exactly, zero.
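The rounding argument can be checked numerically from the reported coefficients. The sketch below assumes that, in the one-dimensional case, the adequacy of a biplot axis equals the squared coefficient of the corresponding variable in the first principal component; this definition is not restated in this section, but it reproduces Table 3.9 to within the rounding of Table 3.8. The angle between each Cartesian axis and the one-dimensional biplot space is then the arc cosine of the absolute coefficient.

```python
import math

variables = ["100m", "200m", "400m", "800m", "1500m", "5000m", "10000m", "Marathon"]

# Coefficients of the first principal component (Table 3.8, unstandardised data).
pc1 = [-0.02, -0.042, -0.111, -0.006, -0.014, -0.079, -0.181, -0.973]
# Adequacies as reported in Table 3.9.
reported = [0.000, 0.002, 0.012, 0.000, 0.000, 0.006, 0.033, 0.946]

# Assumed definition for a one-dimensional biplot: adequacy = squared coefficient.
adequacy = [v ** 2 for v in pc1]
# Angle (in degrees) between each Cartesian axis and the first principal axis.
angle_deg = [math.degrees(math.acos(abs(v))) for v in pc1]

for name, a, rep, ang in zip(variables, adequacy, reported, angle_deg):
    print(f"{name:>8}  adequacy={a:.6f}  reported={rep:.3f}  angle={ang:.2f} deg")
```

Printed to six decimals, the adequacies of 100m, 800m and 1500m are small but strictly positive, and the corresponding angles fall just short of 90°.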
Table 3.10: The weights of the axis predictivities in the expression of the overall quality of the PCA biplot constructed from the unstandardised measurements of the National Track data set.
100m 200m 400m 800m 1500m 5000m 10000m Marathon
0.0013 0.0045 0.0231 0.0000 0.0003 0.0070 0.0356 0.9281
From the weights of the axis predictivities in the calculation of the overall quality of the PCA biplot, provided in Table 3.10, it is evident that the variable Marathon has a very large weight compared to the other variables. This is the result of the large relative magnitude of the standard deviation of Marathon. The very large weight of the axis predictivity of the biplot axis representing Marathon, together with the fact that the axis predictivity of this biplot axis is much higher than those of most of the other biplot axes, implies that the overall quality of the unstandardised biplot in Table 3.11 is overly optimistic.
Table 3.11: The overall qualities corresponding to the one-dimensional PCA biplots constructed from the unstandardised and standardised measurements of the National Track data respectively.
Unstandardised Standardised
0.98 0.83
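The unstandardised entry of Table 3.11 can be recovered from the earlier tables. The sketch below assumes that the weight of each axis predictivity is the variable's share of the total variance, that is, its squared standard deviation divided by the sum of the squared standard deviations; under this assumption the weights agree with Table 3.10 to four decimals.

```python
# Standard deviations of the eight variables (Table 3.6).
sd = [0.3514, 0.6446, 1.4570, 0.0637, 0.1559, 0.8012, 1.8077, 9.2270]
# Axis predictivities of the unstandardised one-dimensional biplot (Table 3.7).
predictivity = [0.2873, 0.3736, 0.5184, 0.6676, 0.7656, 0.8811, 0.9024, 0.9994]

# Assumed weights: each variable's share of the total variance.
variances = [s ** 2 for s in sd]
total_variance = sum(variances)
weights = [v / total_variance for v in variances]

# Overall quality as the weighted sum of the axis predictivities.
overall_quality = sum(w * p for w, p in zip(weights, predictivity))

print([round(w, 4) for w in weights])
print(round(overall_quality, 2))
```

Because Marathon carries roughly 93% of the weight and has an axis predictivity of 0.9994, the weighted sum is driven almost entirely by that single variable.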
When the PCA biplot is constructed from the standardised measurements of the National Track data set, the first principal component is not dominated by the variable Marathon; in fact, the eight variables contribute almost equally to the first principal component. This is evident from the coefficients of the first principal component provided in Table 3.12 as well as from the adequacies of the eight biplot axes provided in Table 3.13.
Table 3.12: The coefficients of the first principal component of the standardised National Track data set.
100m 200m 400m 800m 1500m 5000m 10000m Marathon
−0.318 −0.337 −0.356 −0.369 −0.373 −0.364 −0.367 −0.342
Table 3.13: The adequacies of the eight biplot axes of the one-dimensional PCA biplot constructed from the standardised measurements of the National Track data set.
100m 200m 400m 800m 1500m 5000m 10000m Marathon
0.1008 0.1136 0.1265 0.1359 0.1390 0.1328 0.1345 0.1169
Table 3.14: The axis predictivities of the eight biplot axes of the one-dimensional PCA biplot constructed from the standardised measurements of the National Track data set.
100m 200m 400m 800m 1500m 5000m 10000m Marathon
0.6678 0.7520 0.8376 0.9001 0.9204 0.8792 0.8908 0.7742
Table 3.14 contains the axis predictivities of the eight biplot axes of the one-dimensional PCA biplot constructed from the standardised measurements of the National Track data set. Since the eight standardised variables carry equal weights in the calculation of the overall quality of the biplot, the overall quality provided in Table 3.11 is equal to the arithmetic average of the eight axis predictivities in Table 3.14.
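This equality can be verified directly from the reported values. The short check below uses only the axis predictivities of Table 3.14 and assumes equal weights of 1/8, as stated above.

```python
# Axis predictivities of the standardised one-dimensional biplot (Table 3.14).
predictivity_std = [0.6678, 0.7520, 0.8376, 0.9001, 0.9204, 0.8792, 0.8908, 0.7742]

# With equal weights the overall quality is the arithmetic mean.
overall_quality_std = sum(predictivity_std) / len(predictivity_std)

print(f"{overall_quality_std:.2f}")  # 0.83, the standardised entry of Table 3.11
```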