An important step in a PCA/FA is to interpret what the extracted PCs actually mean with reference to the problem or hypothesis posed. The first stage of this analysis involves determining which variables load significantly onto each PC. A simple procedure would be to accept as significant any variable whose loading was larger than a certain value, for example, > 0.30 or > 0.50; but this is an arbitrary procedure and does not take into account sample size. A more rigorous method is to test the loadings statistically using Stevens' method (Norman & Streiner, 1994), which gives a critical value for a sample of size N.
Hence, any variable whose factor loading exceeds this critical value may be regarded as being significantly correlated with a PC.
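The comparison of loadings against a critical value can be sketched in a few lines. The expression for the critical value does not survive in this copy of the text, so the sketch assumes the form commonly attributed to Norman and Streiner's account of Stevens' method, CV = 5.152 / sqrt(N - 2); this is an assumption and should be checked against the cited source before use. The function names and the example loadings are hypothetical.

```python
import math

def stevens_critical_value(n: int) -> float:
    """Critical loading for a sample of size n.

    ASSUMPTION: uses CV = 5.152 / sqrt(n - 2); the formula is not
    given in the surviving text and should be verified.
    """
    if n <= 2:
        raise ValueError("sample size must exceed 2")
    return 5.152 / math.sqrt(n - 2)

def significant_loadings(loadings: dict, n: int) -> dict:
    """Keep only the variables whose absolute loading exceeds the critical value."""
    cv = stevens_critical_value(n)
    return {name: value for name, value in loadings.items() if abs(value) > cv}

# Hypothetical loadings of three variables on one PC, with N = 102.
example = {"var_a": 0.72, "var_b": -0.35, "var_c": 0.58}
print(stevens_critical_value(102))
print(significant_loadings(example, 102))
```

Because the critical value shrinks as N grows, a fixed cutoff such as 0.30 would be too lenient for small samples and too strict for large ones, which is the weakness of the arbitrary procedure noted above.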
28.5.7 Interpretation
In the present application of PCA/FA, bacterial strains are the variables (Q-type analysis), and the result is a scatter plot of the strains in relation to the extracted PC. The objective is twofold: (1) to describe the pattern of variation between bacterial strains and (2) to identify those features of the DNA profiles that best correlate with the distribution of the strains.
The matrix of correlations between the MRSA strains is shown in Table 28.1. The majority of the correlations exceed 0.30, suggesting that the data are suitable for PCA/FA.
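The suitability check described here amounts to counting how many off-diagonal correlations exceed 0.30. A minimal sketch follows; the function name `fraction_exceeding` and the small demonstration matrix are hypothetical, and the use of absolute values (so that strong negative correlations also count) is an assumption not stated in the text.

```python
import numpy as np

def fraction_exceeding(R, threshold=0.30):
    """Fraction of off-diagonal correlations whose absolute value exceeds threshold."""
    R = np.asarray(R, dtype=float)
    # Indices of the upper triangle, excluding the diagonal of ones.
    upper = np.triu_indices(R.shape[0], k=1)
    return float(np.mean(np.abs(R[upper]) > threshold))

# Hypothetical 3 x 3 correlation matrix (not the data of Table 28.1).
R_demo = np.array([
    [1.0, 0.5, 0.2],
    [0.5, 1.0, 0.4],
    [0.2, 0.4, 1.0],
])
frac = fraction_exceeding(R_demo)
print(f"{frac:.2f} of the off-diagonal correlations exceed 0.30 in absolute value")
```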
Two PCs were extracted from the data, accounting for approximately 93% of the total variance in the data (Table 28.2). Hence, reducing the original 10D frame to 2D has resulted in the loss of approximately 7% of the original spatial information. A plot of the bacterial strains in relation to PC1 and PC2 is shown in Figure 28.2. The plot suggests that the strains from wells I/J, C/H, and A/B/E form three groups, the members of each group being identical according to band distances. Furthermore, D is the strain most closely related to E/B/A and F is most closely related to J/I. Strain G appears to be the least related to the others. In addition, the correlations between the band distances for each strain and the factor loadings of the strains on PC1 and PC2 are shown in Table 28.3. Bands 4 and 14 were positively correlated with PC1 and bands 11 and 12 negatively correlated with PC1.
Hence, these are the DNA band distances that are the most important in determining the clustering of the strains. Moreover, band 13 was negatively correlated with PC2. The interpretation of the PCA/FA data in this example is in close agreement with that of the dendrogram analysis described in Statnote 27. However, PCA has a number of advantages over classification: (1) no assumptions are made that the data are actually classifiable; (2) the relationship between strains and clusters of strains is displayed spatially, which facilitates discussion of the implications of the analysis; and (3) the analysis identifies the criteria, in this case the band distances, that best differentiate between the strains.
TABLE 28.1 Simple Correlation Matrix (Pearson's Correlation Coefficient r) between the
a The majority of the correlations exceed 0.30, suggesting that the data are suitable for PCA/FA.
Figure 28.2. Resulting pulsed-field gel electrophoresis (PFGE) patterns of the eight SmaI genomic digests of methicillin-resistant S. aureus (MRSA); wells 3 and 8 carry an SmaI chromosomal digest from S. aureus strain NCTC 8325 as a control and molecular weight marker plotted in relation to PC1 and PC2. the Bacterial Strains and the Extracted Principal Components (PC) and Percentage of Total Variance Explained by Each PC
TABLE 28.3 Correlations between Band Distances and Factor Loadings of Bacterial Strains on PC1 and PC2

        Extracted PCs
Band    PC1        PC2
1       0.26       0.54
2       0.51       −0.08
3       0.39       −0.18
4       0.66a      0.40
5       −0.13      0.32
6       −0.04      0.13
7       −0.16      0.28
8       −0.21      0.38
9       −0.42      0.14
10      −0.57      0.22
11      −0.69a     0.26
12      −0.80a     0.55
13      −0.31      −0.96b
14      0.88b      −0.44

a P < 0.01.
b P < 0.001.
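The steps of this section (correlating the strains, extracting PCs, reporting the variance explained, and correlating each band with the strain loadings) can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: the data are random numbers standing in for a 14-band by 10-well matrix of band distances, PCs are taken from an eigendecomposition of the strain correlation matrix, and the function names are hypothetical.

```python
import numpy as np

def pca_from_correlation(X, n_components=2):
    """Q-type PCA: columns of X (strains) are the variables, rows are bands."""
    R = np.corrcoef(X, rowvar=False)          # strain-by-strain correlations
    eigvals, eigvecs = np.linalg.eigh(R)       # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]          # reorder to descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    explained = eigvals / eigvals.sum()        # proportion of total variance per PC
    # Loadings: eigenvectors scaled by the square root of their eigenvalues.
    scale = np.sqrt(np.clip(eigvals[:n_components], 0.0, None))
    loadings = eigvecs[:, :n_components] * scale
    return R, explained, loadings

def band_loading_correlations(X, loadings):
    """Correlate each band's distances across strains with the strain loadings."""
    out = np.empty((X.shape[0], loadings.shape[1]))
    for i in range(X.shape[0]):
        for j in range(loadings.shape[1]):
            out[i, j] = np.corrcoef(X[i], loadings[:, j])[0, 1]
    return out

# Hypothetical data: 14 bands (rows) by 10 wells (columns); NOT the MRSA data.
rng = np.random.default_rng(0)
X = rng.normal(size=(14, 10))

R, explained, loadings = pca_from_correlation(X, n_components=2)
band_corr = band_loading_correlations(X, loadings)
print("variance explained by PC1 and PC2:", explained[:2].sum())
print("band-by-PC correlation table shape:", band_corr.shape)
```

With real band distances, plotting the two columns of `loadings` against each other would give a strain scatter plot of the kind described for Figure 28.2, and `band_corr` would have the same layout as Table 28.3.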
28.6 CONCLUSION
PCA/FA are methods of analyzing complex data sets in which there are no clearly defined X or Y variables. They have multiple uses, including the study of the pattern of variation between individual entities such as bacterial strains and the detailed study of descriptive variables. In most applications, variables are related to a smaller number of factors or PCs that account for the maximum variance in the data and, hence, may explain important trends among the variables. No assumptions are made before the analysis that the variables can actually be classified, and this may be a considerable advantage in the analysis of more complex data sets in which DNA band data among strains may be more continuously distributed.
REFERENCES
(ANOVA) with special reference to data from clinical experiments in optometry. Ophthal Physiol Opt 20: 235–241.
Armstrong, R. A., Eperjesi, F., & Gilmartin, B. (2002a). The application of analysis of variance (ANOVA) to different experimental designs in optometry. Ophthal Physiol Opt 22: 1–9.
Armstrong, R. A., Cairns, N. J., Ironside, J. W., & Lantos, P. L. (2002b). Quantification of vacuolation ("spongiform change"), surviving neurons and prion protein deposition in eleven cases of variant Creutzfeldt–Jakob disease. Neuropathol Appl Neurobiol 28: 129–135.
Armstrong, R. A., Cairns, N. J., Ironside, J. W., & Lantos, P. L. (2002c). Laminar distribution of the pathological changes in the cerebral cortex in variant Creutzfeldt–Jakob disease (vCJD). Folia Neuropathol 40: 165–171.
Bannerman, T. L., Hancock, G. A., Tenover, F. C., & Miller, J. M. (1995). Pulsed-field gel electrophoresis as a replacement for bacteriophage typing of Staphylococcus aureus. J Clin Microbiol 33: 551–555.
Statistical Analysis in Microbiology: Statnotes, Edited by Richard A. Armstrong and Anthony C. Hilton Copyright © 2010 John Wiley & Sons, Inc.
electrophoresis protocols for epidemiological typing of strains of methicillin-resistant Staphylococcus aureus: A single approach developed by consensus in 10 European laboratories and its application for tracing the spread of related strains. J Clin Microbiol 41: 1574–1585.
Scheffé, H. (1959). The Analysis of Variance. Wiley, New York.
Smith, S. N., Armstrong, R. A., & Rimmer, J. J. (1984). Influence of environmental factors on zoospores of Saprolegnia diclina. Trans Br Mycol Soc 82: 413–421.
Snedecor, G. W. & Cochran, W. G. (1980). Statistical Methods, 7th ed. Iowa State University Press, Ames, IA.
Spearman, C. (1904). The proof and measurement of association between two things. Am J Psychol 15: 72–101.
Spjotvoll, E. & Stoline, M. R. (1973). An extension of the t-method of multiple comparisons to include cases with unequal sample sizes. J Am Stat Assoc 69: 975–979.
Tabachnick, B. G. & Fidell, L. S. (1989). Using Multivariate Statistics, 2nd ed. Harper and Row, New York.
Wilcoxon, F. (1945). Individual comparisons by ranking methods. Biomet Bull 1: 80–83.
Will, R. G., Ironside, J. W., Zeidler, M., Cousans, S. N., Estebeiro, K., Alperovitch, A., Poser, S., Pocchiari, M., Hofman, A., & Smith, P. G. (1996). A new variant of Creutzfeldt–Jakob disease in the United Kingdom. Lancet 347: 921–925.
Appendix 1
WHICH TEST TO USE: TABLE
The first column in the following table lists the type of data to be analyzed and the second and third columns the recommended parametric and nonparametric statistical procedures, respectively, that could be applied to the respective data. The sections of the various statnotes that describe the statistical tests are given in boldface in parentheses.
An alternative method of selecting the correct test using a taxonomic key is presented in Appendix 2.
Form of the Data | Possible Statistical Procedures: Parametric | Possible Statistical Procedures: Nonparametric
A single observation x | Is x a member of a specific population (2.5)? | -
A sample of x values | Construct frequency distribution, calculate x*, SD, SEM, CI (2.4). Is X normally distributed? (1) | Mode, median, 95th percentile (4)
Two independent samples (x1, x2) | Unpaired t test (3.4) | Mann–Whitney U test (4.7)
Two paired samples (x1 − x2) | Paired t test (3.6) | Wilcoxon signed-rank test (4.8)
Two sets of measurements using two methods | Test of agreement: Bland and Altman (16) | -
Appendix 2
WHICH TEST TO USE: KEY
A taxonomic key for the identification of the correct statistical procedure. This is an alternative method of finding the correct test and relies on following a key analogous to those used in taxonomy for the identification of bacteria and fungi. Starting at (1), decide which of the alternative statements applies to the data and then follow the steps shown by the numbers in parentheses until a statnote (in bold) and an appropriate test (in italics) is indicated.
1. The data comprise frequencies, that is, counts of specific events. (2)
   The data comprise scores, for example, abundance of microorganisms on a five-point scale. (5)
   The data are measurements, that is, continuous variables measured in units. (8)
2. Objective is a test of normality: Statnote 1, Goodness of fit test χ² test, KS test.
   Objective is comparison of two or more frequencies of a single variable: Statnote 1, Goodness of fit test χ² test, KS test.
   Objective is comparison of two or more frequencies comprising two variables. (3)
3. The data comprise a 2 × 2 contingency table. (4)
   The data comprise more than two rows and columns: Statnote 5, R × C contingency table, χ² test.
4. Expected frequencies are > 5: Statnote 5, 2 × 2 contingency table, χ² test.
   Expected frequencies are < 5: Statnote 5, Fisher's 2 × 2 exact test.
5. Data comprise a single set of scores of one variable: Statnote 4, Median, mode,