Orthogonal Rotation of Principal Components
Jolliffe (1995) discusses the effects of the orthogonal rotation of principal components and shows why it is not possible to preserve rotated components which are pairwise uncorrelated and/or whose loadings are orthogonal. Consider the mean centred or standardized data sample $X$; its principal components are given by
\[ Y = XU, \]
using the spectral decomposition of the covariance matrix of $X$ (Section 1.1.1). Taking the first $k$ components and treating the remaining components as residual error, $e$, a PC factor model can be written
\[ X = Y_k U_k' + e, \tag{1.14} \]
and the covariance matrix of $X$ is modelled as
\[ \Sigma = U_k Y_k' Y_k U_k' + \Psi, \]
$\Psi$ denoting a diagonal matrix of residual variance. As $Y_k$ are principal components,
\[ Y_k' Y_k = \Delta_k, \]
which is a diagonal matrix of the first $k$ eigenvalues of $\Sigma$ in descending order of magnitude. Then,
\[ \Sigma = U_k \Delta_k U_k' + \Psi. \]
Notice that the factors $Y_k$ are uncorrelated and the principal component loading vectors are orthogonal, as $Y_k' Y_k = \Delta_k$ and $U_k' U_k = I$. As mentioned earlier, the model is invariant to an orthogonal rotation of the principal axes. Let $R$ be an orthogonal rotation, then
\[ R'R = RR' = I \]
and the model becomes
\[ X = (Y_k R)(U_k R)' + e \]
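which is equivalent to (1.14), since $(Y_k R)(U_k R)' = Y_k R R' U_k' = Y_k U_k'$. However, the factors will no longer remain uncorrelated, as
\[ (Y_k R)'(Y_k R) = R' Y_k' Y_k R = R' \Delta_k R, \]
which is not diagonal in general. The factor loadings will remain orthogonal,
\[ (U_k R)'(U_k R) = R' U_k' U_k R = I. \]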
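The two identities above can be checked numerically. The following is a minimal sketch rather than a definitive implementation: the data are simulated, the variable names are illustrative, and it assumes that $\Sigma$ is taken as $X'X$ (without a $1/(n-1)$ factor) so that $Y_k' Y_k = \Delta_k$ holds exactly.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 200 observations of 5 correlated variables, mean centred.
X = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))
X = X - X.mean(axis=0)

# Assumption: Sigma = X'X (no 1/(n-1) factor), so that Y_k'Y_k = Delta_k exactly.
Sigma = X.T @ X
eigvals, U = np.linalg.eigh(Sigma)          # eigh returns ascending eigenvalues
order = np.argsort(eigvals)[::-1]           # reorder to descending
eigvals, U = eigvals[order], U[:, order]

k = 3
U_k = U[:, :k]                              # loadings
Y_k = X @ U_k                               # component scores
Delta_k = np.diag(eigvals[:k])

# An arbitrary orthogonal R (Q factor of a random matrix); it may include a reflection.
R, _ = np.linalg.qr(rng.normal(size=(k, k)))

scores_gram = (Y_k @ R).T @ (Y_k @ R)       # compare with R' Delta_k R
loadings_gram = (U_k @ R).T @ (U_k @ R)     # compare with I


def max_off_diagonal(M):
    """Largest absolute off-diagonal entry of a square matrix."""
    return np.abs(M - np.diag(np.diag(M))).max()


print("rotated scores, max off-diagonal:  ", max_off_diagonal(scores_gram))
print("rotated loadings, max off-diagonal:", max_off_diagonal(loadings_gram))
```

For a generic $R$ the first printed value is far from zero, while the second is zero up to rounding error.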
In practice the factors are usually standardized, which causes the factors to remain uncorrelated after an orthogonal rotation (the loadings become non-orthogonal). This practice is criticized in the literature, as standardization effectively stretches the scores to sit on a hypersphere, so that no position of the axes will induce correlation.
To standardize the factors, let $Z = Y \Delta^{-1/2}$, and the factor model becomes
\[ X = Z_k \Delta_k^{1/2} U_k' + e \quad \text{and} \quad \Sigma = U_k \Delta_k^{1/2} Z_k' Z_k \Delta_k^{1/2} U_k' + \Psi. \]
Now, $Z_k' Z_k = I_k$ and the factor loadings are $U_k \Delta_k^{1/2}$. So after an orthogonal rotation of the axes, the factors remain uncorrelated, as
\[ (Z_k R)'(Z_k R) = I_k, \]
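but the loadings are no longer orthogonal, as
\[ (U_k \Delta_k^{1/2} R)'(U_k \Delta_k^{1/2} R) = R' \Delta_k^{1/2} U_k' U_k \Delta_k^{1/2} R = R' \Delta_k R, \]
which is not diagonal in general.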
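The standardized case can be checked in the same way. This is again only a sketch under stated assumptions: simulated data, illustrative variable names (`Z_k`, `L_k`), and $\Sigma$ taken as $X'X$ so that the standardized scores satisfy $Z_k' Z_k = I_k$.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: 150 observations of 4 correlated variables, mean centred.
X = rng.normal(size=(150, 4)) @ rng.normal(size=(4, 4))
X = X - X.mean(axis=0)

# Assumption: Sigma = X'X (no 1/(n-1) factor), matching Y_k'Y_k = Delta_k.
Sigma = X.T @ X
eigvals, U = np.linalg.eigh(Sigma)
order = np.argsort(eigvals)[::-1]           # descending eigenvalues
eigvals, U = eigvals[order], U[:, order]

k = 3
U_k = U[:, :k]
Delta_k = np.diag(eigvals[:k])
sqrt_vals = np.sqrt(eigvals[:k])

Z_k = (X @ U_k) / sqrt_vals                 # standardized scores, Y_k Delta_k^{-1/2}
L_k = U_k * sqrt_vals                       # loadings, U_k Delta_k^{1/2}

# An arbitrary orthogonal R (Q factor of a random matrix).
R, _ = np.linalg.qr(rng.normal(size=(k, k)))

scores_gram = (Z_k @ R).T @ (Z_k @ R)       # compare with I_k
loadings_gram = (L_k @ R).T @ (L_k @ R)     # compare with R' Delta_k R

print(np.round(scores_gram, 6))
print(np.round(loadings_gram, 3))
```

Here the rotated scores stay uncorrelated for any orthogonal $R$, and the loss of orthogonality moves to the loadings.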
In the literature, highly correlated factors are taken to mean that the factors should really be one single factor. Oblique rotations will better align with natural variable clusters and for this reason are recommended. However, in certain circumstances it may be useful to obtain orthogonal factors which are highly correlated, for example when groups of correlated components differentiate to describe a latent trait in different ways. Alternatively, given a configuration of points from an analysis such as multidimensional scaling (MDS), where the axes are arbitrary, it would be useful to rotate the configuration so that the correlation or covariance between the latent variables is maximized while keeping the axes orthogonal. In this way the latent variables could be displayed on a parallel coordinate plot. The axes remain independent, but the plot becomes easier to interpret as groups differentiate and the number of cross-overs on the plot is minimized. These applications are investigated in Chapter 3.
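As a small worked case of the induced covariance, consider $k = 2$ with unstandardized factors, so that $Y_2' Y_2 = \Delta_2 = \mathrm{diag}(\lambda_1, \lambda_2)$. This is an illustration only, not the method developed in Chapter 3, and the rotation angle $\theta$ is a symbol introduced here for the example.

```latex
\[
R = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix},
\qquad
R' \Delta_2 R =
\begin{pmatrix}
\lambda_1\cos^2\theta + \lambda_2\sin^2\theta & -\tfrac{1}{2}(\lambda_1 - \lambda_2)\sin 2\theta \\
-\tfrac{1}{2}(\lambda_1 - \lambda_2)\sin 2\theta & \lambda_1\sin^2\theta + \lambda_2\cos^2\theta
\end{pmatrix}.
\]
```

The off-diagonal term is largest in magnitude at $\theta = \pm 45^\circ$, where both rotated factors have variance $(\lambda_1 + \lambda_2)/2$ and their correlation has magnitude $(\lambda_1 - \lambda_2)/(\lambda_1 + \lambda_2)$.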
Oblique Rotation of Principal Components
As mentioned briefly in the last section, an oblique rotation will better align the factors with natural variable clusters. One application, identifying variable clusters, is discussed in Section 2.9, which looks at a method to obtain a simple interpretation of a large data set. A brief overview of oblique rotations is given here for reference.
Firstly, an oblique rotation relaxes the requirement that the axes are orthogonal, and so finding oblique axes is more akin to regression; Basilevsky gives the details. If both the variables and factors are standardized to unit length and $G = \{g_1, g_2, \ldots, g_k\}$ represents the oblique basis, then
\[ \hat{\Sigma} = B \Phi B' + \Psi \]
and
\[ \Phi = G'G, \]
which is the correlation matrix of the oblique axes. $B$ is described in terms of an ordinary least squares projection of the data $X$,
\[ B' = (G'G)^{-1} G' X, \tag{1.15} \]
and so the estimate of $X$ is
\[ \hat{X} = G B' = G (G'G)^{-1} G' X = P_G X, \]
where $P_G$ is an idempotent, symmetric projection matrix. From (1.15),
\[ \Phi B' = G'X. \]
$G'X$ is the correlation matrix of the variables and the oblique components, called the matrix of correlation loading coefficients. $B$ are the regression coefficients and represent the coordinates of $X$ with respect to the oblique components $G$. $G$ is not unique, and a further constraint is required to define the oblique basis. Criteria such as oblimin provide this and, in a similar way to the varimax criterion, guide the axes into alignment with the variables.
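The projection relations above can be illustrated numerically. The sketch below does not implement oblimin or any other rotation criterion: the oblique basis $G$ is simply a hypothetical pair of correlated, unit-length vectors built from overlapping groups of variables, and all names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 100, 6

# Hypothetical data, centred and scaled so each variable has unit length.
X = rng.normal(size=(n, p))
X = X - X.mean(axis=0)
X = X / np.linalg.norm(X, axis=0)

# Hypothetical oblique basis: two vectors built from overlapping variable
# groups (they share column 2), then scaled to unit length.
G = np.column_stack([X[:, :3].sum(axis=1), X[:, 2:].sum(axis=1)])
G = G / np.linalg.norm(G, axis=0)

Phi = G.T @ G                               # correlation matrix of the oblique axes
B_T = np.linalg.solve(Phi, G.T @ X)         # B' = (G'G)^{-1} G'X, equation (1.15)
P_G = G @ np.linalg.solve(Phi, G.T)         # projection matrix onto span(G)
X_hat = G @ B_T                             # estimate of X

print("Phi =\n", np.round(Phi, 3))
print("idempotent:", np.allclose(P_G @ P_G, P_G))
print("symmetric: ", np.allclose(P_G, P_G.T))
print("Phi B' = G'X:", np.allclose(Phi @ B_T, G.T @ X))
```

Using `np.linalg.solve` rather than forming $(G'G)^{-1}$ explicitly is a numerical convenience and does not change the quantities being computed.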
Chapter 3 explores the case where axes can be found which remain orthogonal, but where the induced correlation between factors may provide groups of axes which, although correlated, describe different aspects of a latent trait.