
Factor analysis aims to describe the covariance structure of the data variables in terms of a few underlying, independent, but non-measurable features called factors. It assumes that variables can be grouped according to their correlation. If the variables in a particular group are highly correlated among themselves and less correlated with those of another highly correlated group, it is possible to conclude that each group represents an underlying structure, or factor, that explains the observed correlation.

The factor analysis model assumes that the elements of the D-dimensional feature vector x (with mean μ and covariance matrix Σ) are expressed as a function of d non-measurable variables F_j (d < D), called the common factors, and D error terms ε_1, ε_2, ..., ε_D (one error term for each variable in x), by the equation

$$\mathbf{x} - \boldsymbol{\mu} = \mathbf{L}\mathbf{F} + \boldsymbol{\varepsilon} \qquad (3.2)$$

L is the D × d matrix of factor loadings, whose element l_ij is the loading of the i-th variable on the j-th factor. To allow the representation of the data in a lower-dimensional space, the model is usually constrained to be orthogonal, such that F and ε have zero means and are mutually independent, F has an identity covariance matrix, and ε has a diagonal covariance matrix (Cov(ε) = Ω). By substituting these constraints in (3.2), and with some algebraic manipulation, the D × D covariance matrix of the data may be written as

$$\operatorname{Cov}(\mathbf{x}) = \boldsymbol{\Sigma} = \mathbf{L}\mathbf{L}' + \boldsymbol{\Omega} \qquad (3.3)$$
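The "algebraic manipulation" behind (3.3) can be sketched as follows. This is a reconstruction under the stated constraints only (zero means, F and ε mutually independent, Cov(F) = I, Cov(ε) diagonal); the symbols μ, Σ, ε and Ω are the reading assumed here for the mean, the data covariance, the error vector and the error covariance.

```latex
\begin{aligned}
\operatorname{Cov}(\mathbf{x})
  &= E\!\left[(\mathbf{x}-\boldsymbol{\mu})(\mathbf{x}-\boldsymbol{\mu})'\right]
   = E\!\left[(\mathbf{L}\mathbf{F}+\boldsymbol{\varepsilon})(\mathbf{L}\mathbf{F}+\boldsymbol{\varepsilon})'\right] \\
  &= \mathbf{L}\,E[\mathbf{F}\mathbf{F}']\,\mathbf{L}'
   + \mathbf{L}\,E[\mathbf{F}\boldsymbol{\varepsilon}']
   + E[\boldsymbol{\varepsilon}\mathbf{F}']\,\mathbf{L}'
   + E[\boldsymbol{\varepsilon}\boldsymbol{\varepsilon}'] \\
  &= \mathbf{L}\,\mathbf{I}\,\mathbf{L}' + \mathbf{0} + \mathbf{0} + \boldsymbol{\Omega}
   = \mathbf{L}\mathbf{L}' + \boldsymbol{\Omega}.
\end{aligned}
```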

In this form it is easy to show that the columns of L are the eigenvectors of Σ, each scaled by the square root of the corresponding eigenvalue. If all eigenvalues are considered, LL′ is an exact (D × D) representation of Σ with Ω = 0. If, however, only the d largest eigenvalues (and their corresponding eigenvectors) are selected to approximate Σ, then Ω becomes a matrix of residual errors due to the information lost with the dropped D − d components. This suggests a way of choosing the number of factors: keep as many as are needed for Ω to remain small.
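The eigenvector construction described above can be illustrated with a minimal NumPy sketch. It assumes the loadings are taken directly from the eigen-decomposition of a sample covariance matrix; the function names `pc_loadings` and `residual` and the synthetic data are hypothetical, introduced only for illustration, and the residual computed here is the full matrix Σ − LL′ rather than a fitted diagonal error covariance.

```python
import numpy as np

def pc_loadings(cov, d):
    """Loadings from the d largest eigenpairs: eigenvector * sqrt(eigenvalue)."""
    eigvals, eigvecs = np.linalg.eigh(cov)       # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:d]        # indices of the d largest eigenvalues
    lam = np.clip(eigvals[order], 0.0, None)     # guard against tiny negative round-off
    return eigvecs[:, order] * np.sqrt(lam)      # D x d loading matrix

def residual(cov, loadings):
    """Residual matrix left after approximating cov by L L'."""
    return cov - loadings @ loadings.T

# Synthetic 5-variable data set (an assumption, not data from the text).
rng = np.random.default_rng(0)
data = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))
sigma = np.cov(data, rowvar=False)

# Frobenius norm of the residual for d = 1, ..., D retained factors.
norms = [np.linalg.norm(residual(sigma, pc_loadings(sigma, d))) for d in range(1, 6)]
for d, nrm in enumerate(norms, start=1):
    print(d, nrm)
```

With all D eigenpairs retained the residual vanishes up to round-off, and it shrinks as d grows, which is the behaviour the paragraph uses to motivate the choice of d.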

Sometimes estimates of the factors themselves may be required. If the D-dimensional data are represented by d factors (d < D), F is a d-dimensional random vector which is a linear function of the data. Each observation in the data generates its corresponding value of the random vector F, which can be estimated by least squares methods (Lawley and Maxwell, 1971), so for sample k,

$$\mathbf{F}_k = (\mathbf{L}'\mathbf{L})^{-1}\mathbf{L}'(\mathbf{x}_k - \boldsymbol{\mu}) \qquad (3.4)$$

Equation 3.4 assumes that the previously estimated parameters L and Ω are the true values.
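A minimal sketch of the least squares score estimate follows, assuming the unweighted form F_k = (L′L)⁻¹L′(x_k − μ) and treating the loadings and mean as known. The function name `factor_scores` and the noise-free synthetic data are hypothetical and serve only to check the algebra.

```python
import numpy as np

def factor_scores(X, L, mu):
    """Least squares scores, one row per sample: F_k = (L'L)^{-1} L'(x_k - mu)."""
    centered = X - mu                                   # n x D, mean removed
    # Solve (L'L) F' = L' (X - mu)' rather than forming the inverse explicitly.
    return np.linalg.solve(L.T @ L, L.T @ centered.T).T # n x d

# Synthetic check (an assumption): data generated from known factors, no error term.
rng = np.random.default_rng(1)
D, d, n = 6, 2, 50
L = rng.normal(size=(D, d))          # loading matrix, assumed known
mu = rng.normal(size=D)              # mean vector, assumed known
F_true = rng.normal(size=(n, d))     # factor values used to generate the data
X = mu + F_true @ L.T                # x_k = mu + L F_k

F_hat = factor_scores(X, L, mu)
print(np.abs(F_hat - F_true).max())
```

In this noise-free setting the estimate recovers the generating factors up to round-off; with a non-zero error term the scores are only the least squares fit to each observation.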

It is important to underline the assumption that the factor model (3.2) is linear in the data. If some variables in x are related but their relationship is nonlinear, the representation in (3.2) becomes inaccurate (discriminant analysis offers a way around this limitation; see section 3.4.3). Another problem with factor analysis is the occasional non-existence of a solution. Equation 3.2 is a set of linear equations from which the loading parameters l_ij need to be calculated. Johnson and Wichern (1992) give an example in which a valid solution for 3.2 does not exist. Moreover, if a valid solution for 3.2 exists, linear transformations of L are also solutions. The transformed version provides an equally valid representation of Σ, and the resulting factors have the same statistical characteristics. This ambiguity, resulting from the non-uniqueness of the solution, is demonstrated in Fig. 3.6 (from Hanaoka et al, 1993). The original result is shown together with a rotated version of it. The rotated version provides the same statistics as the original but gives the second factor a greater role in the discrimination by placing the adenocarcinomas and most of the squamous cell carcinomas in opposite quadrants. However, the loadings on each factor are different. Although factor rotation is often used, interpretations of the data based on rotated factors should be considered with care.

Footnote (orthogonality constraints): These constraints thus suggest a model which linearly transforms the data onto an orthogonal space spanned by the common factors F_j, j = 1, ..., d, in a way similar to principal components. For details on the relationship between factor analysis and principal components see Johnson and Wichern (1992).

Footnote (estimated loadings): The matrix of loadings in (3.2) is often written as L̂ to indicate that it is itself an estimate.

Footnote (valid solution): By valid we mean a solution that has positive Ω_i values and l_ij values less than one.

Chapter 3: Pattern recognition in MRS 46

Mathematically, L and any of its linear transformations are equivalent, but any causal association in the resulting factors or their loadings becomes questionable.
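The rotation ambiguity can be checked numerically. The sketch below assumes a two-factor model and an orthogonal (rotation) transformation, which is the case relevant to rotated factor axes; the loading values and the function name `rotate` are hypothetical.

```python
import numpy as np

def rotate(L, theta):
    """Rotate a D x 2 loading matrix by angle theta (radians)."""
    c, s = np.cos(theta), np.sin(theta)
    T = np.array([[c, -s], [s, c]])   # orthogonal: T @ T.T is the identity
    return L @ T

# Hypothetical loadings of 5 variables on 2 factors.
rng = np.random.default_rng(2)
L = rng.uniform(-1.0, 1.0, size=(5, 2))
L_rot = rotate(L, np.deg2rad(30.0))

# The modelled covariance L L' is unchanged by the rotation ...
same_cov = np.allclose(L @ L.T, L_rot @ L_rot.T)
# ... although the individual loadings, and hence their interpretation, differ.
same_loadings = np.allclose(L, L_rot)
print(same_cov, same_loadings)
```

Because (LT)(LT)′ = LTT′L′ = LL′ for any orthogonal T, the data cannot distinguish between the original and the rotated loadings, so any reading of individual loadings depends on the chosen rotation.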

Figure 3.6. In factor analysis, rotation of the factor axes does not change the statistics, only the interpretation of how the variables load on each factor. The solid axes are the original results of Hanaoka et al (1993). The dashed axes are a rotation of the original that offers another, equally valid, interpretation of the variable loadings. Plotted groups: large cell carcinoma, small cell carcinoma, squamous cell carcinoma and adenocarcinoma. From Hanaoka et al (1993) © 1993 Williams & Wilkins. Reproduced with permission.

For classification purposes, it is possible to keep rotating the factor axes until maximum separation between the groups is obtained, in which case the factor scores become equivalent to a discriminant function (Morvan et al, 1990; Chabrol et al, 1995; Hagberg et al, 1995). Fluge et al (1996) attempted to find a cancer factor by analysing the methyl and methylene regions (0.85-1.25 ppm) of proton spectra of serum from controls and patients with neoplastic adenoma and colorectal cancer. They concluded that, contrary to Fossel et al (1986), MRS (and hence the subsequent analysis) cannot be used to detect colorectal cancer at an early stage. However, they did not completely rule out the presence of a cancer factor in MRS of serum, but emphasised that the detection and identification of such a factor will depend on the analysis procedure as well as the spectroscopy technique. Howells et al (1992a) addressed the relationship between the factors and the eigen-structure, interpreting the loading on the first factor as an average spectrum, while subsequent loadings described variations between samples (Fig 3.7). This seems consistent with Ripley's (1996) proposal that the first principal component is an overall measure of the biological process. However, it does not explain why TPS, added as a chemical shift standard in constant concentration to all samples, should have any loading on the factors, since it has no real role in the biochemical process.

Hagberg et al (1995) concluded that the orthogonal discriminant form of factor analysis is better than PCA and factor analysis at classifying in vivo spectra from normal brains and brain tumours represented by five metabolic ratios, although the lipid/Cr and NAA/Cr ratios exerted only a small influence on the classification. However, the same study showed that in a cluster analysis of the same data set the lipid/Cr ratio dominated the classification, a result consistent with those of other studies on the importance of the lipid/Cr and NAA/Cr ratios for the classification of brain tumours (Kuesel et al, 1994; Rutter et al, 1995; Preul et al, 1996). This leads to the conclusion that, although factor analysis may be a good discrimination tool, the interpretation of its results should be considered with care.

Figure 3.7. A, B, C and D show the variable loadings on the first 4 principal components, respectively, from the factor analysis of 72 tumour samples (Howells et al, 1992a). The marked peaks are: (1) TPS, the 0 ppm standard, (2) Val, (3) Lac, (4) Cho, (5) Tau, (6) Gly. Note that TPS is not a true feature of the biochemical process and hence should not have any loading on the factors. From Howells et al (1992a) © 1992 Williams & Wilkins. Reproduced with permission.