
Although it has been stated that msi data acquired with sims could be more suitable for multivariate analysis than maldi data due to the presence of correlating fragments in sims,51 the following work aims to demonstrate that multivariate methods are very useful in the analysis of maldi imaging data.

Multivariate versus univariate analysis

Principal component analysis (pca) of the msi data, processed as described in §5.4, with 564 variables and 11439 pixels and mean-centred, is able to identify a number of features, see figure 5.12. In this figure, the scores of the pca model on a given pc axis are used to colour-code the image, so a red pixel corresponds to a high value on that pc. These images show that pc 1 models the overall difference between the grey and white matter, and subsequent pcs show substructures, such as the hippocampus and the cerebellum, that are blue in pc 3.
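The steps described here (mean-centring, pca, and refolding the scores into the image) can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used for figure 5.12: the toy data are random, the function name `pca_scores` is hypothetical, and the pixels are assumed to form a rectangular grid stored in row-major order. A real tissue section would need a lookup from pixel index to image coordinates.

```python
import numpy as np

def pca_scores(X, n_components):
    """PCA of a pixels-by-variables matrix via SVD of the mean-centred data."""
    Xc = X - X.mean(axis=0)                            # mean-centre each m/z variable
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]    # pixels x components
    loadings = Vt[:n_components].T                     # variables x components
    explained = s**2 / np.sum(s**2)                    # fraction of variance per pc
    return scores, loadings, explained[:n_components]

# Hypothetical toy image: 6 x 5 pixels, 8 m/z variables (not the thesis data).
rng = np.random.default_rng(0)
n_rows, n_cols, n_vars = 6, 5, 8
X = rng.random((n_rows * n_cols, n_vars))

scores, loadings, explained = pca_scores(X, n_components=3)

# 'Refold' the scores on pc 1 into the image grid (row-major order assumed).
pc1_image = scores[:, 0].reshape(n_rows, n_cols)
```

The array `pc1_image` can then be displayed with any colour map, so that pixels with high scores on pc 1 appear at one end of the colour scale.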

The loadings of the pca model can be visualised by ‘back-projection’,90 as is shown in figure 5.13. In this figure, the colour is the absolute value of the loading, and thus red peaks are most important for this component. The sign of the loading is combined with the unscaled data to present the height of the peaks. The example shown in figure 5.13 is for pc 1, and can similarly be calculated for all subsequent components. The peak at m/z = 835.6 was assigned as a sphingomyelin ([SM 24:1 + Na]), the peak at m/z = 788.6 as a phosphatidylcholine ([PC 36:1 + H]) and m/z = 810.6 as [PC 36:1 + Na]; these compounds are higher in the areas coloured red for pc 1 in figure 5.12, such as the corpus callosum, medulla and the white matter in the cerebellum. In contrast, m/z = 782.4, which could be from [PC 34:1 + Na], m/z = 756.4 and m/z = 697.4 are higher in the areas coloured blue for pc 1 in figure 5.12, including the cerebral cortex, hippocampus and the cerebellar grey matter.

Figure 5.12: Images colour-coded with the scores of the pixels on the different principal components (pc 1 to 6, the respective variance explained is indicated). The pca model was built on the mean-centred data set after processing (11439 pixels×564 variables), and the scores on the pcs are ‘refolded’ to the image to aid visual interpretation.

[Figure 5.13 plot; x-axis: m/z (ticks from 100 to 900); labelled peaks: 184, 697.4, 756.4, 757.4, 782.4, 783.4, 788.6, 810.4, 810.6, 835.6]

Figure 5.13: The absolute values of the loading for the first pc are used as a colour-coding; peaks coloured red can be considered important variables. The sign of the loading is combined with the data without mean-centring to form the peak height and direction and create a mass spectral appearance. Peaks pointing up have a higher intensity in the pixels with high scores on pc 1, whereas negative bars correspond to pixels with low scores on pc 1.
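A minimal sketch of this back-projection is given below. It assumes that "the data without mean-centring" can be summarised by the mean spectrum over all pixels; the text does not specify which summary was used, and the function name `back_projection` and the toy data are hypothetical.

```python
import numpy as np

def back_projection(X, loading):
    """Bar heights and colour values for a mass-spectrum-like loading plot."""
    mean_spectrum = X.mean(axis=0)               # assumed summary of the uncentred data
    heights = np.sign(loading) * mean_spectrum   # direction from the loading sign
    colour = np.abs(loading)                     # importance, used for colour-coding
    return heights, colour

# Hypothetical toy data: 30 pixels, 8 m/z variables with positive intensities.
rng = np.random.default_rng(1)
X = rng.random((30, 8)) + 0.1

# Loading of pc 1 from the mean-centred data.
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
pc1_loading = Vt[0]

heights, colour = back_projection(X, pc1_loading)
```

Plotting `heights` as bars against m/z, coloured by `colour`, gives peaks pointing up for variables with positive loadings and down for variables with negative loadings.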

A different way of representing the scores is based on red–green–blue (rgb) encoding, where each colour corresponds to the scores on one of the principal components. An example is shown in figure 5.14 A, where pc 1 (red), pc 2 (green) and pc 3 (blue) are summarised in one image (compare with figure 5.12). This differentiates, for example, the corpus callosum and the white matter and cortex of the cerebellum. The same can be done for any combination of principal components: the rgb image based on pcs 4, 5 and 6 is shown in figure 5.14 B, where the hippocampus, thalamus and septum are visible. Comparison of these two figures with the histology image (figure 5.1) shows an advantage of the modelled msi data in terms of differentiating anatomical parts.
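The rgb encoding can be sketched as below. The min–max scaling of each pc to the range [0, 1] is an assumption, since the text does not state how the scores were scaled to colour intensities; the function name `scores_to_rgb` and the toy data are likewise hypothetical.

```python
import numpy as np

def scores_to_rgb(scores, shape, pcs=(0, 1, 2)):
    """Scale the scores on three pcs to [0, 1] and stack them as R, G, B."""
    channels = []
    for pc in pcs:
        s = scores[:, pc]
        span = s.max() - s.min()
        scaled = (s - s.min()) / span if span > 0 else np.zeros_like(s)
        channels.append(scaled.reshape(shape))   # refold to the image grid
    return np.stack(channels, axis=-1)           # rows x columns x 3

# Hypothetical toy image: 6 x 5 pixels, 8 m/z variables.
rng = np.random.default_rng(2)
shape = (6, 5)
X = rng.random((shape[0] * shape[1], 8))

Xc = X - X.mean(axis=0)
U, s, _ = np.linalg.svd(Xc, full_matrices=False)
scores = U * s                                   # scores on all pcs

rgb_a = scores_to_rgb(scores, shape, pcs=(0, 1, 2))   # pc 1, 2, 3
rgb_b = scores_to_rgb(scores, shape, pcs=(3, 4, 5))   # pc 4, 5, 6
```

Because the sign of a principal component is arbitrary, a channel may appear inverted relative to another implementation; this does not change which pixels are distinguished.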

Figure 5.14: The scores on three different principal components can be summarised in one image using red, green and blue encoding for the different pcs. (A) The combined image of pc 1 (red), pc 2 (green) and pc 3 (blue). (B) The rgb encoded image of pc 4 (red), pc 5 (green) and pc 6 (blue).

Multivariate analysis241 of the high-dimensional msi data is preferred to univariate analysis. As an example, two m/z images are shown in figure 5.15: m/z = 810.6 [PC 36:1 + Na], which had the highest loading on pc 1, and the variable with the highest average intensity, m/z = 760.4 [PC 34:1 + H]. As expected, there is some similarity between these images and pc 1 and pc 2 (inverted), respectively, but the clarity of the images differs (compare with figure 5.12). Moreover, the choice of these two values is biased: two of the 564 variables were selected manually, whereas pca is an unbiased method to create an overview of the data. To find specific structures, it would be necessary to evaluate each individual m/z variable. Thus, manual variable selection is likely to overlook interesting substructures or over-interpret artefacts.

Figure 5.15: Images of the variable with the highest loading on pc 1 (m/z = 810.6, left) and the highest average value in the image (m/z = 760.4, right).

Score space versus image space

Evaluation of pca results is not limited to mapping back the scores of a principal component in the image: the score plot itself can be used to evaluate clustering of certain image pixels.242, 243 The identified outliers, classes or trends can then be mapped back onto the image, similar to a mask, where only the selected pixels are presented. Examples of this are given in figure 5.16 A–C, where two clusters of pixels in the score plot of pc 1 versus pc 5 are coloured, and these pixels correspond to the outer right side of the image: the top (pink) and bottom (turquoise) regions.
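Mapping a selection in the score plot back to the image can be sketched as follows. The rectangular selection is an assumption made for simplicity, as the text does not say how the clusters were delimited; the function name `select_in_score_space` and the toy scores are hypothetical.

```python
import numpy as np

def select_in_score_space(scores, pc_x, pc_y, x_range, y_range):
    """Boolean vector marking pixels whose scores fall inside a rectangle."""
    x, y = scores[:, pc_x], scores[:, pc_y]
    return ((x >= x_range[0]) & (x <= x_range[1]) &
            (y >= y_range[0]) & (y <= y_range[1]))

# Hypothetical scores for a 4 x 3 image on two pcs; pixels 2 and 7 form a cluster.
shape = (4, 3)
scores = np.zeros((shape[0] * shape[1], 2))
scores[[2, 7], :] = [5.0, -4.0]

selected = select_in_score_space(scores, 0, 1, x_range=(4, 6), y_range=(-5, -3))

# Refold the selection into the image grid to use it as a mask (row-major order).
mask_image = selected.reshape(shape)
```

The mask can then be used to draw the selected pixels in colour and the unselected pixels in grey scale.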

This separation in the score plot could be the result of the different composition of these two regions in the cerebral cortex, but because pca models the variation in the spectral data, it is not possible to differentiate between biologically interesting variation and analytical artefacts. The only thing that is certain is that the measured signal differs between these two regions. Thus, the detection of separate clusters can indicate either analytical or biological variation, and can therefore be used in data quality control. One example of an analytical artefact would be the observation of a raster effect, which could happen because the msi data are currently acquired in a grid: artefact modelling would allow the diagnosis of sample degradation over time.

Ideally, the spectra should be measured in a random order, but with the current technology this would result in a great increase in measurement time or incomplete coverage of the image.

A score plot of pc 8 versus pc 9, see figure 5.16 D–F, has a distinct separate cluster (yellow) corresponding to the anatomical structure of the hippocampus; the cluster coloured blue corresponds to pixels in the region around the pons and pituitary gland.

Figure 5.16: (A) The scores of pc 1 versus pc 5, two clusters in the score space are selected. (B, C) The position of the selected pixels from the score space in the image is visualised, where a grey scale and colour scale, both based on pc 1, are used to indicate the unselected and selected pixels, respectively. (D) The scores of pc 8 versus pc 9, the yellow and blue clusters are selected. (E, F) The positions of the selected pixels from the score space are visualised, where the grey scale and colour scale are based on the scores on pc 8. (G) The loadings of the two clusters in A are shown as turquoise and pink bars. The loadings are calculated as the weighted linear combination of both pc axes, pc 1 and 5, based on the median position of each cluster, indicated with crosses in A. An exemplar expansion of these composite loadings is shown. (H) The loadings can also be visualised with a biplot, where the scores (black) and loadings (green, cyan and blue) of pc 8 and 9 are superimposed. Variables increased in the hippocampus (yellow cluster in D) are shown in blue, negative loadings are shown in cyan, and the corresponding m/z values are listed in table 5.2.

Interpretation of the variables important for the clusters in the pca score plots is possible using the median of the selected pixel clusters in the score space region (indicated with crosses). The linear combination of both principal component loadings was calculated, and visualised in the barplot of figure 5.16 G for the clusters indicated in figure 5.16 A. An alternative option is to create a biplot, see figure 5.16 H, which superimposes the scores (black) and loadings (green, cyan and blue).194 The loadings were multiplied by 0.1 to match the range of the scores, and loadings that are in the same region as the hippocampal cluster (yellow in D) are coloured blue. Therefore, the yellow pixels are characterised by higher levels of the blue and lower levels of the cyan loadings; the m/z values are listed in table 5.2. The example shown in figure 5.16 D and E is an excellent illustration of the use of multivariate approaches in the analysis of msi data: the score and image space visualisations identify patterns in the data of biological relevance with accompanying characteristic m/z profiles.
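The composite loading for a cluster can be sketched as below. It assumes that the weights of the linear combination are the median scores of the selected pixels on the two pcs; the exact weighting or normalisation is not given in the text. The function name `composite_loading`, the toy scores and the toy loadings are hypothetical.

```python
import numpy as np

def composite_loading(scores, loadings, selected, pcs):
    """Weight the loadings of two pcs by the median score position of a cluster."""
    median_pos = np.median(scores[selected][:, pcs], axis=0)  # cluster centre
    return loadings[:, pcs] @ median_pos                       # one value per variable

# Hypothetical scores (5 pixels x 2 pcs) and loadings (3 variables x 2 pcs).
scores = np.array([[2.0, 1.0],
                   [4.0, 3.0],
                   [6.0, -1.0],
                   [0.0, 0.0],
                   [0.0, 0.0]])
loadings = np.array([[1.0, 0.0],
                     [0.0, 1.0],
                     [0.5, -0.5]])
selected = np.array([True, True, True, False, False])

# The median position of the cluster is (4, 1), so the result is [4.0, 1.0, 1.5].
composite = composite_loading(scores, loadings, selected, pcs=[0, 1])

# For a biplot, the loadings are rescaled to the range of the scores;
# the factor 0.1 is the one quoted in the text for pc 8 and pc 9.
biplot_loadings = 0.1 * loadings
```

Variables with large positive values in `composite` are higher in the selected cluster than in the average pixel, and large negative values indicate the opposite.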

Table 5.2: The selected variables indicated as blue and cyan loadings in figure 5.16 H, representative for the yellow cluster of the hippocampus in figure 5.16 D, are given in terms of m/z value; tentative assignments of some of the isotopes are shown where known.

Blue m/z   Assignment    Cyan m/z   Assignment

264.2                    725.4      SM 16:0+Na
363.2                    769.4      SM 18:0+K
459.2                    784.4
496.2                    784.6      PC 34:0+Na
518.2                    785.4
599.4                    797.4
694.4                    828.4
734.6      PC 32:0+H     828.6      PC 38:6+Na
735.6                    829.4
745.4                    856.4
753.4      SM 18:0+Na    856.6      PC 40:6+Na
754.4      PC 32:1+Na
761.4
776.6
783.4
804.4
804.6      PC 36:4+Na
805.4
805.6
809.6
820.4      PC 36:4+K
832.4      PC 38:4+Na
833.4
833.6

Figure 5.17: The sum of squared residuals of each pixel in the pca model is plotted in the image. Sums of squared residuals for each variable are calculated and plotted on the right hand side.

Finally, one can also evaluate the residuals of the pca model (E, see equation 2.10), by calculating the sum of squares of the distances from the data to the model. An example of the residuals from the mean-centred pca model after calculating nine components is shown in figure 5.17. It is clear that after nine components, unmodelled structural variation is still present, as some structures are highlighted red when the residuals are plotted. The choice of nine components is arbitrary and is used for illustration purposes; the variance explained by each subsequent component is smaller than 0.4%. The sum of squares for the residuals of each variable was also plotted, and some of the highest residuals are annotated in figure 5.17. It is clear that investigation of these residuals is necessary when an exhaustive pca analysis is performed, as some features might still be latent in the data and can be discovered with this approach.
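The residual calculation can be sketched as follows, assuming the usual pca decomposition of the mean-centred data into scores, loadings and a residual matrix E (equation 2.10 itself is not reproduced in this section). The function name `pca_residuals` and the toy data are hypothetical, and two components are used here rather than nine.

```python
import numpy as np

def pca_residuals(X, n_components):
    """Residual matrix E of a mean-centred pca model with n_components."""
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    T = U[:, :n_components] * s[:n_components]   # scores
    P = Vt[:n_components].T                      # loadings
    return Xc - T @ P.T                          # part of the data not modelled

# Hypothetical toy image: 6 x 5 pixels, 8 m/z variables.
rng = np.random.default_rng(3)
shape = (6, 5)
X = rng.random((shape[0] * shape[1], 8))

E = pca_residuals(X, n_components=2)

ss_pixels = np.sum(E**2, axis=1)                 # one value per pixel
ss_variables = np.sum(E**2, axis=0)              # one value per m/z variable

residual_image = ss_pixels.reshape(shape)        # refold for display
```

Pixels with a high value in `residual_image` are poorly described by the retained components, and variables with a high value in `ss_variables` are candidates for features that are still latent in the data.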
