The most important advantage of JRES NMR spectroscopy over 1D and CPMG NMR spectroscopy is the reduction in peak overlap, which aids biomarker identification and data interpretation in ‘congested’ spectral areas. The usefulness of full-resolution JRES projections was evaluated, and ways to realise the potential of the technique for metabonomic studies have been specified, particularly with respect to processing and spectral alignment of the data. The chosen galactosamine toxicity data set is particularly suitable, as many previously identified differentiating biomarkers of galactosamine toxicity represent complex multiplets and thus a considerable effect resulting from the collapse of signals in the JRES spectral projections was observed. Due to the inherently non-quantitative nature of the standard JRES spectrum, there is no need to use the stoichiometry-conserving sum projection, and thus the skyline projection is often preferred, as this gives an increased S/N and is still highly reproducible. Correlations of interest retrieved with STOCSY are conserved or improved compared to conventional one-dimensional experiments for both projection methods, whilst regression slopes obtained using STORSY are unreliable in the current experimental system. The effective peak span widths decrease upon projection compared to highly split multiplets, and thus pattern recognition and correlation methods require alignment of the full-resolution JRES spectral projections. The subsequently reduced peak overlap in projected JRES spectra can improve the quality and interpretation of multivariate models and statistical correlation analyses, and will enhance biomarker identification and reduce the differential degree of over-representation of molecules.
Hence, judicious acquisition of JRES data and application of alignment can improve the interpretation of pattern recognition models and increase the information content extractable from NMR-based metabonomic studies, resulting in enhanced biomarker identification.
Chapter 4
Simulated annealing optimised K-OPLS in metabonomics
4.1 Aims and objectives
In the previous chapter, the choice of NMR experiment, peak alignment and subsequent use of linear multivariate projection methods, such as principal component analysis and orthogonal partial least squares (OPLS), which find use in modelling of spectroscopic data, were discussed. However, when the relationship between the descriptor variables and the response is non-linear, conventional linear prediction models will perform sub-optimally. In this chapter, the focus is shifted to evaluate the mathematical non-linear predictive modelling of spectroscopic data with kernel-based orthogonal partial least squares (K-OPLS), and its characteristics are illustrated with three separate metabonomic data sets: a study on the liver toxin galactosamine, a study of the nephrotoxin mercuric chloride and a study of Trypanosoma brucei brucei parasite infection. This work, for which the accepted publication is included in appendix D, was done by means of the following steps:
1. Demonstrating that the optimisation of the parameter σ of the Gaussian kernel transformation performed in K-OPLS can be done effectively and in a user-friendly manner by means of simulated annealing;
2. Implementing the simulated annealing optimisation and calculating prediction performance metrics (AUC and Q²Y) using a nested cross-validation approach;
3. Comparing the predictive ability of non-linear K-OPLS and its linear equivalent OPLS;
4. Evaluating model interpretation enabled by the separate modelling of predictive and orthogonal variation and accompanying score plots;
5. Developing methods to approximate and visualise which variables play a main role in the non-linear model based on ‘pseudo-samples’.
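The nested cross-validation idea in step 2 can be illustrated with a minimal sketch. Several things here are assumptions made for illustration only: kernel ridge regression is used as a simple stand-in for K-OPLS, the inner loop searches a small hypothetical grid of σ values instead of running simulated annealing, the Gaussian kernel is parametrised as exp(−‖x − z‖² / 2σ²), and the data are synthetic. The point is the structure: σ is chosen using the training part of each outer fold only, and AUC and Q²Y are computed from the held-out outer predictions.

```python
import numpy as np

def gaussian_kernel(A, B, sigma):
    """Gaussian kernel matrix; this parametrisation of sigma is an assumption."""
    d2 = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * sigma**2))

def fit_predict(X_fit, y_fit, X_new, sigma, ridge=1e-2):
    """Stand-in model (kernel ridge regression), NOT K-OPLS."""
    mu = y_fit.mean()
    K = gaussian_kernel(X_fit, X_fit, sigma)
    alpha = np.linalg.solve(K + ridge * np.eye(len(y_fit)), y_fit - mu)
    return gaussian_kernel(X_new, X_fit, sigma) @ alpha + mu

def auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison (ties count as 0.5)."""
    diff = scores[y_true == 1][:, None] - scores[y_true == 0][None, :]
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size

def q2(y_true, y_pred):
    """Q2 = 1 - PRESS / total sum of squares of the response."""
    return 1.0 - np.sum((y_true - y_pred) ** 2) / np.sum((y_true - y_true.mean()) ** 2)

def kfold(n, k, rng):
    """Split the indices 0..n-1 into k random folds."""
    return np.array_split(rng.permutation(n), k)

def nested_cv(X, y, sigmas, outer_k=5, inner_k=4, seed=0):
    rng = np.random.default_rng(seed)
    y_pred, chosen = np.zeros(len(y)), []
    for test in kfold(len(y), outer_k, rng):
        train = np.setdiff1d(np.arange(len(y)), test)
        Xtr, ytr = X[train], y[train]
        inner_folds = kfold(len(train), inner_k, rng)
        scores = []
        for sigma in sigmas:
            # Inner loop: cross-validated Q2 on the outer training data only.
            inner_pred = np.zeros(len(train))
            for val in inner_folds:
                fit = np.setdiff1d(np.arange(len(train)), val)
                inner_pred[val] = fit_predict(Xtr[fit], ytr[fit], Xtr[val], sigma)
            scores.append(q2(ytr, inner_pred))
        best = sigmas[int(np.argmax(scores))]
        chosen.append(best)
        # Outer loop: predict the held-out samples with the selected sigma.
        y_pred[test] = fit_predict(Xtr, ytr, X[test], best)
    return y_pred, chosen

# Synthetic two-class data with a non-linear (radial) class boundary.
rng = np.random.default_rng(1)
X = rng.normal(size=(80, 2))
y = (np.sum(X**2, axis=1) > 1.4).astype(float)
y_pred, chosen = nested_cv(X, y, sigmas=[0.1, 0.3, 1.0, 3.0, 10.0])
print("sigma per outer fold:", chosen)
print(f"AUC = {auc(y, y_pred):.2f}, Q2Y = {q2(y, y_pred):.2f}")
```

Because the outer test samples never influence the choice of σ, the reported metrics are not biased by the parameter search.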
4.2 Introduction
Multivariate projection methods such as partial least squares (PLS)87 and the related orthogonal PLS (OPLS, see §2.3.5)93 are frequently applied for modelling of spectroscopic biological data, as they provide predictive and interpretable models.23, 55, 64 The OPLS algorithm enables separate modelling of Y-predictive (response-related) variation and systematic Y-orthogonal (response-orthogonal) variation in the data, the latter also termed ‘structured noise’.92–95, 160 Thus, OPLS is beneficial in terms of model interpretation compared to PLS, and has been successfully employed in metabonomics.56 The concept of Y-orthogonal variation can be understood as systematic effects that are needed to characterise the system but are unrelated to the question at hand, i.e. the model predictions. For instance, when aiming to classify a group of responders versus non-responders to a particular treatment, the structured noise could be composed of inter-sample differences that are needed to describe variability of the system but are not useful for separating responders from non-responders.
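To make the notion of Y-orthogonal variation concrete, the following minimal sketch extracts and removes a single Y-orthogonal component from a centred data matrix for one response variable, following the commonly described linear OPLS filtering steps. The function name, the synthetic data and the hypothetical `batch` effect standing in for structured noise are assumptions for illustration; this is not the implementation used in this work.

```python
import numpy as np

def opls_remove_one_orthogonal(X, y):
    """Remove one Y-orthogonal component from column-centred X (single centred y)."""
    w = X.T @ y / (y @ y)                  # Y-predictive weight vector
    w /= np.linalg.norm(w)
    t = X @ w                              # Y-predictive scores
    p = X.T @ t / (t @ t)                  # loadings of the predictive scores
    w_ortho = p - (w @ p) * w              # part of p orthogonal to w (w has unit norm)
    w_ortho /= np.linalg.norm(w_ortho)
    t_ortho = X @ w_ortho                  # Y-orthogonal scores
    p_ortho = X.T @ t_ortho / (t_ortho @ t_ortho)
    X_filtered = X - np.outer(t_ortho, p_ortho)
    return X_filtered, t_ortho, p_ortho

rng = np.random.default_rng(0)
n, n_vars = 40, 30
y = np.repeat([-1.0, 1.0], n // 2)         # two balanced classes, already mean-centred
batch = rng.normal(size=n)                 # hypothetical structured noise, unrelated to y
X = (np.outer(y, rng.normal(size=n_vars))            # response-related variation
     + 3.0 * np.outer(batch, rng.normal(size=n_vars))  # systematic Y-orthogonal variation
     + 0.1 * rng.normal(size=(n, n_vars)))             # random noise
X -= X.mean(axis=0)

X_filtered, t_ortho, p_ortho = opls_remove_one_orthogonal(X, y)
print("y . t_ortho =", float(y @ t_ortho))
print("correlation of t_ortho with batch effect =", float(np.corrcoef(t_ortho, batch)[0, 1]))
```

By construction the orthogonal scores are uncorrelated with the response, so plotting them shows systematic variation that does not contribute to the prediction.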
Most of the commonly used multivariate prediction models in metabonomics assume a linear relationship between the X (descriptor) and Y (response) variables. However, many biological systems display non-linear characteristics in response to a perturbation. Under such conditions, non-linear methods are expected to provide improved models, which is particularly important in predictive applications such as disease diagnostics,161 assessment of toxicity142 and characterisation of variable response of individuals to drugs in personalised healthcare.33, 34
A particular class of non-linear models are kernel-based models,162 with an early chemometric application in the form of radial basis functions–PLS.163, 164 Other examples of kernel-based models include support vector machines (SVM),165–167 kernel-based partial least squares (KPLS)168, 169 and kernel-based least squares regression.170
Kernel-OPLS (K-OPLS) is the non-linear extension of the OPLS model, a commonly used multivariate model in metabonomic studies.55, 56 In contrast to separate (linear) orthogonal signal correction (OSC) followed by KPLS modelling,171 or kernel-OSC followed by KPLS modelling,172 K-OPLS provides an integrated orthogonal signal correction property that allows for separate modelling of predictive and Y-orthogonal variation in the feature space, removing drawbacks associated with multi-step solutions, such as separate (K)OSC and (K)PLS steps.93 Although the K-OPLS method does not necessarily provide improved prediction performance compared to other kernel-based models,162 K-OPLS facilitates an improved model interpretation compared to alternative models, which can aid quality control and further understanding of the model and data.
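As a brief illustration of what a kernel transformation does, the sketch below computes a Gaussian kernel matrix for a few random samples at three values of σ. The parametrisation exp(−‖x − z‖² / 2σ²) and the example σ values are assumptions; the software used in this work may define σ differently.

```python
import numpy as np

def gaussian_kernel(X1, X2, sigma):
    """K[i, j] = exp(-||x1_i - x2_j||^2 / (2 * sigma^2)) (assumed parametrisation)."""
    sq1 = np.sum(X1 ** 2, axis=1)[:, None]
    sq2 = np.sum(X2 ** 2, axis=1)[None, :]
    sq_dist = np.maximum(sq1 + sq2 - 2.0 * X1 @ X2.T, 0.0)  # clip rounding noise
    return np.exp(-sq_dist / (2.0 * sigma ** 2))

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 20))               # 5 synthetic samples, 20 variables
for sigma in (0.1, 5.0, 1000.0):
    K = gaussian_kernel(X, X, sigma)
    off_diag = K[~np.eye(5, dtype=bool)]
    print(f"sigma={sigma:g}: mean off-diagonal similarity = {off_diag.mean():.3f}")
```

With a very small σ every sample resembles only itself and the kernel matrix approaches the identity, whereas a very large σ makes all samples look alike; useful models lie between these extremes, which is why σ has to be tuned.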
Kernel-based models require an optimisation of the kernel-function parameter, which may be challenging for the non-specialist, as the kernel parameter is often a continuous parameter with an undefined upper limit that may have multiple local optima. The optimisation step is essential to produce a model with a good predictive performance.171, 173, 174 Here, an automated procedure is implemented for optimisation of the kernel parameter based on simulated annealing (SA), a stochastic optimisation method,175, 176
Bylesjö et al.57
This optimisation has been incorporated into the freely available K-OPLS software package for both R and MATLAB (http://sourceforge.net/projects/kopls/).
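The general principle of simulated annealing for a single continuous kernel parameter can be sketched as follows. This is not the procedure of the K-OPLS package: the stand-in kernel ridge model, the random walk on log σ, the geometric cooling schedule and all default values (`t0`, `cooling`, `step`, `n_iter`) are hypothetical choices made for illustration. Worse candidates are occasionally accepted with a probability that shrinks as the temperature falls, which allows the search to escape local optima.

```python
import math
import numpy as np

def gaussian_kernel(A, B, sigma):
    """Gaussian kernel matrix; this parametrisation of sigma is an assumption."""
    d2 = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * sigma**2))

def cv_press(X, y, sigma, folds, ridge=1e-2):
    """Cross-validated squared prediction error of a stand-in kernel ridge model."""
    press = 0.0
    for val in folds:
        fit = np.setdiff1d(np.arange(len(y)), val)
        mu = y[fit].mean()
        K = gaussian_kernel(X[fit], X[fit], sigma)
        alpha = np.linalg.solve(K + ridge * np.eye(len(fit)), y[fit] - mu)
        pred = gaussian_kernel(X[val], X[fit], sigma) @ alpha + mu
        press += float(np.sum((y[val] - pred) ** 2))
    return press

def anneal_sigma(objective, sigma0=1.0, n_iter=200, t0=1.0, cooling=0.97, step=0.5, seed=0):
    """Minimise objective(sigma) over sigma > 0 with a random walk on log(sigma)."""
    rng = np.random.default_rng(seed)
    log_s, current = math.log(sigma0), objective(sigma0)
    best_sigma, best = sigma0, current
    temp = t0
    for _ in range(n_iter):
        cand_log = log_s + rng.normal(scale=step)
        cand = objective(math.exp(cand_log))
        # Metropolis rule: always accept improvements, sometimes accept worse moves.
        if cand < current or rng.random() < math.exp(-(cand - current) / temp):
            log_s, current = cand_log, cand
            if current < best:
                best_sigma, best = math.exp(log_s), current
        temp *= cooling                    # geometric cooling schedule
    return best_sigma, best

# Synthetic two-class data with a non-linear (radial) class boundary.
rng = np.random.default_rng(2)
X = rng.normal(size=(60, 2))
y = (np.sum(X**2, axis=1) > 1.4).astype(float)
folds = np.array_split(rng.permutation(len(y)), 5)   # fixed folds: deterministic objective
objective = lambda sigma: cv_press(X, y, sigma, folds)

best_sigma, best_press = anneal_sigma(objective)
print(f"selected sigma = {best_sigma:.3g}, cross-validated PRESS = {best_press:.3f}")
```

Searching on the logarithm of σ keeps the parameter positive and places no upper bound on it, in line with the undefined upper limit noted above.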
The main objective in this study is to evaluate if non-linear prediction models provide an advantage compared to linear alternatives in two common application areas of metabonomics: toxicology and disease diagnostics. Using the non-linear SA-K-OPLS method, the possibility for improved prediction performance in comparison to the linear OPLS model is demonstrated for three separate spectral NMR metabonomic data sets. In particular, the focus is on problems where prediction is of paramount importance, and it is also shown how structured Y-orthogonal variation can be interpreted to gain further insight into the data. To further increase model transparency, the influence of variation of each variable in the K-OPLS model is approximated.