

The general goal of this thesis is to develop techniques that move beyond the aforementioned assumptions and to propose a set of advanced machine learning tools for robust analysis of neuroimaging data. This goal is divided into the following three aims, which are detailed below.

Aim 1: Inference in the presence of population heterogeneity

Brain disorders often exhibit a heterogeneous clinical presentation: autism spectrum disorder (ASD) encompasses neurodevelopmental disabilities characterized by deficits in social communication and repetitive behaviors [57]; schizophrenia can be subdivided into distinct groups by separating its symptomatology into discrete symptom domains [18]; Alzheimer’s disease (AD) can be separated into three subtypes on the basis of the distribution of neurofibrillary tangles [112]; and mild cognitive impairment (MCI) may be further classified based on the type of specific cognitive impairment [157].

Disentangling disease heterogeneity may greatly contribute to our understanding of disease mechanisms and lead to more accurate diagnosis and prognosis, as well as targeted treatment. However, most commonly used neuroimaging analysis approaches assume a single unifying pathophysiological process governing the presence of disease and perform a monistic analysis to identify it. Such approaches typically aim to either identify voxels that characterize group differences through mass-univariate statistical techniques [3] or use MVPA to identify the multivariate imaging pattern that best discriminates between two populations [153]. Thus, the heterogeneity of the disease is completely ignored, which results in deriving imaging patterns that are at best incomplete, and at worst misleading.

Recognizing this limitation, a few research efforts have focused on revealing the inherent disease heterogeneity. These methods fall mainly into two classes. The first class assumes an a priori subdivision of the diseased samples into coherent groups, based on independent criteria, and seeks to identify group-level anatomical differences using univariate statistical methods [87, 156]. Multivariate effects are therefore ignored, while the a priori definition of disease subtypes is either difficult to obtain (e.g., from autopsy near the date of imaging), or noisy and non-specific (e.g., cognitive or clinical evaluations). The second class focuses on the diseased population and maps it to distinct anatomical subtypes by applying multivariate unsupervised clustering that considers all image elements [157, 118]. These methods tend to group patients along the direction of largest variability, which may be confounded by effects such as age and sex, and thus may not be induced by pathology.
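The concern about grouping along the direction of largest variability can be illustrated with a small simulation. The sketch below is not one of the cited methods: the data, the covariate `age`, the hidden label `subtype`, and the effect sizes are all hypothetical, and a sign split on the first principal component is used as a simple stand-in for a two-cluster solution.

```python
import numpy as np

rng = np.random.default_rng(0)
n_patients, n_features = 200, 10

# Hypothetical nuisance covariate (standardized age) and hidden subtype label.
age = rng.normal(size=n_patients)
subtype = rng.integers(0, 2, size=n_patients)

# Two orthogonal unit-norm patterns: one driven by age, one by subtype.
age_pattern = np.zeros(n_features)
age_pattern[:5] = 1 / np.sqrt(5)
subtype_pattern = np.zeros(n_features)
subtype_pattern[5:] = 1 / np.sqrt(5)

# Assumption: age contributes far more variance than the subtype effect.
features = (
    3.0 * np.outer(age, age_pattern)
    + 1.0 * np.outer(2 * subtype - 1, subtype_pattern)
    + 0.5 * rng.normal(size=(n_patients, n_features))
)

# Direction of largest variability: first right singular vector of centered data.
centered = features - features.mean(axis=0)
_, _, vt = np.linalg.svd(centered, full_matrices=False)
pc1_scores = centered @ vt[0]

# Splitting on the sign of the PC1 score stands in for a 2-cluster solution.
clusters = (pc1_scores > 0).astype(int)

corr_with_age = abs(np.corrcoef(clusters, age)[0, 1])
corr_with_subtype = abs(np.corrcoef(clusters, subtype)[0, 1])
print(f"|corr(cluster, age)|     = {corr_with_age:.2f}")
print(f"|corr(cluster, subtype)| = {corr_with_subtype:.2f}")
```

Under these assumptions the resulting split tracks age much more closely than the simulated subtype, which is the failure mode described above.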

To tackle these challenges, this aim of the thesis is to develop a method for detecting and characterizing heterogeneity through the data-driven identification of disease subgroups.

Aim 2: Inference through optimal spatial filtering

Group analysis studies how distinct clinically defined groups of individuals differ in brain anatomy and function, aiming to understand the pathophysiological processes that drive these differences. Toward this goal, mass-univariate [3] as well as MVPA techniques [89, 55] have been developed to summarize and understand imaging patterns reflecting a clinical change.

Mass-univariate techniques, such as VBM, have been widely used for neuroimaging analysis. However, they ignore multivariate relations in the data and suffer from the multiple comparisons problem. Critically, local smoothing is typically applied to reduce voxel-wise noise, account for errors in the spatial alignment of images, and Gaussianize the data before statistical analysis. This smoothing is seldom adapted to the anatomical structures of the brain and may obscure the effects of interest. A narrow blurring kernel cannot effectively suppress noise in the data, which reduces statistical power. Conversely, a wide blurring kernel diffuses signal, potentially leading to false conclusions about the true loci of the effect. It may also mix in signal from regions that have no group difference, which reduces sensitivity in detecting group differences.
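The kernel-width trade-off described above can be made concrete with a one-dimensional toy signal. This is a minimal sketch, assuming a hypothetical 20-voxel focal effect and two arbitrary kernel widths (`sigma` of 1 and 25 voxels). It uses `scipy.ndimage.gaussian_filter1d` and is not the filtering method developed in the thesis.

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

n_voxels = 200
effect = np.zeros(n_voxels)
effect[90:110] = 1.0  # hypothetical focal group difference, 20 voxels wide


def noise_gain(sigma, length=401):
    """Std of smoothed unit-variance white noise.

    Equals the square root of the summed squared kernel weights,
    which are recovered here by smoothing a unit impulse.
    """
    impulse = np.zeros(length)
    impulse[length // 2] = 1.0
    weights = gaussian_filter1d(impulse, sigma=sigma)
    return float(np.sqrt(np.sum(weights ** 2)))


results = {}
for name, sigma in {"narrow": 1.0, "wide": 25.0}.items():
    smoothed = gaussian_filter1d(effect, sigma=sigma)
    results[name] = {
        "peak": float(smoothed.max()),    # how much of the effect survives
        "leak": float(smoothed[60]),      # signal 30 voxels outside the effect
        "noise_std": noise_gain(sigma),   # residual noise level after smoothing
    }

for name, stats in results.items():
    print(name, {key: round(value, 3) for key, value in stats.items()})
```

In this toy setting the narrow kernel keeps the effect sharp but leaves most of the noise in place, while the wide kernel suppresses noise at the cost of attenuating the peak and spreading signal into voxels that carry no effect.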

MVPA methods characterize group differences by harnessing multivariate relationships in the data. They can be divided into two classes according to whether they perform local or global learning. Local learning techniques, such as Searchlight [89], analyze the information content of local neighborhoods, while global learning methods, such as SVM [55], perform inference by modeling signal relationships across the entire brain. Local techniques are computationally expensive and may also lead to serious interpretation errors [44]. Global techniques, by construction, select regions sufficient for discrimination and may not fully reflect the group difference [68].

To address the above limitations of univariate and multivariate techniques, this aim of the thesis is to develop a method for performing statistical inference through optimal spatial filtering of data and regional discriminative analysis.

Aim 3: Inference in the presence of confounds

Univariate statistical methods, such as general linear models, effectively account for confounds by explicitly parametrizing them in the model. However, there is no clear consensus on how to reduce confounding effects within MVPA predictive settings. Confounding effects are an important problem in MVPA prediction methods as powerful machine learning methods may learn the covariate structure rather than group effects, which may lead to overfitting and failure to generalize.

Prior approaches have either 1) ignored confounds, 2) accounted for them implicitly, or 3) accounted for them explicitly [125]. The first approach ignores the confounds and proceeds with the predictive learning task using the imaging features alone. The second approach implicitly accounts for the confounds by correcting the learned model a posteriori using the underlying covariate structure [68]. Lastly, confounds can be adjusted for explicitly: weighting schemes [136, 139, 101] and residualization approaches [40] account for confounds before the model is learned. The limitation of these approaches is that they compromise either generalization for interpretability or interpretability for generalization. Furthermore, they seldom allow for immediate statistical inference due to lack of insight into the probability distribution of the model parameters.
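As an illustration of the explicit strategy, the following sketch shows one common form of residualization: each imaging feature is regressed on the confounds by ordinary least squares and the fitted part is subtracted. The function name `residualize`, the simulated `age` confound, and the effect sizes are hypothetical, and the cited approaches may differ in their details.

```python
import numpy as np


def residualize(features, confounds):
    """Remove the linear contribution of the confounds from each feature column."""
    # Design matrix: intercept plus one column per confound.
    design = np.column_stack([np.ones(len(confounds)), confounds])
    # Ordinary least-squares fit of every feature on the design matrix.
    coef, *_ = np.linalg.lstsq(design, features, rcond=None)
    return features - design @ coef


rng = np.random.default_rng(0)
n_subjects, n_features = 100, 5

# Hypothetical confound and features that depend linearly on it.
age = rng.uniform(50, 90, size=n_subjects)
features = 0.05 * age[:, None] + rng.normal(size=(n_subjects, n_features))

cleaned = residualize(features, age[:, None])

# After residualization, each feature is linearly uncorrelated with the confound.
max_corr = max(
    abs(np.corrcoef(age, cleaned[:, j])[0, 1]) for j in range(n_features)
)
print(f"largest |corr(age, feature)| after residualization: {max_corr:.1e}")
```

In a predictive setting, the regression coefficients would normally be estimated on the training subjects only and then applied to held-out subjects, so that information from the test set does not leak into the model.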

To address the above limitations, this aim of the thesis is to develop a framework for performing multivariate statistical inference and pattern analysis that is robust to confounding variations.
