Ron Wehrens, Arjan W. Simonetti and Lutgarde M. C. Buydens
Laboratory for Analytical Chemistry, University of Nijmegen, Toernooiveld 1, 6525 ED Nijmegen, The Netherlands
Abstract
In clinical decision making, (semi-)automatic unsupervised classification of data for diagnostic purposes is becoming increasingly important. This paper describes the application of mixture modelling, a clustering method in which multivariate Gaussians are used to describe the clusters in the data, to in vivo nuclear magnetic resonance data of patients with brain tumors. Both images and localized spectra are analyzed. The method automatically generates meaningful classifications. Moreover, the clusterings of the image data and of the spectral data are in close agreement.
Keywords: model-based clustering, image segmentation, MRI, 1H-MRSI, brain tumor, classification
Introduction
Nuclear magnetic resonance imaging (MRI) has become one of the most important non-invasive diagnostic aids in clinical decision making, mostly because soft tissue structures are clearly visible. In the diagnosis and treatment of brain tumors this is of prime importance, since it enables the expert to assess the location and size of the tumor. Single images, however, do not always show the desired information, and therefore multiple, complementary images are routinely recorded. Although this increases the amount of information, it may also hamper diagnosis, since radiologists have to inspect multiple images. (Semi-)automatic methods that extract the relevant features from the images can therefore be of great benefit. Several methods for distinguishing structures in the brain (segmentation) and for labeling these structures have been described in the literature [1-3].
Mixture modelling of medical magnetic resonance data
Mixture modelling is an approach to clustering in which the data are described as a mixture of distributions, usually multivariate normal distributions [5, 6]. Each Gaussian component can be regarded as one cluster, or a cluster with a non-normal shape can be described by two or more Gaussians. Model-based clustering has several advantages over other, more common forms of clustering:
The clustering has a statistical basis, which allows for inference. For example, uncertainty estimates can be derived for individual classifications as well as for the clustering as a whole.
Several criteria can be used to assess the optimal number of clusters, a direct consequence of the statistical model used to describe the data. This is a major advantage over, e.g., hierarchical clustering methods, where the user must choose a cut-off value; in most cases, no clear criteria exist for that choice.
The clustering method itself can be selected with the same criteria used to choose the number of clusters. As in hierarchical clustering, several closely related clustering methods exist, and the one that fits the data best can be identified objectively.
Noisy objects can be explicitly incorporated in the clustering procedure: they are then modelled as one separate, widely spread cluster rather than being forced into the neat grouping of the other objects. This feature is not used in the current application.
Cluster shapes can be visualized in the space of the original variables. In some cases this makes the results easier to interpret; in the current application, for example, neurologists can use domain knowledge to label the different clusters.
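The first three properties can be illustrated with a small numerical sketch. The Python code below (assuming only NumPy) fits a Gaussian mixture with the expectation-maximization (EM) algorithm, takes one minus the largest posterior membership probability as a simple per-object uncertainty, and uses the Bayesian information criterion (BIC) to choose both the number of clusters and the covariance structure. The three structures "full", "diag" and "spherical" stand in here for the closely related clustering methods mentioned above. This is not the authors' software; the function names, the random initialization, the small ridge `reg` and the synthetic two-group data are illustrative assumptions.

```python
import numpy as np


def log_gaussian(X, mean, cov):
    """Row-wise log density of the multivariate normal N(mean, cov)."""
    d = X.shape[1]
    chol = np.linalg.cholesky(cov)
    z = np.linalg.solve(chol, (X - mean).T)  # sum(z**2) = squared Mahalanobis distance
    log_det = 2.0 * np.sum(np.log(np.diag(chol)))
    return -0.5 * (d * np.log(2.0 * np.pi) + log_det + np.sum(z**2, axis=0))


def e_step(X, weights, means, covs):
    """Posterior membership probabilities (n x k) and the total log-likelihood."""
    log_p = np.column_stack([np.log(w) + log_gaussian(X, m, c)
                             for w, m, c in zip(weights, means, covs)])
    log_lik = np.logaddexp.reduce(log_p, axis=1)
    return np.exp(log_p - log_lik[:, None]), float(log_lik.sum())


def fit_gmm(X, k, cov_type="full", seed=0, n_iter=300, tol=1e-8, reg=1e-3):
    """EM for a k-component Gaussian mixture; reg is a small ridge keeping covariances invertible."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    means = X[rng.choice(n, size=k, replace=False)].astype(float)
    covs = np.array([np.atleast_2d(np.cov(X, rowvar=False)) + reg * np.eye(d)] * k)
    weights = np.full(k, 1.0 / k)
    prev = -np.inf
    for _ in range(n_iter):
        resp, log_lik = e_step(X, weights, means, covs)
        if log_lik - prev < tol:
            break
        prev = log_lik
        # M-step: weighted proportions, means and covariances.
        nk = np.maximum(resp.sum(axis=0), 1e-10)
        weights = nk / nk.sum()
        means = (resp.T @ X) / nk[:, None]
        for j in range(k):
            diff = X - means[j]
            cov = (resp[:, j, None] * diff).T @ diff / nk[j]
            if cov_type == "diag":         # axis-aligned ellipsoids
                cov = np.diag(np.diag(cov))
            elif cov_type == "spherical":  # one variance per cluster
                cov = np.trace(cov) / d * np.eye(d)
            covs[j] = cov + reg * np.eye(d)
    resp, log_lik = e_step(X, weights, means, covs)
    return weights, means, covs, resp, log_lik


def bic(log_lik, n, k, d, cov_type):
    """BIC as -2 log L + p log n (lower is better)."""
    cov_params = {"full": d * (d + 1) // 2, "diag": d, "spherical": 1}[cov_type]
    p = (k - 1) + k * d + k * cov_params
    return -2.0 * log_lik + p * np.log(n)


# Synthetic data (an assumption): two well-separated groups in two variables.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0.0, 1.0, size=(150, 2)),
               rng.normal(8.0, 1.0, size=(150, 2))])
n, d = X.shape

results = {}
for cov_type in ("full", "diag", "spherical"):
    for k in range(1, 5):
        # EM only finds a local optimum, so keep the best of a few random starts.
        fits = [fit_gmm(X, k, cov_type, seed=s) for s in range(5)]
        best = max(fits, key=lambda fit: fit[-1])
        results[(cov_type, k)] = (bic(best[-1], n, k, d, cov_type), best)

(best_type, best_k), (best_bic, best_fit) = min(results.items(), key=lambda item: item[1][0])
weights, means, covs, resp, log_lik = best_fit
labels = resp.argmax(axis=1)
uncertainty = 1.0 - resp.max(axis=1)  # 0 means a completely certain assignment
print(best_type, best_k)
```

BIC is written here with the "lower is better" sign; some software reports the negated value and maximizes it instead. No cut-off has to be chosen by the user: the number of clusters and the covariance structure both follow from the same criterion.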
Mixture modelling has been applied to a wide variety of clustering problems (for an overview, see [5]), including the segmentation of MRI brain images of healthy volunteers [7]. In this paper, mixture modelling is applied to two types of data sets obtained from five patients with a brain tumor. The first type consists of MRI images of the patients' brains. The second type consists of the levels of a small set of metabolites, quantified from 1H-MRSI data. Although in many cases the tumors can be recognized from the raw data, i.e. from the MRI images or from the spectra recorded in the individual voxels, an automatic unsupervised segmentation is very useful to the physician. The individual clusters can then be investigated further, e.g. by labeling them as healthy tissue (e.g. gray matter, white matter) or unhealthy tissue (e.g. tumor, necrotic tissue), or by tumor type. At this stage, supervised methods or expert knowledge may be used. The image segmentation provides an objective definition of the size and location of the clusters, taking all data into account simultaneously. The latter property is important because MRI images can be misleading, especially when only one or two images are considered at a time: sometimes a tumor appears much larger than it actually is, and sometimes it does not show up at all. Taking all data into account is therefore essential to minimize the numbers of false negatives and false positives.
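To make "taking all data into account simultaneously" concrete, the sketch below treats each pixel as one object whose variables are its intensities in several co-registered images, assigns it to the mixture component with the highest posterior probability, and folds the labels back into an image. The two synthetic images and the mixture parameters are hypothetical stand-ins for real images and fitted values; the two tissue classes differ only in the second image, so a segmentation based on the first image alone could not separate them.

```python
import numpy as np


def stack_images(images):
    """Turn co-registered (rows, cols) images into an (n_pixels, n_images) data matrix."""
    shape = images[0].shape
    X = np.column_stack([np.asarray(img, dtype=float).reshape(-1) for img in images])
    return X, shape


def map_segment(X, weights, means, covs):
    """Label each row of X with the mixture component of highest posterior probability."""
    d = X.shape[1]
    scores = []
    for w, m, c in zip(weights, means, covs):
        chol = np.linalg.cholesky(c)
        z = np.linalg.solve(chol, (X - m).T)  # whitened deviations, shape (d, n_pixels)
        log_det = 2.0 * np.sum(np.log(np.diag(chol)))
        scores.append(np.log(w) - 0.5 * (d * np.log(2.0 * np.pi) + log_det + np.sum(z**2, axis=0)))
    # The posterior's normalizing constant is shared by all components, so the argmax is unchanged.
    return np.argmax(np.column_stack(scores), axis=1)


# Two synthetic 4 x 6 images (hypothetical): both tissues look identical in img_a,
# but differ in img_b (left three columns vs right three columns).
img_a = np.full((4, 6), 100.0)
img_b = np.where(np.arange(6) < 3, 50.0, 150.0) * np.ones((4, 1))
X, shape = stack_images([img_a, img_b])

# Hypothetical mixture parameters for two tissue classes, e.g. from an EM fit.
weights = np.array([0.5, 0.5])
means = np.array([[100.0, 50.0], [100.0, 150.0]])
covs = np.array([25.0 * np.eye(2)] * 2)

segmentation = map_segment(X, weights, means, covs).reshape(shape)
print(segmentation)  # left three columns 0, right three columns 1
```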
Mixture modelling is applied both to the MRI images of the patients and to the levels of the five metabolites. It is shown that tumors can be located without human intervention, both in the MRI images and in the quantified metabolite levels. The results for the two types of data are in very good agreement. The clustering method is also flexible enough to allow easy assessment of other choices of clustering method or number of clusters.
In the next section, we briefly review mixture modelling in the context of cluster analysis. Next, the data are presented, followed by the results of clustering both the MRI images and the voxels with the metabolite levels. Although no “true” values for the patients are known (these can only be obtained by autopsy), the agreement between the clusterings of the two types of data indicates that tumors can be located in this way.
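The agreement between two clusterings can be quantified in several ways, and this section does not say which measure is used. One common, label-invariant choice is the adjusted Rand index, which equals 1 for identical partitions and is close to 0 for chance-level agreement. The sketch below assumes that both clusterings are defined on the same set of locations (for instance MRI labels mapped onto the MRSI voxel grid, a step not shown); the label vectors are made up for illustration.

```python
import numpy as np


def pairs(counts):
    """Number of unordered pairs, counts choose 2 (elementwise)."""
    counts = np.asarray(counts, dtype=float)
    return counts * (counts - 1.0) / 2.0


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index of two partitions of the same objects."""
    _, a = np.unique(np.asarray(labels_a), return_inverse=True)
    _, b = np.unique(np.asarray(labels_b), return_inverse=True)
    a, b = a.ravel(), b.ravel()
    # Contingency table: number of objects in cluster i of one partition and cluster j of the other.
    table = np.zeros((a.max() + 1, b.max() + 1))
    np.add.at(table, (a, b), 1.0)
    index = pairs(table).sum()
    rows = pairs(table.sum(axis=1)).sum()
    cols = pairs(table.sum(axis=0)).sum()
    expected = rows * cols / pairs(len(a))
    max_index = 0.5 * (rows + cols)
    if max_index == expected:  # degenerate case, e.g. both partitions are a single cluster
        return 1.0
    return float((index - expected) / (max_index - expected))


mri_labels = [0, 0, 0, 1, 1, 1, 2, 2]   # hypothetical cluster label per location
mrsi_labels = [2, 2, 2, 0, 0, 0, 1, 1]  # same partition, different label names
print(adjusted_rand_index(mri_labels, mrsi_labels))     # 1.0
print(adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1]))  # about -0.5, worse than chance
```

Because only co-membership of pairs is counted, the arbitrary numbering of clusters in the two analyses does not affect the result.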