In Chapter 4, a likelihood function was designed for use in deformable model based image segmentation. Fenster & Kender [FK01] made two key observations about the requirements of such likelihood functions:
1. For optimization to succeed, the function must be optimal for the correct segmentation. Contrary to intuition, the distribution of a function’s values for correct segmentations gives no information about its goodness for segmentation. Incorrect segmentations must have different values, i.e., the function must be specific.
2. If a gradient-descent type optimization is performed, the function must meet the more stringent condition that it become closer to optimal as a shape gets closer to correct. Many likelihood functions do not meet this second requirement, so they suffer from local minima or capture range issues.
Additionally, many likelihood functions are evaluated only by the quality of their segmentation results. However, the cause of a poor segmentation result is difficult to determine. It could be due to one of the issues above or to other portions of the segmentation pipeline such as the shape model, initialization, shape prior, or optimization. Fenster & Kender recognize the inadequacy of relying on segmentation results to evaluate the likelihood function, so they introduce a different evaluation metric based on the likelihood function’s behavior as a segmentation gets further from correct [FK01].
In this section I expand upon this idea. First, a more principled evaluation metric is proposed when a shape prior is available. This metric is based on a definition of the ideal likelihood function for a given shape prior. Second, a training strategy is proposed to learn likelihood functions that are closer to ideal. Current learned likelihood functions are trained only on correct segmentations, which was argued in Section 4.2.2 to be inadequate.
Evaluating the Quality of a Likelihood Function
Whereas Section 4.2 argued for equally penalized expected variations of correct segmentations, I believe that the ideal objective function used for segmentation will have a shape prior and image likelihood that equally penalize expected deviations from correct segmentations as well. Also, to meet the two requirements above, both components must define a smooth function, such as a multivariate Gaussian distribution, in the space of expected deviations. Here I focus on defining a likelihood function given a fixed shape prior.
The quality of a likelihood function can be defined for a given training image and fit shape model as follows. The shape prior is typically defined to be a multivariate Gaussian distribution on the parameters of the shape model. Further, since the prior is used to deform the object during segmentation, this prior centered on the correct segmentation defines the expected deviations. The ideal likelihood function would define the same Gaussian distribution in the shape space as this recentered shape prior. Equivalently, in the parameter space of the shape prior, the ideal likelihood function would be a centered, unit multivariate Gaussian.
To evaluate how close a likelihood function is to this ideal for the given image and fit shape model, the expected deviations from the correct segmentation can be simulated by sampling the recentered shape prior. Then, the likelihood function can be computed for each shape sample. To evaluate the quality of the likelihood function, a dissimilarity measure must be defined that measures how close these sampled likelihood function values are to the centered, unit multivariate Gaussian. For each sampled value the ideal value is known. Therefore, I propose using the sum of squared differences as the penalty measure.
This approach is similar to that taken by Fenster & Kender, who defined the correlation between the sampled values and a 1D variable that described the degree of deformation. The proposed approach simply leverages the shape prior to do a more principled sampling and to define a more accurate penalty.
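The evaluation procedure described above can be sketched in MATLAB as follows. This is a minimal illustration under stated assumptions, not the implementation used in this dissertation: the shape parameters are assumed to be expressed in the whitened parameter space of the shape prior, so that the recentered prior is a unit Gaussian; `negLogLik` is a hypothetical handle standing in for the likelihood function under test; and the comparison is made on negative log-likelihoods after removing an additive constant, which is only one possible way to form the sum of squared differences.

```matlab
% Sketch only: every name below is an illustrative assumption.
numParams  = 3;    % dimension of the whitened shape-parameter space
numSamples = 500;  % number of simulated deviations

% Sample the recentered shape prior; in whitened coordinates it is a
% centered, unit multivariate Gaussian.
deviations = randn(numParams, numSamples);

% Ideal negative log-likelihood of each sample, up to an additive constant.
idealValues = 0.5 * sum(deviations.^2, 1);

% Hypothetical likelihood under test. A Gaussian that is twice too wide
% stands in for a function that would score the image match of each shape.
negLogLik = @(d) 0.5 * sum((d / 2).^2, 1);

% Sum of squared differences after removing the best constant offset.
score = @(values) sum((values - mean(values - idealValues) - idealValues).^2);

ssdTest  = score(negLogLik(deviations));  % positive: the test function is not ideal
ssdIdeal = score(idealValues);            % zero: the ideal function matches itself
```

A lower score indicates a likelihood function that is closer to ideal for the given image and fit shape model, so the same score could be compared across candidate parameter settings.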
An additional idea to explore is the notion of the scale of the expected deformations. As segmentation proceeds, the current segmentation should get closer to the correct segmentation. A multi-scale shape prior somewhat captures this notion; the likelihood function could be estimated at each of these fixed scales. Alternatively, a scale parameter $0 \le \alpha \le 1$ could be defined that scales the expected deformations generated by the prior. The likelihood function could be evaluated at different values of $\alpha$. An $\alpha$ of 0 corresponds to only expecting segmentation deformations that are on the scale of the fitting error of the shape model during training, an idea discussed in Section 4.2.2.
A straightforward use of this evaluation framework is for parameter selection. The parameters of the appearance model and the likelihood function could be tuned to make the likelihood function closest to ideal. Such a principled approach to parameter tuning makes reasonable the introduction of appearance models with many more parameters. For example, a multi-scale appearance model could be defined, where for each scale of the shape prior an appearance model is found that is closest to ideal. Such a multi-scale appearance model could explore parameters such as the degree of Gaussian smoothing of the image and region size.
Learning a Closer to Ideal Likelihood Function
The notion of expected segmentation variability can also be incorporated into the training of the likelihood function. Since the likelihood function is evaluated by its performance on such deformations, it makes sense to train on them. The current likelihood function estimates $p_{\mathrm{corr}}(a_{\mathrm{corr}})$, the likelihood of a correct segmentation. I propose additionally modeling $p_{\mathrm{def},\alpha}(\Delta a_{\mathrm{def}})$, the changes in the appearance model due to expected segmentation deformations at scale $\alpha$. This formulation assumes $\Delta a_{\mathrm{def}}$ is i.i.d. for different $a_{\mathrm{corr}}$, which allows $p_{\mathrm{def},\alpha}$ to be trained by pooling estimates across the training images. During segmentation, the likelihood of a segmentation with appearance $a$ is now
$$p(a) = \max_{a_{\mathrm{corr}}} \; p_{\mathrm{corr}}(a_{\mathrm{corr}}) \, p_{\mathrm{def},\alpha}(\Delta a_{\mathrm{def}}) \quad \text{subject to} \quad a = a_{\mathrm{corr}} + \Delta a_{\mathrm{def}}.$$
I additionally assume that $p_{\mathrm{def},\alpha}$ is Gaussian distributed in the space of the appearance parameters. This assumption combined with the i.i.d. assumption above allows a simple closed form solution to this equation. Let $p_{\mathrm{corr}} \sim N(\mu_{\mathrm{corr}}, \Sigma_{\mathrm{corr}})$ and $p_{\mathrm{def},\alpha} \sim N(0, \Sigma_{\mathrm{def},\alpha})$. If $\Sigma_{\mathrm{corr}}$ and $\Sigma_{\mathrm{def},\alpha}$ are estimated using the same principal directions, $p$ can be simply expressed as $p \sim N(\mu_{\mathrm{corr}}, \Sigma_{\mathrm{corr}} + \Sigma_{\mathrm{def},\alpha})$.
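The closed form can be checked with a short derivation. The following is a sketch that reads the optimization as a maximization of the product over $a_{\mathrm{corr}}$ (equivalently, a minimization of the negative log) and ignores normalization constants; both readings are assumptions made here for illustration.

```latex
\begin{align*}
-\log p(a)
  &= \min_{a_{\mathrm{corr}}} \Big[
       \tfrac{1}{2} (a_{\mathrm{corr}} - \mu_{\mathrm{corr}})^{\top}
         \Sigma_{\mathrm{corr}}^{-1} (a_{\mathrm{corr}} - \mu_{\mathrm{corr}})
     + \tfrac{1}{2} (a - a_{\mathrm{corr}})^{\top}
         \Sigma_{\mathrm{def},\alpha}^{-1} (a - a_{\mathrm{corr}})
     \Big] + \text{const}, \\
a_{\mathrm{corr}}^{*}
  &= \mu_{\mathrm{corr}}
     + \Sigma_{\mathrm{corr}}
       (\Sigma_{\mathrm{corr}} + \Sigma_{\mathrm{def},\alpha})^{-1}
       (a - \mu_{\mathrm{corr}}), \\
-\log p(a)
  &= \tfrac{1}{2} (a - \mu_{\mathrm{corr}})^{\top}
     (\Sigma_{\mathrm{corr}} + \Sigma_{\mathrm{def},\alpha})^{-1}
     (a - \mu_{\mathrm{corr}}) + \text{const}.
\end{align*}
```

The second line follows from setting the gradient with respect to $a_{\mathrm{corr}}$ to zero, and the last line is the exponent of $N(\mu_{\mathrm{corr}}, \Sigma_{\mathrm{corr}} + \Sigma_{\mathrm{def},\alpha})$. When both covariances are diagonal in the same principal directions, this reduces to adding the two variances along each direction.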
One possible issue with this formulation that needs to be examined is the appropriateness of the QF based representations in this context. The variation measured by $p_{\mathrm{def},\alpha}$ should primarily be mixture changes in the amount of each tissue in the object-relative region. Therefore, one of the appearance models proposed in Sections 5.2.3 or 5.2.4 may be more appropriate in this context.
This likelihood function more accurately estimates the spatial accuracy of a given segmentation. This could be useful to automatically signal failures by defining a likelihood function at the scale of acceptable segmentations [LBR+07]. Another possible use of this likelihood function is to guide the optimizer, using an approach similar to [CET98]. Jingdan Zhang (a Ph.D. student at UNC) has explored an idea similar to the ones presented here to learn an ideal likelihood function in a kernel framework directly from the images.
Appendix A
Users Guide
This appendix presents a guide to the basic algorithms developed in this dissertation for the computation and display of quantile functions, for converting between QFs and PDFs, and for representing conditional distributions using QFs. The guide concludes with an example that uses some of these functions to generate Figure 2.4(c) on page 19. All code is given in MATLAB.
A.1 QF Computation
This section provides three functions for computing the discrete quantile function representation from samples. These algorithms were mentioned in Section 2.1.2 on page 14. The first function is used to quickly compute a quantile function from samples when many samples are available. This algorithm was used in Chapter 3 by both the MR8 and PCA-MRF texture models. The second function is slower, more accurate, and also allows weighted samples. The third function assumes the samples are from a discrete distribution that takes on only integer values. This is leveraged to avoid sorting by instead computing a fine histogram. This third function was used in the appearance model described in Section 4.2.
function qfs = getQFs(features, numBins);
% Input
% features: a numFeatures by numSamples matrix
% numBins: the number of QF bins to use per feature
% Output
% qfs: a numFeatures by numBins matrix that is
% a discrete QF for each feature
% Approach
% 1. Compute an integer number of samples per QF bin
% 2. Sort the samples for each feature
% 3. Average adjacent samples to compute each bin value
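[numFeatures, numSamples] = size(features);
ind = randperm(numSamples);
numSamplesPerBin = floor(numSamples/numBins);
numSamples = numSamplesPerBin * numBins; %drop samples that do not fill a bin
%Sort each feature, group consecutive samples into bins, and average
%within each bin. The explicit dimension argument to mean keeps the
%result correct when numSamplesPerBin is 1.
qfs = reshape(mean(reshape(sort(features(:,ind(1:numSamples))'), ...
    [numSamplesPerBin numBins numFeatures]), 1), ...
    [numBins, numFeatures])';
end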
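The sort, group, and average steps listed in the Approach comments can be seen on a tiny input. The snippet below is a minimal sketch for a single feature with hand-picked values (the sample values and variable names are illustrative assumptions); it inlines the same operations rather than calling getQFs, and it skips the random subsampling because the sample count already divides evenly into the bins.

```matlab
% Eight samples of one feature, four QF bins (two samples per bin).
samples = [7 1 5 3 8 2 6 4];
numBins = 4;
samplesPerBin = numel(samples) / numBins;

% Sort, group consecutive samples into bins, and average each group.
sorted = sort(samples);
qf = mean(reshape(sorted, [samplesPerBin numBins]), 1);
disp(qf)   % displays 1.5 3.5 5.5 7.5
```

Each QF bin holds the mean of one equal-probability slice of the sorted samples, so the result is nondecreasing by construction.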
function qfs = getQFsFromWeightedSamples(features, weights, numBins);
% Input
% features: a numFeatures by numSamples matrix of samples
% weights: a 1 by numSamples vector that gives the weight, or
% contribution, of each sample to the distribution
% numBins: the number of QF bins to use per feature
% Output
% qfs: a numFeatures by numBins matrix that is
% a discrete QF for each feature
% Approach
% 1. For each feature sort the samples
% 2. Linearly go through the samples to find the QF bin
% boundaries, which generally split a sample into two.
% 3. Sum the samples in each bin as you go through the
% samples so that their average can be computed
[numFeatures, numSamples] = size(features);
qfs = zeros(numFeatures, numBins);
for f = 1:numFeatures,
    [orderedFeatures indices] = sort(features(f,:));
    orderedWeights = weights(indices);
    totalWeight = sum(orderedWeights);
    wpb = totalWeight / numBins; %weight per bin
    qf = zeros(1, numBins);
    currentBinWeight = 0;
    currentBin = 1;
    i = 1;
    while(i <= numSamples)
        if(orderedWeights(i)+currentBinWeight <= wpb)
            %all of sample is in bin
            qf(currentBin) = qf(currentBin) + orderedWeights(i) ...
                * orderedFeatures(i);
            %track how much of the current bin has been filled
            currentBinWeight = currentBinWeight + orderedWeights(i);
            i = i + 1;
        else
            %part of sample is in bin
            partial = wpb - currentBinWeight;
            qf(currentBin) = qf(currentBin) + partial * orderedFeatures(i);
            orderedWeights(i) = orderedWeights(i) - partial;
            currentBinWeight = 0;
            currentBin = currentBin + 1;
            if(currentBin == numBins + 1)
                break;
            end
        end
    end
    qf = qf / wpb;
    qfs(f, :) = qf;
end
end
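The way a sample is split across a bin boundary can be checked by hand on a small case. The sketch below uses illustrative values chosen here (two samples with integer weights, two bins) and does not call the function above. It compares the hand-computed bin averages with an equivalent unweighted computation in which each sample is repeated according to its weight, a shortcut that is only valid for integer weights.

```matlab
features = [10 20];
weights  = [3 1];
numBins  = 2;
wpb = sum(weights) / numBins;   % weight per bin: 2

% By hand: bin 1 takes 2 of the 3 units of weight on the value 10;
% bin 2 takes the remaining unit on 10 plus the unit on 20.
qfByHand = [2*10, 1*10 + 1*20] / wpb;

% Equivalent unweighted view: repeat each sample by its integer weight,
% then average consecutive groups of wpb sorted samples.
expanded = sort(repelem(features, weights));   % [10 10 10 20]
qfExpanded = mean(reshape(expanded, [wpb numBins]), 1);
```

Both computations give bin values of 10 and 15, which is the behavior the partial-sample branch is meant to reproduce for arbitrary weights.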
function qfs = getQFsFromDiscreteDiscribution(features, weights, numBins)
% Input
% features: a numFeatures by numSamples matrix of samples from
% a distribution with integer values
% weights: a 1 by numSamples vector that gives the weight, or
% contribution, of each sample to the distribution
% numBins: the number of QF bins to use per feature
% Output
% qfs: a numFeatures by numBins matrix that is
% a discrete QF for each feature
% Approach
% 1. For each feature compute a histogram with a bin for
% every possible discrete value
% 2. Use the bin locations and frequencies as weighted
% samples for input into getQFsFromWeightedSamples()
[numFeatures, numSamples] = size(features);
qfs = zeros(numFeatures, numBins);
for f = 1:numFeatures,
    samples = features(f,:);
    %Compute histogram
    binCenters = min(samples):max(samples);
    frequencies = zeros(1, size(binCenters, 2));
    for i = 1:numSamples,
        index = samples(i)-binCenters(1)+1;
        %accumulate the sample's weight in its histogram bin
        frequencies(index) = frequencies(index) + weights(i);
    end
    %Compute QF
    qfs(f,:) = getQFsFromWeightedSamples(binCenters, frequencies, numBins);
end
end