The LRMSSP has potential applications in signal processing and, in particular, in compressive sensing (Candès, 2006; Donoho, 2006). The goal in compressive sensing is to recover a sparse signal $w = (w_1, \ldots, w_d)^T$ from a limited set of linear measurements $y = (y_1, \ldots, y_n)^T$, where $n < d$. The measurements $y$ are obtained by projecting the signal $w$ onto an $n \times d$ measurement matrix $X$, that is, $y = Xw + e$, where $e = (e_1, \ldots, e_n)^T \sim \mathcal{N}(0, \sigma_0^2 I)$ is additive Gaussian noise. Since $w$ is sparse, it is possible to reconstruct this vector accurately from $y$ and $X$ using fewer measurements than the number of degrees of freedom of the signal, which is the limit imposed by the Nyquist sampling theorem to guarantee the reconstruction of general signals. When $w$ is not sparse, we may find a $d \times d$ orthonormal matrix $B$, for example a wavelet basis, such that $\tilde{w} = B^T w$, where $\tilde{w}$ is sparse or nearly sparse. In this case, the measurement process is performed after projecting the signal onto the columns of $B$, that is, $y = XB^T w + e = X\tilde{w} + e$. Once an estimate of $\tilde{w}$ is obtained from $y$ and $X$, we can approximate $w$ using $w = B\tilde{w}$. Therefore, even when the signal is not sparse, it may still be possible to reconstruct $w$ with high precision using fewer than $d$ samples, provided that this vector is compressible in some basis $B$.
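As a minimal sketch of the change of basis described above, the following pure-Python example uses a hypothetical 4 × 4 Haar-like orthonormal matrix B, a hypothetical piecewise-constant signal w and a hypothetical 2 × 4 measurement matrix X. None of these values come from the experiments; the noise term e is omitted to keep the arithmetic transparent.

```python
import math

def transpose(M):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*M)]

def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

# Hypothetical orthonormal basis: the columns of B are Haar-like vectors.
s = 1.0 / math.sqrt(2.0)
B = [
    [0.5,  0.5,  s,   0.0],
    [0.5,  0.5, -s,   0.0],
    [0.5, -0.5,  0.0,  s],
    [0.5, -0.5,  0.0, -s],
]

# Hypothetical signal: dense in the original domain, sparse in the basis B.
w = [3.0, 3.0, 1.0, 1.0]
w_tilde = matvec(transpose(B), w)   # coefficients w~ = B^T w

# Hypothetical measurement matrix with n = 2 < d = 4 rows (noise omitted).
X = [
    [0.6, -0.8, 0.0, 0.0],
    [0.0,  0.0, 0.6, 0.8],
]
y = matvec(X, w_tilde)              # y = X B^T w = X w~

# Mapping the coefficients back recovers the signal: w = B w~.
w_rec = matvec(B, w_tilde)

print("w~    =", w_tilde)
print("y     =", y)
print("B w~  =", w_rec)
```

In this toy case only two of the four coefficients in the transformed vector are non-zero, which is the property a sparse regression method would exploit when estimating the coefficients from y and X.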
In summary, the reconstruction of a sparse signal from a reduced number of compressive measurements is a linear regression task in which y is the target vector, X is the design matrix and the vector of regression coefficients w (the signal) is assumed to be sparse. Therefore, the EP algorithm for approximate inference in the LRMSSP introduced in this chapter (SS-EP) can be used to address this problem. The performance of SS-EP is evaluated in a series of experiments on the reconstruction of non-uniform and uniform spike signals. These tasks have been used as benchmarks for comparison in the compressive sensing literature (Ji et al., 2008).
4.4.2.1 Non-uniform Spike Signals
In this experiment, 100 signals of length d = 512 are generated by randomly selecting 20 non-zero components in each signal vector. The elements in these positions are then independently sampled from a standard Gaussian distribution. All the other elements in the signal vectors are zero. It is not necessary to determine an appropriate B because the signals are already sparse. The measurements are performed using a matrix X whose rows are sampled uniformly from the unit hypersphere. For the reconstruction of the signals, a total of n = 75 measurements are used.
Table 4.2: Results for each method in the non-uniform spike signal reconstruction problem.

             SS-MCMC        Laplace-EP     RVM            SS-EP
Error        0.19 ± 0.37    0.82 ± 0.06    0.19 ± 0.36    0.04 ± 0.11
log P(y|X)   -              19.7 ± 11.2    219 ± 25       122 ± 27
Time         798 ± 198      0.12 ± 0.01    0.07 ± 0.02    0.19 ± 0.11

Figure 4.4: Signal estimates generated by each reconstruction method on a particular instance of the non-uniform spike signal problem. The original signal (not shown) cannot be visually distinguished from the approximation generated by EP in the spike and slab model.
Noise in the measurement process follows a zero-mean Gaussian distribution with standard deviation 0.005. The signal is approximated by the posterior mean of $w$. The following methods for computing the posterior are compared: the LRMSSP with Gibbs sampling (SS-MCMC), the LRMSSP with EP (SS-EP), the linear model with Laplace prior and EP (Laplace-EP) and the RVM. The values of the different hyper-parameters are selected optimally. In the LRMSSP, $p_0 = 20/512$, $v_s = 1$ and $\sigma_0 = 0.005$. In Laplace-EP, the scale parameter is $b_0 = \sqrt{10/512}$. The variance of the noise in Laplace-EP and RVM is $0.005^2$. SS-MCMC draws 10,000 samples from the posterior distribution using Gibbs sampling after a burn-in period of 1,000 samples. Given an estimate $\hat{w}$ of a signal $w_0$, the reconstruction error of $\hat{w}$ is quantified by $\|\hat{w} - w_0\|_2 / \|w_0\|_2$, where $\|\cdot\|_2$ represents the Euclidean norm.
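The data-generation protocol and the error measure described above can be sketched as follows. This is an illustration under stated assumptions, not the code used for the reported experiments: the function names and the random seed are hypothetical, rows of X are drawn uniformly on the unit hypersphere by normalizing standard Gaussian vectors, and pure Python is used only to keep the example self-contained.

```python
import math
import random

def make_spike_signal(d, k, rng):
    """Signal of length d with k randomly placed standard Gaussian spikes."""
    w = [0.0] * d
    for i in rng.sample(range(d), k):
        w[i] = rng.gauss(0.0, 1.0)
    return w

def make_measurement_matrix(n, d, rng):
    """n rows, each uniform on the unit hypersphere (normalized Gaussian vector)."""
    rows = []
    for _ in range(n):
        g = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(v * v for v in g))
        rows.append([v / norm for v in g])
    return rows

def measure(X, w, sigma, rng):
    """Noisy linear measurements y = X w + e with e ~ N(0, sigma^2 I)."""
    return [sum(x * v for x, v in zip(row, w)) + rng.gauss(0.0, sigma)
            for row in X]

def reconstruction_error(w_hat, w0):
    """Relative Euclidean error ||w_hat - w0||_2 / ||w0||_2."""
    num = math.sqrt(sum((a - b) ** 2 for a, b in zip(w_hat, w0)))
    den = math.sqrt(sum(b * b for b in w0))
    return num / den

rng = random.Random(0)                      # hypothetical seed
w0 = make_spike_signal(512, 20, rng)        # d = 512, 20 non-zero components
X = make_measurement_matrix(75, 512, rng)   # n = 75 measurements
y = measure(X, w0, 0.005, rng)              # noise standard deviation 0.005

# Baseline: the all-zero estimate has relative error 1 by construction.
print(reconstruction_error([0.0] * 512, w0))
```

Under this error measure, an estimate with error close to 1 is no better than predicting a zero signal, which gives a reference point for reading the error rows of the result tables.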
Table 4.2 summarizes the results obtained by each method in the experiments with non-uniform spike signals. The rows in this table display the average and the standard deviation of the signal reconstruction error, the logarithm of the model evidence and the time cost for each method. The best reconstruction performance is obtained by the LRMSSP with EP. The differences with respect to the other methods are statistically significant at the level $\alpha = 5\%$ according to a paired t test. The resulting p-values are all below $3 \cdot 10^{-5}$. The approximation of the model evidence is higher for the spike and slab prior than for the Laplace prior. Once more, RVM obtains the largest estimate of $P(y|X)$. The computational costs of the EP and RVM methods are similar. With the configuration selected, Gibbs sampling is much more costly than the other methods (up to 4000 times slower than EP).

Chapter 4. Linear Regression Models with Spike and Slab Priors 69
The poor results of Gibbs sampling with respect to EP in this task have their origin in the propensity of the Markov chain to become trapped in sub-optimal modes of the posterior. This is illustrated by the plots in Figure 4.4, which show the signal estimates obtained by the different methods in a particular realization of the experiment. The LRMSSP with EP generates a signal reconstruction which is very accurate and cannot be visually distinguished from the original signal. By contrast, the Gibbs sampling approach generates many spikes of small magnitude that were not present in the original signal. The signal reconstruction given by RVM presents similar problems. The reason for this is that the optimization process carried out by this method often converges to local and sub-optimal maxima of the type-II likelihood. This happens even when an additional greedy optimization process is used to reduce the impact of local and sub-optimal maxima, as in the implementation of RVM given by Ji et al. (2008). Finally, the Laplace model has the largest error in this problem, as illustrated by the top-right plot in Figure 4.4. The Laplace prior produces excessive shrinkage of non-zero coefficients, while the magnitude of the coefficients that should be zero is not sufficiently reduced.
4.4.2.2 Uniform Spike Signals
The uniform spike signals are generated in a similar manner to the non-uniform ones. The only difference is that the non-zero elements of each signal vector are now sampled at random from the set {−1, 1}. The experimental protocol is the same as before. The hyper-parameters of each method are initialized to the same values as in the experiments with non-uniform spike signals. However, in this case we use n = 100 measurements for the reconstruction of each signal vector because accurate reconstruction of uniform spike signals requires more data.
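The only change in the signal generation can be sketched as below. The function name and seed are hypothetical, and the example assumes that the two values −1 and 1 are equally likely, which is one reading of "sampled at random from the set".

```python
import random

def make_uniform_spike_signal(d, k, rng):
    """Signal of length d with k randomly placed spikes of value -1 or +1."""
    w = [0.0] * d
    for i in rng.sample(range(d), k):
        w[i] = rng.choice([-1.0, 1.0])   # assumed equiprobable signs
    return w

rng = random.Random(1)                   # hypothetical seed
w0 = make_uniform_spike_signal(512, 20, rng)

# Every spike has unit magnitude, so the squared Euclidean norm equals k.
print(sum(v * v for v in w0))
```

Because all spikes share the same magnitude, no component stands out from the others by size, which is relevant to the behavior of greedy selection strategies on these signals.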
Table 4.3 presents the results of each method. By far, the most accurate reconstruction is provided by the LRMSSP with EP for approximate inference. The differences with respect to the other methods are statistically significant at $\alpha = 5\%$ according to a paired t test. The p-values obtained are all lower than $2.2 \cdot 10^{-16}$. The evidence of the LRMSSP is larger than the evidence of the Laplace model. The training times of both EP methods and the RVM approach are similar. Gibbs sampling has a much larger computational cost (85,000 times slower than EP) and obtains the worst performance. Figure 4.5 shows the signal estimates generated by the different methods in a particular realization of the experiment. The Gibbs sampling approach appears to be trapped in some sub-optimal mode of the posterior distribution. Similarly, RVM has converged to a local maximum of the type-II likelihood, which is sub-optimal. By contrast, the signal reconstruction given by the LRMSSP with EP is very accurate. These results show that EP seems to be less affected than Gibbs sampling by the multimodality of the posterior distribution under the spike and slab prior. This is a surprising result because EP is supposed to have problems when the posterior distribution is multimodal (Bishop, 2006). Finally, the behavior of the Laplace method is similar to that in the non-uniform case. In this approach, the reduction of the magnitude of the different coefficients is uniform. Consequently, the shrinkage of the coefficients that should be zero is insufficient and the shrinkage of the coefficients that should be non-zero is too large.
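The contrast between uniform and selective shrinkage can be illustrated in a much simpler setting than the regression problem. The sketch below assumes a scalar model z = w + e with Gaussian noise and compares the posterior mean of w under a spike and slab prior and under a Laplace prior. The noise level 0.3, the two test observations and the use of a simple Riemann sum for the Laplace case are assumptions made for illustration; EP is not used here, and the Laplace scale is read as sqrt(10/512).

```python
import math

P0, VS = 20.0 / 512.0, 1.0        # spike and slab hyper-parameters from the text
B0 = math.sqrt(10.0 / 512.0)      # Laplace scale, read here as sqrt(10/512)
SIGMA = 0.3                       # assumed noise level, for illustration only

def gauss_pdf(x, var):
    """Density of a zero-mean Gaussian with variance var at x."""
    return math.exp(-0.5 * x * x / var) / math.sqrt(2.0 * math.pi * var)

def spike_slab_mean(z):
    """Posterior mean of w given z under a point mass at zero plus N(0, VS) slab."""
    slab = P0 * gauss_pdf(z, VS + SIGMA ** 2)       # evidence if w is in the slab
    spike = (1.0 - P0) * gauss_pdf(z, SIGMA ** 2)   # evidence if w is exactly zero
    p_slab = slab / (slab + spike)
    return p_slab * VS / (VS + SIGMA ** 2) * z

def laplace_mean(z, lo=-10.0, hi=10.0, steps=20000):
    """Posterior mean of w given z under a Laplace(B0) prior, by a Riemann sum."""
    h = (hi - lo) / steps
    num = den = 0.0
    for i in range(steps + 1):
        w = lo + i * h
        weight = math.exp(-0.5 * (z - w) ** 2 / SIGMA ** 2 - abs(w) / B0)
        num += w * weight
        den += weight
    return num / den

for z in (0.1, 2.0):              # a small and a large observation (hypothetical)
    print(z, spike_slab_mean(z), laplace_mean(z))
```

In this toy model the spike and slab posterior mean pushes the small observation much closer to zero than the Laplace posterior mean does, while shrinking the large observation less, which mirrors the qualitative behavior described for the reconstructed signals.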
In this case the performances of both Gibbs sampling and RVM are markedly worse than in the experiments with non-uniform spike signals. The reason for this is that, with uniform spike signals, it is more difficult to avoid sub-optimal maxima of the type-II likelihood or sub-optimal modes of the posterior distribution. In particular, the starting point of the Markov chain used by Gibbs sampling has to be a good initial solution. This solution is determined using a greedy procedure that is described in Appendix C.1. In RVM, the maximization of the type-II likelihood is also implemented using a similar greedy strategy (see the Matlab code given by Ji et al. (2008)). When the signal consists of non-uniform spikes, these greedy strategies are very successful in identifying the signal elements which are truly different from zero. However, with uniform spike signals, these greedy processes make more mistakes. Consequently, in this latter case, RVM and SS-MCMC are more likely to get trapped in sub-optimal maxima of the type-II likelihood or sub-optimal modes of the posterior distribution, respectively.

Table 4.3: Results for each method in the uniform spike signal reconstruction problem.

             SS-MCMC        Laplace-EP     RVM            SS-EP
Error        1.03 ± 0.61    0.84 ± 0.03    0.66 ± 0.54    0.01 ± 0.01
log P(y|X)   -              27.8 ± 5.3     248 ± 56       215 ± 5.9
Time         1783 ± 533     0.17 ± 0.02    0.12 ± 0.04    0.2 ± 0.03

Figure 4.5: Signal estimates generated by each reconstruction method on a particular instance of the uniform spike signal problem. The original signal (not shown) cannot be visually distinguished from the approximation generated by EP in the spike and slab model.