
The “baseline” curve in Figure 5.5 shows the performance obtained by choosing judiciously, depending on the rate, between direct coding of $x$ and adaptively coding the sparsity pattern and nonzero values. Specifically, it is the convexification of (5.4) with the point $(R = 0, D = 1)$. The value of $c$ used in (5.4) corresponds to scalar quantization of the nonzero entries of $x$. Scalar quantization of random measurements is not competitive with this baseline, but it provides improvement over simply coding each element of $x$ independently (see (5.5)).

Let us now more closely compare the baseline method against compressive sampling. From (5.4), the distortion with adaptive quantization decreases exponentially with the rate $R$ through the multiplicative factor $2^{-2R}$. This appears in Figure 5.5 as a decrease in distortion of approximately 6 dB per bit. The best one could hope for with compressive sampling is that at a very high rate, $M = K + 1 = 5$ becomes the optimal choice, and the distortion decays as $\sim 2^{-2(K/M)R} = 2^{-2(4/5)R}$, or 4.8 dB per bit. This is a multiplicative rate penalty that is large by source coding standards, and it applies only at very high rates; the gap is larger at lower rates.
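The two slopes quoted above follow directly from converting the exponential decay into decibels. The short sketch below is only an arithmetic check; the function name `db_per_bit` is an illustrative choice, not something defined in the text.

```python
import math

def db_per_bit(k: int, m: int) -> float:
    """Distortion reduction in dB per added bit when D decays as 2^{-2 (k/m) R}."""
    return 10.0 * math.log10(2.0 ** (2.0 * k / m))

# Adaptive quantization: exponent -2R, i.e. k/m = 1.
adaptive = db_per_bit(1, 1)
# Compressive sampling at best: exponent -2(K/M)R with K = 4, M = K + 1 = 5.
compressive = db_per_bit(4, 5)

print(f"{adaptive:.2f} dB/bit, {compressive:.2f} dB/bit")
# prints "6.02 dB/bit, 4.82 dB/bit"
```

The ratio of the two slopes is $K/M = 4/5$, which is why the penalty is multiplicative in the rate rather than additive.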

We see that, in the very high-rate regime (greater than about 15 bits per dimension), compressive sampling with near-optimal decoding achieves an MSE reduction of approximately 4 dB per bit. While better than simple direct quantization, this performance is significantly worse than the 6 dB per bit achieved with adaptive quantization. Moreover, this simulation in some sense represents the “best case” for compressive sampling since we are using an exhaustive-search decoder. Any practical decoder such as orthogonal matching pursuit or lasso will do much worse. Also, based on the analysis above, which suggests that compressive sampling incurs a $\log N$ penalty, the gap between adaptive quantization and compressive sampling will grow as the dimensions increase.

5.3 INFORMATION THEORY TO THE RESCUE?

We have thus far used information theory to provide context and analysis tools. It has shown us that compressing sparse signals by distributed quantization of random measurements incurs a significant penalty. Can information theory also suggest alternatives to compressive sampling? In fact, it does provide techniques that would give much better performance for source coding, but the complexity of decoding algorithms becomes even higher.

Let us return to Figure 5.3 and interpret it as a communication problem where $x$ is to be reproduced approximately and the number of bits that can be used is limited. We would like to extract source coding with side information and distributed source coding problems from this setup. This will lead to results much more positive than those developed above.

In developing the baseline quantization method, we discussed how an encoder that knows $V$ can recover $J$ and $u_K$ from $x$ and thus send $J$ exactly and $u_K$ approximately. Compressive sampling is meant to apply when the encoder does not know (or want to use) the sparsifying basis $V$. In this case, an information theorist would say that we have a problem of lossy source coding of $x$ with side information $V$ available at the decoder, an instance of the Wyner–Ziv problem [26]. In contrast to the analogous lossless coding problem [27], the unavailability of the side information at the encoder does in general hurt the best possible performance. Specifically, let $L(D)$ denote the rate loss (increased rate because $V$ is unavailable at the encoder) to achieve distortion $D$. Then there are upper bounds to $L(D)$ that depend only on the source alphabet, the way distortion is measured, and the value of the distortion, not on the distribution of the source or side information [28]. For the scenario of interest to us (continuous-valued source and MSE distortion), $L(D) \le 0.5$ bits for all $D$. The techniques to achieve this are complicated, but note that the constant additive rate penalty is in dramatic contrast to Figure 5.5.

Compressive sampling not only allows side information $V$ to be available only at the decoder, but it also allows the components of the measurement vector $y$ to be encoded separately. The way to interpret this information theoretically is to consider $y_1, y_2, \ldots, y_M$ as distributed sources whose joint distribution depends on side information $(V, \Phi)$ available at the decoder. Imposing a constraint of distributed encoding of $y$ (while allowing joint decoding) generally creates a degradation of the best possible performance. Let us sketch a particular strategy that is not necessarily optimal but exhibits only a small additive rate penalty. This is inspired by [28, 29].

Suppose that each of $M$ distributed encoders performs scalar quantization of its own $y_i$ to yield $q(y_i)$. Earlier, this seemed to immediately get us in trouble (recall our interpretation of Theorem 5.1), but now we will do further encoding. The quantized values give us a lossless distributed compression problem with side information $(V, \Phi)$ available at the decoder. Using Slepian–Wolf coding, we can achieve a total rate arbitrarily close to $H(q(y))$. The remaining question is how the rate and distortion relate.

For the sake of analysis, let us assume that the encoder and decoder share some randomness $Z$ so that the scalar quantization above can be subtractively dithered (see, e.g., [30]). Then following the analysis in [29, 31], encoding the quantized samples $q(y)$ at rate $H(q(y) \mid V, Z)$ is within 0.755 bit of the conditional rate-distortion bound for source $x$ given $V$. Thus the combination of universal dithered quantization with Slepian–Wolf coding gives a method of distributed coding with only a constant additive rate penalty. These methods inspired by information theory depend on coding across independent signal acquisition instances, and they generally incur large decoding complexity.
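To make the subtractive dithering step concrete, here is a minimal sketch of a uniform scalar quantizer with shared dither. It is only an illustration under stated assumptions: the function names, the step size, and the Gaussian test input are all hypothetical choices, and the Slepian–Wolf stage is omitted entirely. The point it shows is the standard property that, with dither uniform over one quantization cell, the reconstruction error is bounded by half a step and has mean squared value close to $\Delta^2/12$ regardless of the input.

```python
import math
import random

def dithered_quantize(y: float, z: float, step: float) -> int:
    """Encoder: add the shared dither z, then round to the nearest cell index."""
    return math.floor((y + z) / step + 0.5)

def dithered_reconstruct(index: int, z: float, step: float) -> float:
    """Decoder: map the index back to its level and subtract the same dither."""
    return index * step - z

rng = random.Random(0)   # stands in for the shared randomness Z
step = 0.25              # quantizer step size (assumed value)
errors = []
for _ in range(20000):
    y = rng.gauss(0.0, 1.0)                # hypothetical measurement value
    z = rng.uniform(-step / 2, step / 2)   # dither, uniform over one cell
    y_hat = dithered_reconstruct(dithered_quantize(y, z, step), z, step)
    errors.append(y_hat - y)

mse = sum(e * e for e in errors) / len(errors)
max_abs_error = max(abs(e) for e in errors)
print(mse, step ** 2 / 12, max_abs_error)
```

In a full scheme the integer indices would be the input to the lossless distributed code; here they are reconstructed directly so that only the quantization error is visible.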

Let us finally interpret the “quantization plus Slepian–Wolf” approach described above when limited to a single instance. Suppose the $y_i$s are separately quantized as described above. The main negative result of this article indicates that ideal separate entropy coding of each $q(y_i)$ is not nearly enough to get to good performance. The rate must be reduced by replacing an ordinary entropy code with one that collapses some distinct quantized values to the same index. The hope has to be that in the joint decoding of $q(y)$, the dependence between components will save the day. This is equivalent to saying that the quantizers in use are not regular [30], much like multiple description quantizers [32]. This approach is developed and simulated in [25].
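One simple way to picture a code that collapses distinct quantized values to the same index is modulo binning of a uniform quantizer. The sketch below is a generic illustration and is not claimed to be the construction of [25]; the function names are hypothetical, and the `prediction` argument is a stand-in for whatever estimate of $y_i$ the joint decoder can form from the other components and the side information.

```python
def encode_binned(y: float, step: float, num_bins: int) -> int:
    """Quantize uniformly, then keep only the cell index modulo num_bins.

    Cells that are num_bins apart share an index, so the quantizer is not regular.
    """
    cell = round(y / step)
    return cell % num_bins

def decode_binned(bin_index: int, prediction: float, step: float, num_bins: int) -> float:
    """Return the level in the signaled bin that lies closest to the prediction."""
    base = round(prediction / step)
    offset = (bin_index - base) % num_bins
    if offset > num_bins // 2:
        offset -= num_bins   # choose the nearer of the two wrapped candidates
    return (base + offset) * step

step, num_bins = 0.1, 8
y = 3.37
bin_index = encode_binned(y, step, num_bins)   # 3 bits instead of a full index

good = decode_binned(bin_index, prediction=3.2, step=step, num_bins=num_bins)
bad = decode_binned(bin_index, prediction=2.5, step=step, num_bins=num_bins)
print(bin_index, good, bad)
```

With a prediction close to the true value the decoder lands on the correct cell (about 3.4), while a poor prediction resolves the ambiguity to the wrong cell (about 2.6). This is the sense in which the dependence between components has to "save the day."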
