the decoded video signal is controlled mainly through the quantization process. This process has to be adapted to the application needs, defined in terms of a minimum quality or a target bit rate, depending on the network characteristics. Because the current video coding paradigm exploits both the temporal (prediction) and the frequency (DCT) domains, this type of coding architecture is known as hybrid or predictive coding.
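To make the role of quantization concrete, the Python sketch below codes one block the way a hybrid coder does: the prediction residual is transformed with a DCT and then uniformly quantized, so the step size trades the number of nonzero levels to transmit against reconstruction error. It is a simplified illustration under stated assumptions: the block is a hypothetical 1-D list of eight samples (real codecs use 2-D block transforms), the prediction is given rather than estimated, entropy coding is omitted, and the function names are illustrative only.

```python
import math

def dct(x):
    """Orthonormal 1-D DCT-II of a list of samples."""
    n = len(x)
    return [
        (math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n))
        * sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        for k in range(n)
    ]

def idct(c):
    """Inverse of dct() (orthonormal DCT-III)."""
    n = len(c)
    return [
        sum((math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n))
            * c[k] * math.cos(math.pi * (i + 0.5) * k / n) for k in range(n))
        for i in range(n)
    ]

def code_block(current, prediction, step):
    """One hybrid-coding step: prediction residual -> DCT -> uniform quantizer.

    Returns the integer quantization levels (what an entropy coder would
    receive) and the block the decoder would reconstruct from them."""
    residual = [c - p for c, p in zip(current, prediction)]
    levels = [round(coef / step) for coef in dct(residual)]
    decoded_residual = idct([lv * step for lv in levels])
    return levels, [p + r for p, r in zip(prediction, decoded_residual)]

def mse(a, b):
    """Mean squared error between two equal-length sample lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

# Hypothetical 8-sample block and its (e.g., motion-compensated) prediction.
current = [52, 55, 61, 66, 70, 61, 64, 73]
prediction = [50, 54, 60, 67, 68, 60, 66, 70]
for step in (0.5, 2, 8):
    levels, rec = code_block(current, prediction, step)
    nonzero = sum(lv != 0 for lv in levels)
    print(f"step={step}: nonzero levels={nonzero}, MSE={mse(current, rec):.3f}")
```

Larger steps zero out more coefficients (fewer symbols to entropy code), typically at the cost of a larger MSE, which is how an encoder steers toward a target quality or bit rate.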
Since the end of the 1980s, predictive coding schemes have been the solution adopted in most available video coding standards, notably the ITU-T H.26x and ISO/IEC MPEG-x families of standards. Today this coding paradigm is used in hundreds of millions of video encoders and decoders, especially in MPEG-2 Video set-top boxes and DVDs. In this video coding architecture, the correlation between (temporal) and within (spatial) the video frames is exploited at the encoder, leading to rather complex encoders and much simpler decoders. This scenario offers little flexibility in allocating the codec complexity budget: the only option is to make the encoder less complex and thus less efficient, since decoders are normative and essentially much simpler than encoders. This approach fits some application scenarios very well, notably those that have dominated video coding developments in the past, such as broadcasting and video storage. These applications follow the so-called down-link model, in which a few encoders typically provide coded content for millions of decoders; in this case, decoder complexity is the critical issue and thus has to be minimized. Moreover, the temporal prediction loop used to compute the residues to transmit, after the motion-compensated prediction of the current frame, requires the decoder to run the same loop in perfect synchronization with the encoder. Consequently, when channel errors occur, they may propagate in time and strongly degrade the video quality, typically until some (Intra) coding refresh is performed.
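The synchronization requirement and the resulting error propagation can be reproduced with a toy Python sketch: the encoder predicts each Inter frame from its own reconstruction of the previous frame, the decoder repeats the same loop, and a single corrupted quantization level keeps the decoder out of step until the next Intra frame. Assumptions, for illustration only: frames are short 1-D lists, prediction is plain previous-frame prediction without motion compensation, and the frame values, step size, Intra period, and function names are hypothetical.

```python
def encode(frames, step, intra_period):
    """Closed-loop predictive encoder (toy, no motion compensation).

    Intra frames are quantized directly; Inter frames are predicted from the
    encoder's own *reconstruction* of the previous frame, i.e. exactly what
    the decoder will have, so both sides stay synchronized."""
    stream, ref = [], None
    for t, frame in enumerate(frames):
        intra = t % intra_period == 0
        pred = [0] * len(frame) if intra else ref
        levels = [round((x - p) / step) for x, p in zip(frame, pred)]
        stream.append((intra, levels))
        ref = [p + lv * step for p, lv in zip(pred, levels)]  # local decoder
    return stream

def decode(stream, step):
    """Decoder: runs the same prediction loop as the encoder."""
    frames_out, ref = [], None
    for intra, levels in stream:
        pred = [0] * len(levels) if intra else ref
        ref = [p + lv * step for p, lv in zip(pred, levels)]
        frames_out.append(ref)
    return frames_out

# Hypothetical 3-sample "frames"; Intra refresh every 4 frames.
frames = [[10 + t, 20 + 2 * t, 30 - t] for t in range(8)]
stream = encode(frames, step=2, intra_period=4)
clean = decode(stream, step=2)

# Simulate a channel error: one quantization level of frame 1 is corrupted.
corrupted = [(intra, list(levels)) for intra, levels in stream]
corrupted[1][1][0] += 3
damaged = decode(corrupted, step=2)

# Per-frame maximum mismatch between the clean and damaged decodings.
drift = [max(abs(a - b) for a, b in zip(c, d)) for c, d in zip(clean, damaged)]
print(drift)  # [0, 6, 6, 6, 0, 0, 0, 0]
```

The mismatch introduced in frame 1 is carried unchanged through frames 2 and 3 because each prediction uses the damaged reconstruction; the Intra frame at t = 4 does not depend on earlier frames and resynchronizes the decoder.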
The H.264/AVC (Advanced Video Coding) standard is the most efficient video coding standard available [1], typically requiring about 50 percent of the bit rate of previous video coding standards for the same quality. It is being adopted for a wide range of applications, from mobile videotelephony to HDTV and Blu-ray discs. Besides the technical novelties adopted in this standard (e.g., in motion estimation and compensation, the spatial transform, and entropy coding), the compression gains rely heavily on significant additional encoder and decoder complexity. To provide coding solutions adapted to a wide range of applications, the standard defines a set of profiles and levels [1], which constrain the very flexible coding syntax in appropriate ways to allow successful deployment. Meanwhile, some of the most relevant compression gains come from controlling the H.264/AVC codec through adequate encoder tools: as in previous standards, H.264/AVC encoders are not normative, which leaves room for improvements in rate-distortion (RD) performance while still guaranteeing compatibility with much simpler decoders.
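Because encoders are not normative, an encoder is free to decide, block by block, how to code the data. One widely used encoder-side technique (the text does not specify which tools it has in mind) is Lagrangian rate-distortion optimization: among the candidate coding modes, pick the one minimizing J = D + λR. The Python sketch below uses hypothetical mode labels and made-up distortion and rate figures purely to show the mechanism.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    mode: str          # hypothetical mode label, not an H.264/AVC syntax element
    distortion: float  # e.g., sum of squared errors after reconstruction
    rate_bits: float   # bits needed to code the block with this mode

def choose_mode(candidates, lam):
    """Lagrangian RD decision: minimize J = D + lam * R over the candidates."""
    return min(candidates, key=lambda c: c.distortion + lam * c.rate_bits)

# Illustrative (made-up) distortion/rate figures for one block.
candidates = [
    Candidate("intra", distortion=100, rate_bits=120),
    Candidate("inter", distortion=150, rate_bits=40),
    Candidate("skip", distortion=400, rate_bits=1),
]
for lam in (0.1, 1.0, 10.0):
    print(lam, choose_mode(candidates, lam).mode)
```

With a small λ the low-distortion Intra candidate wins; as λ grows, cheaper modes are preferred. In practice λ is commonly derived from the quantization parameter, so the same rule adapts to the target rate.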
With the recent wide deployment of wireless networks, a growing number of applications do not fit the typical down-link model well but rather follow an up-link model, in which many senders deliver data to a central receiver. Examples of these applications are wireless digital video cameras, low-power video sensor networks, and surveillance systems. Typically, these emerging applications require light encoding or a flexible distribution of the codec complexity, robustness to packet losses, high compression efficiency, and often low latency/delay as well. There is also a growing use of multiview video content, in which the data to be delivered consists of many (correlated) views of the same scene. In many cases, to keep the sensing system simple, the associated cameras cannot communicate among themselves, which prevents the use of a predictive approach to exploit the inter-view redundancy.
In terms of video coding, these emerging applications call for a novel video coding paradigm that is better adapted to the specific characteristics of the new scenarios. Ideally, such a paradigm would address all the requirements above with the same coding efficiency as the best predictive coding schemes available, and with an encoder complexity and error robustness similar to those of the current best Intra coding solutions, which are the simplest and most error-robust solutions.
To address these needs, around 2002 some research groups decided to revisit the video coding problem in light of two information theory results from the 1970s: the Slepian–Wolf theorem [2] and the Wyner–Ziv theorem [3]. These efforts gave birth to what is now known as distributed video coding (DVC), with Wyner–Ziv (WZ) video coding as a particular case of distributed video coding. This chapter presents the developments in DVC that followed these early research initiatives.
To achieve this purpose, this chapter is organized as follows: building on previous chapters, Section 8.2 will briefly review the basic concepts and theorems underpinning DVC. Next, Section 8.3 will present the first distributed video codecs developed, while Section 8.4 will review some of the most relevant developments that followed these early DVC codecs. To give a more precise understanding of how a DVC codec works, Section 8.5 will present in detail perhaps the most efficient DVC codec available, the DISCOVER WZ video codec [4, 5]. Section 8.6 will provide a detailed performance evaluation of the DISCOVER WZ codec, which may be used for benchmarking. Finally, Section 8.7 will summarize past developments and project the future of the DVC arena.
8.2 BASICS ON DISTRIBUTED VIDEO CODING
As mentioned in a previous chapter, the Slepian–Wolf theorem addresses the case where two statistically dependent, independently and identically distributed (i.i.d.) discrete random sequences X and Y are encoded independently, and thus not jointly as in the widely deployed predictive coding solutions. The Slepian–Wolf theorem states that the minimum total rate needed to encode the two (correlated) sources is the same as the minimum rate for joint encoding, with an arbitrarily small error probability. This distributed source coding (DSC) paradigm is an important result in the context of the emerging application challenges presented earlier, since it opens the door to new coding solutions where, at least in theory, separate encoding and joint decoding do not incur any compression efficiency loss compared with the joint encoding and decoding used in the traditional predictive coding paradigm. In theory, the rate bounds for a vanishing error probability with two sources are
$$
\begin{aligned}
R_X &\geq H(X \mid Y),\\
R_Y &\geq H(Y \mid X),\\
R_X + R_Y &\geq H(X, Y),
\end{aligned}
\tag{8.1}
$$
which corresponds to the region identified in Figure 8.1. This means that the minimum total coding rate is the same as for joint encoding (i.e., the joint entropy H(X,Y)), provided that the individual rates for both sources are at least the respective conditional entropies.
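The bounds in (8.1) can be evaluated numerically for any joint distribution of X and Y. The Python sketch below computes H(X|Y), H(Y|X), and H(X,Y) with the chain rule and checks whether a rate pair lies inside the Slepian–Wolf region; the binary joint distribution in the usage lines (X uniform, Y equal to X flipped with probability 0.1) is an assumed illustration, not a model of video data, and the function names are hypothetical.

```python
import math

def entropy(probs):
    """Shannon entropy in bits of a list of probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def slepian_wolf_bounds(joint):
    """joint[x][y] = P(X=x, Y=y). Returns (H(X|Y), H(Y|X), H(X,Y)) in bits."""
    p_x = [sum(row) for row in joint]
    p_y = [sum(col) for col in zip(*joint)]
    h_xy = entropy([p for row in joint for p in row])
    # Chain rule: H(X|Y) = H(X,Y) - H(Y) and H(Y|X) = H(X,Y) - H(X).
    return h_xy - entropy(p_y), h_xy - entropy(p_x), h_xy

def in_rate_region(r_x, r_y, joint, tol=1e-12):
    """True if (r_x, r_y) satisfies all three bounds in (8.1)."""
    h_x_given_y, h_y_given_x, h_xy = slepian_wolf_bounds(joint)
    return (r_x >= h_x_given_y - tol
            and r_y >= h_y_given_x - tol
            and r_x + r_y >= h_xy - tol)

# Assumed example: X uniform binary, Y = X flipped with probability 0.1.
joint = [[0.45, 0.05],
         [0.05, 0.45]]
print(slepian_wolf_bounds(joint))        # H(X|Y) = H(Y|X), about 0.469; H(X,Y) about 1.469
print(in_rate_region(0.47, 1.0, joint))  # near the corner (H(X|Y), H(Y)): True
print(in_rate_region(0.5, 0.5, joint))   # total rate below H(X,Y): False
```

In this example X can be sent at about 0.47 bit/sample instead of 1 bit/sample, provided Y (coded at H(Y) = 1 bit/sample) is available at the joint decoder, even though the encoder of X never sees Y.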
Slepian–Wolf coding is the term generally used to characterize lossless coding architectures that follow this independent encoding approach. Slepian–Wolf coding is also referred to in the literature as lossless distributed source coding since it considers that the two statistically dependent sequences are perfectly reconstructed at a joint decoder (neglecting the arbitrarily small probability of decoding error), thus approaching the lossless case.
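How can an encoder that never sees Y still spend fewer bits than H(X)? A classic textbook toy example, which is not one of the DVC codecs discussed later in this chapter, uses syndromes of a channel code. The Python sketch below assumes that X and Y are 7-bit words differing in at most one position: the encoder transmits only the 3-bit Hamming(7,4) syndrome of X, and the joint decoder recovers X exactly from that syndrome and Y. The function names are illustrative.

```python
import itertools

def syndrome(word):
    """3-bit syndrome of a 7-bit word for the Hamming(7,4) parity-check matrix
    whose j-th column is the binary representation of j (j = 1..7); it equals
    the XOR of the (1-based) positions of the word's 1 bits."""
    s = 0
    for j, bit in enumerate(word, start=1):
        if bit:
            s ^= j
    return s

def sw_encode(x):
    """Encoder sees only X and sends its 3-bit syndrome instead of 7 bits."""
    return syndrome(x)

def sw_decode(s_x, y):
    """Joint decoder: combines the received syndrome with side information Y."""
    diff_pos = s_x ^ syndrome(y)  # syndrome of X xor Y = position of the differing bit
    x_hat = list(y)
    if diff_pos:                  # 0 means X == Y
        x_hat[diff_pos - 1] ^= 1
    return x_hat

# Exhaustive check under the assumed correlation: Y differs from X in <= 1 bit.
for x in itertools.product((0, 1), repeat=7):
    x = list(x)
    for flip in range(-1, 7):     # -1: no difference
        y = list(x)
        if flip >= 0:
            y[flip] ^= 1
        assert sw_decode(sw_encode(x), y) == x
print("all 1024 cases decoded without error")
```

If X is uniform and the eight possible difference patterns (none, or one of seven flipped bits) are equally likely, then H(X|Y) = log2 8 = 3 bits, so the 3 transmitted bits meet the Slepian–Wolf bound with zero error instead of the 7 bits that coding X on its own would require.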
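[Figure 8.1: Slepian–Wolf rate region for two sources X and Y. Region labels: "Independent encoding and decoding: no errors"; "Distributed encoding and joint decoding: vanishing error probability". Boundary line: R_X + R_Y = H(X,Y).]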