Video data contain far more statistical redundancy than still images. In addition to spatial redundancy within each frame, video has temporal redundancy between consecutive frames, and compression algorithms aim to discard these redundancies, thereby reducing the size of the video. Because it deals with an enormous amount of data, video compression is usually lossy.
The block diagram of a typical video encoder is shown in Figure 1.13.
Figure 1.13: Block diagram of a typical video encoder. [Input frame sequence → ME → DCT → Q → EC → compressed video stream, with a Q−1 → IDCT feedback path.]
In a video compression algorithm, each input video frame is processed in turn. First, it is mapped from the RGB color space to the YCbCr color space. As in the baseline lossy JPEG compression algorithm, the luminance component, Y, of the frame is usually sampled at the original picture resolution, while the chrominance components, Cb and Cr, are downsampled by two in both the horizontal and vertical directions to yield a 4:2:0 subsampled picture format. Input frames are further subdivided into macroblocks (MBs), typically of size 16 × 16 pixels. Hence, each YCbCr macroblock consists of four 8 × 8 luminance blocks, followed by one 8 × 8 Cb block and one 8 × 8 Cr block. As shown in Figure 1.13, an input video frame first undergoes motion estimation (ME) with respect to one or more reference frames.
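The color conversion, 4:2:0 subsampling, and macroblock partitioning described above can be sketched in a few lines of Python. This is a minimal illustration rather than part of any standard: it assumes the full-range JFIF (BT.601) RGB-to-YCbCr coefficients and a simple 2 × 2 averaging filter for chroma downsampling (real encoders may use other coefficients and filters), and all function names are hypothetical.

```python
# Hypothetical sketch: RGB -> YCbCr (assumed JFIF/BT.601 full-range coefficients),
# 4:2:0 chroma subsampling by 2x2 averaging (one of several possible filters),
# and splitting a 16x16 macroblock into four 8x8 Y blocks plus one Cb and one Cr block.


def rgb_to_ycbcr(r, g, b):
    """Convert one RGB pixel (0..255) to full-range YCbCr."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr


def subsample_420(plane):
    """Halve a chroma plane horizontally and vertically by averaging 2x2 neighborhoods."""
    h, w = len(plane), len(plane[0])
    return [
        [
            (plane[i][j] + plane[i][j + 1] + plane[i + 1][j] + plane[i + 1][j + 1]) / 4
            for j in range(0, w, 2)
        ]
        for i in range(0, h, 2)
    ]


def split_macroblock(y16, cb8, cr8):
    """Return the six 8x8 blocks of a 4:2:0 macroblock in the order Y1, Y2, Y3, Y4, Cb, Cr."""
    def sub(plane, r0, c0):
        return [row[c0:c0 + 8] for row in plane[r0:r0 + 8]]

    return [sub(y16, 0, 0), sub(y16, 0, 8), sub(y16, 8, 0), sub(y16, 8, 8), cb8, cr8]


def encode_macroblock_rgb(rgb16):
    """rgb16: 16x16 list of (r, g, b) tuples -> list of six 8x8 blocks."""
    ycc = [[rgb_to_ycbcr(*px) for px in row] for row in rgb16]
    y = [[p[0] for p in row] for row in ycc]
    cb = subsample_420([[p[1] for p in row] for row in ycc])
    cr = subsample_420([[p[2] for p in row] for row in ycc])
    return split_macroblock(y, cb, cr)


if __name__ == "__main__":
    # A uniform mid-gray macroblock maps to Y = Cb = Cr = 128 (up to rounding).
    gray = [[(128, 128, 128)] * 16 for _ in range(16)]
    blocks = encode_macroblock_rgb(gray)
    print(len(blocks), [(len(b), len(b[0])) for b in blocks])
```

Note how the 256 luminance samples and 2 × 64 chrominance samples of a 4:2:0 macroblock end up as exactly six 8 × 8 blocks, which is the block count used throughout the rest of this section.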
Some frames (or some macroblocks of a frame) are encoded independently, and the ME block is skipped for them. The motion-compensated prediction error (or, for independently coded data, the pixel values themselves) is then transformed from the spatial domain to the frequency domain by applying the DCT to 8 × 8 blocks. The transformed coefficients are then quantized (Q) and entropy coded (EC) using variable-length codes (VLCs) to generate the output bitstream.
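The transform, quantization, and first stage of entropy coding can be illustrated on a single 8 × 8 block. The Python sketch below is a simplified illustration under stated assumptions: it uses the orthonormal 2-D DCT-II written out directly (slow but easy to verify), a single uniform quantizer step in place of MPEG-2's quantization matrices and quantizer scale, and the usual 8 × 8 zigzag scan to turn the AC levels into (run, level) pairs terminated by an end-of-block marker. The final VLC table lookup and the differential coding of DC values are omitted, and all function names are hypothetical.

```python
import math

N = 8  # block size used for the DCT


def dct_2d(block):
    """Orthonormal 2-D DCT-II of an N x N block, written out directly for clarity."""
    def c(k):
        return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)

    return [
        [
            c(u) * c(v) * sum(
                block[x][y]
                * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                * math.cos((2 * y + 1) * v * math.pi / (2 * N))
                for x in range(N)
                for y in range(N)
            )
            for v in range(N)
        ]
        for u in range(N)
    ]


def quantize(coeffs, step):
    """Uniform scalar quantization with one step size (a simplification, see above)."""
    return [[int(round(value / step)) for value in row] for row in coeffs]


def zigzag_indices(n=N):
    """(row, col) positions in zigzag order: by anti-diagonal, alternating direction."""
    return sorted(
        ((i, j) for i in range(n) for j in range(n)),
        key=lambda p: (p[0] + p[1], p[0] if (p[0] + p[1]) % 2 else -p[0]),
    )


def run_length_ac(levels):
    """Zigzag-scan quantized levels; return the DC level and (run, level) pairs for the AC part."""
    scan = [levels[i][j] for i, j in zigzag_indices(len(levels))]
    pairs, run = [], 0
    for level in scan[1:]:  # scan[0] is the DC coefficient
        if level == 0:
            run += 1
        else:
            pairs.append((run, level))
            run = 0
    pairs.append("EOB")  # trailing zeros are implied by an end-of-block marker
    return scan[0], pairs
```

For a flat 8 × 8 block of value 10, all the energy falls into the DC coefficient (80 with this normalization), which quantizes to 5 with a step of 16; every AC level is zero, so `run_length_ac` returns `(5, ["EOB"])`. Blocks with little detail thus collapse to a handful of symbols, which is where most of the compression comes from.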
In addition, within the encoder itself, the quantized coefficients are dequantized (Q−1) and inverse transformed (IDCT), and the result is added to the motion-compensated prediction to obtain the reconstructed frame. This reconstructed frame is then used as the reference for estimating motion in subsequent input frames.
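Why the encoder carries its own Q−1/IDCT path can be shown with a deliberately reduced model. As simplifying assumptions for illustration only, the transform is omitted (treated as the identity), motion is taken to be zero, and each frame is a short list of pixel values; the function names are hypothetical. The encoder predicts each frame from the previous reconstructed frame, exactly as the decoder must.

```python
# Hypothetical closed-loop sketch: identity transform, zero motion, 1-D "frames".


def quantize(residual, step):
    """Q: uniform scalar quantization of the prediction error."""
    return [round(r / step) for r in residual]


def dequantize(levels, step):
    """Q^-1: map quantization levels back to residual values."""
    return [q * step for q in levels]


def encode_sequence(frames, step):
    """Return the quantized residual levels and the encoder-side reconstructions."""
    reference = [0] * len(frames[0])
    stream, recon_frames = [], []
    for frame in frames:
        residual = [p - r for p, r in zip(frame, reference)]
        levels = quantize(residual, step)
        stream.append(levels)
        # Reconstruct exactly as the decoder will: prediction + dequantized residual.
        recon = [r + d for r, d in zip(reference, dequantize(levels, step))]
        recon_frames.append(recon)
        reference = recon  # the next frame is predicted from the reconstruction
    return stream, recon_frames


def decode_sequence(stream, step, width):
    """Decoder: rebuild frames from the levels alone."""
    reference = [0] * width
    out = []
    for levels in stream:
        recon = [r + d for r, d in zip(reference, dequantize(levels, step))]
        out.append(recon)
        reference = recon
    return out
```

Because both sides predict from the same reconstructed data, the decoder output matches the encoder's reconstruction exactly, and the per-pixel error stays within half a quantizer step. If the encoder predicted from the original previous frame instead, the decoder, which never sees the originals, would accumulate a mismatch from frame to frame.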
Different video compression algorithms have been developed for different types of applications, each with its own design targets. For example, the H.26x standards are targeted at video teleconferencing, while Motion JPEG (MJPEG) is convenient for video editing because it gives easy access to individual frames; however, MJPEG achieves only a low compression ratio. The Moving Picture Experts Group (MPEG) has proposed a series of standards, starting with MPEG-1, designed for low-bitrate (e.g., 1.5 Mbps) audio and video playback applications, followed by MPEG-2, designed for higher bitrates in high-quality playback applications with full-sized images (e.g., CCIR 601 studio quality at 4–10 Mbps). Later, the MPEG-4 video standard was developed for both low and high data rate applications. In the context of compressed-domain processing, we review three video compression standards, namely MPEG-2, MPEG-4, and H.264, because of their extensive use in different applications.
Figure 1.14: Video stream data hierarchy. [Stream: Seq. … Seq. Sequence: SC, video params, bitstream params, QTs, misc, GOP … GOP. GOP: SC, time code, GOP params, picture … picture. Picture: SC, type, buffer params, encode params, slice … slice. Slice: SC, vert. pos., Q-scale, MB … MB. Macroblock: addr. incr., type, MV, Q-scale, CBP, BL-1 … BL-6.]
1.5.1 MPEG-2
MPEG-2 [135] is one of the most popular video compression standards.
The structure of the encoded video stream and its encoding methodology are briefly described below.
1.5.1.1 Encoding Structure
Treating a video as a sequence of frames (or pictures), an MPEG-2 video stream organizes the video data hierarchically, as shown in Figure 1.14. Below the sequence level, the stream consists of five layers: GOP, picture, slice, macroblock, and block, as discussed in the following text (see Figure 1.15).
Figure 1.15: Video sequence in an MPEG stream. [Nested levels: video sequence, GOP, picture, slice, macroblock, block.]
1. Video Sequence: It begins with a sequence header, includes one or more groups of pictures, and ends with an end-of-sequence code.
2. Group of Pictures (GOP): This consists of a header and a series of one or more pictures intended to allow random access into the sequence.
3. Picture: A picture is an individual image or frame, the primary coding unit of a video sequence. A frame of a color video has three components: a luminance component (Y) and two chrominance components (Cb and Cr). In the 4:2:0 format, the Cb and Cr components are half the size of Y in both the horizontal and vertical directions.
4. Slice: One or more contiguous macroblocks define a slice. Within a slice, macroblocks are ordered from left to right and top to bottom. Slices are used for error handling: if an error is detected in a slice, the decoder may skip the erroneous part and resume decoding at the start of the next slice.
5. Macroblock: As in still-image coding, the basic coding unit in the MPEG algorithm is a macroblock, a 16 × 16 pixel segment of a frame. If each chrominance component has one-half the vertical and horizontal resolution of the luminance component, a macroblock consists of four Y blocks, one Cb block, and one Cr block (as shown in Figure 1.16).
6. Block: A block is the smallest coding unit in the MPEG-2 algorithm. It consists of 8 × 8 pixels and can be one of three types: luminance (Y), red-difference chrominance (Cr), or blue-difference chrominance (Cb).
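The layered organization above maps naturally onto nested container types. The following Python sketch is hypothetical: the class, field, and helper names are ours and do not correspond to the syntax elements of the MPEG-2 bitstream, and headers, start codes, and parameters are omitted.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Block:
    kind: str                # "Y", "Cb" or "Cr"
    coeffs: List[List[int]]  # 8 x 8 quantized DCT levels


@dataclass
class Macroblock:
    blocks: List[Block]      # in 4:2:0: four Y blocks, then one Cb and one Cr block


@dataclass
class Slice:
    macroblocks: List[Macroblock]  # contiguous, left to right, top to bottom


@dataclass
class Picture:
    picture_type: str        # "I", "P" or "B"
    slices: List[Slice]


@dataclass
class GroupOfPictures:
    pictures: List[Picture]  # by convention starts with an I-picture (not enforced here)


@dataclass
class VideoSequence:
    gops: List[GroupOfPictures] = field(default_factory=list)


def empty_macroblock() -> Macroblock:
    """Hypothetical helper: a 4:2:0 macroblock whose six blocks are all zero."""
    kinds = ["Y", "Y", "Y", "Y", "Cb", "Cr"]
    return Macroblock([Block(k, [[0] * 8 for _ in range(8)]) for k in kinds])


def count_blocks(seq: VideoSequence) -> int:
    """Walk the hierarchy top-down and count the 8 x 8 blocks it contains."""
    return sum(
        len(mb.blocks)
        for gop in seq.gops
        for pic in gop.pictures
        for sl in pic.slices
        for mb in sl.macroblocks
    )
```

For example, a sequence holding one GOP with a single I-picture made of one slice of two macroblocks contains 2 × 6 = 12 blocks.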
1.5.1.2 Frame Types
There are three types of pictures or frames in the MPEG standard, depending upon their role in the encoding process.
Figure 1.16: Structure of a macroblock. [Four 8 × 8 Y blocks numbered 1–4 (two rows of two), followed by block 5 (Cb) and block 6 (Cr).]
Figure 1.17: Prediction of pictures in a GOP sequence. [Frame pattern: I B B P B B P.]
1. Intra-frames (I-frames)
2. Predicted frames (P-frames)
3. Bidirectional frames (B-frames)
A sequence of these three types of pictures forms a group of pictures, whose first frame must be an I-frame. A typical GOP structure is shown in Figure 1.17.
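The prediction relationships in a GOP such as the one in Figure 1.17 can be derived mechanically from its frame-type pattern, and they also dictate the order in which frames are transmitted: a B-frame cannot be decoded until its future reference has arrived, so frames are reordered from display order into coding order. The sketch below assumes, for illustration, that a P-frame references the nearest preceding I- or P-frame and a B-frame references the nearest preceding and nearest following I- or P-frame within the same GOP. Both function names are hypothetical, and trailing B-frames (which in practice depend on the next GOP's I-frame) are handled only in a simplified way.

```python
def prediction_references(gop):
    """Map each frame index of a display-order GOP pattern (e.g. "IBBPBBP")
    to the indices of the frames it is predicted from."""
    if not gop or gop[0] != "I":
        raise ValueError("a GOP must start with an I-frame")
    anchors = [i for i, t in enumerate(gop) if t in ("I", "P")]
    refs = {}
    for i, t in enumerate(gop):
        past = [a for a in anchors if a < i]
        future = [a for a in anchors if a > i]
        if t == "I":
            refs[i] = ()                     # coded independently
        elif t == "P":
            refs[i] = (past[-1],)            # nearest previous I- or P-frame
        elif t == "B":
            # nearest past and future anchors; a trailing B-frame has no
            # future anchor inside this GOP
            refs[i] = (past[-1],) + ((future[0],) if future else ())
        else:
            raise ValueError(f"unknown frame type {t!r}")
    return refs


def coding_order(gop):
    """Reorder display-order indices so each I/P anchor is sent before the
    B-frames that precede it in display order (and depend on it)."""
    order, pending_b = [], []
    for i, t in enumerate(gop):
        if t == "B":
            pending_b.append(i)      # wait until the next anchor has been sent
        else:
            order.append(i)
            order.extend(pending_b)
            pending_b = []
    return order + pending_b         # simplification for trailing B-frames
```

For the pattern of Figure 1.17, `coding_order("IBBPBBP")` returns `[0, 3, 1, 2, 6, 4, 5]`, i.e., the frames are sent as I P B B P B B.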
1. Intra-frames: Intra-frames, or I-frames, are coded independently, using only information present in the picture itself. They serve as potential random access points within the compressed stream, allowing a decoder to synchronize with the video. Encoding of I-frames is the same as still-image encoding: MPEG-2 codes these frames with essentially the same baseline lossy JPEG compression algorithm.
2. Predicted frames: A predicted frame, or P-frame, is predicted from its nearest previous I- or P-frame. The prediction in this case is guided by motion compensation, leading to higher compression of these frames.
3. Bidirectional frames: Bidirectional frames, or B-frames, use both a past and a future frame as references; that is, they are encoded with bidirectional prediction. As expected, B-frames achieve higher compression than P-frames, but they require more computation during prediction.
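The motion-compensated prediction used for P-frames (and, in each direction, for B-frames) rests on block matching: for each macroblock, the encoder searches a neighborhood of the same position in the reference frame for the best-matching block, commonly scoring candidates by the sum of absolute differences (SAD). The sketch below is an exhaustive (full) search under assumptions chosen for illustration: a ±7-pixel window, integer-pixel precision, and skipping candidates that fall outside the reference frame. Practical encoders often use faster search patterns and half-pixel refinement. Frames are plain lists of rows, and the function names are hypothetical.

```python
def sad(cur, ref, bx, by, dx, dy, size):
    """SAD between the current block at (bx, by) and the reference block displaced by (dx, dy)."""
    return sum(
        abs(cur[by + r][bx + c] - ref[by + dy + r][bx + dx + c])
        for r in range(size)
        for c in range(size)
    )


def motion_search(cur, ref, bx, by, size=16, search_range=7):
    """Full search in a +/- search_range window; returns (motion vector (dx, dy), SAD)."""
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            # Skip candidate blocks that would extend outside the reference frame.
            if not (0 <= by + dy and by + dy + size <= h and 0 <= bx + dx and bx + dx + size <= w):
                continue
            cost = sad(cur, ref, bx, by, dx, dy, size)
            if best is None or cost < best[1]:
                best = ((dx, dy), cost)
    return best


def prediction_error(cur, ref, bx, by, mv, size=16):
    """Residual between the current macroblock and its motion-compensated reference."""
    dx, dy = mv
    return [
        [cur[by + r][bx + c] - ref[by + dy + r][bx + dx + c] for c in range(size)]
        for r in range(size)
    ]
```

The returned offset is what the stream stores as the motion vector; the residual from `prediction_error` is what then goes through the DCT, quantization, and entropy coding. When the content has simply shifted between frames, the best match has zero SAD and the residual is all zeros, which costs almost nothing to encode.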
1.5.1.3 Method of Encoding Pictures
1. Intra-frame: As mentioned earlier, I-frames are encoded with essentially the same baseline lossy JPEG compression algorithm.
Each color component of an I-frame is partitioned into a set of nonoverlapping 8 × 8 blocks, and each block undergoes level shifting, forward DCT, quantization, and entropy encoding of its DC and AC coefficients. The AC coefficients are arranged in zigzag order to produce long runs of zeros, which are subsequently encoded with a variable-length Huffman code (see Figure 1.8).
2. P-frame: A P-frame is coded with reference to a previous image (the reference image), which is either an I- or a P-frame, as shown in Figure 1.18.
In this case, the frame is partitioned into a set of 16 × 16 blocks (called macroblocks in this context), and each of them is predicted from a macroblock of the same size in the reference frame. The corresponding reference macroblock is found by searching the reference image in the neighborhood of the macroblock's own position. The offset of the reference macroblock that yields the minimum error (usually measured as the sum of absolute differences, SAD) is stored in the compressed stream and is known as the motion vector of the corresponding macroblock of the P-frame. Finally, the difference between the encoded macroblock and the reference macroblock (i.e., the prediction error) is computed and encoded by applying, in turn, forward DCT, quantization, run-length encoding of the zigzag sequence of AC coefficients, and Huffman encoding of the respective runs and DC offsets.
3. B-frame: A B-frame is bidirectionally predicted from a past and a future reference frame. In this case, motion vectors can refer to the previous reference frame, to the next reference frame, or to both. This encoding process is shown in Figure 1.19. Consider a B-frame B predicted from two reference frames R_1 and R_2, where R_1 is the past I- or P-frame and R_2 is the future I- or P-frame. For each macroblock m_b of B, the closest matches m_1 in R_1 and m_2 in R_2 are computed. The prediction of m_b, denoted m̂_b, is then obtained as follows.
[Figure panels: Current Frame, Past Frame.]