Processing input data in blocks is common in many applications of image and signal processing, in particular in the compression of audio, images, and videos. The advantages of this representation lie in keeping the buffer requirement low, introducing inherent parallelism, and opening up opportunities for designing pipelined architectures for the computation. Hence, in this section, we review the approaches for filtering sequences partitioned into blocks of a fixed size (say, M).
Let us consider filtering of an input sequence x(n) with a finite impulse response h(n). Let the input sequence x(n), 0 ≤ n ≤ N − 1, be partitioned into N/M subsequences of length M (assume N is a multiple of M). Let the ith partition be denoted as $x_i(n)$, 0 ≤ i ≤ N/M − 1, iM ≤ n ≤ (i + 1)M − 1. In block filtering, the objective is to obtain the same result as filtering the whole sequence x(n) while processing only one block, or a set of neighboring blocks, at a time. There are two basic approaches for block filtering, namely, the overlapping and add method and the overlapping and save method. In the overlapping and add method, each block (say, $x_i$) is independently filtered (say, $y_i(n) = x_i(n) \star h(n)$), assuming that its boundaries in both directions are padded with zeroes. Finally, the sum of all such responses provides the same result as obtained by filtering the whole sequence at once, i.e.,
\[
y(n) = x(n) \star h(n) = \sum_{i=0}^{N/M-1} y_i(n).
\]
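Filtering the ith block with an h(n) of length L generates a sequence $y_i(n)$ of length M + L − 1. For a causal h(n), it occupies the interval iM ≤ n < (i + 1)M + L − 1, so the responses of adjacent blocks overlap in L − 1 samples, where they are added.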
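The overlapping and add method can be illustrated with a minimal NumPy sketch. It assumes a causal FIR; the function name `overlap_add` and the test signal are illustrative choices, not taken from the text.

```python
import numpy as np

def overlap_add(x, h, M):
    """Filter x with a causal FIR h by convolving each length-M block on its own."""
    N, L = len(x), len(h)
    y = np.zeros(N + L - 1)
    for start in range(0, N, M):
        yi = np.convolve(x[start:start + M], h)  # block response, M + L - 1 samples
        y[start:start + len(yi)] += yi           # tails of adjacent responses overlap and add
    return y

rng = np.random.default_rng(0)
x = rng.standard_normal(32)             # N = 32 samples
h = np.array([0.25, 0.5, 0.25, -0.1])   # L = 4 taps
y_blocks = overlap_add(x, h, M=8)
y_direct = np.convolve(x, h)            # filtering the whole sequence at once
print(np.allclose(y_blocks, y_direct))
```

Each block is processed without reference to its neighbors; only the accumulation into `y` couples them.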
In the overlapping and save method, for each block (say, the ith block), its adjacent neighboring blocks in both directions are concatenated into a larger block, which acts as the input to the convolution operation. If the adjacent data are long enough to cover every input sample that affects the response at each sample point of the central block (that is, the ith block), it is sufficient to save only the response associated with the central block in the resulting output stream. The number of adjacent blocks required for concatenation depends on the length of the impulse response. For a causal system, it is sufficient to concatenate only the blocks to its left. For a response of length L, we need to concatenate at least ⌈L/M⌉ blocks from its left. However, for a noncausal response, we need to concatenate from both sides. Suppose h(n) is defined in the interval between −(L − 1) and (L − 1). In that case, the number of blocks to be concatenated at each end is also ⌈L/M⌉.
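A minimal sketch of the overlapping and save method for a causal FIR follows. It assumes that the sequence length is a multiple of M and that blocks before the start of the sequence are zero; the function name `overlap_save` is an illustrative choice.

```python
import numpy as np

def overlap_save(x, h, M):
    """Causal FIR filtering that keeps only the central-block part of each response."""
    N, L = len(x), len(h)
    assert N % M == 0, "sketch assumes N is a multiple of M"
    n_left = -(-L // M)                        # ceil(L / M) blocks taken from the left
    pad = n_left * M
    xp = np.concatenate([np.zeros(pad), x])    # zeros stand in for blocks before the start
    y = np.empty(N)
    for start in range(0, N, M):
        seg = xp[start:start + pad + M]        # left neighbors followed by the current block
        full = np.convolve(seg, h)
        y[start:start + M] = full[pad:pad + M] # save the response of the central block only
    return y

rng = np.random.default_rng(1)
x = rng.standard_normal(32)
h = np.array([0.25, 0.5, 0.25, -0.1, 0.05])
y_save = overlap_save(x, h, M=4)               # here ceil(5 / 4) = 2 left blocks are needed
y_ref = np.convolve(x, h)[:len(x)]
print(np.allclose(y_save, y_ref))
```

Unlike the previous method, nothing is accumulated across blocks; the redundancy lies in re-reading the neighboring input samples.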
In designing efficient algorithms, we may restrict the computations in both approaches to those pertaining to the central block only, which makes the two computations equivalent. We discuss next how these approaches are adapted in the transform domain, namely, in the block DCT domain.
3.4.1 Overlapping and Save Methods in the Transform Domain
A useful technique under this category was reported by Kresch and Merhav [79]. They have performed filtering of blocks using convolution–multiplication properties for both DCTs and DSTs, and their method is applicable to any arbitrary type of FIR, be it a symmetric noncausal or an asymmetric causal sequence.
Consider the extended neighborhood of a block $x_i$ with the (i − 1)th and (i + 1)th blocks such that the ith output block $y_i$ is obtained from the convolution with an FIR h(n), −N ≤ n ≤ N. By matrix notation, we can write it in the [...] are defined from the nonpositive half of the impulse response (that is, $h_m(n) = \{h(0), h(-1), \cdots, 2h(-N)\}$) following Eqs. (3.22) and (3.23), respectively. $\Phi_N$ is the N × N flipping matrix and is given by the following form.
\[
\Phi_N = \begin{bmatrix} 0 & \cdots & 0 & 1 \\ 0 & \cdots & 1 & 0 \\ \vdots & & \vdots & \vdots \\ 1 & \cdots & 0 & 0 \end{bmatrix}
\]
From Eqs. (3.24) and (3.25), the above relationship is expressed in the transform domain in the following form. Note that we have used both the DCT and DST of input blocks in our expression (see Eq. (3.28)). The DCT of a block $x_i$ is denoted as $X_i^{c}$, whereas its DST is denoted by $X_i^{s}$.
The above expression could be further simplified for symmetric, antisymmetric, and causal responses. As the input is assumed to be available in the block DCT space, efficient cosine-to-sine transformation (CST) and sine-to-cosine transformation (SCT) are presented in [79].
In a different approach, Yim [165] used the linear and distributive properties of DCTs for developing efficient algorithms for filters with symmetric response. In this case, the submatrices of the convolution matrix in Eq. (3.26) hold the following relation.
\[
H_2^{+} = (H_2^{-})^{T}. \tag{3.29}
\]
Considering $H_0 = H_1^{+} + H_1^{-}$, Eq. (3.26) is rewritten in the block DCT domain as follows.
\[
Y_i^{c} = \mathrm{DCT}(H_2^{+})\, X_{i-1}^{c} + \mathrm{DCT}(H_0)\, X_i^{c} + \mathrm{DCT}(H_2^{-})\, X_{i+1}^{c}. \tag{3.30}
\]
Exploiting the properties of DCT matrices and the symmetry in the response, Yim [165] proposed an efficient computation in the 8 × 8 block DCT space.
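The relation in Eq. (3.30) can be checked numerically with a short sketch. Several things here are assumptions made for illustration: the DCT is taken as the orthonormal type-II transform with matrix C, DCT(H) is interpreted as the two-dimensional DCT C H Cᵀ of the submatrix, and the submatrices are read directly off the N × 3N convolution matrix rather than built from Eqs. (3.22) and (3.23). The fast 8 × 8 algorithm of [165] is not reproduced.

```python
import numpy as np

def dct2_matrix(n):
    """Orthonormal type-II DCT matrix (rows are basis vectors)."""
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos((2 * m + 1) * k * np.pi / (2 * n))
    c[0, :] /= np.sqrt(2.0)
    return c

def conv_submatrices(h_full, N):
    """Split the N x 3N convolution matrix of h(n), -L <= n <= L, into three N x N parts."""
    L = (len(h_full) - 1) // 2
    T = np.zeros((N, 3 * N))
    for r in range(N):                  # output sample r of the central block
        for c in range(3 * N):          # input sample c - N relative to the central block
            n = r - (c - N)             # lag of the tap linking the two
            if -L <= n <= L:
                T[r, c] = h_full[n + L]
    return T[:, :N], T[:, N:2 * N], T[:, 2 * N:]

N = 8
h_half = np.array([0.4, 0.25, 0.05])                 # h(0), h(1), h(2)
h_full = np.concatenate([h_half[:0:-1], h_half])     # symmetric h(n), n = -2..2
L = len(h_half) - 1

C = dct2_matrix(N)
H2p, H0, H2m = conv_submatrices(h_full, N)

rng = np.random.default_rng(2)
x = rng.standard_normal(3 * N)                       # blocks i-1, i, i+1
X_prev, X_cur, X_next = (C @ x[j * N:(j + 1) * N] for j in range(3))

# Block-DCT-domain filtering of the central block.
Y_cur = (C @ H2p @ C.T) @ X_prev + (C @ H0 @ C.T) @ X_cur + (C @ H2m @ C.T) @ X_next

y_ref = np.convolve(x, h_full)[L + N:L + 2 * N]      # spatial-domain reference
print(np.allclose(C.T @ Y_cur, y_ref), np.allclose(H2p, H2m.T))
```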
3.4.2 Overlapping and Add Methods in the Transform Domain
In these methods, the contribution of an input block to the filtered response of neighboring blocks and also to itself is measured in isolation from others [107]. In the block DCT domain, this is computed in three steps, namely,
1. Pad the block X with zero neighbors and merge them into a single larger block, say $X_l$.
2. Apply the convolution–multiplication properties (CMPs) to the larger block to compute the filtered response in the transform domain.
3. Decompose the larger block into smaller ones, each providing the respective contribution to the neighboring blocks (including the central one).
Finally, the sum of all such contributions from its neighbors, together with its own response, provides the filtered response for the block. The first and third steps above involve multiplication of DCT blocks with block composition and decomposition matrices, as discussed in the previous chapter. We use the same notation for them in the subsequent discussion. Accordingly, $A_{(L,N)}$ denotes the block composition matrix for merging L adjacent N-point DCT blocks into a single LN-point DCT block.
For computing the filtered output of a general arbitrary response (that is, a noncausal asymmetric FIR), only two of the CMPs described in Table 3.1 are used. They correspond to the first and last rows of the table. As the last one involves the DST, we need to convert DCT blocks into a DST block. We refer to this operation as DST composition. The DST composition matrix for transforming L adjacent DCT blocks of length N into a DST block of length LN is denoted by $B_{(L,N)}$ and is given by the following expression.
\[
B_{(L,N)} = S_{LN} \begin{bmatrix} C_N^{-1} & 0_N & \cdots & 0_N \\ 0_N & C_N^{-1} & \cdots & 0_N \\ \vdots & \vdots & \ddots & \vdots \\ 0_N & 0_N & \cdots & C_N^{-1} \end{bmatrix}
\]
As before, in the above equation, $S_{LN}$ is the LN-point type-II DST matrix, and $C_N$ is the N-point type-II DCT matrix. In subsequent subsections we present the properties and theorems related to filtering with different types of FIRs.
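The composition matrices can be built explicitly for small sizes. The sketch below assumes orthonormal type-II DCT and DST matrices (so that the inverse of $C_N$ is its transpose); the scaling and indexing conventions of the earlier chapter may differ, and the function names are illustrative.

```python
import numpy as np

def dct2_matrix(n):
    """Orthonormal type-II DCT matrix."""
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos((2 * m + 1) * k * np.pi / (2 * n))
    c[0, :] /= np.sqrt(2.0)
    return c

def dst2_matrix(n):
    """Orthonormal type-II DST matrix (assumed convention: frequency index 1..n)."""
    k = np.arange(1, n + 1)[:, None]
    m = np.arange(n)[None, :]
    s = np.sqrt(2.0 / n) * np.sin((2 * m + 1) * k * np.pi / (2 * n))
    s[-1, :] /= np.sqrt(2.0)
    return s

def block_inverse_dct(L, N):
    """Block-diagonal matrix holding L copies of the inverse N-point DCT."""
    return np.kron(np.eye(L), dct2_matrix(N).T)

def dct_composition(L, N):
    """A(L,N): L adjacent N-point DCT blocks -> one LN-point DCT block."""
    return dct2_matrix(L * N) @ block_inverse_dct(L, N)

def dst_composition(L, N):
    """B(L,N): L adjacent N-point DCT blocks -> one LN-point DST block."""
    return dst2_matrix(L * N) @ block_inverse_dct(L, N)

L_blocks, N = 3, 8
A = dct_composition(L_blocks, N)
B = dst_composition(L_blocks, N)

rng = np.random.default_rng(3)
x = rng.standard_normal(L_blocks * N)
C_N = dct2_matrix(N)
stacked = np.concatenate([C_N @ x[j * N:(j + 1) * N] for j in range(L_blocks)])

print(np.allclose(A @ stacked, dct2_matrix(L_blocks * N) @ x))
print(np.allclose(B @ stacked, dst2_matrix(L_blocks * N) @ x))
```

With these orthonormal conventions both matrices are orthogonal, so decomposition is multiplication by the transpose.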
3.4.2.1 Filtering with Symmetric FIR
Let us consider the impulse response as h(n), −L ≤ n ≤ L. It is symmetric if h(n) = h(−n). From the related CMP, as discussed earlier, it is sufficient to compute with its positive half h(n), 0 ≤ n ≤ L. Let it be denoted as h+(n).
However, to make the operation equivalent to the linear convolution, we need to compute with an extended neighborhood. The extent of this neighborhood depends upon the length of the filter (that is, 2L + 1). So, given a block size N, the number of adjacent blocks to be merged on each side is ⌈L/N⌉. Let us restrict ourselves to the case L ≤ N. Hence, only the two immediate neighboring blocks are to be merged during the computation. We now state a theorem accounting for the contributions from neighboring blocks of the ith input block.
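Theorem 3.5 Consider a matrix U of size 3N × 3N such that $U = A_{(3,N)}^{T}\, D(\{\sqrt{6N}\, C_{3N}^{I} h^{+}\}_{0}^{3N-1})\, A_{(3,N)}$ and its N × N submatrices $U_{ij}$, 1 ≤ i, j ≤ 3, are defined as follows:
\[
U = \begin{bmatrix} U_{11} & U_{12} & U_{13} \\ U_{21} & U_{22} & U_{23} \\ U_{31} & U_{32} & U_{33} \end{bmatrix}.
\]
Let $X_i$ and $Y_i$ denote the N-point DCT blocks of the input and of the filtered output, respectively, at the same location. Then, $Y_i$ is given by the following expression:
\[
Y_i = \begin{bmatrix} E_s & F_s & G_s \end{bmatrix} \begin{bmatrix} X_{i-1} \\ X_i \\ X_{i+1} \end{bmatrix}, \qquad E_s = U_{32},\; F_s = U_{22},\; G_s = U_{12}. \tag{3.32}
\]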
Let us consider the contribution of a single block X to its neighboring blocks. For this we compute the filtered response by composing a larger block surrounding X with zeroes. This computation is based on the same overlap and add principle and is shown in the following:
\[
\begin{aligned}
Y &= U \begin{bmatrix} 0_N \\ X \\ 0_N \end{bmatrix}, \quad \text{(from Eq. (2.94) and Eq. (2.110))} \\
  &= \begin{bmatrix} U_{12} X \\ U_{22} X \\ U_{32} X \end{bmatrix}.
\end{aligned} \tag{3.33}
\]
From Eq. (3.33) we determine the contributions of the (i − 1)th and (i + 1)th neighboring blocks toward the ith block, and its response $Y_i$ is expressed in the form of Eq. (3.32).
The matrices $E_s$, $F_s$, and $G_s$ are referred to as filtering matrices. From Theorem 3.5 we obtain their relationship with the submatrices of $A_{(3,N)}$. Let the submatrices of $A_{(3,N)}$ be denoted as $A_{i,j}$, 0 ≤ i, j ≤ 2, each of dimension N × N. Similarly, we denote the diagonal submatrices of $D(\{\sqrt{6N}\, C_{3N}^{I} h^{+}\}_{0}^{3N-1})$ as $D_i = D_{i,i}$, 0 ≤ i ≤ 2. With these, the filtering matrices are expressed in the following form.
\[
\begin{aligned}
E_s &= A_{0,2}^{T} D_0 A_{0,1} + A_{1,2}^{T} D_1 A_{1,1} + A_{2,2}^{T} D_2 A_{2,1}, \\
F_s &= A_{0,1}^{T} D_0 A_{0,1} + A_{1,1}^{T} D_1 A_{1,1} + A_{2,1}^{T} D_2 A_{2,1}, \\
G_s &= A_{0,0}^{T} D_0 A_{0,1} + A_{1,0}^{T} D_1 A_{1,1} + A_{2,0}^{T} D_2 A_{2,1}.
\end{aligned} \tag{3.34}
\]
Theorem 3.6 Filtering matrices (as defined in Theorem 3.5) have the following properties.
1. $F_s$ is a symmetric matrix.
2. $F_s(k, l) = 0$ if k + l is odd.
3. $E_s(k, l) = (-1)^{k+l} G_s(k, l)$.
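Proof:
For proving these properties, we take the help of Theorems 2.14 and 2.15 as shown below:
1. U (= $A_{(3,N)}^{T}\, D(\{\sqrt{6N}\, C_{3N}^{I} h^{+}\}_{0}^{3N-1})\, A_{(3,N)}$ of Theorem 3.5) is symmetric, as it is of the form $A^{T} D A$, where D is a diagonal matrix. Hence, $F_s$ (= $U_{22}$) is also symmetric.
2. Let us consider a matrix $S_i = A_{i,1}^{T} D_i A_{i,1}$. Let the tth diagonal element (at the tth row) of $D_i$ be denoted as d(t). Hence, an element of $S_i$ is expressed as follows:
\[
S_i(k, l) = \sum_{t=0}^{N-1} A_{i,1}(t, k)\, d(t)\, A_{i,1}(t, l).
\]
Now, from Theorem 2.14, if k and l are of opposite parities, either $A_{i,1}(t, k) = 0$ or $A_{i,1}(t, l) = 0$. This implies that $S_i(k, l)$ is zero if k + l is odd. As $F_s$ is the sum of the matrices $S_i$ (Eq. (3.34)), the same holds for $F_s(k, l)$.
3. Let $S_{i,0} = A_{i,0}^{T} D_i A_{i,1}$ and $S_{i,2} = A_{i,2}^{T} D_i A_{i,1}$, so that $G_s = \sum_i S_{i,0}$ and $E_s = \sum_i S_{i,2}$ (Eq. (3.34)). Next we consider two cases separately, one for even l and the other for odd l, using the relations between the elements of these submatrices (Theorem 2.14). From the resulting Eqs. (3.37) and (3.38), it follows that
\[
S_{i,0}(k, l) = (-1)^{k+l} S_{i,2}(k, l). \tag{3.39}
\]
Summing Eq. (3.39) over i gives the third property.
3.4.2.2 Filtering with Antisymmetric FIR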
In this case, the approach is similar to the previous one. However, it applies a different CMP and models skew-circular convolution in its equivalent computation. A filter response h(n) is antisymmetric when h(n) = −h(−n). Note that, in this case, h(0) = 0. Hence, the complete specification of the filter can be obtained from the strictly positive half of the response, h(n), 1 ≤ n ≤ L. Let this portion of the response be denoted as $h_p(n)$. Here, the filtered response is computed in the DCT space by applying the last CMP of Table 3.1. This demands conversion of adjacent input DCT blocks into a DST block. The input–output relationship for filtering with antisymmetric FIR is expressed in the following theorem.
Theorem 3.7 [...] same location. Then, $Y_i$ is given by the following expression:
\[
Y_i = - \begin{bmatrix} E_a & F_a & G_a \end{bmatrix} \begin{bmatrix} X_{i-1} \\ X_i \\ X_{i+1} \end{bmatrix}
\]
Similar to the filtering matrices of the symmetric FIR, $E_a$, $F_a$, and $G_a$ are found to have the following properties. The proof is similar in nature and is omitted from the text.
Theorem 3.8 Filtering matrices (as defined in Theorem 3.7) have the following properties.
1. $F_a(k, l) = 0$ if k + l is even.
2. $E_a(k, l) = (-1)^{k+l+1} G_a(k, l)$.
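The properties stated in Theorems 3.6 and 3.8 can be checked numerically. The sketch below does not use the CMP-based constructions of Theorems 3.5 and 3.7. Instead it assumes that the filtering matrices equal the orthonormal type-II DCT of the corresponding spatial-domain convolution submatrices, which holds because both describe the same linear map when L ≤ N. The helper `filtering_matrices` is a hypothetical name, and the sign convention of the antisymmetric matrices may differ from the text, which does not affect the properties being tested.

```python
import numpy as np

def dct2_matrix(n):
    """Orthonormal type-II DCT matrix."""
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos((2 * m + 1) * k * np.pi / (2 * n))
    c[0, :] /= np.sqrt(2.0)
    return c

def filtering_matrices(h_full, N):
    """DCT-domain matrices applied to blocks i-1, i, i+1 for h(n), -L <= n <= L, L <= N."""
    h_full = np.asarray(h_full, dtype=float)
    L = (len(h_full) - 1) // 2
    C = dct2_matrix(N)
    r = np.arange(N)[:, None]
    c = np.arange(N)[None, :]

    def dct_block(offset):
        lag = r - c + offset                       # tap index linking input c to output r
        taps = h_full[np.clip(lag + L, 0, 2 * L)]
        H = np.where(np.abs(lag) <= L, taps, 0.0)  # spatial-domain convolution submatrix
        return C @ H @ C.T

    return dct_block(N), dct_block(0), dct_block(-N)

N = 8
parity = np.add.outer(np.arange(N), np.arange(N)) % 2   # parity of k + l
sign = 1 - 2 * parity                                    # (-1)^(k + l)

h_pos = np.array([0.4, 0.25, 0.1, 0.05])                 # h(0)..h(3), symmetric case
Es, Fs, Gs = filtering_matrices(np.concatenate([h_pos[:0:-1], h_pos]), N)

h_p = np.array([0.3, -0.2, 0.1])                         # h(1)..h(3), antisymmetric case
Ea, Fa, Ga = filtering_matrices(np.concatenate([-h_p[::-1], [0.0], h_p]), N)

print("Fs symmetric:", np.allclose(Fs, Fs.T))
print("Fs zero for odd k+l:", np.allclose(Fs[parity == 1], 0))
print("Es = (-1)^(k+l) Gs:", np.allclose(Es, sign * Gs))
print("Fa zero for even k+l:", np.allclose(Fa[parity == 0], 0))
print("Ea = (-1)^(k+l+1) Ga:", np.allclose(Ea, -sign * Ga))
```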
3.4.2.3 Filtering with an Arbitrary FIR
By using the above two methods, we compute the filtered output for any arbitrary FIR in the block DCT space. This is achieved by expressing an arbitrary FIR h(n) as a sum of a symmetric response $h_s(n)$ and an antisymmetric response $h_a(n)$ such that $h_s(n) = \frac{1}{2}(h(n) + h(-n))$ and $h_a(n) = \frac{1}{2}(h(n) - h(-n))$. Let us rename h+(n) as $h_s(n)$, 0 ≤ n ≤ L, and $h_p(n)$ as $h_a(n)$, 1 ≤ n ≤ L. Hence, by applying both Theorems 3.5 and 3.7, we easily obtain the following input–output relationship.
Theorem 3.9
\[
Y_i = \begin{bmatrix} E & F & G \end{bmatrix} \begin{bmatrix} X_{i-1} \\ X_i \\ X_{i+1} \end{bmatrix}, \tag{3.41}
\]
where $E = E_s - E_a$, $F = F_s - F_a$, and $G = G_s - G_a$. For symmetric h(n), $E_a = F_a = G_a = 0_N$. Similarly, for antisymmetric h(n), $E_s = F_s = G_s = 0_N$. For a causal impulse response, $G = 0_N$, implying $G_s = G_a$. Again, from Theorems 3.6 and 3.8, in this case (for a causal response) $E = 2E_s = -2E_a$.
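A small numerical sketch can illustrate the decomposition of Theorem 3.9 for a causal response. As an assumption for illustration, the filtering matrices are computed as the orthonormal type-II DCT of the spatial-domain convolution submatrices rather than through the CMPs, and the antisymmetric matrices are negated to follow the sign convention E = Es − Ea. The helper `filtering_matrices` is a hypothetical name.

```python
import numpy as np

def dct2_matrix(n):
    """Orthonormal type-II DCT matrix."""
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos((2 * m + 1) * k * np.pi / (2 * n))
    c[0, :] /= np.sqrt(2.0)
    return c

def filtering_matrices(h_full, N):
    """DCT-domain matrices applied to blocks i-1, i, i+1 for h(n), -L <= n <= L, L <= N."""
    h_full = np.asarray(h_full, dtype=float)
    L = (len(h_full) - 1) // 2
    C = dct2_matrix(N)
    r = np.arange(N)[:, None]
    c = np.arange(N)[None, :]

    def dct_block(offset):
        lag = r - c + offset
        taps = h_full[np.clip(lag + L, 0, 2 * L)]
        return C @ np.where(np.abs(lag) <= L, taps, 0.0) @ C.T

    return dct_block(N), dct_block(0), dct_block(-N)

N = 8
h = np.array([0.0, 0.0, 0.0, 0.5, 0.3, -0.2, 0.1])   # h(n), n = -3..3, causal
hs = 0.5 * (h + h[::-1])                             # symmetric part
ha = 0.5 * (h - h[::-1])                             # antisymmetric part

Es, Fs, Gs = filtering_matrices(hs, N)
Ea, Fa, Ga = (-M for M in filtering_matrices(ha, N)) # assumed sign: Y = -[Ea Fa Ga][...]
E, F, G = Es - Ea, Fs - Fa, Gs - Ga

C = dct2_matrix(N)
rng = np.random.default_rng(4)
x = rng.standard_normal(3 * N)
X_prev, X_cur, X_next = (C @ x[j * N:(j + 1) * N] for j in range(3))
Y_cur = E @ X_prev + F @ X_cur + G @ X_next

L = (len(h) - 1) // 2
y_ref = np.convolve(x, h)[L + N:L + 2 * N]           # spatial-domain reference

print(np.allclose(C.T @ Y_cur, y_ref))
print(np.allclose(G, 0), np.allclose(E, 2 * Es), np.allclose(E, -2 * Ea))
```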
3.4.2.4 Efficient Computation
In each of the above cases, computation essentially involves multiplication and addition of matrices of dimensions N × N. Instead of performing computation by direct multiplication and addition of filtering matrices, we adopt a more efficient policy by using modified filtering matrices as derived from the original ones:
\[
P_s = \frac{E_s + G_s}{2}, \quad Q_s = \frac{E_s - G_s}{2}, \quad P_a = \frac{E_a + G_a}{2}, \quad Q_a = \frac{E_a - G_a}{2}. \tag{3.42}
\]
The choice of these derived matrices is primarily motivated by the fact that they are relatively sparse, as can be observed from the properties in Theorems 3.6 and 3.8. The following theorem states the extent of sparsity of these matrices.
Theorem 3.10 Each of the matrices $F_s$, $P_s$, $Q_s$, $F_a$, $P_a$, and $Q_a$ has at least N/2 zeroes in each row or each column. Hence, the number of multiplications ($n_m$) and additions ($n_a$) required for multiplying one of these matrices with an input block of size N are given as
\[
n_m = \frac{N^2}{2}, \qquad n_a = \left(\frac{N}{2} - 1\right) N. \tag{3.43}
\]
Using the derived filtering matrices, efficient computations for both the noncausal symmetric and antisymmetric filtering operations are designed. As the steps of these computations are similar for both cases, they are shown together in Table 3.3.
In this table, the corresponding filtering matrices are denoted by P , Q, and F , respectively. Their roles should be considered under the respective context of the type of responses. Instead of computing all individual contributions of neighbors at the same stage, for efficient handling of input data, contributions of a single block (say, the ith block) to its neighboring blocks are computed at the same time. They are subsequently buffered and pipelined in the process of accumulation for both the (i − 1)th and (i + 1)th blocks (see the last step of Table 3.3).
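Table 3.3: Computational steps and associated complexities of filtering an input DCT block of size N with symmetric or antisymmetric responses

| Steps | $n_m$ | $n_a$ |
|---|---|---|
| $K = P X_i$ | $N^2/2$ | $N(N/2 - 1)$ |
| $L = Q X_i$ | $N^2/2$ | $N(N/2 - 1)$ |
| $W_i = F X_i$ | $N^2/2$ | $N(N/2 - 1)$ |
| $U_i = K + L$ | - | $N$ |
| $V_i = K - L$ | - | $N$ |
| $Y_i = U_{i-1} + W_i + V_{i+1}$ | - | $2N$ |
| Total | $3N^2/2$ | $N(3N + 2)/2$ |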
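The steps of Table 3.3 can be sketched for a symmetric response as follows. The sketch assumes that the filtering matrices are obtained as the orthonormal type-II DCT of the spatial-domain convolution submatrices, and it takes $U_i = E X_i$ as the contribution of block i to block i + 1 and $V_i = G X_i$ as its contribution to block i − 1, as implied by Eqs. (3.41) and (3.42). Dense matrix products are used for brevity, so the operation counts of the table are not realized here; only the sparsity pattern is checked. Names such as `filtering_matrices` are illustrative.

```python
import numpy as np

def dct2_matrix(n):
    """Orthonormal type-II DCT matrix."""
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos((2 * m + 1) * k * np.pi / (2 * n))
    c[0, :] /= np.sqrt(2.0)
    return c

def filtering_matrices(h_full, N):
    """DCT-domain matrices applied to blocks i-1, i, i+1 for h(n), -L <= n <= L, L <= N."""
    h_full = np.asarray(h_full, dtype=float)
    L = (len(h_full) - 1) // 2
    C = dct2_matrix(N)
    r = np.arange(N)[:, None]
    c = np.arange(N)[None, :]

    def dct_block(offset):
        lag = r - c + offset
        taps = h_full[np.clip(lag + L, 0, 2 * L)]
        return C @ np.where(np.abs(lag) <= L, taps, 0.0) @ C.T

    return dct_block(N), dct_block(0), dct_block(-N)

N, n_blocks = 8, 5
h_pos = np.array([0.4, 0.25, 0.1, 0.05])             # h(0)..h(3)
h_sym = np.concatenate([h_pos[:0:-1], h_pos])        # symmetric h(n), n = -3..3
L = len(h_pos) - 1

E, F, G = filtering_matrices(h_sym, N)
P, Q = (E + G) / 2, (E - G) / 2                      # derived matrices of Eq. (3.42)

C = dct2_matrix(N)
rng = np.random.default_rng(5)
x = rng.standard_normal(n_blocks * N)
X = [C @ x[i * N:(i + 1) * N] for i in range(n_blocks)]

U, V, W = [], [], []
for Xi in X:                                         # per-block stage
    K, Lq = P @ Xi, Q @ Xi
    W.append(F @ Xi)                                 # contribution to the block itself
    U.append(K + Lq)                                 # equals E Xi, goes to block i + 1
    V.append(K - Lq)                                 # equals G Xi, goes to block i - 1

zero = np.zeros(N)                                   # missing neighbors at the two ends
Y = [(U[i - 1] if i > 0 else zero) + W[i] + (V[i + 1] if i + 1 < n_blocks else zero)
     for i in range(n_blocks)]

y = np.concatenate([C.T @ Yi for Yi in Y])
y_ref = np.convolve(x, h_sym)[L:L + len(x)]
nnz = [int(np.count_nonzero(np.abs(M) > 1e-12)) for M in (P, Q, F)]
print(np.allclose(y, y_ref), nnz)
```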
In the more general case of filtering with arbitrary noncausal and causal responses, the sparsity of the derived matrices does not provide any additional advantage. In this case, direct matrix operations with the original filtering matrices E, F, and G are required. For noncausal cases, it takes $3N^2$ multiplications and $3N^2 - N$ additions for N samples, whereas for causal filters the numbers are $2N^2$ and $2N^2 - N$, respectively. Computational complexities of different cases are summarized in Table 3.4.

Table 3.4: Per sample computational cost of filtering in 1-D

| Impulse response type | $n_m$ | $n_a$ |
|---|---|---|
| Symmetric | $3N/2$ | $(3N + 2)/2$ |
| Antisymmetric | $3N/2$ | $(3N + 2)/2$ |
| Causal | $2N$ | $2N - 1$ |
| Noncausal arbitrary | $3N$ | $3N - 1$ |
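The per-sample figures follow from dividing the per-block counts by N. The short sketch below reproduces that arithmetic with exact fractions; the function name `per_sample_cost` and the string labels are illustrative choices.

```python
from fractions import Fraction

def per_sample_cost(kind, N):
    """Return (multiplications, additions) per sample for a block of size N."""
    if kind in ("symmetric", "antisymmetric"):
        # Three sparse products (Theorem 3.10) plus N + N + 2N accumulation additions.
        nm_block = 3 * Fraction(N * N, 2)
        na_block = 3 * N * (Fraction(N, 2) - 1) + 4 * N
    elif kind == "causal":
        # Two dense N x N products (with E and F) and one block addition.
        nm_block = Fraction(2 * N * N)
        na_block = Fraction(2 * N * N - N)
    elif kind == "noncausal arbitrary":
        # Three dense N x N products (with E, F, and G) and two block additions.
        nm_block = Fraction(3 * N * N)
        na_block = Fraction(3 * N * N - N)
    else:
        raise ValueError(f"unknown response type: {kind}")
    return nm_block / N, na_block / N

for kind in ("symmetric", "antisymmetric", "causal", "noncausal arbitrary"):
    nm, na = per_sample_cost(kind, N=8)
    print(f"{kind:20s} nm = {nm}, na = {na}")
```

For N = 8, the symmetric case costs 12 multiplications and 13 additions per sample, against 24 and 23 for a direct noncausal computation.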