• No se han encontrado resultados

This chapter presents novel online data-driven video denoising techniques based on learning sparsifying transforms for appropriately constructed spatio- temporal patches of videos. This new framework provides high quality video restoration from highly corrupted data. In the following, we briefly review the background on video denoising and sparsifying transform learning, before discussing the contributions of this work.

4.1.1

Video Denoising

Denoising is one of the most important problems in video processing. The ubiquitous use of relatively low-quality smart phone cameras has also led to the increasing importance of video denoising. Recovering high-quality video from noisy footage also improves robustness in high-level vision tasks [37,84]. Though image denoising algorithms, such as the popular BM3D [32], can be applied to process each video frame independently, most of the video denoising techniques exploit the spatio-temporal correlation of the tensor data. Natural videos have local structures that are sparse or compressible in some transform domain, or over certain dictionaries, e.g., discrete cosine transform (DCT) [85] and wavelets [86]. Prior works exploited this fact and proposed video (or high-dimensional data) denoising algorithms by coefficient shrinkage, e.g., sparse approximation [31] or Wiener filtering [87]. Since there are typically motions involved in videos, objects can move throughout the scene. Thus, state-of-the-art video and image denoising algorithms also combine block matching (BM) to group local patches, and apply denoising jointly [32, 59, 87].

Temporal Sliding Window, size = 𝑚, at 𝜏 = 𝑡 ෩ 𝒀𝒕−𝒎+𝟏 ෩ 𝒀𝒕−(𝒎−𝟏)/𝟐

𝒀𝒕 𝒏𝟏× 𝒏𝟐 Spatial Window 𝒎 𝑛2 𝑛1 𝒏 𝒏 = 𝒏𝟏× 𝒏𝟐×𝒎 cascade 𝑴 Vectorized ෥𝒖𝒊 𝒊𝒕𝒉3D tensor Mini-batch ෩𝑼𝒋= [෥𝒖𝒋𝑴−𝑴+𝟏| … |෥𝒖𝒋𝑴]

Yt

Middle frame (odd 𝒎 for A2)

Extract 𝑹𝒊Yt෩ in A1

Construct 𝑽𝒊Yt෩ in A2by BM

Input FIFO buffer:

Figure 4.1: Illustration of video streaming, tensor construction and vectorization.

4.1.2

Sparsifying Transform Learning

Most of the aforementioned video denoising methods exploit sparsity in fixed transform domain (e.g., DCT) as part of their framework. It has been shown that the adaptation of sparse models based on training signals usually leads to superior performance over fixed sparse representation in many applica- tions. Synthesis dictionary learning is the most well-known adaptive sparse representation scheme [4, 8]. However, the synthesis model sparse coding problem is NP-hard. The commonly used approximate sparse coding algo- rithms still involve relatively expensive computations. As an alternative, the transform model suggests that the signal u is approximately sparsifiable us- ing a transform W ∈ Rm×n, i.e., W u = x + e, with x ∈ Rm sparse and e a small approximation error in the transform domain (rather than in the signal domain). Recent works proposed sparsifying transform learning [20, 23] with cheap and exact sparse coding steps, which turn out to be advantageous in various applications such as natural data representations, image denoising, inpainting, segmentation, magnetic resonance imaging (MRI), and computed tomography (CT) [5, 21, 22, 27, 44, 88].

𝑌1

𝑌t

𝑌t+1

𝑌

t−m+1

| … |𝑌

t

=

Input FIFOBuffer

VIDOSAT Mini-batch Denoising

Y

t−m+1

| … |ഥY

t

=

Output FIFOBuffer

Y1

Yt−m+1

Noisy Video

Stream

Denoised

Stream

t Y ~

accumulate denoised patches

Normalize & Output the Oldest Frame

Figure 4.2: Illustration of online video streaming and denoising framework.

4.1.3

Contribution

While the data-driven adaptation of synthesis dictionaries for the purpose of denoising image sequences or volumetric data [30, 31] has been studied in some recent papers, the usefulness of learned sparsifying transforms has not been explored in these applications. Video data typically contains correlation along the temporal dimension, which will not be captured by learning spar- sifying transforms for the 2D patches of the video frames. Thus in this work, we focus on video denoising using high-dimensional OSTL. We propose the method of VIdeo Denoising by Online SpArsifying Transform learning (VI- DOSAT). The sparsifying transform is adapted to the tensors formed by the local patches of the corrupted video on-the-fly. Figure 4.1 illustrates how the spatio-temporal tensors are constructed and vectorized from the streaming video, and Fig. 4.2 is a flow-chart of the proposed VIDOSAT framework.

To our knowledge, this is the first video denoising method using online sparse signal modeling, by applying high-dimensional sparsifying transform learning for spatio-temporal data. Table 4.1 summarizes the key attributes of some of the aforementioned related video denoising algorithm representatives,

Table 4.1: Comparison between video denoising methods, including fBM3D, 3D DCT, sKSVD, VBM3D, VBM4D, as well as VIDOSAT and

VIDOSAT-BM proposed here. fBM3D is applying BM3D algorithm for denoising each frame, and the 3D DCT method is applying the VIDOSAT framework but using the fixed 3D DCT transform.

Methods Sparse Signal Model BM Temporal Fixed Adaptive Online Correlation

fBM3D

3

3

3D DCT

3

3

sKSVD

3

3

VBM3D

3

3

3

VBM4D

3

3

3

VIDOSAT

3

3

3

VIDOSAT

3

3

3

3

-BM

as well as the proposed VIDOSAT algorithms. Our contributions can be summarized as follows:

• We propose a video denoising framework, which processes noisy frames in an online fashion. Within the framework, we present two meth- ods of spatio-temporal tensor construction, one of which utilizes block matching (BM) for motion compensation.

• We apply the OSTL for reconstruction of sequentially arrived tensors, whose spatio-temporal structure is exploited using the adaptive 3D transform-domain sparsity. The denoised tensors are aggregated to reconstruct the streaming video frames.

• We evaluate the video denoising performance of the proposed algo- rithms, which outperform the competing methods over several public video datasets.

Documento similar