Definition 11 (Visual data). Visual data can be seen, at the abstract level, as a collection of samples. In the visual domain, the visual data encapsulates $I_t, [I_{t+1}, \ldots, I_{t+p}]$, where $p = 0$ for a static detector and $p > 0$ for a dynamic detector. The visual data also encapsulates all the subsequent data derived from these images. To obtain the samples (static or dynamic image samples), we use an iterator called the search strategy.
Let $C_\zeta$ be a knowledge relative to a visual object $\zeta$, and let $\pi$ be a source of visual data (see Definition 11). A visual finder is a component that first builds the visual pyramid for the visual data $\pi(t)$ and applies to $\pi(t)$ the preprocessing required by the classification step. Next, it iterates over the sliding windows made available by an iterator called the search strategy, and classifies each sliding window using $C_\zeta$. The visual finder ends with a postprocessing step, which applies spatial and temporal filters to the positively classified sliding windows. In the following sections we detail each of these functional blocks. We start with the generic patterns that can be used to group these blocks together.
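The order of the functional blocks described above can be summarized in a minimal sketch. All names here (`run_visual_finder` and its callable parameters) are hypothetical and only illustrate the data flow; the sketch preprocesses once before iterating, so it corresponds to one possible arrangement of the blocks rather than the only one.

```python
from typing import Any, Callable, Iterable, List


def run_visual_finder(
    image: Any,
    build_pyramid: Callable[[Any], Any],
    preprocess: Callable[[Any], Any],
    search_strategy: Callable[[Any], Iterable[Any]],
    classify: Callable[[Any, Any], bool],
    postprocess: Callable[[List[Any]], List[Any]],
) -> List[Any]:
    """Chain the blocks of a visual finder in the order given in the text."""
    pyramid = build_pyramid(image)          # 1. build the visual pyramid
    prepared = preprocess(pyramid)          # 2. preprocessing needed by the classifier
    positives = [
        window
        for window in search_strategy(prepared)  # 3. iterate over sliding windows
        if classify(prepared, window)            # 4. classify each window
    ]
    return postprocess(positives)           # 5. spatial/temporal filtering


# Toy usage with stand-in blocks: windows are integers, "objects" are even numbers.
detections = run_visual_finder(
    image=[1, 2, 3],
    build_pyramid=lambda img: [img],
    preprocess=lambda pyr: pyr,
    search_strategy=lambda data: range(5),
    classify=lambda data, window: window % 2 == 0,
    postprocess=sorted,
)
```

Passing each block as a callable keeps the sketch independent of any particular feature type or classifier.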
Visual finder patterns
In this section we focus on the question: where should the preprocessing block be placed relative to the search strategy? The answer influences the performance of the whole process in terms of memory space and computation time. Figure 6.17(a) shows the first configuration, called Global Preprocessing. Figure 6.17(b) shows the second configuration, called Local Preprocessing.
Local preprocessing consists in iterating over the input visual data first. For each sliding window, it builds the preprocessing data required by $C_\zeta$. The sliding window is responsible for storing its own preprocessing data. This pattern requires less memory at the cost of more computation time.
Global preprocessing consists in building global preprocessing data that is shared between all the sliding windows. Each sliding window maintains a reference to the shared preprocessing interface and a position $(s_x, s_y)$. This pattern requires more memory but less computation time.
These two patterns are constrained by the subsequent architecture. From the functional point of view, they are similar. Thus, in the next sections we refer to the visual finder regardless of its pattern.
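The trade-off between the two patterns can be illustrated with a small sketch. It assumes, purely for illustration, that the preprocessing data is an integral image (summed-area table); the text does not fix the preprocessing here, and the class names `GlobalWindow` and `LocalWindow` are hypothetical.

```python
from typing import List

Image = List[List[int]]


def integral_image(img: Image) -> Image:
    """Summed-area table with an extra leading row and column of zeros."""
    rows, cols = len(img), len(img[0])
    ii = [[0] * (cols + 1) for _ in range(rows + 1)]
    for y in range(rows):
        row_sum = 0
        for x in range(cols):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def box_sum(ii: Image, x: int, y: int, w: int, h: int) -> int:
    """Sum of the w x h box whose top-left corner is (x, y)."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


class GlobalWindow:
    """Global pattern: a reference to shared data plus a position (sx, sy)."""

    def __init__(self, shared_ii: Image, sx: int, sy: int, w: int, h: int):
        self.shared_ii = shared_ii  # not copied, shared by all windows
        self.sx, self.sy, self.w, self.h = sx, sy, w, h

    def total(self) -> int:
        return box_sum(self.shared_ii, self.sx, self.sy, self.w, self.h)


class LocalWindow:
    """Local pattern: the window computes and stores its own data."""

    def __init__(self, img: Image, sx: int, sy: int, w: int, h: int):
        crop = [row[sx:sx + w] for row in img[sy:sy + h]]
        self.own_ii = integral_image(crop)  # recomputed for every window
        self.w, self.h = w, h

    def total(self) -> int:
        return box_sum(self.own_ii, 0, 0, self.w, self.h)


img = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
shared = integral_image(img)                      # built once for the whole image
global_total = GlobalWindow(shared, 1, 1, 2, 2).total()
local_total = LocalWindow(img, 1, 1, 2, 2).total()
```

Both windows return the same value, which matches the remark that the patterns are functionally similar. The global variant keeps a table for the whole image in memory, while the local variant only holds a window-sized table but recomputes it for each window.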
(a) Global Preprocessing (b) Local Preprocessing
Figure 6.17: Generic Finder Patterns
Visual pyramid
In the generic design of the visual detector, we focus on the question: how can objects be found at different scales in the same image? First, recall that the knowledge $C_\zeta$ is specialized on an object $\zeta$ with dimensions $h \times w$. These are the dimensions of the smallest object $\zeta$ that can be recognized by $C_\zeta$. To detect objects of the same category $\zeta$ with larger dimensions $h_0 \times w_0$, constrained by $\frac{h}{w} = \frac{h_0}{w_0}$, the classifier $C_\zeta$ is applied to the image scaled by the factor $\alpha = \frac{h}{h_0}$.
(a) Object to scale association (b) Face example
Figure 6.18: Visual Pyramid: ζ corresponds to ’A’, and the largest object found in the image is ’E’.
Consider a visual data $\pi(t)$ with embedded image $I_t$ (in the case of a dynamic detector [...] of $I_t$). The largest object $\zeta$ that can be recognized in $I_t$ has dimensions $H \times W$. Therefore, the objects $\zeta$ that can be found in this image have dimensions $h_k \times w_k$ constrained by $h_k = H \ldots h$, $w_k = W \ldots w$ and $\frac{h_k}{w_k} = \frac{h}{w}$.
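We build a set of $N$ images obtained from $I_t$ by repeated scaling with a factor $\alpha$. With $0 < \alpha < 1$, consistent with $\alpha = \frac{h}{h_0}$ above, $N$ is the largest integer such that $\alpha^{-N} \le \min\left(\frac{H}{h}, \frac{W}{w}\right)$. This set of images is called the visual pyramid (see Figure 6.18(a)). The size of the visual pyramid is $N$, calculated by:
\[ N = \min\left( \left\lfloor \log_\alpha \frac{h}{H} \right\rfloor, \left\lfloor \log_\alpha \frac{w}{W} \right\rfloor \right) \]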
Table 6.3 gives the size of the pyramid for a quarter-PAL image for several values of $\alpha$. The choice of $\alpha$ depends on the requirements of the application.
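As a worked example, the formula for $N$ can be evaluated with a short sketch. It assumes a down-scaling factor $0 < \alpha < 1$, so that both logarithms are non-negative. The function name `pyramid_size`, the 288 × 384 image size taken here for quarter-PAL, and the 24 × 24 detector window are illustrative assumptions, not values from Table 6.3.

```python
import math


def pyramid_size(H: int, W: int, h: int, w: int, alpha: float) -> int:
    """N = min(floor(log_alpha(h / H)), floor(log_alpha(w / W))).

    Assumes 0 < alpha < 1 (each level shrinks the image) and that the
    detector window h x w fits inside the H x W image.
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha is assumed to be a down-scaling factor in (0, 1)")
    if h > H or w > W:
        raise ValueError("the detector window must fit inside the image")
    n_h = math.floor(math.log(h / H, alpha))
    n_w = math.floor(math.log(w / W, alpha))
    return min(n_h, n_w)


# Illustrative values only: image size and window size are assumptions.
sizes = {alpha: pyramid_size(288, 384, 24, 24, alpha) for alpha in (0.5, 0.8, 0.9)}
```

A factor closer to 1 yields more levels, hence a finer scale sampling at a higher computation cost, which is why the choice of $\alpha$ is left to the application.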
Visual finder preprocessing
Let $C_\zeta$ be a knowledge, let $\Psi$ be a category of visual features, and let $f_\Psi$ denote a visual feature based on $\Psi$. For each $\Psi$, if $C_\zeta$ contains at least one visual feature $f_\Psi$, then the preprocessing block must prepare the required data, as shown in Table 6.4.
Visual finder search strategy
Definition 12 (Visual search strategy). Given a visual data $\pi(t)$, a search strategy is an iterator over the sliding windows present in $\pi(t)$. If the search covers all the available sliding windows, we speak of full-search; otherwise it is called partial-search.
We introduce the notion of search strategy as an abstraction of the scan method presented by Viola and Jones [VJ01b]. We call that method the generic visual search. It is fully customizable, which helps us compare the performance of different visual features unambiguously. When the horizontal and vertical steps are set to zero, it behaves like the full-search strategy, as shown in Figure 6.19. Otherwise, Figure 6.19 shows that these parameters dramatically decrease the number of sliding windows in the image, and the accuracy of the detection decreases as well. The ideal case is to cover all the sliding windows, but this is a heavy task for traditional processors.
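The idea of a search strategy as an iterator can be sketched with a generator over a single pyramid level. The function name `sliding_windows` and its parameters are hypothetical. Note one assumption that differs from the parameterization in the text: here the steps are expressed as pixel strides, so a stride of 1 (rather than a step of zero) visits every window.

```python
from typing import Iterator, Tuple


def sliding_windows(
    image_w: int,
    image_h: int,
    win_w: int,
    win_h: int,
    stride_x: int = 1,
    stride_y: int = 1,
) -> Iterator[Tuple[int, int]]:
    """Yield the top-left corner (x, y) of each window that fits in the image.

    stride_x = stride_y = 1 gives a full search; larger strides skip
    positions and give a partial search.
    """
    if stride_x < 1 or stride_y < 1:
        raise ValueError("strides are assumed to be at least 1 pixel")
    for y in range(0, image_h - win_h + 1, stride_y):
        for x in range(0, image_w - win_w + 1, stride_x):
            yield x, y


full = list(sliding_windows(10, 8, 4, 4))           # every position
partial = list(sliding_windows(10, 8, 4, 4, 2, 2))  # every second position
```

Doubling both strides divides the number of windows by roughly four, which illustrates how quickly these parameters reduce the workload and, with it, the coverage of the image.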
The notion of search strategy is useful because it can be extended to non-traditional search methods. For example, in some applications the objects of interest are located in specific regions at different layers of the pyramid. This information could be estimated from statistics on detection results obtained on recorded sequences.
Visual finder postprocessing
Consider a knowledge $C_\zeta$ and a visual data $\pi(t)$. Because the detector based on $C_\zeta$ is insensitive to small variations in scale and position, a large number of detections usually occur around an object of interest $\zeta$ (see Figure 6.20(a)). To merge these multiple detections into a single detection, which is more practical, we apply a simple grouping algorithm as in [VJ01b]. This algorithm combines overlapping detection rectangles into a single detection rectangle. Two detections are placed in the same set if their bounding rectangles overlap. For each set, the average size and position are computed, resulting in a single detection rectangle per set of overlapping detections (see Figure 6.20(b)).
Figure 6.19: Generic Visual Search Strategy
(a) Before (b) After
Figure 6.20: Visual postprocessing: grouping
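The grouping step described above can be sketched as follows. This is a minimal illustration, not the implementation of [VJ01b]: it assumes rectangles are given as `(x, y, width, height)` tuples and that overlap is applied transitively, so that chains of overlapping rectangles end up in the same set. The function names are hypothetical.

```python
from typing import Dict, List, Tuple

Rect = Tuple[float, float, float, float]  # (x, y, width, height)


def overlaps(a: Rect, b: Rect) -> bool:
    """True if the two axis-aligned rectangles share a non-empty area."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah


def group_detections(rects: List[Rect]) -> List[Rect]:
    """Merge overlapping detections into one averaged rectangle per set."""
    parent = list(range(len(rects)))

    def find(i: int) -> int:
        # Follow parent links to the set representative (with path halving).
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Put every pair of overlapping rectangles in the same set.
    for i in range(len(rects)):
        for j in range(i + 1, len(rects)):
            if overlaps(rects[i], rects[j]):
                parent[find(i)] = find(j)

    groups: Dict[int, List[Rect]] = {}
    for i, rect in enumerate(rects):
        groups.setdefault(find(i), []).append(rect)

    # Average position and size inside each set.
    merged = []
    for members in groups.values():
        n = len(members)
        merged.append(tuple(sum(m[k] for m in members) / n for k in range(4)))
    return merged


raw = [(0, 0, 10, 10), (2, 2, 10, 10), (50, 50, 10, 10)]
grouped = sorted(group_detections(raw))
```

The pairwise overlap test is quadratic in the number of detections, which is usually acceptable because only positively classified windows reach this step.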