

5.4.1 Computation for Images

For each image view, we measure the similarity of each sample pair using the neighbors of each point. The construction of $W_i$ is illustrated below via the ℓ1-graph [26], which has been shown to be robust to data noise, automatically sparse, and adaptive to the neighborhood.

For each $X_i^p$, we find the coefficients $\boldsymbol{\beta} \in \mathbb{R}^{N-1}$ such that $X_i^p = B\boldsymbol{\beta}$, where

$$B = [X_i^1, \cdots, X_i^{p-1}, X_i^{p+1}, \cdots, X_i^N] \in \mathbb{R}^{D_i \times (N-1)}.$$

To account for noise, we rewrite this as $X_i^p = B'\boldsymbol{\beta}'$, where $B' = [B, I] \in \mathbb{R}^{D_i \times (D_i+N-1)}$ and $\boldsymbol{\beta}' \in \mathbb{R}^{D_i+N-1}$. Seeking the sparse representation of $X_i^p$ thus leads to the following optimization problem:

$$\arg\min_{\boldsymbol{\beta}'} \|X_i^p - B'\boldsymbol{\beta}'\|_2, \quad \text{s.t. } \|\boldsymbol{\beta}'\|_1 < \varepsilon, \tag{5.24}$$

where $\varepsilon$ is a parameter with a small value. This problem can be solved by orthogonal matching pursuit [116].
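As a rough illustration of the sparse coding step, the sketch below runs a plain greedy orthogonal matching pursuit on the noise-augmented dictionary $B' = [B, I]$. It is a minimal sketch under stated assumptions, not the exact solver of [116]: the sparsity budget `n_nonzero` is a hypothetical stand-in for the role of $\varepsilon$, the columns of `B` are assumed to be normalized, and the toy data are invented for the example.

```python
import numpy as np

def omp(dictionary, x, n_nonzero):
    """Greedy OMP: approximate x with at most n_nonzero columns of dictionary."""
    residual = x.astype(float).copy()
    support = []                       # indices of the selected columns
    coef = np.zeros(0)
    for _ in range(n_nonzero):
        # Pick the column most correlated with the current residual.
        correlations = np.abs(dictionary.T @ residual)
        if support:
            correlations[support] = 0.0   # never reselect a column
        k = int(np.argmax(correlations))
        if correlations[k] < 1e-12:       # residual is already explained
            break
        support.append(k)
        # Re-fit all selected coefficients jointly by least squares.
        coef, *_ = np.linalg.lstsq(dictionary[:, support], x, rcond=None)
        residual = x - dictionary[:, support] @ coef
    beta = np.zeros(dictionary.shape[1])
    if support:
        beta[support] = coef
    return beta

# Toy data (assumption): D_i = 4 and N - 1 = 3 unit-norm neighbor samples.
B = np.array([[1.0, 0.0, 1.0],
              [1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0],
              [0.0, 0.0, 0.0]]) / np.sqrt(2.0)
B_prime = np.hstack([B, np.eye(4)])    # B' = [B, I]
x = 2.0 * B[:, 0]                      # sample that lies along the first neighbor
beta = omp(B_prime, x, n_nonzero=2)
```

In this toy case the first neighbor alone reconstructs `x`, so only one coefficient is nonzero. The first $N-1$ entries of `beta` are the ones later used to fill a row of $W_i$.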

To account for the different probability distributions over the data points and the natural locality of the data, we first fit a Gaussian mixture model (GMM) to the training data of each view. On the one hand, data in a high-dimensional space do not always follow a single distribution but tend to cluster naturally into several groups. On the other hand, realistic data are often well approximated by Gaussian distributions. The unsupervised GMM clustering therefore yields G clusters for each view. We can then solve problem (5.24) by representing each point using only the data from its own cluster rather than all the data points in B, which also reduces the computational cost of problem (5.24).

In particular, for $\boldsymbol{\beta}' = (\beta_1, \cdots, \beta_{D_i+N-1})$, we can first set $\beta_q = 0$ if $X_i^q$ and $X_i^p$ are in different clusters, $\forall q \neq p$, and then solve the above problem. The similarity matrix $W_i \in \mathbb{R}^{N \times N}$ can now be defined as: $(W_i)_{pp} = 0$, $\forall p$; $(W_i)_{pq} = |\beta_q|$ if $q < p$; and $(W_i)_{pq} = |\beta_{q-1}|$ if $q > p$. To ensure symmetry, we update $W_i \leftarrow (W_i^T + W_i)/2$. Then we set the diagonal matrix $D_i \in \mathbb{R}^{N \times N}$ with $(D_i)_{pp} = \sum_q (W_i)_{pq}$ and the Laplacian matrix $L_i = D_i - W_i$ for each view $i$.

Fig. 5.1 Illustration of selected middle frames from the actions "Handwaving" and "Diving".
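The construction of the similarity matrix and graph Laplacian for an image view in Section 5.4.1 can be sketched as follows. This is only an illustrative sketch: the coefficient vectors in `betas` are hypothetical values rather than outputs of a real solver, only their first $N-1$ entries (the sample coefficients, without the noise part) are used, and indices are 0-based in the code whereas the text uses 1-based indices.

```python
import numpy as np

def similarity_row(beta, p, n_samples):
    """Fill row p of W from |beta|, skipping the diagonal entry (W)_pp = 0."""
    row = np.zeros(n_samples)
    for q in range(n_samples):
        if q < p:
            row[q] = abs(beta[q])
        elif q > p:
            row[q] = abs(beta[q - 1])   # sample p is missing from the dictionary B
    return row

def graph_laplacian(W):
    """Symmetrize W, then return (W, D, L) with D the degree matrix and L = D - W."""
    W = (W.T + W) / 2.0
    D = np.diag(W.sum(axis=1))
    return W, D, D - W

# Hypothetical sparse coefficients for N = 3 samples (N - 1 = 2 entries each).
n_samples = 3
betas = [np.array([0.5, -0.2]),
         np.array([0.1, 0.0]),
         np.array([0.0, 0.3])]
W_raw = np.vstack([similarity_row(b, p, n_samples) for p, b in enumerate(betas)])
W_sym, D, L = graph_laplacian(W_raw)
```

Each row of the resulting Laplacian sums to zero, which is a quick sanity check on the construction.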

5.4.2 Computation for Videos

Incremental Naive Bayes Keyframe Selection

In a video sequence, however, not all poses are informative and discriminative for action recognition. Some poses carry neither complete nor accurate information, and some contain common patterns shared by various action types. Since such poses cannot represent the action well and would cause confusion during the classification phase, we use a weakly supervised method, termed Incremental Naive Bayes Filter (INBF), to filter out the noisy representations and keep the relatively representative and discriminative poses, i.e., the key poses.

For each action category, ten action sequences are randomly selected. From each action sequence, we choose a small set of discriminative poses for the given action type as the initial positive samples of INBF (labeled as y = 1), and the remaining frames are used as the negative samples (y = 0). As illustrated in Fig. 5.1, the five frames in the middle of an action sequence are selected as the discriminative poses. We repeat this procedure for each action type. After this initialization, INBF is regarded as an unsupervised online learning strategy.

For the i-th feature view, the representation of each pose (frame) $s$ can be written as $x^i(s) = (x^i_1(s), \cdots, x^i_D(s)) \in \mathbb{R}^D$. Since all the features we extract are based on statistical histograms, we assume that all elements of $x^i$ are independently distributed and model them with a naive Bayes classifier:

$$P(x^i) = \log \frac{\prod_{m=1}^{D} \Pr(x^i_m \mid y = 1)\,\Pr(y = 1)}{\prod_{m=1}^{D} \Pr(x^i_m \mid y = 0)\,\Pr(y = 0)} = \sum_{m=1}^{D} \log \frac{\Pr(x^i_m \mid y = 1)}{\Pr(x^i_m \mid y = 0)}, \tag{5.25}$$

where $y \in \{0, 1\}$ is a binary variable representing the negative and positive sample labels, respectively. The second equality holds under a uniform prior, $\Pr(y = 1) = \Pr(y = 0)$.

Furthermore, real-world data are often empirically well approximated by a Gaussian distribution. Thus, the conditional distributions $x^i_m \mid y = 1$ and $x^i_m \mid y = 0$ in the classifier $P(x^i)$ are assumed to be Gaussian with the four-tuple $(\mu_m^{y=1}, \mu_m^{y=0}, \sigma_m^{y=1}, \sigma_m^{y=0})$, which satisfy

$$x^i_m \mid y = 1 \sim N(\mu_m^{y=1}, \sigma_m^{y=1}) \quad \text{and} \quad x^i_m \mid y = 0 \sim N(\mu_m^{y=0}, \sigma_m^{y=0}).$$
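To make the classifier concrete, the following sketch evaluates the log-ratio score of (5.25) for one frame under the Gaussian assumption. It is a minimal sketch: it treats the $\sigma$ values as standard deviations, assumes equal class priors so that the prior terms cancel, and uses made-up parameter values for a two-dimensional feature.

```python
import math

def gaussian_log_pdf(x, mu, sigma):
    """Log density of a univariate Gaussian with mean mu and standard deviation sigma."""
    return -0.5 * math.log(2.0 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2.0 * sigma ** 2)

def nb_score(x, mu1, sigma1, mu0, sigma0):
    """Sum over dimensions of log Pr(x_m | y=1) - log Pr(x_m | y=0)."""
    return sum(
        gaussian_log_pdf(xm, m1, s1) - gaussian_log_pdf(xm, m0, s0)
        for xm, m1, s1, m0, s0 in zip(x, mu1, sigma1, mu0, sigma0)
    )

# Hypothetical per-dimension parameters for the positive and negative classes.
mu1, sigma1 = [1.0, 2.0], [0.5, 0.5]
mu0, sigma0 = [0.0, 0.0], [0.5, 0.5]

score_pos = nb_score([1.0, 2.0], mu1, sigma1, mu0, sigma0)  # frame at the positive mean
score_neg = nb_score([0.0, 0.0], mu1, sigma1, mu0, sigma0)  # frame at the negative mean
```

A positive score indicates that the frame is better explained by the key-pose model, a negative score that it resembles the discarded frames.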

At this point, for a given feature view, we can initialize a group of naive Bayes models for each action type, and the training sequences are successively passed through all the models. The Gaussian parameters in INBF can then be incrementally updated as follows:

$$\mu_m^{y=1} \leftarrow \lambda \mu_m^{y=1} + (1 - \lambda)\mu^{y=1},$$

$$\sigma_m^{y=1} \leftarrow \sqrt{\lambda (\sigma_m^{y=1})^2 + (1 - \lambda)(\sigma^{y=1})^2 + \lambda(1 - \lambda)(\mu_m^{y=1} - \mu^{y=1})^2}, \tag{5.26}$$

where $\mu^{y=1} = \frac{1}{S}\sum_{s \mid y(s)=1} x^i_m(s)$, $\sigma^{y=1} = \sqrt{\frac{1}{S}\sum_{s \mid y(s)=1} (x^i_m(s) - \mu^{y=1})^2}$, $\lambda > 0$ denotes the learning rate of INBF, and $S = |\{s \mid y(s) = 1\}|$. The parameters $\mu_m^{y=0}$ and $\sigma_m^{y=0}$ have similar update rules. These updates are easily obtained by maximum likelihood estimation. In this way, we can use INBF to keep the representative frames for the later learning phase and discard irrelevant frames to reduce the influence of noise. The process of INBF is summarized in Algorithm 7.
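The update rule in (5.26) can be sketched for a single feature dimension as below. This is an illustrative sketch rather than the original implementation: the function names are hypothetical, the learning rate is assumed to satisfy $0 < \lambda < 1$ so that the update is a convex combination, and the sample values are invented.

```python
import math

def batch_stats(values):
    """Mean and (population) standard deviation of the new positive samples."""
    count = len(values)
    mu = sum(values) / count
    sigma = math.sqrt(sum((v - mu) ** 2 for v in values) / count)
    return mu, sigma

def inbf_update(mu_m, sigma_m, values, lam):
    """Blend the stored Gaussian (mu_m, sigma_m) with the statistics of new samples."""
    mu_new, sigma_new = batch_stats(values)
    mu_upd = lam * mu_m + (1.0 - lam) * mu_new
    var_upd = (lam * sigma_m ** 2
               + (1.0 - lam) * sigma_new ** 2
               + lam * (1.0 - lam) * (mu_m - mu_new) ** 2)
    return mu_upd, math.sqrt(var_upd)

# Stored model N(0, 1); new positive samples with mean 3 and standard deviation 1.
mu, sigma = inbf_update(0.0, 1.0, [2.0, 4.0], lam=0.5)
```

The third term of the variance update accounts for the shift between the stored mean and the mean of the new samples, so the variance grows when the two disagree.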

Algorithm 7 Incremental Naive Bayes Keyframe Selection

Input: 10 randomly selected action sequences from each category; the total number of actions in each category $N_c$.

Output: The selected keyframes for action sequences.

1: Manually select 5 representative frames from each sequence of the target category as the positive samples and label them as y = 1; label the remaining frames as y = 0;

2: for $m = 1, \cdots, N_c$ do

3:     Calculate $\mu_m^{y=1}$, $\sigma_m^{y=1}$, $\mu_m^{y=0}$ and $\sigma_m^{y=0}$;

4:     Update $\mu_{m+1}^{y=1} = \lambda \mu_m^{y=1} + (1 - \lambda)\mu^{y=1}$;

5:     Update $\sigma_{m+1}^{y=1} = \sqrt{\lambda (\sigma_m^{y=1})^2 + (1 - \lambda)(\sigma^{y=1})^2 + \lambda(1 - \lambda)(\mu_m^{y=1} - \mu^{y=1})^2}$;

6:     Update $\mu_m^{y=0}$ and $\sigma_m^{y=0}$ by using similar rules;

7: end for

[Figure 5.2 panels: "Similarity matrix", "Gaussian kernel", "The procedure of DTW"; axes: B frames of video p, A frames of video q.]

Fig. 5.2 Illustration of the similarity matrix construction.

RBF Sequential Kernel Construction

For the i-th view, since we extract features from the frames of video sequences, each video sequence can be described by a set of features with a sequential order (along the temporal axis). The similarity $k_i(v_p, v_q)$ between video $v_p$ and video $v_q$ under view $i$ can be measured via Dynamic Time Warping (DTW) [9]. Therefore, the kernel function can be defined as

$$k_i(v_p, v_q) = \exp\left(-\frac{\mathrm{DTW}(X_i^p, X_i^q)^2}{2\sigma^2}\right),$$

where $\mathrm{DTW}(X_i^p, X_i^q)$ is the sequential distance computed via DTW and $\sigma$ is the standard deviation (bandwidth) of the RBF kernel. In this way, we can obtain the kernel matrices for the different views using the above equation.
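A minimal sketch of this kernel is given below, using the textbook dynamic-programming form of DTW. Several details are assumptions made for the example: the local cost between two frames is taken to be the Euclidean distance, no warping-window constraint is applied, and the toy sequences use one-dimensional frame features.

```python
import math

def dtw_distance(seq_a, seq_b):
    """Accumulated DTW cost between two sequences of frame feature vectors."""
    len_a, len_b = len(seq_a), len(seq_b)
    inf = float("inf")
    cost = [[inf] * (len_b + 1) for _ in range(len_a + 1)]
    cost[0][0] = 0.0
    for a in range(1, len_a + 1):
        for b in range(1, len_b + 1):
            local = math.dist(seq_a[a - 1], seq_b[b - 1])   # Euclidean frame distance
            # Extend the cheapest of the three admissible predecessor alignments.
            cost[a][b] = local + min(cost[a - 1][b], cost[a][b - 1], cost[a - 1][b - 1])
    return cost[len_a][len_b]

def rbf_dtw_kernel(seq_a, seq_b, sigma):
    """RBF kernel value exp(-DTW^2 / (2 sigma^2)) between two sequences."""
    return math.exp(-dtw_distance(seq_a, seq_b) ** 2 / (2.0 * sigma ** 2))

# Toy videos: video_q repeats one frame of video_p, video_r is shifted by one unit.
video_p = [(0.0,), (1.0,), (2.0,)]
video_q = [(0.0,), (1.0,), (1.0,), (2.0,)]
video_r = [(1.0,), (2.0,), (3.0,)]

k_pq = rbf_dtw_kernel(video_p, video_q, sigma=1.0)
k_pr = rbf_dtw_kernel(video_p, video_r, sigma=1.0)
```

Because DTW absorbs the repeated frame, `video_p` and `video_q` get the maximal kernel value even though their lengths differ, which is the reason for using a sequential distance here.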

Similarity Calculation

Based on the above kernel construction, we can obtain kernel matrices $K_1, \cdots, K_M \in \mathbb{R}^{N \times N}$ of the same size for the M views, even though the views have different dimensions. Furthermore, we use the labels of the training video sequences to supervise the calculation of the similarity matrix $W_i$ for the i-th view. Each component of $W_i$ is then computed as follows:

$$(W_i)_{pq} = \begin{cases} \exp\left(-\dfrac{\mathrm{DTW}(X_i^p, X_i^q)^2}{2\sigma^2}\right), & C(p) = C(q) \\ 0, & \text{otherwise} \end{cases} \tag{5.27}$$

where $C(p)$ is the label function that gives the label of video $v_p$, and $p, q = 1, \cdots, N$.

[...] matrix $K_i$ as illustrated in Fig. 5.2. Then we have the diagonal matrix $D_i$ in which $(D_i)_{pp} = \sum_q (W_i)_{pq}$ and the Laplacian matrix $L_i = D_i - W_i$ for each view $i$.
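The label-supervised similarity in (5.27) amounts to keeping the kernel entries of same-class pairs and zeroing the rest. The sketch below shows this masking step on a precomputed kernel matrix; the kernel values and labels are made up for illustration, and the helper name is hypothetical.

```python
import numpy as np

def supervised_similarity(K, labels):
    """Keep K[p, q] when videos p and q share a label, and set it to 0 otherwise."""
    labels = np.asarray(labels)
    same_class = labels[:, None] == labels[None, :]
    return np.where(same_class, K, 0.0)

# Hypothetical kernel matrix for N = 3 videos and their class labels C(p).
K = np.array([[1.0, 0.8, 0.1],
              [0.8, 1.0, 0.2],
              [0.1, 0.2, 1.0]])
labels = [0, 0, 1]

W = supervised_similarity(K, labels)
D = np.diag(W.sum(axis=1))     # (D)_pp = sum_q (W)_pq
L = D - W                      # graph Laplacian of the view
```

Note that, unlike the image case, the diagonal of `W` is not zeroed here, since (5.27) gives $(W_i)_{pp} = \exp(0) = 1$.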
