2.2 Theoretical Foundations
2.2.1 School Dropout in the Student
Let us first consider the case where the test video clip elicits only one emotion from its viewers. Let $y \in \mathbb{R}^k$ be the test feature vector, a $k$-dimensional vector that represents the affective content of the test video clip. We assume there are $m$ basic emotional states (emotional categories). Let $\alpha_{j,i} \in \mathbb{R}^k$, for $i = 1, \ldots, n_j$, be the representative feature vectors of the $j$th emotional state, where $n_j$ is the cardinality of this set of representative feature vectors. We collect the representative feature vectors of the $j$th emotional state into a $k \times n_j$ matrix $A_j = [\alpha_{j,1}, \ldots, \alpha_{j,n_j}]$, which is called the “sub-sample matrix”. Accordingly, we form a $k \times n$ matrix $A = [A_1, \ldots, A_m]$, which is called the “sample matrix”, where $n = \sum_{j=1}^{m} n_j$. If we write $y$ as a linear combination of the representative feature vectors, then for each emotional state we have:
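$$ y = \alpha_{j,1}\beta_{j,1} + \alpha_{j,2}\beta_{j,2} + \cdots + \alpha_{j,n_j}\beta_{j,n_j} = A_j\beta_j \tag{3.4} $$
where $\beta_j = [\beta_{j,1}, \beta_{j,2}, \ldots, \beta_{j,n_j}]^T \in \mathbb{R}^{n_j}$, for $j = 1, \ldots, m$, is the linear coefficient vector.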
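The construction of the sample matrix and Eq.(3.4) can be sketched in a few lines of Python. This is only a minimal illustration under assumptions that are not in the source: vectors are plain lists, each sub-sample matrix is stored as a list of its columns, and the names `build_sample_matrix` and `linear_combination`, as well as the toy numbers, are hypothetical.

```python
def build_sample_matrix(sub_matrices):
    """Concatenate sub-sample matrices A_1..A_m (each a list of k-dim columns)
    into the sample matrix A, also stored as a list of columns."""
    columns = [col for A_j in sub_matrices for col in A_j]
    k = len(columns[0])
    if any(len(col) != k for col in columns):
        raise ValueError("all representative feature vectors must have length k")
    return columns


def linear_combination(columns, coeffs):
    """Return sum_i coeffs[i] * columns[i], i.e. the product A_j beta_j in Eq.(3.4)."""
    if len(columns) != len(coeffs):
        raise ValueError("need one coefficient per column")
    k = len(columns[0])
    return [sum(c * col[r] for col, c in zip(columns, coeffs)) for r in range(k)]


# Toy data (hypothetical): k = 2 features, m = 2 emotional states.
A1 = [[1.0, 0.0], [0.0, 1.0]]   # n_1 = 2 representative feature vectors
A2 = [[1.0, 1.0]]               # n_2 = 1 representative feature vector
A = build_sample_matrix([A1, A2])
n = len(A)                      # n = n_1 + n_2

beta_1 = [2.0, 3.0]
y = linear_combination(A1, beta_1)   # y = A_1 beta_1
```

Storing matrices column by column keeps the code close to the notation, since each column is one representative feature vector $\alpha_{j,i}$.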
Furthermore, we assume that any feature vector representing an emotional state can be represented only as a linear combination of the representative feature vectors of that state. Consequently, if $y$ belongs to the $q$th emotional state, where $q \in \{1, \ldots, m\}$, then every linear coefficient vector $\beta_j$, for $j = 1, \ldots, m$, equals the zero vector $\mathbf{0}$ except $\beta_q$, as shown in Eq.(3.5). Based on Eq.(3.5), we can then write $y$ as a linear combination of the representative feature vectors of all the emotional states, as in Eq.(3.6).
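$$ \beta_j \begin{cases} = \mathbf{0}, & j \neq q \\ \neq \mathbf{0}, & j = q \end{cases} \tag{3.5} $$
$$ y = A_1\beta_1 + \cdots + A_q\beta_q + \cdots + A_m\beta_m = Ax \tag{3.6} $$
where $A = [A_1, \ldots, A_m]$ and $x = [\beta_1, \ldots, \beta_{q-1}, \beta_q, \beta_{q+1}, \ldots, \beta_m]^T$.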
Figure 3.1: An example for the “ideal case” of the relationship between the entry values of x and each column of sample matrix A based on the sparse representation: y = Ax.
Here, $A$ is the complete “sample matrix”, and $x \in \mathbb{R}^n$ (with $n = \sum_{j=1}^{m} n_j$), which is called the “sparse solution”, is a sparse vector whose entries should all be zero except for the ones associated with the $q$th emotional state. Specifically, let us treat $x$ as a signal. When there are many representative feature vectors for each emotion ($n_j$ is large for $j = 1, \ldots, m$), the signal $x$ is very long because $n$ is large. However, when Eq.(3.5) is satisfied, $x$ has only $s$ non-zero coefficients, with $s = \|\beta_q\|_0 \le n_q \ll n$. Therefore, $x$ is sparse and compressible because its non-zero coefficients are concentrated on a small set. Referring to the previous discussion of sparse representation, $x$ plays the role of $f$, $A$ the role of the downsampling matrix $\Psi$, and $y$ the role of the partial information signal of $f$. We therefore aim to find $x$ by solving Eq.(3.6). Intuitively, all the non-zero coefficients of the solution of Eq.(3.6) should correspond only to the columns of $A_q$, which happens in the “ideal case”, that is, when Eq.(3.5) is satisfied. Accordingly, the location of the non-zero entries of $x$ predicts the emotional property of the test feature vector $y$: it is the emotional state whose corresponding entries in $x$ are non-zero. Fig. 3.1 shows an example that visually depicts the relation between the entry values of $x$ and each sub-sample matrix in $A$ in the ideal case.
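The ideal case can be illustrated with a short Python sketch. It assumes, hypothetically, that the block sizes $n_1, \ldots, n_m$ are known and that $x$ is stored as a flat list ordered block by block as in Eq.(3.6). The helper names `block_slices` and `emotion_from_support`, the 0-based state indices, and the tolerance `tol` are illustrative choices, not part of the original method.

```python
def block_slices(block_sizes):
    """Return the (start, stop) index range of each block beta_j inside x."""
    slices, start = [], 0
    for n_j in block_sizes:
        slices.append((start, start + n_j))
        start += n_j
    return slices


def emotion_from_support(x, block_sizes, tol=1e-12):
    """Return the 0-based indices of the emotional states whose block in x
    contains at least one entry with magnitude above tol."""
    if len(x) != sum(block_sizes):
        raise ValueError("x must have n = sum(block_sizes) entries")
    active = []
    for j, (start, stop) in enumerate(block_slices(block_sizes)):
        if any(abs(v) > tol for v in x[start:stop]):
            active.append(j)
    return active


# Toy ideal case (hypothetical numbers): m = 3 states, q is the second state.
block_sizes = [2, 2, 2]
x_ideal = [0.0, 0.0, 0.7, 0.3, 0.0, 0.0]
states = emotion_from_support(x_ideal, block_sizes)   # one active block only
sparsity = sum(1 for v in x_ideal if v != 0.0)         # s = ||x||_0
```

In the ideal case the returned list has exactly one element; more than one active block means that Eq.(3.5) is not satisfied.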
However, Eq.(3.6) is under-determined when $k \ll n$, which is the more typical situation in practice; that is, Eq.(3.6) has infinitely many solutions. By solving the $\ell_1$-minimization problem in Eq.(3.7) using CoSaMP [NT09], we obtain the approximation $\tilde{x}$ of $x$.

$$ \ell_1: \quad \tilde{x} = \arg\min_{x} \|x\|_1 \quad \text{subject to } y = Ax \tag{3.7} $$

Figure 3.2: An example for the “practical case” of the relationship between the entry values of $\tilde{x}$ and each column of sample matrix $A$, obtained by solving $y = Ax$ using CoSaMP [NT09].
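For readers who want to hand Eq.(3.7) to a general-purpose solver, a standard reformulation (not described in the source, and added here only as a sketch) splits $x$ into two non-negative parts $u$ and $v$ with $x = u - v$, which turns the problem into a linear program:

```latex
\[
\min_{u,\,v \in \mathbb{R}^n} \ \mathbf{1}^T (u + v)
\quad \text{subject to} \quad A(u - v) = y, \quad u \ge 0, \quad v \ge 0,
\qquad \tilde{x} = u^\star - v^\star .
\]
```

At an optimum, $u_i$ and $v_i$ are never both positive for the same index $i$, because reducing both by the same amount would keep $A(u - v) = y$ and lower the objective. Hence $\mathbf{1}^T(u^\star + v^\star) = \|\tilde{x}\|_1$.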
The approximation $\tilde{x}$ is not guaranteed to have non-zero coefficients only at the columns of $A_q$; that is, the coefficients of $\tilde{x}$ may not satisfy Eq.(3.5). Besides the under-determined nature of the linear equation, two other reasons can explain this phenomenon. First, noise, such as environmental noise and camera motion, may affect the values of the extracted features. Second, the clip may be an instance of the “practical case”: the test video clip represented by $y$ simultaneously contains several emotions, such as happiness mixed with sadness in the case of nostalgia. The non-zero entries of $x$ are then spread across several emotional categories. Fig. 3.2 shows a visualization of the solution $\tilde{x}$ in the practical case.
We now discuss how to determine the emotional property of $y$, and the intensity of each emotion within $y$, when the coefficients of the corresponding $\tilde{x}$ do not satisfy Eq.(3.5). We focus on the dominant emotion. We conjecture that the columns from the “dominant” (or “main” or “correct”) emotional category in $A$ contribute the most to $\tilde{x}$; therefore, we not only capture how well the coefficients of $\tilde{x}$ are associated with each emotional category, but also measure the intensity of each emotion within $y$.
For the first point, we introduce the function $\Phi_j(x) \in \mathbb{R}^{n_j}$, which returns a new vector composed of all the coefficients of $x$ corresponding to $A_j$. Then, let $\tilde{y}_j = A_j\Phi_j(\tilde{x})$, for $j = 1, \ldots, m$, denote the approximation of $y$ for the $j$th emotional state in terms of $\tilde{x}$. The difference $\varphi_j$ between $y$ and its approximation $\tilde{y}_j$ is computed by Eq.(3.8).

$$ \varphi_j = \|y - \tilde{y}_j\|_2 = \|y - A_j\Phi_j(\tilde{x})\|_2 \tag{3.8} $$
The value $\varphi_j$ can be interpreted as how well the coefficients within $\tilde{x}$ are associated with the sub-sample matrix $A_j$, for $j = 1, \ldots, m$. Namely, the smaller $\varphi_j$ is, the closer $y$ and $\tilde{y}_j$ are, which means that the representative feature vectors of the $j$th emotional state match $y$ better. We therefore determine that the “correct” emotional property of $y$ is the $q$th emotional state given by Eq.(3.9).
$$ q = \arg\min_{j} \varphi_j = \arg\min_{j} \|y - A_j\Phi_j(\tilde{x})\|_2 \tag{3.9} $$
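A minimal Python sketch of Eqs.(3.8) and (3.9) follows. It assumes that the solution $\tilde{x}$ has already been computed by some solver, stores each sub-sample matrix as a list of its columns, and uses 0-based state indices. The function names `phi_select`, `residuals` and `classify`, and the toy numbers, are hypothetical.

```python
import math


def phi_select(x, block_sizes, j):
    """Phi_j(x): the coefficients of x that correspond to sub-sample matrix A_j."""
    start = sum(block_sizes[:j])
    return x[start:start + block_sizes[j]]


def residuals(y, sub_matrices, x_tilde):
    """Return [phi_1, ..., phi_m], phi_j = ||y - A_j Phi_j(x_tilde)||_2 (Eq. 3.8)."""
    block_sizes = [len(A_j) for A_j in sub_matrices]
    out = []
    for j, A_j in enumerate(sub_matrices):
        coeffs = phi_select(x_tilde, block_sizes, j)
        # y_tilde_j = A_j Phi_j(x_tilde), built row by row from the columns of A_j
        y_j = [sum(c * col[r] for col, c in zip(A_j, coeffs)) for r in range(len(y))]
        out.append(math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_j))))
    return out


def classify(phi):
    """Eq. 3.9: 0-based index of the smallest residual."""
    return min(range(len(phi)), key=phi.__getitem__)


# Toy practical case (hypothetical): k = 2, two states, x_tilde leaks into state 2.
A1 = [[1.0, 0.0], [0.0, 1.0]]
A2 = [[1.0, 1.0]]
y = [3.0, 4.0]
x_tilde = [2.0, 3.0, 1.0]       # A1 [2, 3]^T + A2 [1] = [3, 4]^T = y
phi = residuals(y, [A1, A2], x_tilde)
q = classify(phi)               # the first state has the smaller residual
```

Although $\tilde{x}$ has non-zero coefficients in both blocks, the first state reconstructs $y$ more closely on its own, so it is selected as the dominant emotion in this toy example.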
This classification rule is corroborated by the experimental results in Section 3.4.

From another angle, the difference $\varphi_j$ represents the degree of match between $y$ and the columns of $A_j$. In other words, the smaller the difference $\varphi_j$ is, the more significant is the contribution of the $j$th emotional state (its representative feature vectors) to $y$. We take the intensity of an emotional state to grow with the importance of this emotional state within $y$, so the intensity of the $j$th emotional state decreases as the difference $\varphi_j$ increases. To describe the intensity of the $j$th emotional state computationally, let $\varphi = [\varphi_1, \ldots, \varphi_m]^T \in \mathbb{R}^m$ be the residual vector of $y$. Then, we can compute the intensity $\Upsilon_j$ of the $j$th emotional state within $y$ using Eq.(3.10).
$$ \Upsilon_j = 1 - \frac{\varphi_j}{|\varphi_{\max}|} \tag{3.10} $$
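Eq.(3.10) can be sketched in Python as follows. This passage does not define $\varphi_{\max}$; the sketch assumes it is the largest entry of the residual vector $\varphi$. The function name `intensities` and the handling of an all-zero residual vector are illustrative choices, not part of the original method.

```python
def intensities(phi):
    """Return [Upsilon_1, ..., Upsilon_m] with Upsilon_j = 1 - phi_j / |phi_max|.

    Assumption: phi_max is the largest entry of the residual vector phi.
    """
    if not phi:
        raise ValueError("phi must contain at least one residual")
    phi_max = abs(max(phi))
    if phi_max == 0.0:
        # Every state reproduces y exactly, so Eq.(3.10) is undefined here;
        # this sketch (by choice) assigns full intensity to all states.
        return [1.0 for _ in phi]
    return [1.0 - p / phi_max for p in phi]


# Hypothetical residuals for m = 3 emotional states.
phi = [1.0, 4.0, 2.0]
upsilon = intensities(phi)   # the state with the largest residual gets intensity 0
```

Under this assumption, and because the residuals are non-negative norms, every intensity lies in $[0, 1]$ and the state with the smallest residual receives the highest intensity.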