
For the hand-arm dataset, an HMM configuration with five states and four mixtures per state achieves the highest testing ML recognition rate (79% ± 3.3%). The hand-arm dataset contains three movement types, each performed in both the forward and reverse directions, which yields the six motion paths in Table 4.1. Motion paths A, C, and F start at distinct positions and end at the same position (with the right arm extended in front of the body at chest level). Therefore, the starting positions of A, C, and F, together with the shared starting position of B, D, and E (the end position of A, C, and F), form four distinct starting positions for the movements in the dataset; hence, the choice of four mixtures enables capturing the distinct motion paths in the hand-arm dataset.

The trained HMMs are used to classify the training hand-arm movements, and an average ML recognition rate of 100% is achieved. The 100% ML recognition of the training movements confirms that the resulting HMMs encode the training movements accurately. Therefore, a configuration with five states and four mixtures per state is selected as the class-specific HMM configuration for further analysis of the hand-arm movements. Table 4.4 shows the recognition rates achieved at different stages of the hand-arm movement modeling in the proposed approach. The table also includes the F1-measure for the testing movements. For the hand-arm dataset, the 2NN classification in a 6D sPCA-GRBF (σ = 0.5) subspace results in the highest recognition rate (96% ± 1.6%) among sPCA subspaces. Table 4.5 shows the confusion matrix for the 10-fold cross-validated kNN classification of the hand-arm movements in the 6D sPCA subspace.
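The ML classification step can be illustrated with a minimal sketch: each class has its own HMM, and a movement is assigned to the class whose HMM gives it the highest log-likelihood. The sketch below is not the implementation used in this work. It assumes 1-D observations and a single Gaussian per state (instead of the five-state, four-mixture configuration selected above), and all names (`hmm_loglik`, `classify_ml`, `toy_model`) are hypothetical.

```python
import math

NEG_INF = -math.inf


def gaussian_logpdf(x, mean, var):
    """Log density of a 1-D Gaussian with the given mean and variance."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(v - m) for v in values))


def hmm_loglik(obs, model):
    """Forward algorithm in the log domain: returns log P(obs | model)."""
    log_pi, log_A = model["log_pi"], model["log_A"]
    means, variances = model["means"], model["vars"]
    n = len(log_pi)
    alpha = [log_pi[j] + gaussian_logpdf(obs[0], means[j], variances[j])
             for j in range(n)]
    for x in obs[1:]:
        alpha = [
            logsumexp([alpha[i] + log_A[i][j] for i in range(n)])
            + gaussian_logpdf(x, means[j], variances[j])
            for j in range(n)
        ]
    return logsumexp(alpha)


def classify_ml(obs, models):
    """Return the label whose class-specific HMM maximizes the likelihood."""
    return max(models, key=lambda label: hmm_loglik(obs, models[label]))


def toy_model(means, var=0.05):
    """Hypothetical two-state left-to-right HMM that always starts in state 0."""
    return {
        "log_pi": [0.0, NEG_INF],
        "log_A": [[math.log(0.5), math.log(0.5)], [NEG_INF, 0.0]],
        "means": list(means),
        "vars": [var, var],
    }


# Toy class-specific models (illustrative only, not fitted to any dataset).
models = {"rising": toy_model([0.0, 1.0]), "falling": toy_model([1.0, 0.0])}
print(classify_ml([0.1, 0.0, 0.9, 1.1], models))
```

Because the decision uses only the per-class likelihoods, two classes whose HMMs explain a movement almost equally well are easily confused, which is the limitation the Fisher score representation is meant to address.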

Table 4.4: Cross-validated recognition rates (%) and testing F1-measures for the hand-arm dataset

Technique          Train (%)   Test (%)    F1-measure
HMM + ML           100         79 ± 3.3    0.84
FS + SVM           100         97 ± 1.4    0.99
FS + sPCA + 2NN    100         96 ± 1.6    0.98

4.4 Discussion

The SVM recognition rates and F1-measures in the Fisher score spaces exceed the ML recognition rates for both the full-body and the hand-arm datasets (Tables 4.2 and 4.4, respectively). The high SVM recognition rates and F1-measures in the Fisher score spaces illustrate the discriminative quality of the Fisher scores and confirm earlier findings that discriminative analysis based on Fisher scores yields higher recognition rates than the generative models used to derive the Fisher scores [173, 171, 174, 175, 176, 177]. Affective movement recognition based on HMM-derived Fisher scores exploits the differences between the generative processes for different affective movements (modeled in terms of the gradient of the log-likelihood with respect to the parameters of the class-specific HMMs), and results in a higher recognition rate than the ML classification, which makes use of the differences in likelihoods only (posteriors of the movements given the class-specific HMMs).
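The contrast between the two decision rules can be written compactly. The notation below is an assumption introduced for illustration: $X$ is an observed movement, $\lambda_c$ denotes the parameters of the HMM trained for class $c$, and $U_X^{(c)}$ is the Fisher score of $X$ under that model.

```latex
% ML classification: compare scalar likelihoods only
\hat{c} = \arg\max_{c} \, \log P(X \mid \lambda_c)

% Fisher score: gradient of the log-likelihood with respect to the HMM parameters
U_X^{(c)} = \nabla_{\lambda_c} \log P(X \mid \lambda_c)
```

The ML rule reduces each class-specific HMM to a single number per movement, whereas the Fisher score keeps one component per model parameter, so it records how each parameter would have to change to better explain $X$. How the score vectors from the different class-specific HMMs are combined into a single feature vector for the SVM is not restated here.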

The proposed approach is robust to kinematic differences and can handle multiple kinematic expressions within the same affective category via a mixture of Gaussians. In such a model, kinematically dissimilar movements populate distinct Gaussians in the mixture at each hidden state.

Table 4.5: Confusion matrix (%) for the 10-fold cross-validated 2NN classification in the 6D sPCA subspace for the hand-arm dataset

            Sadness  Happiness  Fear  Anger  Surprise  Disgust
Sadness          96          0     0      0         0        4
Happiness         0         96     4      0         0        0
Fear              0          0   100      0         0        0
Anger             0          0     0    100         0        0
Surprise          0          4     0      4        92        0
Disgust           0          0     5      0         0       95
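As a quick check on Table 4.5, the per-class recognition rates can be read off the diagonal of the confusion matrix. The short sketch below assumes that each row corresponds to the intended emotion and each column to the recognized emotion, which is consistent with how the confusions are discussed in the text; the function names are hypothetical.

```python
LABELS = ["Sadness", "Happiness", "Fear", "Anger", "Surprise", "Disgust"]

# Values copied from Table 4.5 (row-wise percentages).
CONFUSION = [
    [96, 0, 0, 0, 0, 4],
    [0, 96, 4, 0, 0, 0],
    [0, 0, 100, 0, 0, 0],
    [0, 0, 0, 100, 0, 0],
    [0, 4, 0, 4, 92, 0],
    [0, 0, 5, 0, 0, 95],
]


def per_class_recall(matrix):
    """Fraction of each row that falls on the diagonal."""
    return [row[i] / sum(row) for i, row in enumerate(matrix)]


def mean_recall(matrix):
    """Unweighted average of the per-class recalls."""
    recalls = per_class_recall(matrix)
    return sum(recalls) / len(recalls)


recalls = per_class_recall(CONFUSION)
weakest = LABELS[recalls.index(min(recalls))]
print(f"mean recall: {100 * mean_recall(CONFUSION):.1f}%")
print(f"least recognized class: {weakest}")
```

The unweighted mean of the diagonal is 96.5%, in line with the 96% ± 1.6% reported for the 2NN classification in Table 4.4, and the least recognized class is surprise.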

For both datasets used in this study, the recognition rates and F1-measures in the sPCA subspaces are comparable to those of the SVM in the Fisher score spaces, which demonstrates that the resulting low-dimensional sPCA subspaces are both discriminative and generalizable (i.e., able to discriminate between unseen test movements). The HMM-derived Fisher scores highlight the movements’ important kinematic and dynamic features, and sPCA identifies the most discriminative features in the high-dimensional Fisher score space. Therefore, recognition rates comparable to the SVM recognition rates in the Fisher score space are achieved in an sPCA subspace spanned by only a few dimensions. Another appealing property of the resulting sPCA embeddings is that the subspace dimensions form a minimal set of features salient for affective movement recognition, which can be used to explore correspondences between low-level and high-level movement features.
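Once the movements are embedded in a low-dimensional subspace, the final classification step is a plain nearest-neighbour vote. The following is a minimal sketch of kNN classification with k = 2 on already-embedded points. The tie-breaking rule (when the two neighbours disagree, the closer one wins) is an assumption made for this sketch, since the rule used in this work is not stated here; `knn_predict` and the toy 2-D points are hypothetical.

```python
import math
from collections import Counter


def knn_predict(query, points, labels, k=2):
    """Classify `query` by majority vote among its k nearest points.

    Ties are resolved in favour of the label of the closest neighbour
    (an assumption of this sketch).
    """
    order = sorted(range(len(points)),
                   key=lambda i: math.dist(query, points[i]))
    nearest = order[:k]
    votes = Counter(labels[i] for i in nearest)
    top = max(votes.values())
    for i in nearest:  # iterate nearest-first so the closest label wins ties
        if votes[labels[i]] == top:
            return labels[i]


# Toy embedded training points (illustrative values, not from the datasets).
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
labels = ["sadness", "sadness", "happiness", "happiness"]

print(knn_predict((0.2, 0.3), points, labels))
print(knn_predict((4.8, 5.2), points, labels))
```

With an even k, ties are possible whenever the two neighbours belong to different classes, so the choice of tie-breaking rule affects the results near class boundaries.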

The 3D and 6D sPCA subspaces resulted in the highest testing recognition rates for the full-body and hand-arm datasets, respectively. Provided that the affective expressions are perceivable as intended, the resulting 3D sPCA subspace for the full-body movements might correspond to the pleasure-arousal-dominance (PAD) model, in which the four intended emotions (anger, sadness, happiness, and fear) in the full-body dataset are characterized by distinct levels of pleasure, dominance, and arousal [43]. The hand-arm dataset, on the other hand, has exemplars for all six basic Ekman emotions, which is probably why a higher-dimensional subspace (6D) is needed to discriminate between the hand-arm movements than between the full-body movements. For instance, surprise is shown to be distinguishable from other basic emotions along the unpredictability dimension [184], and to distinguish disgust, the avoidance dimension might be needed [185].

The intended affective expressions are recognized above chance level in both datasets (Tables 4.3 and 4.5). In the full-body dataset, mainly the angry and happy movements are confused with the other movements. Angry movements are most often confused with fearful (17%) and happy (13%) movements. Similarly, happy movements are most often confused with fearful (12%) and angry (12%) movements. Considering the circumplex model of emotion [165], the observed confusions seem to be related to the similarities between the affective expressions along the arousal and valence dimensions. For instance, anger and fear are both high-arousal, negative-valence expressions.

For the hand-arm dataset, most of the confusion occurs in the recognition of surprised movements (Table 4.5), as they are confused with happy (4%) and angry movements (4%). Happiness, anger, and surprise expressions are similar in the arousal dimension of the circumplex model of emotion (high-arousal). The observed confusions in the affective movement recognition and the possible link to the dimensional models of emotion merit further investigation to identify whether distinct basic emotions can be correctly recognized from movements or if only varying levels of affective dimensions (e.g., arousal and valence) are accurately recognizable from movement.

We also compared the recognition performance in our study to that achievable by human observers for the full-body dataset. Note that the full-body movement labels are the actor labels. Kleinsmith et al. [115] tested human perception of the intended emotions from the most expressive postures of 108 of the movements in the full-body dataset (referred to as apex postures in [186]). The overall recognition rate was 54.7%, with the least recognized postures being the fearful ones (49.4% recognition rate) and the most recognized being the sad postures (63.4% recognition rate). By this measure, the proposed recognition approach outperforms human observers in decoding affective expressions. It should be emphasized that this comparison is made to illustrate the discriminative quality of the proposed recognition approach, and that the perceptual study used apex frames from the movements rather than the whole movements. Nonetheless, the recognition rates achieved with the proposed approach are promising.

Similar to other data-driven approaches, the success of the proposed approach relies heavily on the training data. For accurate recognition, the training data should provide exemplars covering a wide range of movements and emotions with kinematic, stochastic, and interpersonal variabilities. Furthermore, the proposed approach is a supervised technique that requires movement labels as an input.