Table 6.1 reports detection-rate statistics for micro-movement detection on the in-house dataset using T1 and the proposed method. The results indicate higher performance when using T2, which is calculated from the baseline of a participant’s neutral facial expression.

The proposed method is able to spot a large number of micro-movements when using a higher temporal resolution of 200 fps. Further, it was concluded that the spatial HOG descriptor outperforms LBP. A further enhancement to the ABT could be to employ metaheuristic search algorithms that automatically adjust the threshold setting, guided by a fitness function.

6.5.1 Measures of Performance

For these experiments, the equations for the performance measures are restated here. Precision, a measure of exactness, is the fraction of the returned results that are relevant. It is defined as

$$\text{Precision} = \frac{TP}{TP + FP}. \tag{6.3}$$

Recall is the fraction of the relevant results that are successfully retrieved. It is commonly used with Precision to form an understanding of the relevance of the results returned from experimental classification. It is defined as

$$\text{Recall} = \frac{TP}{TP + FN}. \tag{6.4}$$

Table 6.1: Results using T1 and T2, including Recall, Precision and F-measure, on the SAMM dataset.

| Method | Recall | Precision | F-Measure |
|---|---|---|---|
| LBP - [109] | 0.5171 | 0.6084 | 0.5595 |
| HOG - with T1 | 0.4657 | 0.7181 | 0.5650 |
| LBP - with T2 | 0.7829 | 0.6508 | 0.7108 |
| HOG - Proposed method | 0.8429 | 0.7041 | 0.7672 |

The F-measure is the harmonic mean of Precision and Recall and is commonly used in place of accuracy as it provides a more detailed analysis of the data. Using this measure improves on using accuracy alone, as was done for the results in Chapter 4. The equation can be defined as

$$\text{F-Measure} = \frac{2TP}{2TP + FP + FN}. \tag{6.5}$$
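As a minimal sketch, the three measures in Eqs. 6.3 to 6.5 can be computed directly from the TP, FP and FN counts. The function names and the counts below are illustrative assumptions and are not taken from the experiments.

```python
def precision(tp: int, fp: int) -> float:
    """Eq. 6.3: TP / (TP + FP). Returns 0.0 when nothing was detected."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp: int, fn: int) -> float:
    """Eq. 6.4: TP / (TP + FN). Returns 0.0 when there is nothing to detect."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def f_measure(tp: int, fp: int, fn: int) -> float:
    """Eq. 6.5: 2TP / (2TP + FP + FN), the harmonic mean of Precision and Recall."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical counts, for illustration only (not results from this work).
tp, fp, fn = 80, 20, 40
p = precision(tp, fp)
r = recall(tp, fn)
f = f_measure(tp, fp, fn)
print(f"Precision={p:.4f} Recall={r:.4f} F-Measure={f:.4f}")
```

Computing the F-measure from the counts, as in Eq. 6.5, gives the same value as the harmonic mean 2PR / (P + R) whenever both Precision and Recall are defined.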

6.5.2 Peak Detection

As the peaks formed from the feature difference analysis are quite small, peak detection had to be completed manually. The process involved plotting each movement with its thresholds and cross-checking against the ground truth FACS coding to determine a TP, FP or FN.

6.5.3 Method Comparisons

As the method described in Moilanen et al. [109] is a feature difference method similar to the one proposed, it is used for comparing results. The original method is replicated on the SAMM dataset, and its threshold calculation (T1) is used for the first time with the spatial HOG feature. The baseline threshold (T2) is applied to both LBP and HOG.

The best result was a Recall of 0.8428 using the proposed method and the T2 threshold. This is much higher than the Recall of LBP or HOG with the T1 threshold, at 0.5171 and 0.4657 respectively. As the dataset consists of high-speed videos at a higher resolution than in previous experiments, it is likely that calculating distance will lead to large difference value peaks. T1 is calculated using the mean and maximum values of the feature vector, so it will always detect a peak. This is a disadvantage in a real-world scenario: if there are no movements (a neutral face or no expression), the sequence will still be misclassified as containing a movement.

Figure 6.4: Illustration of a micro-movement sequence where a micro-movement (peak 1) is missed by T1. The green solid line shows T2, which detects all micro-movements.

With T2, the participant’s neutral expression is used as the threshold, so it is not affected by the values of the target sequence. Fig. 6.4 shows how a large peak raises T1, whereas the proposed method detects the large peak as well as the peaks that T1 misses because it uses the target sequence in the threshold calculation.
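The contrast between the two thresholds can be illustrated with a small sketch. The exact formulas are not restated in this section, so both are assumptions here: T1 is taken as the mean plus a fraction p of the gap between the maximum and the mean of the target sequence, and T2 is taken simply as the largest difference value in the participant's neutral sequence. The value of p and all difference values are hypothetical.

```python
def threshold_t1(diffs, p=0.05):
    """Sequence-derived threshold (assumed form): mean + p * (max - mean).

    With 0 <= p < 1 the threshold sits below the maximum whenever the values
    are not all equal, so at least one frame always exceeds it.
    """
    mean = sum(diffs) / len(diffs)
    return mean + p * (max(diffs) - mean)


def threshold_t2(baseline_diffs):
    """Baseline-derived threshold (assumed form): the largest neutral value."""
    return max(baseline_diffs)


def frames_above(diffs, threshold):
    """Indices of frames whose difference value exceeds the threshold."""
    return [i for i, d in enumerate(diffs) if d > threshold]


# Hypothetical feature-difference values, for illustration only.
neutral = [0.10, 0.12, 0.11, 0.13, 0.12]          # participant's neutral sequence
movement = [0.10, 0.12, 0.30, 0.11, 0.90, 0.12]   # small peak (2), large peak (4)

# The large peak raises T1, so the small peak at frame 2 is missed.
print(frames_above(movement, threshold_t1(movement)))   # [4]
# T2 depends only on the neutral sequence and catches both peaks.
print(frames_above(movement, threshold_t2(neutral)))    # [2, 4]
# On a neutral input, T1 still reports frames; T2 reports none.
print(frames_above(neutral, threshold_t1(neutral)))     # [1, 3, 4]
print(frames_above(neutral, threshold_t2(neutral)))     # []
```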

In Fig. 6.5, T1 uses the neutral sequence to calculate a threshold and appears to have detected movements, but these are in fact the baseline neutral expression of that particular participant. In contrast, when a neutral sequence is input, the baseline threshold (T2) stays above the peaks as they are below the participant’s baseline.

Due to the proposed method’s sensitivity, Precision is affected and more FPs are observed. This can be seen in Table 6.1, where the difference in Precision between methods is relatively small, and T1 with HOG has a better Precision of 0.7181 compared with HOG and T2, which achieved 0.7041. Further tuning of the baseline selection would be required to reduce false positives while preserving the sensitivity of micro-movement detection. To provide a fair comparison with the proposed baseline method, only methods that have been used for micro-movement detection have been used, therefore LBP-TOP with GD proposed in Chapter 4

Figure 6.5: The green solid line shows that T2 stays above values that are considered to be a person’s baseline. The dashed red line shows peaks falsely detected by the previous detection method using T1.

Table 6.2: The best results of the previous feature difference method and the proposed ABT method, using the spatial HOG feature. The SAMM and CASME II datasets are compared.

| Method | Dataset | Recall | Precision | F-Measure |
|---|---|---|---|---|
| Moilanen et al. [109] | SAMM | 0.5171 | 0.6084 | 0.5595 |
| Moilanen et al. [109] | CASME II | 0.5219 | 0.2123 | 0.3031 |
| Proposed Method - ABT | SAMM | 0.9125 | 0.7304 | 0.8179 |
| Proposed Method - ABT | CASME II | 0.3951 | 0.2361 | 0.2961 |

6.5.4 ABT Results

The results presented in Table 6.2 show that the method in [109] does not perform as well on the SAMM dataset, mirroring a similar problem exhibited in their own results on the higher resolution CASME-A dataset, where the recall was 0.52. On the CASME II dataset, both methods performed poorly for the same reasons, with the recall in [109] being similar at 0.5219. With fewer frames available (i.e. fewer baseline frames), the N interval value had to be set to 13 for CASME II, and more noisy peaks were detected. As the SAMM dataset has the highest available resolution for micro-movements, this shows that the method in [109] struggles to process such data effectively.

By contrasting and comparing the baseline feature and movement feature using ABT, the proposed method substantially increases the detection rate and produces the best result of 0.9125 and 0.8179 for recall and F-measure, respectively. Precision increased by 3% compared with the proposed method’s results in Table 6.1. The proposed ABT method manages to remove some of the FPs, but they remain a challenge for micro-movement detection.

6.6 Discussion

Results from the original baseline method outperformed the state of the art when compared with Moilanen et al. [109]. However, further work is needed to address its limitations. Firstly, using only spatial (XY) features does not allow temporal differences to be considered, and usually leads to the lowest result overall [9].

Both proposed methods in this Chapter use a block-based approach; in other words, the face is split into a set number of blocks from which features are extracted. These blocks can also be described as small video cubes, referencing the 3D video volume. One of the biggest limitations of this approach is that it can introduce unwanted descriptors from parts of the face that are not useful for micro-movement detection, such as the neck and hair. A way of localising the face parts to remove this information would be very beneficial for local feature analysis.

Results from the proposed ABT further outperformed the state of the art in both machine learning and difference analysis based approaches. Shreve et al. [6] obtained 74% TPs, 26% FNs and 44% FPs. Moilanen et al. [109] achieved a best result of 71% TPR. Finally, Li et al. [9] provided the most comparable results with a highest AUC of 92.98%. It should be noted that many results are reported with different performance metrics, indicating a need to standardise result outputs for a fairer comparison. Further experiments on datasets other than SAMM and CASME II would be advantageous to test the robustness of ABT; however, the lack of baseline sequences within other datasets currently limits the experiments.

All results were calculated after manual peak detection, completed by cross-checking when a movement peak crossed the threshold with the ground truth

a real-world scenario. One way around this problem would be to use automatic peak detection. As all the peaks are shaped similarly to a Gaussian curve [150], results could be generated much faster by finding each zero-crossing and checking that the threshold is crossed.
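One possible reading of this suggestion is sketched below, under the assumption that "zero-crossing" refers to the point where the slope of the difference signal changes from positive to non-positive, which marks the top of a Gaussian-like peak. The function name, the signal values and the threshold are hypothetical.

```python
def detect_peaks(diffs, threshold):
    """Return indices of local maxima that also exceed the threshold.

    A local maximum is found where the first difference changes sign from
    positive to non-positive (a zero-crossing of the slope).
    """
    peaks = []
    for i in range(1, len(diffs) - 1):
        rising = diffs[i] - diffs[i - 1] > 0
        falling = diffs[i + 1] - diffs[i] <= 0
        if rising and falling and diffs[i] > threshold:
            peaks.append(i)
    return peaks


# Hypothetical feature-difference signal, for illustration only.
signal = [0.1, 0.2, 0.5, 0.2, 0.1, 0.15, 0.1, 0.4, 0.9, 0.4]
print(detect_peaks(signal, threshold=0.3))  # [2, 8]
```

The small bump at index 5 is a local maximum but stays below the threshold, so it is not reported. The detected indices would still need to be compared with the ground truth coding to label each one as a TP or FP.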

6.7 Summary

In this Chapter, a way of detecting micro-movements in the SAMM dataset was created by using participant neutral baselines as part of a threshold to determine when a micro-movement had occurred. An Adaptive Baseline Threshold (ABT) was introduced to balance the baseline feature values and adapt to the movement that was currently being processed.

To summarise, areas for improvement are detailed. Each dataset sequence used, whether a micro-movement or a baseline sequence, was split into an even 5×5 grid of blocks (25 blocks in total). Even though good results were obtained, splitting the whole face can include irrelevant information such as hair, or leave muscle regions split across multiple blocks.
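The block splitting can be sketched as follows for a single frame stored as a list of pixel rows. The function name is hypothetical, and the handling of frames whose size is not divisible by 5 (the leftover rows and columns are dropped) is an assumption, since this detail is not stated here.

```python
def split_into_blocks(frame, rows=5, cols=5):
    """Split a 2D frame (list of pixel rows) into rows x cols equal blocks.

    Blocks are returned in row-major order. Pixels left over when the frame
    size is not divisible by the grid size are dropped (an assumption).
    """
    height, width = len(frame), len(frame[0])
    block_h, block_w = height // rows, width // cols
    blocks = []
    for r in range(rows):
        for c in range(cols):
            block = [
                row[c * block_w:(c + 1) * block_w]
                for row in frame[r * block_h:(r + 1) * block_h]
            ]
            blocks.append(block)
    return blocks


# A hypothetical 10x10 frame whose pixel value encodes its position.
frame = [[r * 10 + c for c in range(10)] for r in range(10)]
blocks = split_into_blocks(frame)
print(len(blocks))   # 25
print(blocks[0])     # [[0, 1], [10, 11]]
```

Because the grid is fixed, a block can straddle two muscle regions or contain only hair, which is the limitation described above.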

It would be useful to know where on the face a person exhibits a micro-movement. The advantages of this include AU identification and aiding user understanding. By using a block-based approach and averaging the blocks with the greatest feature difference values, the local information about where on the face the movement occurs is lost.

Local Feature Analysis with FACS-Based Regions

In this Chapter, 26 face regions are defined based on the Facial Action Coding System (FACS). The proposed method becomes fully objective when a focus is placed on the muscle activation rather than emotional interpretation. Using these regions means that irrelevant information, such as hair, is removed during feature vector calculation, leading to better accuracy in determining a correct micro-movement. The method is validated on the two most recent micro-movement datasets: SAMM and CASME II.

7.1 Introduction

In this Chapter, 26 regions based on FACS are proposed to solve the problem of providing useful local information. Further, these regions are specifically created by FACS coders to align the regions to AUs, removing any irrelevant information that would be present in a block-based approach. The regions are fitted to the shape of each participant’s face by using a PWA transform. The main method fits the mask to the face; however, fitting the face to the shape of the mask is also discussed.

3D HOG [5, 98] is used as the feature that should best describe micro-movements, as the spatial only planes (XY) performed the best in the previous Chapter. Two other temporal features that have been used in micro-movement research, LBP-TOP [17, 94] and HOOF [9, 145], will be used to test the robustness of this approach. The proposed SAMM dataset is used alongside the CASME II [17] dataset to validate this approach.

Figure 7.1: The processing pipeline of the proposed method using FACS-based regions for micro-movement detection.

This Chapter will conclude with a novel algorithm that combines the feature differences obtained from the local regions to output a video sequence with the local regions highlighted. Showing where a micro-movement occurred on the face can help the user understand micro-movements further, ready for interpretation, and allows for AUs to be predicted. The overall pipeline for this proposed method can be seen in Fig. 7.1.
