2. CASE STUDY
2.4. Report
An indexer (algorithm 6) has been developed to convert the raw source image data into a structured data set containing rows of behavioural descriptions and ground-truth comprehension labels. Each row represents a measurement of behavioural channels over a given period of time. The period of time over which behaviours are aggregated is an experimental variable discussed in Chapter 10. The data processing pipeline used to convert images into behavioural descriptors is based on that described in section 6.4 of the literature review.
The behavioural model extracted from images is detailed in section 9.2.1 of this chapter. This section discusses the indexing algorithm (algorithm 6). Tutorial conversation logs, answer scores and image data were gathered during the pilot study of Hendrix 1.0 (Chapter 8). As shown in algorithm 6, the indexer first selects all tutorial users for whom data is held. For each user, the indexer loads the tutorial log file and extracts all answer entries. Each answer entry contains a unique identifier and a score for the answer given.
Using the collection of answer unique identifiers, the indexer loads each question’s image set in turn. The image set for each answer contains a chronologically ordered set of 360 by 240 pixel images recorded from a front-facing web camera attached to the PC terminal on which Hendrix 1.0 was used during the pilot study (Chapter 8).
For each image set the indexer groups the images into batches, each representing a given time period. The number of images in each batch, or time window, corresponds to a specified time over which behavioural observations will be aggregated. Buckingham et al. (2014) and Rothwell et al. (2007) showed positive results when aggregating behaviours over a one-second time window, equivalent to 15 sequential images. The optimal time window to use is a parameter for experimentation which is discussed in Chapter 10.

Algorithm 6: Pseudo-code algorithm for indexing non-verbal behaviour

Create a matrix to hold data as results;
Get all users as users;
for each user in users do
    Extract (gender, ethnicity) from tutorial log file as demographics;
    Extract all (answer_id, answer_score) pairs from tutorial log file as answers;
    for each answer in answers do
        Read images from directory ’image_data/answer_id’ in chronological order as images;
        Group images by selecting n images and skipping i images to create batches of images for time windows of observation as windows;
        for each window in windows do
            Create new matrix to hold non-verbal behaviour;
            for each image in window do
                Search image for a face using an artificial neural network;
                if face is found then
                    Search image for eyes using an artificial neural network;
                    if eyes is found then
                        Make measurements of behaviours in face and features as nvb;
                        Add nvb to matrix;
                    end
                end
            end
            Average matrix columns to produce summary;
            Add demographics + summary + answer_score to results;
        end
    end
end
Randomise row order of results;
Write each vector and label in results to CSV file;
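The grouping step (selecting n images and then skipping i images) can be illustrated with a minimal sketch. The function name, the `skip` parameter name and the decision to discard a trailing partial window are assumptions made for illustration; they are not details of the indexer itself.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def group_into_windows(images: Sequence[T], n: int, skip: int = 0) -> List[List[T]]:
    """Split a chronologically ordered sequence into windows of n images.

    After each window, `skip` images are passed over before the next
    window starts. A trailing partial window is discarded (an assumption).
    """
    if n <= 0 or skip < 0:
        raise ValueError("n must be positive and skip must be non-negative")
    windows: List[List[T]] = []
    start = 0
    while start + n <= len(images):
        windows.append(list(images[start:start + n]))
        start += n + skip
    return windows


# Hypothetical file names: 45 frames grouped into 15-image windows, no skipping.
frames = [f"frame_{k:03d}.png" for k in range(45)]
windows = group_into_windows(frames, n=15, skip=0)
```

With 15 images per window and no skipping, each window corresponds to the one-second aggregation period described above.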
Once images are grouped into time windows, the indexer uses an artificial neural network to search the image for faces fitting within a 50 by 50 pixel square. If a face is found, the face region is scanned using a second artificial neural network for 15 by 15 pixel regions containing eyes. While literature discussed in section 6.3 has highlighted a potential weakness of artificial neural networks with scale variance, the approach was selected for initial trial due to reported success in similar applications (section 6.3.1).
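The two-stage search can be sketched as a sliding-window scan, first over the whole image and then over the face region. This is only an illustration of the control flow. The scorer callables stand in for the trained artificial neural networks, and the stride, threshold and single-best-region simplification are assumptions rather than details of the indexer.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Image = Sequence[Sequence[float]]   # rows of grey-level pixel values
Box = Tuple[int, int, int]          # (left, top, size) of a square region
Scorer = Callable[[List[List[float]]], float]


def crop(image: Image, left: int, top: int, size: int) -> List[List[float]]:
    """Copy a size-by-size square out of the image."""
    return [list(row[left:left + size]) for row in image[top:top + size]]


def search(image: Image, size: int, scorer: Scorer,
           stride: int = 5, threshold: float = 0.0) -> Optional[Box]:
    """Return the best-scoring square above the threshold, or None."""
    height, width = len(image), len(image[0])
    best, best_score = None, threshold
    for top in range(0, height - size + 1, stride):
        for left in range(0, width - size + 1, stride):
            score = scorer(crop(image, left, top, size))
            if score > best_score:
                best, best_score = (left, top, size), score
    return best


def locate_face_and_eyes(image: Image, face_scorer: Scorer,
                         eye_scorer: Scorer) -> Optional[Tuple[Box, Box]]:
    """Find a 50x50 face, then a 15x15 eye region inside that face."""
    face = search(image, 50, face_scorer)
    if face is None:
        return None
    left, top, size = face
    eyes = search(crop(image, left, top, size), 15, eye_scorer)
    if eyes is None:
        return None
    return face, eyes   # eye coordinates are relative to the face region
```

A fixed window size is what makes such a search sensitive to scale: a face much larger or smaller than 50 by 50 pixels would not fit any candidate square.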
Should face and features be located within an image, a bank of pre-trained classifiers is used to make measurements of non-verbal behaviours. The non-verbal behaviours measured are detailed in section 9.2.1. Measurements are made by first reducing the dimensionality of the pixel data for a feature using principal component analysis (PCA), as discussed in section 6.3.1. The reduced data are then passed into a specifically trained artificial neural network to produce a binary true or false classification, a +1.0 or -1.0 output, indicating the truth of a given behaviour. For example, a measurement will be made by selecting the left eye pixel data, reducing it using PCA and passing it to a classifier which will give a true or false value indicating whether the eye gaze is directed forward.
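A single measurement of this kind can be sketched as a projection followed by a thresholded classifier. The sketch assumes the PCA mean and components have already been learned, and it substitutes a hypothetical single-neuron classifier for the trained artificial neural network; all names and values are illustrative.

```python
import math
from typing import List, Sequence


def pca_project(pixels: Sequence[float], mean: Sequence[float],
                components: Sequence[Sequence[float]]) -> List[float]:
    """Centre the pixel vector and project it onto stored principal components."""
    centred = [p - m for p, m in zip(pixels, mean)]
    return [sum(c * x for c, x in zip(component, centred))
            for component in components]


def classify_behaviour(features: Sequence[float], weights: Sequence[float],
                       bias: float) -> float:
    """Stand-in classifier: tanh activation thresholded to +1.0 or -1.0."""
    activation = math.tanh(sum(w * f for w, f in zip(weights, features)) + bias)
    return 1.0 if activation >= 0.0 else -1.0


# Toy four-pixel "eye" region reduced to two components (illustrative values).
mean = [0.5, 0.5, 0.5, 0.5]
components = [[0.5, 0.5, -0.5, -0.5], [0.5, -0.5, 0.5, -0.5]]
features = pca_project([0.9, 0.8, 0.1, 0.2], mean, components)
gaze_forward = classify_behaviour(features, weights=[1.0, 0.0], bias=-0.2)
```

Here `gaze_forward` plays the role of one entry in the observation vector; one such classifier would be needed per feature and behaviour.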
Geometries, movement and skin tone changes are measured by calculating the change in the location of features and by sampling the colour values of pixel data contained within facial features.
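As a rough illustration, movement can be taken as the displacement of a feature's centre between consecutive images, and skin tone as the mean colour inside a feature's bounding box. The box layout `(left, top, width, height)`, the RGB tuple representation and the function names are assumptions; the specific measures used are those listed in section 9.2.1.

```python
import math
from typing import Sequence, Tuple

Box = Tuple[int, int, int, int]      # (left, top, width, height)
Pixel = Tuple[float, float, float]   # (red, green, blue)


def centre(box: Box) -> Tuple[float, float]:
    """Centre point of a bounding box."""
    left, top, width, height = box
    return left + width / 2.0, top + height / 2.0


def displacement(previous: Box, current: Box) -> float:
    """Euclidean distance moved by a feature between two images."""
    (x0, y0), (x1, y1) = centre(previous), centre(current)
    return math.hypot(x1 - x0, y1 - y0)


def mean_colour(image: Sequence[Sequence[Pixel]], box: Box) -> Pixel:
    """Average colour of the pixels inside a bounding box."""
    left, top, width, height = box
    pixels = [px for row in image[top:top + height]
              for px in row[left:left + width]]
    if not pixels:
        raise ValueError("bounding box contains no pixels")
    count = len(pixels)
    return tuple(sum(px[c] for px in pixels) / count for c in range(3))


def colour_change(before: Pixel, after: Pixel) -> Pixel:
    """Per-channel change in mean colour between two images."""
    return tuple(a - b for a, b in zip(after, before))
```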
True or false questions are asked in sequence for each feature and each relevant behaviour, resulting in a vector of +1.0 or -1.0 observations.
The vector of observed behaviour for each image is added to a matrix for the time window. The matrix is then summarised using cumulative and average statistics to produce a single 39-variable descriptive vector, as discussed in section 6.4 of the literature review.
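The summarisation step can be sketched as column-wise sums and means over the window matrix. How the cumulative and average statistics are combined into the 39 variables is not specified in this section, so the concatenation below is an assumption made purely for illustration.

```python
from typing import List, Sequence


def summarise_window(matrix: Sequence[Sequence[float]]) -> List[float]:
    """Reduce a window matrix (one row per image) to a single vector.

    Returns the column totals followed by the column means. An empty
    matrix (no face found in any image) yields an empty vector.
    """
    if not matrix:
        return []
    columns = list(zip(*matrix))
    totals = [float(sum(column)) for column in columns]
    means = [total / len(matrix) for total in totals]
    return totals + means


# Four images, two behavioural channels of +1.0 / -1.0 observations.
window_matrix = [[1.0, -1.0], [1.0, 1.0], [-1.0, 1.0], [1.0, 1.0]]
summary = summarise_window(window_matrix)
```

Averaging the +1.0 and -1.0 observations gives a value between -1.0 and +1.0 that reflects how consistently a behaviour was observed across the window.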
Finally, the demographic data for the user, the behavioural vector and the answer score are written as a single row in a comma-separated values (CSV) file. The process is repeated for each answer given by each user.
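The output stage can be sketched with the Python standard library. The row layout (demographics, then behavioural summary, then answer score) follows the description above; the function names, the optional seed and the absence of a header row are assumptions. The shuffle mirrors the row randomisation in algorithm 6.

```python
import csv
import random
from typing import List, Optional, Sequence


def build_row(demographics: Sequence[str], summary: Sequence[float],
              answer_score: float) -> List[object]:
    """Concatenate demographics, behavioural summary and label into one row."""
    return list(demographics) + list(summary) + [answer_score]


def write_results(rows: Sequence[Sequence[object]], path: str,
                  seed: Optional[int] = None) -> List[Sequence[object]]:
    """Shuffle the rows and write them to a CSV file; return the written order."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    with open(path, "w", newline="") as handle:
        csv.writer(handle).writerows(shuffled)
    return shuffled
```

Passing a fixed `seed` makes the row order reproducible between runs, which may be useful when the indexing is repeated with different window settings.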
To facilitate the experimentation discussed in Chapter 10, the indexing process is repeated with a range of time window durations and intervals.