II. THEORETICAL FRAMEWORK
2.1 Conceptual framework
2.1.4 Expert Yellow Pages
As described above, the RSVP technique consists of streams of stimuli (distractors) that appear sequentially at the same spatial location and, within each stream, participants are told to find a particular stimulus of interest (target). With this description in mind, it is not difficult to see similarities with the oddball paradigm.
By inserting a few target images among a large number of distractors, or non-targets, BCI systems can detect a P300 component when an image that contains a target is presented within a stream of pictures [Gerson et al., 2006; Healy et al., 2010; Kruse & Makeig, 2007].
Moreover, since P300s are time-locked to the onset of the image that contained the target, the estimate of the temporal location of that image within the stream obtained through the BCI has a lower variance than the one derived from the user's key presses [Huang et al., 2011; Luck, 2014], and it can provide a continuous measure of confidence [Huang et al., 2011].
Big organisations such as NASA and ESA have expressed their interest in using BCIs for the automatic classification of images by intelligence analysts [Healy et al., 2010; Kruse & Makeig, 2007]. If successful, this could speed up the process of reviewing the large numbers of pictures that are acquired by means of satellites and other sensors throughout the Earth.
Mathan et al. [2006] showed grayscale satellite images of a port in an RSVP task while recording EEG from volunteers in two experiments [Mathan et al., 2006, 2008], at presentation rates of 10 and 20 pictures/second. Mathan et al. [2006] used a classifier trained with data from one participant to classify EEG epochs from the other, achieving high AUC values (0.84–0.85). In a second study, Mathan et al. [2008] presented images at a rate of 10 Hz to professional intelligence analysts. They showed that the BCI was able to speed up the target detection task without compromising detection accuracy (with respect to manual detection). This result was confirmed in their latest experiment [Huang et al., 2011], in which they reported on an RSVP–BCI system (without any manual input) that was able to speed up the traditional broad area search paradigm by a factor of 5 (measured in seconds/km²), a claim that has since been made repeatedly [Birisan & Beling, 2014; Marathe et al., 2016; Touryan et al., 2013]. Moreover, this work implemented an incremental learning approach [Poggio & Cauwenberghs, 2001], in which pre-trained classifiers are given additional samples to adapt their parameters rather than starting the training from scratch each time. It also implemented cross-session generalisation, in which classifiers were trained using data from previous days (which was also successfully attempted by Manor & Geva [2015]).
Cecotti et al. have studied different aspects of the RSVP paradigm and the evoked P300 response to targets, such as target probability in the streams of images [Cecotti et al., 2011b], modality (i.e., audio vs visual RSVP, with and without key presses) [Cecotti et al., 2011a], and the impact on attention of adding a second visual task in parallel to the RSVP stream [Cecotti et al., 2012]. For instance, by varying target probability in a faces vs cars task (where participants had to press a key when they detected a face), they found that the behavioural performance and the amplitude and spatial distribution of the evoked potentials were significantly modulated by this parameter. They obtained the best classifier performance for target (face) detection at a target probability of 10% [Cecotti et al., 2011b]. Moreover, they showed that it is possible to use a BCI for detection of targets through RSVP while multitasking by adding a secondary task (e.g., in addition to the RSVP target detection, they asked participants to press a button when they detected a green dot on a map that was shown next to the RSVP stream) [Cecotti et al., 2011a, 2012; Marathe et al., 2016]. Even though performance dropped for the easier task, it did not change significantly in the difficult one (in this case, the RSVP target detection task).
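The incremental learning idea mentioned above in connection with Huang et al. [2011] can be illustrated with a minimal sketch. It is not the classifier used in that work: the logistic regression trained by stochastic gradient descent, the two-feature synthetic "epochs", the session offset and all names (IncrementalLogistic, make_session, partial_fit) are assumptions chosen only to show a pre-trained model being adapted with a few new samples instead of being retrained from scratch.

```python
import math
import random


class IncrementalLogistic:
    """Logistic regression trained by stochastic gradient descent.

    Calling partial_fit() again continues from the current weights,
    so earlier training is adapted rather than discarded.
    """

    def __init__(self, n_features, lr=0.1):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = lr

    def predict_proba(self, x):
        z = self.b + sum(wi * xi for wi, xi in zip(self.w, x))
        z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp()
        return 1.0 / (1.0 + math.exp(-z))

    def partial_fit(self, X, y, epochs=20):
        for _ in range(epochs):
            for x, label in zip(X, y):
                err = self.predict_proba(x) - label  # gradient of the log-loss w.r.t. z
                self.w = [wi - self.lr * err * xi for wi, xi in zip(self.w, x)]
                self.b -= self.lr * err
        return self

    def accuracy(self, X, y):
        hits = sum((self.predict_proba(x) >= 0.5) == bool(label)
                   for x, label in zip(X, y))
        return hits / len(y)


def make_session(rng, n, offset):
    """Hypothetical data: targets (1) sit 2 units above non-targets (0);
    `offset` shifts the whole session to mimic a change between days."""
    X, y = [], []
    for i in range(n):
        label = i % 2
        centre = offset + 2.0 * label
        X.append([rng.gauss(centre, 0.5), rng.gauss(centre, 0.5)])
        y.append(label)
    return X, y


rng = random.Random(0)
day1_X, day1_y = make_session(rng, 200, offset=0.0)
day2_X, day2_y = make_session(rng, 200, offset=1.5)

# Pre-train on the first session, then test on the shifted second session.
clf = IncrementalLogistic(n_features=2).partial_fit(day1_X, day1_y)
acc_before = clf.accuracy(day2_X, day2_y)

# Adapt the same model with a few labelled samples from the new session.
clf.partial_fit(day2_X[:40], day2_y[:40])
acc_after = clf.accuracy(day2_X[40:], day2_y[40:])
print(f"before adaptation: {acc_before:.2f}, after adaptation: {acc_after:.2f}")
```

In this toy setting the second call to partial_fit() starts from the weights learnt on the first session, which is the property that distinguishes incremental learning from retraining.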
In addition to studying the effects of modifying aspects of the RSVP task, they studied ways of improving the performance of BCIs that used this paradigm to detect targets. One way in which they did this was by generating artificial trials, adding a small jitter and Gaussian noise to the stimulus onset reference in the EEG epochs. This serves a three-fold objective: (1) increasing the size of the target class, hence combating the class imbalance, (2) combating long training times, and (3) adding variability to the data, so that the classifier is invariant to small time shifts [Cecotti et al., 2015]. In another experiment, they added a measure of the user's confidence, obtained through the output of the classifier [Marathe et al., 2015].
The effects of non-stationarity of the EEG data were also considered. Cecotti & Ries [2015] showed that the way in which epochs are selected to train a classifier has an impact on its performance, due to factors such as tiredness or habituation effects [Marathe et al., 2016]. Finally, another way to combat the changes in the EEG is to use active learning [Joshi et al., 2012; Tong & Chang, 2001], an iterative semi-supervised technique that has been applied to situations in which data are abundant but obtaining the labels is expensive [Marathe et al., 2016]. In each iteration of active learning, an active learner (i.e., a classifier) selects the data samples that are most informative and queries an oracle or expert for their labels, so that they can be added to the training set of the next iteration. In this way, the labelling effort is greatly reduced [Joshi et al., 2012; Marathe et al., 2016]. According to Marathe et al. [2016], the query to the expert can be done by providing feedback to the user and looking for error-related potentials (an ERP that occurs when the output of the BCI is not the one intended or expected by the user), or through a behavioural response to that feedback. Perhaps another interesting aspect of this work was the fact that, in some blocks, participants were presented with 500 ms long videos instead of the static images that are traditional in image triage. The neural responses elicited by the short movies were more robust than those from the images [Marathe et al., 2016].
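The active learning loop described above can be sketched as follows. This is only an illustration under stated assumptions, not the procedure of Marathe et al. [2016]: the nearest-class-mean classifier, the uncertainty-based query strategy, the one-dimensional synthetic scores and every name (fit_means, margin, active_learning, oracle) are hypothetical, and the oracle is a simple lookup standing in for the expert's response.

```python
import random


def fit_means(labelled):
    """Train a nearest-class-mean classifier on (value, label) pairs."""
    means = {}
    for cls in (0, 1):
        values = [v for v, lab in labelled if lab == cls]
        means[cls] = sum(values) / len(values)
    return means


def margin(means, v):
    """A small margin means the sample lies close to the decision boundary."""
    return abs(abs(v - means[0]) - abs(v - means[1]))


def active_learning(pool, oracle, seed_set, n_queries):
    labelled = list(seed_set)
    unlabelled = list(pool)
    for _ in range(n_queries):
        means = fit_means(labelled)  # retrain on everything labelled so far
        # Assumed query strategy: pick the sample the learner is least sure about.
        query = min(unlabelled, key=lambda v: margin(means, v))
        unlabelled.remove(query)
        labelled.append((query, oracle(query)))  # ask the oracle for its label
    return fit_means(labelled), labelled


rng = random.Random(1)
# Hypothetical 1-D data: non-targets around 0, targets around 3, 10% targets.
truth = {}
for i in range(300):
    label = 1 if i % 10 == 0 else 0
    truth[rng.gauss(3.0 * label, 1.0)] = label

oracle = truth.get  # stands in for the expert (or an ErrP-based query)

# Start from one labelled example per class; the rest form the unlabelled pool.
first_nontarget = next(v for v in truth if truth[v] == 0)
first_target = next(v for v in truth if truth[v] == 1)
seed_set = [(first_nontarget, 0), (first_target, 1)]
pool = [v for v in truth if v not in (first_nontarget, first_target)]

means, labelled = active_learning(pool, oracle, seed_set, n_queries=20)
print(f"labels requested: {len(labelled) - len(seed_set)} of {len(pool)} pooled samples")
```

Only the queried samples are ever labelled, which is where the reduction in labelling effort comes from; other query strategies could be substituted for the margin-based one assumed here.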
Other research groups attempted the classification of images by combining EEG signals and computer vision [Kapoor & Shenoy, 2008; Manor et al., 2016; Pohlmeyer et al., 2010, 2011; Sajda et al., 2010; Ušćumlić et al., 2013]. One of the most important findings from the iterative design developed by Ušćumlić et al. [2013] was the fact that their EEG classifiers were able to detect types of targets that differed between the training and the test sets, even though the behavioural responses showed that some types of targets were more difficult to discriminate than others (a fact that could also be observed from the ERP averages, where some of the targets elicited larger P300 waveforms). In contrast, the approach taken by Manor et al. [2016] attempted joint detection of targets (buildings in satellite images, presented at 5 Hz and 10 Hz) by a multimodal neural network that fused information from the EEG and the image at the feature level. They tried two different approaches: a fully supervised system, in which the type of target is known a priori, and a semi-supervised network, in which the type of target is unknown. In the latter, the computer vision component first trained an autoencoder on known non-target images, and its output on target and non-target images was used, together with the EEG response, as input features to the neural network.
With respect to collaborative BCI approaches in the RSVP task, Yuan et al. [2012] performed a collaborative target detection task with groups of 3 people using visual evoked potentials in offline and online experiments¹. The BCI trained with data from the three participants was able to detect targets faster than the subjects' reaction times. Moreover, they reported an average increase of 11% in the accuracy of the group vs individuals (6% higher than the best individual).
Due to the speed at which the images are shown in RSVP experiments, participants do not usually have time to foveate a target; that is, the target disappears before overt attention (the movement of the eyes in the direction of a salient item) can reach it. The reduction or absence of eye movements reported by Potter & Levy [1969] (at rates greater than 4 pictures/second) and Neider et al. [2013] (for images shown for 150 ms or less) is particularly welcome in BCI systems based on RSVP stimulation, because EEG signals are severely distorted [Luck, 2014] by the artefacts produced by eye movements².
¹ When talking about offline/online BCIs we follow the traditional definition: online BCIs are those in which the brain signals of the participant are used at the time of collection to control the BCI. In contrast, in an offline BCI data are recorded for later analysis, and the state of the interface is either static or manipulated by the experimenters.
For this reason, RSVP-based BCIs pose a very attractive alternative for developing gaze-independent systems that are also suitable for severely locked-in people with no gaze control. Acqualagna et al. [2010] studied different presentation rates (83, 116 and 133 ms per item) in black and white vs colour (3 or 5 different colours) conditions [Acqualagna & Blankertz, 2011, 2013], and compared them to the traditional gaze-dependent speller from Farwell & Donchin [1988]. They found that the RSVP-speller paradigm was a suitable option for people with severely impaired oculomotor control. Their results were replicated by Treder et al. [2011], who performed a variation of the RSVP paradigm at a presentation rate of 5 Hz.
Before concluding this section and moving on to the limitations of the RSVP technique, it is important to note that research using the combined RSVP–BCI paradigm has been done not only on naïve participants, but also on experts in different fields, as shown in Table 2.1.
Moreover, a large body of literature has compared behavioural responses with BCI performance and speed, and demonstrated that the combined RSVP–BCI paradigm is faster than traditional manual search while maintaining the same levels of performance [Bigdely-Shamlo et al., 2008; Birisan & Beling, 2014; Kapoor & Shenoy, 2008; Kapoor et al., 2008; Parra et al., 2008; Pohlmeyer et al., 2011; Poolman et al., 2008; Sajda et al., 2010; Touryan et al., 2013].

Table 2.1: Summary of BCI literature in which experts were used for RSVP tasks.
Profession                        Type of image                      Relevant literature
Intelligence analysts             Satellite imagery                  Birisan & Beling [2014]; Healy et al. [2010]; Huang et al. [2011]; Kruse & Makeig [2007]; Mathan et al. [2007, 2008]
Transportation security officers  X-ray images of luggage            Trumbo et al. [2015]
Military                          Urban landscape patrol simulation  Touryan et al. [2013]
Medical                           Mammograms                         Hope et al. [2013]

² However, while eye movements during a trial produce performance-reducing artefacts in most BCIs, having one's eye gaze fixated on a target as opposed to, for instance, a fixation