


While the modes of interaction discussed so far focus primarily on deictic gestures performed either directly by hand or with a pointing device, other research approaches use eye-tracking to resolve deictic references. The eyes can be regarded as a natural pointer, since a person’s eye movements and fixations correlate strongly with that person’s attention to an object in the environment (Just and Carpenter, 1976; Starker and Bolt, 1990). Sibert and Jacob (2000) argue that people easily gaze at the world while performing other tasks, so the additional effort of combining eye-gaze with other input techniques is quite low. In their studies they found that selecting objects by eye-gaze is faster than selecting them with a mouse. People tend to look at things that are in the focus of their interest. Thus, it can be expected that a user looks at the object he is talking about while he is formulating a speech command.

Koons et al. (1991) utilise this property of gaze to augment the interpretation of deictic references when the input from other modes is partial or segmented. They designed a prototype system that collects input from speech, gestures, and eye movements. In their emergency scenario, a two-dimensional map is displayed on the screen with icons representing helicopters, airplanes, trucks, fire crews, and fire locations (Figure 3.3). With the multimodal interface, the user can request information or give commands that modify the contents of the map database. The input from all three modalities is interpreted by modality-specific parsers that represent their results in a common, frame-based intermediate form. Importantly, each syntactical token of the interpretation is annotated with timing information. Since the prototype treats both gestures and eye movements as contributors of deictic interpretations, the timing information can be used to replace ambiguous or underspecified parts of the spoken request with the interpretation results of these modalities. The resolution of the missing content is based on typed features of a taxonomy for the presented objects.
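
The timing-based resolution described above can be illustrated with a minimal sketch. It is not the implementation of Koons et al. (1991): the `Token` and `Fixation` classes, the `TAXONOMY` table, and the rule "pick the type-compatible fixation with the largest temporal overlap" are assumptions made purely for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Token:
    """A spoken token annotated with timing information (seconds)."""
    text: str
    start: float
    end: float
    referent_type: Optional[str] = None  # required type, None means "any"


@dataclass
class Fixation:
    """An eye fixation (or pointing gesture) on a displayed object."""
    object_id: str
    object_type: str
    start: float
    end: float


# Hypothetical taxonomy: each concrete type lists the types it belongs to.
TAXONOMY = {
    "helicopter": {"helicopter", "vehicle"},
    "truck": {"truck", "vehicle"},
    "fire": {"fire", "location"},
}


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length of the time interval shared by [a_start, a_end] and [b_start, b_end]."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def resolve(token: Token, fixations: List[Fixation]) -> Optional[str]:
    """Return the object whose fixation overlaps the token most and fits its type."""
    best, best_overlap = None, 0.0
    for fix in fixations:
        types = TAXONOMY.get(fix.object_type, {fix.object_type})
        if token.referent_type is not None and token.referent_type not in types:
            continue  # the taxonomy rules this candidate out
        shared = overlap(token.start, token.end, fix.start, fix.end)
        if shared > best_overlap:
            best, best_overlap = fix, shared
    return best.object_id if best else None
```

For a request such as "move that vehicle", the deictic token would carry `referent_type="vehicle"`, so a fixation on a fire icon is ignored even if it overlaps the word for longer than a fixation on a truck.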

Figure 3.3 – Multimodal interface for speech, gesture and eye-gaze input. (Koons et al., 1991)

Moniri et al. (2012) extend the concept of eye-gaze based reference resolution to the identification of real objects in the environment. Their scenario takes place in a modern car that is instrumented with an eye tracker, a head tracker, a GPS logging module, two displays, and a speech recogniser. The aim of the presented system is to improve the car infotainment system with multimodal communication. While the driver controls the car, he is able to request information about objects in the environment. For example, the command “what is this building?”, recognised by the speech recogniser, is answered with information about the building that is actually in the driver's line of sight. In contrast to the system introduced before, the referenced object is not displayed on a screen, where its current two-dimensional location is known, but is part of the real 3D world. Thus, the spatial resolution of the object is much more complex. It is solved by an algorithm that takes into account the eye-tracking data, head tracking information, GPS position, orientation of the car, and a spatial model of the environment.
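
To make the combination of these inputs more concrete, the following is a strongly simplified 2D sketch. It is not the algorithm of Moniri et al. (2012): it assumes a flat local coordinate system in metres (x east, y north), bearings measured clockwise from north, and buildings approximated by circles. All names (`Building`, `gaze_bearing`, `building_in_line_of_sight`) are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Building:
    """Entry of an assumed spatial model: a building approximated by a circle."""
    name: str
    x: float       # metres east of the local origin
    y: float       # metres north of the local origin
    radius: float  # metres


def gaze_bearing(car_heading_deg: float, head_yaw_deg: float, eye_yaw_deg: float) -> float:
    """Combine car orientation, head rotation and eye rotation into one bearing."""
    return (car_heading_deg + head_yaw_deg + eye_yaw_deg) % 360.0


def building_in_line_of_sight(car_x: float, car_y: float, bearing_deg: float,
                              buildings: List[Building],
                              max_range: float = 200.0) -> Optional[str]:
    """Return the nearest building hit by the gaze ray, or None."""
    theta = math.radians(bearing_deg)
    dx, dy = math.sin(theta), math.cos(theta)  # unit vector of the gaze ray
    hit, hit_dist = None, max_range
    for b in buildings:
        rx, ry = b.x - car_x, b.y - car_y
        along = rx * dx + ry * dy        # distance along the ray to the closest point
        if along <= 0:
            continue                     # building lies behind the viewer
        perp = abs(rx * dy - ry * dx)    # distance of the centre from the ray
        if perp <= b.radius and along < hit_dist:
            hit, hit_dist = b, along
    return hit.name if hit else None
```

In this sketch the GPS position supplies `car_x` and `car_y`, while the head and eye trackers supply the two yaw angles. A real system would additionally need 3D geometry, occlusion handling, and sensor noise filtering.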

Figure 3.4 – Head and eye-tracking are used for identifying the building in the focus of attention of the co-driver (Moniri and Müller, 2012).

In Kern et al. (2010), drivers’ glances at the screen are used for explicit gaze-based interaction. The approach realises direct interaction with a visual display without the drawback of taking the hands off the steering wheel. The benefit of hands-free interaction is an increase in road safety. The idea is to replace all actions that can be performed by a single touch on the touch screen with gazes. The duration of the gazes on the screen should not increase significantly compared with the use of the touch screen. While looking at the screen, the user receives visual feedback: the object currently being looked at is highlighted. Two strategies are tested for the selection of an item. The first strategy uses a gaze in combination with pressing a button. The second uses a dwell time approach where the user has to look at an item for a predefined period of time (about 150-250 ms). In an experimental application, gazes were used for the post-correction of unconstrained dictation for, e.g., email, Twitter, or text messages. For this, a misrecognised word was selected by eye-gaze and replaced with a word from a list of alternative recognition results.
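
The dwell time strategy can be sketched as follows. This is an illustrative assumption, not the implementation of Kern et al. (2010): gaze samples are taken to arrive as `(timestamp_ms, item)` pairs, where `item` is `None` when no item is being looked at, and the function name `dwell_selections` is hypothetical.

```python
from typing import Iterable, Iterator, Optional, Tuple


def dwell_selections(samples: Iterable[Tuple[int, Optional[str]]],
                     dwell_ms: int = 200) -> Iterator[str]:
    """Yield an item once the gaze has rested on it for at least dwell_ms.

    Timestamps are integer milliseconds. Each uninterrupted look at an item
    triggers at most one selection; the user has to look away and back again
    to select the same item a second time.
    """
    current: Optional[str] = None
    since = 0
    fired = False
    for t, item in samples:
        if item != current:
            # Gaze moved to another item (or off all items): restart the timer.
            current, since, fired = item, t, False
        elif item is not None and not fired and t - since >= dwell_ms:
            fired = True
            yield item
```

The default of 200 ms lies inside the 150-250 ms range mentioned above. Shorter values make selection faster but increase the risk of unintended selections while the user is merely scanning the screen.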

Toyama et al. (2014) combine a head-mounted eye-tracker with an augmented reality system on a head-mounted display. The eyes are used to indicate regions of interest in text documents and to activate text recognition and translation functions. The mixed-reality system translates text snippets from Japanese to English and displays the result close to the Japanese text on the head-mounted display. Two gaze gestures are proposed for activating OCR text reading and translation. The first strategy is to look at the beginning and the end of the text line alternately and repeatedly (gaze repetitive leap). The second strategy moves the gaze from the beginning to the end gradually (gaze scan).

In Qvarfordt (2005), eye-gaze patterns are used to directly influence the dialogue behaviour of an interaction and not only to supply additional information. The presented iTourist system is an interactive system for city trip planning. It exploits findings from a user study of human-human collaboration, which showed that users’ interest can be sensed from eye-gaze patterns. A city map contains icons for several points of interest, including hotels, restaurants, attractions, nightclubs, bus terminals, and the tourist information office. When a place is presented, the system gives information about the object via speech output and simultaneously shows images of the object. The activation of this presentation is controlled exclusively by eye-gaze. For this purpose, an algorithm was developed that calculates the activation level of every object on the screen.

The level depends on how long and how often a user focuses on an object. The presentation of an object starts when its activation level exceeds a defined threshold value and stops when the level drops below this threshold again. A second eye-gaze pattern detects whether the user switches back and forth between two places on the map. In this case, information about the distance between these two objects is given.
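
A minimal sketch of such a threshold-based activation scheme is given below. The actual iTourist formula is not reproduced here. The linear gain while an object is looked at, the linear decay otherwise, the clamping to the range 0 to 1, and all parameter values are assumptions chosen only to show the start/stop behaviour.

```python
from typing import Dict, Iterable, List, Optional, Set, Tuple


class ActivationTracker:
    """Tracks a per-object activation level driven by eye-gaze (illustrative)."""

    def __init__(self, object_ids: Iterable[str], gain: float = 0.5,
                 decay: float = 0.25, threshold: float = 0.6) -> None:
        self.levels: Dict[str, float] = {obj: 0.0 for obj in object_ids}
        self.gain = gain            # increase per second while gazed at (assumed)
        self.decay = decay          # decrease per second otherwise (assumed)
        self.threshold = threshold  # presentation threshold (assumed)
        self.active: Set[str] = set()

    def update(self, gazed_object: Optional[str], dt: float) -> List[Tuple[str, str]]:
        """Advance all levels by dt seconds and return start/stop events."""
        events: List[Tuple[str, str]] = []
        for obj, level in self.levels.items():
            if obj == gazed_object:
                level = min(1.0, level + self.gain * dt)
            else:
                level = max(0.0, level - self.decay * dt)
            self.levels[obj] = level
            if level > self.threshold and obj not in self.active:
                self.active.add(obj)
                events.append(("start", obj))
            elif level < self.threshold and obj in self.active:
                self.active.discard(obj)
                events.append(("stop", obj))
        return events
```

Because the level decays more slowly than it grows, repeated short glances accumulate, so both the duration and the frequency of fixations contribute to the level, in line with the description above.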

Nowadays, eye-gaze control systems are often used to enable people with disabilities to communicate and interact with the world. They can use their eyes to write text by looking at the keys of a virtual keyboard; the text is afterwards synthesised into spoken language. Other systems let them control graphical user interfaces by clicking on buttons or selecting elements on the screen. First implementations of eye-gaze control have already found their way onto the mass market. For example, Samsung introduced the “smart pause” functionality with its Galaxy S4 smartphone. This feature uses the front camera to recognise when the user looks away from the handset while watching a movie, and then pauses the film.

In the previously presented examples, eye-tracking plays an active role when commands are given to the computer. Stiefelhagen and Yang (1997) propose that eye-gaze can also be applied passively in a multimodal dialogue system. Their dialogue system allows the user to work on diverse tasks, each running in an individual window on the screen. The primary interaction is performed by speech commands. Although the problem is no longer critical for modern speech recognition systems, 20 years ago it was important to limit grammar size in order to maintain performance and reliability. To achieve this, a camera-based gaze-tracking system detected the window that was currently in the focus of the user. Thus, the grammar model could be reduced to the grammar rules that concern only the tasks available in this window.
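
The idea of restricting the grammar to the focused window can be sketched in a few lines. The window names, the command sets, and the notion of always-available `GLOBAL_COMMANDS` are invented for illustration and are not taken from Stiefelhagen and Yang (1997). A real recogniser would load a compiled grammar rather than compare strings.

```python
from typing import Dict, Optional, Set

# Hypothetical per-window command grammars.
GRAMMARS: Dict[str, Set[str]] = {
    "calendar": {"new appointment", "next week"},
    "mail": {"new message", "reply", "send"},
}

# Hypothetical commands that stay available regardless of gaze.
GLOBAL_COMMANDS: Set[str] = {"help"}


def active_grammar(focused_window: Optional[str]) -> Set[str]:
    """Return the commands the recogniser should consider for the focused window."""
    if focused_window is None:
        # No gaze information: fall back to the full grammar.
        full: Set[str] = set(GLOBAL_COMMANDS)
        for rules in GRAMMARS.values():
            full |= rules
        return full
    return GRAMMARS.get(focused_window, set()) | GLOBAL_COMMANDS


def accept(utterance: str, focused_window: Optional[str]) -> bool:
    """Accept an utterance only if it belongs to the currently active grammar."""
    return utterance in active_grammar(focused_window)
```

With the gaze on the mail window, only four commands compete during recognition instead of six, which is the kind of reduction that mattered for the performance of earlier recognisers.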

Marshall (2007) describes an approach for identifying an individual’s cognitive state from eye metrics and presents a set of metrics that are useful for measuring the cognitive awareness of the user. The first is the index of cognitive activity, which is determined from raw pupil measurements. This approach observes the high-frequency details of pupil changes, which are independent of the pupillary response to changes in light, a slow dilation. A correlation between an increasing index and demanding cognitive processing has been validated across a number of complex cognitive tasks. Other metrics are blinking, the movement of the eyes, and the difference between the horizontal locations of the left and right eye, which permits inferences about whether the eyes are focused on a specific feature or not. Although this information does not contribute directly to human-computer interaction, it provides valuable information about the user’s state. In a situation-adaptive dialogue system, it can be used to adapt the dialogue strategy currently pursued by the system to the user’s attention.

3.1 Overview of research in multimodal interaction
