
The work presented in this thesis investigates the addition of eye tracking to the gamut of bodily tracking devices used in ICVEs. The primary interactive use of eye tracking is to enhance the behavioural fidelity of avatars with faithful reproduction of their embodied users’ oculesic cues during AMC. Gaze is investigated throughout the main experimental work presented in Chapters 4–6, and an avatar whose gaze behaviour is replicated from its embodied user may be referred to as a tracked gaze avatar. Eyelid movement and pupil dilation are also investigated in Chapter 6, and an avatar featuring the three oculesic cues of gaze, eyelid movement, and pupil dilation may be referred to as an oculesic avatar. Tracked gaze and oculesic AMC have the potential to support gaze awareness superior to that possible in VMC, as the spatial context of ICVEs allows users to move freely within a perceptually-unified shared environment. Avatars generally exhibit the attentional cue of head orientation, driven by the prerequisite head tracking. Head orientation is a significant contributor to how observers estimate an individual’s attention, but provides lower fidelity than the combination of both head orientation and gaze direction channels.

2.3. Avatars 63

Figure 2.14: Sequences of immersive AMC illustrating the changing focus of attention inferred from direction of gaze (represented by the green circle) despite similar head orientation during greeting (top sequence) and a puzzle task (bottom sequence). The addition of tracked gaze enables communicational gaze rather than just attentional head orientation.

Figure 2.14 presents two sequences of interaction, taken from the experiment documented in Chapter 5. The two sequences are displayed horizontally and aim to illustrate the operational difference between the use of tracked gaze avatars and avatars featuring just head movement. Each sequence shows three images captured from the perspective of a participant engaged in three-party object-focused AMC. Each avatar’s eye movement is driven by an eye tracker worn by a user, and similarly, each avatar’s head is updated based on user movement from head tracking. While both sequences of images show similar views of the virtual scene, and the corresponding head orientation can be used as a general indicator of attention, the user’s direction of gaze, indicated by the green circle, varies dramatically throughout the interaction. Thus, in a VE populated with objects and avatars, the benefit of tracked gaze avatars is likely to become apparent.

In summary, the richness of attentional signalling and observation during AMC between users of ICVE systems corresponds to tracking capability. Due to the shared virtual space provided by ICVE systems, tracked gaze AMC allows gaze to be used similarly to collocated interaction, thereby overcoming VMC’s restriction on gross movement and on indicating objects with eye direction. In work prior to that presented in, or associated with (such as [MRS+07]), this thesis, avatar and agent gaze has either been driven by simulation or has appeared static. The following sections review work in the agent and avatar literature, together with studies found in the general computer graphics and animation literature, to present the current state of the art in oculesic behaviour of virtual humans. Corresponding to the prior review of human oculesics presented in Section 2.1, gaze, eyelid kinematics, and pupil dilation are covered.

Gaze Models

As the complexity of VEs and avatar behaviour increases, so does the difficulty in maintaining a direct correlation between the user’s wishes and the avatar’s actions [PSS+01]. Control of the full range of human nonverbal cues via tracking is impractical, and also too complex and temporal to be directed by means of manual input. Consequently, models directing various behavioural channels have emerged [GB04]. Gaze models aim to generate naturalistic eye movement for a given interactional state and scenario in order to enhance the realism of a virtual humanoid. Other than VEs, application areas of gaze models include video games, computer-generated films, and general models of attention or gaze prediction.

Gaze models typically exhibit several types of characteristic behaviour, which are often inferred from the current state of an unfolding interaction. In the case of conversation, this has included who is speaking and who is listening [PLBB02]. Further input to such analytical models often includes parameters corresponding to behavioural components of gaze such as fixation time, angular velocity and saccade magnitude. These values implement statistical generalisations about human gaze behaviour derived from empirical studies of saccades [GB06] and/or statistical models of eye tracking data [PLBB02]. Manipulation of such parameters can dramatically influence how an avatar’s psychological state is perceived, including being excited or sleepy [DLN05], and dominant or submissive [KG08]. Critically for AMC, results from associated user studies indicate that avatars exhibiting gaze behaviour that is directly related to the current interactional state are able to significantly improve subjective quality of communication compared to static gaze (no eye movement) or random gaze (eye movement is not inferred from any interactional state) [GSBS01,GSV+03,DLN05]. Such findings support Vilhjalmsson et al.’s assertion that, in order for avatars to meaningfully contribute to communication, their animation needs to reflect some aspect of the interaction that is taking place [VC98].

Lee et al.’s Eyes Alive model is based on both empirical studies of saccades and statistical models derived from empirical eye tracking data captured from dyadic conversation [PLBB02]. Eye trajectory kinematics were extracted from the eye tracking data, which was further segmented according to whether the wearer was speaking or listening. The model takes into account the dynamic characteristics of eye movement, including saccade magnitude, direction, duration, velocity, and inter-saccadic interval. An autonomous virtual agent head was used to exhibit various methods of gaze control: static, random, and model-based. On a standard display, experimental participants were then asked to give feedback relating to the perceived naturalness of the agent’s gaze. Results indicated that model-generated gaze was perceived as more natural, friendly and outgoing, while stationary gaze was perceived as lifeless, and random gaze gave an unstable element to the agent.

Garau et al. [GSBS01] presented a parametric gaze model which took timings from the classic social science research on collocated dyadic conversations including Argyle and Cook [AC76], Argyle and Ingham [AIAM73], and Kendon and Cook [KC69]. Similarly to Lee et al.’s Eyes Alive model, gaze animations varied between speaking and listening states. For the speaking state, mean saccade fixation was 1.8 seconds when looking towards the conversational partner, and 2.1 seconds when looking somewhere else in the visual field. A mean frequency of 14 “at partner” glances per minute was programmed. For the listening state, mean saccade fixation was 2.5 seconds when looking at the conversational partner, and 1.6 seconds when looking away. A mean frequency of 17 “at partner” glances per minute was programmed. When looking at the conversational partner in both speaking and listening states, the avatar’s eyes focused directly ahead, assuming that this was the partner’s location. Values for vertical and horizontal angles of “away” gaze were chosen randomly from a uniformly distributed range between 0–15°. The associated experiment investigated the impact of gaze on perceived quality of communication by comparing different gaze behaviours. Pairs of participants were asked to conduct a conversational role-playing task over a non-immersive video-tunnel link, on which a virtual humanoid representing the partner was displayed. The avatar exhibited either modelled or random gaze. Results indicated that an avatar whose gaze behaviour was directly related to the conversation consistently and significantly improved the quality of communication compared to random gaze.
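The timings quoted above can be collected into a small state-driven sampler. The following Python sketch is illustrative only and is not the implementation of [GSBS01]: the function names, the exponential spread of fixation durations around each quoted mean, and the strict alternation between “at partner” and “away” fixations are all assumptions made for the example. The programmed glance frequencies are not enforced, and only the magnitude of the “away” angles is sampled.

```python
import random

# Mean fixation durations in seconds, as quoted in the text for [GSBS01].
MEAN_FIXATION_S = {
    ("speaking", "at_partner"): 1.8,
    ("speaking", "away"): 2.1,
    ("listening", "at_partner"): 2.5,
    ("listening", "away"): 1.6,
}
MAX_AWAY_ANGLE_DEG = 15.0  # upper bound of the uniform range for "away" gaze


def next_fixation(state, target, rng):
    """Return (duration_s, horizontal_deg, vertical_deg) for one fixation."""
    mean = MEAN_FIXATION_S[(state, target)]
    # Assumption: durations are exponentially distributed around the quoted
    # mean; the text gives only the means, not the distribution.
    duration = rng.expovariate(1.0 / mean)
    if target == "at_partner":
        # The eyes look directly ahead, assumed to be the partner's location.
        return duration, 0.0, 0.0
    horizontal = rng.uniform(0.0, MAX_AWAY_ANGLE_DEG)
    vertical = rng.uniform(0.0, MAX_AWAY_ANGLE_DEG)
    return duration, horizontal, vertical


def gaze_sequence(state, total_s, seed=0):
    """Generate (start_s, target, duration_s, h_deg, v_deg) tuples."""
    rng = random.Random(seed)
    t, target, fixations = 0.0, "at_partner", []
    while t < total_s:
        duration, h, v = next_fixation(state, target, rng)
        fixations.append((t, target, duration, h, v))
        t += duration
        # Assumption: strict alternation between the two gaze targets.
        target = "away" if target == "at_partner" else "at_partner"
    return fixations
```

Calling `gaze_sequence("listening", 60.0)` yields one minute of alternating fixations for a listening avatar; swapping the state to `"speaking"` switches to the other pair of mean durations.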

Vinayagamoorthy et al. later combined the statistical elements of the Eyes Alive model with the timing data implemented in Garau et al.’s model [VGSS04]. Consequently, the model presented an approach to generating natural gaze motion which took theoretical information from social psychology studies to infer the gross gaze behaviour, and augmented this with the spatio-temporal eye trajectories derived from eye tracking data. Led by Garau, the user study extended those associated with the informing models [GSV+03]. While the same role-playing task as in [GSBS01] was used, AMC was performed in an ICVE between pairs of participants using either a CAVE system or an HMD. As well as gaze behaviour (a component of behavioural fidelity), the experiment also investigated representational fidelity of avatars, embodying users with either ‘cartoonish’ or photo-real avatar representations. These full-body avatars appeared life-size to participants, and exhibited either random or model-based gaze behaviour. Speaking and listening states were both divided into sub-states of “at partner” and “away”, determined by head orientation derived from head tracking data. Findings relating to the photo-real avatars concurred with the previous studies, showing that gaze models inferred from interactional states can simulate behaviour that significantly enhances the perceived quality of communication. However, responses to the lower-realism avatars were adversely affected by the more realistic inferred gaze, supporting the theory of a significant interaction effect between appearance and behaviour as subsequently addressed in [Gar06]. It was also noted that experimental participants stood facing each other and maintained appropriate personal space in accordance with Hall’s social proxemics classifications [Hal68]. Subsequently, Yee and Bailenson’s work into the Proteus Effect observed this phenomenon in non-immersive AMC [YB07].

Gaze behaviour is a significant indicator of level of engagement, and gaze models have also been implemented as modules in broader simulations of human attention for agents, illustrating an alternate paradigm for simulating eye movement. Gu and Badler’s model [GB06] aims to provide agents with human-like responses to environmental stimuli by modelling aspects of human vision, memory and attention. The model implements low-level motor control of saccades and smooth pursuits, together with high-level gaze patterns that consider multi-party turn-taking practices, including next speaker selection, engagement, cognitive workload, and distractions. The model treats the cognitive resources of agents as finite. Thus, as an agent is assigned more demanding tasks or conversational situations, its mental workload increases and more attention is devoted to the most salient feature in a scene, consequently increasing the likelihood of missing an unexpected event or environmental distraction. Similarly, Peters and O’Sullivan introduced a gaze model as part of a broad agent animation system that infers interest from other agents in a scene, based on gaze, head orientation, body posture, and locomotion [PPB+05]. Subsequently, the observing agent makes the decision to continue speaking or take another action. The model takes into account external occurrences in the environment for both the speaker and the listener, and generates appropriate gaze to adjust engagement levels accordingly.

While the Gu and Badler [GB06] and Peters et al. [PPB+05] models do not have associated user studies, and are presented in a rather theoretical manner, their approach to gaze modelling highlights the complexity and unpredictability of human social interaction that models based purely on statistical and empirical observations are unable to consider. Previous work on visual attention modelling on static images and dynamic video scenes provides a saliency-based approach to gaze modelling that, combined with statistical properties of eye movement, is investigated in Chapter 7. The remainder of this section covers work related to development of the model.

The aim of modelling visual attention is to focus computational resources on a specific, salient region within a scene. Koch and Ullman’s framework for simulating human visual attention focuses on the idea that the control structure underlying visual attention needs to represent such locations within a topographic saliency map [KU85]. Multiple image features such as colour, orientation, and intensity may be combined, forming a saliency map that reflects areas of attention. Similarly, the intrinsic saliency of an object within a scene can be derived from parameters such as proximity, eccentricity, orientation, and velocity [FW99]. Computation of this intrinsic saliency can then be used to determine the spatial coding of gaze fixations in the virtual scene.
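The combination of feature maps into a single saliency map can be illustrated with a minimal Python sketch. It is not the formulation of [KU85]: the min-max normalisation, the weighted average, and the function names are assumptions chosen for brevity, and feature maps are represented as plain lists of rows.

```python
def normalise(feature_map):
    """Rescale a 2D feature map (a list of rows) to the range [0, 1]."""
    values = [v for row in feature_map for v in row]
    lo, hi = min(values), max(values)
    if hi == lo:
        # A flat map carries no information about where to look.
        return [[0.0 for _ in row] for row in feature_map]
    return [[(v - lo) / (hi - lo) for v in row] for row in feature_map]


def saliency_map(feature_maps, weights=None):
    """Combine equally sized feature maps into one map by weighted average."""
    maps = [normalise(m) for m in feature_maps]
    if weights is None:
        weights = [1.0] * len(maps)  # assumption: features count equally
    total_weight = sum(weights)
    rows, cols = len(maps[0]), len(maps[0][0])
    return [
        [sum(w * m[r][c] for w, m in zip(weights, maps)) / total_weight
         for c in range(cols)]
        for r in range(rows)
    ]


def most_salient_location(saliency):
    """Return the (row, column) of the highest value in the saliency map."""
    cells = ((r, c) for r in range(len(saliency))
             for c in range(len(saliency[0])))
    return max(cells, key=lambda rc: saliency[rc[0]][rc[1]])
```

Given colour, orientation, and intensity maps of the same size, `most_salient_location(saliency_map([colour, orientation, intensity]))` returns the cell that a fixation would be directed to under these assumptions.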

The extrinsic saliency of an object determines the duration of fixations [PLN02], and is concerned with coherent fixation distributions during inspection of a scene. Henderson and Hollingworth [HH99] review this area of high-level scene perception research, which concerns the role of eye movements in scene perception, focusing on the influence of ongoing cognitive processing on the position and duration of fixations in a scene. Their review speculates whether ongoing perceptual and semantic processing accounts for the variability of fixation durations, which range from less than 50 ms to more than 1000 ms in a skewed distribution with a mode of 230 ms. The average fixation duration during scene viewing is also stated to be 330 ms, with significant variability around this mean. The review suggests that fixation positions are not random, but rather that they cluster on both visually and semantically informative regions. Spatial distribution of the first few fixations in a scene seems to be controlled by both the visual features in the scene and global semantic characteristics. As viewing progresses and local regions are fixated and semantically analysed, positions of later fixations come to be controlled by both the visual and semantic characteristics of local regions. The length of time the eyes remain in a given region is immediately affected by both characteristics.
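A skewed distribution with the quoted mode and mean can be reproduced for sampling purposes. The Python sketch below assumes a log-normal form, which is not stated in [HH99] and is chosen only because its mode, exp(mu - sigma^2), and mean, exp(mu + sigma^2 / 2), can both be matched in closed form; the function names are likewise hypothetical.

```python
import math
import random

MODE_MS = 230.0  # mode of fixation durations quoted from [HH99]
MEAN_MS = 330.0  # mean fixation duration quoted from [HH99]


def lognormal_params(mode, mean):
    """Solve for (mu, sigma) of a log-normal with the given mode and mean.

    Dividing mean = exp(mu + sigma^2 / 2) by mode = exp(mu - sigma^2)
    gives ln(mean / mode) = 1.5 * sigma^2.
    """
    sigma_sq = (2.0 / 3.0) * math.log(mean / mode)
    mu = math.log(mode) + sigma_sq
    return mu, math.sqrt(sigma_sq)


def sample_fixation_ms(rng, mu, sigma):
    """Draw one fixation duration in milliseconds."""
    # Assumption: log-normal shape; the source states only that the
    # distribution is skewed.
    return rng.lognormvariate(mu, sigma)
```

Under this assumption, `sample_fixation_ms(random.Random(0), *lognormal_params(MODE_MS, MEAN_MS))` draws a single plausible fixation duration; the quoted range of roughly 50 ms to 1000 ms is not enforced and would need clamping if hard limits are required.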

Henderson and Hollingworth’s research [HH99] leads to the hypothesis that the eye will be attracted to regions of a virtual scene that convey the most important information for scene interpretation. The intrinsic saliency of an object in a scene determines the spatial distribution of fixations, inferred from continuous interaction within the virtual scene. The extrinsic saliency drives the temporal coding of fixations, determining their duration. The gaze model presented in Chapter 7 takes head tracking as input, which defines a user’s current view of a virtual scene. The extrinsic saliency of objects and other avatars is then calculated, and gaze is distributed accordingly. The approach also implements a plausible linear interpolation algorithm for the dynamics of the human eyeball.
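Linear interpolation of eye rotation between two fixations can be sketched as follows. This Python example is a generic illustration and does not reproduce the algorithm of Chapter 7: the representation of gaze as a (yaw, pitch) pair in degrees, the fixed frame rate, and the function names are assumptions.

```python
def lerp_gaze(start, end, fraction):
    """Linearly interpolate between two (yaw, pitch) gaze angles in degrees."""
    f = min(max(fraction, 0.0), 1.0)  # clamp so the eye never overshoots
    return tuple(s + (e - s) * f for s, e in zip(start, end))


def saccade_frames(start, end, duration_s, frame_rate_hz):
    """Return per-frame gaze angles for a movement from start to end."""
    # At least one step, so that the end point is always reached.
    steps = max(1, round(duration_s * frame_rate_hz))
    return [lerp_gaze(start, end, i / steps) for i in range(steps + 1)]
```

For instance, `saccade_frames((0.0, 0.0), (10.0, 0.0), 0.05, 60)` produces the intermediate eye rotations for a 10° horizontal movement rendered at 60 frames per second. A constant angular velocity is the simplest choice; human saccades accelerate and decelerate, so this is plausible only as a first approximation.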
