
[Figure: bar chart of the average grade from participants (axis from 0 to 6) for efficiency and natural interaction, grouped by input method: keys, mouse, gestures.]

Figure 2-13: Survey results for document browsing task. All 19 participants graded the naturalness and efficiency of interaction on a scale of 1 to 5, 5 meaning best.

agent like Sam (see Section 2.2.1) can help the participants when directly triggered.

Also, head gaze can determine if the speaker is talking to the avatar or to another person.

Head gaze is also an important cue for estimating the attention of the user and can help to recognize whether the participant understood the last explanation. Mack, described in Section 2.1.1, is an example of such a system: head gaze is used to determine grounding by observing whether the user also looked at the map when pointing to it. In the interactive system described in Sections 2.1.2 and 2.3.1, Mel uses head gaze to know whether the user looked at the pointed object, the iGlassware.

Eye Gaze. Eye gaze and head gaze are usually correlated, and most observations made for head gaze also apply to eye gaze. An interactive system can estimate both head and eye gaze to know when someone is talking to it. When estimating the user's focus of attention, eye gaze is particularly useful if the targets are close to each other (i.e., within a small field of view).

The “conversational tooltips” experiment described in Section 2.2.2 suggests that head gaze can be used to estimate the focus of interest of the user. If the targets were closer to each other, the use of eye gaze would likely improve the accuracy of the system.
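To make the idea of estimating the focus of interest from gaze concrete, the following minimal sketch picks the target whose direction is closest to the gaze direction. It is an illustration only, not the method used in the experiment: the function names, the target layout, and the 15-degree threshold are all assumptions introduced here.

```python
import math

def angle_between(u, v):
    """Angle in degrees between two 3D vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    cos = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.degrees(math.acos(cos))

def focus_of_attention(head_pos, gaze_dir, targets, max_angle_deg=15.0):
    """Return the name of the target closest to the gaze ray, or None.

    max_angle_deg is a hypothetical tolerance: targets further than this
    from the gaze direction are not considered attended.
    """
    best_name, best_angle = None, max_angle_deg
    for name, pos in targets.items():
        to_target = tuple(p - h for p, h in zip(pos, head_pos))
        ang = angle_between(gaze_dir, to_target)
        if ang <= best_angle:
            best_name, best_angle = name, ang
    return best_name

# Hypothetical layout: two on-screen targets one metre in front of the user.
targets = {"left_icon": (-0.3, 0.0, 1.0), "right_icon": (0.3, 0.0, 1.0)}
looking_left = focus_of_attention((0.0, 0.0, 0.0), (-0.25, 0.0, 1.0), targets)
looking_away = focus_of_attention((0.0, 0.0, 0.0), (0.0, 1.0, 0.0), targets)
```

When targets are close together, the angular gap between them shrinks relative to the tolerance, which is where a finer cue such as eye gaze becomes more useful than head gaze alone.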

Head Gestures. Head nodding is a natural gesture for grounding. Even when interacting with an avatar that cannot recognize head nods, human participants did perform head nods (see Section 2.1.2). The user study described in Section 2.3.1 shows that people nod more often when the interactive interface is able to perceive gestures and gives feedback of its awareness. This result demonstrates that adding perceptual abilities to a humanoid robot, when the human is aware of them and receives feedback about them, provides a way to affect the outcome of the human-robot interaction.

In our experiment with gesture-based interactions (see Section 2.4.1), we showed that head nods and head shakes can be useful for non-embodied interfaces. Human participants naturally used head gestures over conventional input devices like keyboard and mouse when answering dialog boxes.

Eye Gestures. In Section 2.2.3, we presented a user study where human participants naturally performed gaze aversion gestures when interacting with an embodied agent.

A gaze aversion gesture while a person is thinking may indicate the person is not finished with their conversational turn. If the embodied agent senses the aversion gesture, it can correctly wait for mutual gaze to be re-established before taking its turn.
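The turn-taking behaviour described above can be sketched as a simple rule: the agent takes its turn only once the user has been silent for a while and mutual gaze is present. This is a minimal sketch under stated assumptions, not the agent's actual dialogue manager; the per-frame boolean inputs, the function name, and the silence threshold are hypothetical.

```python
def agent_turn_frame(speech_active, mutual_gaze, min_silence_frames=10):
    """Return the first frame index at which the agent may take its turn.

    speech_active and mutual_gaze are per-frame booleans (assumed to come
    from a voice activity detector and a gaze recognizer, respectively).
    The agent waits for both a long enough pause and mutual gaze.
    Returns None if the condition is never met.
    """
    silence = 0
    for i, (speaking, gaze) in enumerate(zip(speech_active, mutual_gaze)):
        silence = 0 if speaking else silence + 1
        if silence >= min_silence_frames and gaze:
            return i
    return None

# The user speaks for 5 frames, then pauses for 20 frames to think.
speech = [True] * 5 + [False] * 20
# During the first 15 frames of the pause the user looks away.
gaze = [True] * 5 + [False] * 15 + [True] * 5

with_gaze_cue = agent_turn_frame(speech, gaze)
silence_only = agent_turn_frame(speech, [True] * len(speech))
```

With the gaze cue the agent waits until mutual gaze returns (frame 20); a rule based on silence alone would cut in at frame 14, while the user is still thinking.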

Chapter 3

Visual Feedback Recognition

Visual feedback gestures such as head nodding and gaze aversion are naturally performed by human participants when interacting with an embodied agent. As discussed in the previous chapter, the recognition of visual feedback can improve the performance of embodied and non-embodied interactive interfaces. In this chapter, we describe our algorithms for accurate, online recognition of head gaze, eye gaze, head gestures and eye gestures using a monocular or stereo camera.

Estimating head gaze accurately for an extended period of time without drifting is a great challenge. In Section 3.1, we present a new Adaptive View-based Appearance Model (AVAM) that can be acquired online during tracking and used to accurately estimate head gaze over a long period of time.¹ The main novelty of our approach lies in the fact that estimating the pose of a newly acquired frame also improves the quality of the view-based appearance model. Provided that the head gaze path crosses itself during tracking, our AVAM model is able to estimate the user's gaze with bounded drift.
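The intuition behind bounded drift can be illustrated with a toy one-dimensional simulation. This is not the AVAM algorithm of Section 3.1: it is a simplified sketch in which the true yaw stands in for image content, `register` is a hypothetical stand-in for image registration with a fixed small error, and keyframes are never refined after they are stored. It only shows why anchoring to previously seen views keeps the error from growing with the number of frames.

```python
def triangle_yaw(n_frames, amplitude=30, step=1):
    """True head yaw (degrees) sweeping back and forth, so the path revisits poses."""
    yaw, direction, path = 0, 1, []
    for _ in range(n_frames):
        path.append(yaw)
        if abs(yaw + direction * step) > amplitude:
            direction = -direction
        yaw += direction * step
    return path

def register(true_a, true_b, bias=0.1):
    """Stand-in for image registration: relative yaw of b w.r.t. a, plus a small error."""
    return (true_b - true_a) + bias

def track_differential(path):
    """Chain frame-to-frame registrations; the error accumulates at every frame."""
    est = [0.0]
    for prev, cur in zip(path, path[1:]):
        est.append(est[-1] + register(prev, cur))
    return est

def track_with_keyframes(path, radius=5):
    """Register against the nearest stored view when one is similar enough."""
    # Each keyframe is (true yaw standing in for the stored image, estimated yaw).
    keyframes = [(path[0], 0.0)]
    est = [0.0]
    for prev, cur in zip(path, path[1:]):
        near = [k for k in keyframes if abs(k[0] - cur) < radius]
        if near:
            key_true, key_est = min(near, key=lambda k: abs(k[0] - cur))
            est.append(key_est + register(key_true, cur))
        else:
            # No similar view yet: fall back to the previous frame and store a new keyframe.
            est.append(est[-1] + register(prev, cur))
            keyframes.append((cur, est[-1]))
    return est

def max_error(est, path):
    return max(abs(e - t) for e, t in zip(est, path))

path = triangle_yaw(600)
drift_differential = max_error(track_differential(path), path)
drift_keyframes = max_error(track_with_keyframes(path), path)
```

In this toy setting the differential tracker's error grows with every frame, while the keyframe tracker's error depends only on how many keyframes separate the current view from the first one, so it stops growing once the pose range has been covered.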

In Section 3.2, we present our approach for eye gaze estimation based on a pre-acquired view-based appearance model. The eye appearance model was built using eye images from 16 subjects looking at 35 different targets under different lighting conditions. With our approach, eye gaze estimation can be performed from low-resolution images. Our approach is user-independent and can handle glasses and changes in lighting conditions.

¹ This work was done in collaboration with Ali Rahimi. It was originally published at CVPR 2003 [66].

In Section 3.3, we introduce a new algorithm for visual gesture recognition, the Frame-based Hidden Conditional Random Field (FHCRF). Given the estimated head gaze and eye gaze, our FHCRF model can accurately recognize visual gestures like head nodding and gaze aversion and discriminate them from other natural gestures. Our discriminative model learns both sub-gesture patterns and the dynamics between gestures to achieve better performance. Our results demonstrate that the FHCRF model for visual gesture recognition outperforms models based on Support Vector Machines (SVMs), Hidden Markov Models (HMMs), and Conditional Random Fields (CRFs).
