erally, intentions potentially involved when considering "real" gaze – with these gaze cues. As a study by Hietanen et al. (2008) suggests, pictures of faces do not necessarily have the same effect on an observer as a real face does. The observer may lack the feeling of being looked at since she does not attribute any intentional, social meaning to the gaze cue.
The described studies do suggest that gaze can elicit both levels of response, reflexive as well as voluntary orienting, which raises the question of how these may co-occur and possibly interact. Moreover, it is unclear whether it also counts as voluntary orienting when children, for instance, follow their mothers’ gaze to establish joint and shared attention. Previously, attention shifts have been called voluntary or volitional in the context of (covert) orienting when such an attention shift was elicited by a central symbolic cue that predicted a target in an uncued location (e.g., Friesen et al., 2004). There may be a qualitative difference between using such a symbolic cue (after being told that it is useful and having been trained to interpret it accordingly) and following someone’s gaze to an object or person and inferring mental states of the gazer (because as a child one has learned that gaze-following potentially reveals interesting information).
2.5. Joint Attention and Language Comprehension
In the previous sections, we have explained that an important aspect of gaze is related to understanding that eyes capture information about the environment. Knowing that an individual’s gaze is often directed to entities in the vicinity and that this provides the individual with (visual) information about these entities makes gaze-following a useful strategy for learning (what does an unknown word refer to), survival (is there a source of danger) and smooth communication (what is my partner going to say, want or do). Baron-Cohen and colleagues (1995; 1997a; 1997b) showed in a number of studies, for instance, that a speaker’s gaze direction can normally serve as a significant cue to the speaker’s intended referent. In one study, children were shown two nonsense shapes and were asked to indicate which of them was the "beb", a nonsense word. The first time, they had to guess and deliberately pointed at one shape; the second time, a cartoon face named Charlie was placed between the shapes and looked at one of them. When asked what Charlie thought was the beb, most of the children pointed to the shape that the face was looking at. Children with autism, in contrast, mostly stayed with their initial decision and failed to interpret the face’s gaze cue as an indicator of attention and desire with respect to a certain shape. These studies by Baron-Cohen and colleagues seem
2. The Utility of Gaze in Situated Communication
to suggest that autistic children typically do not read mental states from the eyes at all and even tend to prefer artificial cues such as arrows over reading eye direction. These results also suggest that gaze is an important cue to an individual’s intentions and that not being able to interpret it as such indicates a deficiency in theory of mind formation, which further disrupts social interaction ("it is the lack of mental state concepts that causes the failure to understand that eye-direction signifies this range of mental states", Baron-Cohen et al. 1995, p. 394).
In addition to this general notion of visual attention and intention ascribed to gaze, a close coupling has been established between produced gaze and language comprehension and production (reviewed in Section 2.3). Whether, and precisely how, the close alignment of gaze with spoken language production, for instance, helps listeners to identify and anticipate utterance content is the subject of ongoing research. The mentioned studies on joint and shared attention, however, clearly suggest that people do monitor and use each other’s gaze in face-to-face communication. In spoken communication, information obtained through gaze-following helps to rapidly ground and resolve spoken utterances with respect to a common environment (Moore and Dunham, 1995; Clark and Krych, 2004; Tomasello and Carpenter, 2007). Speakers’ gaze to an object can, thus, function as a visual reference to that object, augmenting linguistic references. Consequently, face-to-face communication produces not only utterance-mediated gaze, but also gaze-mediated gaze which potentially reflects states of joint visual attention.
Studies investigating the utility of such referential gaze cues in face-to-face communication have provided evidence that listeners use speakers’ gaze to identify a referent in the scene before the utterance unambiguously identifies that referent (Hanna and Brennan, 2007). In a first experiment, Hanna and Brennan (2007) found that listeners follow and use speaker gaze to constrain their domain of interpretation such that (a) temporary ambiguity is resolved, and (b) reference resolution is enhanced since this information is available early during language processing. The experiment was conducted with a director and a matcher facing each other. Both had their own displays hidden behind a low barrier but were shown the other’s display at the beginning of the experiment. Displays contained either a mirrored object constellation, i.e., were congruent with each other as shown in Figure 2.1, or contained different spatial object arrangements (non-congruent) such that the director’s gaze was uninformative. The director instructed the matcher to move one of the displayed objects to a specific location. Such an instruction contained a referring expression of the form "the [color] [shape] with [number of dots]". The display either contained a competitor object of the same shape and color next to the target (‘near competitor’) or further away (‘far competitor’) such that the referring expression was temporarily ambiguous, or it contained no competitor. In the ‘near competitor’ condition, the director’s gaze towards the target was not clearly disambiguating, while in the ‘far competitor’ condition the director’s gaze more clearly distinguished between target and competitor. Results from matchers’ target looks showed that the matcher identified the target before the linguistic point of disambiguation if displays were congruent. Moreover, in the ‘far competitor’ condition participants seemed to identify the target as early as when there was no competitor at all, suggesting that the director’s gaze was clearly disambiguating.
Figure 2.1: Sketch of experimental setting with reversed displays containing a target ("orange circle with three dots on it") and a far competitor, as described in Hanna and Brennan (2007). Original pictures and a more precise description can be found in the respective paper.
In a second experiment, Hanna and Brennan (2007) changed the display arrangements such that displays were either congruent (mirrored) or reversed. In the reversed condition, objects on the director’s right were also on the matcher’s right, such that the director’s gaze needed to be re-mapped in order to be informative from the matcher’s perspective. Matchers’ target fixations indicated that matchers used directors’ early target fixations (visual point of disambiguation) to initially orient towards the same (mirrored) side of their display. 1000 ms after the visual point of disambiguation, matchers apparently remembered the display condition and began to adjust to it, i.e., oriented towards the opposite, "target" side when displays were reversed. The congruent condition replicated previous results showing that the director’s (i.e., speaker’s) gaze is an early disambiguating cue. Though matchers did not immediately follow the directors’ gaze cue to the target, they seemed to use this information about 1500 ms later, between the color onset and the linguistic point of disambiguation, to identify the target. Interestingly, matchers’ target fixations in the reversed condition showed a considerably smaller but nonetheless significant benefit of director gaze for target identification. This suggests that speaker gaze helped to identify referents even when this gaze cue was initially misleading. Listeners seemed to establish a mapping of the speaker’s gaze to their own visual scene and still made use of the speaker’s gaze early during comprehension.
The results from Hanna and Brennan, in addition to other previous results (e.g., Baron-Cohen et al., 1997b), suggest that people infer intended referents from the speaker’s gaze after the initial, reflexive response to gaze (Friesen and Kingstone, 1998; Driver et al., 1999; Langton and Bruce, 1999). That is, beyond the possibly reflexive attention shift in response to gaze, people seem to be able to impose the communicative context onto the visual stimulus and, thus, may still interpret the gaze cue as a visual reference which reflects communicative intentions.
The above-mentioned findings show that gaze during spoken communication is systematically and automatically coupled to situated speech. This close coupling, in addition to the general notion of seeing and visual attention ascribed to gaze, may be the reason that listeners interpret speakers’ eye movements on-line as visual references to help rapidly identify, and disambiguate among, intended referents.