

In the experiments described above, we have seen that, given a visual display and an utterance, participants were quickly able to integrate the different sources of information, attend to relevant regions, and use visual information for language processing. It is, however, not clear whether the current form of the Visual World Paradigm is suitable for investigating all aspects of situated language processing. In particular, to determine whether the non-linguistic aspects of cognition, and the possible limitations they impose on the integration of language and situation, are accounted for appropriately, it is important to characterize similarities and differences between the Visual World Paradigm and naturally occurring situations in which language is processed. Firstly, many natural language processing situations include two or more interlocutors, and the linguistic material consists of dialogue rather than isolated sentences. We will leave this aspect aside, however, and focus on the differences with regard to the situation. A typical language comprehension situation would not be restricted to a computer screen, but would rather include the whole physical surroundings of the language comprehender. Important differences between, e.g., the clip-art displays used in Altmann & Kamide (1999) and a natural situation include the physical nature, the complexity, the dynamics, and the physical and temporal extension of the situation. These aspects will now be discussed in turn to identify questions that require further experimental investigation.

Perceiving a physical object as opposed to a stylized clip-art picture is likely to lead to a much richer internal representation, as a physical object usually exhibits more detail and affords physical manipulation. This could, in principle, lead the language comprehender to assign it greater significance than a clip-art picture. A number of visual world studies used real-world objects that had to be manipulated by the participants (e.g. Tanenhaus et al., 1995). To our knowledge, language-mediated eye movements in these studies were similar to those in studies using clip-art scenes, suggesting that participants do not treat physical objects in a privileged way during language processing.

The different degrees of complexity in natural situations as compared to arrays and ersatz scenes have been pointed out by Henderson & Ferreira (2004), but have otherwise received little attention in the visual world literature. The vast majority of experiments used displays with 3 to 5 objects, a number that can be held in working memory simultaneously and arguably might also be attended to at the same time. Andersson, Ferreira & Henderson (2011) tackled this issue by using photographs of highly cluttered scenes. While participants still fixated mentioned objects in this setting, the overall probability of doing so was lower and the latency longer than in most existing studies. This suggests that highly simplified scenes yield results that are qualitatively comparable to those obtained in natural situations, but considerably amplified.

The dynamics of a natural situation compared to a static clip-art display comprise the unfolding nature of ongoing events as well as movements of objects, including their appearance and disappearance. Knoeferle & Crocker (2007) and Ellsiepen, Knoeferle & Crocker (2008) approximated an unfolding event using a sequence of clip-art scenes, where the event was completed and the protagonists were static at the time the sentence was played. Compared to a static scene that depicted the event as ongoing, the information provided by the event received less consideration, but was still exploited to a certain degree for anticipation and syntactic disambiguation. While this reduced impact could theoretically be related to the dynamic nature of the presentation, it is more likely due to the event being completed and having to be retrieved from working memory while the characters in the scene were still present.

The physical extension of the scenes used in most experiments is small enough to lie completely within the field of view, so that the scenes are easily surveyed without moving the head. For example, in the experiments of Altmann & Kamide (1999), the whole scene subtended approximately 33° of visual angle horizontally. In contrast to this, a surrounding visual context could only be fully exploited by the listener if internal representations of the objects out of view were accessible to the language processing system. While no results on immersive environments within the Visual World Paradigm have been reported so far, the blank screen paradigm (Altmann, 2004; see Section 2.1.4) does address the use of internal representations. However, in these experiments, the visual context is constrained to only four objects, a quantity that can easily be held in visual working memory simultaneously. A natural scene would almost always exceed this limit many times over. While the possibility of using internal representations suggests that, in an immersive environment, the language comprehender can draw on such representations for objects out of view, it is difficult to estimate their influence in visually rich surroundings.

The temporal extension of a visual world trial likewise differs from that of a natural situation. In most experiments, the participant is presented with a new visual context with every new sentence, whereas in a natural situation, although there might be dynamic changes to the environment at every instant, many features will stay the same across sentences, or even dialogues. On the one hand, participants in a visual world experiment thus have comparatively little time to build up a representation of the scene. On the other hand, they might be more attentive to their visual context, because it is completely new and is presented to them in connection with a sentence. Importantly, the task can play a role in whether or not the participant actively tries to integrate scene and sentence. If the task is to manipulate scene objects in accordance with spoken instructions (Tanenhaus et al., 1995; Allopenna et al., 1998), the participant has to establish reference to perform the task. In this case, the influence of purely linguistic processing on the direction of visual attention cannot be distinguished from that of non-linguistic, task-related internal goals. In the look-and-listen task (Altmann & Kamide, 1999; Knoeferle & Crocker, 2006), on the other hand, the presence of the visual context is not entirely motivated for the participants, giving rise to the possibility that participants actively or implicitly look for connections between sentence and scene. Of course, this can also happen in natural situated language processing under certain circumstances. Still, the strong claim that language guides attention automatically becomes less persuasive in light of this possibility.

These differences show that, based on the current findings in the Visual World Paradigm, it is difficult to estimate the potential impact of memory and visual attention processes, or of their limitations, on situated language processing. Also, it is not obvious how the predictions of the CIA and the FOA translate to more natural language processing situations, as the notions of memory and attention remain to some extent unspecified. The next section will provide some background on the cognitive components that we identified as relevant, in order to formulate specific questions on how visual attention, spatial indexing and working memory representations influence situated language processing.

2.2. Cognitive Components Involved in Situated Language