
The perceptual layer plays a similar role to the external component of Slater & Usoh’s model (1993) and the bottommost layer of Steuer’s model (1992). We propose that this layer represents a series of perceptual analyzers, with at least one per sensory modality. The need for separate analyzers per modality is derived in part from the ideas of Slater (1999), who suggests that presence is a function of many separate display system factors. Slater argues that many of these factors are independent of each other, and that this independence allows VR systems engineers, who generally work with limited display resources, to trade factors off against each other to maximize presence in a particular system. In order to model this independence, it is necessary to have separate perceptual analyzers, so that the level of activation in any one analyzer can be held constant while the activation of the others is changed.

For example, Slater’s argument implies that a VE which includes both sound and graphics might produce different presence levels to one which only includes graphics, and some evidence to this effect is presented by Hendrix & Barfield (1995b).

By modeling the perceptual analyzers of sound and vision as separate entities, it is possible to explain this phenomenon by arguing that in the graphics-only VE, only the vision analyzer is activated, while in the graphics-and-sound VE, the vision analyzer remains activated, but the auditory analyzer becomes active as well. It is this increase in overall activation which leads to the increase in presence.
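The argument above can be illustrated with a minimal sketch. It assumes, purely for illustration, that each analyzer contributes a single activation value and that overall activation is a plain sum; the function name and the numeric values are hypothetical and are not taken from the model itself.

```python
# Hypothetical illustration: each perceptual analyzer holds its own
# activation, so one can be held constant while another is changed.

def total_perceptual_activation(analyzers):
    """Sum the activation of all analyzers (summation is an assumption)."""
    return sum(analyzers.values())


# Graphics-only VE: only the vision analyzer is active.
graphics_only = {"vision": 0.6, "audition": 0.0}

# Graphics-and-sound VE: vision is unchanged, audition becomes active.
graphics_and_sound = {"vision": 0.6, "audition": 0.4}

if __name__ == "__main__":
    print(total_perceptual_activation(graphics_only))
    print(total_perceptual_activation(graphics_and_sound))
```

Because the two analyzers are stored separately, the vision value is identical in both conditions and the difference in overall activation comes entirely from the auditory analyzer.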

CHAPTER 5:THE CONNECTIONIST MODEL OF PRESENCE 67

The top level of each perceptual analyzer contains two nodes (labeled ‘O’ and ‘R’ in Figure 5-1). One of these nodes (the ‘O node’) will become activated if perception by that modality results in the percept being identified as an object, and the other (the ‘R node’) will become activated if perception by that modality results in the percept being identified as a rendition of an object. We define a rendition of an object as any symbol or reproduction which aims to represent or symbolize an object, regardless of how physically dissimilar the rendition is from the object. In the case of vision, photographs, video footage, geometric models, sketches and even a verbal description of a scene can all be regarded as renditions of a scene. The modeling of the O and R nodes is derived partly from the work of Lombard & Ditton (1997), who argue that presence is the illusion that the VE experience is not mediated by technology. In this model, mediation would be experienced when the R (rendition) node is highly activated, leading to the perception of the objects not as real, but as representations of real objects. On the other hand, if the O node is highly active, this will lead to the perception of the virtual environment as consisting of real objects, and consequently, to the sense of non-mediation.

The O and R nodes are connected to the action layer by means of excitatory connections. Each O and R node is connected to many action nodes, with each connection encoding one of the possible responses which are appropriate for any particular percept. Activation of an O or R node will lead to a partial activation of all the nodes in the action layer to which it is connected. Usually, the activation provided by the perceptual layer is not sufficient to activate the action nodes to a level where a response occurs.

In order for a response to occur, a certain amount of activation is also required from the conceptual layers.

For example, a user who perceives an object as a door in a virtual environment may respond to that object in one of many possible ways, none of which is directly suggested by the percept itself, but rather by previous experiences as well as by abstract knowledge about doors (Norman, 1988). In a few cases, such as reflex or automatic actions, perceptual stimulation alone will provide sufficient activation of the action layer for a response to occur.
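The threshold behavior described above can be sketched as follows. This is a toy illustration under stated assumptions: the `ActionNode` class, its `threshold` field, the additive combination of inputs and all numeric values are hypothetical and are not specified by the model.

```python
from dataclasses import dataclass


@dataclass
class ActionNode:
    """Hypothetical action-layer node that fires once input reaches a threshold."""
    name: str
    threshold: float = 1.0  # assumed response threshold

    def responds(self, perceptual_input: float, conceptual_input: float = 0.0) -> bool:
        # Assumption: perceptual and conceptual activation simply add.
        return perceptual_input + conceptual_input >= self.threshold


# An ordinary action: perceptual activation alone is only partial.
open_door = ActionNode("open the door")

# A reflex-like action: assumed to have a low threshold, so perceptual
# stimulation alone is sufficient.
flinch = ActionNode("flinch", threshold=0.5)

if __name__ == "__main__":
    print(open_door.responds(0.6))        # perception only
    print(open_door.responds(0.6, 0.5))   # perception plus conceptual input
    print(flinch.responds(0.6))           # reflex from perception only
```

Modeling reflexes as low-threshold nodes is one possible reading; strong perceptual-to-action connection weights would give the same qualitative result.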

5.1.1 Further motivation for modeling the O and R nodes

Slater & Steed (2000) emphasize the difference between experiencing data originating from the “real world” (which they term being in an R state), and experiencing data emanating from a VE display (which they term being in a V state). Specifically, they note that data from a VE display will contain glitches in image quality, sudden changes in frame rates, etc. This type of display artifact clearly creates a distinction between VE data and “real world” data, but we think the distinction is more subtle than “real world”/VE. Our decision to create a separate apparatus for detecting objects and renditions of objects is based largely upon the work of Gibson (1979). Gibson suggests that the rendition of an object is not simply a poor quality reproduction of an object, which contains less information than the object itself. Rather, Gibson points out, a rendition is often identified as such by the fact that the rendition contains information which the object itself does not. For example, we can appreciate the difference between an apple and the photograph of an apple not simply due to the failings of the photograph to capture all the information present in the apple (such as depth information, its hardness, smell, etc.), but also due to the fact that a photograph of an apple will also contain features of the photograph such as glossy reflections, the grain of the film, and perhaps the smell of photographic chemicals, which the object does not contain. Gibson further points out that it is possible to fool the visual system into perceiving a rendition when it is in fact an object on display. As an example, Gibson presents a study in which researchers prepared a window of a house so that a majority of subjects reported it as being a framed photograph of a garden rather than a garden itself.

Based on these results, we reject the notion that a continuum of quality or amount of information exists between an object and a rendition of that object, and suggest instead that each of these is perceived separately. We propose that the process of deciding if a percept is an object or a rendition occurs as one of the final steps of perception (due to its relatively high level of abstraction), and we thus place it at the top of the perceptual analyzer. Furthermore, we allow the ‘O nodes’ and ‘R nodes’ to connect to different action layer nodes, based on the notion that the response to an object will tend to be different from the response to the rendition of that object. For instance, on perceiving an apple (object), my first reaction might be to smell it, but on perceiving a photograph of an apple (rendition of the object), my first reaction would probably not be to smell it, but rather a different response.

This same idea was recently expressed by Slater (2002b) as the notion that presence occurs as a process of selecting between two competing hypotheses: that the percept represents either a real scene or a virtual scene. In Slater’s view, the scene presents evidence to the viewer which may or may not convince them that the scene they perceive is real. Our conceptual layer can be thought of as an explanation of the mechanism by which this happens. The O node represents the hypothesis that the scene (or a particular aspect of the scene) is real, and the R node represents the hypothesis that the scene is not real. The level of activation in each node represents the degree of confidence that the hypothesis is true, and the final determination of which of the two is true occurs as a process of activation and mutual inhibition between the two nodes.

As Slater (2002b) reminds us, it is usually not the case that a percept is identified simultaneously as both an object and a rendition, as this would lead to ambiguity and confusion. For example, a wax apple (rendition) might be mistaken for an apple (object), but once it has been identified as a wax apple, it is not likely to be perceived as a real apple. This disambiguation feature of perception is important, and we model it by placing the O and R nodes on the same layer, and allowing inhibitory connections between them. These connections allow disambiguation by allowing the node which is more activated to inhibit and thus overpower the other, ensuring that at any moment, one of the two is dominant over the other.
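The competition between the two nodes can be sketched with a small simulation. This is a toy system under stated assumptions, not the update rule of the model: the `settle` function, the leaky update, the clamping of activation to the range 0 to 1 and all parameter values are hypothetical, with the inhibition weight chosen strong enough that the weaker node is driven to zero in this example.

```python
def clamp(value, low=0.0, high=1.0):
    """Keep an activation within the assumed bounds."""
    return max(low, min(high, value))


def settle(o_input, r_input, inhibition=1.5, rate=0.2, steps=200):
    """Iterate two mutually inhibiting nodes until they settle.

    o_input and r_input stand for the evidence that the percept is an
    object or a rendition. Each node is excited by its own evidence and
    inhibited in proportion to the other node's current activation.
    """
    o = r = 0.0
    for _ in range(steps):
        # Net input is computed for both nodes before either is updated.
        net_o = o_input - inhibition * r
        net_r = r_input - inhibition * o
        o = clamp(o + rate * (net_o - o))
        r = clamp(r + rate * (net_r - r))
    return o, r


if __name__ == "__main__":
    # Slightly more evidence for "object" than for "rendition".
    o, r = settle(o_input=0.7, r_input=0.5)
    print(f"O node: {o:.3f}, R node: {r:.3f}")
```

With these values the O node settles near its input level while the R node is suppressed, so a modest difference in evidence produces an unambiguous outcome. Swapping the two inputs gives the mirror-image result.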

5.1.2 Implications of the perceptual layer model

a) Using an interactive activation and competition model allows for perception to contribute to presence by degrees rather than in a dichotomous “on/off” style. Evidence that perception affects presence in this way is contributed by Lessiter, Freeman and Davidoff (2001), who conducted a study into the effect of varying the quality of audio playback on presence. They created four levels of audio quality, and found that presence increased with each successive level of quality. This suggests that presence varies continuously as a function of the perceptions by a single modality. The variable activation connections between each perceptual analyzer and the action layer in this model allow for the replication of this effect.

b) This model allows modeling of the benefits of multimodality to presence (as demonstrated, for example, by Sallnäs (1999), who found that presence in a multimodal environment was superior to that in a unimodal environment) by modeling each of the sensory input channels as a separate, independent, perceptual analyzer, each of which is independently capable of contributing activation to the action layer.

c) This model also explains how a stimulus which is initially perceived as a rendition can, with increased exposure, later be perceived as an object. This effect is demonstrated by Nass, Steuer & Tauber (1994). They asked a group of subjects to engage in a text-based dialog with a computer agent, aware that the agent was only a program, and not another human user. After a period of interaction with the agent, Nass et al. found that the subjects’ interaction style changed: they behaved towards the agent as if they were interacting with another person (for instance, by using polite expressions). This finding suggests that although the subjects began by perceiving the program as simply a rendition of a conversation with another person, exposure to the system changed the perception to that of an actual conversation with a person.

This type of effect is modeled by postulating separation between the O node and the R node, each with its own connections to the action layer, and by postulating an inhibitory connection between the O and R nodes to allow disambiguation between the two.


d) Each of the O and R nodes has its own connections to the action layer. This models the possibility of different reactions based on whether the stimulus is perceived as an object or a rendition. This effect can be seen, for example, in the difference in postural reactions which can be caused by changing from a simple projected display (which is more likely to be perceived as a rendition) to a stereoscopic projected display (which is more likely to be perceived as an object), reported by Freeman et al. (2000).
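Implications (a) and (d) can be illustrated together with a minimal sketch in which the O and R nodes have separate weighted connections to the action layer. The action names borrow from the apple example earlier in this section; the weights, the dictionaries and the function names are hypothetical and serve only to show graded, node-specific activation.

```python
# Hypothetical connection weights from each node to action-layer nodes.
O_CONNECTIONS = {"smell it": 0.9, "pick it up": 0.7}
R_CONNECTIONS = {"look more closely": 0.8, "pick it up": 0.3}


def action_activations(o_activation, r_activation):
    """Spread graded activation from the O and R nodes to the action layer."""
    actions = {}
    for action, weight in O_CONNECTIONS.items():
        actions[action] = actions.get(action, 0.0) + weight * o_activation
    for action, weight in R_CONNECTIONS.items():
        actions[action] = actions.get(action, 0.0) + weight * r_activation
    return actions


def strongest_action(actions):
    """Return the most activated action node."""
    return max(actions, key=actions.get)


if __name__ == "__main__":
    # Percept identified as an object versus as a rendition.
    print(strongest_action(action_activations(1.0, 0.0)))
    print(strongest_action(action_activations(0.0, 1.0)))
    # Halving the O activation halves its contribution (graded, not on/off).
    print(action_activations(0.5, 0.0))
```

Because the weights scale the incoming activation, a weaker percept yields proportionally weaker action activation rather than none at all, and the same percept favors different responses depending on which node is dominant.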