It is important to realize, however, that the mere existence of topographically arranged sensory ‘maps’ does not explain our ability to locate features in space. That is, the simple fact that certain cells in some map (or set of maps) respond selectively to modality-specific stimuli at a particular receptorally defined location does not explain how we perceive or experience those stimuli as being located in an external three-dimensional space around our body. Nor does it explain how we represent spatial contents. In other words, the mere existence of such maps fails to explain the structure of our spatial phenomenology. To provide such an account, we first need to examine how the brain coordinates different types of maps.
For example, the structure of our phenomenal experience of visual space is not retinotopic, but rather is egocentric. However, topographically organized retinotopic maps in the visual system cannot locate stimuli in visual egocentric space by themselves, because the same retinal coordinates can correspond to different egocentric locations depending on which way one’s eyes are pointing. Thus, one must coordinate information regarding the location of retinal stimulation with information regarding the position of one’s eyes in order to begin to locate a stimulus in a visual egocentric space.
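The point that retinal coordinates underdetermine egocentric location can be illustrated with a deliberately simplified sketch. It assumes a one-dimensional model in which a stimulus's direction relative to the line of gaze and the eye's rotation in the head are both expressed as horizontal angles in degrees, with positive values to the right, so that the head-centred direction is their sum. The function name and the sign convention are illustrative assumptions, not part of any account cited here, and nothing in the sketch is meant to describe how the brain actually carries out the coordination.

```python
def egocentric_azimuth(retinal_azimuth_deg: float, eye_azimuth_deg: float) -> float:
    """Head-centred horizontal direction of a stimulus in a toy 1-D model.

    retinal_azimuth_deg: stimulus direction relative to the line of gaze
                         (hypothetical convention: positive = rightward).
    eye_azimuth_deg:     rotation of the eye in the head, same convention.
    """
    # Assumption: directions simply add in this one-dimensional simplification.
    return retinal_azimuth_deg + eye_azimuth_deg


# The same retinal location paired with two different eye positions
# yields two different egocentric directions.
same_retinal_location = 10.0
with_eyes_left = egocentric_azimuth(same_retinal_location, -20.0)
with_eyes_right = egocentric_azimuth(same_retinal_location, 20.0)
print(with_eyes_left, with_eyes_right)
```

In this toy model the retinal value alone fixes nothing about where the stimulus lies relative to the head; only the pair of values does.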
Indeed, even the retinotopic map found in V1, the earliest stage of processing in the visual cortex, is not simply a mirror image of what is occurring at the retina. Rather, recent work on size constancy (Murray et al., 2006; Fang et al., 2008; Sperandio et al., 2012; Pooresmaeili et al., 2013) has shown that the degree of eccentricity of activation in V1’s retinotopic map more closely reflects the perceived size of an image than its retinal image size. That is, the size of the image projected onto the retina by a stimulus can remain constant while the perceived size of the stimulus varies widely, with such variations precisely corresponding to the degree of eccentricity in V1 activation. This effect is most likely due to modulation of V1 by other brain areas that process other kinds of visual information (such as linear perspective cues, information about eye position and focus, binocular disparity information, and so on). It is this mechanism that underlies our capacity to perceive a particular object as having the same size irrespective of viewing distance, and to perceive objects of different sizes as differing in size even when, because of a difference in their respective distances, they project equally sized images on the retina.
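The geometric fact behind size constancy, namely that retinal image size alone does not determine object size, can be made concrete with a short sketch. It uses only the standard visual-angle relation for an object viewed frontally (angle = 2·atan(size / (2·distance))). The function names and the example values are illustrative assumptions; the sketch shows the geometry only and makes no claim about the neural computation reported in the studies cited above.

```python
import math


def visual_angle_deg(size: float, distance: float) -> float:
    """Visual angle (degrees) subtended by a frontal object of a given size."""
    return math.degrees(2 * math.atan(size / (2 * distance)))


def size_from_angle(angle_deg: float, distance: float) -> float:
    """Object size implied by a visual angle once a distance estimate is supplied."""
    return 2 * distance * math.tan(math.radians(angle_deg) / 2)


# Two objects of different sizes at different distances (arbitrary units)
# subtend the same visual angle, i.e. project equally sized retinal images.
near_angle = visual_angle_deg(size=1.0, distance=10.0)
far_angle = visual_angle_deg(size=2.0, distance=20.0)
print(math.isclose(near_angle, far_angle))

# Combining the same angle with different distance information
# recovers different object sizes.
print(size_from_angle(near_angle, 10.0), size_from_angle(near_angle, 20.0))
```

The same angular extent is compatible with indefinitely many object sizes; it is only when distance information is coordinated with it that a determinate size results.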
This sort of coordinating of different types of visual spatial information is an example of what Grush (2000) calls “stabilization-coordination” (or s-coordination), which involves establishing a relationship between different types of sensory and motor arrays in order to “stabilize” the elements of those arrays for the purposes of forming a higher-order representation of those elements. This sort of coordination typically comes into play when coordinating sensory and motor information for the purpose of constructing a spatial map that is modality-specific. Furthermore, the resulting map is a “higher-order” representation insofar as it involves the construction of a spatial representation whose coordinate scheme is not specifically defined in terms of receptor stimulation.
Grush contrasts s-coordination with what he calls “coincidence-coordination” (or c-coordination), which is the process of coordinating two or more different sensory maps that overlap in their parts or subparts in order to create a higher-order, “virtual” map.93 This sort of coordination comes into play when, for example, one is coordinating spatial maps in different modalities; that is, in order to represent the fact that the region of space represented by our visual system and the region of space represented by our auditory system are in fact the same region of space. (Or, for example, that a stimulus in the region of space represented by our auditory system is to the left of the region of space currently being represented by our visual system.) In one sense, the resulting map is “higher-order” because it is not tied to any particular modality. However, the map is also “higher-order” in the sense that it may not actually be physically instantiated in any single location in the brain (or anywhere, for that matter). That is, there need not be some topographically organized brain region wherein maps of different sensory modalities are coordinated with one another. Rather, these “virtual” maps exist only as distributed, higher-order representations.
In what follows, I will describe an example of Grush-style multi-modal c-coordination of spatial maps. I have already given a rough sketch of how the visual system s-coordinates some kinds of visual information (specifically, the coordination of eye position with retinal coordinates and the coordination of various types of size constancy information) to ‘stabilize’ elements of a higher-order map of location in visual egocentric space. Next, I will describe mechanisms of spatial localization in the auditory system, focusing on the way in which different types of auditory information are s-coordinated in order to locate auditory stimuli in space. I will then proceed to show how the visual system and auditory system maps can be c-coordinated (along with motor information) to provide a higher-order representation of multi-modal space. (Admittedly, this will be a vastly oversimplified account.) Importantly, as we shall see, the crucial ingredient for c-coordination of different maps is the existence of a common frame of reference that allows the brain to integrate different kinds of sensory and motor information into an inter-translatable coordinate scheme.
93 To illustrate this idea, Grush (2000) suggests the example of c-coordinating a map of California with a map of Oregon: “a map of California and a map of Oregon, so long as each includes at least a bit of the surrounding region, can be coordinated by identifying these regions – the little bit of northern California on the southern end of the Oregon map is identified with the northern California on the California map, etc. One c-coordinates the two partial maps in order to construct a larger, higher order map. This higher order map may be virtual, in the sense that there is no need to actually physically abut the maps. The two component maps might even be at very different scales, and thus impossible to physically join so as to get a viable physical map.” (p.67)