We have discussed Biederman’s (1987) viewpoint-invariant theory, according to which ease of object recognition is unaffected by the observer’s viewpoint. In contrast, viewpoint-dependent theories (e.g., Tarr & Bülthoff, 1995, 1998) assume that changes in viewpoint reduce the speed and/or accuracy of object recognition. According to such theories, “Object representations are collections of views that depict the appearance of objects from specific viewpoints” (Tarr & Bülthoff, 1995). As a consequence, object recognition is easier when an observer’s view of an object corresponds to one of the stored views of that object.
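The idea that recognition is easier when the observed view is close to a stored view can be illustrated with a toy sketch. It is not Tarr and Bülthoff's model: it assumes rotation about a single axis measured in degrees and a cost that grows linearly with the mismatch, and all function and parameter names (`angular_difference`, `nearest_stored_view`, `recognition_cost`, `cost_per_degree`) are hypothetical.

```python
def angular_difference(a, b):
    """Smallest angle in degrees between two orientations (0 to 180)."""
    diff = abs(a - b) % 360
    return min(diff, 360 - diff)


def nearest_stored_view(observed, stored_views):
    """Return (stored_view, mismatch) for the stored view closest to the observed one."""
    candidates = ((view, angular_difference(observed, view)) for view in stored_views)
    return min(candidates, key=lambda pair: pair[1])


def recognition_cost(observed, stored_views, cost_per_degree=1.0):
    """Toy cost of recognition: proportional to the mismatch with the nearest stored view.

    The linear relation is an assumption made for illustration only.
    """
    _, mismatch = nearest_stored_view(observed, stored_views)
    return cost_per_degree * mismatch


# An object stored from two viewpoints (0 and 90 degrees, hypothetical values).
stored = [0, 90]
for observed in (0, 45, 90, 350):
    print(observed, recognition_cost(observed, stored))
```

In this sketch the cost is zero whenever the observed view coincides with a stored view and rises as the observed view moves away from every stored view, which is the qualitative pattern that viewpoint-dependent theories predict.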
Object recognition is sometimes viewpoint-dependent and sometimes viewpoint-invariant. According to Tarr and Bülthoff (1995), viewpoint-invariant mechanisms are typically used when object recognition involves making easy categorical discriminations (e.g., between cars and bicycles). In contrast, viewpoint-dependent mechanisms are more important when the task requires difficult within-category discriminations (e.g., between different makes of car).
Evidence consistent with this general approach was reported by Tarr, Williams, Hayward, and Gauthier (1998). Across nine experiments, they considered recognition of the same three-dimensional objects under various conditions. Performance was close to viewpoint-invariant when the object recognition task was easy (e.g., when detailed feedback was given after each trial). However, it was viewpoint-dependent when the task was difficult (e.g., when no feedback was provided).
Object recognition is rather flexible. As Hayward and Tarr (2005) pointed out, you could put a working light-bulb on top of almost any object, and perceive it to be a lamp.
Vanrie, Béatse, Wagemans, Sunaert, and van Hecke (2002) also found that task complexity influenced whether object recognition was viewpoint-dependent or viewpoint-invariant. Observers saw pairs of three-dimensional block figures in different orientations, and decided whether they represented the same figure (i.e., matching or non-matching). Non-matches were produced in two ways:
(1) An invariance condition, in which the side components were tilted upward or downward by 10°.
(2) A rotation condition, in which one object was the mirror image of the other (see Figure 3.10).
Vanrie et al. predicted that object recognition would be viewpoint-invariant in the much simpler invariance condition, but would be viewpoint-dependent in the more complex rotation condition.
What did Vanrie et al. (2002) find? As predicted, performance in the invariance condition was not influenced by the angular difference between the two objects (see Figure 3.11). Also as predicted, performance in the rotation condition was strongly viewpoint-dependent because it was greatly affected by alteration in angular difference (see Figure 3.11).
Figure 3.10 Non-matching stimuli in (a) the invariance condition and (b) the rotation condition. Reprinted from Vanrie et al. (2002), Copyright © 2002, with permission from Elsevier.
Figure 3.11 Speed of performance in (a) the invariance condition and (b) the rotation condition as a function of angular difference and trial type (matching vs. non-matching). Each panel plots mean reaction time (ms) against the angular difference between objects in degrees. Based on data in Vanrie et al. (2002).
Blais, Arguin, and Marleau (2009) argued that some kinds of visual information about objects are processed in the same way, regardless of rotation. In contrast, the processing of other kinds of visual information does depend on rotation. They obtained support for that argument in studies on visual search. Some visual processing (e.g., conjunctions of features) was viewpoint-invariant, whereas other visual processing (e.g., depth processing) was viewpoint-dependent.
Some theorists (e.g., Foster & Gilson, 2002; Hayward, 2003) argue that viewpoint-dependent and viewpoint-invariant information are combined co-operatively to produce object recognition. Supporting evidence was reported by Foster and Gilson (2002). Observers saw pairs of simple three-dimensional objects constructed from connected cylinders (see Figure 3.12), and decided whether the two images showed the same object or two different ones. When two objects were different, they could differ in a viewpoint-invariant feature (i.e., number of parts) and/or various viewpoint-dependent features (e.g., part length, angle of join between parts). The key finding was that observers used both kinds of information together. This suggests that we make use of all available information in object recognition.
Evaluation
We know now that it would be a gross oversimplification to argue that object recognition is always viewpoint-dependent or viewpoint-invariant. The extent to which object recognition is primarily viewpoint-dependent or viewpoint-invariant depends on several factors, such as whether between- or within-category discriminations are required, and more generally on task complexity. The notion that all the available information (whether viewpoint-dependent or viewpoint-invariant) is used in parallel to facilitate object recognition has received some support.
Most of the evidence suggesting that object recognition is viewpoint-dependent is rather indirect. For example, it has sometimes been found that the time required to identify two objects as the same increases as the amount of rotation of the object increases (e.g., Biederman & Gerhardstein, 1993). All that really shows is that some process is performed more slowly when the angle of rotation is greater (Blais et al., 2009). That process may occur early in visual processing. If so, the increased reaction time might be of little or no relevance to the theoretical controversy between viewpoint-dependent and viewpoint-invariant theories. In the next section, we consider an alternative approach to object recognition based on cognitive