In the field of robotics, object recognition is almost exclusively considered to be a computer vision problem. Research in psychology and cognitive science, however, highlights the importance of sensory modalities other than vision for object recognition tasks. For example, Sapp et al. (2000) describe a study in which toddlers were presented with a sponge that was deceptively painted as a rock. As expected, the toddlers believed that the object was a rock until the moment they interacted with it (by touching it or picking it up). This and several other studies (Heller, 1992) illustrate that proprioceptive information (i.e., how objects feel when lifted or pushed) can be very useful when vision alone is insufficient. Studies have also shown that tactile exploratory behaviors are commonly used by infants when exploring a novel object (Ruff, 1984). For example, Stack and Tsonis (1999) have reported that, in the absence of visual cues, 7-month-old infants use more efficient tactile exploratory strategies and can perform tactile surface recognition to some extent.

Natural sound is also an important source of cues about objects. The work of Gaver (1993) and Grassi (2005) has shown that even when a direct line of sight is not available, humans can extract the physical properties of objects from the sounds that they produce. The importance of everyday natural sounds is perhaps best summarized by Don Norman in his book “The Design of Everyday Things”:

“[. . . ] natural sound is as essential as visual information because sound tells us about things that we can’t see, and it does so while our eyes are occupied elsewhere.

Natural sounds reflect the complex interaction of natural objects: the way one part moves against another; the material of which the parts are made – hollow or solid, metal or wood, soft or hard, rough or smooth.” (Norman, 1988, p. 103)

According to Gaver (1993), the ecological approach to perception provides the insight that listening consists of perceiving the properties of the sound’s source (e.g., bouncing ball, car engine, footsteps, etc.), rather than the properties of the sound itself (e.g., pitch, tone, etc.).

These insights have been confirmed by multiple experimental studies. For example, Giordano and McAdams (2006) demonstrated that humans can accurately recognize an object’s material (e.g., wood, glass, steel, or plexiglass) when listening to the sounds generated when the object is struck. Sound also allows us to perceive many physical properties of objects. Grassi (2005) showed that human subjects were able to provide reasonably good estimates for the size of a ball dropped on a plate by simply listening to the impact sound.

In addition to perceiving the physical properties of objects, non-visual sensory modalities are also useful for object individuation. Wilcox et al. (2006) describe several experiments documenting how infants use auditory information when figuring out whether two stimuli are produced by the same object or by two different objects. Their findings show that sounds that reveal the physical properties and the structure of objects (e.g., rattling sounds) are more useful for individuation than sounds that do not (e.g., tones produced by an electric keyboard).

In a follow-up study, Wilcox et al. (2007) conducted experiments that showed how prior experience with an object in the tactile sensory domain can subsequently improve an infant’s object individuation performance when using color alone. More specifically, their results revealed that combined tactile and visual exploration of objects increases 10.5-month-old infants’ sensitivity to color differences on an object individuation task. According to Wilcox et al. (2007), one possible explanation for this observation is that combined visual and tactile exploration of objects produces more detailed and robust object representations than the ones attained when using visual exploration alone. In fact, other research in psychology has shown that object exploration in a natural setting (as opposed to a research lab) is an inherently multi-modal process. Consider the simple act of touching an object. In Chapter 4 of “Tactual Perception: A Sourcebook”, Lederman writes:

“Perceiving the texture of a surface by touch is a multi-modal task in which information from several different sensory channels is available. In addition to cutaneous and thermal input, kinesthetic, auditory, and visual cues may be used when texture is perceived by touching a surface. Texture perception by touch, therefore, offers an excellent opportunity to study both the integrated and the independent actions of sensory systems. Furthermore, it can be used to investigate many other traditional perceptual functions, such as lateralization, sensory dominance, and integration masking, figural aftereffects, and pattern recognition.” (Lederman, 1982, p. 131)

Indeed, Lynott and Connell (2009) have shown that humans require the use of two or more sensory modalities to accurately represent many object properties (e.g., texture, stiffness, and material type). This finding suggests that humans can integrate feedback from multiple channels of information in an efficient manner when perceiving objects. Ernst and Bülthoff (2004) provide some details on how this is done based on an experimental study in which human participants were tasked with inferring an object’s height using both proprioceptive and visual feedback. Their results suggest that humans use a weighted combination of the predictions of the two modalities, where the weights are proportional to the estimated reliability of each modality (Ernst and Bülthoff, 2004). The weighted combination ensures that a sensory modality that is not useful in a given context will not dominate over other more reliable channels of information.
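The reliability-weighted combination described above can be illustrated with a minimal sketch. This is only an illustration of the general idea, not the procedure used in the cited study or in this dissertation: the function name `combine_estimates`, the example height values, and the choice to express reliability as a plain non-negative number (for instance, the inverse of an estimate's variance) are all assumptions made for the example.

```python
from typing import List, Sequence, Tuple


def combine_estimates(
    estimates: Sequence[float], reliabilities: Sequence[float]
) -> Tuple[float, List[float]]:
    """Combine per-modality estimates using reliability-proportional weights.

    Hypothetical helper: `reliabilities` are assumed to be non-negative
    scores (e.g., inverse variances); they are normalized to sum to 1.
    """
    if len(estimates) != len(reliabilities):
        raise ValueError("one reliability score is needed per estimate")
    if any(r < 0 for r in reliabilities):
        raise ValueError("reliability scores must be non-negative")
    total = sum(reliabilities)
    if total <= 0:
        raise ValueError("at least one modality must have positive reliability")

    # Each weight is proportional to the reliability of its modality.
    weights = [r / total for r in reliabilities]
    combined = sum(w * e for w, e in zip(weights, estimates))
    return combined, weights


if __name__ == "__main__":
    # Illustrative values only: a visual height estimate of 5.0 cm with
    # reliability 4.0 and a proprioceptive estimate of 6.0 cm with
    # reliability 1.0.
    height, weights = combine_estimates([5.0, 6.0], [4.0, 1.0])
    print(f"weights: {weights}, combined height: {height:.2f} cm")
```

With these made-up numbers the visual estimate receives four times the weight of the proprioceptive one, so the combined height lies close to 5.2 cm. A modality whose reliability is zero in a given context receives zero weight and therefore cannot dominate the result.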

Inspired by these findings from psychology, this dissertation shows that a robot’s ability to represent objects and perceive their properties may be greatly improved if the robot can experience the objects through a wide variety of sensory modalities. More specifically, this dissertation aims to show that many object properties can only be grounded successfully if the robot is allowed to use non-visual sensory feedback. Indeed, the studies described in this dissertation have already shown that a robot can recognize objects and their properties using auditory, proprioceptive, and tactile feedback, provided that the robot can estimate the reliability of each modality in different sensorimotor contexts (see Chapters 4 and 5).

2.1.4 Object Perception using Exploratory Behaviors

One way in which humans leverage information from different sensory modalities is through the use of what psychologists call exploratory behaviors (Power, 2000) or exploratory procedures (Lederman and Klatzky, 1990). In his book, “Play and Exploration in Children and Animals”, Power writes:

“[ . . . ] exploratory behavior in infancy and childhood appears to serve an information-gathering function. Using a variety of methods, researchers have demonstrated that during exploration infants and young children extract at least short-term information about the characteristic of objects, including information about texture, hardness, weight, shape, size, and sound potential.” (Power, 2000)

Infants’ use of exploratory behaviors when learning about objects is tightly connected to their ability to detect sensory events that occur over the course of object manipulation. Gibson (1988) concludes that our basic knowledge about how objects behave in the natural world is gathered through constant observation of how objects are affected by our own actions during play. In other words, when exploring an object, infants observe perceptual outcomes (e.g., sounds and movement patterns) that are subsequently used to form expectations about how an object behaves when a specific action is applied to it (Gibson, 1988).

Numerous experiments in psychology have investigated how such expectations are formed and how they are used to anticipate events in the future. For example, Hauf and Aschersleben (2008) have shown that 9-month-old infants can predict the occurrence of auditory and visual events that occur after pressing a button. The same line of research has even shown that exploratory behaviors may have a role in the early social development of infants. An experiment by Hauf et al. (2007) investigated infants’ interest in the actions of others and showed that infants are more interested in watching another person manipulate an object if they themselves have had a chance to explore the object beforehand.

Other research has studied how exploratory behaviors enable infants to ground object properties in their own experience with objects. In a study by Paulus and Hauf (2011), 11-month-old infants were initially exposed to objects of two different materials, one heavy and one light, and after exploring the objects through manipulation, the infants showed preference for the lighter objects. Furthermore, at 13 months, the infants were able to associate the visual appearance of objects with their material type and used that knowledge to show preference towards novel objects made of the lighter material (Paulus and Hauf, 2011).

Combined, these studies show that the ability to apply exploratory behaviors on objects is fundamental to the development of motor, perceptual, and social skills in infancy. An important question is how systematic object exploration strategies emerge over the course of infant development. Power notes:

“[ . . . ] exploratory behaviors become more planful and systematic and are less driven by stimulus characteristics, with increasing child age. Moreover, the use of systematic exploratory strategies is associated with greater information yield.” (Power, 2000)

Thus, when infants first start exploring objects through actions, their behaviors tend to be random and seemingly without an intended purpose or plan. As the infant develops, however, exploration strategies become more systematic and show greater levels of intent. This progression is likely mediated by the acquisition of object knowledge, which serves to guide the application of specific exploratory behaviors intended to uncover specific object properties.

The research described in this dissertation is largely inspired by these findings from psychology. Therefore, the robot in this work explored objects using a wide variety of behaviors, many of them modeled after the ones performed by infants, toddlers, and young children (e.g., scratching, shaking, pushing, grasping, etc.). The studies described here have indeed shown that by using exploratory behaviors, a robot may recognize objects (see Chapters 4 and 5), solve the odd-one-out task (see Chapter 6), as well as assign category labels to novel objects (see Chapter 7).