
Most object categorization methods in robotics fall into one of two broad categories: 1) unsupervised methods, in which objects are categorized using unsupervised machine learning algorithms (e.g., k-means or hierarchical clustering), and 2) supervised methods, in which a training set of objects is annotated with the correct labels and used to train a recognition model that can label new data points.

Several lines of research have demonstrated methods that enable robots to autonomously form internal object categories based on direct interaction with objects (Nakamura et al., 2007; Griffith et al., 2012; Dag et al., 2010; Sun et al., 2010b). For example, Griffith et al. (2012) showed how a robot can use the frequencies of auditory and visual events in order to distinguish between container and non-container objects. Dag et al. (2010) and Sinapov and Stoytchev (2008) have also shown that, through interaction with objects, robots can learn to categorize and relate objects based on the types of effects that they produce as a result of the robot’s actions.

In contrast, supervised methods for object categorization attempt to establish a direct mapping between the robot’s object representation and human-provided semantic category labels. A wide variety of computer vision methods have been developed that attempt to solve this problem using visual image features coupled with machine learning classifiers (Fergus et al., 2004; Ponce, 2006; Opelt et al., 2006). Several such methods have been developed for use by robots, almost all working exclusively in the visual domain (Lopes and Chauhan, 2007; Lai and Fox, 2009; Marton et al., 2009; Wohlkinger and Vincze, 2010; Leonardis and Fidler, 2011; Lai et al., 2011a). One advantage of visual object classifiers is that they can often be trained offline on large image datasets. Nevertheless, they cannot capture object properties that are not always perceivable through vision alone (e.g., object compliance or object material). In other words, disembodied object category representations that are grounded solely in visual input cannot capture object properties that require active interaction with an object. To address this limitation, the robot in this research grounded the semantic category labels of objects in its own sensorimotor experience with them, which is in stark contrast with approaches that rely purely on computer vision datasets.

Indeed, several studies have already demonstrated some ability of robots to assign category labels to objects based on interaction with them (Takamuku et al., 2007; Sinapov and Stoytchev, 2009; Araki et al., 2011; Chitta et al., 2011). For example, Takamuku et al. (2007) demonstrated that a robot can classify 9 different objects as either a rigid object, a paper object, or a plastic bottle using auditory and joint angle data obtained while the robot shook the objects. Araki et al. (2011) described a robot that learned to associate words describing an object (e.g., “cup”) with object clusters discovered using an unsupervised method. Sinapov and Stoytchev (2009) showed that by applying five different exploratory behaviors on 36 objects, a robot may learn to recognize their material type and whether they are full or empty, based on the auditory feedback produced by the objects.

More recently, we proposed a graph-based learning method that allows a robot to estimate the category label of an object based on pairwise object similarity relations estimated from different couplings of five exploratory behaviors and two sensory modalities (Sinapov and Stoytchev, 2011). In that experiment, the robot was able to classify 25 objects according to object categories such as plastic bottles, objects with contents, pop cans, etc. The accuracy was substantially better than chance, despite the fact that visual feedback was not used.
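The general idea of estimating a label from pairwise similarities can be illustrated with a minimal sketch. This is not the graph-based method of Sinapov and Stoytchev (2011); it assumes a much simpler scheme in which one similarity matrix per behavior-modality coupling is averaged and the unlabeled object takes the label with the largest similarity-weighted vote. The function names, the two couplings, and all numeric values are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def combine_similarities(matrices):
    """Average several pairwise similarity matrices element-wise.

    Each matrix is assumed to come from one hypothetical
    behavior-modality coupling (e.g., shake + audio).
    """
    n = len(matrices[0])
    return [
        [sum(m[i][j] for m in matrices) / len(matrices) for j in range(n)]
        for i in range(n)
    ]


def estimate_label(query, similarity, labels):
    """Similarity-weighted vote over the labeled objects.

    labels[i] is None when object i has no known label.
    """
    votes = defaultdict(float)
    for other, label in enumerate(labels):
        if other != query and label is not None:
            votes[label] += similarity[query][other]
    return max(votes, key=votes.get)


# Toy data: 4 objects, 2 couplings; all similarity values are invented.
shake_audio = [
    [1.0, 0.8, 0.1, 0.7],
    [0.8, 1.0, 0.2, 0.6],
    [0.1, 0.2, 1.0, 0.3],
    [0.7, 0.6, 0.3, 1.0],
]
lift_proprioception = [
    [1.0, 0.6, 0.3, 0.9],
    [0.6, 1.0, 0.1, 0.8],
    [0.3, 0.1, 1.0, 0.2],
    [0.9, 0.8, 0.2, 1.0],
]
labels = ["plastic bottle", "plastic bottle", "pop can", None]

combined = combine_similarities([shake_audio, lift_proprioception])
predicted = estimate_label(3, combined, labels)
print(predicted)
```

In this toy example the unlabeled object is more similar to the two plastic bottles than to the pop can under both couplings, so the vote favors "plastic bottle". Averaging the matrices is only one possible way to combine couplings; weighting each coupling by its reliability would be an equally plausible choice.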

Despite all of these advances, current work on category recognition suffers from two broad limitations. First, most object category recognition approaches are entirely vision-based and, as such, are unable to detect object properties that cannot be extracted using vision alone. While some research has focused on using different sensory modalities coupled with actions, most studies to date use a small number of behaviors (typically just one) and a small number of sensory modalities. To address this limitation, the research described here grounds human-provided category labels in a wide variety of robot behaviors and sensory modalities.

Our results indicate that using a large number of different behaviors (10 in this case) coupled with the visual, auditory, and proprioceptive sensory modalities can enable a robot to recognize 20 different categories over a set of 100 different objects (Sinapov et al., 2012).

The second broad limitation of most existing approaches is that they only deal with human-provided semantic labels that can be expressed as unary relations. For instance, any object category can be viewed as a collection of items that share some property (e.g., round or red).

Many semantic labels, however, cannot be expressed with unary relations. For example, the label “taller than” can only be expressed as a binary relation. Furthermore, in most learning tasks, the robot is only tasked with learning to detect the value of a given attribute (e.g., the color of an object). Such a robot would be able to classify a red ball as having the label “red,” but would still be unable to detect that a set of objects vary by the attribute “color.”

To address these limitations, this document describes a relational approach to representing semantic category labels that can handle many types of object relations beyond simple unary object categories (see Chapter 9).
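The distinction drawn above between unary labels, binary labels, and attributes along which a group of objects varies can be made concrete with a small sketch. It is not the relational approach of Chapter 9; the object descriptions, attribute names, and helper functions below are hypothetical and assume that attribute values are already known rather than learned from sensorimotor data.

```python
from itertools import permutations

# Hypothetical object descriptions; all values are invented for illustration.
objects = {
    "ball": {"color": "red", "height_cm": 10},
    "cup": {"color": "blue", "height_cm": 12},
    "box": {"color": "green", "height_cm": 30},
}


def is_red(obj):
    """Unary relation: holds for a single object."""
    return obj["color"] == "red"


def taller_than(a, b):
    """Binary relation: holds for an ordered pair of objects."""
    return a["height_cm"] > b["height_cm"]


def varying_attributes(group):
    """Attributes that take more than one value across a group of objects."""
    attributes = set().union(*(obj.keys() for obj in group))
    return {a for a in attributes if len({obj.get(a) for obj in group}) > 1}


# A unary label picks out a set of objects.
red_objects = {name for name, obj in objects.items() if is_red(obj)}

# A binary label picks out a set of ordered pairs of objects.
taller_pairs = {
    (a, b)
    for a, b in permutations(objects, 2)
    if taller_than(objects[a], objects[b])
}

# Two objects that differ only in color vary by the attribute "color".
same_height = [
    {"color": "red", "height_cm": 10},
    {"color": "blue", "height_cm": 10},
]
print(red_objects, taller_pairs, varying_attributes(same_height))
```

A unary label is thus a set of objects, a binary label is a set of ordered object pairs, and detecting that objects "vary by color" is a statement about a whole group rather than about any single member.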