
Gestures in general are of great importance in humanoid robotics, especially from the point of view of human-robot interaction (Breazeal, 2002). On the one hand, building robots whose physical appearance resembles that of humans is intended to facilitate interaction by catering to our strong tendency toward anthropomorphisation (Duffy, 2003). On the other hand, body language, and hand gestures in particular, are of paramount importance for human communication (Kendon, 1980; McNeill, 1992; Goldin-Meadow, 2003; Hostetter, 2011). It is therefore not surprising that the literature on gestures in human-robot interaction is very large. Since communicative gestures are not the focus of the present work, a thorough review of this broad field is beyond the scope of this thesis. Nevertheless, in recognition of the relevance of the topic, a few prominent robotic studies of gestures are presented below.

Naturally, much of the early work on gestures in humanoid robots focused on tackling the engineering issues that arise already at the level of gesture production.

As an example one can take the work of Marjanović, Scassellati and Williamson (1996), who implemented a visually-guided pointing system in the humanoid robot Cog. The task of the system was to saccade the robot’s eyes to the target and then generate a smooth trajectory for the robot’s arm that would bring it into the configuration corresponding to pointing at the target. To achieve the saccadic behaviour, the robot autonomously learned the mapping between the image plane of its wide-angle camera and the robot-centred frame of reference defined by the pan and tilt camera movements. The realisation of the robot arm movements was based on linear interpolation between four available ‘postural primitives’ and a bi-directional mapping between the visual and motor spaces. Both the ‘ballistic map’ (the mapping from the eye position to the arm position) and the ‘forward map’ (the mapping from the arm position to the eye position) were implemented using the radial basis function approach, and were trained simultaneously with the least-mean-squares gradient descent method based on sample reaching movements and the robot’s visual observation of its own arm.
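The combination of radial basis functions and least-mean-squares training described above can be illustrated with a minimal sketch. It is not the implementation of Marjanović et al.: the toy target function, the grid of centres, the basis width and the learning rate are all assumptions chosen for illustration, and only a single map (eye position to arm position) is trained rather than two maps simultaneously.

```python
import numpy as np

def rbf_features(x, centres, width):
    """Gaussian radial basis activations for each input row in x."""
    # Squared distance between every input and every centre.
    d2 = ((x[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * width ** 2))

def train_lms(x, y, centres, width, lr=0.1, epochs=200, seed=0):
    """Fit linear output weights on RBF features with per-sample LMS updates."""
    rng = np.random.default_rng(seed)
    phi = rbf_features(x, centres, width)
    w = np.zeros((centres.shape[0], y.shape[1]))
    for _ in range(epochs):
        for i in rng.permutation(len(x)):
            err = y[i] - phi[i] @ w          # prediction error for one sample
            w += lr * np.outer(phi[i], err)  # LMS (delta rule) step
    return w

def predict(x, centres, width, w):
    return rbf_features(x, centres, width) @ w

# Toy data (assumption): 2-D "eye" coordinates (pan, tilt) mapped to a 2-D
# "arm" target by an arbitrary smooth function standing in for the robot.
rng = np.random.default_rng(1)
eye = rng.uniform(-1.0, 1.0, size=(200, 2))
arm = np.column_stack([np.sin(eye[:, 0]),
                       0.5 * eye[:, 1] + 0.2 * eye[:, 0] ** 2])

# 5 x 5 grid of basis centres covering the input range (assumption).
grid = np.linspace(-1.0, 1.0, 5)
centres = np.array([[a, b] for a in grid for b in grid])

w = train_lms(eye, arm, centres, width=0.5)
mse = float(np.mean((predict(eye, centres, 0.5, w) - arm) ** 2))
print(f"training MSE: {mse:.5f}")
```

In this sketch only the output weights are learned while the centres stay fixed, which keeps each update a simple linear correction proportional to the prediction error.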

The main limitations of the system were that it did not use all of the degrees of freedom available in the robot and that the workspace covered by the pointing was only two-dimensional.

An important category of communicative gestures is that of representational gestures, which include

movements that represent the content of speech by pointing to a referent in the physical environment (deictic gestures), depicting a referent with the motion or shape of the hands (iconic gestures), or depicting a concrete referent or indicating a spatial location for an abstract idea (metaphoric gestures) (Hostetter & Alibali, 2008, p. 495).

Since such gestures co-occur with and semantically supplement speech, the two together constitute a form of multi-modal communication. Salem, Kopp, Wachsmuth, Rohlfing and Joublin (2012) proposed a system for the production of representational gestures accompanying synthesised speech on the humanoid robot ASIMO (Sakagami et al., 2002), aimed at achieving more natural human-robot interaction (see also Salem, 2012). The system was based on segmentation of the continuous multimodal communication signal into chunks representing single ideas. Temporal synchronisation within chunks was achieved by adapting the gesture production to the timing dictated by the structure of the synthesised speech. The repertoire of available gestures included iconic gestures (which illustrated the shapes or sizes of objects), pantomimic gestures (which demonstrated the activity the robot was referring to verbally), as well as deictic gestures (location indications). The effectiveness of the system was assessed empirically in a study with human participants who interacted with the robot in three conditions (speech without gestures, gestures congruent with speech and gestures incongruent with speech) and evaluated the experience afterwards. Multimodal communication turned out to be associated with higher ratings of the quality of the interaction with the robot, and, interestingly, on most of the criteria (6 out of 8) the scores were higher in the incongruent than in the congruent speech-gesture condition. This led the authors to conclude that ‘imperfect’ communicative behaviour of a robot may lead to an even stronger positive response in humans.
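The idea of adapting gesture production to speech timing within a chunk can be made concrete with a small sketch. The details are assumptions rather than the mechanism of Salem et al.: it supposes that each chunk specifies the onset of the word the gesture accompanies, that the gesture has a preparatory movement of a preferred duration, and that this preparation is shortened when the arm is not free early enough. The names `Chunk` and `schedule_gesture` are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Chunk:
    earliest_start: float   # s, when the arm becomes free (assumed known)
    affiliate_onset: float  # s, onset of the word the gesture accompanies
    nominal_prep: float     # s, preferred duration of the preparatory movement

def schedule_gesture(chunk):
    """Return (gesture start time, preparation duration) for one chunk.

    The speech timing is treated as fixed; only the gesture is adapted.
    """
    # Time available for preparation before the affiliated word begins.
    available = max(0.0, chunk.affiliate_onset - chunk.earliest_start)
    # Compress the preparation if there is not enough time for the nominal one.
    prep = min(chunk.nominal_prep, available)
    return chunk.affiliate_onset - prep, prep

# Enough time: preparation keeps its nominal duration.
relaxed = schedule_gesture(Chunk(earliest_start=0.0, affiliate_onset=1.5, nominal_prep=0.5))
# Too little time: preparation is compressed to fit.
rushed = schedule_gesture(Chunk(earliest_start=1.25, affiliate_onset=1.5, nominal_prep=0.5))
print(relaxed, rushed)
```

Treating the speech timeline as fixed and compressing only the gesture reflects the direction of adaptation stated in the text, in which the gesture follows the timing of the synthesised speech.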

In order to facilitate natural, bi-directional human-robot interaction, robots need not only to produce gestures but also to interpret the body language of others. Hafner and Kaplan (2005) considered the case of pointing gestures as a means of non-verbally directing the other party’s attention. Such a skill is important, for example, as a basis for bootstrapping verbal means of directing attention. Hafner and Kaplan considered a scenario with two Sony AIBO robots. The first robot, the ‘adult’, performed pointing gestures in an attempt to direct the attention of the other robot, the ‘child’, to an object located either to the left or to the right. The task of the child robot was to look at the adult robot and learn to exploit its indications in order to find the object being pointed to. Gesture recognition was achieved using a multi-layer perceptron classifier, trained on a set of 2300 sample images taken with the camera of the child robot while it looked at the adult robot. The images were pre-processed by extracting the features that had been determined, using pruning methods, to be the most suitable for the classification. The classifier was then trained via backpropagation. The success rate of discriminating pointing to the right from pointing to the left reached 95.96% using the selected subset of the image features, and 98.83% when all considered features were used. While limited in several ways, the study was, according to its authors, the first attempt to teach a robot to interpret the pointing gestures of another robot.
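A multi-layer perceptron trained via backpropagation for a binary left/right decision can be sketched as follows. This is an illustration under stated assumptions, not a reconstruction of the classifier of Hafner and Kaplan: the four-dimensional synthetic feature vectors, the hidden layer size, the learning rate and the number of epochs are all invented for the example, and no real image features are involved.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_mlp(x, y, hidden=8, lr=0.5, epochs=500, seed=0):
    """Train a one-hidden-layer perceptron with full-batch backpropagation."""
    rng = np.random.default_rng(seed)
    w1 = rng.normal(0.0, 0.5, size=(x.shape[1], hidden))
    b1 = np.zeros(hidden)
    w2 = rng.normal(0.0, 0.5, size=(hidden, 1))
    b2 = np.zeros(1)
    for _ in range(epochs):
        # Forward pass: tanh hidden layer, sigmoid output.
        h = np.tanh(x @ w1 + b1)
        p = sigmoid(h @ w2 + b2)
        # Backward pass for the mean cross-entropy loss.
        d_out = (p - y) / len(x)
        d_hid = (d_out @ w2.T) * (1.0 - h ** 2)
        # Gradient descent step on all parameters.
        w2 -= lr * (h.T @ d_out)
        b2 -= lr * d_out.sum(axis=0)
        w1 -= lr * (x.T @ d_hid)
        b1 -= lr * d_hid.sum(axis=0)
    return w1, b1, w2, b2

def predict(x, params):
    w1, b1, w2, b2 = params
    return (sigmoid(np.tanh(x @ w1 + b1) @ w2 + b2) > 0.5).astype(int)

# Synthetic stand-in for image features (assumption): two of the four
# features shift with the pointing direction, the other two are pure noise.
rng = np.random.default_rng(1)
n = 200
labels = rng.integers(0, 2, size=(n, 1))   # 0 = left, 1 = right
x = rng.normal(0.0, 0.5, size=(n, 4))
x[:, :2] += 2 * labels - 1                 # shift by -1 (left) or +1 (right)

params = train_mlp(x, labels.astype(float))
accuracy = float((predict(x, params) == labels).mean())
print(f"training accuracy: {accuracy:.3f}")
```

The two noise features play the role of uninformative inputs of the kind that a feature-selection step, such as the pruning mentioned above, would aim to discard.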

Progress in computer vision algorithms as well as in sensing technology enables the construction of increasingly sophisticated gesture recognition systems. Recently, Fanello, Gori, Metta and Odone (2013) proposed a system for recognition of 3-dimensional actions in the context of human-machine interaction, capable of one-shot learning (i.e. learning based on a single example) and of classifying the actions in real time. In addition to colour video data, the system relies on depth information (which can be obtained e.g. using the Microsoft Kinect sensor; Kinect for Windows website, 2013) to identify the region of interest in the input image. According to the authors this significantly reduces the complexity of the system architecture and permits the use of simpler features to describe the performed action without sacrificing discriminative power. The features used to describe the actions in the region of interest are the 3D histogram of flow and the global histogram of oriented gradient. These are followed by a sparse-coding stage, which finally feeds the data to a linear Support Vector Machine-based classifier. Optionally, the system can also use body part tracking data (also available through the Microsoft Kinect sensor) in order to isolate the hands of the person and thus allow the recognition of hand gestures. The system of Fanello et al. is capable of high classification accuracy while retaining real-time performance (25 frames per second on a 2.4 GHz personal computer). It has been applied, for example, in a human-robot interaction scenario in which a human competes with the iCub humanoid robot (see section 4.3) in a gesture memorisation game (Gori, Fanello, Metta & Odone, 2012). In the game the players take turns performing a sequence of gestures, always starting by repeating the gestures made by the opponent in the previous turn and adding one more gesture at the end. The challenge for the human is to memorise the lengthening sequence of gestures correctly, while the robot may lose due to failures in the recognition process. The game was awarded second place in the ChaLearn 2011/2012 One-shot-learning Gesture Challenge demo competition (ChaLearn Gesture Challenge website, 2013).
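The rule of the gesture memorisation game described above can be expressed as a short sketch. Only the turn rule is taken from the text; the function names and the gesture labels are hypothetical, and the recognition step that would produce these labels on the robot is not modelled.

```python
def is_valid_turn(previous, current):
    """A turn must repeat the previous sequence and append exactly one gesture."""
    return (len(current) == len(previous) + 1
            and list(current[:-1]) == list(previous))

def first_invalid_turn(turns):
    """Return the index of the first turn that breaks the rule, or None."""
    sequence = []
    for i, turn in enumerate(turns):
        if not is_valid_turn(sequence, turn):
            return i
        sequence = list(turn)
    return None

# Players alternate turns; even indices belong to one player, odd to the other.
good_game = [["wave"], ["wave", "point"], ["wave", "point", "clap"]]
bad_game = [["wave"], ["point", "wave"]]   # second player misremembers the order

print(first_invalid_turn(good_game))
print(first_invalid_turn(bad_game))
```

For the robot, an invalid turn in this sense can arise either from the opponent's sequence being misrecognised or from its own reproduction failing, which is how recognition errors translate into losing the game.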

3.6 Computational Modelling in Mathematical
