
For robot swarms to be fully aware of a human's location, they must also determine the direction in which the human is facing. This is achieved by estimating the pose of the human's face. The first step of face pose estimation is face detection. After a face has been detected, relative measures from different face poses are computed using a face score system. The face scores are used to determine the angular position φ and the distance d between a human and a UAV. The estimated angular distance (φ, d) is used as a measure for coordinated UAV deployment and human-relative localization, as presented in the next sections.

4.3.2.1 Face Detection

Face detection is an active topic in computer vision due to its significant role in many real-world applications such as face recognition, gaze detection, and pose estimation. However, detecting faces is challenging because of factors such as illumination conditions, facial expressions, and camera position. In general, face detection provides a normalized and user-centric view of humans. In the context of HRI, face detection identifies the direction and visual orientation in which humans are facing, and is useful for determining the relative angular and radial position of humans from the viewpoint of robots.


Figure 4.7. Face pose estimation using the frontal camera of an airborne Parrot. Face windows (bounding boxes) around the detected face of a human. Identified face poses: (a) Right, (b) Center, (c) Left.

Face detection is performed using the OpenCV implementation of the Viola-Jones face detector [Viola and Jones, 2004], which computes a face window (or bounding box) around a detected face. Because face detectors are insensitive to small changes in orientation and position, they can compute multiple face windows around a detected face [Couture-Beil et al., 2010a,b], as shown in Figure 4.7. The number of detected face windows (bounding boxes) represents the recognition confidence of a face detection classifier: the larger the number of detected windows, the more confident the classifier is that a face is present, and vice versa. We use this recognition confidence to estimate the face pose.

In practice, the recognition confidence is obtained by setting the OpenCV face detection parameter minNeighbors (which specifies how many neighbours each candidate window must have to be retained) to its minimal value, so that all groups and subgroups of neighbouring windows clustered around the face are kept. The recognition confidence from OpenCV's face detector is used to build a face score system, which is introduced in the next section.
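The following Python sketch shows how the window count could be read off an OpenCV cascade. It assumes only that the classifier object exposes OpenCV's `detectMultiScale` method; the function name `recognition_confidence` and the `scaleFactor` value of 1.1 are illustrative choices, not taken from the original implementation. With `minNeighbors=0`, OpenCV skips its grouping of overlapping candidates, so every raw window around the face is returned and counted.

```python
from typing import Any

import numpy as np


def recognition_confidence(classifier: Any, gray: np.ndarray,
                           scale_factor: float = 1.1) -> int:
    """Number of raw face windows returned by a Haar cascade for one image.

    minNeighbors=0 disables OpenCV's grouping of overlapping candidates, so
    every window clustered around the face is kept and counted.
    """
    windows = classifier.detectMultiScale(gray, scaleFactor=scale_factor,
                                          minNeighbors=0)
    # OpenCV returns an empty sequence when nothing is found (an empty tuple
    # in common versions) and an (n, 4) array of (x, y, w, h) rows otherwise;
    # len() covers both cases.
    return len(windows)
```

With OpenCV installed, a real classifier could be built from one of its bundled cascades, e.g. `cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_frontalface_default.xml")`, and applied to a grayscale frame; which cascade files the original system loaded is not stated in the text.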

As robots can easily lose a detected face and detect false positives (FPs), a Kalman filter is adopted to smooth the face detection estimates. The best window to use for face tracking is determined with a nearest neighbour strategy that uses the Mahalanobis distance (i.e., a distance computed from the covariance of the detected face windows). In addition, the face centroid $F_{cent}(x, y)$ is computed by averaging the centroids of all detected face windows.
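A minimal sketch of the tracking step, assuming a constant-velocity Kalman filter over the face-window centre in pixel coordinates. The class `FaceTracker`, its noise parameters, and the chi-square gate are hypothetical. In particular, the sketch measures the Mahalanobis distance with respect to the filter's innovation covariance, which is one possible reading of the covariance described above rather than the original formulation.

```python
import numpy as np


def window_centres(windows) -> np.ndarray:
    """Centres (x + w/2, y + h/2) of (x, y, w, h) face windows."""
    arr = np.asarray(windows, dtype=float).reshape(-1, 4)
    return arr[:, :2] + arr[:, 2:] / 2.0


def face_centroid(windows) -> np.ndarray:
    """F_cent(x, y): average of the centres of all detected face windows."""
    return window_centres(windows).mean(axis=0)


class FaceTracker:
    """Constant-velocity Kalman filter on the face centre (assumed model)."""

    def __init__(self, first_centre, q=1.0, r=25.0, gate=9.21):
        self.x = np.array([first_centre[0], first_centre[1], 0.0, 0.0])
        self.P = np.eye(4) * 100.0
        self.F = np.array([[1, 0, 1, 0], [0, 1, 0, 1],
                           [0, 0, 1, 0], [0, 0, 0, 1]], dtype=float)
        self.H = np.array([[1, 0, 0, 0], [0, 1, 0, 0]], dtype=float)
        self.Q = np.eye(4) * q
        self.R = np.eye(2) * r
        self.gate = gate  # chi-square 99% threshold for 2 degrees of freedom

    def step(self, windows) -> np.ndarray:
        # Predict the face centre for this control step.
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        centres = window_centres(windows)
        if len(centres) == 0:
            return self.x[:2].copy()  # face lost: coast on the prediction
        S = self.H @ self.P @ self.H.T + self.R
        S_inv = np.linalg.inv(S)
        innov = centres - self.H @ self.x
        # Squared Mahalanobis distance of every window to the prediction.
        d2 = np.einsum("ij,jk,ik->i", innov, S_inv, innov)
        best = int(np.argmin(d2))
        if d2[best] > self.gate:
            return self.x[:2].copy()  # nearest window too far: likely an FP
        K = self.P @ self.H.T @ S_inv
        self.x = self.x + K @ innov[best]
        self.P = (np.eye(4) - K @ self.H) @ self.P
        return self.x[:2].copy()
```

In use, a tracker would be initialized from the first confident detection and `step` called once per control step with all windows of the current frame; the far-away window in a frame is then ignored instead of pulling the track.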

4.3.2.2 Face Score System

Inspired by the well-known AdaBoost technique [Viola and Jones, 2001], which implements a robust face detector capable of detecting frontal views of faces, we use a combination of two face detectors to detect faces from frontal and lateral views. In practice, every UAV in the swarm uses two pretrained Haar feature-based cascade classifiers (OpenCV face detectors). One Haar classifier, $FC_f$, is trained on frontal views of the face profile, and the other classifier, $FC_s$, is trained on lateral views (left and right views) of the face profile, as illustrated in Figure 4.7. The red face windows show detections from $FC_f$ (i.e., the frontal-view classifier) and the blue face windows are the outcomes of $FC_s$ (i.e., the lateral-view classifier). For every image $i$ acquired by the frontal camera of a Parrot, four relative face measures $Fm = \{Fm_f, Fm_{ff}, Fm_s, Fm_{sf}\}$ are computed:

(i) $Fm_f$ (frontal-view): computed by running classifier $FC_f$ on image $i$.

(ii) $Fm_{ff}$ (frontal-view flipped): image $i$ is flipped horizontally (mirrored) to obtain $i_h$, which is processed by classifier $FC_f$.

(iii) $Fm_s$ (lateral-view): computed by running classifier $FC_s$ on image $i$.

(iv) $Fm_{sf}$ (lateral-view flipped): obtained by running classifier $FC_s$ on $i_h$.
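The four measures can be sketched as follows, assuming two OpenCV-style cascade objects (anything with a `detectMultiScale` method) and a grayscale NumPy image. The helper names are hypothetical. Following the score definition $S_c = Fm_f + Fm_{ff}$, the flipped frontal measure is taken from the frontal classifier, and `np.ascontiguousarray` is used so that the mirrored view is handed to OpenCV as a regular contiguous array.

```python
from typing import Any, Dict

import numpy as np


def count_windows(classifier: Any, image: np.ndarray) -> int:
    # minNeighbors=0 keeps every raw candidate window (see Section 4.3.2.1).
    return len(classifier.detectMultiScale(image, scaleFactor=1.1,
                                           minNeighbors=0))


def face_measures(fc_f: Any, fc_s: Any, image: np.ndarray) -> Dict[str, int]:
    """Four relative face measures Fm for one grayscale image i."""
    image_h = np.ascontiguousarray(image[:, ::-1])  # i_h: mirrored copy of i
    return {
        "Fm_f": count_windows(fc_f, image),     # frontal view
        "Fm_ff": count_windows(fc_f, image_h),  # frontal view, flipped
        "Fm_s": count_windows(fc_s, image),     # lateral view
        "Fm_sf": count_windows(fc_s, image_h),  # lateral view, flipped
    }
```

Running the lateral classifier on the mirrored image is what lets a single profile cascade cover both the left and the right side of the face.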

The four face measures, namely the frontal-view, frontal-view flipped, lateral-view, and lateral-view flipped measures, each represent the number of detected face windows (see Section 4.3.2.1). To give $Fm$ a meaningful representation, a set of three face scores $\{S_c, S_r, S_l\}$ is computed using $S_c = Fm_f + Fm_{ff}$, $S_r = Fm_s$ and $S_l = Fm_{sf}$. The three face scores $\{S_c, S_r, S_l\}$ are used for estimating the face pose: a high score indicates that a face is detected with high confidence, and a low score indicates the opposite. For instance, when a robot is positioned directly in front of a human (i.e., frontal view), the value of $S_c$ is larger than $S_r$ and $S_l$. However, if a robot is positioned towards the left or right of the human (i.e., lateral views), the value of $S_l$ or $S_r$, respectively, is larger than $S_c$. If all three scores are below the threshold $S_{TH}$, we consider that the human is not present in the field of view of the robot, or that the face is too far away to be reliably detected.
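A short sketch of the score computation and the threshold test. Mapping the largest score to a coarse left/centre/right label only illustrates the qualitative behaviour described above (the continuous face pose φ is learned separately with LWPR); the function names and the value of $S_{TH}$ are hypothetical, and "left"/"right" refer to the robot's position relative to the human.

```python
from typing import Dict, Optional


def face_scores(m: Dict[str, int]) -> Dict[str, int]:
    """S_c = Fm_f + Fm_ff, S_r = Fm_s, S_l = Fm_sf."""
    return {"S_c": m["Fm_f"] + m["Fm_ff"], "S_r": m["Fm_s"], "S_l": m["Fm_sf"]}


def coarse_pose(scores: Dict[str, int], s_th: int) -> Optional[str]:
    """Dominant view, or None when every score is below the threshold S_TH."""
    if all(v < s_th for v in scores.values()):
        return None  # human not in view, or face too far away
    best = max(scores, key=scores.get)
    return {"S_c": "center", "S_r": "right", "S_l": "left"}[best]
```

For example, measures of 3, 1, 6 and 2 windows give $S_c = 4$, $S_r = 6$, $S_l = 2$, so the lateral score dominates and the robot is reported to the right of the human as long as $S_{TH} \le 6$.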

The relative distance between a human and a robot is computed as the average area of the face windows,

$$d = \frac{1}{T}\sum_{j=1}^{T} F_A(j),$$

where $F_A = [Fm_{area}(1), \ldots, Fm_{area}(T)]$ contains the areas of all face windows and $T = Fm_f + Fm_{ff} + Fm_s + Fm_{sf}$ denotes the total number of face windows. Large values of $d$ indicate that the robot is close to the human, and small values indicate that the robot is far away.
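The distance measure translates directly into code. In this sketch the input is assumed to pool the $(x, y, w, h)$ windows behind all four face measures; the function name and the behaviour for an empty list are hypothetical choices. Mirroring an image does not change window areas, so windows from $i_h$ can be pooled with those from $i$.

```python
from typing import Sequence, Tuple

Window = Tuple[float, float, float, float]  # (x, y, w, h)


def relative_distance(windows: Sequence[Window]) -> float:
    """d = (1/T) * sum_j F_A(j): the mean area of all T face windows.

    `windows` is assumed to pool the windows behind all four face measures,
    so len(windows) == T == Fm_f + Fm_ff + Fm_s + Fm_sf.
    """
    T = len(windows)
    if T == 0:
        return 0.0  # assumption: no windows is treated as "far away"
    return sum(w * h for (_, _, w, h) in windows) / T
```

Two windows of 40×40 and 60×60 pixels, for instance, give $d = (1600 + 3600)/2 = 2600$ square pixels.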

4.3.2.3 Learning Face Pose Estimates

The face scores and the relative human-robot distance $\{S_c^i, S_r^i, S_l^i, d^i\}$ computed from a single image $i$ are referred to as face pose features. The face pose features represent the estimated angular distance (φ, d) between a human's face and a UAV. We treat face pose estimation as a supervised learning task, in which the objective is to predict the face pose φ in a [0°, 180°] semicircular plane in front of the human operator. To learn and predict face poses, we adopt the Locally Weighted Projection Regression (LWPR) algorithm [Vijayakumar et al., 2002], which belongs to a family of online incremental learning methods that perform piecewise linear function approximation using regression. LWPR is a non-parametric local learning system that uses a mixture of locally linear kernelized regressors; it learns a non-linear regression function with second-order online methods from samples (observations) that arrive incrementally over time. For more details regarding LWPR, refer to [Klanke et al., 2008; Glaude et al., 2011; Vijayakumar et al., 2005].
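Full LWPR, with its partial-least-squares projections and adaptive distance metrics, is too involved for a short example. The sketch below only illustrates the core idea shared by this family of methods: Gaussian receptive fields, each holding a local linear model updated incrementally by weighted recursive least squares, whose predictions are blended by their activations. It is closer to receptive-field weighted regression than to LWPR proper, and every class name, parameter and default value is hypothetical; the library described in [Klanke et al., 2008] is the reference implementation.

```python
import numpy as np


class LocalLinearRF:
    """One receptive field: Gaussian kernel around a centre plus a local
    linear model fitted by weighted recursive least squares (RLS)."""

    def __init__(self, centre, D, forget=0.999):
        self.c = np.asarray(centre, dtype=float)
        self.D = D                                # fixed metric (not adapted)
        self.beta = np.zeros(self.c.size + 1)     # [bias, slopes]
        self.P = np.eye(self.c.size + 1) * 1e4    # RLS inverse covariance
        self.forget = forget

    def activation(self, x):
        diff = x - self.c
        return float(np.exp(-0.5 * diff @ self.D @ diff))

    def _features(self, x):
        return np.concatenate(([1.0], x - self.c))

    def predict(self, x):
        return float(self._features(x) @ self.beta)

    def update(self, x, y, w):
        # Weighted RLS step: a sample with small activation w barely counts.
        z = self._features(x)
        Pz = self.P @ z
        gain = Pz / (self.forget / w + z @ Pz)
        self.beta = self.beta + gain * (y - z @ self.beta)
        self.P = (self.P - np.outer(gain, Pz)) / self.forget


class SimpleLocalRegressor:
    """Incremental mixture of local linear models (simplified stand-in)."""

    def __init__(self, dim, D_scale=1.0, w_gen=0.2, w_min=1e-3):
        self.D = np.eye(dim) * D_scale
        self.w_gen = w_gen  # add a field when no field is activated above this
        self.w_min = w_min  # skip updates for negligible activations
        self.rfs = []

    def update(self, x, y):
        x = np.asarray(x, dtype=float)
        ws = [rf.activation(x) for rf in self.rfs]
        for rf, w in zip(self.rfs, ws):
            if w > self.w_min:
                rf.update(x, y, w)
        if not ws or max(ws) < self.w_gen:
            rf = LocalLinearRF(x, self.D)
            rf.update(x, y, 1.0)
            self.rfs.append(rf)

    def predict(self, x):
        x = np.asarray(x, dtype=float)
        ws = np.array([rf.activation(x) for rf in self.rfs])
        if ws.size == 0 or ws.sum() < 1e-12:
            return 0.0  # no receptive field covers x
        ys = np.array([rf.predict(x) for rf in self.rfs])
        return float(ws @ ys / ws.sum())
```

For the face pose task, the inputs would be the four face pose features ($\{S_c, S_r, S_l, d\}$, so `dim=4`) and the target the pose φ. Because these features live on very different scales, normalizing them before sharing one `D_scale` across dimensions is advisable; this preprocessing is an assumption, not something the text specifies.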

Consider a supervised non-linear regression task in which $x_i = \{S_c^i, S_r^i, S_l^i, d^i\}$ represents the set of face pose features computed from image $i$ and $y_i$ denotes the face pose $\phi_i$ (i.e., the target label) in $i$. Given a set of $N$ training samples as input-output tuples $(\{x_1, \ldots, x_N\}, \{\phi_1, \ldots, \phi_N\})$, LWPR learns the relationship (mapping) between the face pose features and the face pose over the $N$ training samples. For a set of $M$ testing and validation samples $\{x_1, \ldots, x_M\}$, the task of the LWPR algorithm is to predict the face pose $\{\phi_1, \ldots, \phi_M\}$ of every testing sample. To learn and predict face poses using LWPR, a dataset of face images is acquired using a swarm of N = 4 airborne Parrots (see Section 6.1). The experimental results for face pose estimation using a single UAV are reported in Section 6.6.2.2.

4.3.2.4 Localization of Humans

Using the face pose features $\{S_c, S_r, S_l, d\}$ and the predicted face pose φ (both computed from every acquired image), UAVs in a swarm can deploy and localize themselves relative to the location of human operators. Considering a swarm of UAVs $R = \{1, 2, \ldots, N\}$, the goal of every Parrot $r \in R$ is to move to a target position that optimizes the swarm's spatial distribution. This is achieved using the following set of local mobility rules:

Radial Positioning (Rule 1): With the goal of gathering better-quality observations, the radial position of each Parrot is selected at angular intervals such that the human is surrounded in a [0°, 180°] semicircular plane. At every control step $t$, the angular distance $({}^{r}\phi^{t}, {}^{r}d^{t})$ between the human's face and robot $r$ is computed and used as feedback for the UAV's attitude controller, which simultaneously steers the roll and pitch. This allows the robot to manoeuvre itself 180°/N apart from the other robots while maintaining an optimal distance of d = 2 m between itself and the human. At the swarm level, this maximizes the angular distance of every robot with respect to its closest neighbours. This approach works well as long as a minimum separation of 1.5 m is enforced between neighbouring UAVs.
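Rule 1 can be sketched as a proportional controller on the angular-distance error. Everything beyond the 180°/N spacing, the 2 m reference distance and the 1.5 m minimum separation is an assumption: the slot formula that centres each robot in its 180°/N sector, the gains, the clamping limit, the sign conventions, and the use of a metric distance estimate for $d$. The original attitude controller is not specified in the text.

```python
import math
from typing import Sequence, Tuple


def target_angle(r: int, n: int) -> float:
    """Hypothetical angular slot (degrees) for robot r in 0..n-1, 180/n apart."""
    return (r + 0.5) * 180.0 / n


def _clamp(v: float, limit: float) -> float:
    return max(-limit, min(limit, v))


def radial_command(phi: float, d: float, r: int, n: int, d_ref: float = 2.0,
                   k_phi: float = 0.01, k_d: float = 0.2,
                   limit: float = 1.0) -> Tuple[float, float]:
    """Proportional roll/pitch commands from the angular-distance error.

    Assumed sign convention: positive roll moves the UAV towards larger phi
    along the arc; positive pitch moves it towards the human.
    """
    roll = _clamp(k_phi * (target_angle(r, n) - phi), limit)
    pitch = _clamp(k_d * (d - d_ref), limit)  # too far (d > d_ref): approach
    return roll, pitch


def too_close(own: Tuple[float, float],
              neighbours: Sequence[Tuple[float, float]],
              d_min: float = 1.5) -> bool:
    """True if any neighbour is closer than the minimum separation (metres)."""
    return any(math.dist(own, p) < d_min for p in neighbours)
```

With N = 4 the slots fall at 22.5°, 67.5°, 112.5° and 157.5°, i.e. 45° apart; a robot already at its slot and at 2 m receives zero roll and pitch commands.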

Tangential Positioning (Rule 2): With the aim of increasing the amount of mutual information collectively gathered by a UAV swarm, the face pose ${}^{r}\phi^{t}$ predicted at every control step is used by a UAV to manoeuvre its tangential position by steering the yaw angle. As soon as a UAV detects the human's face, it holds its orientation facing towards the human.

Altitude Positioning (Rule 3): When interacting with UAVs located on the ground, it is natural for humans to bend their body and tilt their head down. When UAVs are airborne, however, the goal of each UAV is to maintain a fixed altitude with respect to the height of the human operator [Nagi et al., 2014b,a]. To achieve this, at every control step $t$ a Parrot checks its elevation component and maintains a fixed altitude with respect to the human's height. This manoeuvre is performed by continuously minimizing the Euclidean distance between the face centroid $F_{cent}(x, y)^t$ (see Section 4.3.2.1) and the centroid of the acquired image.
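A minimal sketch of Rule 3, assuming a proportional controller that acts on the vertical component of the offset between $F_{cent}(x, y)$ and the image centre. The function names, gain, clamping limit and sign convention are hypothetical; the Euclidean distance is returned for monitoring, while only the vertical offset drives the altitude command here.

```python
import math
from typing import Tuple


def centroid_error(face_centroid: Tuple[float, float],
                   image_size: Tuple[int, int]) -> Tuple[float, float, float]:
    """Offset of F_cent(x, y) from the image centre and its Euclidean norm."""
    width, height = image_size
    ex = width / 2.0 - face_centroid[0]
    ey = height / 2.0 - face_centroid[1]
    return ex, ey, math.hypot(ex, ey)


def altitude_command(face_centroid: Tuple[float, float],
                     image_size: Tuple[int, int],
                     k_z: float = 0.005, limit: float = 1.0) -> float:
    """Vertical-speed command that moves F_cent towards the image centre.

    Image rows grow downwards, so a face above the centre (ey > 0) yields a
    positive command, read here as "climb" (assumed sign convention).
    """
    _, ey, _ = centroid_error(face_centroid, image_size)
    return max(-limit, min(limit, k_z * ey))
```

Once the face centroid sits on the image centre the command is zero, so the UAV holds an altitude level with the human's face.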

At every control step, each Parrot estimates its angular, radial, and elevation components using the local mobility rules and steers its heading in the direction of the resultant vector while maintaining a fixed altitude. The combined application of these rules enables UAVs in a swarm to position themselves along a semicircle surrounding the human, as illustrated in Figure 4.8.


Figure 4.8. Spatially-aware swarm deployment and human-relative localization using a swarm of N = 4 airborne Parrots.

4.4 Summary of Experimental Results

The experimental results and discussion of this chapter are given in Section 6.6. The results investigate: (i) the performance of the algorithms and techniques by which spatially-situated robots in a swarm determine whether they have been selected, and (ii) the effect of deployment and mobility strategies on swarm-level gesture recognition performance. In the context of robot selection, individual and group selection scores are investigated with respect to surrounding non-selected robots. An inverse relationship is found between selection accuracy and swarm size: for large swarms, selection accuracy decreases as the swarm size increases. In the case of deployment, mobility strategies reshape the spatial distribution of the swarm and provide better gesture recognition performance than situations with no deployment (i.e., when individual and swarm-level sensing positions are not optimized). In addition, different mobility strategies are compared with respect to swarm-level recognition accuracy, swarm size, and the communication capabilities of the Foot-bot platform.

4.5 Summary of Contributions

This chapter presented swarm-level coordination mechanisms to fulfil the sub-goal outlined in Section 1.4.2.2. Strategies based on spatially-addressed gestures were introduced that allow humans to select spatially distributed individuals and groups of robots from a swarm, and the developed algorithms enable robots in a swarm to determine whether they have been selected. For spatial selection, individual robots in a swarm calculate an individual or group score. The individual and group scores provide a relative measure that determines whether a human is pointing (providing a spatial gesture) towards an individual robot or a group of robots. Robots that obtain the highest scores are chosen as the selected individual or group members. The distributed mobility strategies enable spatially-aware deployment of heterogeneous robot swarms (UGVs and UAVs) for proximal interaction with humans, and provide human-relative localization in the context of the considered HSI scenario (see Section 1.2).

Chapter 5 Learning as a Swarm

For robot swarms to recognize gesture commands given by humans (see Section 3.4), the robots must first learn the commands defined in the gesture language (see Section 2.2.1) before these commands can be classified. The focus of this chapter is on the development of supervised learning strategies that allow robot swarms to learn gestures distributively and collectively in real time under the supervision of human instructors, which is one of the sub-goals outlined in Section 1.4.2.3. This chapter is organized as follows. First, we investigate offline learning methods that use a dataset of gesture images for training (i.e., building a classifier), as shown in Figure 5.1(a). The red and blue samples (acquired by an individual robot in a swarm) represent a binary (two-class) classification problem. Although offline (batch) approaches are very efficient and provide good learning and classification performance with a swarm of robots, their main limitation is that no new knowledge can be added to or updated in the trained classifier. We therefore turn our attention to online incremental learning methods, as shown in Figure 5.1(b). In online learning, samples arrive incrementally over time and are used for training: every time new samples arrive, they are used to retrain the current classifier. In this way, new knowledge is incrementally incorporated into the classifier model.

To include humans in the loop of online learning, we introduce the learning strategy in Figure 5.1(c). The scenario depicted in (c) is as follows: a human provides a gesture to a swarm, and the swarm classifies the gesture and conveys feedback to the human based on the swarm-level recognition outcome. Based on the swarm's feedback, the human provides the swarm with the label of the given gesture sample, which individual robots in the swarm use to update their classifiers. The entire process, from the human presenting a gesture to the robots updating their classifiers, is termed an interaction round.
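One interaction round can be sketched as below. The per-robot classifier is a plain online perceptron and the swarm-level decision is a majority vote; both are hypothetical stand-ins chosen for brevity (the learners used in this chapter, such as the CNN and CWSL, are different). Labels are ±1, matching the K = 2 case of Figure 5.1, and each robot classifies its own view of the same gesture.

```python
import numpy as np


class OnlinePerceptron:
    """Per-robot online linear classifier (stand-in, not the chapter's method)."""

    def __init__(self, dim: int):
        self.w = np.zeros(dim + 1)  # weights plus bias

    def predict(self, x) -> int:
        return 1 if self.w @ np.append(x, 1.0) >= 0 else -1

    def update(self, x, label: int) -> None:
        xb = np.append(x, 1.0)
        if label * (self.w @ xb) <= 0:  # mistake-driven update
            self.w += label * xb


def interaction_round(robots, views, true_label: int) -> int:
    """Classify, report the swarm decision, receive the label, update."""
    votes = [robot.predict(x) for robot, x in zip(robots, views)]
    swarm_decision = 1 if sum(votes) >= 0 else -1  # majority vote (feedback)
    # The human sees swarm_decision and replies with the true label, which
    # every robot uses to update its own classifier.
    for robot, x in zip(robots, views):
        robot.update(x, true_label)
    return swarm_decision
```

Because the label is supplied after the swarm has committed to a decision, the returned `swarm_decision` doubles as an online accuracy signal: comparing it with the label over successive rounds shows how quickly the swarm learns.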

Figure 5.1. Supervised learning strategies for robot swarms to learn gesture commands. A sample represents a gesture image acquired by an individual robot. (a) Offline/batch learning using K = 2 gesture classes. (b) Online incremental learning with K = 2 classes. (c) Online learning using feedback from humans (interaction round: gesture sample, swarm feedback, sample label, classifier update).

The cooperative learning strategy in Figure 5.2 uses the learning approach of Figure 5.1(c). The only difference between the cooperative learning strategy and Figure 5.1(c) is that individual robots in a swarm share and exchange acquired gesture samples with each other (using information selection and sharing strategies) before building a swarm-level classification decision (see Section 5.4).

Figure 5.2. Distributed cooperative learning with a swarm of N = 5 robots using information selection and sharing strategies (panel labels: swarm knowledge; information sharing and exchange; incremental classifier update). By incrementally sharing knowledge, individual robots learn information sensed by other robots in the swarm.

The research presented in this chapter was carried out in collaboration with a number of colleagues and experts at IDSIA: offline learning with support from Dan Cireşan, Ueli Meier, Jürgen Schmidhuber and Frederick Ducatelle; online learning with guidance from Hung Ngo and Eduardo Feo Flushing; and cooperative learning in collaboration with Alessandro Giusti and Gianni Di Caro. The Convolutional Neural Network (CNN) is one of the adopted offline learning methods and is based on Dan Cireşan's implementation. Frederick Ducatelle assisted with the first implementation of the gesture classification algorithm on the Foot-bot platform and also helped test the algorithm with small swarms. The Confidence-Weight Swarm Learning (CWSL) algorithm was developed in collaboration with Hung Ngo. Strategies for cooperative learning were formulated by Gianni Di Caro and implemented by Alessandro Giusti. My contributions in this chapter include investigating different types of learning algorithms suitable for swarm learning, with advice and collaboration from different experts, and designing and performing experiments.

5.1 Background and Related Work

This section reviews related work in different domains. The covered topics include distributed learning in wireless sensor networks (WSNs) and multi-camera systems (see Section 5.1.1), supervised online learning strategies that use feedback from humans (see Section 5.1.2), and collaborative training (learning) strategies in multi-classifier and ensemble-based systems (see Section 5.1.3).
