As described extensively in Chapter 4, the main point of generalized plan design is to abstract and generalize two elements: robot action plans and the information used to parametrize them. While the former results in highly abstract, static plan constructs, the latter results in a lack of the information actually required to perform a task. To ground the abstract task description in the current situation and context, and to make it executable, these vague semantic descriptions need to be formulated in a language that low-level robot components understand: poses, velocities, distances, etc. Manually specializing every possible situation based on known pairs of task description and context information would ultimately undo the plan generalization (and again produce non-scaling, major effort for human plan designers). Instead, I propose a multi-modal analysis of Episodic Memories (EMs) to automatically determine which parameters fit any particular situation well, including interpolation and extrapolation of arguments if necessary.
Finding the correct correlations in these EMs and transforming them into actionable parameters for an autonomous robot is hard and effort-intensive in itself, due to the sheer amount of data a robot produces and the often non-obvious relations between intentions and effects. To ease this process, I propose a novel approach for multi-modal data analysis on the basis of Gaussian distributions [104]. In particular, I concentrate on non-deterministic environments and deduce multivariate, mixed Gaussian distributions for parameter ranges to help an autonomous robot make informed decisions.
In the remainder of this section, I give an overview of the approach together with a comparison to a strongly related technique by Stulp et al. [90], explain in detail how correlations are determined, and discuss their use in generalized robot action plans.
5.6.1 Approach and Comparison
I demonstrate the multi-modal, experience-backed learning process using the example of mobile manipulation, more specifically object fetching tasks in a kitchen environment. The processing pipeline of the approach is shown in Figure 5.11. More concretely, a mobile robot parametrizes its own plans for where to position itself when picking up objects, based on its own experience data processed with a Gaussian regression technique. In the following, I give an overview of the steps taken to present a distribution of possible parametrizations to a learning robot:
First, recorded Episodic Memory data is transformed into a format suitable for machine learning. This is the point at which a plan designer decides which characteristics to take into account when learning correlations. Any data extractable from the EMs can be used, be it nominal or numerical values. The n-dimensional training data is then clustered using a K-Means algorithm, and for each of the resulting clusters an n-dimensional Multivariate Gaussian Distribution (MVG) is calculated. Using all resulting MVGs, an equally weighted, normalized Mixed Multivariate Gaussian Distribution (MMVG) is calculated. The MMVG now acts as a non-linear multivariate interpolation mechanism between all the prior experiences' parametrizations and can be queried for a probability value at any point in the n-dimensional parameter space. From this distribution, costmaps are generated that a robot control program can sample from. Examples of these costmaps are the robot base locations most suitable for grasping an object or for opening a drawer.
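The last step, turning a queryable distribution into a costmap that a control program can sample from, can be sketched as follows. This is a minimal illustration and not the thesis implementation: the function names, the grid resolution, and the stand-in density `example_density` (two made-up bumps in place of a learned MMVG) are all assumptions chosen for the example.

```python
import math
import random


def example_density(x, y):
    """Hypothetical stand-in for a learned mixture: two equally weighted bumps."""
    peaks = [(0.5, 0.0), (-0.5, 0.5)]  # made-up relative base positions (m)
    sigma = 0.2                        # made-up spread (m)
    total = 0.0
    for px, py in peaks:
        sq_dist = (x - px) ** 2 + (y - py) ** 2
        total += math.exp(-0.5 * sq_dist / sigma ** 2)
    return total / len(peaks)


def build_costmap(density, x_range, y_range, resolution):
    """Evaluate `density` on a regular grid; normalize cell values to sum to 1."""
    nx = int(round((x_range[1] - x_range[0]) / resolution)) + 1
    ny = int(round((y_range[1] - y_range[0]) / resolution)) + 1
    cells = []
    for ix in range(nx):
        for iy in range(ny):
            x = x_range[0] + ix * resolution
            y = y_range[0] + iy * resolution
            cells.append((x, y, density(x, y)))
    total = sum(value for _, _, value in cells)
    if total <= 0.0:
        raise ValueError("density is zero everywhere on the grid")
    return [(x, y, value / total) for x, y, value in cells]


def sample_cell(costmap, rng):
    """Draw one grid cell with probability proportional to its value."""
    weights = [value for _, _, value in costmap]
    x, y, _ = rng.choices(costmap, weights=weights, k=1)[0]
    return x, y


costmap = build_costmap(example_density, (-1.0, 1.0), (-1.0, 1.0), 0.1)
rng = random.Random(0)  # fixed seed so the sketch is repeatable
print(sample_cell(costmap, rng))
```

Sampling proportionally to the cell value, rather than always taking the best cell, keeps some variety in the chosen base locations; whether that is desirable depends on the control program.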
A similar goal, but with a completely different approach, was pursued by Stulp et al. [90]. They used a Support Vector Machine (SVM) based learning approach to generate Point Distribution Models (PDMs) and finally generated a probability map of the regions well suited for grasping using a Monte Carlo simulation. I compare my approach to theirs and show how my approach extends the type and dimensionality of source data that can be used, at the cost of precision. I also show that, given the use case of producing costmaps for location selection, the loss in precision is negligible. Their processing steps parallel mine and are also shown in Figure 5.11.
5.6.2 Multi-modal Data Analysis
One of the main advancements of this work over previous approaches is the increase in search space dimensions. While previously the only features used to determine whether a position for performing, say, a grasp action was well chosen were the numerical relative distances in the x and y directions, I introduce MVGs over an arbitrary number of task parameters. My multi-modal data analysis covers both real and nominal values: real values are used with their actual numerical value, while nominal values are assigned an index number within their category. This approach is sound, as nominal values are not interpolated when querying for probabilities; instead, their exact indices are used.
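The treatment of real and nominal values can be illustrated with a small encoding sketch. The record fields (`dx`, `dy`, `object`, `arm`) and the function names are hypothetical and only serve the example; the only behavior taken from the text is that real values keep their numerical value and nominal values are replaced by an index within their category.

```python
def build_category_index(records, nominal_keys):
    """Assign each distinct nominal value an index, in order of first appearance."""
    index = {key: {} for key in nominal_keys}
    for record in records:
        for key in nominal_keys:
            index[key].setdefault(record[key], len(index[key]))
    return index


def encode(record, real_keys, nominal_keys, index):
    """Real values are kept as floats; nominal values become their category index."""
    vector = [float(record[key]) for key in real_keys]
    vector += [float(index[key][record[key]]) for key in nominal_keys]
    return vector


# Made-up experience records, not data from the evaluation.
records = [
    {"dx": 0.6, "dy": -0.1, "object": "cup", "arm": "left"},
    {"dx": 0.5, "dy": 0.2, "object": "plate", "arm": "left"},
    {"dx": 0.7, "dy": 0.0, "object": "cup", "arm": "right"},
]
index = build_category_index(records, ["object", "arm"])
vectors = [encode(r, ["dx", "dy"], ["object", "arm"], index) for r in records]
print(vectors)
```

When such vectors are queried, the nominal dimensions would be matched by exact index, so the arbitrary ordering of the indices carries no meaning.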
Clustering of Source Data using K-Means
Robot EMs contain local areas of interest in a global context. In this concrete case, these are collections of poses around an object where the robot tried to place itself for grasping. Given that these local areas are scattered throughout a larger area, they need to be clustered according to their n-dimensional feature vector.
To arrive at a sensible number of clusters, K-Means clustering is performed for every cluster count up to a given maximum. For each result, the average silhouette value [76] is calculated, and the cluster count with the highest average is used, since a higher silhouette value indicates more compact and better separated clusters. This results in an optimal point distribution over all clusters. I use the Euclidean distance measure for the n-dimensional data.
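A compact sketch of this model selection step is given below. It is an illustration under stated assumptions, not the thesis code: the deterministic farthest-point seeding is a choice made here so that the example is repeatable, and the selection follows the usual reading of the silhouette value, in which a larger average indicates better separated clusters. The toy data are made up.

```python
import math


def dist(a, b):
    """Euclidean distance between two n-dimensional points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def farthest_point_seeds(points, k):
    """Deterministic seeding (an assumption of this sketch): start at the first
    point, then repeatedly add the point farthest from all seeds so far."""
    seeds = [points[0]]
    while len(seeds) < k:
        seeds.append(max(points, key=lambda p: min(dist(p, s) for s in seeds)))
    return seeds


def kmeans(points, k, iterations=50):
    """Plain Lloyd iterations; returns one cluster label per point."""
    centers = farthest_point_seeds(points, k)
    labels = None
    for _ in range(iterations):
        new_labels = [min(range(k), key=lambda c: dist(p, centers[c])) for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # keep the old center if a cluster runs empty
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def average_silhouette(points, labels):
    """Mean silhouette value s = (b - a) / max(a, b) over all points."""
    total = 0.0
    for i, p in enumerate(points):
        mean_dist = {}
        for c in set(labels):
            others = [q for j, q in enumerate(points) if labels[j] == c and j != i]
            if others:
                mean_dist[c] = sum(dist(p, q) for q in others) / len(others)
        if labels[i] not in mean_dist or len(mean_dist) < 2:
            continue  # singleton cluster or no second cluster: s counts as 0
        a = mean_dist.pop(labels[i])  # cohesion: mean distance inside own cluster
        b = min(mean_dist.values())   # separation: nearest other cluster
        if max(a, b) > 0.0:
            total += (b - a) / max(a, b)
    return total / len(points)


def choose_cluster_count(points, max_k):
    """Run K-Means for k = 2..max_k and keep the k with the best silhouette."""
    scores = {k: average_silhouette(points, kmeans(points, k))
              for k in range(2, max_k + 1)}
    return max(scores, key=scores.get), scores


# Toy data: three tight groups of poses, far apart from each other.
centers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
offsets = [(0.2, 0.0), (-0.2, 0.0), (0.0, 0.2), (0.0, -0.2)]
points = [(cx + dx, cy + dy) for cx, cy in centers for dx, dy in offsets]
best_k, scores = choose_cluster_count(points, 5)
print(best_k)
```

The maximum cluster count must not exceed the number of distinct points, since the seeding picks actual data points as initial centers.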
Multivariate Gaussian Distributions
Based on the clustered source data, an MVG is calculated for each cluster. To this end, for cluster $i$ with $n_i$ data points $X_{i,k}$ and mean $\bar{X}_i$, the covariance matrix $C_i$ is calculated:

$$C_i = \sum_{k=1}^{n_i} \left(X_{i,k} - \bar{X}_i\right)\left(X_{i,k} - \bar{X}_i\right)^T \tag{5.1}$$

While any number $p \le n$ of aspects can be included from the source data, the amount of data actually stored in the MVGs is minimal: for each MVG (i.e., per data cluster), only its covariance matrix $C_i \in \mathbb{R}^{p \times p}$ and the source data's mean value $\bar{X}_i \in \mathbb{R}^p$ are stored, resulting in a space complexity of $O(p^2)$ per cluster. The density function $f_i(X)$ allows the distribution to be evaluated at any particular point $X \in \mathbb{R}^p$ for the MVG around cluster $i$:

$$f_i(X) = \exp\left(-\frac{1}{2}\left(X - \bar{X}_i\right)^T C_i^{-1} \left(X - \bar{X}_i\right)\right) \tag{5.2}$$
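The two quantities stored per cluster, and the evaluation of the density at a query point, can be sketched in plain Python as follows. The sketch makes two assumptions that should be read as such: the matrix is accumulated as the unnormalized sum of outer products of the centered samples, and the exponent uses the inverse of that matrix, as in the standard Gaussian density. The function names and the toy cluster are illustrative only.

```python
import math


def mean_vector(points):
    """Component-wise mean of the cluster's points."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def scatter_matrix(points, mean):
    """Sum of outer products of the centered samples (a p x p matrix)."""
    p = len(mean)
    c = [[0.0] * p for _ in range(p)]
    for x in points:
        d = [xi - mi for xi, mi in zip(x, mean)]
        for r in range(p):
            for s in range(p):
                c[r][s] += d[r] * d[s]
    return c


def solve(matrix, rhs):
    """Solve matrix * y = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    a = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for s in range(col, n + 1):
                a[r][s] -= factor * a[col][s]
    y = [0.0] * n
    for r in range(n - 1, -1, -1):
        y[r] = (a[r][n] - sum(a[r][s] * y[s] for s in range(r + 1, n))) / a[r][r]
    return y


def density(x, mean, cov):
    """Unnormalized Gaussian value exp(-1/2 d^T C^{-1} d) with d = x - mean."""
    d = [xi - mi for xi, mi in zip(x, mean)]
    y = solve(cov, d)  # y = C^{-1} d without forming the inverse explicitly
    return math.exp(-0.5 * sum(di * yi for di, yi in zip(d, y)))


cluster = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]  # made-up points
mu = mean_vector(cluster)
cov = scatter_matrix(cluster, mu)
print(density((1.0, 1.0), mu, cov))  # value at the cluster mean
```

Because the density is not divided by a normalization constant, its value at the cluster mean is 1, and a cluster whose points do not span all p dimensions yields a singular matrix that this sketch rejects.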
Gaussian Mixture Models from Overlaid MVGs
Given the MVGs from the source data, I form an MMVG by applying the same weight $1/m$ to every one of the $m$ clusters involved. To sample from the overall distribution, the mixture's density function $F(X)$ needs to be evaluated:

$$F(X) = \sum_{i=1}^{m} w_i f_i(X), \qquad w_i = \frac{1}{m} \tag{5.3}$$
The weight $w_i$ could be chosen differently, for example as the relative number of points in cluster $i$. Since I assume that the experience data can include isolated clusters with only a few occurrences that still play an important, legitimate role (compare case 1 in Figure 5.12), I decided to use equal weights.
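The effect of the weighting choice can be made concrete with a small sketch. The components here are hypothetical isotropic bumps rather than full MVGs, and the cluster sizes (38 and 2 points) are invented for the illustration; only the mixture sum and the two weighting schemes correspond to the text.

```python
import math


def bump(mean, variance):
    """Hypothetical isotropic component f_i that peaks at 1 at its mean."""
    def f(x):
        sq_dist = sum((xi - mi) ** 2 for xi, mi in zip(x, mean))
        return math.exp(-0.5 * sq_dist / variance)
    return f


def mixture_density(x, components, weights=None):
    """F(X) = sum_i w_i f_i(X); equal weights 1/m unless others are given."""
    m = len(components)
    if weights is None:
        weights = [1.0 / m] * m
    return sum(w * f(x) for w, f in zip(weights, components))


def count_weights(cluster_sizes):
    """Alternative weighting: relative number of points per cluster."""
    total = sum(cluster_sizes)
    return [size / total for size in cluster_sizes]


components = [bump((0.0, 0.0), 1.0), bump((10.0, 0.0), 1.0)]
sizes = [38, 2]            # made-up: one large cluster, one small isolated one
query = (10.0, 0.0)        # center of the small, isolated cluster

print(mixture_density(query, components))                        # equal weights
print(mixture_density(query, components, count_weights(sizes)))  # count weights
```

With equal weights, the small isolated cluster contributes as strongly at its own center as the large one does at its center; with count-based weights, its contribution shrinks in proportion to its share of the data.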
5.6.3 Evaluation
I have implemented the presented parametrization learning framework and applied it to a scenario in which a mobile robot performs object fetch tasks in a kitchen environment. The robot picked up objects from three tables, multiple times each. A manually defined heuristic based on the Euclidean robot/object distance and orientation was used to collect the training data.
In total, around 40 pick trials were recorded. Figure 5.12 shows the extracted feature points (green) as well as the resulting MMVGs, shown as gradient heatmaps. The distributions reflect the probability of success in the given task based on the relative position of the robot with respect to the object, as well as their relative orientation to one another. It is important to note that the learned model does not include characteristics of the environment itself; the coordinates used for training and the results retrieved afterwards are purely relative. From top to bottom, the relative orientations between robot and object in the figure's plots are −230°, −90°, and +300°. Except for the last plot, the resulting distributions reflect the source data very well. My assumption is that the number of data points in that region (given three independent variables x, y, and θ) is too low, and the points are too scattered, to generate a properly aligned distribution. A larger amount of source data would mitigate this problem. Apart from this, the distributions give a very good prior of where to stand in order to grasp an object before falling back to the manually designed heuristic, even in the suboptimal last case.
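What "purely relative" coordinates could look like is sketched below. The frame convention is an assumption of this example, since the text does not state one: the robot pose is expressed in the object's frame, and the relative orientation is wrapped into (−π, π], whereas the text quotes unwrapped values such as −230°. The function name and the poses are hypothetical.

```python
import math


def relative_pose(robot, obj):
    """Express the robot pose (x, y, theta) in the frame of the object pose."""
    dx = robot[0] - obj[0]
    dy = robot[1] - obj[1]
    cos_t, sin_t = math.cos(obj[2]), math.sin(obj[2])
    # Rotate the world-frame offset by the inverse of the object's orientation.
    rel_x = cos_t * dx + sin_t * dy
    rel_y = -sin_t * dx + cos_t * dy
    # Wrap the orientation difference into (-pi, pi] (a choice of this sketch).
    diff = robot[2] - obj[2]
    rel_theta = math.atan2(math.sin(diff), math.cos(diff))
    return rel_x, rel_y, rel_theta


# Made-up world-frame poses: the same relative situation at two places.
print(relative_pose((1.0, 2.0, math.pi / 2), (1.0, 1.0, math.pi / 2)))
print(relative_pose((6.0, -1.0, math.pi / 2), (6.0, -2.0, math.pi / 2)))
```

Both calls describe the same situation relative to the object, so they yield the same feature vector even though the world coordinates differ, which is what allows experiences from different tables to be pooled.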
The EMs used in this evaluation were recorded using the robot memory system SemRec [105] and include both high-volume low-level sensor data and low-volume high-level semantic plan data. Included are object descriptions, exact robot motions, grasp details, and kinematic poses at all times. These are the source of the training data used.
The maximum expected cluster count depends on the task performed, the size of the environment, and the number of experiences involved. For this example case, the number of involved objects gives a hint towards that maximum. I decided to use a maximum of ten clusters, with five objects involved in the pick-and-place task. Most of the time, this results in two to three clusters, but leaves room for situations in which objects are cluttered in the same space.
Data Collection and Experience-Based Learning
[Figure 5.12 plots: four panels of the kitchen area, each labeled with the sink area counter top, the meal table counter top, and the kitchen island counter top, with x and y axes and a 1.0 m scale.]
Figure 5.12: Learned probability distributions for successful grasping, depending on the relative distance and orientation between robot and object. The three plots on the right show regions of high success probability for grasping the objects on the table near them, as a density function f := f(relative-position, relative-orientation); the chosen relative orientations from top to bottom are −230°, −90°, and +300°. The pure experience data points are shown in the left plot.