Learning and recognition of hybrid manipulation tasks in variable environments using probabilistic flow tubes

The learning problem consists of generalizing the demonstrations into a probabilistic flow tube representation for each of the three motions in a new state of the environment (2). The state of the world at time step t in a demonstrated trajectory is denoted T(t), which comprises all variables at that time step: C(t), D(t), P(t), and Q(t).

Identifying Motion Variables

relEffStart: Is the starting position or orientation of the robot's end effector generally the same relative to certain points of interest across the demonstrations? The approach determines which of the parameters listed above are statistically similar across the different training sequences by fitting a Gaussian (μ, Σ) to each parameter at the beginning and end of all trials. Judging by the large spread in the p_bin(0) - p_eff(0) values, it can be concluded that relEffStart is not a relevant feature of the candidate variable P; that is, the relative position between the bin and the effector does not matter at the start of the motion.
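The spread test described above can be sketched as follows. This is a minimal illustration, not the thesis implementation: the threshold `max_std`, the per-axis treatment, the name `rel_eff_end`, and all data values are assumptions made for the example.

```python
import statistics

def fit_gaussian_1d(values):
    """Return (mean, standard deviation) of one parameter axis across trials."""
    return statistics.mean(values), statistics.pstdev(values)

def is_relevant(values_per_axis, max_std):
    """Treat a candidate feature as relevant when every axis has a narrow
    spread across trials. max_std is a hypothetical threshold."""
    return all(fit_gaussian_1d(axis)[1] <= max_std for axis in values_per_axis)

# Hypothetical data: p_bin(0) - p_eff(0) over four trials (x axis, y axis).
rel_eff_start = [[0.9, -0.4, 0.2, 1.5], [0.1, 0.8, -0.6, 0.3]]
# Hypothetical data: the same relative position at the end of each trial.
rel_eff_end = [[0.01, -0.02, 0.00, 0.02], [0.03, 0.02, 0.01, 0.02]]

start_relevant = is_relevant(rel_eff_start, max_std=0.05)  # wide spread
end_relevant = is_relevant(rel_eff_end, max_std=0.05)      # narrow spread
print(start_relevant, end_relevant)
```

With these made-up numbers, the start-of-motion feature is rejected because its values vary widely, while the end-of-motion feature is kept.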

Data Processing and Flow Tube Generation

Each temporal matching of two sequences corresponds to a traversal of the cost matrix from the element that matches the starts of the two sequences to the element in the opposite corner, c_mn. Thus, the problem of finding the best temporal matching reduces to finding the traversal of the cost matrix with the least total cost. If the two sequences are very similar, the traversal will stay close to the diagonal.
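The cost-matrix traversal can be illustrated with a plain dynamic time warping sketch. It assumes scalar samples and an absolute-difference cost, and it is the textbook quadratic-time algorithm rather than the fast variant mentioned elsewhere in the text.

```python
def dtw(a, b):
    """Return (total_cost, path) of the least-cost monotonic matching of a and b."""
    m, n = len(a), len(b)
    inf = float("inf")
    # acc[i][j]: least total cost of matching a[:i+1] with b[:j+1].
    acc = [[inf] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            cost = abs(a[i] - b[j])
            if i == 0 and j == 0:
                acc[i][j] = cost
                continue
            best_prev = min(
                acc[i - 1][j] if i > 0 else inf,
                acc[i][j - 1] if j > 0 else inf,
                acc[i - 1][j - 1] if i > 0 and j > 0 else inf,
            )
            acc[i][j] = cost + best_prev
    # Backtrack from the far corner to the origin to recover the traversal.
    i, j = m - 1, n - 1
    path = [(i, j)]
    while (i, j) != (0, 0):
        steps = []
        if i > 0 and j > 0:
            steps.append((acc[i - 1][j - 1], i - 1, j - 1))
        if i > 0:
            steps.append((acc[i - 1][j], i - 1, j))
        if j > 0:
            steps.append((acc[i][j - 1], i, j - 1))
        _, i, j = min(steps)
        path.append((i, j))
    path.reverse()
    return acc[m - 1][n - 1], path

total, path = dtw([0, 1, 2, 3], [0, 1, 1, 2, 3])
print(total, path)
```

In this example the second sequence repeats one sample, so the recovered path takes a single off-diagonal step and otherwise follows the diagonal.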

Pre-learning

The inputs include the library of learned tasks L, where pre-learned PFTs are stored, and a new environmental state T(0). Alternatively, one can pre-learn the PFT associated with environmental state p (by calling PRELEARNPFTS offline) and then normalize its flow tube to the environmental state given by c (by calling GETPFTSFROMHERE). The property described above is only useful if there are enough pre-learned PFTs to provide good coverage of the environmental states.

Enabling Autonomous Execution

Small values in d indicate which time steps in the PFT correspond to the current execution state. The algorithm then computes how far the current execution time t_curr is from each point in the time component of the PFT. The time step in the PFT that corresponds to the current execution state is the one at which the weighted distance is smallest.
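A minimal sketch of this alignment step is shown below. It assumes the weighted distance is a linear combination of the spatial distance and the time difference; the weights `w_space` and `w_time`, the function name, and the sample trajectory are hypothetical.

```python
import math

def align_to_pft(position, t_curr, pft_positions, pft_times,
                 w_space=1.0, w_time=0.5):
    """Return the index of the PFT time step with the smallest weighted distance.

    The linear weighting of spatial and temporal terms is an assumption."""
    best_index, best_score = None, math.inf
    for k, (p, t) in enumerate(zip(pft_positions, pft_times)):
        d = math.dist(position, p)          # spatial distance to nominal point k
        score = w_space * d + w_time * abs(t_curr - t)
        if score < best_score:
            best_index, best_score = k, score
    return best_index

# Hypothetical nominal trajectory that loops back near its starting point.
nominal = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0), (0.0, 0.1)]
times = [0.0, 1.0, 2.0, 3.0, 4.0]
current = (0.0, 0.04)

spatial_only = align_to_pft(current, 3.8, nominal, times, w_time=0.0)
weighted = align_to_pft(current, 3.8, nominal, times)
print(spatial_only, weighted)
```

Here the purely spatial match picks the first point of the loop, whereas the weighted match picks the last point, which is consistent with the elapsed execution time.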

Temporal Alignment of Partial Motion

"circle the box around the container counterclockwise, making an anchor loop (or 'figure 8'), first around x, then around o," and "making an anchor loop, first around o, then around x." The magenta marks indicate the positions in each PFT that are determined to best match the current position of the partial test motion, shown in black. The matched PFT position is not simply the one spatially closest to the current test position, but rather a more reasonable position that is temporally consistent with the current execution.

Compute Log Likelihoods

During recognition, the most likely recognized activity can be any of the activities in the library, including "unknown". As shown in the setup of Algorithm 6.1, the input to plan learning consists of a set of demonstrations S of the sequence of activities composing the plan, a new label f' used to describe the plan, a new environmental state T(0), and the current library of learned tasks L. A subset Y of the demonstrations contains keyframes, recorded as an additional discrete variable in the demonstrations.

Learning Unknown Activities

For example, in the first segment of trial 1, the recognizer found that activity 1 had the highest log likelihood among the three activity labels for 96% of the time spent in that segment. When LEARNUNKNOWNACTIVITIESINPLAN is used in the context of learning a plan, it is first called with the keyframe trials provided as training data. To also take advantage of keyframe-free trials, the OFFLINEPLANLEARNING algorithm repeats the process of learning previously unknown activities from keyframe trials and using the newly learned activities to aid automatic segmentation, which converts more trials without keyframes into keyframe trials.

Auto-segmentation

The optimization objective is based on the sum of the recognized log likelihoods of each activity in the subtask sequence. Because Nelder-Mead is a minimizer, the algorithm feeds it the negative sum of these log likelihoods over time as the function to minimize.
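The objective can be sketched as below. To keep the example dependency-free, an exhaustive search over integer keyframe positions stands in for the Nelder-Mead minimizer named in the text; the log likelihood table, the activity labels, and the function names are hypothetical.

```python
from itertools import combinations

def neg_log_likelihood(keyframes, sequence, ll):
    """Negative sum over time of the log likelihood of the activity that the
    keyframes assign to each time step.

    keyframes: sorted segment boundaries, one fewer than len(sequence).
    ll: dict mapping each activity label to its per-time-step log likelihoods."""
    n_steps = len(ll[sequence[0]])
    bounds = [0, *keyframes, n_steps]
    total = 0.0
    for label, start, end in zip(sequence, bounds[:-1], bounds[1:]):
        total += sum(ll[label][start:end])
    return -total

def best_keyframes(sequence, ll):
    """Brute-force stand-in for the minimizer: try every boundary placement."""
    n_steps = len(ll[sequence[0]])
    candidates = combinations(range(1, n_steps), len(sequence) - 1)
    return min(candidates, key=lambda k: neg_log_likelihood(k, sequence, ll))

# Hypothetical per-time-step log likelihoods for two activities.
ll = {
    "reach": [-1, -1, -1, -5, -6, -7],
    "place": [-6, -5, -4, -1, -1, -1],
}
keyframes = best_keyframes(["reach", "place"], ll)
print(keyframes, neg_log_likelihood(keyframes, ["reach", "place"], ll))
```

The boundary that minimizes the negative sum falls where the second activity starts to explain the data better than the first.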

Validate auto-segmentation

To assess how good the set of computed keyframes is, the algorithm compares the recognition log likelihoods of the identified activity sequence, LLq, with the maximum log likelihood over all activity labels, max LL, collected over all time steps along the trajectory. The maximum over all labels represents the highest likelihood the trial trajectory could attain given what has been learned in the library, so if the activity sequence log likelihood LLq is very close to this maximum, the system can be very confident about the placement of the keyframes produced by the optimizer. Trials without keyframes that have been newly segmented, either autonomously with high confidence or after validation by the user, are then removed from the pool of trials without keyframes.
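One way to express this check is to measure the gap between the per-time-step maximum over all labels and the log likelihood of the assigned sequence. The sketch below assumes a simple summed gap and a hypothetical `tolerance`; the text does not specify the exact confidence measure.

```python
def segmentation_gap(keyframes, sequence, ll):
    """Sum over time of (max over labels of LL) minus LLq.

    A gap near zero means the assigned sequence is almost as likely as the
    best label at every time step."""
    n_steps = len(ll[sequence[0]])
    bounds = [0, *keyframes, n_steps]
    ll_q = []
    for label, start, end in zip(sequence, bounds[:-1], bounds[1:]):
        ll_q.extend(ll[label][start:end])
    ll_max = [max(series[t] for series in ll.values()) for t in range(n_steps)]
    return sum(ll_max) - sum(ll_q)

def is_confident(keyframes, sequence, ll, tolerance=0.5):
    """tolerance is a hypothetical threshold on the summed gap."""
    return segmentation_gap(keyframes, sequence, ll) <= tolerance

# Hypothetical per-time-step log likelihoods for two activities.
ll = {
    "reach": [-1, -1, -1, -5, -6, -7],
    "place": [-6, -5, -4, -1, -1, -1],
}
good_gap = segmentation_gap((3,), ["reach", "place"], ll)
poor_gap = segmentation_gap((2,), ["reach", "place"], ll)
print(good_gap, poor_gap)
```

A segmentation that fails this check would be a candidate for validation by the user rather than automatic acceptance.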

Generating Plan-level Probabilistic Flow Tube

Two-dimensional Variable Environment
Two-dimensional Static Environment
Hardware Validation of Real-time Recognition
Hardware Demonstration of Autonomous Execution

For each of the two cross-validation runs, five of the 10 trials for each movement were used for testing. In the physical setup shown in Figure 7-11, the PR2 faced a table on which certain objects were placed, and all movements were learned kinesthetically by manually moving the robot's right arm. The physical capabilities of the robot limited the types of tasks and the number of objects we could use.

Validation of Plan Learning

Two-dimensional Environment Tests

In addition, I note that the recognition optimization approach slightly overestimates the “move ball to center” activity in trials 8 and 11. The motion variable inference approach outperforms the recognition optimization approach in both computation time and segmentation accuracy. Therefore, it seems reasonable to use the motion variable inference approach for auto-segmentation whenever possible, and use the recognition optimization approach as a backup for motions without identifiable motion variables.

Hardware Demonstration of Plan Learning

The system was shown nine training demonstrations of the plan, three of which were assumed to be trials with pre-segmented keyframes. During learning, I assumed that this information was available for only three trials, while the others were used as test cases for automatic segmentation. First, vision sensors were often unreliable and had a narrow range, causing occasional errors in tracking object positions.

Results Summary

Compliant Execution

The covariance sequence of a probabilistic flow tube can be provided to a controller as a cost function during compliant execution. In narrow regions of the flow tube, a deviation from the nominal trajectory will have a high cost, while the same deviation in a wide area of the flow tube will have a lower cost. On some robots, this may correspond to variation in the robot stiffness as it follows the nominal trajectory.
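As an illustration of how a covariance could act as a cost, the sketch below uses the squared Mahalanobis distance from the nominal point. This particular cost is an assumption chosen for the example, not necessarily the controller cost used in this work, and the covariance values are made up.

```python
def deviation_cost(point, nominal, cov):
    """Squared Mahalanobis distance of a 2-D point from the nominal point.

    cov is a 2x2 covariance given as ((a, b), (c, d)). Using this quantity
    as the controller cost is an assumption made for illustration."""
    dx, dy = point[0] - nominal[0], point[1] - nominal[1]
    (a, b), (c, d) = cov
    det = a * d - b * c
    # Closed-form inverse of a 2x2 matrix.
    inv = ((d / det, -b / det), (-c / det, a / det))
    return (dx * (inv[0][0] * dx + inv[0][1] * dy)
            + dy * (inv[1][0] * dx + inv[1][1] * dy))

narrow_cov = ((0.01, 0.0), (0.0, 0.01))  # tight region of the flow tube
wide_cov = ((1.0, 0.0), (0.0, 1.0))      # loose region of the flow tube

deviation = (0.1, 0.0)
narrow_cost = deviation_cost(deviation, (0.0, 0.0), narrow_cov)
wide_cost = deviation_cost(deviation, (0.0, 0.0), wide_cov)
print(narrow_cost, wide_cost)
```

The same 0.1-unit deviation costs roughly a hundred times more in the narrow region than in the wide one, which matches the qualitative behavior described above.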

Obstacle Avoidance

Note that across different training trials, the effector (blue) and box (red) positions match at the start of contact, and the effector (blue) and bin (green) positions match at the end of contact. After identifying all the candidate motion variables, the algorithm uses clustering to determine whether patterns exist across different training samples, such as those shown in Figure 4-2, for each candidate X ∈ {C, D, P, Q} across different conditions. A narrow spread in the values of a motion variable across many training trials indicates that the variable is relevant to the motion.

IDENTIFYMOTIONVARS (S)

MAKEPFT (S, f, T(0))

Normalize demonstration series (line 3): Normalize the selected subset of demonstrated data series to fit the values of the motion variables for the new situation. First, all orientations throughout the motion are rotated by an amount equal to the offset between the demonstrated starting orientation and the desired starting orientation. The demonstrated sequences can have different numbers of data entries, so fast dynamic time warping is again used to temporally match each of the normalized demonstrated sequences in S'' with the average sequence Teff, and to interpolate them so that they all have the same number of data entries (lines 9-12).

PRELEARNPFTS (S, f, TO)

GETPFTSFROMHERE (f, T (0))

In the case that the trajectory does not belong to any of the movement labels in the library, the calculated log likelihood of all labels will be very small in value. The parts of the log likelihood corresponding to the previously defined activity sequence q are then collected and joined as LLq. For example, the algorithm failed to identify the activity “move the ball to the center” in trial 5, instead considering it as part of the activity “move the box to the bin”.
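The handling of trajectories that match no library label can be sketched with a simple cutoff. The threshold value, the function name, and the log likelihood numbers are hypothetical; the text only states that all labels receive very small log likelihoods in this case.

```python
def recognize(log_likelihoods, unknown_threshold):
    """Return the label with the highest log likelihood, or "unknown" when
    even the best label falls below unknown_threshold (a hypothetical cutoff)."""
    label, best = max(log_likelihoods.items(), key=lambda item: item[1])
    return label if best >= unknown_threshold else "unknown"

# Hypothetical log likelihoods for a motion that resembles a library entry.
known = recognize(
    {"move box to x": -3.2, "move box to bin": -18.5}, unknown_threshold=-50.0
)
# Hypothetical log likelihoods for a motion unlike anything in the library.
novel = recognize(
    {"move box to x": -400.0, "move box to bin": -650.0}, unknown_threshold=-50.0
)
print(known, novel)
```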

For the "box to x, then bin" plan, the segmentation accuracy is 95% versus 74% in favor of the motion variable inference approach. Although it is beyond the scope of the current work, I provide some ideas on how to approach this problem in future work. Each activity time slice a_r(τ) represents the rth activity in the plan trajectory at time step τ since the beginning of the activity.

The probability that the user moves to the next activity is then given by 1 - P(a_r(τ+1) | a_r(τ)).

ONLINERECOGNITION PFT ,r0I L, Tcurr, W)

RECOGNIZETASKSUSINGKEYFRAMETRIALS (Y, )

From the motion variables, the algorithm knows that the covariance Σ of a motion variable's distribution indicates how much variation that motion variable had during training. Recall from Figure 6-2 that these log-probabilities can be visualized as the green lines in one of the green boxes. While the generated Gaussians work well for the "move box left 1" and "move box to x" movements, I argue that the generated Gaussian mixture model for the "move box to bin" movement does not fully capture the complexity of the movement.

On average, my algorithm recognized the movement correctly for 71% of the time spent during a test movement. The pink trajectory is the generated nominal trajectory of the PFT, and the covariances are shown in light blue.

During the first part of the execution, 'move 1 left' is also a low-probability contender, but after the user moves beyond 1 unit, the most likely movement becomes 'move left'. For example, GMMs cannot distinguish a circular clockwise motion (such as winding a cable) from a counterclockwise motion (unwinding) because both occupy the same spatial area, as shown by comparing the GMM representation in Figure 7-5 with our probabilistic flow tube representation. The additional temporal dimension of the state space is quite favorable for recognition, because it makes movements more divergent.

During the AAAI LfD Challenge, my teammate and I enabled the PR2 robot to autonomously execute a scenario involving many of the movements shown in Figure 7-12. The starting position of the robot effector is marked with an asterisk, while the initial environmental states of the objects are also noted in each trial.