• No se han encontrado resultados

2.4 Servicios de Telefonía Móvil

2.4.7 Otras tarifas de voz

In this thesis, the problem statement is to automate the endoscopic camera to re- act to the gripper motion in a manner which is similar to how a human would move the camera. The assumption here is that the human inherently knows the re- wards/penalties of the camera movement and moves the camera optimally according to that. The inverse reinforcement learning algorithm presented in the earlier sec- tion gives an estimate of the reward function and a policy. Since the actual reward function remains an unknown, there is still a need of a metric to determine how well the algorithm works.

The estimated reward function can be evaluated based on the corresponding policy obtained. A trajectory then can be generated from this policy by taking

LEARNING AUTOMATED ENDOSCOPE CONTROL

actions dependent on the state which is the gripper motion. Once an automated trajectory is generated for a test data set of gripper for which human generated trajectory is also available, the two trajectories can be compared.

For comparison between trajectories, a similarity metric is defined as, σ : Q × Q → [0, 1] between the human operated camera trajectory τH = [qH1 , qH2 , qH3 , ...] and the automated trajectory τA= [qA1, qA2, qA3, ...] is defined as,

σ(τA, τH) = e− √ 1 T PT t=1||qHt −qAt||22 (6.14)

where ||.||2 denotes the norm of the vector q in the joint space. Since, this is an exponential function the output value would be between 0 and 1, with 1 being exactly the same trajectory and 0 being the similarity of two infinitely different trajectory. With this metric defined, the quality of a policy obtained from the inverse reinforcement learning can be quantified with respect to a human generated trajectory.

Chapter 7

Results and Discussions

This chapter shows the results obtained from various approaches presented earlier in the thesis and discusses them thoroughly regarding their impact towards the objective of this thesis.

7.1

Subtask classification

The classification of the task into 4 subtasks using the time series motion data of the gripper data was done using the long short term memory recurrent neural network as described in Chapter 3. The LSTM network was chosen due to its ability to selectively remember the inputs and predicted outputs of the system and use those to generate predictions for the current time. The data obtained from the user was labeled into the 4 subtasks as described in Chapter 5 by going through the videos recorded frame by frame and using the simulation time in the frame to decide the transitions down for classification. The rules to classify a frame into the subtasks are also specified in Chapter 5.

In order to maximize the accuracy obtained from the neural, parameters such as inputs, number of hidden layers, number of neurons in each hidden layer, activation

RESULTS AND DISCUSSIONS

Figure 7.1: Graphs showing the accuracy and loss for the implemented neural net- work evolve over the number of epochs for train and validation data sets. A training accuracy of 74.5% and a validation accuracy of 71.3% was obtained.

RESULTS AND DISCUSSIONS

functions and the optimization function (stochastic gradient descent, adam) should be varied and compared. A lot of different trials were required for the combination of each of these parameters. These trials were performed and the best results were obtained with the following parameters. All of these trials were implemented using the keras library [79] using Tensorflow [80] the backend for computation.

Inputs to the neural network were given as 4 dimensional vector including the position of the gripper and its jaw angle. This 4-d vector was concatenated with 10 previous time stamp values to contain the information of the derivatives of the inputs. Hence, the total inputs for a sample given to the neural network was a 10x4 matrix. Inputs with differing number of previous data to be used were also tried along with using the orientation of the gripper to classify. However, these resulted with lesser accuracy or overfitting, where the training accuracy was very high but the validation accuracy was low showing the network did not generalize well.

The optimizer used was the Adam optimizer [77]. The parameters resulting in best performance were a leaning rate of 0.0007, β1 = 0.0999 and β2 = 0.0999. Again, these parameters were fine tuned by running the algorithm for multiple trials with other values of these parameters. Other optimizer that was tried was the stochastic gradient descent [78] but the Adam optimizer resulted in better accuracy and converged faster.

The best results for validation accuracy is shown in Fig. 7.1. This figure shows the evolution of the training and validation accuracy along with the loss in error with respect to the epochs for the training of the classification. It can be observed that the training accuracy is 74.5% and the validation accuracy is 71.3%. This shows that the trained model does overfit or underfit the data and hence the model represents a good generalization of the data.

RESULTS AND DISCUSSIONS

Figure 7.2: Confusion matrix for the subtask classification generated for a com- pletely unseen data set. It can be seen that the accuracy is reduced due to confusion between subtasks 3 and 4.

taining 10 LSTM cells in addition to the input and the output layers. The input layer had a linear activation function so that the LSTM layer would get the inputs as they were without alterations. The output layer had a softmax activation function for classification into the 4 subtasks.

For a better understanding of the accuracy and generalization of the trained model, Fig. 7.2 shows the confusion matrix for a completely unseen test data set. This data set was not used by the neural network for either training or validation. The confusion matrix depicts the comparison between the true labels and the pre- dicted labels. It can be observed from the figure that the model classifies the first two subtasks with a very good accuracy. However, for the subtask 3 and 4, the algorithm is not that accurate and is confused.

RESULTS AND DISCUSSIONS

A possible explanation for this is that when the user is about to place the ring on the peg, the ring needs to be approximately horizontal above the peg. However, when the gripper is carrying the ring, it is vertical is orientation due to there being only one gripper to hold the ring. Therefore to align the ring, the users have to move the gripper horizontally in a to and fro motion at larger speeds so that the inertia of the ring makes it horizontal. This motion data given to the neural would make it difficult for it to determine as to whether the user is approaching to place the ring or placing the ring. To resolve this issue and increase the accuracy, either a relabeling of the data is necessary or filtering out this part from the gripper motion should be done. More discussion in ways to improve the accuracy of the classification is done in the next chapter.

Documento similar