Chapter 7: Multi-Armed Bandits for Pruning Feature Maps 135

In all the experiments, the playing time is set to five times the number of feature maps (arms). For example, when pruning the convolutional layer of AlexNet trained on Bird-200, which has 256 feature maps, the total playing time is 5 × 256 = 1,280 plays.
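The playing-time budget can be made concrete with a generic UCB1 loop. The sketch below is only an illustration under stated assumptions. Each feature map is treated as an arm. `pull` is a hypothetical callback that returns a reward in [0, 1] for playing a feature map, for instance a score of how little accuracy drops when that map is masked. The reward definition used in this chapter is not reproduced here. The budget is `plays_per_arm * n_arms`, which matches the five-plays-per-arm setting above.

```python
import math
from typing import Callable, List, Tuple


def ucb1_play(n_arms: int, pull: Callable[[int], float],
              plays_per_arm: int = 5) -> Tuple[List[float], List[int]]:
    """Run UCB1 over n_arms feature maps for plays_per_arm * n_arms plays.

    `pull(arm)` is a hypothetical reward callback returning a value in [0, 1].
    Returns the empirical mean reward and the play count of every arm.
    """
    if plays_per_arm < 1:
        raise ValueError("every arm must be played at least once")
    budget = plays_per_arm * n_arms  # e.g. 5 * 256 = 1,280 plays
    counts = [0] * n_arms
    sums = [0.0] * n_arms
    for t in range(budget):  # t = number of plays made so far
        if t < n_arms:
            arm = t  # initialisation: play every arm once
        else:
            # UCB1 index: empirical mean + sqrt(2 ln t / n_j)
            arm = max(range(n_arms),
                      key=lambda a: sums[a] / counts[a]
                      + math.sqrt(2 * math.log(t) / counts[a]))
        reward = pull(arm)
        counts[arm] += 1
        sums[arm] += reward
    means = [s / c for s, c in zip(sums, counts)]
    return means, counts


# Toy usage with a hypothetical, deterministic reward per arm.
toy_rewards = [0.1, 0.2, 0.9, 0.3]
means, counts = ucb1_play(len(toy_rewards), lambda a: toy_rewards[a], plays_per_arm=50)
print(counts)
```

With a real reward callback, `ucb1_play(256, pull)` would make exactly 1,280 plays, as in the AlexNet example. How the resulting means translate into pruning decisions depends on the reward definition, and this sketch leaves that open.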

The results in Table 7-2 show that, in general, the UCB1 and Thompson Sampling pruning algorithms maintain the accuracy of the original unpruned models.
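For comparison, a Beta-Bernoulli Thompson Sampling loop with the same budget can be sketched as follows. It assumes binary rewards. The hypothetical `pull(arm)` returns True when playing that feature map counts as a success and False otherwise. An example of success would be masking the map and leaving validation accuracy unchanged. The uniform Beta(1, 1) prior and the seeded generator are illustrative choices, not details taken from the experiments.

```python
import random
from typing import Callable, List, Tuple


def thompson_play(n_arms: int, pull: Callable[[int], bool],
                  plays_per_arm: int = 5, seed: int = 0) -> Tuple[List[float], List[int]]:
    """Beta-Bernoulli Thompson Sampling for plays_per_arm * n_arms plays.

    `pull(arm)` is a hypothetical callback returning True (success) or False.
    Returns the posterior mean success rate and the play count of every arm.
    """
    rng = random.Random(seed)
    alpha = [1] * n_arms  # 1 + successes (Beta(1, 1) uniform prior)
    beta = [1] * n_arms   # 1 + failures
    for _ in range(plays_per_arm * n_arms):
        # Sample a plausible success rate for every arm and play the best sample.
        samples = [rng.betavariate(alpha[a], beta[a]) for a in range(n_arms)]
        arm = max(range(n_arms), key=samples.__getitem__)
        if pull(arm):
            alpha[arm] += 1
        else:
            beta[arm] += 1
    plays = [alpha[a] + beta[a] - 2 for a in range(n_arms)]
    means = [alpha[a] / (alpha[a] + beta[a]) for a in range(n_arms)]
    return means, plays


# Toy usage: only the hypothetical arm 2 ever yields a success.
ts_means, ts_plays = thompson_play(4, lambda a: a == 2, plays_per_arm=25)
print(ts_plays)
```

Unlike UCB1, which adds a deterministic confidence bonus, Thompson Sampling explores by sampling from each arm's posterior, so arms with few plays still get an occasional chance.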

In addition, as the results in Table 7-2 show, the MAB methods outperform the other pruning techniques. The experiments were carried out layer by layer, starting with the first convolutional layer, then the next layer, and so on. We make two observations based on these results:

• First, we expected the later convolutional layers to be pruned more than the earlier convolutional layers, for two reasons. The first reason is that, to recognize objects in images, the first layer learns to recognize edges, the second layer combines edges to form motifs, the third learns to combine motifs into parts, and the next layer learns to recognize objects from the parts identified in the previous layer, and so on [20]. In this sequence, the first layers perform general feature detection, while the later layers detect specific objects. We therefore expected that a pruning algorithm would find it easier to identify feature maps that do not belong to any class in the later layers, and harder to remove feature maps in the earlier layers, since these layers extract edges that relate to all classes. The second reason is that, in ConvNets, the later layers have more feature maps than the earlier layers, which makes it more likely that some feature maps are unimportant. However, the experiments show that this expectation held for only three of the four experiments with convolutional networks; it did not hold for AlexNet. Although further experimentation is needed, one reason for this might be that AlexNet was pre-trained on the ImageNet data, so one would expect a need to prune its earlier layers, which may be too generic.

• Second, pruning all the convolutional layers together is better than pruning each layer separately. For example, when pruning all layers of the LeNet model together, the algorithm prunes nearly 22% of the total number of feature maps. We think the reason is that pruning each layer separately forces every layer to give up some of its feature maps. In contrast, when all layers are pruned together, the algorithm determines the unimportant feature maps across layers, and we expect these to lie mainly in the later layers.
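The difference between the two strategies can be sketched with hypothetical per-feature-map importance scores. These are made-up numbers here; in this chapter such information would come from the bandit plays. Per-layer pruning removes a fixed fraction from every layer. Global pruning pools all feature maps and removes the same total number wherever the scores are lowest. The function names and the fraction-based stopping rule are assumptions for illustration.

```python
from typing import Dict, List


def prune_per_layer(scores: Dict[str, List[float]], fraction: float) -> Dict[str, List[int]]:
    """Remove the lowest-scoring `fraction` of feature maps in each layer independently."""
    pruned = {}
    for layer, s in scores.items():
        k = int(len(s) * fraction)
        lowest = sorted(range(len(s)), key=s.__getitem__)[:k]
        pruned[layer] = sorted(lowest)
    return pruned


def prune_globally(scores: Dict[str, List[float]], fraction: float) -> Dict[str, List[int]]:
    """Pool the feature maps of all layers and remove the lowest-scoring `fraction` overall."""
    pool = [(v, layer, i) for layer, s in scores.items() for i, v in enumerate(s)]
    k = int(len(pool) * fraction)
    pruned: Dict[str, List[int]] = {layer: [] for layer in scores}
    for _, layer, i in sorted(pool)[:k]:
        pruned[layer].append(i)
    return {layer: sorted(idx) for layer, idx in pruned.items()}


# Made-up importance scores (higher = more important); not results from this chapter.
scores = {
    "conv1": [0.9, 0.8, 0.85, 0.95],
    "conv2": [0.9, 0.1, 0.2, 0.7, 0.05, 0.6, 0.8, 0.15],
}
print(prune_per_layer(scores, 0.25))  # {'conv1': [1], 'conv2': [1, 4]}
print(prune_globally(scores, 0.25))   # {'conv1': [], 'conv2': [1, 4, 7]}
```

With these invented scores, per-layer pruning must remove a map from `conv1` even though all of its maps score highly. Global pruning removes the same total number of maps, all from `conv2`.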


Table 7-2: Results of pruning the convolutional layers. Green cells indicate that the method achieves good accuracy, in contrast to red cells. The arrows point up if the error is high,


To assess whether the differences between the algorithms are significant, we applied the Friedman test. The results indicate a significant difference between the algorithms, with a Friedman test p-value of 4.25 × 10⁻²³. Table 7-3 presents the average rank of the algorithms for pruning feature maps.

Method                    Mean Rank
UCB1                      4.28
Thompson Sampling         4.10
Model before pruning      3.55
Greedy Pruning            2.03
Based on the Magnitude    1.03

Table 7-3: Average rank of the algorithms for pruning feature maps based on accuracy, where a higher rank is better.

Table 7-3 indicates that pruning feature maps with UCB1 achieves the best mean rank, followed by Thompson Sampling. Pruning filters based on their magnitude has the worst rank, followed by greedy pruning of the weights.
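The ranking and test behind Table 7-3 can be sketched in plain Python. Each row is one experiment and each column one method. Within a row, the most accurate method receives rank k and the least accurate rank 1, which matches the "higher rank is better" convention of Table 7-3; tied methods share the average rank. The statistic is the standard Friedman form χ²_F = 12N/(k(k+1)) · (Σ_j R_j² − k(k+1)²/4), where R_j are the mean ranks. The p-value uses the closed-form chi-square tail, which is valid only for an even number of degrees of freedom (k − 1 = 4 for the five methods here). The accuracy rows in the usage example are invented. `scipy.stats.friedmanchisquare` provides a library implementation that also corrects for ties.

```python
import math
from typing import List, Sequence, Tuple


def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square variable; closed form valid for even df only."""
    if df % 2:
        raise ValueError("this closed form needs an even number of degrees of freedom")
    half = x / 2
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))


def rank_row(values: Sequence[float]) -> List[float]:
    """Rank one experiment: highest value gets rank k (higher is better), ties share the average rank."""
    order = sorted(range(len(values)), key=lambda j: values[j])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for p in range(i, j + 1):
            ranks[order[p]] = (i + j) / 2 + 1  # sorted positions i..j map to ranks i+1..j+1
        i = j + 1
    return ranks


def friedman(acc: List[List[float]]) -> Tuple[List[float], float, float]:
    """Mean rank per method, Friedman chi-square statistic and its p-value (no tie correction)."""
    n, k = len(acc), len(acc[0])
    ranks = [rank_row(row) for row in acc]
    mean_ranks = [sum(r[j] for r in ranks) / n for j in range(k)]
    chi2 = 12 * n / (k * (k + 1)) * (sum(r * r for r in mean_ranks) - k * (k + 1) ** 2 / 4)
    return mean_ranks, chi2, chi2_sf_even_df(chi2, k - 1)


# Invented accuracies for illustration only (columns follow Table 7-3's method order).
acc = [
    [0.91, 0.90, 0.89, 0.85, 0.80],
    [0.88, 0.89, 0.87, 0.83, 0.70],
    [0.76, 0.75, 0.75, 0.72, 0.60],
]
mean_ranks, chi2, p = friedman(acc)
print(mean_ranks, round(chi2, 3), p)
```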

Since the Friedman test shows a significant difference between the methods, a post hoc test was used to find which algorithm(s) performed significantly better than the others. For this, we used the Nemenyi post hoc test, and the result is shown in Figure 7.4. In the figure, lines whose length equals the critical difference (CD) are drawn from the proposed algorithms to show which differences are significant. The x-axis of the diagram shows the average ranks of the algorithms, with rank increasing from left to right. Figure 7.4 indicates that the proposed algorithms performed statistically better than greedy pruning and pruning filters based on their magnitude, as the differences in mean rank between them exceed CD = 1.133. However, these results do not allow us to reject the null hypothesis for the comparison between the proposed pruning algorithms and the original unpruned model. The results and implementation of these tests are available online.²⁴
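The Nemenyi decision rule itself is simple. Two methods differ significantly when their mean ranks differ by more than the critical difference CD = q_α · √(k(k+1)/(6N)), where q_α is the Studentized range statistic divided by √2 and N is the number of experiments. At α = 0.05, q_α is 2.728 for k = 5 in the commonly used table. The sketch below applies the rule to the mean ranks in Table 7-3. It takes CD = 1.133 directly from the text rather than recomputing it, since N is not stated in this section.

```python
import math
from itertools import combinations
from typing import Dict, List, Tuple


def critical_difference(q_alpha: float, k: int, n: int) -> float:
    """Nemenyi critical difference: q_alpha * sqrt(k(k+1) / (6N))."""
    return q_alpha * math.sqrt(k * (k + 1) / (6 * n))


def nemenyi_pairs(mean_ranks: Dict[str, float], cd: float) -> List[Tuple[str, str, float]]:
    """Return every pair of methods whose mean ranks differ by more than cd."""
    out = []
    for a, b in combinations(mean_ranks, 2):
        diff = abs(mean_ranks[a] - mean_ranks[b])
        if diff > cd:
            out.append((a, b, round(diff, 2)))
    return out


# Mean ranks from Table 7-3 (higher is better) and CD = 1.133 from the text.
table_7_3 = {
    "UCB1": 4.28,
    "Thompson Sampling": 4.10,
    "Model before pruning": 3.55,
    "Greedy Pruning": 2.03,
    "Based on the Magnitude": 1.03,
}
for a, b, diff in nemenyi_pairs(table_7_3, cd=1.133):
    print(f"{a} vs {b}: difference {diff} > CD")
```

Running it prints six pairs. Both proposed algorithms, and the unpruned model, differ significantly from greedy pruning and from magnitude-based pruning. The gaps between the proposed algorithms and the unpruned model (0.73 and 0.55) fall below CD, which agrees with the conclusion drawn from Figure 7.4.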


Figure 7.4: Comparison of all classifiers against each other with the Nemenyi test. Horizontal lines show the critical difference between the proposed pruning algorithms and any other algorithm. CD = 1.133.

7.3. Discussion

The evaluation results presented in the previous sections indicate that the two proposed pruning algorithms performed well on several data sets. Different sets of experiments were conducted, from which the following conclusions can be drawn:

• MAB algorithms offer a useful way of pruning feature maps and their corresponding filters in ConvNets.

• Pruning based on UCB1 and Thompson Sampling mimics the direct method of pruning feature maps on the LeNet model. It therefore has the potential to be as effective as the direct method without the direct method's computational overhead.

• Pruning all convolutional layers together is better than pruning every layer separately. The pruning algorithm can then determine which layer has the most unimportant feature maps.

We have developed two pruning algorithms, based on UCB1 and Thompson Sampling, that can prune convolutional layers in trained ConvNet models. Our evaluation shows strong performance compared to the baseline approaches of greedy pruning and pruning based on magnitude. One limitation of the proposed algorithms is that they prune and test only one feature map per play, which makes them slower than some baseline algorithms. In Chapter 8, we describe another pruning algorithm that prunes multiple feature maps at the same time, reducing the total playing time.