
Ensemble learning is an approach in which a number of classifiers are combined to construct a single, strong, highly accurate classifier. The idea is that classification or regression is made by votes cast by a committee of predictors. Boosting is an ensemble learning approach in which multiple weak classifiers are trained and combined in this way. Each weak classifier need only perform slightly better than chance. When the system is queried, the responses from the weak learners are combined to form a composite predictor (Friedman et al. 2001; Marsland 2015).

Adaboost (Freund and Schapire 1995) is the most popular boosting algorithm. In Adaboost, some number, $m$, of weak classifiers, $G_i$, are trained on weighted data points. At the end of the algorithm, the final output is the sign of the weighted sum of the classifiers' outputs.

At the start of the algorithm the observation weights, $w_i$, $i = 1, 2, \ldots, n$, are initialised to $1/n$. The first classifier, $G_1$, is trained on the data and its error rate is calculated. The weight, $w_i$, of each incorrectly classified observation is then increased, while the weights of correctly classified observations are decreased. The next classifier, $G_2$, is then trained on the data so that it focuses more carefully on observations with a heavier weight, i.e. those that were misclassified by the previous stage. The error rate of this stage is calculated and the observations' weights are updated accordingly. The process repeats until all $m$ classifiers have been trained.

The final output is then:

$$G(x) = \operatorname{sign}\left(\sum_{i=1}^{m} \alpha_i G_i(x)\right) \tag{2.56}$$

where the coefficients $\alpha_i$ are related to the error rate of the corresponding classifier. By adjusting the weights at each level, the classifiers are forced to focus their attention on data points that were incorrectly classified on previous levels (Friedman et al. 2001).
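The reweighting loop and the final signed vote can be sketched in a few lines of Python. This is a minimal illustration, not the original authors' implementation: it assumes one-dimensional data, labels in {-1, +1}, threshold "stumps" as the weak classifiers, and the common choice $\alpha_i = \tfrac{1}{2}\ln\big((1 - \mathrm{err}_i)/\mathrm{err}_i\big)$, none of which is fixed by the text above. The function names are hypothetical.

```python
import math

def stump_predict(x, threshold, polarity):
    # Weak classifier: outputs +polarity below the threshold, -polarity otherwise.
    return polarity if x < threshold else -polarity

def train_stump(xs, ys, weights):
    # Exhaustively pick the (threshold, polarity) pair with the lowest weighted error.
    best = None
    for threshold in sorted(set(xs)):
        for polarity in (1, -1):
            err = sum(w for x, y, w in zip(xs, ys, weights)
                      if stump_predict(x, threshold, polarity) != y)
            if best is None or err < best[0]:
                best = (err, threshold, polarity)
    return best

def adaboost(xs, ys, m):
    n = len(xs)
    weights = [1.0 / n] * n                      # w_i initialised to 1/n
    ensemble = []                                # (alpha_i, threshold, polarity)
    for _ in range(m):
        err, threshold, polarity = train_stump(xs, ys, weights)
        err = min(max(err, 1e-10), 1 - 1e-10)    # guard against log(0)
        alpha = 0.5 * math.log((1 - err) / err)  # assumed form of alpha_i
        ensemble.append((alpha, threshold, polarity))
        # Misclassified points (y * prediction = -1) gain weight; the rest lose weight.
        weights = [w * math.exp(-alpha * y * stump_predict(x, threshold, polarity))
                   for x, y, w in zip(xs, ys, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def predict(ensemble, x):
    # G(x) = sign(sum_i alpha_i * G_i(x)); a zero score is mapped to +1 here.
    score = sum(alpha * stump_predict(x, t, p) for alpha, t, p in ensemble)
    return 1 if score >= 0 else -1

# Toy data that no single stump can separate (negatives sit in the middle).
xs = [1, 2, 3, 4, 5, 6]
ys = [1, 1, -1, -1, 1, 1]
ensemble = adaboost(xs, ys, m=3)
print([predict(ensemble, x) for x in xs])
```

On this toy set a single stump misclassifies at least two points, whereas the weighted vote of three stumps recovers every label, which is the effect the reweighting is meant to produce.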

The Viola and Jones (2001) algorithm performs visual object detection in images and uses the Haar-like features and integral image discussed in Section 2.2.5. They note that with a 24 × 24 base resolution, the exhaustive set of rectangle features has over 180,000 members. An algorithm would need to calculate these features for each position of a sliding window over multiple image scales. This is unsuitable for a real-time system, so a more efficient approach was needed despite the dramatic performance improvements due to the integral image.

Inspired by the Adaboost method, they hypothesised that an effective classifier could be constructed from only a few of the available features. The challenge was finding the useful features in the set and constructing a strong classifier from them. A weak classifier, $h_j(x)$, is defined that consists of a single feature, $f_j$, a threshold, $\theta_j$, and a parity, $p_j$, that sets the direction of the inequality:

$$h_j(x) = \begin{cases} 1 & \text{if } p_j f_j(x) < p_j \theta_j, \\ 0 & \text{otherwise,} \end{cases} \tag{2.57}$$

where $x$ is a 24 × 24 image patch and $f_j$ is a feature (a two-, three- or four-rectangle Haar-like feature) at a specific position and scale. In practice none of these weak classifiers achieves significant accuracy on its own, but boosting yields significant improvements.
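The role of the parity in Equation 2.57 is easiest to see in code. The sketch below is illustrative only: it takes an already-computed feature value $f_j(x)$ as input rather than evaluating a Haar-like feature, and the function name is hypothetical.

```python
def weak_classifier(feature_value, threshold, parity):
    # h_j(x) = 1 if p_j * f_j(x) < p_j * theta_j, else 0.
    # parity = +1 fires when the feature value is below the threshold;
    # parity = -1 flips the inequality, so it fires when the value is above it.
    return 1 if parity * feature_value < parity * threshold else 0

print(weak_classifier(3.0, 5.0, parity=+1))  # value below threshold
print(weak_classifier(7.0, 5.0, parity=-1))  # value above threshold, flipped test
```

Both calls return 1. With the parities swapped, both would return 0.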

Viola and Jones (2001) propose an adaptation of the Adaboost approach in which they use a degenerate decision tree, arranged as a cascade, as illustrated in Figure 2.19. The cascade works by identifying classifiers that are able to reject negative samples accurately, while still allowing all the potential positive candidates through. The idea is that a simple classifier can quickly reject image patches or sub-windows that are definitely negative, allowing the algorithm to focus instead on more probable candidates. Each stage of the process should therefore prioritise a low false negative rate over a low false positive rate: the classifier would rather pass a negative image on to the following stage than accidentally reject a positive one.

Figure 2.19: Cascade Classifier. Input sub-windows pass through stages G1, G2 and G3 in turn; a true (T) result sends the sub-window on to the next stage and ultimately to further processing, while a false (F) result rejects it.
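The early-rejection behaviour of a cascade can be sketched as a short loop. This is a simplified illustration under stated assumptions: each stage is modelled as a hypothetical function returning True (pass on) or False (reject), and a sub-window is stood in for by a single score rather than real image data.

```python
def cascade_accepts(stages, sub_window):
    # Run the stages in order; the first rejection ends processing immediately,
    # so later (more expensive) stages never see obviously negative sub-windows.
    for stage in stages:
        if not stage(sub_window):
            return False
    return True

def detect(stages, sub_windows):
    # Keep only the sub-windows that survive every stage.
    return [w for w in sub_windows if cascade_accepts(stages, w)]

# Hypothetical stages of increasing strictness, acting on a made-up score.
stages = [
    lambda score: score > 0.2,   # G1: cheap, permissive
    lambda score: score > 0.5,   # G2
    lambda score: score > 0.8,   # G3: strictest
]
print(detect(stages, [0.1, 0.6, 0.9]))
```

Here the score 0.1 is rejected by the first stage alone, 0.6 reaches the third stage before being rejected, and only 0.9 is passed on for further processing.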

Viola and Jones (2001) show that for the first stage a two-feature strong classifier can be built using Adaboost. The threshold can then be adjusted to minimise the false negatives at the expense of more false positives. They were able to get their classifier to detect 100% of faces, with a false positive rate of 40%. Suppose a hypothetical dataset of 50 faces and 50 non-faces: such a classifier would accept all 50 faces as well as 20 of the non-faces. By performing two extremely simple, constant-time feature calculations, this classifier is able to reject 30% of that hypothetical data, which means that a more complicated classifier can be used for the second stage because it needs to check fewer images.
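The arithmetic of the hypothetical 50/50 dataset can be checked with a few lines of Python. The helper name is an assumption made for illustration; exact fractions are used so that the percentages are not blurred by floating-point rounding.

```python
from fractions import Fraction

def stage_outcome(n_faces, n_non_faces, detection_rate, false_positive_rate):
    # Number of faces and non-faces passed on, and the fraction of all data rejected.
    passed_faces = n_faces * detection_rate
    passed_non_faces = n_non_faces * false_positive_rate
    rejected = (n_faces - passed_faces) + (n_non_faces - passed_non_faces)
    return passed_faces, passed_non_faces, rejected / (n_faces + n_non_faces)

faces, non_faces, rejected_fraction = stage_outcome(
    50, 50, detection_rate=Fraction(1), false_positive_rate=Fraction(40, 100)
)
print(faces, non_faces, rejected_fraction)  # 50 faces and 20 non-faces pass; 3/10 rejected
```

The 30% figure is specific to this balanced example. Because it depends on the proportion of non-faces in the data, the share rejected at the first stage grows as non-faces become more common.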

The cascade design exploits the fact that the vast majority of image patches are non-faces and should be rejected early. With each layer of the cascade, a stronger but more computationally expensive classifier is used. Training a cascade classifier is computationally expensive because the best weak classifiers need to be identified and then merged into a number of strong classifiers, which involves a lot of time spent training classifiers that are ultimately discarded. The trade-off is that, when correctly configured, such a classifier is able to work with very high reliability and in real time.
