

Decision trees are a well-established classification tool and are part of the Toolkit for Multivariate Data Analysis (TMVA) [84], which was used in this analysis.

A decision tree is a binary tree of nodes, i.e. starting with a single node in the first layer each node may either contain the desired classification response (either signal or background), or have exactly two daughter nodes in the subsequent layer. In the latter case each node “decides” which of its two daughters is the correct one for the given event based on an internal threshold of a single event variable. This means that each node splits the phase space into two regions based on that threshold using rectangular cuts, and the complete tree can select hypercubes from the phase space and classify them as either signal or background.
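The node structure described above can be sketched in a few lines of Python. This is only an illustration, not the TMVA implementation: the `Node` class, the `classify` function, the variable names `E` and `theta`, and the thresholds are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    # Leaf node: label is +1 (signal) or -1 (background).
    label: Optional[int] = None
    # Decision node: one event variable compared against one threshold.
    variable: Optional[str] = None
    threshold: float = 0.0
    below: Optional["Node"] = None  # daughter taken if value < threshold
    above: Optional["Node"] = None  # daughter taken if value >= threshold


def classify(node, event):
    """Walk from the root to a leaf and return the leaf's label."""
    while node.label is None:
        if event[node.variable] < node.threshold:
            node = node.below
        else:
            node = node.above
    return node.label


# Hypothetical tree: signal only if E >= 2.0 and theta < 0.5.
tree = Node(
    variable="E", threshold=2.0,
    below=Node(label=-1),
    above=Node(variable="theta", threshold=0.5,
               below=Node(label=+1), above=Node(label=-1)),
)
```

Each path from the root to a leaf corresponds to a sequence of rectangular cuts, so every leaf covers one rectangular region of the phase space.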

An example of a decision tree is shown in Figure 5.9. It shows a case with two observables E and θ in which signal and background are distributed as shown in the left side of the plot. Such a case cannot be solved with simple rectangular cuts without losing a significant amount of signal. The corresponding decision tree shown on the right side, however, is able to classify a given event into signal and background with very high efficiency.

Before a decision tree can be used, it first needs to be built up, or “grown”. As with most multivariate analysis techniques, this is done with training data, i.e. data for which the distinction between signal and background is already known. The growing is done node by node. At each node creation step, the best cut for discriminating signal from background is chosen from all available observables. In the case of TMVA this is achieved by creating nCuts cut values for each observable and choosing the one with the lowest misclassification error, which is defined as

$$\text{misclassification error} = 1 - \max(p,\, 1 - p) \tag{5.12}$$

$$p = \frac{\#\,\text{signal events}}{\#\,\text{total events}}$$

with $p$ being the purity of the sample for the given cut. This cut selection is equivalent to choosing the cut with the highest or lowest signal purity. Using the cut, the node divides the given data into two branches of leaf nodes, for both of which this step is repeated and further nodes are grown. The splitting of a leaf node stops if either the number of events at the node is lower than the TMVA parameter nEventsMin or the maximum number of layers MaxDepth is reached. In that case the leaf node is classified based on its signal purity as either signal (+1) or background (-1).
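The cut scan for a single observable might look like the following minimal sketch. It is not TMVA code. The function names, the equally spaced thresholds, and the way the two daughters' errors are combined (an event-weighted average) are assumptions made for illustration; only the error definition itself follows Eq. (5.12).

```python
def misclassification_error(n_signal, n_total):
    """Eq. (5.12): 1 - max(p, 1 - p), with p the signal purity."""
    if n_total == 0:
        return 0.0
    p = n_signal / n_total
    return 1.0 - max(p, 1.0 - p)


def best_cut(values, labels, n_cuts=20):
    """Scan n_cuts equally spaced thresholds on one observable.

    labels are +1 (signal) or -1 (background). Returns (threshold, error),
    where error is the event-weighted average of both daughters' errors
    (an assumption of this sketch).
    """
    lo, hi = min(values), max(values)
    best = (None, float("inf"))
    for k in range(1, n_cuts + 1):
        cut = lo + (hi - lo) * k / (n_cuts + 1)
        left = [l for v, l in zip(values, labels) if v < cut]
        right = [l for v, l in zip(values, labels) if v >= cut]
        err = sum(
            len(side) * misclassification_error(side.count(+1), len(side))
            for side in (left, right)
        ) / len(values)
        if err < best[1]:
            best = (cut, err)
    return best
```

Growing a full tree would repeat this scan over all observables at every node and recurse into both daughters until one of the stopping conditions is met.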

The classification power of a single tree is of course limited. Even in the artificial example of Figure 5.9, not all of the signal could be separated from the background. Hence a decision tree is called a weak learner, which by itself has only limited discrimination power. This restriction is overcome by boosting.

Boosting

Boosting is the process of converting a weak learner like a decision tree into a strong learner with higher discrimination power by creating a set of weak learners and using their majority vote.

For this analysis the AdaBoost algorithm [85], which is short for adaptive boosting, was used through TMVA. A good description can also be found in [86].

The basic principle is that a total of $T$ decision trees are built over several rounds $t = 1, \ldots, T$. A weighting factor $D_t(x_i)$ is assigned to each individual event $x_i \in \{x_1, \ldots, x_N\}$. These weights are taken into account when building the variable distributions that are used as input for the cut evaluation when creating a node. The idea is that the weighting factor is increased (decreased) for wrongly (correctly) classified events, such that the tree which is built in the subsequent round focuses on the misclassified events.

For the first round all events start with the same event weight, which is always normalized in such a way that the sum of all weights is 1:

$$D_1(x_i) = \frac{1}{N} \quad \forall x_i. \tag{5.13}$$

In each round $t$ a single decision tree is built under consideration of the event weights. Typically some events are wrongly classified by this decision tree. The classification error $\epsilon_t$ of this tree (which is not to be confused with the misclassification error from above) is then the sum of the weights of all misclassified events:

$$\epsilon_t = \sum_{\text{misclassified } x_i} D_t(x_i) \tag{5.14}$$

Based on this classification error a boosting parameter $\alpha_t$ is calculated, which is used to adapt the weighting factors for the next round:

$$\alpha_t = \frac{1}{2} \ln\left(\frac{1 - \epsilon_t}{\epsilon_t}\right) \tag{5.15}$$

$$D_{t+1}(x_i) = \frac{D_t(x_i)}{Z_t} \times \begin{cases} e^{-\alpha_t} & \text{if } x_i \text{ correctly classified} \\ e^{\alpha_t} & \text{if } x_i \text{ incorrectly classified} \end{cases} \tag{5.16}$$

where $Z_t$ is a normalization factor chosen such that the weights again sum to 1. The event weighting factors are increased for misclassified events, giving them a stronger consideration when building the next tree.
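One boosting round can be sketched as follows. The function name `adaboost_round` and its interface are hypothetical. The sketch assumes that the classification error lies strictly between 0 and 1 and that the boosting parameter is one half of the logarithm of the ratio of correctly to incorrectly classified weight.

```python
import math


def adaboost_round(weights, correct):
    """One AdaBoost weight update.

    weights: current event weights D_t(x_i), summing to 1.
    correct: booleans, True where the tree of round t classified x_i correctly.
    Returns (alpha_t, new_weights). Assumes 0 < epsilon_t < 1.
    """
    # Classification error: summed weight of the misclassified events.
    eps = sum(w for w, ok in zip(weights, correct) if not ok)
    # Boosting parameter.
    alpha = 0.5 * math.log((1.0 - eps) / eps)
    # Decrease weights of correct events, increase weights of wrong ones.
    unnorm = [w * math.exp(-alpha if ok else alpha)
              for w, ok in zip(weights, correct)]
    z = sum(unnorm)  # normalization factor Z_t
    return alpha, [w / z for w in unnorm]


# Four equally weighted events, the last one misclassified.
alpha, new_weights = adaboost_round([0.25] * 4, [True, True, True, False])
```

In this example the misclassified event ends up carrying half of the total weight. This is a general property of the update: after normalization the misclassified events always hold a summed weight of 1/2.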

This boosting step is repeated until a forest of $T$ trees is created. Each tree assigns +1 or -1 for signal and background respectively and is weighted according to its classification error; in standard AdaBoost this weight is the boosting parameter, so trees with a smaller error contribute more. The classification response of the forest of Boosted Decision Trees to an event is the weighted mean (“majority vote”) of all trees. This response is called BDT.
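The forest response can be written as a short sketch. The tree weights used here are the boosting parameters of standard AdaBoost, which is an assumption of this example; the exact weighting and normalization used inside TMVA may differ. The function name `bdt_response` is hypothetical.

```python
def bdt_response(alphas, tree_outputs):
    """Weighted mean of the tree outputs (+1 or -1) for one event.

    alphas: one weight per tree (assumed here: the boosting parameters).
    tree_outputs: the +1/-1 decision of each tree for the event.
    The result lies in [-1, +1] when all weights are positive.
    """
    return sum(a * h for a, h in zip(alphas, tree_outputs)) / sum(alphas)
```

For example, `bdt_response([0.5, 0.25, 0.25], [1, -1, 1])` gives 0.5: the two trees voting for signal outweigh the one voting for background.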

Boosted decision trees have the advantage that they rely on simple cuts and yet are able to perform cuts in higher dimensions, giving a significant improvement in classification performance over simple rectangular cuts. In addition, they are insensitive to the inclusion of observables with little or no relevance to the rejection, since the most significant observable is chosen at every step of the growing. However, statistical fluctuations of the training sample can lead to the growing of decision nodes specific to these fluctuations. This is called overfitting or overtraining and has a negative influence on the classification performance. It can be checked by applying the Boosted Decision Trees to a statistically independent testing sample, for which the classification into signal and background is already known as well. If the distribution of the response for the training sample differs from the one for the testing sample, the trees were indeed overtrained.
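One possible way to quantify the difference between the two response distributions is the maximum distance between their empirical cumulative distributions (the two-sample Kolmogorov-Smirnov statistic). The sketch below is an illustration only. The text does not state which comparison was used in the analysis, and the function name `ks_distance` is hypothetical.

```python
import bisect


def ks_distance(sample_a, sample_b):
    """Maximum distance between the empirical CDFs of two samples."""
    a, b = sorted(sample_a), sorted(sample_b)
    d = 0.0
    # The CDFs only change at observed values, so checking those suffices.
    for x in sorted(set(a) | set(b)):
        cdf_a = bisect.bisect_right(a, x) / len(a)
        cdf_b = bisect.bisect_right(b, x) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d
```

A value close to 0 for the training and testing responses (evaluated separately for signal and for background) indicates compatible distributions, while a large value hints at overtraining. What counts as large depends on the sample sizes.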
