By its very structure, a single tree model is easily interpretable. Tests close to the top of the tree strongly influence the final prediction, while tests at lower levels generally involve variables of lower importance.

Unfortunately, this intuition is lost for a forest of trees, because the trees are highly diverse. Moreover, forests are often composed of hundreds to thousands of trees, and analysing each tree individually is intractable. For this reason, we cannot directly infer interpretations by inspecting the model. However, several methods have been proposed to derive variable importance scores from tree ensembles.

One of them is called the mean decrease of impurity (MDI). It evaluates each split in a decision tree by the decrease of impurity resulting from the test. For a given splitting variable, this quantity is accumulated over every split of the forest in which the variable is used, each split being weighted by the node-wise sub-sample size. The resulting sum reflects the importance of the variable in the final prediction. The operation can be repeated for each input variable of the problem to provide an importance score for each one.

Mathematically, we denote by $I(x_i, \mathcal{T}_j)$ the importance of a variable $x_i$ ($\forall i = 1, \ldots, m$) in a single tree $\mathcal{T}_j$. This quantity is given by the mean decrease of impurity measure

$$I(x_i, \mathcal{T}_j) = \sum_{N \in \mathcal{T}_j \,|\, v(N) = x_i} \frac{n_N}{n_{\mathcal{T}_j}} \, \Delta I(N), \tag{2.14}$$

where $v(N)$ denotes the variable used to split the node $N$, $n_{\mathcal{T}_j}$ the number of samples in the learning set, $n_N$ the number of samples reaching $N$, and $\Delta I(N)$ has been defined earlier in Equation (2.10).

For a forest of $T$ trees, the mean decrease of impurity measure of a variable $x_i$ is simply averaged over all the trees in the forest. The importance score of the feature $x_i$ is thus given by

CHAPTER 2. PRINCIPLES OF MACHINE LEARNING


$$I(x_i) = \frac{1}{T} \sum_{j=1}^{T} I(x_i, \mathcal{T}_j). \tag{2.15}$$

Intuitively, a feature gets a high importance score if it appears frequently in the forest and at top nodes (leading to large $n_N / n_{\mathcal{T}_j}$ ratios), and if it strongly reduces impurity at the nodes where it appears.
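The two formulas above can be checked by hand on a small example. The sketch below is only illustrative: the two toy trees, their node sizes and their impurity decreases are made-up numbers, and the function names `tree_mdi` and `forest_mdi` are hypothetical. Each split node is described by the index of its splitting variable, the number of samples $n_N$ reaching it, and its impurity decrease $\Delta I(N)$.

```python
def tree_mdi(split_nodes, n_tree, n_features):
    """Eq. (2.14): sum (n_N / n_T) * delta_I over the nodes splitting on each feature."""
    scores = [0.0] * n_features
    for feature, n_node, delta_i in split_nodes:
        scores[feature] += (n_node / n_tree) * delta_i
    return scores


def forest_mdi(trees, n_features):
    """Eq. (2.15): average the per-tree scores over all trees of the forest."""
    totals = [0.0] * n_features
    for split_nodes, n_tree in trees:
        for i, score in enumerate(tree_mdi(split_nodes, n_tree, n_features)):
            totals[i] += score
    return [total / len(trees) for total in totals]


# Made-up toy forest: (split nodes, learning-set size) for each tree.
# Each split node is (feature index, n_N, delta_I).
tree_a = ([(0, 100, 0.30), (1, 60, 0.10), (1, 40, 0.05)], 100)
tree_b = ([(1, 100, 0.20), (0, 50, 0.10)], 100)

importances = forest_mdi([tree_a, tree_b], n_features=2)
print(importances)
```

In this toy forest, feature 0 obtains the larger score mainly because of its root split in the first tree, which is weighted by the full sample size.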

We provide in Algorithm 1 pseudo-code for building a forest of $T$ trees with the standard Random Forests algorithm described earlier, and for generating the importance scores of every feature in the learning sample $LS$.

Algorithm 1 Random Forests algorithm and feature importance scores generation.

Require: A learning sample $LS$ (of size $n$), the number of selected features $K$, and a forest size $T$.

    $I(x_i) = 0$, $\forall i = 1, \ldots, m$.    ▷ Considered as global variables
    for $t = 1$ to $T$ do
        Generate a bootstrap sample $LS_b$ from $LS$.
        LEARN_A_RANDOMIZED_TREE($LS_b$)
    end for
    $I(x_i) \leftarrow \frac{1}{nT} I(x_i)$, $\forall i = 1, \ldots, m$.

    function LEARN_A_RANDOMIZED_TREE($LS$)
        if all objects from $LS$ have the same class then
            Create a leaf with that class.
        else
            Randomly pick $K$ features.
            Evaluate the expected reduction of impurity $\Delta I(N)$ provided by the best split on each feature $x_i$ among $K$ at this node $N$.
            Select the feature $x_i$ giving rise to the maximum $\Delta I(N)$.
            $I(x_i) \leftarrow I(x_i) + n_N \Delta I(N)$.
            Create a test node for the selected split and divide $LS$ into sub-samples $LS_1$ and $LS_2$ according to this split.
            LEARN_A_RANDOMIZED_TREE($LS_1$)
            LEARN_A_RANDOMIZED_TREE($LS_2$)
        end if
    end function
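Algorithm 1 can be turned into a short runnable sketch. The version below is only one possible reading, under several assumptions that are not in the pseudo-code: impurity is measured with the Gini index, $\Delta I(N)$ is taken as the parent impurity minus the size-weighted impurities of the two children, splits are thresholds on numerical features, and a node whose $K$ candidate features are all constant becomes a majority-class leaf. All function names are hypothetical.

```python
import random


def gini(labels):
    """Gini impurity of a non-empty list of class labels."""
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def best_split(sample, feature):
    """Best (delta_I, threshold) on one feature, or None if it is constant."""
    n = len(sample)
    parent = gini([y for _, y in sample])
    values = sorted({x[feature] for x, _ in sample})
    best = None
    for thr in values[:-1]:  # "x <= thr" keeps both children non-empty
        left = [y for x, y in sample if x[feature] <= thr]
        right = [y for x, y in sample if x[feature] > thr]
        delta = parent - len(left) / n * gini(left) - len(right) / n * gini(right)
        if best is None or delta > best[0]:
            best = (delta, thr)
    return best


def learn_a_randomized_tree(sample, k, importances, rng):
    labels = [y for _, y in sample]
    if len(set(labels)) == 1:
        return labels[0]  # leaf with that class
    candidates = []
    for f in rng.sample(range(len(sample[0][0])), k):  # randomly pick K features
        split = best_split(sample, f)
        if split is not None:
            candidates.append((split[0], f, split[1]))
    if not candidates:  # assumption: no usable split, so make a majority leaf
        return max(set(labels), key=labels.count)
    delta, f, thr = max(candidates)  # feature with maximum delta_I
    importances[f] += len(sample) * delta  # I(x_i) <- I(x_i) + n_N * delta_I(N)
    left = [(x, y) for x, y in sample if x[f] <= thr]
    right = [(x, y) for x, y in sample if x[f] > thr]
    return (f, thr,
            learn_a_randomized_tree(left, k, importances, rng),
            learn_a_randomized_tree(right, k, importances, rng))


def random_forest_importances(ls, k, n_trees, seed=0):
    rng = random.Random(seed)
    n, m = len(ls), len(ls[0][0])
    importances = [0.0] * m
    forest = []
    for _ in range(n_trees):
        bootstrap = [rng.choice(ls) for _ in range(n)]  # sample with replacement
        forest.append(learn_a_randomized_tree(bootstrap, k, importances, rng))
    return forest, [v / (n * n_trees) for v in importances]  # 1 / (nT) scaling


# Toy data: the class equals feature 0, feature 1 carries no information.
ls = [((0, 0), 0), ((0, 1), 0), ((1, 0), 1), ((1, 1), 1)] * 5
forest, importances = random_forest_importances(ls, k=2, n_trees=10)
print(importances)
```

With $K$ equal to the number of features, as in this toy call, every node considers all features; smaller values of $K$ add the feature randomization that distinguishes Random Forests from plain bagging.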

It is worth mentioning that, for Bagging and Random Forests, Breiman [2001] proposed an alternative measure that computes, for each feature, the mean decrease of accuracy (MDA) of the forest when the values of this feature are randomly permuted in the out-of-bag samples. In bagging, only a subset of the original sample is used for fitting each tree. The out-of-bag sample refers to the instances not used during the learning process, and there is one out-of-bag sample for each tree of the forest. The idea is thus to use these samples to estimate the prediction error of the forest model. The importance score of a variable $x_i$ is then obtained by permuting this variable and taking the difference between the prediction error after and before permutation. Both MDI and MDA measures are used in practice. Experimental studies [Strobl et al., 2007] have shown that the MDI is biased towards features with a large number of values, but this bias is irrelevant in the neuroimaging setting of this thesis, where all features are numerical. The MDI measure furthermore benefits from interesting theoretical properties in asymptotic conditions [Louppe et al., 2013] and is usually faster to compute, as it does not require random permutations.
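The permutation scheme behind the MDA can be sketched as follows. This is a simplified illustration rather than Breiman's exact procedure: each tree is represented by a hypothetical pair (prediction function, out-of-bag sample), the error is the misclassification rate, and the per-tree error increases are averaged over the forest. The stump predictors and the out-of-bag data are made up for the example.

```python
import random


def error_rate(predict, sample):
    """Fraction of misclassified (x, y) pairs."""
    return sum(predict(x) != y for x, y in sample) / len(sample)


def mean_decrease_accuracy(trees, n_features, seed=0):
    """trees: list of (predict, oob_sample); x values are tuples of features."""
    rng = random.Random(seed)
    scores = [0.0] * n_features
    for predict, oob in trees:
        base = error_rate(predict, oob)  # error before permutation
        for i in range(n_features):
            column = [x[i] for x, _ in oob]
            rng.shuffle(column)  # permute feature i among out-of-bag samples
            permuted = [(x[:i] + (v,) + x[i + 1:], y)
                        for (x, y), v in zip(oob, column)]
            scores[i] += error_rate(predict, permuted) - base
    return [s / len(trees) for s in scores]


# Made-up example: two "trees" that are stumps reading feature 0 only.
oob = [((0, 0), 0), ((1, 1), 1), ((0, 1), 0), ((1, 0), 1)] * 3
trees = [(lambda x: x[0], oob), (lambda x: x[0], oob)]

mda_scores = mean_decrease_accuracy(trees, n_features=2)
print(mda_scores)
```

Because the stumps never read feature 1, permuting it leaves every prediction unchanged and its score is exactly zero, whereas permuting feature 0 can only increase the error.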

Because of their ease of use, their robustness with respect to parameter tuning, their performance and their interpretability, tree-based ensemble methods are widely used in practice, notably for biomedical problems that require interpretable results as well as accuracy. In bioinformatics, for instance, Random Forests are often used for biomarker discovery or genome-wide association studies [Díaz-Uriarte and De Andres, 2006, Lunetta et al., 2004, Genuer et al., 2010, Botta et al., 2014].

Variable importance scores can be useful for feature selection approaches to identify the (most) relevant variables related to a problem.

Definition 1. According to [Guyon and Elisseeff, 2006], a variable $x_i$ is said to be irrelevant with respect to the output $y$ if, for all subsets of features $B \subseteq V \setminus \{x_i\}$,

$$P(x_i, y \mid B) = P(x_i \mid B) \, P(y \mid B),$$

where $V$ is the set of input variables. A variable is relevant if it is not irrelevant.

In words, a variable $x_i$ is relevant with respect to the output if there is at least one subset of variables $B$ such that the output $y$ depends on $x_i$ conditionally on $B$. Relevant variables are thus variables that bring some information about the output in at least one context represented by the conditioning. On the other hand, irrelevant variables never explain the output under any conditioning. One common problem of feature selection consists in identifying all the relevant features [Nilsson et al., 2007, Kursa and Rudnicki, 2011, Sutera et al., 2018]. In [Louppe et al., 2013, Sutera et al., 2018], the authors have linked importance scores with the notion of variable relevance through several theorems in the asymptotic setting (i.e., an infinite number of trees and samples). In this setting, they have for example shown that a variable is irrelevant if and only if its importance score, as computed from a forest of totally randomized trees (i.e., grown with $K = 1$), is zero. When $K > 1$, a zero importance remains a necessary condition for a variable to be irrelevant, but it is no longer a sufficient condition, as relevant variables can receive zero importance. In the finite setting, however, irrelevant variables can receive positive importance scores, and one thus has to resort to statistical tests to separate truly relevant from irrelevant variables in variable importance rankings [Huynh-Thu et al., 2012, Genuer et al., 2010].
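Definition 1 can be made concrete on a small discrete distribution. The sketch below is an illustration chosen for this purpose and not taken from the cited works: it assumes a known joint distribution in which $y$ is the XOR of two uniform bits $x_1$ and $x_2$, while $x_3$ is an independent noise bit, and it tests the factorization of Definition 1 for every subset $B$ by brute force. The function name `is_relevant` is hypothetical.

```python
from collections import defaultdict
from itertools import combinations, product


def is_relevant(joint, i, n_features, tol=1e-12):
    """joint maps (x_1, ..., x_m, y) tuples to probabilities.

    Returns True if some subset B of the other features breaks
    P(x_i, y | B) = P(x_i | B) P(y | B), i.e. x_i is relevant.
    """
    others = [j for j in range(n_features) if j != i]
    for size in range(len(others) + 1):
        for b in combinations(others, size):
            p_xyb, p_xb, p_yb, p_b = (defaultdict(float) for _ in range(4))
            for (*x, y), p in joint.items():
                ctx = tuple(x[j] for j in b)  # value taken by the subset B
                p_xyb[(x[i], y, ctx)] += p
                p_xb[(x[i], ctx)] += p
                p_yb[(y, ctx)] += p
                p_b[ctx] += p
            for xi, ctx in p_xb:
                for y, ctx_y in p_yb:
                    if ctx_y != ctx:
                        continue
                    # Multiply both sides by P(B)^2 to avoid divisions.
                    lhs = p_xyb.get((xi, y, ctx), 0.0) * p_b[ctx]
                    rhs = p_xb[(xi, ctx)] * p_yb[(y, ctx)]
                    if abs(lhs - rhs) > tol:
                        return True
    return False


# Illustrative distribution: y = x1 XOR x2, x3 is independent noise.
joint = {(a, b, c, a ^ b): 1 / 8 for a, b, c in product((0, 1), repeat=3)}

relevance = [is_relevant(joint, i, n_features=3) for i in range(3)]
print(relevance)
```

Here $x_1$ is marginally independent of $y$ (the test passes for $B = \emptyset$) but becomes informative once $x_2$ is known, which is why the definition quantifies over all conditioning subsets.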

In many applications, such as neuroimaging, variables can be highly correlated with one another. Such variables are said to be partially redundant when they share partially similar information about the output variable. Two variables are totally redundant if they are perfectly correlated. Chapter 7 of [Louppe, 2014] showed that redundancy can have a considerable effect on importance measures. If a copy $x_i'$ of a variable $x_i$ is added to the feature set, the importance score of $x_i$ will decrease. Moreover, the copy will also affect the importance values of the other variables. It is therefore important to keep such effects in mind when dealing with correlated variables, as is expected to be the case in neuroimaging applications.