
In document Mercadeo: Productos y Servicios Agrarios (pages 155-160)


4.5 PLACE AND DISTRIBUTION: LEARNING OUTCOMES

Random Forests are among the more recent additions to the ensemble-method and machine-learning toolbox. Classifiers based on Random Forests are ensemble methods in which each tree depends on the values of a random vector sampled independently and with the same distribution for all trees in the forest. See Breiman (1999) and Breiman (2001) for an overview.

Classification trees, as described in Sec. 4.2.1, work reliably in many cases. However, one of their drawbacks is that such models tend to have high variance: small changes in the composition of the training set can lead to very different tree structures. This drawback follows from the hierarchical nature of the tree model, because small differences in the top few nodes can produce a very different structure as those perturbations are propagated down the tree. To reduce the variance of tree estimates, Random Forest Classifiers (RFC, Breiman 1999, 2001) use an ensemble of trees (a forest) and attempt to de-correlate the M trees by selecting a random subset Xtry of the input features as candidates for splitting at each node during the tree-building process. The result is that the final model has lower variance than a single tree. For a source to be classified, the class probabilities are estimated as the proportion of the M trees that predict each class. As with classification trees, the classification output for each new source can then be described either as a vector of class probabilities (giving the probability for each of the C classes) or as its predicted class (the one with the highest probability).
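As a minimal sketch of how the votes turn into class probabilities and a predicted class, assume the M trees have already returned one label each. The class names "A", "B", "C" and the vote list are hypothetical and only serve as illustration.

```python
from collections import Counter

def vote_proportions(tree_predictions, classes):
    """Return the fraction of trees that voted for each class."""
    counts = Counter(tree_predictions)
    n_trees = len(tree_predictions)
    return {c: counts.get(c, 0) / n_trees for c in classes}

# Hypothetical votes of M = 5 trees for a single source
votes = ["A", "B", "A", "A", "C"]
probs = vote_proportions(votes, classes=["A", "B", "C"])

# The predicted class is the one with the highest estimated probability
predicted = max(probs, key=probs.get)

print(probs)      # {'A': 0.6, 'B': 0.2, 'C': 0.2}
print(predicted)  # A
```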

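To give the definition of Breiman (2001):

“A random forest is a classifier consisting of a collection of tree-structured classifiers {h(X, Θk), k = 1, ...} where the {Θk} are independent identically distributed random vectors and each tree casts a unit vote for the most popular class at input X.”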

Breiman (1999) (see also Breiman 2001) formalized the concept of a Random Forest with M trees as an estimator using an ensemble of randomized trees {h(X, Θm), m = 1, ..., M}, where the {Θm} are independent identically distributed random vectors and the m-th randomized tree is an estimator h(X, Θm), with X a feature vector. The predictions of the M randomized trees are averaged to give the final prediction. (Other possible options are, e.g., an average or weighted average of all terminal nodes reached or, in the case of categorical variables c, a majority vote.) In Random Forest Classifiers, randomized trees are typically built without any pruning (Breiman 1999). Tree building continues until each terminal node is either pure in its classification (i.e., no further split can be made) or contains no more than a predefined number of training sample points.

Practical Aspects of Random Forest Classifiers

In the following, the more practical aspects of using a Random Forest Classifier are discussed. A Random Forest Classifier is trained by executing the steps described in Algorithm 2.

Algorithm 2: Random Forest Classifier training

Input: {(Xi, Yi)}, i = 1, ..., N: training set
       M: number of trees
       k: number of features to consider at each split

Output: {h(X, Θm), m = 1, ..., M}: ensemble of randomized trees

begin
    for m = 1, ..., M do
        X̃ ⊂ X: sample from X with replacement, |X̃| > 0.5|X|
        grow tree h(X, Θm) on X̃; at each node:
            select k features at random from all features
            the feature providing the best split, according to some objective function, is used for a binary split on that node
    end
    output {h(X, Θm), m = 1, ..., M}
end
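The training loop of Algorithm 2 can be sketched in plain Python as follows. This is an illustrative sketch under stated assumptions, not the reference implementation: the objective function is taken to be the weighted Gini impurity, each bootstrap sample draws N points with replacement, trees are grown unpruned, and all function names (`gini`, `best_split`, `grow_tree`, `train_forest`, `predict_forest`) as well as the toy data are hypothetical.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels, features):
    """Best (feature, threshold) among the candidate features, by weighted Gini."""
    best, best_score = None, gini(labels)
    for f in features:
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if score < best_score:
                best, best_score = (f, t), score
    return best

def grow_tree(rows, labels, k, rng):
    """Grow an unpruned tree; k random candidate features are drawn at each node."""
    if len(set(labels)) == 1:
        return labels[0]                                  # pure leaf
    features = rng.sample(range(len(rows[0])), k)
    split = best_split(rows, labels, features)
    if split is None:                                     # no improving split: majority leaf
        return Counter(labels).most_common(1)[0][0]
    f, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return (f, t,
            grow_tree([r for r, _ in left], [y for _, y in left], k, rng),
            grow_tree([r for r, _ in right], [y for _, y in right], k, rng))

def predict_tree(tree, row):
    """Follow the splits down to a leaf (leaves are plain labels, nodes are tuples)."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree

def train_forest(rows, labels, M, k, seed=0):
    """Train M randomized trees, each on its own bootstrap sample."""
    rng = random.Random(seed)
    n = len(rows)
    forest = []
    for _ in range(M):
        idx = [rng.randrange(n) for _ in range(n)]        # sample with replacement
        forest.append(grow_tree([rows[i] for i in idx],
                                [labels[i] for i in idx], k, rng))
    return forest

def predict_forest(forest, row):
    """Majority vote over all trees."""
    return Counter(predict_tree(t, row) for t in forest).most_common(1)[0][0]

# Toy data: two well-separated classes in two features
rows = [(0.0, 1.0), (0.2, 0.8), (0.1, 1.2), (0.3, 0.9),
        (2.0, 3.0), (2.2, 3.1), (1.9, 2.8), (2.1, 3.3)]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
forest = train_forest(rows, labels, M=15, k=1)
print(predict_forest(forest, (-1.0, 0.0)), predict_forest(forest, (3.0, 4.0)))
```

Drawing the k candidate features anew at every node, rather than once per tree, is what de-correlates the trees in this sketch.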

Depending on the value of k, there are three different systems:

• Random splitter selection: k = 1

• Breiman’s bagger: k = total number of predictor variables

• RFC: k ≪ K, where K is the number of features. Breiman suggests three possible values for k: ½√K, √K, 2√K.
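The three suggested values are easy to compute for a given K. In the short sketch below, rounding to the nearest integer and clipping to the range [1, K] are assumptions made here so that k is always a usable feature count; the function name is hypothetical.

```python
import math

def candidate_k_values(K):
    """Subset sizes 1/2*sqrt(K), sqrt(K), 2*sqrt(K), rounded and clipped to [1, K]."""
    root = math.sqrt(K)
    return [min(K, max(1, round(factor * root))) for factor in (0.5, 1.0, 2.0)]

print(candidate_k_values(16))   # [2, 4, 8]
print(candidate_k_values(100))  # [5, 10, 20]
```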

It is important to note (Breiman 2001) that, with a large number of features, the eligible feature set will be quite different from node to node. The greater the inter-tree correlation, the greater the error rate of the Random Forest Classifier, so the trees should be as uncorrelated as possible. As k decreases, both the inter-tree correlation and the strength of the individual trees decrease. An optimal value of k that balances these two effects must therefore be found.
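This trade-off is often summarized by the upper bound on the generalization error given in Breiman (2001). The symbols used here are introduced only for this illustration: PE* denotes the generalization error of the forest, ρ̄ the mean correlation between trees, and s the strength of the individual trees.

```latex
PE^{*} \;\le\; \frac{\bar{\rho}\,\left(1 - s^{2}\right)}{s^{2}}
```

The bound decreases when the correlation ρ̄ falls and when the strength s rises, so lowering k helps through the first effect and hurts through the second.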

When applying Random Forest Classifiers, and tree-based classifiers in general, one should not neglect the importance of hyperparameters. In contrast to parameters found within the estimators, hyperparameters describe the execution of the algorithm itself. Usually they are fixed before the training process begins.

Hyperparameters of Random Forest Classifiers are (Bernard et al. 2009):

• k, the number of features randomly drawn without replacement at each node; this number makes it possible to introduce more or less randomization into the split selection, in such a way that the smaller the value of k, the stronger the randomization,

• M, the number of trees,

• the maximum depth.

In the context of machine learning, hyperparameter optimization is the problem of choosing a set of hyperparameters, usually with the goal of optimizing a measure of the algorithm’s performance. Often, hyperparameter tuning is carried out by a grid search, an approach that will methodically build and evaluate a model for each combination of hyperparameters specified in a grid.
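A grid search over the three hyperparameters listed above can be sketched with the standard library alone. The scoring function `toy_score` is a hypothetical stand-in for "train a forest with these settings and measure its validation accuracy"; its shape and the grid values are assumptions chosen only to make the example self-contained.

```python
from itertools import product

def grid_search(grid, evaluate):
    """Evaluate every combination in `grid`; return the best setting and its score."""
    names = list(grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = evaluate(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

def toy_score(params):
    """Hypothetical validation score, peaking at k=4, M=100, max_depth=8."""
    return (-abs(params["k"] - 4)
            - abs(params["M"] - 100) / 100
            - abs(params["max_depth"] - 8) / 10)

grid = {"k": [2, 4, 8], "M": [50, 100, 200], "max_depth": [4, 8, 16]}
best, score = grid_search(grid, toy_score)
print(best)   # {'k': 4, 'M': 100, 'max_depth': 8}
```

The number of models grows as the product of the grid sizes (27 here), which is why grids are usually kept coarse.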

When using a Random Forest Classifier, one must be aware of its strengths and weaknesses.

Random Forest Classifiers are superior to many other methods in terms of accuracy and efficiency, and they are able to deal with unbalanced and missing data. Because the method averages the predictions over multiple trees, the estimated classification probabilities are much more robust to imbalanced training sets than those of methods using a single tree. Training can be parallelized, and the importance of each feature for the classification can be easily estimated.

A weakness of Random Forests is that, when used for regression, they are not able to predict beyond the range of the training data. Additionally, with a small number of trees, they may over-fit data sets that are particularly noisy.
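The range limitation follows from the fact that every leaf predicts an average of training targets. The sketch below illustrates this with a deliberately simplified forest: bagged one-split regression trees (stumps) on a single feature. The stump simplification, the function names, and the linear toy data are all assumptions made for illustration.

```python
import random

def fit_stump(xs, ys):
    """One-split regression tree: predicts the mean target on each side of a threshold."""
    best = None
    for t in sorted(set(xs))[:-1]:            # skip the maximum so the right side is non-empty
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        sse = sum((y - ml) ** 2 for y in left) + sum((y - mr) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t, ml, mr)
    if best is None:                          # all inputs identical: constant prediction
        mean = sum(ys) / len(ys)
        return (xs[0], mean, mean)
    return best[1:]

def predict(forest, x):
    """Average the stump predictions."""
    return sum(ml if x <= t else mr for t, ml, mr in forest) / len(forest)

rng = random.Random(0)
xs = list(range(11))                          # training inputs 0..10
ys = [2.0 * x for x in xs]                    # targets 0..20 on a linear trend

forest = []
for _ in range(25):
    idx = [rng.randrange(len(xs)) for _ in xs]   # bootstrap sample
    forest.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))

# The linear trend would give 200 at x = 100, but the forest cannot exceed max(ys) = 20
print(predict(forest, 100))
```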
