
Option trees consist of a single structure that efficiently represents multiple decision trees. Introduced by Buntine (1992), option trees are known in the machine learning literature as an efficient variance reduction method (in the context of a bias-variance decomposition of the error). As a middle ground between trees and ensembles, they offer the double advantage of having multiple predictions per example, while having a single concise and interpretable model.

Option trees represent a variant of decision trees that contains option nodes (or options) in addition to the standard splitting nodes and leaves. The option node works as an OR operator, giving multiple optional splits. As such, it provides means for an interactive exploration of multiple models. An option node enables the user to choose which branch should be explored further, possibly expanding only the structure extending underneath.

When used for prediction, an example sorted to an option node is cloned and sorted down all of its branches, eventually ending at multiple leaves. As a result, the option node can combine multiple predictions and leverage the "knowledge" of multiple "experts". For classification (option decision trees), the commonly used rule for combining the predictions is majority voting. For regression, the predictions are typically averaged to form the final prediction.
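The prediction procedure described above can be sketched in a few lines. This is a minimal illustration, not the implementation used in the cited work: the class names `Leaf`, `Split` and `Option`, the helper `stump`, the feature names and the leaf values are all hypothetical, and predictions are assumed to be combined at each option node.

```python
from collections import Counter
from dataclasses import dataclass
from statistics import mean


@dataclass
class Leaf:
    value: object          # numeric target or class label (hypothetical values)


@dataclass
class Split:
    feature: str
    threshold: float
    left: object           # followed when example[feature] <= threshold
    right: object          # followed otherwise


@dataclass
class Option:
    children: list         # alternative subtrees; all of them are followed


def majority_vote(predictions):
    """Return the most frequent label (ties go to the first label seen)."""
    return Counter(predictions).most_common(1)[0][0]


def predict(node, example, combine):
    """Sort the example down the tree, combining predictions at option nodes."""
    if isinstance(node, Leaf):
        return node.value
    if isinstance(node, Split):
        branch = node.left if example[node.feature] <= node.threshold else node.right
        return predict(branch, example, combine)
    # Option node: the example is sent down every alternative branch.
    return combine([predict(child, example, combine) for child in node.children])


def stump(feature, threshold, low, high):
    """Hypothetical helper: a single split with two leaves."""
    return Split(feature, threshold, Leaf(low), Leaf(high))


# Regression: three alternative splits under one option node, averaged.
tree = Option([
    stump("volatile_acidity", 0.25, 5.8, 5.2),
    stump("citric_acid", 0.25, 5.1, 5.6),
    stump("alcohol", 10.6, 5.3, 5.9),
])
wine = {"volatile_acidity": 0.20, "citric_acid": 0.30, "alcohol": 11.0}
regression_prediction = predict(tree, wine, mean)

# Classification: the same traversal with majority voting.
labels = Option([Leaf("good"), Leaf("bad"), Leaf("good")])
class_prediction = predict(labels, wine, majority_vote)
```

Passing the combination rule as a function keeps the traversal identical for both tasks; only `mean` or `majority_vote` changes.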

Figure 2 illustrates a partially built option tree on the Wine quality dataset. If one follows the left branch of the root node, corresponding to (Alcohol ≤ 11.6), one encounters the first option node, which further leads to a disjunction of three inequalities {(Volatile acidity ≤ 0.25) OR (Citric acid ≤ 0.25) OR (Alcohol ≤ 10.6)}. Putting it all together, one gets the following disjunction:

{(Alcohol ≤ 11.6 AND Volatile acidity ≤ 0.25) OR (Alcohol ≤ 11.6 AND Citric acid ≤ 0.25) OR (Alcohol ≤ 10.6)}.

By deciding to follow one of the three possible branches of the option tree, only the corresponding conjunction from this set will be explored further.
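The expansion of the option node into conjunctions can be reproduced mechanically. The sketch below assumes that every test has the form "feature ≤ threshold", so that two tests on the same feature collapse to the tighter bound; the function name `conjunction` and the dictionary representation are illustrative choices only.

```python
# Tests along the path, taken from the example in the text.
root_test = ("Alcohol", 11.6)
option_alternatives = [
    ("Volatile acidity", 0.25),
    ("Citric acid", 0.25),
    ("Alcohol", 10.6),
]


def conjunction(path):
    """Merge '<=' tests on the same feature, keeping the tightest upper bound."""
    bounds = {}
    for feature, threshold in path:
        bounds[feature] = min(threshold, bounds.get(feature, threshold))
    return bounds


# One conjunction per branch of the option node.
rules = [conjunction([root_test, alternative]) for alternative in option_alternatives]
for rule in rules:
    print(" AND ".join(f"{feature} <= {bound}" for feature, bound in rule.items()))
```

The third rule contains a single test because Alcohol ≤ 10.6 already implies Alcohol ≤ 11.6, which is why the root test does not appear in the last term of the disjunction.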

In the previous sections, we have discussed extensively various batch algorithms for learning decision and regression trees. Regardless of the selection criterion employed, all of those algorithms essentially pursue a greedy hill-climbing search through the space of possible hypotheses H. Each selection decision thus deals with the question "Which path is most likely to lead to the globally optimal model?" Unfortunately, it is not possible to evaluate every path or every hypothesis in H. To answer this question, each predictive attribute (possibly accompanied by a split point) is evaluated on its own, using a chosen measure of merit that determines how well it separates the training examples. Since each selection decision is made at a particular point in the instance space, reached through a sequence of previously chosen splits, the choice of the best split can only be locally optimal.
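A single greedy selection step can be sketched as follows. The measure of merit used here, the reduction in the sum of squared errors, is an assumption chosen for illustration (any of the criteria discussed earlier could be substituted), and the function names and the toy data are hypothetical.

```python
def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


def best_split(rows, target, features):
    """Return (merit, feature, threshold) of the locally best binary split.

    Each candidate is scored in isolation; nothing is known about the
    quality of the subtrees that would later grow beneath it.
    """
    total = sse([r[target] for r in rows])
    best = None
    for feature in features:
        values = sorted({r[feature] for r in rows})
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2
            left = [r[target] for r in rows if r[feature] <= threshold]
            right = [r[target] for r in rows if r[feature] > threshold]
            merit = total - sse(left) - sse(right)
            if best is None or merit > best[0]:
                best = (merit, feature, threshold)
    return best


# Toy data (hypothetical values, not taken from the Wine quality dataset).
rows = [
    {"alcohol": 9.0, "acidity": 0.3, "quality": 5.0},
    {"alcohol": 9.5, "acidity": 0.2, "quality": 5.0},
    {"alcohol": 12.0, "acidity": 0.3, "quality": 7.0},
    {"alcohol": 12.5, "acidity": 0.2, "quality": 7.0},
]
merit, feature, threshold = best_split(rows, "quality", ["alcohol", "acidity"])
```

A full tree learner would apply `best_split` recursively to the two resulting partitions and would never revisit a split once it has been made.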

As discussed by Kohavi and Kunz (1997), this approach prefers attributes that score high in isolation, possibly overlooking combinations of attributes that may lead to better solutions globally. Breiman et al. (1984) pointed out that greedy search is also one of the main reasons for the instability of decision trees, since small changes in the training data may result in very different trees. After selecting the split attribute and the split point, the algorithm never backtracks to reconsider earlier choices. Allowing for multiple optional tests enables poor early selection decisions to be reconsidered later on and duly revised. From a Bayesian perspective, the option node reduces the risk by averaging several choices.

Decision Trees, Regression Trees and Variants

[Figure 2 image not reproduced. Recoverable node tests: Alcohol ≤ 11.6, Alcohol ≤ 10.6, Volatile acidity ≤ 0.25, Citric acid ≤ 0.25, Volatile acidity ≤ 0.21, Total sulfur dioxide ≤ 76.0, Alcohol ≤ 9.9, Alcohol ≤ 10.3, Volatile acidity ≤ 0.2, Free sulfur dioxide ≤ 18. Recoverable leaf values: 5.82 (0.35), 5.12 (0.33), 5.26 (0.43), 5.48 (0.56).]

Figure 2: An illustration of an option tree. An example is classified by sorting it down every branch of an option node to the appropriate leaf nodes, then returning the combination of the predictions associated with those leaves.

The tree induction algorithm proposed by Kohavi and Kunz (1997) is a top-down induction of decision trees (TDIDT) algorithm. The evaluation criterion is based on the normalized information gain. The principal modification to the basic TDIDT algorithm is that, instead of always selecting a single split, the proposed algorithm creates an option node when several splits are close to the best one. Because majority voting is used, the minimum number of children of an option node is three (four or more choices are reasonable for multi-class problems). The pruning method is also modified to take into account the averaged error of all the children of an option node.

Kohavi and Kunz (1997) studied multiple criteria for choosing the number of alternatives at an option node. The simplest one creates an option node whenever there are three or more attributes that rate within a multiplicative factor (the option factor) of the best attribute with respect to the evaluation criterion. However, since a fixed option factor was shown to produce huge trees for some problem sets and no changes for others, Kohavi and Kunz (1997) investigated different strategies to constrain the creation of option nodes without harming the accuracy of the tree.
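The simplest criterion can be sketched as below. The text does not spell out how "within a multiplicative factor" is computed, so this sketch assumes that the option factor f is a relative tolerance: an attribute qualifies when its merit is at least (1 − f) times the merit of the best attribute. The function name, the parameter names and the merit values are hypothetical.

```python
def option_candidates(merits, option_factor, min_options=3):
    """Choose the attributes to split on at a node.

    merits: dict mapping attribute name -> positive merit score.
    Returns every attribute close enough to the best one when at least
    `min_options` qualify (an option node); otherwise only the best
    attribute (an ordinary split).
    Assumption: 'close' means merit >= (1 - option_factor) * best merit.
    """
    ranked = sorted(merits, key=merits.get, reverse=True)
    best_merit = merits[ranked[0]]
    close = [a for a in ranked if merits[a] >= (1 - option_factor) * best_merit]
    return close if len(close) >= min_options else ranked[:1]


# Hypothetical merit scores for four attributes.
merits = {"alcohol": 0.30, "volatile_acidity": 0.29, "citric_acid": 0.28, "pH": 0.10}

wide = option_candidates(merits, option_factor=0.10)    # three attributes qualify
narrow = option_candidates(merits, option_factor=0.02)  # only the best one qualifies
```

With a tolerance of 0.10 the call returns three attributes and an option node would be created; with 0.02 it falls back to a single ordinary split. The same data therefore yields very different tree sizes depending on the factor, which is the sensitivity noted above.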

Their study revealed that option nodes are most useful near the root of the tree. Several explanations were offered for this. The first one takes the perspective of greedy hill-climbing search: the selection decisions that govern the splits near the root of the tree have access to little or no information on the goodness (or the optimality) of the final model that would be obtained by choosing one of several splits with similar estimated merits. In addition, multiple studies have confirmed that combining multiple models works best if their structures are uncorrelated or anti-correlated. By introducing option nodes closer to the root, the sub-trees that follow have a higher chance of being different from each other. The best practical compromise proposed by Kohavi and Kunz (1997) was to use an exponentially decaying option factor. Starting with an option factor of $x$, each level is assigned a factor of $x \cdot y^{\mathrm{level}}$, where $y$ is the decay factor and the root node is at level 0. Option nodes were further restricted to the top five levels.
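The decay schedule is easy to tabulate. In the sketch below, the starting factor x = 0.2 and the decay factor y = 0.5 are hypothetical values chosen only for illustration; the restriction to the top five levels is taken to mean levels 0 to 4, and `None` is used as a marker for "no option nodes allowed".

```python
def option_factor_at(level, x, y, max_levels=5):
    """Option factor for a node at the given depth (the root is level 0).

    Returns x * y**level for the top `max_levels` levels and None below
    them, where option nodes are not created at all.
    """
    if level >= max_levels:
        return None
    return x * y ** level


# Hypothetical parameters: x = 0.2, y = 0.5.
schedule = [option_factor_at(level, x=0.2, y=0.5) for level in range(7)]
for level, factor in enumerate(schedule):
    print(level, "no option nodes" if factor is None else f"{factor:.4f}")
```

With a decay factor below 1, the option factor shrinks geometrically with depth, so option nodes are created most readily near the root.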

Although option trees resemble ensembles of models in some respects, the additional structure that they provide gives two important advantages: it avoids replication, and it pinpoints the uncertainty in selection decisions. Option trees are deterministic and do not rely on random bootstrap samples. Coupled with an interactive tree visualizer, option trees can provide an excellent exploratory tool. The main drawback of learning option trees is the increased resource consumption and processing time per example.