
The search methods described above (Co, CT and MP) must effectively search all potential trees to find the tree(s) that are optimal under the optimality criterion used. Suppose this optimality criterion is evaluated for a tree T by a function M(T).

There are

$$(2n-5)!! = (2n-5)(2n-7)\cdots(3)(1) \sim (2/e)^n \, n^{n-2}$$

possible unrooted binary trees, so if it is necessary to evaluate M on each of these, the search will take an impractically long time for even moderate values of n, but there are alternatives in some cases. If M can be broken down into a linear or other simple combination of functions which can be evaluated on parts of a tree, for example on the edges, then we can use a branch and bound algorithm to reduce the number of trees that need be examined. For a general description of branch and bound methods, see [28].

CT, Co and MP are amenable to this treatment, which can speed up the process considerably [43], depending on the input data. The next section describes the branch and bound method used in this study for the compatibility and closest tree methods, and the subsequent section describes the branch and bound method used for maximum parsimony. Note that both techniques effectively consider all potential trees, so are equivalent to exhaustive searches. Exhaustive search, even with branch and bound techniques, is generally not possible for n more than 20-30, depending on the data and the implementation.
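To give a feel for how quickly the number of unrooted binary trees grows, the count (2n - 5)!! can be evaluated directly. The following is a minimal Python sketch; the function name `num_unrooted_binary_trees` is chosen here for illustration and does not come from the study.

```python
def num_unrooted_binary_trees(n: int) -> int:
    """Return (2n - 5)!!, the number of unrooted binary trees on n >= 3 labelled leaves."""
    if n < 3:
        raise ValueError("n must be at least 3")
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n - 5 (the factor 1 is implicit).
    for k in range(3, 2 * n - 4, 2):
        count *= k
    return count


for n in (5, 10, 20, 30):
    print(n, num_unrooted_binary_trees(n))
```

Already at n = 10 the count exceeds two million, which is why evaluating M on every tree quickly becomes impractical.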

Note also that the input vector may be either a set of observed bipartition frequencies, the bipartition frequencies inferred by the Hadamard conjugation, or bipartition frequencies inferred from distances. The algorithms are the same for each type of input data: the input vector is simply referred to here as q'.

Branch and bound for compatibility and closest tree

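With the closest tree methods, the quantity to be maximised will be maximised when

$$M(T) = \sum_{e \in \text{internal edges of } T} q'^{2}_{e} - \frac{1}{2n-2} \Big( \sum_{e \in T} q'_e + q'_0 \Big)^{2}$$

is maximised. Note that

$$\sum_{e \in T} q'_e + q'_0 = \sum_{e \in \text{internal edges of } T} q'_e + \sum_{e \in \text{pendant edges of } T} q'_e + q'_0 .$$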

The latter two terms in the above expression are calculated at the outset, as they are independent of T, and their sum is included in the algorithms as the variable "essentials".

In the compatibility methods, the quantity to be maximised is

$$\sum_{e \in T} q'_e ,$$

which is maximised when

$$M(T) = \sum_{e \in \text{internal edges of } T} q'_e$$

is maximised.

2.5. Search methods 33

The first tree considered, say T0, is found by a "greedy algorithm": the vector of inferred edge lengths is sorted in descending order, discarding the pendant edges (in practice, the input edge lengths q' are copied to a temporary array, and the pendant edge lengths set to -1 in this array). Let the set of internal edges of the current tree (initially T0) be S. The edge whose input length is largest is included in the internal edges of T0, so is the first element of S. Subsequent edges from the sorted list are included in S if they are compatible with all the edges already in S. This constructs a set of (n - 3) compatible edges, which are the internal edges of T0. Let the greatest value of M(T) found be B, so B is initially M(T0).
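The greedy construction of T0 can be sketched as follows. This is a minimal illustration, not the implementation used in the study: it assumes each bipartition is stored once, as a `frozenset` holding one of its two sides, and it filters out pendant edges directly instead of setting their lengths to -1. The names `Split`, `compatible` and `greedy_tree` are hypothetical. Two bipartitions are taken to be compatible when at least one of the four intersections of their sides is empty.

```python
from typing import Dict, FrozenSet, List

Split = FrozenSet[int]  # one side of a bipartition of the taxon set


def compatible(a: Split, b: Split, taxa: FrozenSet[int]) -> bool:
    """Two splits are compatible iff one of the four side intersections is empty."""
    a_c, b_c = taxa - a, taxa - b
    return not (a & b) or not (a & b_c) or not (a_c & b) or not (a_c & b_c)


def greedy_tree(q: Dict[Split, float], taxa: FrozenSet[int]) -> List[Split]:
    """Return the internal edges of T0 chosen by the greedy rule."""
    n = len(taxa)
    # Discard pendant edges: splits with a single taxon on one side.
    internal = [s for s in q if 1 < len(s) < n - 1]
    # Sort the remaining edges by inferred length, largest first.
    internal.sort(key=lambda s: q[s], reverse=True)
    chosen: List[Split] = []
    for s in internal:
        if len(chosen) == n - 3:  # the tree is completely resolved
            break
        if all(compatible(s, t, taxa) for t in chosen):
            chosen.append(s)
    # May hold fewer than n - 3 edges if q lists too few splits.
    return chosen


taxa = frozenset(range(5))
q = {
    frozenset({0}): 0.50,     # pendant edge, ignored by the greedy step
    frozenset({0, 1}): 0.30,
    frozenset({1, 2}): 0.25,  # conflicts with {0, 1}, so it is skipped
    frozenset({3, 4}): 0.20,
}
print(greedy_tree(q, taxa))
```

With these illustrative values the split {1, 2} is rejected because it is incompatible with {0, 1}, and {3, 4} is accepted as the second internal edge.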

Note that the lengths of the internal edges of T0 are stored in decreasing order. The general principle of the branch and bound process for closest tree and compatibility is to keep a 'core' array of edges S and to step a candidate edge, say e, through the sorted array of inferred edge lengths (of those edges not already in the 'core'). At each step, an upper bound, say b, on the value of M(T) with $(S \cup \{e\}) \subseteq E(T)$ is calculated. For brevity, let

$$M(S) = \max \{ M(T) : S \subseteq E(T) \},$$

with the maximisation over all possible trees T.

If $b < B$, or if no compatible edge with positive inferred length can be found, the last element of S is removed, and stepped through the array of inferred edge lengths, as above. If $b \ge B$ and $|S| < (n - 3)$, e is appended to S. If $b > B$ and $|S| = (n - 3)$ (i.e., the tree is completely resolved), then S is stored (as "bestS") and B is set to b. When $|S| = 0$, the search is complete.

(In practice, the entries of S are the positions of the edge labels in the sorted vector of inferred edge lengths.)
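The stepping process described above can be illustrated with a small sketch for the compatibility criterion only, where M(T) is the sum of q' over the internal edges. Several choices here are assumptions made for illustration and are not taken from the study: the search is written recursively instead of with an explicit stepping loop, B starts at minus infinity instead of M(T0) (the first complete descent reproduces the greedy tree when one can be completed), and the upper bound b is simply the current sum plus the largest remaining entries of the sorted vector, ignoring compatibility. The names `branch_and_bound`, `core` and `best_s` are hypothetical.

```python
from itertools import accumulate
from typing import FrozenSet, List, Sequence, Tuple

Split = FrozenSet[int]  # one side of a bipartition of the taxon set


def compatible(a: Split, b: Split, taxa: FrozenSet[int]) -> bool:
    """Two splits are compatible iff one of the four side intersections is empty."""
    a_c, b_c = taxa - a, taxa - b
    return not (a & b) or not (a & b_c) or not (a_c & b) or not (a_c & b_c)


def branch_and_bound(splits: Sequence[Split], q: Sequence[float],
                     taxa: FrozenSet[int]) -> Tuple[float, List[int]]:
    """splits and q hold the internal edges sorted by descending q.
    Returns (B, bestS), with bestS given as positions in the sorted list."""
    target = len(taxa) - 3
    prefix = [0.0] + list(accumulate(q))  # prefix[i] = q[0] + ... + q[i-1]
    best_value = float("-inf")
    best_s: List[int] = []
    core: List[int] = []  # the 'core' array S, as positions in the sorted list

    def extend(start: int, value: float) -> None:
        nonlocal best_value, best_s
        need = target - len(core)
        if need == 0:  # completely resolved tree
            if value > best_value:
                best_value, best_s = value, list(core)
            return
        for i in range(start, len(q) - need + 1):
            if q[i] <= 0:  # only edges with positive inferred length
                break
            # Assumed bound: edge i plus the next (need - 1) largest entries.
            b = value + prefix[i + need] - prefix[i]
            if b < best_value:  # later candidates have smaller bounds still
                break
            if all(compatible(splits[i], splits[j], taxa) for j in core):
                core.append(i)
                extend(i + 1, value + q[i])
                core.pop()

    extend(0, 0.0)
    return best_value, best_s


taxa = frozenset(range(5))
splits = [frozenset({0, 1}), frozenset({1, 2}), frozenset({0, 4}), frozenset({3, 4})]
q_sorted = [0.30, 0.29, 0.28, 0.10]
value, best = branch_and_bound(splits, q_sorted, taxa)
print(value, best)
```

In this illustrative input the greedy tree uses positions 0 and 3, while the search finds the better pair at positions 1 and 2 and then rejects the remaining candidates on the bound alone.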

Note that when the edge lengths are short, and few multiple changes of character state occur on the internal edges of Ta, the inferred q' values which do not correspond to the internal edges of Ta will be small. Hence the entries in the sorted q' vector will decrease rapidly after the first (n - 3). This means that the branch and bound process will be able to reject larger sets of candidate trees than if the edge lengths were longer and the decrease in the sorted q' vector less rapid. Therefore with shorter edge lengths, we expect the branch and bound search to be relatively faster than with long edge lengths.

Algorithm 2.2 describes the branch and bound process for closest tree and compatibility. The saving in computational cost for an example case is shown in


Algorithm 2.2: Branch and bound for Co and CT

variables: b,
