
One of the most efficient algorithms that appeared after Apriori is FP-growth [45]. Its most distinctive feature is the use of a complicated but compact data structure for storing the necessary part of the database. This structure, called the FP-tree, allows an efficient procedure for extracting the frequent patterns.

Let us first focus on the FP-tree. It is a tree-like structure which can be defined as follows:

1. The FP-tree consists of a root labelled by null, nodes connected in a tree structure starting from the root, and a header table.

Apriori(D):

    L_1 = all frequent 1-itemsets;
    for (k = 2; L_{k-1} ≠ ∅; k++)
        C_k = apriori-gen(L_{k-1});
        forall transactions t ∈ D
            C_k^t = subset(C_k, t);
            forall candidates c ∈ C_k^t
                c.count++;
        L_k = {c ∈ C_k | c.count ≥ minsup};
    result = ⋃_k L_k;

Figure 5.2: The Apriori Algorithm

2. Each node contains three fields: an item name, a count and a node link. The item name is the item represented by the node, and the count is the number of transactions contained in the path from the root to the node. The node link is a link to the next node with the same item in the tree, or null if such a node does not exist.

3. Each of the rows of the header table contains an item name and a node link which points to the first node in the tree labelled by the same item.

The structure is created using the FP-tree procedure, which implements the following two-step algorithm:

1. The frequent 1-itemsets are discovered by a scan of the database. The set is sorted in descending order of support; the sorted list is denoted by L.

2. The root of the tree, R, is created. The database is scanned again and for each transaction the items in the transaction are sorted following the order of L and inserted in the tree using the procedure insert-tree(t|T, R) on the root R and the transaction t|T, where t is the first item in the sorted transaction and T is the rest.

The insert-tree procedure considers the first item in the transaction t and checks if R has a child N labelled by the same item. If so, the count of that child is incremented, otherwise a new child N is created and assigned count 1. Further, the header table and the node links are updated if necessary. If the rest of the transaction T is not empty then the procedure is called recursively on T and N: insert-tree(T, N).
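The two-step construction and the insert-tree procedure can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the names `Node`, `insert_tree` and `build_fp_tree` are hypothetical, the header table is modelled as a dictionary from item to the first node in its node-link chain, and ties in support are broken alphabetically, which is an assumption the text does not state.

```python
from collections import Counter

class Node:
    """One FP-tree node: item name, count, parent/children and node link."""
    def __init__(self, item, parent=None):
        self.item = item        # None for the root
        self.count = 0
        self.parent = parent
        self.children = {}      # item -> child Node
        self.link = None        # next node with the same item, or None

def insert_tree(items, node, header):
    """Insert the sorted item list below `node`, updating counts and node links."""
    if not items:
        return
    first, rest = items[0], items[1:]
    child = node.children.get(first)
    if child is None:
        child = Node(first, parent=node)
        node.children[first] = child
        # Append the new node to the end of the node-link chain of its item.
        if header[first] is None:
            header[first] = child
        else:
            last = header[first]
            while last.link is not None:
                last = last.link
            last.link = child
    child.count += 1
    insert_tree(rest, child, header)

def build_fp_tree(transactions, minsup):
    # Scan 1: count supports and keep the frequent items (the list L).
    support = Counter(item for t in transactions for item in set(t))
    frequent = {i: s for i, s in support.items() if s >= minsup}
    # Descending support; alphabetical tie-break is an assumption of this sketch.
    order = sorted(frequent, key=lambda i: (-frequent[i], i))
    rank = {item: r for r, item in enumerate(order)}
    header = {item: None for item in order}
    root = Node(None)
    # Scan 2: insert each transaction, restricted to frequent items, in the order of L.
    for t in transactions:
        items = sorted((i for i in set(t) if i in rank), key=rank.__getitem__)
        insert_tree(items, root, header)
    return root, header, frequent
```

The recursion mirrors insert-tree(T, N) directly; for very long transactions an iterative loop would avoid Python's recursion limit.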

Clearly two database scans are necessary for the generation of the tree. Afterwards the information contained in this tree is enough for the process of frequent patterns mining.

An example will give a clearer picture of the structure of the tree and the building procedure. Table 5.1 shows a small transaction database which we use for constructing an FP-tree. The full tree is given in figure 5.3, where the parent-child links are denoted by solid lines and the node links by dashed lines.

TID   transaction
1     a b d
2     b c d f g h
3     a b d g
4     d e f
5     b d f
6     a f

Table 5.1: The example transaction database

The first step is to scan the database and find the frequent 1-itemsets. Let us have a minimal support threshold of 2, which means that we are only interested in itemsets that appear in at least two transactions. The frequent 1-itemsets (see footnote 2), sorted in decreasing order of support, are: (d : 5), (b : 4), (f : 4), (a : 3), (g : 2), where the numbers after the colons are the support values. The portion of the database that contains only the frequent 1-itemsets is shown in table 5.2, where the items of each transaction are also sorted in decreasing support.
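The first scan on the example can be reproduced with a few lines of Python. This is only a sketch: the variable names are illustrative, and the alphabetical tie-break between b and f (both with support 4) is an assumption chosen to match the order used in the text.

```python
from collections import Counter

# Table 5.1, keyed by TID; each transaction is a string of single-letter items.
db = {1: "abd", 2: "bcdfgh", 3: "abdg", 4: "def", 5: "bdf", 6: "af"}
minsup = 2

# Support of an item = number of transactions that contain it.
support = Counter(item for t in db.values() for item in set(t))

# The list L: frequent items in descending support (ties alphabetical, an assumption).
L = sorted((i for i in support if support[i] >= minsup),
           key=lambda i: (-support[i], i))

# Each transaction restricted to frequent items and rewritten in the order of L.
sorted_db = {tid: [i for i in L if i in t] for tid, t in db.items()}

for tid, items in sorted_db.items():
    print(tid, " ".join(items))
```

The infrequent items c, e and h drop out, and the remaining rows are those of table 5.2.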

We start with the first transaction {d, b, a}. The tree contains only the root and the transaction will be inserted as a branch with counts equal to 1 for each node. The second transaction is {d, b, f, g}. Starting from the root, we try to insert d. But the root already has a child labelled with d, thus we only increase the count of the node to 2 and proceed by trying to insert b in the subtree originating at the d-node. However, there is again a child with the label b and we increase the count of the b-node to 2 and proceed with f. The b-node this time has no child carrying the label f and we create a new f-node as a child with count 1. From that new node we try to insert g, which results in creating a child with item g and count 1.

The insertion of the third transaction {d, b, a, g} results in increasing the counts in the branch d → b → a and creating a new child of the a-node labelled with g. Proceeding in the same way we generate the full FP-tree given in figure 5.3.

The FP-tree is a highly condensed way of representing the relevant part of the database. This is due to the “collapsing” of transactions having the same prefix – the prefixes of such transactions are stored together and further processed together and only the count indicates that more than one transaction is present in this branch. In the extreme case of having a database of identical transactions, the tree will contain only one path corresponding to all the transactions.

Footnote 2: In the following we reserve the round brackets notation for itemsets that are found to be frequent and appear in the output of the algorithm. The curly brackets will be used for transactions in a (conditional) database.

TID   transaction
1     d b a
2     d b f g
3     d b a g
4     d f
5     d b f
6     f a

Table 5.2: The sorted database containing only items originating from frequent 1-itemsets

[Figure 5.3 (the full FP-tree for the example). Header table: d:5, b:4, f:4, a:3, g:2. Tree nodes: root, d:5, b:4, a:2, g:1, f:2, g:1, f:1, f:1, a:1.]

FP-growth(Tree, α):

    if Tree consists of a single path P
        for each combination β of nodes in P
            generate β ∪ α with support = min support of a node in β;
    else
        for each ai in the header of Tree
            generate β = ai ∪ α with support = ai.support;
            construct β's conditional pattern base;
            construct β's conditional FP-tree Treeβ;
            if Treeβ ≠ ∅
                FP-growth(Treeβ, β);

Figure 5.4: The FP-growth procedure for mining the frequent itemsets

By sorting the items in decreasing support we influence the structure of the tree. The goal is to minimize the number of nodes by starting with the most frequent ones and to avoid generating new nodes by “collapsing” the identical ones.
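The effect of collapsing shared prefixes, and of the item order, can be checked on the example with a small node count. The sketch below is illustrative only: `count_nodes` is a hypothetical helper that builds a plain prefix tree from nested dictionaries (no counts or node links), and the alphabetical order is just one alternative ordering chosen for comparison.

```python
from collections import Counter

def count_nodes(transactions):
    """Insert item sequences into a prefix tree (nested dicts); return the node count."""
    root, nodes = {}, 0
    for t in transactions:
        cur = root
        for item in t:
            if item not in cur:
                cur[item] = {}      # a new node is needed only for an unseen prefix
                nodes += 1
            cur = cur[item]
    return nodes

# Frequent items of each transaction of Table 5.1 (c, e and h already dropped).
db = ["abd", "bdfg", "abdg", "df", "bdf", "af"]
support = Counter(i for t in db for i in t)

by_support = [sorted(t, key=lambda i: (-support[i], i)) for t in db]
alphabetical = [sorted(t) for t in db]
identical = [list("dba")] * 100     # the extreme case of identical transactions

nodes_by_support = count_nodes(by_support)      # 9 nodes, as in figure 5.3
nodes_alphabetical = count_nodes(alphabetical)  # 11 nodes
nodes_identical = count_nodes(identical)        # 3 nodes: a single path
```

On this database the support order needs 9 nodes for 18 item occurrences, while the alphabetical order needs 11. Sorting by support is a heuristic, so the gain on other data may differ.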

The special structure of the FP-tree requires a dedicated procedure for extracting the frequent itemsets. This procedure is called FP-growth and is given in figure 5.4. It takes as input an FP-tree and a (possibly empty) set of items called the prefix of the current step (denoted by α). At the first call of the procedure the parameters are the full FP-tree generated from the data set and an empty set for α.

The algorithm starts by checking if the tree consists of a single path. If this is the case, each subset of it is a frequent itemset, which reveals another efficiency feature of FP-growth: it is not necessary to mine the tree further. In the extreme case of a database containing only identical transactions the tree will contain a single path and the algorithm will simply terminate at this step.

If the tree contains more than one path, then the current frequent itemsets are generated and the algorithm goes into recursion with each of them and their corresponding conditional FP-trees. The conditional FP-trees are generated using the FP-tree procedure discussed earlier based on the conditional pattern base. The conditional pattern base of an item ai is the collection of all prefix paths of the ai-nodes. They are extracted by starting from the head of the node-link for ai in the header table. From the first node reached we extract the path connecting it to the root and we attach to it the support of the ai-node. We go on following the node-links to the next ai-node to extract the next path.

Let us look again at the example from figure 5.3. The mining process starts from the header table and the items are considered from the bottom (in other words we move from the leaves of the tree upwards). However, since the processing step of an item is independent from the processing of the other items in the header table, the order does not influence the result.

First we output the pattern (g : 2) and then, starting from the header entry for g and following the node links, we extract the conditional pattern base for g. It contains two paths (considered here as transactions although there is no one-to-one correspondence with the original transaction database): {d, b, a} and {d, b, f}. They both have support 1, which is the support of the corresponding g-node connected to them. We count the support of the items and discover that a and f are not frequent (together with g). The remaining parts of the transactions are used to generate a conditional FP-tree. It contains a single path d → b with support 2, thus we output all combinations of items from the path postfixed with g. The patterns are: (b, g : 2), (d, g : 2), (d, b, g : 2).

Note that the four patterns generated in this first step are the only ones in which the item g participates. Therefore there is no need to consider this item later in the mining process.

We go one step back in the recursion to the original tree and go on with the item a in the header table. First we output the pattern (a : 3). The conditional pattern base consists of two paths/transactions: {d, b : 2} and {f : 1} which shows that f is not frequent and the conditional FP-tree again contains a single path d → b. As a result we output the following patterns: (d, a : 2), (b, a : 2), (d, b, a : 2).
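The conditional pattern bases for g and a can be checked with the sketch below. Note the shortcut: instead of following node links in the tree, it reads the prefixes directly from the sorted transactions of table 5.2, which yields the same paths and counts because every prefix path in the tree aggregates the transactions that share it. The function names are hypothetical.

```python
from collections import Counter

# Table 5.2: transactions restricted to frequent items, in descending support order.
sorted_db = [
    ["d", "b", "a"],
    ["d", "b", "f", "g"],
    ["d", "b", "a", "g"],
    ["d", "f"],
    ["d", "b", "f"],
    ["f", "a"],
]

def conditional_pattern_base(item, transactions):
    """Map each prefix path of `item` to the number of transactions sharing it."""
    base = Counter()
    for t in transactions:
        if item in t:
            prefix = tuple(t[:t.index(item)])
            if prefix:                      # an empty prefix contributes no items
                base[prefix] += 1
    return dict(base)

def frequent_in_base(base, minsup):
    """Support of each item within a conditional pattern base, filtered by minsup."""
    support = Counter()
    for prefix, count in base.items():
        for i in prefix:
            support[i] += count
    return {i: s for i, s in support.items() if s >= minsup}

base_g = conditional_pattern_base("g", sorted_db)
base_a = conditional_pattern_base("a", sorted_db)
print(base_g, frequent_in_base(base_g, 2))
print(base_a, frequent_in_base(base_a, 2))
```

In both cases only d and b remain frequent with support 2, which is why each conditional FP-tree reduces to the single path d → b.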

The algorithm goes on in a similar way with the items f, b and d.
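For completeness, the whole procedure of figure 5.4 can be sketched in Python and run on the example. This is a simplified illustration under several assumptions: all names (`Node`, `build_tree`, `single_path`, `fp_growth`) are hypothetical, each header entry holds a Python list of nodes in place of a node-link chain, ties in support are broken alphabetically, and conditional pattern bases are passed as (items, count) pairs.

```python
from collections import defaultdict
from itertools import combinations

class Node:
    def __init__(self, item, parent):
        self.item, self.parent, self.count = item, parent, 0
        self.children = {}

def build_tree(weighted, minsup):
    """Build an FP-tree from (items, count) pairs; return (root, header, support)."""
    support = defaultdict(int)
    for items, n in weighted:
        for i in items:
            support[i] += n
    support = {i: s for i, s in support.items() if s >= minsup}
    # Header in descending support; each entry lists the nodes of that item.
    header = {i: [] for i in sorted(support, key=lambda i: (-support[i], i))}
    rank = {i: r for r, i in enumerate(header)}
    root = Node(None, None)
    for items, n in weighted:
        node = root
        for i in sorted((i for i in items if i in rank), key=rank.get):
            if i not in node.children:
                node.children[i] = Node(i, node)
                header[i].append(node.children[i])
            node = node.children[i]
            node.count += n
    return root, header, support

def single_path(root):
    """Return the list of nodes if the tree is a single path, otherwise None."""
    path, node = [], root
    while node.children:
        if len(node.children) > 1:
            return None
        node = next(iter(node.children.values()))
        path.append(node)
    return path

def fp_growth(tree, alpha, minsup, out):
    root, header, support = tree
    path = single_path(root)
    if path is not None:
        # Every non-empty combination of path nodes, joined with the prefix alpha.
        for r in range(1, len(path) + 1):
            for beta in combinations(path, r):
                out[frozenset(n.item for n in beta) | alpha] = min(n.count for n in beta)
        return
    for item in reversed(list(header)):     # bottom of the header table first
        beta = alpha | {item}
        out[beta] = support[item]
        base = []                           # conditional pattern base of beta
        for node in header[item]:
            prefix, p = [], node.parent
            while p.item is not None:
                prefix.append(p.item)
                p = p.parent
            if prefix:
                base.append((prefix, node.count))
        cond = build_tree(base, minsup)     # conditional FP-tree
        if cond[0].children:
            fp_growth(cond, beta, minsup, out)

db = ["abd", "bcdfgh", "abdg", "def", "bdf", "af"]
patterns = {}
fp_growth(build_tree([(list(t), 1) for t in db], 2), frozenset(), 2, patterns)
```

With minimal support 2 this produces 15 frequent itemsets for table 5.1, including the patterns listed above for g and a. The dictionary `patterns` maps each itemset (a `frozenset`) to its support.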