In this section, we introduce the concept of weighted sequential patterns and show their properties. A sequential pattern is a weighted infrequent sequential pattern if it satisfies pruning condition 6.1 or pruning condition 6.2. If the sequential pattern satisfies neither condition, it is called a weighted frequent sequential pattern.

Pruning condition 6.1 (support < min_sup && weight < min_weight)

The support of a sequential pattern is less than the minimum support, and the weight of the sequence is less than the minimum weight constraint.

Pruning condition 6.2 (support * MaxW < min_sup)

In a sequence database, the product of the support of a sequential pattern and the maximum weight among the items in the sequence database is less than the minimum support. In a projected sequence database, the product of the support of a sequence and the maximum weight of the items in that projected database is less than the minimum support. Note that MaxW is used in order to maintain the downward closure property.
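The two pruning conditions can be written as simple predicates. The following is a minimal sketch, not the authors' implementation: the function and parameter names are hypothetical, and the sample values are the ones used in the example in the proof of Lemma 6.1.

```python
def satisfies_condition_6_1(support, weight, min_sup, min_weight):
    """Pruning condition 6.1: support < min_sup && weight < min_weight."""
    return support < min_sup and weight < min_weight


def satisfies_condition_6_2(support, max_w, min_sup):
    """Pruning condition 6.2: support * MaxW < min_sup."""
    return support * max_w < min_sup


def is_weighted_infrequent(support, weight, max_w, min_sup, min_weight):
    """A pattern is weighted infrequent if either pruning condition holds."""
    return (satisfies_condition_6_1(support, weight, min_sup, min_weight)
            or satisfies_condition_6_2(support, max_w, min_sup))


# Values from the example in the proof of Lemma 6.1:
# min_sup = 5, min_weight = 0.8, support = 4, weight = 0.7, MaxW = 1.3.
print(satisfies_condition_6_1(4, 0.7, 5, 0.8))      # condition 6.1 holds
print(satisfies_condition_6_2(4, 1.3, 5))           # condition 6.2 does not
print(is_weighted_infrequent(4, 0.7, 1.3, 5, 0.8))  # pruned via condition 6.1
```

Keeping the two conditions as separate predicates makes it easy to see which one is responsible for pruning a given pattern, which is what Lemmas 6.1 to 6.3 reason about.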

Lemma 6.1 When the two conditions are applied to prune weighted infrequent sequential patterns, a pattern can satisfy pruning condition 6.1 without satisfying pruning condition 6.2 only if the MaxW of the sequence database is greater than one.

Proof: In this case, pruning condition 6.1 is satisfied but pruning condition 6.2 is not. That is, the support of the sequence is less than min_sup and the weight of the sequential pattern is less than min_weight, but the product of the support and the MaxW is no less than the minimum support. The following two formulas must therefore both be satisfied.

Formula 1: support < min_sup

Formula 2: support * MaxW ≥ min_sup

The MaxW of the sequence database or projected databases must be greater than one in order to satisfy both formulas. For example, assume that the minimum support is 5, the minimum weight is 0.8, the support of a sequence is 4, the weight of the sequential pattern is 0.7, and the MaxW in the SDB is 1.3. Pruning condition 6.1 is satisfied (4 < 5 and 0.7 < 0.8), but pruning condition 6.2 is not satisfied (4 * 1.3 = 5.2, which is no less than 5). Therefore, this sequential pattern is pruned by condition 6.1.
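The bound on MaxW follows from chaining Formula 1 and Formula 2. The derivation below is a short sketch that assumes the support is strictly positive, which the text does not state explicitly:

```latex
\[
  \mathit{support} < \mathit{min\_sup} \le \mathit{support} \cdot \mathit{MaxW}
  \;\Longrightarrow\;
  \mathit{MaxW} \ge \frac{\mathit{min\_sup}}{\mathit{support}} > 1
  \qquad (\mathit{support} > 0).
\]
% Numeric instance from the example: 4 < 5 <= 4 * 1.3 = 5.2, and 1.3 >= 5/4 = 1.25 > 1.
```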

Lemma 6.2 There is no limitation on using pruning condition 6.2. That is, pruning condition 6.2 (support * MaxW < min_sup) can be applied without limitation.

When a weighted infrequent sequential pattern is pruned because only pruning condition 6.2, and not pruning condition 6.1, is satisfied, the MaxW (maximum weight) of the sequence database or projected databases can be any value.

Proof: In this case, a sequential pattern is pruned because pruning condition 6.2 is satisfied, although condition 6.1 is not. The following two formulas must both be satisfied.

Formula 3: (support ≥ min_sup || weight ≥ min_weight)

Formula 4: (support * MaxW < min_sup)

If the support of a sequential pattern is no less than the minimum support in Formula 3, MaxW must be less than one to satisfy Formula 4. However, if the weight of the sequential pattern is greater than or equal to the minimum weight threshold in Formula 3 and the support of the sequential pattern is less than the minimum support, Formula 3 places no restriction on MaxW in Formula 4. In other words, pruning condition 6.2 (support * MaxW < min_sup) can be applied without limitation.
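The two cases of the proof can be made explicit by solving Formula 4 for MaxW. This is a sketch that again assumes a strictly positive support:

```latex
\begin{align*}
  \text{Case A: } \mathit{support} \ge \mathit{min\_sup}
    &\;\Longrightarrow\; \mathit{MaxW} < \frac{\mathit{min\_sup}}{\mathit{support}} \le 1, \\
  \text{Case B: } \mathit{support} < \mathit{min\_sup},\ \mathit{weight} \ge \mathit{min\_weight}
    &\;\Longrightarrow\; \mathit{MaxW} < \frac{\mathit{min\_sup}}{\mathit{support}},
       \quad \frac{\mathit{min\_sup}}{\mathit{support}} > 1.
\end{align*}
```

In Case B the upper bound on MaxW is itself greater than one, so MaxW may be below, equal to, or above one while Formula 4 still holds.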

Lemma 6.3 When the two pruning conditions are applied to prune weighted infrequent sequential patterns, the method always prunes more patterns than the approach that uses only a minimum support, provided that the MaxW of the sequence database or projected databases is less than one.

Proof: In normal frequent sequential pattern mining, every item has the same priority; that is, every weight is 1.0. If only pruning condition 6.2 is considered, more sequential patterns are pruned when the weights of items are set to less than one. For example, assume that the minimum support is 4 and the support of a sequential pattern is 5. In normal sequential pattern mining, the sequential pattern is not pruned, since the weights of the items in the sequence are 1.0 and the support (5) of the sequential pattern is greater than the minimum support (4). However, when the maximum weight (MaxW) is 0.7, the sequential pattern is pruned by condition 6.2 in section 6.2.3, since 5 * 0.7 = 3.5 is less than the minimum support (4).
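The effect described in Lemma 6.3 can be checked on a small set of supports. In the sketch below, only the pattern with support 5, the minimum support of 4, and the MaxW of 0.7 come from the proof; the pattern names and the other two supports are hypothetical.

```python
def pruned_by_support_only(supports, min_sup):
    """Patterns pruned in normal mining: support < min_sup."""
    return {p for p, s in supports.items() if s < min_sup}


def pruned_by_condition_6_2(supports, max_w, min_sup):
    """Patterns pruned by condition 6.2: support * MaxW < min_sup."""
    return {p for p, s in supports.items() if s * max_w < min_sup}


# "p2" mirrors the pattern in the proof (support 5); "p1" and "p3" are hypothetical.
supports = {"p1": 3, "p2": 5, "p3": 6}
min_sup = 4
max_w = 0.7  # MaxW below one

baseline = pruned_by_support_only(supports, min_sup)
weighted = pruned_by_condition_6_2(supports, max_w, min_sup)

# With MaxW < 1, everything pruned by support alone is also pruned by 6.2.
print(sorted(baseline), sorted(weighted), baseline <= weighted)
```

Here "p2" survives the support-only check but is pruned once its support is scaled by MaxW, while "p3" survives both.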

Example 6.3: The columns in Table 12 show the set of weighted sequential patterns that remain after pruning weighted infrequent sequential patterns with pruning condition 6.2 under different weight ranges (WRs). For example, when WR3 is applied and the minimum support is 2, the value (1.8) of multiplying the support (3) of the pattern “f” with the MaxW (0.6) in the SDB is less than the minimum support (2), so the pattern “f” can be removed from each sequence in the SDB. Meanwhile, the number of weighted sequential patterns can increase when WR1 is used as the weight range. The support of the pattern “g” in the sequence database is 2. However, the maximum weight is 1.3, so the value (2.6) of multiplying the pattern’s support (2) with the MaxW (1.3) is greater than the minimum support (2), and the pattern “g” is not pruned from the weighted sequence list.
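The arithmetic in Example 6.3 can be reproduced with a few lines of Python. This is an illustrative sketch: the supports of “f” and “g”, the MaxW values for WR1 and WR3, and the minimum support are taken from the example, while the variable and function names are hypothetical.

```python
MIN_SUP = 2

# Supports stated in Example 6.3.
item_support = {"f": 3, "g": 2}

# Upper bounds (MaxW) of the weight ranges used in the example.
max_weight = {"WR1": 1.3, "WR3": 0.6}


def is_pruned(support, max_w, min_sup):
    """Pruning condition 6.2: support * MaxW < min_sup."""
    return support * max_w < min_sup


# Evaluate condition 6.2 for every (item, weight range) pair.
pruned = {
    (item, wr): is_pruned(support, max_w, MIN_SUP)
    for item, support in item_support.items()
    for wr, max_w in max_weight.items()
}

for (item, wr), flag in sorted(pruned.items()):
    print(item, wr, "pruned" if flag else "kept")
```

Under WR3 both items fall below the minimum support after scaling, whereas under WR1 both are kept, which matches the “f” and “g” cases discussed in the example.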

Table 12. Weighted sequential patterns with different weight ranges

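| SID | Weighted Sequence List (0.7 ≤ WR1 ≤ 1.3) | Weighted Sequence List (0.7 ≤ WR2 ≤ 0.9) | Weighted Sequence List (0.2 ≤ WR3 ≤ 0.6) |
|-----|------------------------------------------|------------------------------------------|------------------------------------------|
| 10  | <a(abc)(ac)d(cf)>                        | <a(abc)(ac)d(cf)>                        | <a(abc)(ac)dc>                           |
| 20  | <(ad)c(bc)(ae)bc>                        | <(ad)c(bc)(ae)bc>                        | <(ad)c(bc)(ae)bc>                        |
| 30  | <(ef)(ab)(df)cb>                         | <(ef)(ab)(df)cb>                         | <e(ab)dcb>                               |
| 40  | <eg(af)cbc>                              | <e(af)cbc>                               | <eacbc>                                  |
| 50  | <a(ab)(cd)egh>                           | <a(ab)(cd)e>                             | <a(ab)(cd)e>                             |
| 60  | <a(abd)bc>                               | <a(abd)bc>                               | <a(abd)bc>                               |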

Example 6.4: Let us show another example in which the minimum weight is changed. In this example, Table 10 and Table 11 are used for the sequence database and the weight ranges, respectively. Assume that the weight range is 0.2 ≤ WR3 ≤ 0.6 and the minimum support is 3. Then pruning condition 6.1 is applied as follows. If the minimum weight is 0.6, the items “g” and “h” are pruned from each sequence. If the minimum weight is 0.4, only the item “h” is pruned from each sequence. Meanwhile, no item is pruned from any sequence if the minimum weight is less than 0.4. In this way, the number of weighted sequential patterns can also be adjusted by using the minimum weight.