Although the RTP framework focuses the search on temporal patterns that are potentially important for predicting the class variable, not all frequent RTPs are important for classification. Besides, many RTPs may be spurious (see Section 3.4), as we illustrate in the following example.
Example 17. Assume that having an elevated creatinine level (creatinine=High) is an important indicator of renal failure. If we denote this pattern by P, we expect conf(P ⇒ renal-failure) to be much higher than the renal-failure prior in the entire population of patients.
Now consider pattern P′ that extends P backward with a state indicating a normal value for white blood cell counts: P′: WBC=Normal before creatinine=High. Assume that observing P′ does not change our belief about the presence of renal failure compared to observing P. As we discussed in Section 3.4, conf(P′ ⇒ renal-failure) ≈ conf(P ⇒ renal-failure). Intuitively, the instances covered by P′ can be seen as a random sample of the instances covered by P. So if the proportion of renal failure for P is relatively high, we expect the proportion of renal failure for P′ to be high as well. The problem is that if we evaluate P′ by itself, we may falsely conclude that it is an important pattern for predicting renal failure, when in fact this happens only because P′ contains the real predictive pattern P.
Algorithm 2: A high-level description of candidate generation.
Input: All frequent k-RTPs: F_k; all frequent states: L
Output: Candidate (k+1)-patterns: Cand, with their p-RTP-lists
Cand = Φ;
foreach P ∈ F_k do
    foreach S ∈ L do
        C = extend_backward(P, S);    (Algorithm 1)
        for q = 1 to |C| do
            C[q].p-RTP-list = P.RTP-list ∩ S.list;
            if ( |C[q].p-RTP-list| ≥ σ_y ) then
                Cand = Cand ∪ {C[q]};
            end
        end
    end
end
return Cand
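The candidate generation of Algorithm 2 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the original implementation: a pattern is encoded as a tuple of state names with temporal relations omitted, RTP-lists are sets of instance ids, and `prepend_state` is a hypothetical stand-in for `extend_backward` (Algorithm 1), which in general may return several extensions per state. The state name `glucose=Low` is invented for the example.

```python
from typing import Callable, Dict, FrozenSet, Iterable, Tuple

# Assumed encoding: a pattern is a tuple of state names, earliest first.
# Temporal relations between states are omitted to keep the sketch short.
Pattern = Tuple[str, ...]
IdSet = FrozenSet[int]  # ids of the MSS (instances) in D_y


def generate_candidates(
    frequent_rtps: Dict[Pattern, IdSet],   # F_k: pattern -> its RTP-list
    frequent_states: Dict[str, IdSet],     # L: state -> its list
    extend_backward: Callable[[Pattern, str], Iterable[Pattern]],
    min_support: int,                      # sigma_y
) -> Dict[Pattern, IdSet]:
    """Return candidate (k+1)-patterns mapped to their p-RTP-lists."""
    candidates: Dict[Pattern, IdSet] = {}
    for pattern, rtp_list in frequent_rtps.items():
        for state, state_list in frequent_states.items():
            # The p-RTP-list is identical for every extension of `pattern`
            # by `state`, so it is computed once per (pattern, state) pair.
            potential = rtp_list & state_list
            if len(potential) < min_support:
                continue  # no extension by this state can be frequent
            for candidate in extend_backward(pattern, state):
                candidates[candidate] = potential
    return candidates


def prepend_state(pattern: Pattern, state: str) -> Iterable[Pattern]:
    """Hypothetical stand-in for Algorithm 1: yields a single extension."""
    return [(state,) + pattern]


frequent_rtps = {("creatinine=High",): frozenset({1, 2, 3, 4})}
frequent_states = {
    "WBC=Normal": frozenset({2, 3, 4, 5}),
    "glucose=Low": frozenset({1}),
}
candidates = generate_candidates(frequent_rtps, frequent_states, prepend_state, 2)
print(candidates)
```

Only the extension by WBC=Normal survives here, because its p-RTP-list {2, 3, 4} meets the support threshold of 2, while the intersection with the list of glucose=Low contains a single instance.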
In general, spurious RTPs are formed by adding irrelevant states to simpler predictive RTPs. Having spurious RTPs in the result is undesirable because they overwhelm the user and obscure the important patterns in the data. To filter out such spurious patterns, we extend the minimal predictive patterns framework (Section 3.5.2) to the temporal domain.
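The effect described in Example 17 can be reproduced with a few lines of arithmetic. The counts below are invented for illustration only: 100 patients of whom 30 have renal failure, P covering 30 patients (21 with renal failure), and P′ covering 10 of those (7 with renal failure).

```python
def confidence(covered_with_class: int, covered_total: int) -> float:
    """conf(P => y): fraction of the instances covered by P that have class y."""
    return covered_with_class / covered_total


# Hypothetical counts, chosen only to illustrate the argument.
prior = confidence(30, 100)        # renal-failure rate in the whole population
conf_p = confidence(21, 30)        # P:  creatinine=High
conf_p_prime = confidence(7, 10)   # P': WBC=Normal before creatinine=High

# Judged against the prior alone, P' looks strongly predictive ...
gain_over_prior = conf_p_prime - prior
# ... but judged against its suffix subpattern P, it adds nothing.
gain_over_p = conf_p_prime - conf_p

print(f"prior={prior:.2f} conf(P)={conf_p:.2f} conf(P')={conf_p_prime:.2f}")
print(f"gain over prior={gain_over_prior:.2f} gain over P={gain_over_p:.2f}")
```

Comparing P′ with P rather than with the prior is what exposes it as spurious in this toy setting.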
Definition 11. A temporal pattern P is a Minimal Predictive Recent Temporal Pattern (MPRTP) with respect to class label y if P predicts y significantly better than all of its suffix subpatterns:
∀S such that Suffix(S, P): BS(P ⇒ y, G_S) ≥ δ
where BS is the Bayesian score we defined in Section 3.5.1.2, G_S is the group of MSS in the data where S is an RTP, and δ is a user-specified significance parameter.
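The structure of Definition 11 can be sketched as a check over all suffix subpatterns. The sketch makes several assumptions: patterns are tuples of state names (temporal relations omitted), the empty tuple plays the role of the empty pattern whose group is the entire data, and `confidence_gain` is a deliberately simple stand-in for the score. It is not the Bayesian score BS of Section 3.5.1.2, which is not reproduced here; any function with the same signature could be plugged in. The counts are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Pattern = Tuple[str, ...]  # assumed encoding, earliest state first
Counts = Tuple[int, int]   # (#MSS from class y, #MSS from other classes)


def suffix_subpatterns(pattern: Pattern) -> List[Pattern]:
    """Proper suffixes of `pattern`, ending with the empty pattern ()."""
    return [pattern[i:] for i in range(1, len(pattern) + 1)]


def confidence_gain(pattern_counts: Counts, group_counts: Counts) -> float:
    """Stand-in score (NOT the Bayesian score): gain in smoothed confidence."""
    def smoothed(counts: Counts) -> float:
        return (counts[0] + 1) / (counts[0] + counts[1] + 2)
    return smoothed(pattern_counts) - smoothed(group_counts)


def is_mprtp(
    pattern: Pattern,
    counts: Dict[Pattern, Counts],
    score: Callable[[Counts, Counts], float],
    delta: float,
) -> bool:
    """True if `pattern` scores at least `delta` against every suffix subpattern."""
    return all(
        score(counts[pattern], counts[suffix]) >= delta
        for suffix in suffix_subpatterns(pattern)
    )


P = ("creatinine=High",)
P_PRIME = ("WBC=Normal", "creatinine=High")
counts = {(): (30, 70), P: (21, 9), P_PRIME: (7, 3)}  # hypothetical counts

p_is_minimal = is_mprtp(P, counts, confidence_gain, delta=0.1)
p_prime_is_minimal = is_mprtp(P_PRIME, counts, confidence_gain, delta=0.1)
print(p_is_minimal, p_prime_is_minimal)
```

With these counts, P passes because it clearly improves on the empty pattern, while P′ fails because it does not improve on its suffix subpattern P.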
The algorithm in Section 5.4 describes how to mine all frequent RTPs from data D_y. In order to mine MPRTPs, the algorithm requires another input: D_¬y, the MSS in the data that do not belong to class y. Mining MPRTPs is integrated with frequent RTP mining using an algorithm similar to the one described in Section 3.5.3 for mining MPPs. The algorithm utilizes the predictiveness of RTPs to prune the search space using a lossless pruning technique and a lossy pruning technique.
Lossless pruning: This technique is similar to the lossless pruning used for MPP mining (Section 3.5.4.1). The idea is to prune a frequent RTP P if we can guarantee that none of its backward-extension superpatterns is going to be an MPRTP. We know that for any backward-extension superpattern P′, the following holds according to Corollary 1:
RTP-sup_g(P′, D_y) ≤ RTP-sup_g(P, D_y) ∧ RTP-sup_g(P′, D_¬y) ≤ RTP-sup_g(P, D_¬y)
We now define the optimal backward-extension superpattern of P with respect to class y, denoted as P∗, to be a hypothetical temporal pattern that is an RTP in every instance from y where P is an RTP, but not in any instance from the other classes:
RTP-sup_g(P∗, D_y) = RTP-sup_g(P, D_y) ∧ RTP-sup_g(P∗, D_¬y) = 0
P∗ is the best possible backward-extension superpattern for predicting y that P can generate. We can therefore safely prune P if P∗ does not satisfy the MPRTP definition.
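The lossless test can be sketched by constructing the counts of the hypothetical pattern P∗ and checking it against P and the suffix subpatterns of P, since all of these are suffix subpatterns of any backward extension of P. As an assumption of this sketch, `confidence_gain` again stands in for the Bayesian score BS (it is not that score), and all counts and the threshold are invented.

```python
from typing import Callable, Iterable, Tuple

Counts = Tuple[int, int]  # (#MSS from class y, #MSS from other classes)


def confidence_gain(pattern_counts: Counts, group_counts: Counts) -> float:
    """Stand-in score (NOT the Bayesian score): gain in smoothed confidence."""
    def smoothed(counts: Counts) -> float:
        return (counts[0] + 1) / (counts[0] + counts[1] + 2)
    return smoothed(pattern_counts) - smoothed(group_counts)


def can_prune_lossless(
    pattern_counts: Counts,
    suffix_counts: Iterable[Counts],
    score: Callable[[Counts, Counts], float],
    delta: float,
) -> bool:
    """True if even the optimal extension P* of P cannot be an MPRTP.

    `suffix_counts` holds the counts of the suffix subpatterns of P,
    including the empty pattern (the entire data).
    """
    # P* keeps all of P's support in D_y and has no support in D_not_y.
    optimal = (pattern_counts[0], 0)
    # Any extension of P must beat P itself and every suffix subpattern of P.
    references = [pattern_counts, *suffix_counts]
    return any(score(optimal, ref) < delta for ref in references)


whole_data = (30, 70)  # hypothetical counts for the empty pattern
# P is already so predictive that no extension can improve on it by 0.3.
prune_strong = can_prune_lossless((21, 9), [whole_data], confidence_gain, 0.3)
# A weak P leaves room for an extension to be much better, so it is kept.
prune_weak = can_prune_lossless((2, 8), [whole_data], confidence_gain, 0.3)
print(prune_strong, prune_weak)
```

Pruning here means that P is not extended further; whether P itself is reported as an MPRTP is decided separately by the test of Definition 11.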
Lossy pruning: This technique is similar to the lossy pruning used for MPP mining (Section 3.5.4.2). The idea is that if we are mining MPRTPs for class y, we prune RTP P if we have evidence that the underlying probability of y in G_P (the group of MSS in the data where P is an RTP) is lower than the probability of y in the entire data. To decide whether this is the case, we apply our Bayesian score to evaluate rule P ⇒ y compared to G_Φ (the entire data), and we prune P if model M_l is the most likely model (see Section 3.5.1.2).
The rationale behind this heuristic is that if the probability of y in the MSS covered by P is low, we expect the probability of y in the MSS covered by its backward-extension superpattern P′ to be low as well. Thus, P′ is unlikely to be an MPRTP. Note that this heuristic is lossy in the sense that it speeds up the mining, but at the risk of missing some MPRTPs.
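The lossy heuristic can be illustrated with a deliberately swapped-in criterion. Instead of the Bayesian model comparison of Section 3.5.1.2 (which selects among models and prunes when M_l wins), the sketch below uses an exact one-sided binomial tail test as a stand-in: P is pruned when observing so few class-y instances among those it covers would be unlikely under the overall class-y rate. The function names, the threshold `alpha`, and the counts are assumptions made for the example.

```python
from math import comb
from typing import Tuple

Counts = Tuple[int, int]  # (#MSS from class y, #MSS from other classes)


def binomial_lower_tail(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p), computed exactly."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def can_prune_lossy(
    pattern_counts: Counts, all_counts: Counts, alpha: float = 0.05
) -> bool:
    """Stand-in for the M_l test: is y significantly rarer in G_P than overall?"""
    n_y, n_other = pattern_counts
    prior = all_counts[0] / (all_counts[0] + all_counts[1])
    return binomial_lower_tail(n_y, n_y + n_other, prior) < alpha


whole_data = (30, 70)  # hypothetical: 30% of all MSS belong to class y
# P covers 40 MSS, only 4 from class y: evidence that y is rarer in G_P.
prune_low = can_prune_lossy((4, 36), whole_data)
# P covers 30 MSS, 21 from class y: no such evidence, so P is kept.
prune_high = can_prune_lossy((21, 9), whole_data)
print(prune_low, prune_high)
```

As with the original heuristic, a rule of this kind can discard a pattern whose backward extensions would have turned out to be predictive, which is why it is a lossy rather than a lossless technique.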