– Extract common nodes (event types).

– Calculate node (n) similarity using:

\[
\mathrm{sim}(L_1, L_2) = 2 \cdot \frac{\text{number of } (n_1 \cap n_2)}{\text{number of } (n_1 + n_2)} \tag{5.1}
\]

– Extract common edges using the method of 2-gram extraction between two consecutive nodes.

– Calculate edge (e) similarity using:

\[
\mathrm{sim}(L_1, L_2) = 2 \cdot \frac{\text{number of } (e_1 \cap e_2)}{\text{number of } (e_1 + e_2)} \tag{5.2}
\]

– Calculate total similarity score as:

\[
\text{Total } \mathrm{sim}(L_1, L_2) = \frac{\text{node similarity score} + \text{edge similarity score}}{2} \tag{5.3}
\]

3. Take the average of the symmetric matrix that holds the cross state similarity of all sub logs as the score of that model.
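The steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation used in the experiments: a sub log is assumed to be a list of traces, each trace a list of event type names; "number of (n1 + n2)" is read as the sum of the two set sizes; and the matrix average is taken over the off-diagonal pairs only. All function names are hypothetical.

```python
from itertools import combinations


def dice(a: set, b: set) -> float:
    """2 * |a ∩ b| / (|a| + |b|), as in Equations 5.1 and 5.2."""
    denominator = len(a) + len(b)
    # Assumption: two empty sets are given similarity 0 to avoid dividing by zero.
    return 2 * len(a & b) / denominator if denominator else 0.0


def nodes(sub_log):
    """Distinct event types that occur in a sub log."""
    return {event for trace in sub_log for event in trace}


def edges(sub_log):
    """Distinct 2-grams of consecutive events in a sub log."""
    return {(a, b) for trace in sub_log for a, b in zip(trace, trace[1:])}


def total_similarity(log1, log2) -> float:
    """Mean of node and edge similarity, as in Equation 5.3."""
    node_sim = dice(nodes(log1), nodes(log2))
    edge_sim = dice(edges(log1), edges(log2))
    return (node_sim + edge_sim) / 2


def cross_state_similarity(sub_logs) -> float:
    """Average pairwise similarity over all sub logs of one model.

    Assumption: the diagonal of the symmetric matrix is excluded, so the
    average runs over each unordered pair of sub logs once.
    """
    pairs = list(combinations(sub_logs, 2))
    if not pairs:
        return 0.0
    return sum(total_similarity(a, b) for a, b in pairs) / len(pairs)
```

For two single-trace sub logs `[["A", "B", "C"]]` and `[["A", "B", "D"]]`, the node similarity is 2 · 2 / 6 and the edge similarity is 2 · 1 / 4, so the total similarity is their mean. Whether the diagonal should be included in the matrix average is not stated in the text; including it would raise every model's score without affecting the other steps.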

5.4.4

How to calculate state importance

State importance is a simple metric and is calculated as:

\[
\text{State importance} = \frac{\text{number of cases in a state} \times 100}{\text{number of all cases}} \geq \text{threshold} \tag{5.4}
\]

A state that does not reach the threshold is considered non-significant, and this metric returns how many such states a model has.
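As a minimal sketch of this calculation, assuming the number of cases per state is already known and that a state counts as non-significant when its importance falls below the threshold, the metric could be computed as follows. The function names and the dictionary input are hypothetical and chosen only for illustration.

```python
def state_importance(cases_in_state: int, total_cases: int) -> float:
    """Percentage of all cases that fall in one state (left side of Equation 5.4)."""
    if total_cases <= 0:
        raise ValueError("total_cases must be positive")
    return cases_in_state * 100 / total_cases


def count_non_significant_states(cases_per_state: dict, total_cases: int,
                                 threshold: float) -> int:
    """Number of states whose importance is below the threshold.

    Assumption: `cases_per_state` maps a state identifier to the number of
    cases in that state, and `threshold` is a percentage.
    """
    return sum(
        1
        for count in cases_per_state.values()
        if state_importance(count, total_cases) < threshold
    )
```

With 100 cases split as 50, 45 and 5 over three states and a threshold of 10, only the third state falls below the threshold, so the count is 1.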

5.5

Criteria Properties

Investigating the properties of the proposed criteria aims at understanding how these criteria can be used to cope with the issues found in HMMs. It should be noted that state importance is a constrained criterion, since it has a predefined threshold, and will play a penalty role in our multi-objective function. Therefore, we explore here the properties of the unconstrained criteria, which are linearity, state compactness and cross state similarity.

The figures used here relate to the event logs that were used in experiments 1, 2 and 4 in Chapter 4. The event log of experiment 3 is discarded to avoid redundancy, since it was trained with the same number of hidden states as in experiment 2.

Models on the X axis in all figures are ordered from the smallest number of hidden states to the largest. For instance, model number 1 is trained with 2 hidden states, model number 2 is trained with 3 hidden states, and so on.

5.5.1

Linearity

High linearity models are likely to be models with a small number of states: the more states, the lower the linearity. Figure 5.4 shows that the most linear process is found in model number 3 in (a), model number 2 in (b) and model number 1 in (c), where these models have 4, 3 and 2 hidden states respectively.

Figure 5.4: Linearity criteria property for (a) synthetic process, (b) emergency room process and (c) colorectal cancer process.

5.5.2

State compactness

Models with the highest compactness are likely to be models with a large number of states. The higher the number of states, the better the compactness, as the distance between processes inside a state approaches 0. Compactness keeps improving as we move in the positive direction along the X axis, as shown in Figure 5.5. The best compactness is found in model numbers 5, 8 and 10 for (a) synthetic process, (b) emergency room process and (c) colorectal cancer process respectively.

Figure 5.5: State compactness criteria property for (a) synthetic process, (b) emergency room process and (c) colorectal cancer process.

5.5.3

Cross state similarity

Cross state similarity also has a fluctuating pattern: it starts with a high cross similarity score, then drops to a low score, after which some high scores might be found again. Generally, models with desirably low cross state similarity are likely to be captured before approaching the best compactness points.

It can be clearly seen in Figure 5.6 that several minimum points of cross state similarity are reached before the point of best compactness. For example, in (a) synthetic process, the cross state similarity drops in model number 4, before model number 5, which has the best compactness.

Also, in (b) emergency room process, low similarity between states is detected in model numbers 3, 4 and 6, before model number 8, which has the best compactness. Likewise, (c) colorectal cancer process has several models with a very low cross state similarity score, such as model numbers 6, 7 and 9, before the model with the best compactness, model number 10.

Figure 5.6: Cross state similarity criteria property for (a) synthetic process, (b) emergency room process and (c) colorectal cancer process.

5.5.4

Discussion

Exploring the criteria properties helps in understanding the potential impact of the proposed criteria on the HMM issues. To clarify, the empirical results in Chapter 4 showed that the higher the number of states in a model, the more likely it is that strongly connected components are detected. Also, a high linearity model is likely to be a model with a small number of states. Thus, the linearity criterion may act as an effective factor that helps in choosing the model with fewer connected components.

On the other hand, the results of previous experiments showed that the similar states issue is not restricted to models with a high number of states but can also be found in models with few states; see Figure 5.6. This might be caused by bad initialization, which is a consequence of the lack of knowledge about the data. Adopting a metric that prefers a model with high dissimilarity between states might help in coping with this similar states issue. Hence, the cross state similarity criterion could play a role in addressing this issue.

Care should be taken when preferring a model with high dissimilarity between states, since it may result in choosing a model with a large number of states where each state holds one single event type. This risk can be mitigated by favouring a model with a reasonable state variance along with dissimilar states. State compactness may play a role here in preferring a model of moderate state variance.

The third issue is the presence of unimportant states, a problem that seems to result from insufficient data along with bad initialization. A concern may arise regarding the EM algorithm used for learning HMMs, which cannot decide whether a state is significant or not. Therefore, we intend to use a penalty term where a process miner can set a threshold for the state importance that should be present in the model.