

Examples of the pruned trees produced by J48 when training on one dataset and on the combination of two datasets are given in Figures 4.16 and 4.17. The pruned tree for one dataset is different from the pruned tree for the combination of two datasets.

Figure 4.16: J48 pruned tree for training one dataset

In Figures 4.16 and 4.17, the abbreviations JaroM, Jaro, Mon, Lev, AbbTokS, TokSynT, TokAbbSynT and TokT stand for the JaroMeasure, JaroWinkler, MongeElkan and Levenshtein functions, abbreviation and tokenization of source, tokenization and synonym of target, tokenization, abbreviation and synonym of target, and tokenization of target respectively. The values 0.9, 0.8, 0.7, 0.2 and 0.1 are thresholds. As an example, the rule Jaro_TokT <= 0.9 means that if the value of the JaroWinkler function applied to the tokenization of the target is less than or equal to the threshold value 0.9, then the conclusion is FALSE.

Figure 4.17: J48 pruned tree for training the combination of two datasets

In Figure 4.16, the TRUE conditions are:

• JaroM_AbbTokS <= 0.8 and JaroM_TokAbbSynT > 0.7 and Mon_TokSynT > 0.8 and Mon_TokS > 0.2

• JaroM_AbbTokS > 0.8 and Jaro_AbbTokS > 0.9

• JaroM_AbbTokS > 0.8 and Jaro_AbbTokS <= 0.9 and Jaro_TokT > 0.9

For example, if JaroM_AbbTokS > 0.8 and Jaro_AbbTokS > 0.9 then TRUE.

The FALSE conditions are:

• JaroM_AbbTokS <= 0.8 and JaroM_TokAbbSynT > 0.7 and Mon_TokSynT > 0.8 and Mon_TokS <= 0.2

• JaroM_AbbTokS <= 0.8 and JaroM_TokAbbSynT > 0.7 and Mon_TokSynT <= 0.8

• JaroM_AbbTokS <= 0.8 and JaroM_TokAbbSynT <= 0.7

• JaroM_AbbTokS > 0.8 and Jaro_AbbTokS <= 0.9 and Jaro_TokT <= 0.9

For example, if JaroM_AbbTokS <=0.8 and JaroM_TokAbbSynT <= 0.7 then FALSE.
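The TRUE and FALSE conditions listed for Figure 4.16 can be read as a single nested if/else. The following is a minimal Python sketch of that reading, not the Weka model itself. It assumes each feature score is available in a dictionary keyed by the names used in the text; the function name `classify_fig_4_16` and the dictionary layout are illustrative and do not come from the source.

```python
def classify_fig_4_16(features):
    """Return True or False by following the Figure 4.16 conditions.

    `features` maps feature names (as written in the text) to scores.
    The dictionary layout is an assumption made for this sketch.
    """
    if features["JaroM_AbbTokS"] <= 0.8:
        if features["JaroM_TokAbbSynT"] <= 0.7:
            return False
        if features["Mon_TokSynT"] <= 0.8:
            return False
        # Remaining branch: Mon_TokS decides the outcome.
        return features["Mon_TokS"] > 0.2
    # From here on JaroM_AbbTokS > 0.8.
    if features["Jaro_AbbTokS"] > 0.9:
        return True
    return features["Jaro_TokT"] > 0.9


# The two worked examples from the text (scores are made-up values
# chosen only to satisfy the stated conditions).
true_case = {"JaroM_AbbTokS": 0.85, "Jaro_AbbTokS": 0.95}
false_case = {"JaroM_AbbTokS": 0.5, "JaroM_TokAbbSynT": 0.6}
print(classify_fig_4_16(true_case), classify_fig_4_16(false_case))
```

Each of the seven listed conditions corresponds to exactly one return path, so every combination of scores reaches exactly one conclusion.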

In Figure 4.17, the TRUE conditions are:

• JaroM_AbbTokS > 0.9

• JaroM_AbbTokS <= 0.9 and Lev_TokSynT <= 0.7 and Jaro_TokT > 0.8 and Jaro_TokT > 0.9

• JaroM_AbbTokS <= 0.9 and Lev_TokSynT > 0.7 and Jaro_AbbTokT > 0.2

• JaroM_AbbTokS <= 0.9 and Lev_TokSynT > 0.7 and Jaro_AbbTokT <= 0.2 and Lev_TokT <= 0.1

For example, if JaroM_AbbTokS > 0.9 then TRUE.

The FALSE conditions are:

• JaroM_AbbTokS <= 0.9 and Lev_TokSynT <= 0.7 and Jaro_TokT > 0.8 and Jaro_TokT <= 0.9

• JaroM_AbbTokS <= 0.9 and Lev_TokSynT > 0.7 and Jaro_TokT <= 0.8

• JaroM_AbbTokS <= 0.9 and Lev_TokSynT > 0.7 and Jaro_AbbTokT <= 0.2 and Lev_TokT > 0.1

For example, if JaroM_AbbTokS <= 0.9 and Lev_TokSynT > 0.7 and Jaro_TokT <= 0.8 then FALSE.

Example of KB of CPR based RDR

An example of a KB of CPR based RDR created for the datasets is given in Table 4.10. In the table, the columns RID, PID, RType, Condition, Conclusion and CaseID mean rule id, parent rule id, type of rule, condition of the rule, conclusion produced by the rule and the classified case id respectively. Rule types GB and R represent a ground breaking rule and a refine rule respectively. A ground breaking rule is used as an alternative rule, and the conclusion of this rule is either TRUE or FALSE. A refine rule is used as a censor rule, and the conclusion of this rule is NULL. The abbreviations Lev, S, T, AbbS, SynT, TokS, AbbTokS, AbbTokT, TokSynT, Mon, Smith, Needle, JaroW and JaroM mean Levenshtein function, source schema, target schema, abbreviation of source, synonym of target, tokenization of source, abbreviation and tokenization of source, abbreviation and tokenization of target, tokenization and synonym of target, and the MongeElkan, SmithWaterman, NeedlemanWunsch, JaroWinkler and JaroMeasure functions respectively. The values 1.0, 0.9, 0.7 are thresholds. For example, Lev_ST == 0.8 means that if the value of the Levenshtein function applied to the source and target is equal to the threshold value 0.8, then the conclusion is TRUE.
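To make a condition such as Lev_ST == 1.0 concrete, the sketch below computes a Levenshtein-based score between a source and a target name and compares it with a threshold. The source does not say how the scores are scaled, so two things here are assumptions: the edit distance is normalized to the range 0 to 1 by dividing by the longer string's length, and the score is rounded to one decimal place so that equality tests against thresholds such as 0.8 are meaningful. All function names are illustrative.

```python
def levenshtein_distance(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def levenshtein_similarity(a: str, b: str) -> float:
    """Assumed normalization: 1 - distance / length of the longer string."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein_distance(a, b) / longest


def lev_st_equals(source: str, target: str, threshold: float) -> bool:
    """Evaluate a condition of the form Lev_ST == threshold.

    Rounding to one decimal is an assumption, not stated in the source.
    """
    return round(levenshtein_similarity(source, target), 1) == threshold


print(lev_st_equals("name", "name", 1.0))
```

Under these assumptions, identical source and target names always satisfy Lev_ST == 1.0, which matches the kind of exact-match rule shown as rule 2 in Table 4.10.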

Table 4.10: An example of KB for creating rules using CPR based RDR

RID  PID  RType  Condition                                     Conclusion  Classified Cases
1    0    0      0                                             0           0
2    1    GB     Lev_ST == 1.0                                 TRUE        1033
3    1    GB     Source == AbbT                                TRUE        1
4    1    GB     JaroW_SynT == 1.0                             TRUE        240
5    1    GB     Mon_TokS == 1.0                               TRUE        204
6    1    GB     Lev_TokT == 1.0                               TRUE        489
7    1    GB     Needle_ST == 0.9                              FALSE       828
8    1    GB     Lev_AbbTokS >= 0.9 and JaroW_AbbTokT >= 0.9   TRUE        269
9    1    GB     Smith_TokSynT >= 0.9                          TRUE        125
10   9    R      Mon_AbbTokS == 0.2 and JaroW_TokS <= 0.3      NULL        699
11   6    R      Lev_ST == 0.8                                 NULL        85
12   1    GB     Lev_TokSynT == 0.8                            TRUE        90
13   1    GB     JaroM_ST <= 0.7                               FALSE       2
14   5    R      Smith_TokSynT == 0.8                          NULL        383
15   1    GB     Lev_AbbS == 0.8                               TRUE        1183

In Table 4.10, rule id 1 (RID = 1) is the entry rule of the KB. It is always true. For example, rules 2 to 9 are created to classify the cases of one dataset. The same rules 2 to 9 are then applied to classify other datasets. In order to correct the incorrect classifications they produce on the other datasets, the censor rules 10 and 11 are created by the knowledge acquisition process of the CPR based RDR approach to make the classification NULL. Following this, the alternative rules 12 and 13 are created to classify the cases as TRUE and FALSE respectively. In this way, the rules up to rule 15 are created. Adding these rules over time means the KB grows incrementally as new knowledge is added.
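The interplay between ground breaking (alternative) rules and refine (censor) rules can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the thesis implementation: the source does not spell out the inference order, so the sketch assumes that ground breaking rules are tried in RID order, that a rule is skipped when one of its refine children also holds for the case, and that the first surviving rule gives the conclusion. The `Rule` class and `classify` function are hypothetical names; the five rules are copied from Table 4.10 and the case values are invented.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Case = Dict[str, float]


@dataclass
class Rule:
    rid: int
    pid: int
    rtype: str                         # "GB" (ground breaking) or "R" (refine)
    condition: Callable[[Case], bool]
    conclusion: Optional[bool]         # None stands for NULL


def classify(kb: List[Rule], case: Case) -> Optional[bool]:
    """Assumed inference: first GB rule that holds and is not censored wins."""
    for rule in kb:
        if rule.rtype != "GB" or not rule.condition(case):
            continue
        censored = any(
            r.rtype == "R" and r.pid == rule.rid and r.condition(case)
            for r in kb
        )
        if not censored:
            return rule.conclusion
    return None  # no rule applies: the case stays unclassified


# A subset of Table 4.10: rules 2, 6, 11 (censor of rule 6), 12 and 13.
kb = [
    Rule(2, 1, "GB", lambda c: c["Lev_ST"] == 1.0, True),
    Rule(6, 1, "GB", lambda c: c["Lev_TokT"] == 1.0, True),
    Rule(11, 6, "R", lambda c: c["Lev_ST"] == 0.8, None),
    Rule(12, 1, "GB", lambda c: c["Lev_TokSynT"] == 0.8, True),
    Rule(13, 1, "GB", lambda c: c["JaroM_ST"] <= 0.7, False),
]

# Rule 6 holds but is censored by rule 11, so rule 13 decides (FALSE).
case = {"Lev_ST": 0.8, "Lev_TokT": 1.0, "Lev_TokSynT": 0.5, "JaroM_ST": 0.6}
print(classify(kb, case))
```

Appending a new `Rule` to `kb` leaves the existing rules untouched, which mirrors the incremental growth of the KB described above.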

Comparison between Pruned Trees and KB

I compare the pruned trees of J48 (Figures 4.16 and 4.17) with the KB of CPR based RDR (Table 4.10). I find that the rules of the pruned tree created by training on one dataset are different from the rules of the pruned tree created by training on the combination of two datasets. If schema data changes over time, a new training model has to be created by J48, and this model is completely different from the previous models. In the CPR based RDR approach, some rules are created to classify one dataset and the same rules are later reused to classify a new dataset. If the rules produce incorrect classifications, censor and alternative rules are added to correct them. So the KB is not completely different; rather, the same KB, with incrementally added rules, is reused to classify new schema datasets. This approach incrementally increases performance in terms of precision, recall and F-measure, and it decreases the number of rule additions.
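For reference, the three performance measures named above can be computed from predicted and actual TRUE/FALSE labels as follows. This is a minimal sketch using the standard definitions; it assumes the F-measure is the balanced F1 score, which the source does not state explicitly, and the function name is illustrative.

```python
def precision_recall_f_measure(predicted, actual):
    """Compute precision, recall and balanced F-measure for boolean labels."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)        # true positives
    fp = sum(1 for p, a in pairs if p and not a)    # false positives
    fn = sum(1 for p, a in pairs if not p and a)    # false negatives

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Made-up labels for illustration only.
predicted = [True, True, False, False]
actual = [True, False, True, False]
print(precision_recall_f_measure(predicted, actual))
```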
