
The output generated by the baseline system was evaluated against the gold-standard (see Section 3.2.1). As shown in Figure 5, the gold-standard only contains the evaluation of 16,826 mappings. However, with 32,246 snort messages each mapped to a maximum of 6 CAPEC fields, the total number of possible mappings is 193,476 (6 × 32,246). The evaluation of the baseline system's output was therefore made only on the overlapping answers; mappings provided by the system that were not included in the gold-standard were not evaluated.

| Tag | Number of Mappings |
|---|---|
| Correct | 9,222 |
| Acceptable | 5,496 |
| Incorrect | 2,108 |
| Total | 16,826 |

Table 5: Statistics of the Gold-standard Built by Two Cyber Security Analysts

For each overlap, three measures were recorded: Correct Mapping, Acceptable Mapping and Incorrect Mapping, depending on how the mapping quality was judged in the gold-standard dataset. Following the advice of our security analysts, recall was deemed more important than precision. Indeed, in this domain, it is preferable to alert clients too often with false alarms than to miss potential cyber threats. To account for this, two types of precision were computed: strict precision ($P_S$) and lenient precision ($P_L$), which are defined as:

Strict Precision: $P_S = \frac{\text{Correct Mappings}}{\text{(Correct + Acceptable + Incorrect) Mappings}}$

Lenient Precision: $P_L = \frac{\text{(Correct + Acceptable) Mappings}}{\text{(Correct + Acceptable + Incorrect) Mappings}}$

as well as two types of recall: strict recall ($R_S$) and lenient recall ($R_L$):

Strict Recall: $R_S = \frac{\text{Correct Mappings}}{\text{Correct + Acceptable + Incorrect}}$

Lenient Recall: $R_L = \frac{\text{(Correct + Acceptable) Mappings}}{\text{Correct + Acceptable}}$
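The overlap-only evaluation and the two precision variants described above can be sketched as follows. This is a minimal illustration, not the evaluation code used in this work: the function names, the `(message id, CAPEC field)` key format and the toy data are all hypothetical. Recall is left out because it additionally requires the gold-standard totals.

```python
from collections import Counter

def evaluate_overlap(system_mappings, gold):
    """Count gold-standard tags for the system mappings that the gold standard covers."""
    counts = Counter()
    for pair in system_mappings:
        tag = gold.get(pair)  # None when the analysts never judged this pair
        if tag is not None:
            counts[tag] += 1  # mappings outside the gold standard are skipped
    return counts

def precisions(counts):
    """Return (strict, lenient) precision from the tag counts."""
    total = counts["Correct"] + counts["Acceptable"] + counts["Incorrect"]
    if total == 0:
        return 0.0, 0.0
    strict = counts["Correct"] / total
    lenient = (counts["Correct"] + counts["Acceptable"]) / total
    return strict, lenient

# Hypothetical toy data: (snort message id, CAPEC field) -> analyst tag.
gold = {
    ("msg1", "Summary"): "Correct",
    ("msg1", "Name"): "Acceptable",
    ("msg2", "Summary"): "Incorrect",
    ("msg3", "Name"): "Correct",
}
# The last system mapping is absent from the gold standard and is not evaluated.
system = [("msg1", "Summary"), ("msg1", "Name"), ("msg2", "Summary"), ("msg9", "Name")]

counts = evaluate_overlap(system, gold)
strict_p, lenient_p = precisions(counts)
print(dict(counts), strict_p, lenient_p)
```

Only three of the four toy system mappings overlap with the gold standard, so both precision values are computed over three evaluated mappings.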

Finally, we also calculated a series of F-Measures, which are a weighted combination of precision and recall. The F-Measure is defined as:

$F_\beta = \frac{(\beta^2 + 1) \times P \times R}{\beta^2 \times P + R}$

If β = 1, then precision and recall have the same importance; if β < 1, precision is favored; if β > 1, then recall is more important. In these experiments, we set the weight β to 0.5 ($F_{0.5}$), 1 ($F_1$) and 2 ($F_2$) respectively and also computed two versions: lenient F-Measures and strict F-Measures. These F-Measures are defined as:

Strict: $F^S_{0.5} = \frac{(0.5^2 + 1) \times P_S \times R_S}{0.5^2 \times P_S + R_S} = \frac{1.25 P_S R_S}{0.25 P_S + R_S}$

Lenient: $F^L_{0.5} = \frac{(0.5^2 + 1) \times P_L \times R_L}{0.5^2 \times P_L + R_L} = \frac{1.25 P_L R_L}{0.25 P_L + R_L}$

Strict: $F^S_1 = \frac{(1^2 + 1) \times P_S \times R_S}{1^2 \times P_S + R_S} = \frac{2 P_S R_S}{P_S + R_S}$

Lenient: $F^L_1 = \frac{(1^2 + 1) \times P_L \times R_L}{1^2 \times P_L + R_L} = \frac{2 P_L R_L}{P_L + R_L}$

Strict: $F^S_2 = \frac{(2^2 + 1) \times P_S \times R_S}{2^2 \times P_S + R_S} = \frac{5 P_S R_S}{4 P_S + R_S}$

Lenient: $F^L_2 = \frac{(2^2 + 1) \times P_L \times R_L}{2^2 \times P_L + R_L} = \frac{5 P_L R_L}{4 P_L + R_L}$
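As a quick check of the general formula, the sketch below evaluates it for the three β values used here. The function name `f_beta` is our own, and the inputs are the rounded lenient precision and recall reported for the baseline (97.96% and 35.22%), so the last digit may differ slightly from values computed on unrounded figures.

```python
def f_beta(precision, recall, beta):
    """F_beta = (beta^2 + 1) * P * R / (beta^2 * P + R); returns 0.0 when undefined."""
    denom = beta ** 2 * precision + recall
    if denom == 0:
        return 0.0
    return (beta ** 2 + 1) * precision * recall / denom

# Rounded lenient precision and recall of the baseline system.
p_l, r_l = 0.9796, 0.3522

for beta in (0.5, 1, 2):
    print(f"beta={beta}: F = {f_beta(p_l, r_l, beta):.4f}")
```

With these inputs the scores come out at roughly 0.72, 0.52 and 0.40. The β = 0.5 score sits nearer the precision value and the β = 2 score nearer the recall value.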

Table 6 shows the default parameters indicated in Section 3.1.2 that we used to evaluate the baseline system. $Sim_{MIN}$ represents the minimum similarity threshold to match messages. With $Sim_{MIN} = 0$, a snort message and an attack field are considered similar as long as they are not completely orthogonal. Expansion indicates the use of the snort rule name description to extend snort messages (see Section 3.1.1). As Table 7 shows, the number of acceptable mappings is quite high, as it accounts for 94% (5,178 / 5,496) of the total acceptable mappings, whereas only 1% of the correct mappings were found. $P_L$ was 97.96% because of the contribution of acceptable mappings, while $R_L$ was only 35.22%. Table 8 shows that $F^L_{0.5}$ and $F^L_1$ were 72.23% and 51.81% respectively.

| System | $Sim_{MIN}$ | DF | TV | Expansion | Nb of Features |
|---|---|---|---|---|---|
| Baseline | 0 | 40 | 0.98 | Yes | 140 |

Table 6: Description of Input Parameters in Baseline System
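To make the effect of a zero threshold concrete, the sketch below assumes that similarity is the cosine between n-gram count vectors. The mention of orthogonality suggests this, but this section does not state the measure explicitly, so the choice is an assumption. The function names, the strict `>` comparison and the example strings are likewise hypothetical.

```python
import math
from collections import Counter

def cosine_similarity(a, b):
    """Cosine of the angle between two sparse count vectors (Counter objects)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def is_similar(message_vec, field_vec, sim_min=0.0):
    # Assumption: strict ">" so that orthogonal vectors (similarity 0) are rejected.
    return cosine_similarity(message_vec, field_vec) > sim_min

# Hypothetical snort message and attack field texts, as unigram counts.
message = Counter("sql injection attempt".split())
field_a = Counter("attacker performs sql injection".split())
field_b = Counter("buffer overflow in parser".split())

print(is_similar(message, field_a))  # shares "sql" and "injection"
print(is_similar(message, field_b))  # shares no term, i.e. orthogonal
```

Under this reading, a single shared feature is enough for a match at a threshold of 0, which is consistent with a very high mapping rate and a large number of loosely related mappings.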

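| System | Correct Mappings | Acceptable Mappings | Incorrect Mappings | Lenient $P_L$ | Lenient $R_L$ | Strict $P_S$ | Strict $R_S$ |
|---|---|---|---|---|---|---|---|
| Baseline | 108 | 5,178 | 6 | 97.96% | 35.22% | 0.11% | 0.07% |

Table 7: Precision and Recall of the Baseline System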

As we can see, although the mapping rate is 98.94%, the mapping quality is low because only 1% of the correct mappings were found. In the next three chapters, we describe several approaches to address this problem.

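| System | Lenient $F^L_{0.5}$ | Lenient $F^L_1$ | Lenient $F^L_2$ | Strict $F^S_{0.5}$ | Strict $F^S_1$ | Strict $F^S_2$ |
|---|---|---|---|---|---|---|
| Baseline | 72.23% | 51.81% | 40.40% | 0.09% | 0.08% | 0.07% |

Table 8: F-Measure of the Baseline System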

In this chapter, we have described the workflow of the baseline system and the attempt of [Scarabeo et al., 2015] to improve it through snort rule expansion. In addition, we explained how the mapping rate was initially evaluated (see Section 3.1.2) and why that measure did not capture the quality of the mapping. We then described our work to evaluate the quality of the baseline's output by creating a gold-standard and using the standard metrics of precision, recall and F-measure. To enhance the performance of the baseline system, the next chapters investigate three approaches:

1. Feature Selection and Snort Messages Supplement.

2. Pre-clustering Snort Messages.

3. Semantic Mapping by Latent Semantic Analysis.

In the next chapter, we will provide a detailed description of the snort messages supplement methodology as well as an analysis of the evaluation of the outputs.

Chapter 4 Feature Selection and Snort Messages Supplement

Table 7 in Chapter 3 showed that the recall of the baseline system was only 35%. In order to improve the system performance, we experimented with three approaches:

1. Feature Selection and Snort Messages Supplement.

2. Pre-clustering Snort Messages.

3. Semantic Mapping by Latent Semantic Analysis.

In this chapter, we describe the first approach: n-gram feature selection, used to analyze the feature distribution, and the supplementing of snort messages. Section 4.1 describes our experiments with a variety of feature sets and their effect on the evaluation of the system. After analyzing the feature distribution, we noticed that many snort messages suffered from a sparse representation. Indeed, although the snort rule descriptions extend the length of the original snort messages (see Section 3.1.1), most of these messages are still quite short (below 15 words). To address this issue, we investigated the use of entities in the Common Vulnerabilities and Exposures (CVE) (see Section 2.1.1) to further supplement snort messages (see Section 4.2). The effect of this strategy is analyzed in Section 4.2.3.

4.1 Feature Selection

In the baseline system (see Chapter 3), snort messages and CAPEC fields are represented by a mixture of unigrams, bigrams and trigrams. However, the contribution of each type of n-gram was not clear. To measure the usefulness of each type of n-gram, three experiments were performed: the use of unigrams only, bigrams only and trigrams only. Sections 4.1.1, 4.1.2 and 4.1.3 describe these experiments, while Section 4.1.4 provides an overall evaluation.
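The difference between the mixed representation and the three single-order feature sets can be illustrated with a short sketch. It assumes lowercasing and whitespace tokenization, which may differ from the preprocessing actually used. The function names and the example message are hypothetical.

```python
def ngrams(text, n):
    """Return the word n-grams of `text` as a list of tuples."""
    tokens = text.lower().split()  # assumption: simple whitespace tokenization
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def features(text, orders=(1, 2, 3)):
    """Collect the n-grams of every requested order into one feature list."""
    out = []
    for n in orders:
        out.extend(ngrams(text, n))
    return out

message = "SQL injection attempt detected"  # hypothetical snort message

mixed = features(message)                        # baseline-style mixture
unigrams_only = features(message, orders=(1,))
bigrams_only = features(message, orders=(2,))
trigrams_only = features(message, orders=(3,))

print(len(mixed), len(unigrams_only), len(bigrams_only), len(trigrams_only))
```

For a message of four words, the single-order sets contain 4, 3 and 2 features respectively, which shows how quickly higher-order n-grams become sparse on short messages.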
