
This section presents an experimental comparison of the Explainer to the state-of-the-art algorithms of [25], [26] and [28] on 36 classification datasets from the UCI machine learning repository [36], with the number of features ranging from 4 to 5000 and the number of samples from 22 up to nearly 100k.

Since there is little prior art on explaining anomalies, a methodology for measuring the quality of an explanation has not yet been established. Therefore, this work measures how the area under the receiver operating characteristic curve (AUC) of an anomaly detector changes when the detector uses only the features named in the explanation, compared to the baseline in which all features are used. The rationale is that the explanation should contain only the features in which the anomaly differs most from the remaining data. Therefore, the AUC in the reduced space should be comparable to or even higher than the AUC computed on the full space. Also recall that all recent prior art to which the Explainer is compared stops the explanation after identifying the set of features in which the anomaly is visible, which further justifies the choice of AUC for the comparison.
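The evaluation protocol described above can be illustrated with a minimal sketch on synthetic data. Everything in it is an assumption made for illustration: the helper names `auc` and `knn_score`, the toy data, and the use of the mean distance to the k nearest neighbours as a stand-in detector. It is not the experimental code behind the reported results.

```python
import math
import random

def auc(scores, labels):
    """Area under the ROC curve computed from pairwise comparisons (rank statistic)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def knn_score(data, features, k=4):
    """Anomaly score: mean Euclidean distance to the k nearest neighbours,
    computed only on the given feature indices (assumed stand-in detector)."""
    scores = []
    for i, x in enumerate(data):
        dists = sorted(
            math.sqrt(sum((x[j] - y[j]) ** 2 for j in features))
            for m, y in enumerate(data) if m != i
        )
        scores.append(sum(dists[:k]) / k)
    return scores

rng = random.Random(0)
# 64 normal samples: feature 0 is informative, features 1..9 are wide noise
normal = [[rng.gauss(0, 1)] + [rng.gauss(0, 3) for _ in range(9)] for _ in range(64)]
# 5 anomalies that deviate from the normal data only in feature 0
offsets = [10.0, -10.0, 20.0, -20.0, 30.0]
anomalous = [[v] + [rng.gauss(0, 3) for _ in range(9)] for v in offsets]
data = normal + anomalous
labels = [0] * len(normal) + [1] * len(anomalous)

auc_full = auc(knn_score(data, range(10)), labels)   # baseline: all features
auc_reduced = auc(knn_score(data, [0]), labels)      # only the "explaining" feature
print(f"AUC on all features: {auc_full:.3f}, AUC on explanation only: {auc_reduced:.3f}")
```

If an explanation names the right features, the AUC on the reduced space should not fall below the full-space AUC, which is the property the comparison relies on.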

The algorithms are also compared on the basis of the number of returned features and scalability, which is important for contemporary large data.

4.1 Experimental settings

The Explainer is compared to the algorithms of [25], [28] and [26]. The implementations of the first two algorithms were kindly provided by their authors, while the last algorithm was implemented by ourselves. The compared algorithms were used to explain anomalies identified by the Local Outlier Factor (LOF) of [23], a k-nearest neighbor (k-nn) classifier of [38] and the Isolation Forest (iForest) of [31]. To avoid the influence of incorrect detections, a perfect detector was also simulated by explaining the ground truth (GT) anomalies.

As mentioned above, the performance of anomaly detectors was measured by the area under the receiver operating characteristic (AUC) of an anomaly detection algorithm using only the features selected by the explanation algorithm. The reported AUC values were measured by the anomaly detector under investigation, e.g. LOF was used to measure AUC in all LOF related experiments. For the evaluation of experiments with ground truth anomalies the Isolation forest detector was used because it is the fastest detector with very good overall performance.

The anomaly detection algorithms were set as follows. The Isolation forest was set to create 100 trees using 256 samples each; the thresholds between normal and anomalous samples were estimated from data, separately for each dataset. The size of the local neighbourhood for the LOF detector was set to 10 and the detection threshold was set to 2, in accordance with the recommendation in [23]. The k-nn detector used only k = 4, while the threshold on the distance to the average of the k nearest neighbours was estimated from data, separately for each dataset.

The Explainer used training sets of size |T| = 100. The number of trained trees was set to n_T = 5 for the minimal explanation, whereas it is determined automatically during the run of the algorithm for the maximal explanation. The other algorithms were set up as recommended by the authors in the respective papers. In [25], the neighbouring set size k varied from 10 to 40; in [26], the regularization parameter was set to α = 0.1 and the neighbourhood sizes varied from 5 to 25. Finally, [28] has two parameters and, according to the authors, the algorithm should be robust to their setting, which our experiments with α varied from 0.1 to 0.7 and k from 20 to 50 have confirmed. The reported results were obtained with the setting α = 0.5 and k = 50.

The experimental comparison used 36 different datasets from the UCI machine learning repository [36], covering many application domains. Since they are designed for supervised classification, they were converted to anomaly detection problems as recommended by [11]. LOF proved to be the worst of all tested detection algorithms, as it did not find any outliers in 6 out of 36 datasets. Therefore, the comparison was done using only 30 datasets.

4.2 Experimental results

Quality of explanation measured by the AUC

At first, the quality of explanation was compared by measuring the aforementioned improvement in AUC. Due to the large number of problems on which the algorithms are compared, the results are summarized by critical difference diagrams as proposed by [12]. Figure 3 shows the average rank of each algorithm over all 36 problems; groups of algorithms that are not significantly different according to the Nemenyi post-hoc analysis are connected. Table ?? contains all results.

[Figure 3: four critical difference diagrams on a rank axis from 1 to 5, comparing Explainer_min, Explainer_max, Dang 2013, Dang 2014 and Micenkova. Panels: (a) ground truth, (b) Isolation forest, (c) k-nn, (d) LOF.]

Fig. 3. Critical comparison of explainers when ground truth anomalies (top-left) and anomalies discovered by Isolation forest (top-right), k-nn (bottom-left) and LOF (bottom-right) were explained.
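To make the quantities behind a critical difference diagram concrete, the following sketch computes average ranks from a table of AUCs and the Nemenyi critical difference. The function names and the toy AUC values are hypothetical; the constant 2.728 is the commonly tabulated critical value for five algorithms at α = 0.05, and the use of 30 datasets in the example is an assumption.

```python
import math

def average_ranks(results):
    """results[d][a] is the AUC of algorithm a on dataset d.
    Rank 1 is the best (highest AUC); tied algorithms share the mean rank."""
    n_alg = len(results[0])
    totals = [0.0] * n_alg
    for row in results:
        order = sorted(range(n_alg), key=lambda a: -row[a])
        i = 0
        while i < n_alg:
            j = i
            while j + 1 < n_alg and row[order[j + 1]] == row[order[i]]:
                j += 1
            shared = (i + j) / 2.0 + 1.0   # mean of the tied rank positions
            for a in order[i:j + 1]:
                totals[a] += shared
            i = j + 1
    return [t / len(results) for t in totals]

def nemenyi_cd(n_alg, n_datasets, q_alpha):
    """Two algorithms differ significantly if their average ranks differ by more than this."""
    return q_alpha * math.sqrt(n_alg * (n_alg + 1) / (6.0 * n_datasets))

# Toy AUC table: 2 datasets x 3 algorithms (hypothetical values)
toy_ranks = average_ranks([[0.9, 0.8, 0.7],
                           [0.6, 0.6, 0.5]])

Q_ALPHA_005_K5 = 2.728  # tabulated value for 5 algorithms, alpha = 0.05
cd = nemenyi_cd(n_alg=5, n_datasets=30, q_alpha=Q_ALPHA_005_K5)
print(toy_ranks, round(cd, 3))
```

In a diagram such as Figure 3, algorithms whose average ranks lie within one critical difference of each other are the ones drawn as connected.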

The Explainer_min consistently performs the best regardless of the anomaly detector used to identify the anomalies, though the margin of the improvement over the prior art depends on that detector. For anomalies identified by Isolation Forest and k-nn, the statistical test did not find a significant difference between the Explainer_min, Dang 2013 and Dang 2014. Still, in both cases the Explainer_min achieved the lowest rank. For anomalies identified by the LOF, the Explainer was significantly better than prior methods.

When the effect of an anomaly detector is removed by explaining the ground truth, both the Explainer_min and the Explainer_max are significantly better than all of the prior art according to the test of [39] and the post-hoc tests with the correction for simultaneous hypotheses testing by [40]. Precise results are shown in Table 3.

The algorithm of [28] did not work well with the k-nn detector, as it did not return any features as an explanation in 17 out of 36 cases, even though the algorithm was executed with a wide range of parameters.

Further investigation of the cases where the Explainer_min performed worse revealed that on the multiple features, waveform-1 and waveform-2 datasets in combination with the k-nn anomaly detector, the Explainer_min removed features responsible for clustering normal samples together, which misleads the k-nn detector into creating new anomalies that were not present in the full-space version of the dataset. Yet the authors believe that the explanation was correct and that the worse performance is due to the detector rather than the Explainer. The Explainer_min excels on high-dimensional datasets like gisette, madelon and libras, and on noisy datasets. The Explainer_max seems to be more susceptible to mistakes in detection than the Explainer_min. While the explanations were reasonable, anomaly detection on more features was typically worse (see Table 6), since additional features lower the signal-to-noise ratio, to which anomaly detectors are sensitive, and which explains the worse AUC.

To conclude this comparison, the performance of the algorithm of [28] strongly depends on the particular dataset. The algorithm of [26] improved on that of [25] and is the closest prior art to the Explainer in terms of identifying the features in which the anomaly is detectable.

The compactness of explanation by number of features

TABLE 3

Results of the statistical test of [39] and its post-hoc tests with the correction for simultaneous hypotheses testing by [40]. The rejection thresholds are computed for the family-wise significance level α = 0.05.

algorithms                          p          rejection threshold
Explainer_min vs. Micenkova 2013    1.52e-15   0.005
Explainer_min vs. Dang 2013         2.45e-12   0.008
Explainer_max vs. Micenkova 2013    6.15e-12   0.008
Explainer_max vs. Dang 2013         1.81e-12   0.008
Explainer_min vs. Dang 2014         1.27e-6    0.008
Dang 2014 vs. Micenkova             0.0017     0.013
Explainer_min vs. Dang 2014         0.0022     0.013
Dang 2013 vs. Dang 2014             0.0306     0.017
Explainer_min vs. Explainer_max     0.0736     0.025
Dang 2013 vs. Micenkova             0.3325     0.050

Figure 4a shows the median number of selected features used in the explanation of the ground truth anomalies. Since the Explainer_min was designed to select as few features as possible, it consistently selects the smallest number of features among all tested methods. Taking into account that it also outperforms all prior art in terms of the AUC, it can be concluded that its explanations are more compact and therefore specify the root causes of anomalies more precisely, which was its goal. Surprisingly, even though the Explainer_max should return all relevant features (and therefore returns more features than the Explainer_min), their number is still considerably lower than for all prior art methods, especially for large datasets.

The comparison of running times

The last comparison was done with respect to the computational complexity, measured as the total time to explain all ground-truth anomalies. As in the previous experiment, the last two datasets are missing for both Dang's algorithms and one for Micenková, because of their excessive memory and computational requirements. Figure 4b shows running times averaged over 20 runs. On smaller datasets, the algorithm of [26] clearly outperformed all other algorithms, but for the larger datasets it is evident that the Explainer methods are much more efficient. The break point is approximately 1000 samples for low-dimensional datasets or 200 samples for high-dimensional datasets.

Comparison with Isolation forest:

The Explainer and Isolation forest are both based on random forests trained only on a small subsample of data, so they may seem similar. This section shows that not only their main purpose but also their outcomes differ substantially.

Isolation forest is an ensemble of random trees. More specifically, both the feature and threshold are selected randomly for each splitting node. The main idea is that anomalies differ strongly from the normal samples, therefore, they should be separated with fewer splits. There is no focus on identifying key features or any other way of explanation or description of findings.
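The idea that anomalies are isolated with fewer random splits can be illustrated with a simplified sketch. It is not the Isolation forest implementation of [31]: there is no subsampling and no stored tree, only a count of the random splits needed until a given point is alone. The function names and the toy data are assumptions.

```python
import random

def isolation_depth(point, data, rng, max_depth=64):
    """Count the random axis-aligned splits needed to isolate `point` within `data`.
    `point` must be an element of `data`."""
    current = list(data)
    depth = 0
    while len(current) > 1 and depth < max_depth:
        # only features that still vary inside the current partition can split it
        splittable = [j for j in range(len(point))
                      if min(x[j] for x in current) < max(x[j] for x in current)]
        if not splittable:
            break
        j = rng.choice(splittable)                 # random feature
        lo = min(x[j] for x in current)
        hi = max(x[j] for x in current)
        theta = rng.uniform(lo, hi)                # random threshold
        side = point[j] > theta
        current = [x for x in current if (x[j] > theta) == side]
        depth += 1
    return depth

rng = random.Random(1)
normals = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(64)]
anomaly = (8.0, 8.0)
data = normals + [anomaly]

def mean_depth(point, n_trees=50):
    """Average isolation depth of `point` over several random split sequences."""
    return sum(isolation_depth(point, data, rng) for _ in range(n_trees)) / n_trees

anomaly_depth = mean_depth(anomaly)
normal_depth = sum(mean_depth(p) for p in normals) / len(normals)
print(f"mean depth: anomaly {anomaly_depth:.2f}, normal samples {normal_depth:.2f}")
```

Because the splits are random, the sequence of features along a path changes from run to run, which is why such paths say little about which features make the sample anomalous.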

The main goal of the Explainer is to identify the features responsible for the anomalous nature of the sample under analysis. Therefore, each split uses the feature and threshold that produce the best possible separation of the anomaly from the normal samples. Features used in split nodes are then assembled into classification rules.
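The contrast with random splitting can be sketched as follows. This is a hypothetical illustration, not the authors' implementation: `best_split` and `explain` are invented names, thresholds are taken as midpoints between consecutive observed values, and "best separation" is read as leaving the fewest normal samples on the anomaly's side, in the spirit of criterion (9) in the Appendix.

```python
def best_split(anomaly, normals):
    """Return (feature, threshold, anomaly_goes_right, normals_kept) for the split that
    leaves the fewest normal samples on the anomaly's side, or None if no split exists."""
    best = None
    for j in range(len(anomaly)):
        values = sorted({x[j] for x in normals} | {anomaly[j]})
        # candidate thresholds: midpoints between consecutive distinct values
        for lo, hi in zip(values, values[1:]):
            theta = (lo + hi) / 2.0
            goes_right = anomaly[j] > theta
            kept = [x for x in normals if (x[j] > theta) == goes_right]
            if best is None or len(kept) < len(best[3]):
                best = (j, theta, goes_right, kept)
    return best

def explain(anomaly, normals, max_depth=10):
    """Greedily split until the anomaly is alone; each split contributes one rule."""
    rules = []
    remaining = list(normals)
    while remaining and len(rules) < max_depth:
        split = best_split(anomaly, remaining)
        if split is None:
            break  # the anomaly is indistinguishable from the remaining samples
        j, theta, goes_right, remaining = split
        rules.append((j, ">" if goes_right else "<=", theta))
    return rules

normals = [(1.0, 5.0), (1.2, 5.1), (0.9, 4.9), (1.1, 5.2)]
anomaly = (1.0, 9.0)
rules = explain(anomaly, normals)
print(rules)  # a single rule on feature 1 separates this anomaly
```

Unlike a random split sequence, the features named in the resulting rules are exactly those in which the sample deviates from the normal data.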

[Figure 4: (a) number of features per dataset (x-axis: datasets; y-axis: number of features, logarithmic scale) for All features, Explainer_min, Explainer_max, Dang 2013, Dang 2014 and Micenkova; (b) running times (x-axis: number of samples; y-axis: time [s], logarithmic scale) for Explainer_min, Explainer_max, Dang 2013, Dang 2014 and Micenkova.]

Fig. 4. The comparison of the number of selected features per dataset; the curve marked as all features shows the number of features of a particular dataset (left). The comparison of the running times (right). Please note that the number of features as well as the running times are plotted on a logarithmic scale.

One of the basic properties indicating the comprehensibility of a rule is its length. A comparison of the average rule length for each dataset is depicted in Figure 5 and summarised in Table 4.

TABLE 4

The average length of rules extracted from Explainer_min, Explainer_max and Isolation forest over all datasets.

Explainer_min    Explainer_max    Isolation forest
2.47             4.06             5.79

The rules produced by Isolation forests are not only longer (even compared to the Explainer_max), but they also have other problems caused by the random nature of this method. First, the extracted rules are inconsistent and the features used vary every time a forest is trained. Second, rules extracted from a random tree can contain a condition on one feature many times, even in succession, or contain conditions without any impact on the resulting set. Therefore, rules extracted from Isolation forests are not suitable for explanation.

[Figure 5: average rule length (y-axis) per dataset (x-axis) for Explainer_min, Explainer_max and Isolation forest.]

Fig. 5. Comparison of the average length of the rules extracted from the Explainer and the Isolation forest anomaly detector.

[Figure 6: AUC (y-axis) per dataset (x-axis). Panels (a) ground truth and (b) Isolation forest: one curve per training set size |T| = 10, 20, 50, 100, 200, 500, 1000. Panels (c) ground truth and (d) Isolation forest: one curve per number of trees n_T = 1, 2, 5, 10, 20, 50.]

Fig. 6. AUCs of anomaly detection on datasets with more than 1000 samples (14 of all used datasets) using only features suggested by the Explainer with varying training set sizes when ground truth anomalies were explained (top-left) and when anomalies identified by Isolation forest were explained (top-right). The reported values are the average of 10 runs training 10 trees per anomaly. The Explainer was used to explain the ground truth anomalies (bottom-left) and the anomalies identified by the Isolation forest detector (bottom-right). The AUCs are the average of 10 runs using only features suggested by the Explainer with the varying number of trees trained per anomaly and the training set size fixed to 100 samples. The results for each setting were sorted according to their AUC, showing general performance rather than performance on specific datasets. The interpretation is simple: the line of a setting leading to better average AUCs will be higher than the line of a setting leading to lower average AUCs.

4.3 Sensitivity to parameters

The Explainer is controlled by two parameters: the training set size |T| and the number of trees trained per anomaly n_T. In this section we show the robustness of the Explainer to the setting of those parameters, effectively

TABLE 5

Results of Friedman's statistical test with the correction for simultaneous hypotheses testing by [40], calculated while testing the training set size of the Explainer_min for explanation of the ground truth anomalies. The rejection thresholds are computed for the family-wise significance level α = 0.05.

settings                  p        rejection threshold
|T|=200 vs. |T|=1000      0.325    0.0023
|T|=10 vs. |T|=1000       0.407    0.0025
|T|=20 vs. |T|=1000       0.437    0.0026
|T|=100 vs. |T|=1000      0.468    0.0028
|T|=50 vs. |T|=200        0.501    0.0029
|T|=200 vs. |T|=500       0.534    0.0031
|T|=10 vs. |T|=50         0.604    0.0033
|T|=20 vs. |T|=50         0.641    0.0036
|T|=10 vs. |T|=500        0.641    0.0038
|T|=50 vs. |T|=100        0.678    0.0042
|T|=20 vs. |T|=500        0.678    0.0045
|T|=100 vs. |T|=500       0.717    0.0050
|T|=500 vs. |T|=1000      0.717    0.0056
|T|=50 vs. |T|=1000       0.756    0.0063
|T|=100 vs. |T|=200       0.795    0.0071
|T|=20 vs. |T|=200        0.835    0.0083
|T|=10 vs. |T|=200        0.876    0.0100
|T|=10 vs. |T|=100        0.917    0.0125
|T|=50 vs. |T|=500        0.958    0.0167
|T|=10 vs. |T|=20         0.958    0.0250
|T|=20 vs. |T|=100        0.958    0.0500
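The rejection thresholds listed in Table 5 are numerically close to α/(m − i + 1) for the i-th smallest of m = 21 p-values, which is the schedule of a Holm-style step-down correction. Whether the procedure of [40] is exactly this one is an assumption; the sketch below only shows how such thresholds are applied, using the p-values from the table. The function names are hypothetical.

```python
def holm_thresholds(m, alpha=0.05):
    """Step-down thresholds: alpha / m for the smallest p-value, ..., alpha for the largest."""
    return [alpha / (m - i) for i in range(m)]

def holm_reject(p_values, alpha=0.05):
    """Rejection flags for the p-values in ascending order.
    Once one hypothesis is retained, all hypotheses with larger p-values are retained too."""
    ordered = sorted(p_values)
    flags = []
    still_rejecting = True
    for p, t in zip(ordered, holm_thresholds(len(ordered), alpha)):
        still_rejecting = still_rejecting and p <= t
        flags.append(still_rejecting)
    return flags

# p-values as listed in Table 5
table5_p = [0.325, 0.407, 0.437, 0.468, 0.501, 0.534, 0.604,
            0.641, 0.641, 0.678, 0.678, 0.717, 0.717, 0.756,
            0.795, 0.835, 0.876, 0.917, 0.958, 0.958, 0.958]

thresholds = holm_thresholds(len(table5_p))
rejected = holm_reject(table5_p)
print(any(rejected))  # no pairwise difference between training set sizes is significant
```

Since even the smallest p-value is far above its threshold, none of the pairwise hypotheses is rejected, which matches the reading that the training set size has no significant effect.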

At first, the Explainer_min was executed with the number of trees per anomaly n_T = 5 while varying the training set size |T| ∈ {10, 20, 50, 100, 200, 500, 1000}; for that reason the experiment was performed only on datasets with more than 1000 samples. AUCs of the different configurations, measured by the Isolation forest detector, are shown for ground truth anomalies in Figure 6a and for anomalies discovered by the Isolation forest in Figure 6b. According to these results, the Explainer does not seem to be sensitive to the size of the training set, as all AUCs are very similar. Results of Friedman's test and its post-hoc tests are all in Table 5. When explaining ground truth, training sets as small as 50-100 samples were sufficient. When explaining anomalies identified by a real anomaly detector, larger training sets of 100-200 samples seem to be more appropriate. The finding that even a small training set is sufficient for good explanations is especially important for data streams, where all samples needed for explanation have to be stored in memory.

Fixing the size of the training set to |T| = 100, the Explainer_min was executed with different numbers of trees per anomaly n_T ∈ {1, 2, 5, 10, 20, 50}. AUCs of all configurations, measured by the Isolation forest detector, are shown in Figure 6c for ground truth anomalies and in Figure 6d for anomalies identified by Isolation Forests. As with the size of the training dataset, the Explainer does not seem to be sensitive to this setting. Results of Friedman's test and its post-hoc tests are all in Appendix C. According to the results, using 5-10 trees per anomaly is recommended because it is stable while still being computationally efficient.

5 CONCLUSION

This paper has presented a novel approach to explaining the deviation of a sample, identified by an arbitrary anomaly detection algorithm, from the rest of the data. The proposed approach relies on specifically trained random forests to identify rules explaining the anomaly. Those rules are then assembled into the explanation in the form of a set of association rules or in the disjunctive normal form and presented to the user. The set of association rules is easy to read and understand by humans, whereas the disjunctive normal form was chosen because it is compact and practical for further machine processing.

The proposed approach has been extensively compared to the relevant prior art on 36 UCI machine learning repository datasets while explaining anomalies identified by multiple anomaly detection algorithms. The results show a superiority of the proposed solution, as it provides tighter and more accurate explanations. In addition, our testing has shown that it is also superior with respect to the time complexity, especially on the high dimensional datasets.

As to the theoretical fundamentals of our approach, we proved that the simplified splitting criterion used during the specific training of the random forest is, for this kind of random forest, equivalent to the common best-practice splitting criteria based on information gain or Gini impurity.

APPENDIX

Let $\mathcal{H}$ denote the space of all possible splitting rules, defined as

\[ \mathcal{H} = \{ h_{j,\theta} \mid j \in \{1, \dots, d\},\ \theta \in \mathbb{R} \}, \quad \text{where} \quad h_{j,\theta}(x) = \begin{cases} +1 & \text{if } x_j > \theta \\ -1 & \text{otherwise} \end{cases} \]

with $x_j$ being the $j$th feature of $x$ and $\theta$ representing a threshold. The most popular criteria to select the splitting function $h \in \mathcal{H}$ of a new internal node are information gain, defined as

\[ IG(S, h) = H(S) - \sum_{b \in \{L,R\}} \frac{|S_b(h)|}{|S|} H(S_b(h)) \tag{7} \]

and Gini impurity, defined as

\[ GI(S) = 1 - \sum_{b \in \{L,R\}} f^2(S_b(h)), \tag{8} \]

where $S$ is the subset of the training set $T$ reaching the leaf being split, $S_L(h) = \{x \in S \mid h(x) = +1\}$ and $S_R(h) = \{x \in S \mid h(x) = -1\}$, $H(S)$ is the Shannon entropy of $S$ and $f$ is the fraction of samples of the specified class in the set. In the following lemma, we show that when the training set $T$ contains exactly one anomaly, both criteria (7) and (8) are equivalent and equal to

\[ \arg\min_{h \in \mathcal{H}} |S_a(h)|, \tag{9} \]

where $|S_a(h)|$ is the size of the set containing the anomaly after splitting. This splitting criterion is easy to interpret and can be calculated more efficiently than information gain or Gini impurity.
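The claimed equivalence can be checked numerically for a small case. The sketch below is an illustration, not the proof: it assumes a node with n samples of which exactly one is the anomaly, uses the standard size-weighted Gini impurity of the children (a common textbook form, which may differ in notation from (8)), and verifies that both criteria are monotone in the size m of the side containing the anomaly. All function names are hypothetical.

```python
import math

def entropy(counts):
    """Shannon entropy (in bits) of a class-count vector."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)

def info_gain(n, m):
    """Information gain of a split of n samples (one anomaly) in which the side
    containing the anomaly has m samples; the other side is pure."""
    parent = entropy([1, n - 1])
    anomaly_side = entropy([1, m - 1])
    return parent - (m / n) * anomaly_side

def weighted_gini(n, m):
    """Size-weighted Gini impurity of the children (lower is better);
    the pure side contributes zero."""
    gini_anomaly_side = 1.0 - (1.0 / m) ** 2 - ((m - 1.0) / m) ** 2
    return (m / n) * gini_anomaly_side

n = 50
sizes = range(1, n)                      # m = n would mean no split at all
gains = [info_gain(n, m) for m in sizes]
ginis = [weighted_gini(n, m) for m in sizes]

# A smaller anomaly side always gives a higher gain and a lower weighted impurity,
# so maximising the gain or minimising the impurity both minimise m.
gain_is_decreasing = all(a > b for a, b in zip(gains, gains[1:]))
gini_is_increasing = all(a < b for a, b in zip(ginis, ginis[1:]))
print(gain_is_decreasing, gini_is_increasing)
```

Under these assumptions, ranking candidate splits by either classical criterion gives the same order as ranking them by the size of the anomaly's side, which only requires counting samples.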

Lemma A.1. Denote by $T_q(p_1, \dots, p_k)$ the Havrda-Charvát entropy with index $q \ge 1$ (up to a multiplicative constant coinciding with the more recently introduced Tsallis entropy [41]) for a finite-dimensional probability distribution $(p_1, \dots, p_k)$, i.e.,
