Retraining
As confirmed in the previous subsection, a larger significance level leads to larger trees, which correspond to a better fit to the target concept. In this subsection, we investigate the effect of the initial operator set of triangular norms and conorms on the performance and size of the tree. To this end, we compare the accuracy and the number of retraining steps during the induction of an eFPT in four different variations: eFPT is used with two initial sets of retraining operators (see Subsection 4.4.3), Gödel and Łukasiewicz, each combined with the significance levels α = 0.1 and α = 0.25.
Using the same synthetic data sets as in the previous subsections, we present the performance in Figures 4.10 and 4.11 and the number of times an operator was retrained in Figures 4.12 and 4.13. The following observations can be made:
Figure 4.9: Tree size of eFPT and Hoeffding trees when learning from synthetic data streams with simulated concept drifts. (Panels: hyperplane, RBF, random trees, each with concept drift; legend: Hoeffding tree, Hoeffding adaptive tree, LUK Opt α=.1, LUK Opt α=.25, LUK α=.1, LUK α=.25.)
• As expected, a larger significance level corresponds to larger trees (see Figures 4.10 and 4.11) and thus to better performance, provided that the target concept to be learned is sufficiently complex and stable.
• A small significance level leads to fewer changes and a smaller number of successful operator retrainings.
• From Figures 4.10 and 4.11, one can see that the Gödel operators tend to be outperformed by the Łukasiewicz operators.
• For the significance level α = 0.25, Gödel operators are more prone to being outperformed by other operators and thus replaced. This phenomenon can be explained by the fact that Gödel operators only consider the MIN/MAX values of their operands, thereby ignoring any possible interaction between them.
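The contrast between the two operator families can be made concrete with the standard textbook definitions of the Gödel (minimum/maximum) and Łukasiewicz t-norms and t-conorms. The following minimal Python sketch is only an illustration; the function names are hypothetical and are not taken from the eFPT implementation.

```python
from typing import Callable

# Hypothetical names; the formulas are the standard definitions.
TNorm = Callable[[float, float], float]


def godel_t(a: float, b: float) -> float:
    """Gödel t-norm: depends only on the smaller operand."""
    return min(a, b)


def godel_s(a: float, b: float) -> float:
    """Gödel t-conorm (dual of the minimum): depends only on the larger operand."""
    return max(a, b)


def luk_t(a: float, b: float) -> float:
    """Łukasiewicz t-norm: both operands contribute through their sum."""
    return max(0.0, a + b - 1.0)


def luk_s(a: float, b: float) -> float:
    """Łukasiewicz t-conorm (bounded sum)."""
    return min(1.0, a + b)


if __name__ == "__main__":
    # Keep a = 0.5 fixed and change only the larger operand b.
    for b in (0.6, 0.9):
        print(f"b={b}: Godel T={godel_t(0.5, b):.2f}  Lukasiewicz T={luk_t(0.5, b):.2f}")
```

With a = 0.5, the Gödel t-norm returns 0.5 for both b = 0.6 and b = 0.9, because only the smaller operand matters, whereas the Łukasiewicz t-norm returns 0.1 and 0.4, reflecting the values of both operands.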
Figure 4.10: Performance comparison between different eFPT parametrizations (significance level and retraining operators) when learning from synthetic data streams. (Panels: hyperplane, random trees, RBF; legend: LUK α=.1, LUK α=.25, Gödel α=.1, Gödel α=.25.)
4.6 Summary and Conclusion
In this chapter, we proposed an evolving version of the fuzzy pattern tree classifier, eFPT, which meets the requirements of adaptive learning on data streams. The key idea of eFPT is to maintain the current model together with a set of neighbor trees that can replace it if its performance is no longer optimal. Thus, the current model is modified implicitly, namely by replacing it with an alternative tree. Replacement decisions are based on the performance of all models, which is monitored continuously on a sliding window of fixed length.
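The replacement mechanism summarized above can be sketched as follows. This is a minimal Python illustration under explicit assumptions: models are plain callables, accuracy is tracked prequentially (predict first, then record whether the prediction was correct) on a window of fixed length, and a neighbor replaces the current model only if a one-sided two-proportion z-test at level α finds it more accurate. The class name `WindowedReplacement`, the default window length, and the choice of the z-test are hypothetical stand-ins, not the exact decision rule of eFPT; generating the neighbor trees themselves is outside the scope of the sketch. It merely illustrates why a larger α leads to more frequent replacements.

```python
import math
from collections import deque
from statistics import NormalDist
from typing import Callable, Deque, List, Sequence

# Assumption: a "model" is any callable mapping a feature vector to a class label.
Model = Callable[[Sequence[float]], int]


class WindowedReplacement:
    """Hypothetical helper: keeps a current model plus neighbor models and replaces
    the current one when a neighbor is significantly more accurate on a fixed-length
    sliding window (one-sided two-proportion z-test at level alpha, an assumed test)."""

    def __init__(self, current: Model, neighbors: List[Model],
                 window: int = 200, alpha: float = 0.1):
        self.models: List[Model] = [current] + list(neighbors)
        self.current = 0  # index of the model currently used for prediction
        # One window of 0/1 hits per model; old entries drop out automatically.
        self.hits: List[Deque[int]] = [deque(maxlen=window) for _ in self.models]
        # Critical value: a larger alpha gives a smaller threshold, hence more replacements.
        self.z_crit = NormalDist().inv_cdf(1.0 - alpha)

    def _acc(self, i: int) -> float:
        h = self.hits[i]
        return sum(h) / len(h) if h else 0.0

    def _significantly_better(self, i: int, j: int) -> bool:
        """One-sided test: is model i more accurate than model j on the window?"""
        n = len(self.hits[i])  # all models see the same examples, so windows are equally long
        if n == 0:
            return False
        p_i, p_j = self._acc(i), self._acc(j)
        pooled = (p_i + p_j) / 2  # pooled proportion for equal sample sizes
        se = math.sqrt(2 * pooled * (1 - pooled) / n)
        if se == 0:
            return p_i > p_j
        return (p_i - p_j) / se > self.z_crit

    def learn_one(self, x: Sequence[float], y: int) -> int:
        """Predict with the current model, then update the window statistics."""
        prediction = self.models[self.current](x)
        for i, model in enumerate(self.models):
            self.hits[i].append(int(model(x) == y))
        best = max(range(len(self.models)), key=self._acc)
        if best != self.current and self._significantly_better(best, self.current):
            self.current = best
        return prediction
```

With a model that always predicts 0 as the current model and a simple threshold rule as its only neighbor, a single correctly classified positive example already triggers a replacement at α = 0.25, whereas at α = 0.001 the replacement only happens after the window has accumulated more evidence.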
Fuzzy pattern trees are an attractive model class: besides offering an interpretable representation, they are universal approximators [144].
In an experimental study, we compared eFPT with the two versions of the Hoeffding tree and with IBLStreams on real and synthetic data. The results obtained are quite promising, despite the failure to learn on the RBF data. They suggest that eFPT is competitive in terms of accuracy, while being less affected by concept drift and producing smaller, more compact models. These criteria are interrelated: the smaller a model, the more easily and quickly it can be adapted in the case of a concept drift; moreover, a compact model is desirable from the point of view of understandability. On the other hand, producing large models can be advantageous in cases where the target concept to be learned is complex and the data-generating process sufficiently stable; in our experiments, Hoeffding trees and IBLStreams performed comparatively well especially in these cases.
Figure 4.11: Performance comparison between different eFPT parametrizations (significance level and retraining operators) when learning from synthetic data streams with simulated concept drifts. (Panels: hyperplane, RBF, random trees, each with concept drift; legend: LUK α=.1, LUK α=.25, Gödel α=.1, Gödel α=.25.)
Figure 4.12: Number of retrained operators for the different eFPT parametrizations (significance level and retraining operators) when learning from synthetic data streams. (Panels: hyperplane, RBF, random trees; legend: LUK α=.1, LUK α=.25, Gödel α=.1, Gödel α=.25.)
Figure 4.13: Number of retrained operators for the different eFPT parametrizations (significance level and retraining operators) when learning from synthetic data streams with simulated concept drifts. (Panels: hyperplane, RBF, random trees, each with concept drift; legend: LUK α=.1, LUK α=.25, Gödel α=.1, Gödel α=.25.)
Chapter 5
Survival Analysis on Event Streams
This chapter introduces a method for survival analysis on data streams. Survival analysis is an established statistical methodology for studying temporal events or, more specifically, for answering questions about the temporal distribution of event occurrences and its dependence on the features of the data sources.
To the best of our knowledge, survival analysis has not yet been considered in the stream setting. This is arguably surprising for several reasons. Most notably, the temporal nature of event data fits the data stream model well, and event data is naturally produced by many data sources. Moreover, survival analysis is widely applicable and routinely employed in many application fields. Survival analysis, the term commonly used in medical studies, is also referred to as event history analysis in sociology, reliability analysis in engineering, and duration analysis in economics.