7. Análisis de los Programas de Seguridad del Paciente en Consulta Externa en Institución 1,
7.3 Estrategias para mejorar la seguridad del paciente en la IPS Javesalud
2.3.1.
2.3.3 Evolutionary Approach
Seipone and Bullinaria (2005) developed an approach to incremental learning using evolving neural networks. An evolutionary algorithm is used to evolve some MLP parameters, such as the learning rates, initial weight distributions and error tolerance. The evolutionary process aims at evolving the parameters to produce networks with better incremental abilities. In each generation, the neural networks with the parameters codified by the evolutionary algorithms are trained using first the training examples of a particular chunk of data, then using another one and so on. While a neural network is being trained with a particular data chunk, the other chunks are not used in the training. However, each generation uses all the available chunks. The error produced by testing the neural networks with a validation set is used as a fitness measure.
Successful points
The approach is able to cope with new classes of data. Besides, it has shown to outperform Learn++ and a traditional MLP on a benchmark data set from the UCI Machine Learning Repository (Optical Digits) (Newman et al.; 2010).
Problems
Although each neural network is trained with only one data chunk at time, the evolutionary algorithm considers all the chunks in each generation of the evolution. So, even though the networks satisfy all the necessary criteria to incremental learning, the evolutionary algorithm does not.
2.4
Online Ensemble Approaches for Stable Concept Learning
There are several ensemble learning approaches which are proposed to deal with online envi- ronments, but do not have mechanisms specifically designed to handle concept drift. According to Fern and Givan (2003), online ensemble algorithms can be classified as sequential/parallel- generation approaches and single/multiple-update approaches. An algorithm takes a sequential- generation approach “if it generates the ensemble members one at a time, ceasing to update each member once the next one is started”. Otherwise, it takes a parallel-generation approach. An online ensemble algorithm takes a single-update approach “if it updates only one ensem- ble member for each training instance encountered”. Otherwise, it takes a multiple-update approach.
2.4 Online Ensemble Approaches for Stable Concept Learning
present some problems. When the approach is sequential-generation, it is necessary to design a method to determine when each ensemble member should stop being trained, which is a difficult task. When the approach is single-update, it is likely to have slower convergence to the target concept, as only one of the ensemble members is adjusted at each time step. Offline methods such as bagging and boosting have the property that a single training example can contribute to the training of many ensemble members. Such a property is believed to be essential for obtaining a fast convergence to the desired target concept (Fern and Givan; 2003). So, in fact, approaches that are both sequential-generation and single update share some of the problems presented by many incremental learning approaches.
Some approaches originally developed to offline learning might also be claimed for use in online learning, e.g., Boosting by Filtering (Schapire; 1990) and Adaptive Boosting (AdaBoost) (Freund and Schapire; 1996a, 1997). In order to do so, a certain number of examples could be selected to train a first classifier. After that, a second classifier would be created by filtering new examples based on the previous classifier. The same process could be applied to create more classifiers. However, these approaches would be sequential-generation single-update approaches and would discard large amounts of data in the process of drawing the desired distribution of data to each ensemble member. So, they would need a long time to get new training examples that can be used for learning.
Modified Adaptive Boosting (MadaBoost) (Domingo and Watanabe; 2000) is a a modifi- cation of AdaBoost that can be used in the filtering framework without having extremely high execution time. This is achieved by bounding the weight that is attributed to the training exam- ples. In this way, the time necessary for the filter to choose an example to be used for learning is reduced. However, MadaBoost is still a sequential-generation single-update approach. Other examples of sequential-generation single-update approaches are Pasting Small Votes (Breiman; 1999) and Streaming Random Forests (Abdulsalam et al.; 2007).
The problems presented by sequential-generation and single-update approaches are even more serious in the presence of concept drifts if no additional strategy is adopted to handle them. An ensemble which maintains immutable members (sequential-generation) would have difficulties to adapt to new concepts. Similarly, an ensemble in which only one of the members can be trained with each new training example (single-update) would have slow recovery from drifts. Nevertheless, even parallel-generation multiple-update approaches present slow adaptation to new concepts if no specific strategy to deal with concept drift is adopted, as shown in the concept drift literature (e.g., Kolter and Maloof (2003); Gama et al. (2004)) and in section 5.3. In the thesis, we further divide parallel-generation multiple-update approaches into ap- proaches which send the same sequence of training examples to each ensemble member and approaches which send a different sequence. Some of the approaches which send the same se- quence consist of different ways of assigning weights for each ensemble member in an online way, e.g., Winnow (Littlestone; 1988) and Weight Majority (Littlestone and Warmuth; 1994). Others
2.4 Online Ensemble Approaches for Stable Concept Learning
are concerned with finding alternatives to create ensembles with different members when the same sequence of training examples is sent, e.g., Lim and Harrison (2003)’s and Kotsiantis and Pintelas (2004)’s work. Even though these approaches are parallel-generation multiple-update approaches, they have the problem that model diversity has to be built a priori rather than emerging from data itself.
The approaches which send a different sequence of training examples to each ensemble member are basically concerned with how to determine this sequence in an online way, so that model diversity does not need to be built a priori. Many of these approaches are online versions of well established offline approaches. As examples, we can cite online versions of boosting (Oza and Russell; 2001a,b, 2005; Fern and Givan; 2000, 2003) and bagging (Oza and Russell; 2001a,b, 2005; Fern and Givan; 2000, 2003; Lee and Clyde; 2004).
Among these approaches, online bagging was chosen to be used in the online diversity study presented in chapter 5. As further explained in that chapter, one of the reasons for opting to use this approach, instead of online versions of boosting, is that a simple modification including an additional parameter allows us to tune diversity. Besides, this parameter is the only source of diversity apart from the data set itself. So, it is possible to consistently increase or reduce diversity by varying it.
Oza and Russell (2001a,b, 2005)’s, Fern and Givan (2000, 2003)’s and Lee and Clyde (2004)’s online bagging algorithms are very similar to each other and are explained in section 2.4.1. Oza and Russell (2001a,b, 2005)’s and Fern and Givan (2000, 2003)’s online boosting algorithms are also similar to each other and are explained in section 2.4.2.
2.4.1 Online Bagging
Oza and Russell (2001a,b, 2005)’s online bagging is based on the fact that, when the number of training examples tends to infinity in offline bagging (Breiman; 1996a), each ensemble member hm contains W copies of each original training example d, where the distribution of W tends
to a P oisson(1) distribution. Offline bagging’s sampling distribution is M ultinomial(N, 1/N ), for N draws. Online bagging’s sampling distribution is P oisson(N ), for N draws. Oza and Russell (2001b) proves that, as the number of examples N tends to infinity, the probability- generating function of offline bagging’s sampling distribution converges to the same function as the probability-generating function of online bagging’s sampling distribution. That means that they have the same expectation when N tends to infinity.
For that reason, in online bagging, whenever a training example is available, for each en- semble member, a weight W is drawn from a P oisson(1) distribution and the example is learnt in association to this weight. As P oisson(1) provides discrete values, instead of presenting each training example once with a weight W to the base learning algorithm, it can be presented W times, so that the base learner does not need to deal with weights (algorithm 2.1). The
2.4 Online Ensemble Approaches for Stable Concept Learning
classification is done by unweighted majority vote, as in offline bagging. Algorithm 2.1 Oza and Russell (2001a,b, 2005)’s Online Bagging
Inputs: ensemble h, ensemble size M , training example d and online learning algorithm for the ensemble members OnlineBaseLearningAlg.
1: for m ← 1 to M do 2: W ← P oisson(1) 3: for w ← 1 to W do 4: hm ← OnlineBaseLearningAlg(hm, d) 5: end for 6: end for
Output: updated ensemble h.
Fern and Givan (2000, 2003) proposed a very similar online bagging algorithm. The only difference is that it allows the choice between either using a Poisson(1) distribution or associating each example to W = 1 with probability Pu and W = 0 with probability (1 − Pu), where Pu is
chosen by the user.
Lee and Clyde (2004) proposed a lossless2 online bayesian bagging algorithm. The algorithm works in a similar way to Oza and Russell (2001a,b, 2005)’s and Fern and Givan (2000, 2003) ’s, but it draws weights from a Gamma(1, 1) distribution. As the weights provided by Gamma(1, 1) are continuous-valued on (0, 1), the base learning algorithms need to be able to cope with weighted training examples, instead of receiving a certain number of repetitions of training examples.
2.4.2 Online Boosting
Oza and Russell (2001a,b, 2005)’s online boosting works in a similar way to online bagging, but it uses a P oisson(λ) distribution, as shown in algorithm 2.2. The parameter λ associated to a training example d is increased when presented to the next ensemble member if the current ensemble member misclassifies the example (line 19). Otherwise, it is decreased (line 15).
In this way, λ has the same role as the weight of the example d in AdaBoost (Freund and Schapire; 1996a, 1997): the examples misclassified by one stage (one classifier) are given half the total weight in the next stage and the correctly classified examples are given the remaining half of the weight. This can be seen by the following (Oza and Russell; 2001b). According to algorithm 2.2, λsc
m and λswm are the total weights of correctly and incorrectly classified examples
for each base model m. We want to find the factors fmc and fmw that scale λscm and λswm to half
the total weight:
2A lossless online learning algorithm returns a hypothesis identical to that returned by the corresponding
2.4 Online Ensemble Approaches for Stable Concept Learning
Algorithm 2.2 Oza and Russell (2001a,b, 2005)’s Online Boosting Algorithm
Inputs: ensemble h, ensemble size M , training example d = (x, y) and online learning algorithm for the ensemble members OnlineBaseLearningAlg.
1: if d is the first training example then
2: for m ← 1 to M do
3: static λscm ← λswm ← 0 /* initialize the total weights of correctly and incorrectly classified
examples for each base model */
4: end for 5: end if 6: λ ← 1 7: for m ← 1 to M do 8: W ← P oisson(λ) 9: for w ← 1 to W do 10: hm ← OnlineBaseLearningAlg(hm, d) 11: end for 12: if hm(x) == y then 13: λsc m← λscm+ λ 14: ξm ← λ sw m λsc m+λ sw m 15: λ ← λ2(1−ξ1 m) 16: else 17: λsw m ← λswm + λ 18: ξm ← λ sw m λsc m+λ sw m 19: λ ← λ2ξ1m 20: end if 21: end for
Output: updated ensemble h.
λsc mfmc = λsc m+ λswm 2 ⇒ f c m= λsc m+ λswm 2λsc m = 1 2( λscm λsc m+λ sw m ) = 1 2(1 − ξm) λswmfmw= λ sc m+ λswm 2 ⇒ f w m= λsc m+ λswm 2λsw m = 1 2( λswm λsc m+λswm ) = 1 2ξm
As we can see, these are exactly the factors used in lines 19 and 15 of algorithm 2.2.
To use the system, the classification made by the whole ensemble is by weighted majority vote, with weights based on the accuracy of the ensemble members.
Fern and Givan (2000, 2003)’s online boosting operates in a similar way. However, the weights associated to the training examples are based on Arc-x4 (Breiman; 1996b), instead of AdaBoost. They are calculated as 1 + m4
t, where mtis the number of previous hypotheses that