PROBLEMÁTICAS Y ESTRATEGIAS DE LEÓN
II.2. PROYECTOS Y PROGRAMAS LEÓN
9. TICs e información al turista Planear zonas con acceso a WIF
The most recent development in outlier detection technology is hy- brid systems. The hybrid systems discussed in this section incorporate algorithms from at least two of the preceding sections (statistical, neu- ral or machine learning methods). Hybridisation is used variously to overcome deficiencies with one particular classification algorithm, to
exploit the advantages of multiple approaches while overcoming their weaknesses or using a meta-classifier to reconcile the outputs from multiple classifiers to handle all situations. We describe approaches where an additional algorithm is incorporated to overcome weaknesses with the primary algorithm next.
The MLP described in section 3 interpolates well but cannot extrap- olate to unseen instances. Therefore, Bishop (Bishop, 1994) augments his MLP with a Parzen window novelty recogniser. The MLP clas- sifies new items similar to those already seen with high confidence. The Parzen window provides an estimate of the probability density to induce a confidence estimate. Any completely novel instances will lie in a low density and the MLP prediction confidence will be low. Parzen windows are kernel-based algorithms which use various kernel functions to represent the data. Bishop uses one Gaussian kernel (as defined by equation 8) per input vector with the kernel centred on the attribute values of the vector. The Parzen window method can provide a hard boundary for outlier detection by classifying new exemplar as normal if they belong to a kernel and an outlier otherwise. It can also provide a soft boundary by calculating the probability that the new exemplar belongs to a Gaussian kernel. Nairac et al. (Nairac et al., 1999) similarly incorporate a k-means module with the MLP mod- ule in an aircraft engine fault diagnosis system. The k-means module partitions the data and models graph shape normality. It can detect anomalies in the overall vibration shape of new signatures and the MLP detects transitions within the vibration graphs through one-step ahead prediction described in section 3.
Smyth (Smyth, 1994) stabilises the output from an MLP to monitor space probe data for fault detection using a Hidden Markov Model (HMM) as the MLP is susceptible to rapid prediction changes many of which are extraneous. The approach uses time-series data, a snapshot of the space probe transmitted periodically with equal intervals between transmissions. The feed-forward MLP trained using back propagation aims to model the probe and predicts the next state of the probe. How- ever, the MLP has a tendency to switch between states at an artificially high rate so Smyth uses a HMM to correlate the state estimations and smooth the outputs producing a predictor with high accuracy. Hollmen & Tresp (Hollmen and Tresp, 1999) incorporate hierarchical HMMs with EM which helps determine the parameters for time series analysis in the task of cell-phone fraud detection. The authors use the HMM to predict one-step ahead values for a fraud variable in the time series given the current and previous states of a user’s account and the EM
algorithm is used to provide optimal parameter settings for the HMM to enable accurate one-step ahead prediction.
Hickinbotham & Austin (Hickinbotham and Austin, 2000) introduced a strain gauge fault detection system to detect gauges which are not reading correctly. The system assimilates Gaussian Basis Function Net- works (GBFN) and Principal Components. The technique produces a Frequency of Occurrence Matrix (FOOM) from data taken from aircraft strain gauges operating under normal conditions. The FOOM represents the normal conditions expected during a flight with readings weighted according to their likelihood of occurrence. The approach extracts two unlikeliness features and two Principal Components from the FOOM and feeds them into a series of GBFN, each trained with different parameter settings. The system selects the GBFN with high- est classification accuracy for the data and uses this to classify new FOOMs by thresholding for novelty. A high novelty reading indicates a strain gauge error. Hollier & Austin (Hollier and Austin, 2002) have incorporated a further component into the strain gauge system to detect another form of strain gauge error not covered by the previ- ous approach. They incorporate auxiliary data comprising a vector of accelerations encountered during a flight. Hollier & Austin find the maximally correlated components between the FOOM and the mean and first principal component of the auxiliary data. By establishing a relationship between the auxiliary data and normal FOOMs, they can detect corrupted FOOMs when the relationship breaks down.
Authors are beginning to examine the possibilities of ensemble classi- fiers where a number of different classification algorithms are used and the results from each classifier amalgamated using a suitable metric. From the preceding discussion, it is apparent that all classifiers have differing strengths and weaknesses so a combined approach can exploit the strengths of each classifier while mitigating their weaknesses. The JAM system (Java agents for meta-learning) being developed at Columbia University (Stolfo et al., 1997) incorporates five machine- learning techniques into a single modular architecture. JAM uses meta- learning to assimilate the results from the various modules and produce a single system output. JAM modules are based on the ID3 decision tree (Quinlan, 1986), its successors CART (Breiman et al., 1984) and C4.5 (Quinlan, 1993), Ripper (a rule based learner (Cohen, 1995)) and a naive Bayes classifier. In experiments, Prodromidis and Stolfo (Prodromidis and Stolfo, 1998) demonstrated that these classifiers were complimentary, each performing well on some data sets and compara-
tively poorly on other data sets. By identifying which approaches excel on which data sets and under which conditions the meta-learner can assess the degree of accuracy and confidence for the outputs from the various classifiers and select the appropriate technique for the data. Ensemble classifiers also permit sub-division of the training data with individual classifier modules trained on subsets of the data. By using multiple data models, JAM mitigates for any loss of accuracy in any one model due to data subdivision. It allows parallel training and clas- sification with each module classifying a new instance in parallel and the meta-learner metric assimilating the results.
Brodley and Friedl (Brodley and Friedl, 1996) also use an ensemble approach for identifying misclassified instances in a training set of pixels from satellite images of land where each pixel needs to be classified as for example, grassland, woodland etc. They use three classifiers: a decision tree, a nearest neighbour and a linear machine. The system combines the outputs from the three classifiers using consensus voting. Consensus voting is a more conservative approach than the widely used majority voting. In consensus voting, the system only eliminates an instance from the data set if all three classifiers identify it as an outlier whereas a majority voting system eliminates an instance if the majority of classifiers identify it as an outlier. Consensus voting favours false negatives over false positives with respect to identifying outliers. The three classifiers each use a different distribution model so if all three agree that a point is an outlier then the user can be confident about the decision.
Combining multiple classifiers must be done judiciously. An ensemble should obey Occam’s Razor, it should be as simple as possible with minimal redundancy, as superfluous modules waste resources, increase complexity and slow processing. Paradoxically, identifying the best combination of classifiers is a combinatorial problem.