10. One world: The service must cross barriers and be available to everyone.
3.3 MICROSOFT: HARDWARE AND SOFTWARE
There are many symbolic and ANN learning algorithms [Sestito 1994] that address the same problem of learning from a classified set of examples, and it is not possible to categorise the strengths and weaknesses of them all here.
However, Shavlik et al. (1991) have compiled an experimental comparison of three algorithms applied to five large, real-world data sets. Shavlik et al. compared the ID3 [Quinlan 1996] symbolic learning algorithm with the perceptron [Hecht-Nielsen 1989], [Picton 1994] and BP [Rumelhart 1986] learning algorithms. All three systems were tested on five large data sets, namely soybean, chess, audiology, heart disease, and NETtalk data. Four of these data sets were previously used to test different symbolic learning systems, and one was used to test BP. Shavlik et al. determined that BP performs slightly better overall than the other two algorithms in terms of classification accuracy on new examples, but it takes much longer to train. One encouraging result of their experiments was that BP performs slightly better on data sets containing continuous (i.e. numerical) data. This was a key point supporting the selection of a BP MLP network for the chemical species modeling in this thesis.
Shavlik et al. performed an empirical analysis to address three issues, namely (i) the amount of training data; (ii) imperfect training examples; and (iii) the encoding of the desired outputs. BP was only slightly superior to the other two systems when given a relatively small training data set. It was also able to cope with noisy or incomplete data better than ID3. BP was also the best at utilising a 'distributed' output encoding.
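The difference between a local and a 'distributed' output encoding can be sketched as follows. This is only an illustration under stated assumptions: the binary code used for the distributed case and the helper names (`local_code`, `distributed_code`, `decode`) are hypothetical and are not the encodings used by Shavlik et al. In a local encoding each class has its own output unit, whereas in a distributed encoding each class is a pattern over a smaller set of shared units.

```python
def local_code(index, n_classes):
    """One output unit per class: only the unit for `index` is on."""
    return [1.0 if i == index else 0.0 for i in range(n_classes)]


def distributed_code(index, n_bits):
    """Assumed distributed scheme: the class index written in binary,
    so every output unit is shared by several classes."""
    return [float((index >> bit) & 1) for bit in reversed(range(n_bits))]


def decode(output, codebook):
    """Return the class whose code vector is closest (squared error)
    to the network's real-valued output vector."""
    def distance(code):
        return sum((o - c) ** 2 for o, c in zip(output, code))
    return min(range(len(codebook)), key=lambda k: distance(codebook[k]))


# Eight classes need eight local units but only three distributed units.
codebook = [distributed_code(k, 3) for k in range(8)]
predicted = decode([0.9, 0.2, 0.8], codebook)  # noisy version of class 5
```

With a distributed code the network has fewer output units to train, but a single unreliable unit can change the decoded class, which is why decoding by the nearest code vector is used here.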
Towell et al. (1994) have employed Knowledge Based Artificial Neural Networks (KBANN) as a more effective hybrid learning system built on top of connectionist learning techniques. The essence of a KBANN is that it maps problem-specific domain theories, represented in propositional logic, into ANNs, and then refines the reformulated knowledge using BP. In effect, it is a hybrid of learning algorithms. Towell et al. successfully evaluated KBANN via empirical testing on two molecular biology problems. The rules extracted in their empirical tests were more accurate and more comprehensible to humans than rules generated by refining symbolic methods or by techniques that extract rules from trained ANNs.
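The mapping from a propositional rule to a network unit can be sketched as below. This is a minimal illustration, assuming the translation scheme commonly described for KBANN: each positive antecedent receives a weight of +omega, each negated antecedent a weight of -omega, and the bias is set so that the unit is active only when every antecedent is satisfied. The value omega = 4.0, the rule itself and the function names are assumptions chosen for illustration, not details taken from the work reviewed here.

```python
import math

OMEGA = 4.0  # assumed link weight; any sufficiently large value behaves similarly


def conjunctive_unit(positive, negated, omega=OMEGA):
    """Translate the rule 'head :- positive..., not negated...' into the
    weights and bias of a single sigmoid unit."""
    weights = {name: omega for name in positive}
    weights.update({name: -omega for name in negated})
    # The bias places the threshold half a link weight below the net
    # input obtained when all positive antecedents are true.
    bias = -(len(positive) - 0.5) * omega
    return weights, bias


def activate(weights, bias, inputs):
    """Sigmoid activation of the unit for a dict of 0/1 input values."""
    net = bias + sum(w * inputs[name] for name, w in weights.items())
    return 1.0 / (1.0 + math.exp(-net))


# Hypothetical rule: c :- a, b, not d.
weights, bias = conjunctive_unit(positive=["a", "b"], negated=["d"])
rule_fires = activate(weights, bias, {"a": 1, "b": 1, "d": 0}) > 0.5
rule_blocked = activate(weights, bias, {"a": 1, "b": 1, "d": 1}) > 0.5
```

Because the initial weights only encode the domain theory approximately, subsequent BP training is free to adjust them, which corresponds to the refinement step described above.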
Prechelt (1995) has provided some benchmarks for ANN learning algorithms, since new rule extraction learning algorithms are being developed to explicitly extract knowledge embedded in trained network models. Four essential requirements improve how new learning algorithms can be categorised: (1) volume: several problems must be used to broaden applicability; (2) validity: common errors that invalidate the results must be avoided; (3) reproducibility: the work must be properly documented so that it can be reproduced; and (4) comparability: where possible, results should be compared directly with those achieved by others using different algorithms. These are important points that have been addressed in this thesis where it was feasible. The four essential requirements are the premise for categorising recent rule extraction techniques developed later in this review.
Recent techniques to improve the generalisation performance of ANNs have been explored by Agyepong et al. (1997) and Setiono et al. (1997). Their techniques aimed to enhance the capabilities of the trained networks either to classify data in a different data set or to provide a premise for rule extraction.
Agyepong et al. (1997) have investigated the effects of including selected lateral connections in a feedforward ANN architecture in an attempt to control the hidden layer capacity of the network. Their method facilitated the controlled role assignment and specialisation of hidden layer units. Essentially the network behaved like a network growing algorithm without the explicit need to add hidden units, and acted like soft weight sharing [Hinton 1986] due to functionally identical units in the centre of the hidden layer. The selective specialisation of hidden units was illustrated using one classification and one function approximation problem. The improved generalisation of the network was illustrated through a simple function approximation example and with a real-world data set (the yearly sunspot data set).
Some learning algorithms employ a feature selection process as an integral part of their makeup. A machine learning method is likely to have a higher predictive accuracy if it can select only the relevant data attributes from a data set that contains irrelevant and redundant attributes. This is the essence of feature selection: it screens out redundant information. As the dimensionality of the input data space grows, the difficulty of building effective pattern recognition or nonlinear mapping systems increases significantly. Therefore, pre-processing of a large input data space to extract relevant features can be very useful.
There are many advantages in using only the relevant features of the data to be classified. These include: (i) overfitting of the data is reduced, so the classifier has better predictive capabilities; (ii) once relevant features are identified, the cost of future data collection can be reduced; and (iii) excluding irrelevant attributes yields a simpler classifier, so the time required to classify new patterns can be reduced.
Setiono et al. (1997) have developed a network pruning algorithm that performs feature selection using a three-layer feedforward ANN. By adding a penalty term to the error function of the network (i.e. their augmented error function), the redundant connections can be distinguished from the relevant connections by their small weights once network training is completed. A simple criterion (based on the network accuracy rate) was developed to remove a redundant attribute. After removal of the attribute, the network was retrained, and the selection process was repeated until no attribute met the criterion for removal.
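The remove, retrain and repeat loop described above can be sketched as a generic backward elimination procedure. This is a simplified illustration and not the algorithm of Setiono et al.: the penalty term and the network itself are hidden behind a hypothetical `evaluate` callback, which is assumed to retrain a network on the given attribute subset and return its accuracy, and the removal criterion (a maximum tolerated accuracy drop, `max_drop`) is likewise an assumption.

```python
def backward_eliminate(attributes, evaluate, max_drop=0.01):
    """Repeatedly remove the attribute whose removal hurts accuracy least,
    stopping when every possible removal costs more than `max_drop`.

    `evaluate(subset)` is a hypothetical callback: it is assumed to retrain
    the network on `subset` (a frozenset) and return its accuracy.
    """
    selected = set(attributes)
    # Reference accuracy of the network trained on all attributes.
    baseline = evaluate(frozenset(selected))
    while len(selected) > 1:
        # Accuracy obtained after dropping each remaining attribute in turn.
        candidates = [(evaluate(frozenset(selected - {a})), a)
                      for a in sorted(selected)]
        best_accuracy, best_attribute = max(candidates)
        if best_accuracy < baseline - max_drop:
            break  # no attribute meets the criterion for removal
        selected.remove(best_attribute)
    return selected


def toy_evaluate(subset):
    """Stand-in for network training: only attributes 0 and 2 carry
    information in this made-up example."""
    return 0.5 + 0.2 * (0 in subset) + 0.25 * (2 in subset)


kept = backward_eliminate(range(5), toy_evaluate)
```

In this toy run the three uninformative attributes are removed one at a time and the loop stops once only attributes 0 and 2 remain. In a real setting each call to `evaluate` would involve retraining the network, so the cost of the procedure grows quickly with the number of attributes.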
The method was tested on four real-world problems and two artificial problem data sets. The experimental results showed that the algorithm removed a large number of attributes from the original attribute sets, and improved the predictive accuracy of the ANNs. This suggested that Setiono et al.'s method works very well on a variety of classification problems. Their method was not suitable, however, for the spectral modeling of OES data to process parameters presented in this thesis, since the number of input attributes employed was already relatively small. The small vector of input attributes was significant data to incorporate into the training of the ANN model. Once the network had been trained, the extraction of information would ultimately relate OES spectral line size to all six input process parameters. Extracting definitive relationships from the trained species ANN model was the most discernible technique implemented in this thesis.