The most commonly used approaches to WSD in NLP are supervised approaches, which learn to classify the senses of words in context from sense-annotated corpora. These approaches generally disambiguate word senses more accurately than unsupervised WSD approaches (Agirre and Edmonds, 2006; Navigli, 2009). Among the highest-performing supervised WSD methods are Naïve Bayes (e.g. Mooney, 1996), neural networks (e.g. Towell and Voorhees, 1998) and k-nearest neighbor (e.g. Ng, 1997). These approaches are discussed in more detail below.
2.2.4.2.1 Naïve Bayes
The Naïve Bayes classifier (Domingos and Pazzani, 1997; Mitchell, 1997) has been widely utilised for disambiguating the senses of ambiguous words in text. It is based on calculating the conditional probability of each sense of a word given a set of features in context (POS tags, neighbouring words, positions of words, etc.). Naïve Bayes assumes that all features are conditionally independent given the word sense. The sense with the highest probability given the observed features is chosen as the appropriate sense of the ambiguous word in context.
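The decision rule described above can be illustrated with a minimal sketch. It is not the implementation of any cited study: the toy training contexts, the function names and the use of add-one (Laplace) smoothing are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(examples):
    """examples: list of (features, sense) pairs; features is a list of strings."""
    sense_counts = Counter()               # how often each sense occurs
    feature_counts = defaultdict(Counter)  # per-sense feature frequencies
    vocabulary = set()
    for features, sense in examples:
        sense_counts[sense] += 1
        for feature in features:
            feature_counts[sense][feature] += 1
            vocabulary.add(feature)
    return sense_counts, feature_counts, vocabulary

def classify(features, model):
    """Return the sense maximising log P(sense) + sum of log P(feature | sense)."""
    sense_counts, feature_counts, vocabulary = model
    total = sum(sense_counts.values())
    best_sense, best_score = None, float("-inf")
    for sense, count in sense_counts.items():
        score = math.log(count / total)
        # Add-one smoothing (an assumption) keeps unseen features from zeroing the product.
        denominator = sum(feature_counts[sense].values()) + len(vocabulary)
        for feature in features:
            score += math.log((feature_counts[sense][feature] + 1) / denominator)
        if score > best_score:
            best_sense, best_score = sense, score
    return best_sense

# Hypothetical, hand-made contexts for the ambiguous word "line".
TRAIN = [
    (["telephone", "call", "busy"], "phone"),
    (["call", "hold", "operator"], "phone"),
    (["wait", "queue", "ticket"], "queue"),
    (["long", "wait", "people"], "queue"),
]

model = train_naive_bayes(TRAIN)
prediction = classify(["busy", "operator"], model)
print(prediction)
```

Summing log probabilities instead of multiplying raw probabilities avoids numerical underflow when many features are used; the ranking of senses is unchanged.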
Many researchers have utilised the Naïve Bayes classifier for disambiguating the meanings of ambiguous words in text. For instance, Mooney (1996) employs Naïve Bayes and several other learning methods for the problem of disambiguating the senses of words in context. The performance of the applied models is evaluated on the line corpus (Leacock et al., 1993). The results showed that the Naïve Bayes classifier achieved better accuracy than alternative methods such as decision trees (Quinlan, 1993) and rule-based techniques (Michalski, 1983) for all training set sizes. Pedersen (2000) also utilises Naïve Bayes classifiers for WSD. His approach combines a number of Naïve Bayes classifiers into an ensemble, such that each classifier is based on the co-occurrence of lexical features in a context window of a different size. These lexical features indicate whether a given word appears within some number of surrounding words. Pedersen's approach was evaluated using 5-fold cross-validation on the line data and the interest data (Bruce and Wiebe, 1994). It achieved high disambiguation accuracy: around 88% on the line data and 89% on the interest data.
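The idea of combining Naïve Bayes classifiers trained on different context window sizes can be sketched as follows. This is an illustrative simplification, not Pedersen's actual system: the window sizes, the majority-vote combination, the smoothing and the toy sentences are all assumptions.

```python
import math
from collections import Counter, defaultdict

def window_features(tokens, target_index, left, right):
    """Words within `left` positions before and `right` positions after the target."""
    start = max(0, target_index - left)
    before = tokens[start:target_index]
    after = tokens[target_index + 1:target_index + 1 + right]
    return set(before + after)

def train_nb(examples):
    """examples: list of (set_of_words, sense) pairs."""
    sense_counts, word_counts, vocab = Counter(), defaultdict(Counter), set()
    for words, sense in examples:
        sense_counts[sense] += 1
        word_counts[sense].update(words)
        vocab.update(words)
    return sense_counts, word_counts, vocab

def predict_nb(model, words):
    sense_counts, word_counts, vocab = model
    total = sum(sense_counts.values())

    def score(sense):
        # Add-one smoothing; the extra 1 reserves mass for unseen words (an assumption).
        denominator = sum(word_counts[sense].values()) + len(vocab) + 1
        return math.log(sense_counts[sense] / total) + sum(
            math.log((word_counts[sense][w] + 1) / denominator) for w in words
        )

    return max(sorted(sense_counts), key=score)

def train_ensemble(data, windows):
    """data: list of (tokens, target_index, sense); windows: list of (left, right)."""
    return [
        (left, right,
         train_nb([(window_features(t, i, left, right), s) for t, i, s in data]))
        for left, right in windows
    ]

def predict_ensemble(ensemble, tokens, target_index):
    """Each member classifier votes using its own window; the majority sense wins."""
    votes = Counter(
        predict_nb(model, window_features(tokens, target_index, left, right))
        for left, right, model in ensemble
    )
    return votes.most_common(1)[0][0]

# Hypothetical sense-annotated sentences for "line" (target index given explicitly).
DATA = [
    ("the telephone line was busy all day".split(), 2, "phone"),
    ("she kept the caller on the line".split(), 6, "phone"),
    ("people stood in line for tickets".split(), 3, "queue"),
    ("a long line formed outside the shop".split(), 2, "queue"),
]
ensemble = train_ensemble(DATA, windows=[(1, 1), (2, 2), (3, 3)])
ensemble_prediction = predict_ensemble(ensemble, "the telephone line went dead".split(), 2)
print(ensemble_prediction)
```

Using an odd number of member classifiers, as here, reduces the chance of tied votes when there are two senses.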
2.2.4.2.2 Neural Networks
A neural network (NN) (McCulloch and Pitts, 1943; Mitchell, 1997) is constructed from a set of connected input/output units (artificial neurons) and provides a computational model for information processing. In WSD, classification is based on supplying the features as input to the learning model. These features are used to split the training contexts into non-overlapping sets, one for each desired response (i.e. sense). As the training data are processed, the weights of the connections between units are adjusted so that the output unit corresponding to the desired response has a greater activation than any other output unit.
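The weight-adjustment process described above can be sketched with a deliberately small network. The example assumes a single layer of weights with a softmax output and plain gradient descent; this architecture, the hyperparameters and the toy data are illustrative assumptions and do not reproduce any of the cited systems.

```python
import math

def softmax(scores):
    """Turn raw output scores into activations that sum to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def encode(features, index):
    """Binary input vector: 1.0 for each known feature that is present."""
    x = [0.0] * len(index)
    for feature in features:
        if feature in index:
            x[index[feature]] = 1.0
    return x

def forward(x, weights, biases):
    """Activation of each output unit (one unit per sense)."""
    return softmax([sum(w * xi for w, xi in zip(row, x)) + b
                    for row, b in zip(weights, biases)])

def train_network(examples, senses, epochs=200, learning_rate=0.5):
    vocab = sorted({f for features, _ in examples for f in features})
    index = {f: j for j, f in enumerate(vocab)}
    # weights[k][j] connects input unit j (a feature) to output unit k (a sense).
    weights = [[0.0] * len(vocab) for _ in senses]
    biases = [0.0] * len(senses)
    for _ in range(epochs):
        for features, sense in examples:
            x = encode(features, index)
            activations = forward(x, weights, biases)
            for k, s in enumerate(senses):
                # Difference between the actual and the desired activation of unit k.
                error = activations[k] - (1.0 if s == sense else 0.0)
                for j, xj in enumerate(x):
                    weights[k][j] -= learning_rate * error * xj
                biases[k] -= learning_rate * error
    return weights, biases, index

def predict(features, senses, model):
    """Choose the sense whose output unit has the greatest activation."""
    weights, biases, index = model
    activations = forward(encode(features, index), weights, biases)
    return senses[activations.index(max(activations))]

# Hypothetical, hand-made contexts for the ambiguous word "line".
SENSES = ["phone", "queue"]
TRAIN = [
    (["telephone", "call", "busy"], "phone"),
    (["call", "hold", "operator"], "phone"),
    (["wait", "queue", "ticket"], "queue"),
    (["long", "wait", "people"], "queue"),
]
network = train_network(TRAIN, SENSES)
nn_prediction = predict(["busy", "operator"], SENSES, network)
print(nn_prediction)
```

A practical system would typically add hidden layers and a held-out set to decide when to stop training; both are omitted here to keep the weight updates easy to follow.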
Several studies have applied neural networks to the WSD problem. For instance, Towell and Voorhees (1998) develop a neural network classifier in order to exploit the semantic information available in lexicons such as WordNet and thereby improve the performance of IE systems. The classifier learns the topical and local context features of a given target word independently from a set of sense-annotated example sentences. It then combines these features into contextual representations for WSD. The classifier was evaluated on three data sets: the noun line data, the verb serve data and the adjective hard data. The experiments showed that it achieved high accuracy: around 87%, 90% and 81% on the three data sets respectively. Véronis and Ide's (1990) approach constructs a neural network model from the definition texts available in a machine-readable dictionary. In this model, each word is linked to its senses, and each sense is linked to the words appearing in its textual definition.
2.2.4.2.3 K-Nearest Neighbor
The k-nearest neighbor (k-NN) algorithm (Mitchell, 1997) is one of the most widely used methods in WSD. The classifier is built from stored examples, each of which has a set of feature values. A new example is classified by estimating the distance between it and the stored examples and selecting the k closest ones. The new example is then assigned to the class (i.e. sense) that is assigned to most of the examples in this set.
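The classification procedure above can be sketched in a few lines. The distance measure used here (the number of features that two examples do not share) is an assumption chosen for simplicity, as are the function names and the toy examples.

```python
from collections import Counter

def distance(features_a, features_b):
    """Number of features present in one example but not in the other."""
    return len(set(features_a) ^ set(features_b))

def knn_classify(stored, features, k=3):
    """stored: list of (features, sense) pairs; returns the majority sense of the k closest."""
    # sorted() is stable, so ties in distance keep the stored order.
    neighbours = sorted(stored, key=lambda example: distance(example[0], features))[:k]
    votes = Counter(sense for _, sense in neighbours)
    return votes.most_common(1)[0][0]

# Hypothetical, hand-made contexts for the ambiguous word "line".
STORED = [
    (["telephone", "call", "busy"], "phone"),
    (["call", "hold", "operator"], "phone"),
    (["wait", "queue", "ticket"], "queue"),
    (["long", "wait", "people"], "queue"),
]
knn_prediction = knn_classify(STORED, ["call", "busy", "operator"], k=3)
print(knn_prediction)
```

Because all training examples are simply stored, no model is fitted in advance; the cost is paid at classification time, when the new example is compared with every stored example.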
Escudero et al. (2000) compare two supervised learning methods for WSD: Naïve Bayes and exemplar-based classifiers. In their approach, the exemplar-based classifier is implemented using the k-nearest neighbor algorithm. They tested both learning methods on the DSO corpus (Ng and Lee, 1996), which includes 192,800 sense-annotated tokens of 191 words (121 nouns and 70 verbs). The experiments showed that the exemplar-based approach outperforms the Naïve Bayes classifier. Ng (1997) suggested an exemplar-based learning approach to WSD called PEBLS, which is an implementation of the k-nearest neighbor algorithm. It has been used to determine the best k (number of nearest neighbors) for disambiguating a word on a specific training data set. Ng compares the exemplar-based classifier with the Naïve Bayes algorithm, testing both algorithms on a large sense-annotated corpus (Ng and Lee, 1996). Using 10-fold cross-validation, the experimental evaluation showed that the exemplar-based algorithm achieved higher disambiguation accuracy than the Naïve Bayes algorithm.
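Selecting k by cross-validation, as mentioned above, can be sketched as follows. This is a generic illustration rather than the PEBLS procedure: the fold assignment, the distance measure, the candidate values of k and the toy data are all assumptions.

```python
from collections import Counter

def k_fold_splits(n_examples, n_folds):
    """Yield (train_indices, test_indices); example i is held out in fold i % n_folds."""
    for fold in range(n_folds):
        test = [i for i in range(n_examples) if i % n_folds == fold]
        train = [i for i in range(n_examples) if i % n_folds != fold]
        yield train, test

def knn_classify(stored, features, k):
    """Majority sense among the k stored examples sharing the most features."""
    neighbours = sorted(stored, key=lambda ex: len(set(ex[0]) ^ set(features)))[:k]
    return Counter(sense for _, sense in neighbours).most_common(1)[0][0]

def cross_validated_accuracy(data, k, n_folds):
    """Fraction of examples classified correctly when each fold is held out in turn."""
    correct = 0
    for train_idx, test_idx in k_fold_splits(len(data), n_folds):
        stored = [data[i] for i in train_idx]
        for i in test_idx:
            features, sense = data[i]
            if knn_classify(stored, features, k) == sense:
                correct += 1
    return correct / len(data)

def best_k(data, candidates, n_folds):
    """Candidate k with the highest cross-validated accuracy (first one wins ties)."""
    return max(candidates, key=lambda k: cross_validated_accuracy(data, k, n_folds))

# Hypothetical, hand-made contexts for the ambiguous word "line".
DATA = [
    (["telephone", "call", "busy"], "phone"),
    (["wait", "queue", "ticket"], "queue"),
    (["call", "hold", "operator"], "phone"),
    (["long", "wait", "people"], "queue"),
    (["busy", "call", "operator"], "phone"),
    (["queue", "people", "ticket"], "queue"),
]
accuracy_k1 = cross_validated_accuracy(DATA, k=1, n_folds=3)
chosen_k = best_k(DATA, candidates=[1, 3], n_folds=3)
print(accuracy_k1, chosen_k)
```

With real corpora the folds would normally be shuffled and stratified by sense so that each fold reflects the overall sense distribution; the deterministic assignment here just keeps the example reproducible.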