
Figure 5.1: An example of a two-class classifier on linearly separable data, showing the separating hyperplane and the margins between the hyperplane and the closest data points of the two classes.

…comparison to the number of instances. This was the case in the global extraction of corrective feedback instances. The computations are carried out using the implementation provided in the scikit-learn module for Python (Pedregosa et al., 2011).

5.2 Global Extraction of Corrective Feedback

For the global extraction of corrective feedback exchanges, we compiled a large list of potentially informative features to represent the data and used these as input to a support vector machine. The extracted features are described in Section 5.2.1. Section 5.2.2 describes how the accuracy of the classifier is monitored, and Section 5.2.3 presents the obtained accuracy scores. These show that corrective feedback in general is too complex a phenomenon to be classified with the means employed here.

5.2.1 Features

Choosing which features to extract from the textual data for presentation to the classification algorithm is not a trivial task. The exchanges contain a great deal of information, so we decided to start by compiling a large list of features, with the intention of possibly reducing it later according to predictive strength. Since the accuracy of the classification algorithm based on these features did not meet our expectations, this reduction was abandoned. The extraction of the features is therefore not described in detail here; an overview of the features is presented below, and more detail is given in Appendix C.

Establishing the contrast between child and parent utterance

The intuitively most striking element present in all instances of corrective feedback is that the child and adult utterances use different means to represent the same statement. The two accounts discussed in Section 3.4 suggest how the relevant contrast can be established: as a difference in form despite converging meaning, or despite converging semantic function. The fact that the parent utterance diverges in form from the child utterance is already captured by our preselection of those utterance pairs that do not overlap completely.

Semantic similarity between the two utterances was represented by the distance between semantic representations of each utterance. A semantic vector representation of each word was obtained using word2vec, an implementation that has been shown to yield competitive results on semantic similarity tasks (Mikolov et al., 2013a,b). The vectors for the individual words were then combined into a vector for the whole utterance by addition. Although clearly inaccurate (for example, addition is commutative and therefore ignores word order), this method yields a good approximation and is computationally simple (Mikolov et al., 2013b). The distance between the two resulting vectors was computed with two measures: cosine distance, the standard measure of similarity between semantic vector representations, which depends only on the angle between the two vectors, and Euclidean distance, which also takes the length of the vectors into account.
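The additive composition and the two distance measures can be sketched as follows. This is a minimal illustration, not the implementation used in this work: the three-dimensional `WORD_VECTORS` table is a hypothetical stand-in for trained word2vec embeddings, and the helper names are our own. The comparison of `dog` with `dog dog` shows the difference between the measures: the two sums point in the same direction, so their cosine distance is 0, while their Euclidean distance equals the length of the `dog` vector.

```python
import numpy as np

# Hypothetical 3-dimensional toy vectors standing in for trained word2vec
# embeddings (real word2vec vectors usually have a few hundred dimensions).
WORD_VECTORS = {
    "the": np.array([0.1, 0.0, 0.2]),
    "dog": np.array([0.9, 0.3, 0.1]),
    "is": np.array([0.0, 0.2, 0.1]),
    "run": np.array([0.3, 0.8, 0.4]),
    "running": np.array([0.2, 0.9, 0.5]),
}
DIM = 3


def utterance_vector(tokens, vectors=WORD_VECTORS, dim=DIM):
    """Sum the word vectors; addition is commutative, so word order is lost."""
    total = np.zeros(dim)
    for token in tokens:
        if token in vectors:  # out-of-vocabulary tokens are skipped
            total += vectors[token]
    return total


def cosine_distance(u, v):
    """1 - cos(angle); depends only on the direction of the vectors."""
    norm_product = np.linalg.norm(u) * np.linalg.norm(v)
    if norm_product == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - float(np.dot(u, v)) / norm_product


def euclidean_distance(u, v):
    """Straight-line distance; sensitive to vector length as well as direction."""
    return float(np.linalg.norm(u - v))


child = utterance_vector("dog run".split())
adult = utterance_vector("the dog is running".split())
print(cosine_distance(child, adult), euclidean_distance(child, adult))
```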

Syntactic similarity between the utterances was represented via the tree edit distance of the syntactic dependency trees obtained from the MEGRASP parser. To compute this, the algorithm presented by Zhang and Shasha (1989) was used.
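To make the tree edit distance concrete, the sketch below computes the unit-cost edit distance between two ordered trees (each insertion, deletion or relabelling of a node costs 1). It is a plain memoised recursion on the rightmost-root decomposition of forests, not the keyroot-based dynamic programme of Zhang and Shasha (1989). It returns the same distance, but far less efficiently, which should be acceptable for short utterances. The tuple tree format and the example dependency trees, with words as node labels, are hypothetical; MEGRASP output would first have to be converted into this form.

```python
from functools import lru_cache

# A tree is (label, children), where children is a tuple of trees;
# a forest is a tuple of trees. Tuples keep everything hashable for caching.


def tree_size(tree):
    """Number of nodes in a tree."""
    _, children = tree
    return 1 + sum(tree_size(child) for child in children)


@lru_cache(maxsize=None)
def forest_distance(f, g):
    """Unit-cost edit distance between two ordered forests."""
    if not f and not g:
        return 0
    if not f:
        return sum(tree_size(t) for t in g)  # insert every node of g
    if not g:
        return sum(tree_size(t) for t in f)  # delete every node of f
    (v_label, v_kids), (w_label, w_kids) = f[-1], g[-1]  # rightmost roots
    return min(
        # delete the rightmost root of f; its children take its place
        forest_distance(f[:-1] + v_kids, g) + 1,
        # insert the rightmost root of g
        forest_distance(f, g[:-1] + w_kids) + 1,
        # map the two rightmost roots onto each other (relabel if needed)
        forest_distance(v_kids, w_kids)
        + forest_distance(f[:-1], g[:-1])
        + (0 if v_label == w_label else 1),
    )


def tree_edit_distance(a, b):
    return forest_distance((a,), (b,))


# Hypothetical dependency trees: each word is a node under its head.
child_tree = ("run", (("dog", ()),))
adult_tree = ("running", (("dog", (("the", ()),)), ("is", ())))
print(tree_edit_distance(child_tree, adult_tree))
```

For this pair, the cheapest edit script relabels `run` as `running` and inserts `the` and `is`, giving a distance of 3.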

Features related to added, deleted and exactly matching words

The output generated by the CHIP program on added, deleted and exactly matching words between the child and adult utterances is another rich source of information. First, the fraction of added and of exactly matching words in the adult utterance and the fraction of deleted words in the child utterance were computed. Next, the part-of-speech tags of the added, deleted and matching words were extracted individually. Finally, the semantic relations in which the words of each of the three sets are involved, as analysed by the MEGRASP parser, were obtained separately for each set.
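The features listed above can be illustrated with a small feature-building function. This is a sketch under assumptions: the `(word, pos_tag, relation)` triples are a hypothetical, already-parsed stand-in for the combined CHIP and MEGRASP output, and the tag and relation labels are illustrative only. The `binary` flag switches between frequency counts and binarised values, mirroring the two matrix variants mentioned below.

```python
from collections import Counter


def overlap_features(child_tokens, adult_tokens, added, deleted, matching, binary=False):
    """Build a feature dict from word-overlap information.

    added, deleted, matching: lists of (word, pos_tag, relation) triples, a
    hypothetical stand-in for CHIP output combined with MEGRASP relations.
    """
    features = {
        "frac_added": len(added) / len(adult_tokens),
        "frac_matching": len(matching) / len(adult_tokens),
        "frac_deleted": len(deleted) / len(child_tokens),
    }
    for set_name, triples in (("added", added), ("deleted", deleted), ("matching", matching)):
        pos_counts = Counter(pos for _, pos, _ in triples)
        rel_counts = Counter(rel for _, _, rel in triples)
        for kind, counts in (("pos", pos_counts), ("rel", rel_counts)):
            for tag, count in counts.items():
                # binarised variant records presence only; otherwise frequency
                features[f"{set_name}_{kind}_{tag}"] = 1 if binary else count
    return features


child = ["dog", "run"]
adult = ["the", "dog", "is", "running"]
added = [("the", "det", "DET"), ("is", "aux", "AUX"), ("running", "part", "ROOT")]
deleted = [("run", "v", "ROOT")]
matching = [("dog", "n", "SUBJ")]
print(overlap_features(child, adult, added, deleted, matching))
```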

Overall, four different feature matrices were extracted from the annotated files: with detailed or coarse part-of-speech tags, each combined with either binarised values or frequency counts for the tags and semantic relations. Subsequently, the feature matrix and label vector were split into a development set and a test set, so that the accuracy of a predictor with fine-tuned parameters could be evaluated on independent data. The test set contained approximately 20% of all instances. Features with zero variance were then removed, as they do not contribute any information; duplicate features were removed for the same reason.
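The removal of zero-variance and duplicate features can be sketched with NumPy as below. This is an illustration rather than the code used here: the function name and toy matrix are hypothetical, and a column counts as having zero variance when all its values are identical, which avoids floating-point round-off in a computed variance. For the first step alone, scikit-learn also offers `sklearn.feature_selection.VarianceThreshold`, whose default threshold of 0 removes constant features.

```python
import numpy as np


def drop_uninformative_columns(X, names):
    """Remove constant (zero-variance) columns, then exact duplicate columns.

    For duplicates, the first occurrence is kept. X is an (n_instances, n_features)
    array with at least one row; names lists the feature names.
    """
    keep = ~np.all(X == X[0], axis=0)  # constant columns have zero variance
    X = X[:, keep]
    names = [name for name, k in zip(names, keep) if k]
    if X.shape[1] == 0:
        return X, names
    _, first_idx = np.unique(X, axis=1, return_index=True)
    first_idx = np.sort(first_idx)  # restore the original column order
    return X[:, first_idx], [names[i] for i in first_idx]


# Toy matrix: b and d are constant, c duplicates a.
X = np.array([[1, 0, 1, 5, 0], [0, 0, 0, 5, 1], [1, 0, 1, 5, 1]])
names = ["a", "b", "c", "d", "e"]
X_clean, kept = drop_uninformative_columns(X, names)
print(kept)
```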

5.2.2 Training and evaluation setup

The set of features described in Section 5.2.1, together with the correct class labels, was used to train a support vector machine. As described in Section 5.1, this yields a two-class classifier. Here, the aim was to distinguish corrective feedback instances from non-corrective feedback instances.

The accuracy of the classifier was monitored using 5-fold cross-validation.⁴ That is, the input was split into five parts, and the labels for each part were predicted by a classifier trained on the other four parts. These predictions were then compared to the actual labels to measure the quality of the prediction. As we are interested in a classifier that correctly selects instances of corrective feedback, quality was measured via precision, recall and F-score for the corrective feedback class. Features and parameters were fine-tuned to increase the predictive power of the classifier.
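A minimal sketch of this evaluation loop with scikit-learn follows. The data are synthetic (generated with `make_classification`, with label 1 standing in for corrective feedback), and the linear kernel is an assumption, since the kernel and parameters used here are not specified in this section. `cross_val_predict` returns, for each instance, the prediction of the model trained on the four folds that do not contain it. For classifiers with an integer `cv`, scikit-learn uses stratified folds by default.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import precision_recall_fscore_support
from sklearn.model_selection import cross_val_predict
from sklearn.svm import SVC

# Synthetic, imbalanced stand-in for the development set:
# label 1 plays the role of corrective feedback (about 15% of instances).
X_dev, y_dev = make_classification(
    n_samples=500, n_features=20, weights=[0.85], random_state=0
)

clf = SVC(kernel="linear")  # kernel and parameters are placeholders for tuning
# cv=5: each fifth of the data is predicted by a model trained on the other
# four fifths; for classifiers scikit-learn uses stratified folds by default.
y_pred = cross_val_predict(clf, X_dev, y_dev, cv=5)

# Precision, recall and F-score for the corrective feedback class only.
precision, recall, f_score, _ = precision_recall_fscore_support(
    y_dev, y_pred, average="binary", pos_label=1, zero_division=0
)
print(f"P={precision:.2f} R={recall:.2f} F={f_score:.2f}")
```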

⁴ The amount of available data is not very large, and a more fine-grained cross-validation

To make sure that the final scores reflect how well the given approach generalises, the set of all instances was split into a development set and a test set. The test set was disregarded during the tuning of features and parameters, and predictive accuracy was finally evaluated on this wholly unseen test set. Because the number of available instances is limited, the test set was kept rather small, containing slightly below 20% of all candidate exchanges. The locations from which these instances were taken were selected at random.
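The held-out protocol can be sketched as follows, again on synthetic data and with a linear SVM as a placeholder. This is a simplification: `train_test_split` draws individual instances at random, whereas the text selects random locations in the data, so it only approximates that procedure. The essential point is that the test portion is used exactly once, after all tuning on the development portion is finished.

```python
import numpy as np
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in data: 300 instances, a minority positive class (label 1).
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 10))
y = (X[:, 0] + 0.5 * rng.normal(size=300) > 1.0).astype(int)

# Hold out 20% as a test set that is not touched during tuning.
X_dev, X_test, y_dev, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# ... all feature and parameter tuning happens on X_dev / y_dev only ...

# The final configuration is evaluated exactly once on the unseen test set.
final_clf = SVC(kernel="linear").fit(X_dev, y_dev)
test_f1 = f1_score(y_test, final_clf.predict(X_test), pos_label=1, zero_division=0)
print(f"test F-score: {test_f1:.2f}")
```

With a strongly imbalanced label distribution, passing `stratify=y` to `train_test_split` keeps the class ratio of the small test set close to that of the whole data.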

5.2.3 Results

In the first round, the prediction was run using the full matrices described in Section 5.2.1 and without specifying class weights. This resulted in all instances being classified as non-corrective feedback. Since this class is much larger than the corrective feedback class, this prediction yields comparatively high accuracy scores. The classifier therefore settled on it, even though it is uninformative for our purposes.
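A small worked example shows why this trivial prediction looks good on accuracy but useless on the measures of Section 5.2.2. The class ratio of 1:5 (20 corrective against 100 non-corrective instances) is hypothetical. Predicting the majority class everywhere gives an accuracy of 100/120, about 0.83, while precision, recall and F-score for the corrective feedback class are all 0. Undefined ratios (0/0) are set to 0 here, which matches scikit-learn's `zero_division=0` option.

```python
def positive_class_scores(y_true, y_pred, positive=1):
    """Precision, recall and F-score for one class, 0.0 when undefined."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_score = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f_score


# Hypothetical class ratio of 1:5 (20 corrective vs 100 non-corrective instances).
y_true = [1] * 20 + [0] * 100
y_pred = [0] * len(y_true)  # majority-class prediction

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(accuracy, positive_class_scores(y_true, y_pred))
```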

Next, we modified the class weights so that misclassifying corrective feedback instances as non-corrective feedback receives a higher penalty and is dispreferred. The penalty was increased by factors of 1.5, 2, 5 and 10. Additionally, the classes were weighted inversely proportionally to their size, a setting labelled 'auto'. In our case, this is close to multiplying the penalty for misclassifying corrective feedback instances by 5. F-scores for these modified classifiers are presented in Figure 5.2. The classifier improves slightly on the previous classification of all instances as non-corrective feedback. Overall, however, the obtained scores are still too poor to allow meaningful conclusions to be drawn from the resulting classifications.
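The weighting schemes can be reproduced in outline with scikit-learn's `class_weight` parameter, which for `SVC` scales the penalty parameter `C` of each class by its weight. The sketch uses synthetic data with a hypothetical 1:5 class ratio and a linear kernel as assumptions. Current scikit-learn releases provide inverse-frequency weighting as `class_weight="balanced"`. To our knowledge, `'auto'` was the name of a comparable option in older releases. For a 1:5 ratio, the balanced weights differ by exactly a factor of 5, consistent with the observation in the text.

```python
import numpy as np
from sklearn.metrics import f1_score
from sklearn.model_selection import cross_val_predict
from sklearn.svm import SVC
from sklearn.utils.class_weight import compute_class_weight

# Synthetic stand-in: 20 corrective (1) vs 100 non-corrective (0) instances.
rng = np.random.default_rng(1)
y = np.array([1] * 20 + [0] * 100)
X = rng.normal(size=(120, 5)) + 0.8 * y[:, None]  # weak signal in every feature

# Inverse-frequency weights: n_samples / (n_classes * class_count).
balanced = compute_class_weight(class_weight="balanced", classes=np.array([0, 1]), y=y)
# here 120 / (2 * 100) = 0.6 and 120 / (2 * 20) = 3.0, i.e. a ratio of 5

settings = [
    ("none", None),
    ("x1.5", {0: 1, 1: 1.5}),
    ("x2", {0: 1, 1: 2}),
    ("x5", {0: 1, 1: 5}),
    ("x10", {0: 1, 1: 10}),
    ("balanced", "balanced"),
]
scores = {}
for name, weights in settings:
    clf = SVC(kernel="linear", class_weight=weights)
    y_pred = cross_val_predict(clf, X, y, cv=5)
    scores[name] = f1_score(y, y_pred, pos_label=1, zero_division=0)
print(scores)
```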

Finally, we reduced the feature sets according to empirically derived thresholds for the variance of the features or for the correlation between features and labels. Neither of these approaches improved predictive accuracy.
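The two filtering criteria can be sketched as follows. The thresholds are left as parameters because the empirically derived values are not given here. Measuring feature-label association with the Pearson correlation coefficient is our assumption. The function names and toy data are hypothetical.

```python
import numpy as np


def select_by_variance(X, threshold):
    """Indices of columns whose variance exceeds the threshold."""
    return np.flatnonzero(X.var(axis=0) > threshold)


def select_by_label_correlation(X, y, threshold):
    """Indices of columns whose absolute Pearson correlation with y exceeds the threshold."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    with np.errstate(divide="ignore", invalid="ignore"):
        r = (Xc * yc[:, None]).sum(axis=0) / denom
    r = np.nan_to_num(r)  # constant columns (0/0) count as uncorrelated
    return np.flatnonzero(np.abs(r) > threshold)


# Toy data: column 0 equals the label, column 1 is constant, column 2 is
# negatively correlated with the label.
X = np.array([[0, 1, 5], [1, 1, 3], [0, 1, 7], [1, 1, 1]], dtype=float)
y = np.array([0, 1, 0, 1], dtype=float)
print(select_by_variance(X, 1.0), select_by_label_correlation(X, y, 0.5))
```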

Corrective feedback is a very diverse phenomenon. Moreover, whether a given exchange contains a corrective reformulation is in certain cases revealed not only by syntactic or semantic but also by pragmatic considerations. We therefore decided not to continue tuning the features on this general classifier, but instead to constrain the search space and focus on a more clearly delineated phenomenon. This procedure is described in Sections 5.3 and 5.4.

Figure 5.2: F-score of the classifier for different class weights, for all four feature matrices.