Statistical measures are expected to be promising features for identifying MWEs among expressions with consistent behaviour. However, the results in Table 3.2 show that our Context features are more effective in MWE classification when applied over Group 1 and also for the entire data. Using the Context features alone shows statistically significant improvements over
Likelihood+Salience+Freq, with p < 0.05 in Group 1 and p < 0.001 in all data.
Table 3.2: Classification accuracies (%) using different features over Group 1 and the whole data.
Features                            All data    Group 1
Freq                                70.77       69.20
Likelihood                          72.11       70.64
Salience                            73.83       72.81
Likelihood+Salience+Freq            73.90       73.29
Context (word2vec)                  75.42*      74.13
Salience+Context                    78.40*      80.13*
Likelihood+Salience+Freq+Context    76.95*      80.07*
believe that they contain information from external arguments of the verb and the noun constituents of expressions, which helps boost classification accuracy. More experiments need to be done to confirm this and to find the most suitable window size for the word context around a target expression.⁹
We have also trained the logistic regression model with the combination of the Context features and association measures (Table 3.2). According to the results, the combination improves the accuracy of our model in identifying idiomatic expressions, especially when applied to the consistent data in Group 1. The results lead us to believe that context features are even more useful in cases where we observe more consistent behaviour in the data and expect the best result from statistical measures. The better performance when using Context and statistical measures together, compared to using Context features alone, is also a notable observation in Table 3.2. This can be explained by the fact that, among all the data, identification of expressions that have a skewed distribution of their interpretations (i.e., those which most of the time occur as either MWE or non-MWE) can still benefit from statistical measures as features. The accuracies marked by * are the cases where we see a statistically significant improvement over the Likelihood+Salience+Freq baseline with p < 0.001.

⁹ We have established through trial and error that a window size of two after a target

Table 3.3: Classification accuracies (%) over data in Group 2 compared to the majority baseline.
Model                                        Group 2
Majority Baseline                            59.52
Logistic regression with Context features    63.21
Logistic regression with Context+Salience    54.37
Table 3.3 shows the results of our model for data from Group 2 compared to the majority baseline. Recall that the data instances in Group 2 are highly unpredictable in their occurrence as MWE or non-MWE. We expect our supervised model using Context features to be able to disambiguate between different instances of an expression. Here, our model (logistic regression with Context features) performs slightly better than the informed majority baseline (63.21% against 59.52%).
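To make the comparison with the majority baseline concrete, the following is a minimal sketch of how such a baseline accuracy can be computed. It assumes a single global majority class; how the "informed" baseline is computed in detail is not specified in this section, so this is an illustration only. The function name and the toy labels are hypothetical and are not the thesis data.

```python
from collections import Counter


def majority_baseline_accuracy(train_labels, test_labels):
    """Accuracy obtained by always predicting the most frequent training label."""
    majority_label, _ = Counter(train_labels).most_common(1)[0]
    correct = sum(1 for label in test_labels if label == majority_label)
    return majority_label, correct / len(test_labels)


# Toy labels invented for illustration (assumption, not the thesis data).
train = ["idiomatic", "idiomatic", "literal", "idiomatic", "literal"]
test = ["idiomatic", "literal", "idiomatic", "literal", "idiomatic"]

label, acc = majority_baseline_accuracy(train, test)
print(label, acc)
```

A trained classifier is only informative on ambiguous expressions to the extent that its accuracy exceeds this figure.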
Our experiment using the combination of Context and Salience (the best statistical measure) for training over Group 2 expressions (Table 3.3) shows that the statistical measure is not helpful for the class of ambiguous expressions: accuracy drops to 54.37%, below the majority baseline.
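The feature combinations compared in Tables 3.2 and 3.3 (Context alone versus Context plus an association measure such as Salience) can be sketched as a simple concatenation of features fed to a logistic regression model. The sketch below is an assumption-laden illustration: it uses a plain-Python gradient-descent logistic regression rather than any particular library, the two-dimensional context vectors and salience scores are invented toy values standing in for word2vec-derived vectors, and the learning rate and number of epochs are arbitrary.

```python
import math


def build_features(context_vector, salience=None):
    """Concatenate a context vector with an optional association score."""
    features = list(context_vector)
    if salience is not None:
        features.append(salience)
    return features


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic_regression(X, y, lr=0.5, epochs=500):
    """Batch gradient descent on the logistic loss; returns (weights, bias)."""
    n_features = len(X[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, target in zip(X, y):
            score = sum(w * xi for w, xi in zip(weights, x)) + bias
            error = sigmoid(score) - target
            for j, xi in enumerate(x):
                grad_w[j] += error * xi
            grad_b += error
        weights = [w - lr * g / len(X) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(X)
    return weights, bias


def predict(weights, bias, x):
    """Return 1 (idiomatic) or 0 (literal) for a feature list."""
    score = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if sigmoid(score) >= 0.5 else 0


# Toy instances: (context vector, salience score, label); all values invented.
instances = [
    ((0.9, 0.1), 0.8, 1),
    ((0.8, 0.2), 0.7, 1),
    ((0.1, 0.9), 0.2, 0),
    ((0.2, 0.7), 0.3, 0),
]
X = [build_features(ctx, sal) for ctx, sal, _ in instances]
y = [label for _, _, label in instances]

weights, bias = train_logistic_regression(X, y)
predictions = [predict(weights, bias, x) for x in X]
print(predictions)
```

Calling `build_features(ctx)` without a salience value gives the Context-only setting, so the same training routine covers both configurations.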
3.6 Summary
In this chapter, we first described the compilation of datasets for processing verb-noun MWEs both out of context and in context. We then conducted experiments to rank expressions using different traditional statistical measures. Furthermore, we proposed a new approach for identifying the usages of idiomatic expressions in context. We applied the approach to the compiled Italian data, as explained in Section 3.3. We compared the results with baseline methodologies and discussed the experiments. We showed that, in order to identify tokens of MWEs more effectively, lexical and syntactic context features derived from vector representations can be combined with traditional statistical measures.
Modelling and Evaluation of Multiword Expressions in Context
As discussed in Section 2.3.2, automatic identification of Multiword Expressions (MWEs) in running text has recently received considerable attention from researchers in computational linguistics. In this chapter, we first discuss the two main approaches to framing the task of predicting MWEs in context: classification and tagging. We investigate why classification is more suitable than tagging for modelling MWEs in our data. Furthermore, the wide range of reported results for the task in the literature has prompted us to take a closer look at the algorithms and evaluation methods. We focus on the importance of train and test splitting and the distribution of expression types in validating the results, and propose an alternative method to perform train and test splitting.
4.1 Modelling MWE Identification
The focus of our study is on token-based identification of MWEs. The most evident solution is to go through the running text and tag any two or more words whose co-occurrence conveys an idiomatic interpretation. However, it is not always feasible to traverse the whole of a large corpus. For this reason we have gathered a specialised dataset of concordances of particular expressions, as presented in Chapter 3. This dataset is a collection of sequences of words, each of which includes one instance of a verb-noun expression to be categorised as literal or idiomatic. This problem can straightforwardly be framed as a classification task where the input is the collection of features extracted from sequences, i.e. the target expressions along with their contexts, and the output indicates whether the target expression is literal or idiomatic.
For manual evaluation, the difficulty of traversing the whole corpus is an obvious limitation. However, machine learning algorithms facilitate efficient investigation of each and every word in a corpus, and the resulting trained models tag sequences accordingly based on sequence labelling methodologies. Recent studies on token-based identification of MWEs are heading towards using structured sequence tagging models. Conditional Random Fields (CRF) in the work of Constant et al. (2013), and structured perceptron in the work of Schneider et al. (2014a) are two outstanding examples.
While most of the recent work on token-based identification of MWEs applies sequence tagging approaches with the so-called IOB labelling, Legrand and Collobert (2016) frame the problem as classification. They propose a neural network based model which is able to classify representations as MWE or not by learning fixed-size representations for arbitrarily sized chunks. They have shown better performance in MWE identification than the CRF based approach in Constant et al. (2013).
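As an illustration of the IOB labelling mentioned above, the following minimal sketch converts MWE spans into per-token tags. It assumes contiguous spans given as (start, end) token offsets with an exclusive end; the function name and the English example sentence are hypothetical and chosen only for readability.

```python
def iob_tags(tokens, mwe_spans):
    """Tag each token as B (begins an MWE), I (inside an MWE) or O (outside)."""
    tags = ["O"] * len(tokens)
    for start, end in mwe_spans:  # end index is exclusive (assumption)
        tags[start] = "B"
        for i in range(start + 1, end):
            tags[i] = "I"
    return tags


# Invented example sentence; the span (1, 4) covers "kicked the bucket".
tokens = ["He", "kicked", "the", "bucket", "yesterday"]
tags = iob_tags(tokens, [(1, 4)])
print(list(zip(tokens, tags)))
```

Under the tagging view, a model must predict one such tag per token, whereas under the classification view it predicts a single label for the whole target expression in its context.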
The choice of the model based on the data is an important issue. Our data includes occurrences of specific verb-noun expressions with the context around them. This makes it possible to have sizeable datasets annotated for a specific type of MWE, enabling a more extensive evaluation. We design an experiment to see whether our task can benefit from sequence tagging compared to sequence classification. Specifically, we compare the results of a CRF tagger with a simple Naïve Bayes Classifier (NBC) in predicting the idiomaticity of the expressions. The idea behind our feature representation is similar to the model described in Chapter 3, with the difference that for CRF and NBC, we consider simple word forms (rather than the vectors) of the verb, the noun, and the two words after, as lexical context features. The experiments and results are further reported and discussed in Section 4.4.1.

Having observed and discussed the benefits of modelling the task as classification, and in accordance with our proposed approach in Chapter 3, we continue to further develop and train models on our data using classification.
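The word-form features described above for the NBC (the verb, the noun, and the two following words) can be sketched with a small categorical Naïve Bayes classifier. This is a hedged illustration and not the implementation used in the experiments: the class and function names are hypothetical, taking the "two words after" from the last constituent of the expression is an assumption, the add-one smoothing scheme is an assumption, and the English training sentences are invented (the thesis data is Italian).

```python
import math
from collections import Counter, defaultdict


def extract_features(tokens, verb_index, noun_index):
    """Word forms of the verb, the noun, and the two words after the expression."""
    last = max(verb_index, noun_index)
    padded = tokens + ["<PAD>", "<PAD>"]  # padding for expressions at sentence end
    return {
        "verb": tokens[verb_index].lower(),
        "noun": tokens[noun_index].lower(),
        "next1": padded[last + 1].lower(),
        "next2": padded[last + 2].lower(),
    }


class NaiveBayes:
    """Categorical Naive Bayes over named features with add-one smoothing."""

    def fit(self, feature_dicts, labels):
        self.label_counts = Counter(labels)
        self.value_counts = defaultdict(Counter)  # (label, feature) -> value counts
        self.vocab = defaultdict(set)             # feature -> observed values
        for feats, label in zip(feature_dicts, labels):
            for name, value in feats.items():
                self.value_counts[(label, name)][value] += 1
                self.vocab[name].add(value)
        return self

    def predict(self, feats):
        total = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.label_counts.items():
            score = math.log(count / total)  # log prior
            for name, value in feats.items():
                seen = self.value_counts[(label, name)][value]
                # The extra +1 in the denominator reserves mass for unseen values.
                score += math.log((seen + 1) / (count + len(self.vocab[name]) + 1))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented training concordances: (sentence, verb index, noun index, label).
training = [
    ("He kicked the bucket last night", 1, 3, "idiomatic"),
    ("She kicked the bucket of water", 1, 3, "literal"),
    ("They kicked the bucket last year", 1, 3, "idiomatic"),
    ("He kicked the bucket of paint", 1, 3, "literal"),
]
X = [extract_features(s.split(), v, n) for s, v, n, _ in training]
y = [label for _, _, _, label in training]

model = NaiveBayes().fit(X, y)
test_features = extract_features("She kicked the bucket last week".split(), 1, 3)
prediction = model.predict(test_features)
print(prediction)
```

Because each concordance contains exactly one target expression, one feature dictionary and one label per instance are sufficient, which is what makes the classification framing a natural fit for this kind of data.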