We propose to enrich the neural model presented in Section 6.1 with additional features. In the systems described in the previous chapter, the encoded vectors produced by an RNN, a CNN or an MC-RCNN are concatenated and passed to an MLP. We suggest concatenating these encodings with additional external features, e.g. discourse features, and then passing them to the MLP. More specifically, the score is now predicted by the following function (compare to Equation 6.5).

$$y = f_{MLP}([\mathbf{enc}_q, \mathbf{enc}_a, \mathbf{x}_{ext}], \boldsymbol{\theta}_s) \tag{7.1}$$

where $\mathbf{x}_{ext}$ is the vector of additional features. Figure 7.1 illustrates the model with the additional discourse features. An RNN, a CNN or an MC-RCNN can be used as a question and an answer encoder. In the following sections we present experiments where the model is enriched with discourse features produced by the discourse marker model of Jansen et al. (2014).

Figure 7.1: Illustration of the architecture that incorporates additional features. The question encoder can be, for instance, an RNN, a CNN or an MC-RCNN. In our experiments, the additional features are those produced by the discourse marker model presented in Jansen et al. (2014); however, other features can also be incorporated.
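The scoring function in Equation 7.1 can be sketched in a few lines of plain Python. This is a minimal illustration, not the system used in the experiments: the single tanh hidden layer, the sigmoid output, the layer sizes, the random initialisation and the names `mlp_score` and `init_params` are all assumptions made for the example.

```python
import math
import random


def init_params(input_dim, hidden_dim, seed=0):
    """Randomly initialise a one-hidden-layer MLP (sizes are illustrative)."""
    rng = random.Random(seed)
    W1 = [[rng.uniform(-0.1, 0.1) for _ in range(input_dim)]
          for _ in range(hidden_dim)]
    b1 = [0.0] * hidden_dim
    w2 = [rng.uniform(-0.1, 0.1) for _ in range(hidden_dim)]
    b2 = 0.0
    return W1, b1, w2, b2


def mlp_score(enc_q, enc_a, x_ext, params):
    """Score a question-answer pair from [enc_q, enc_a, x_ext] (cf. Equation 7.1)."""
    # Concatenate the two encodings with the external feature vector.
    x = list(enc_q) + list(enc_a) + list(x_ext)
    W1, b1, w2, b2 = params
    # Hidden layer with tanh activation (an assumption of this sketch).
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(W1, b1)]
    # Scalar output squashed to (0, 1) with a sigmoid (also an assumption).
    z = sum(w * hi for w, hi in zip(w2, h)) + b2
    return 1.0 / (1.0 + math.exp(-z))


# Toy vectors standing in for encoder outputs and discourse features.
enc_q = [0.2, -0.1, 0.4]
enc_a = [0.3, 0.0, -0.2]
x_ext = [0.5, 0.0]
params = init_params(len(enc_q) + len(enc_a) + len(x_ext), hidden_dim=4)
score = mlp_score(enc_q, enc_a, x_ext, params)
```

The only change relative to a model without external features is the longer input vector, so the encoders themselves are left untouched.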

7.2.1 Discourse Features

Based on the intuition that modelling question-answer structure goes beyond the sentence level, Jansen et al. (2014) propose an answer ranking model based on discourse markers combined with lexical semantic information. We inject the features produced by their discourse marker model (DMM) combined with their lexical semantics model (LS) into the neural system described in the previous chapters. The DMM is based on the findings of Marcu (1997), who showed that certain cue phrases indicate boundaries between elementary textual units with sufficient accuracy. These cue phrases are referred to below as discourse markers. For English, these markers include by, as, because, but, and, for and of; the full list can be found in Appendix B in Marcu (1997).

Figure 7.2: Feature generation for the discourse marker model of Jansen et al. (2014). First, the answer is searched for discourse markers (in bold). For each discourse marker, several features represent whether there is an overlap with the question (QSEG) before and after the marker. The features are extracted for sentence ranges from 0 (the same sentence) to 2 (two sentences before and after).

We illustrate the feature extraction process of Jansen et al. (2014) in Figure 7.2. First, the answer is searched for discourse markers. Each marker divides the text into two arguments: the text preceding the marker and the text following it. Both arguments are searched for words overlapping with the question. Each feature denotes the discourse marker and whether there is an overlap with the question (QSEG) or not (OTHER) in each of the two arguments defined by the marker. The sentence range (SR) denotes the length (in sentences) of the marker's arguments. For example, QSEG by OTHER SR0 means that, in the sentence containing the by marker, there is an overlap with the question before the marker and no overlap with the question after the marker. This results in 1384 different features. To assign a value to each feature, the similarity between the question and each of the two arguments is computed, and the average similarity is assigned as the value of the feature. Jansen et al. (2014) use cosine similarity over tf.idf and over the vector space built with a skip-gram model (Mikolov et al., 2013b).
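The extraction procedure described above can be sketched for the simplest case, sentence range 0. The sketch below is not the implementation of Jansen et al. (2014): it uses only the seven markers listed in the text, whitespace tokenisation, cosine similarity over raw word counts instead of tf.idf or skip-gram vectors, and it keeps the maximum value when the same feature fires more than once. All of these choices, and the names `MARKERS`, `cosine`, `label` and `sr0_features`, are assumptions made for illustration.

```python
import math
from collections import Counter

# Subset of discourse markers quoted in the text; the full list is in Marcu (1997).
MARKERS = {"by", "as", "because", "but", "and", "for", "of"}


def cosine(a, b):
    """Cosine similarity between two token lists, using raw counts."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[w] * cb[w] for w in ca)
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0


def label(argument, question_words):
    """QSEG if the argument shares a word with the question, OTHER otherwise."""
    return "QSEG" if question_words & set(argument) else "OTHER"


def sr0_features(question, sentence):
    """Extract sentence-range-0 features from one answer sentence."""
    q = question.lower().split()
    tokens = sentence.lower().split()
    q_words = set(q)
    features = {}
    for i, token in enumerate(tokens):
        if token not in MARKERS:
            continue
        # The marker splits the sentence into two arguments.
        before, after = tokens[:i], tokens[i + 1:]
        name = f"{label(before, q_words)} {token} {label(after, q_words)} SR0"
        # Feature value: average similarity of the question to both arguments.
        value = (cosine(q, before) + cosine(q, after)) / 2
        # Keeping the maximum for repeated features is an assumption.
        features[name] = max(features.get(name, 0.0), value)
    return features


features = sr0_features("how is cheese made", "cheese is made by curdling milk")
```

For this toy pair the only marker found is by; the words before it overlap with the question and the words after it do not, so the single feature produced is QSEG by OTHER SR0, matching the example in the text.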

7.2.1.1 Results: discourse features

In Table 7.2 we report the results for the systems enhanced with the discourse features and for the discourse features on their own with an MLP (MLP-Discourse). MLP-Discourse outperforms the random and the CR baselines on both datasets. It also performs better than the approach of Jansen et al. (2014), who used SVMrank with a linear kernel. This might be due to the ability of the MLP to model non-linear dependencies. However, this model's performance is below that of the RNN, the CNN and the MC-RCNN on their own, without any external features.

The inclusion of the discourse features improves the performance of the LSTM, the GRU and the CNN encoders on both datasets. However, on the YA dataset the improvements are not statistically significant, whereas on the AU dataset they are statistically significant with p < 0.05. The improvement is especially notable when the CNN encoder is used on the AU dataset. The CNN encoder performed poorly on its own on this dataset, but its performance was drastically improved by the inclusion of the discourse features. This is possibly because the discourse information helps the CNN to overcome its inability to account for long-term dependencies.

The performance of the MC-RCNN is improved by the inclusion of the discourse features only when the LSTM cell is used, and the improvement is not statistically significant. The MC-RCNN does not seem to benefit from the discourse information, suggesting that the discourse features do not provide any extra information that is not captured already by the model. However, this may also mean that some sort of feature normalisation is required, e.g. normalising the discourse vector and the output of the encoder separately and then, perhaps, normalising the concatenation. Manual error analysis shows that the improvement brought by the discourse features to most models is due to a better handling of the questions with long answers. In certain cases, where the best answer is relatively long, the RNN model assigned a higher score to a shorter answer.
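The normalisation suggested above for the MC-RCNN could take the following form. This is only one possible reading of that suggestion: the choice of L2 normalisation, the decision to normalise the question and answer encodings individually, and the names `l2_normalise` and `normalised_input` are assumptions, and the scheme was not evaluated in the experiments reported here.

```python
import math


def l2_normalise(v, eps=1e-12):
    """Scale a vector to unit L2 norm (eps guards against division by zero)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / max(norm, eps) for x in v]


def normalised_input(enc_q, enc_a, x_ext, renormalise=True):
    """Normalise each block separately, then optionally the concatenation."""
    parts = l2_normalise(enc_q) + l2_normalise(enc_a) + l2_normalise(x_ext)
    return l2_normalise(parts) if renormalise else parts


# Toy vectors with deliberately different scales.
enc_q = [3.0, 4.0]
enc_a = [0.0, 2.0]
x_ext = [10.0, 0.0, 0.0]
mlp_input = normalised_input(enc_q, enc_a, x_ext)
```

Normalising the blocks separately keeps a discourse vector with large raw values from dominating the encoder outputs in the concatenated MLP input.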
