Santana I. et al. (2009), in Brazil, evaluated the influence of temperature on flexural strength and hardness; they carried out an in vitro study in which
2.1 Problem Statement
To compare the performance of the model when pretrained on in-domain versus out-of-domain data, we infer the vectors for the two datasets using a DBOW model trained by Lau and Baldwin (2016). This model was trained on English Wikipedia, with each paragraph treated as a separate document, resulting in 32M documents. We compare the answer ranking performance of models whose vectors were inferred using a model trained on in-domain data (Yahoo! Answers for the YA dataset, and the Ask Ubuntu data dump for the AU dataset) against models whose vectors were inferred using the model trained on Wikipedia.
Yahoo! Answers
Model                    Corpus       P@1      MRR
DBOW                     in-domain    36.45†   56.13
DBOW                     Wikipedia    35.61    56.05
Jansen et al. (2014)                  30.49    51.89
Fried et al. (2015)                   33.01    53.96
Random baseline                       15.74    37.40
CR baseline                           22.63    47.17

Ask Ubuntu
Model                    Corpus       P@1      MRR
DBOW                     in-domain    41.48*   64.33
DBOW                     Wikipedia    40.20    63.57
Random baseline                       26.60    53.64
CR baseline                           35.36    60.17
Chronological baseline                37.68    60.06
Table 5.5: Comparison of the MLP performance using the DBOW representations inferred using a model trained on in-domain data versus one trained on Wikipedia. † On the YA dataset, the improvement is not statistically significant (p > 0.05). * On the AU dataset, the improvement is statistically significant (p < 0.05).
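The two columns of Table 5.5 are standard ranking metrics. P@1 is the fraction of questions whose top-ranked candidate is the correct answer. MRR is the mean, over questions, of the reciprocal rank of the first correct answer. The sketch below is a minimal illustration that assumes each question comes with a list of 0/1 relevance labels already sorted by model score. It also assumes the table values are these metrics scaled by 100. The function names are hypothetical.

```python
from typing import Sequence


def precision_at_1(ranked_labels: Sequence[Sequence[int]]) -> float:
    """Fraction of questions whose top-ranked candidate is correct.

    Each inner sequence holds relevance labels (1 = correct answer,
    0 = other candidate) in the order produced by the ranker.
    """
    hits = sum(1 for labels in ranked_labels if labels and labels[0] == 1)
    return hits / len(ranked_labels)


def mean_reciprocal_rank(ranked_labels: Sequence[Sequence[int]]) -> float:
    """Mean over questions of 1 / (rank of the first correct answer).

    Questions without any correct candidate contribute 0.
    """
    total = 0.0
    for labels in ranked_labels:
        for rank, label in enumerate(labels, start=1):
            if label == 1:
                total += 1.0 / rank
                break
    return total / len(ranked_labels)


# Three toy questions, candidates already sorted by model score.
toy = [
    [1, 0, 0],  # correct answer ranked 1st
    [0, 1, 0],  # correct answer ranked 2nd
    [0, 0, 1],  # correct answer ranked 3rd
]
print(f"P@1 = {100 * precision_at_1(toy):.2f}")        # → P@1 = 33.33
print(f"MRR = {100 * mean_reciprocal_rank(toy):.2f}")  # → MRR = 61.11
```

MRR rewards a correct answer ranked second or third, whereas P@1 does not. This explains why the MRR gaps between systems in Table 5.5 are generally smaller than the P@1 gaps.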
Table 5.5 reports the results. The representations inferred using the model pretrained on an in-domain corpus provide better answer ranking results for both datasets. On the YA dataset, the improvement of the model pretrained on in-domain data over the model pretrained on Wikipedia is not significant; however, it
like the one of the AU dataset, it is more important to have in-domain data for pretraining. Even though the models that use vectors inferred with the Wikipedia-trained model do not perform as well as those using in-domain data, they still outperform all the baselines, even on the highly technical AU dataset. This result suggests that a pretrained general-purpose Paragraph Vector model could be used to infer vectors for answer ranking.
5.3 Summary
Our general approach to answer ranking requires vector representations of question-answer pairs. In this chapter we used general-purpose distributed document representations provided by Paragraph Vector models to represent question-answer pairs. The main findings of our experiments are:
- representing the question-answer pair with a Paragraph Vector model is clearly superior to using averaged word vectors;
- the DBOW model is preferable to the DM model for answer ranking, especially when the representations are inferred using a pretrained model;
- a smaller amount of unlabelled data taken from a source similar to the dataset is more useful for training representations than a larger out-of-domain set.

In the experiments reported in this chapter we did not perform an extensive hyperparameter search. Although better hyperparameter tuning could potentially improve the results, the experiments show that the Paragraph Vector provides document representations suitable for the task of answer ranking.
Chapter 6
Learning Representations for Answer Ranking
In the last chapter we performed answer ranking using the pretrained general-purpose Paragraph Vector representations of Le and Mikolov (2014) for questions and answers. In this chapter, instead of using pretrained representations, we learn them jointly with the task itself. To do this, we use two encoder networks that are trained together with the final MLP predictor. In contrast to the approach based on the Paragraph Vector model, learning the representations together with the ranking does not require pretraining and allows us to learn from the training set only. However, the models we describe here can also be pretrained; we explore this in Chapter 7.
We use recurrent and convolutional neural networks (described in Chapter 2) as the encoder, i.e. the network that converts an object, which in our case is a question or an answer, into a fixed-length vector. We compare the answer ranking performance of two widely used RNN architectures: Long Short-Term Memory networks (LSTM; Hochreiter and Schmidhuber, 1997) and Gated Recurrent Networks (Cho et al., 2014b), usually abbreviated as GRU for Gated Recurrent Unit. We also compare recurrent with convolutional neural networks for encoding questions and answers for answer ranking, and propose a novel architecture that combines the benefits of the two types of encoders.
This chapter is structured as follows: Section 6.1 introduces an approach to answer ranking that uses recurrent neural networks to encode questions and answers. Section 6.2 explores an approach where convolutional neural networks are used instead of recurrent ones. In Section 6.3 we describe our Multi-Channel Recurrent Convolutional Neural Network (MC-RCNN), a novel architecture for text encoding that combines the recurrent and the convolutional architectures, and evaluate it on the task of answer ranking. Finally, we summarise the results and draw conclusions in Section 6.4.
6.1 RNN Encoder for Answer Ranking
We follow Bahdanau et al. (2014) and Cho et al. (2014b) and use a bidirectional¹ RNN as an encoder, i.e. a network that learns fixed-length vector representations of objects. Given a question-answer pair, we use two separate RNNs, with either an LSTM or a GRU cell, to encode the question and the answer. Let $(\mathbf{w}^q_1, \mathbf{w}^q_2, \dots, \mathbf{w}^q_k)$ be the sequence of question word embeddings and $(\mathbf{w}^a_1, \mathbf{w}^a_2, \dots, \mathbf{w}^a_p)$ be the sequence of answer word embeddings. The first RNN encodes the sequence of question words into the sequence of context vectors $(\mathbf{h}^q_1, \mathbf{h}^q_2, \dots, \mathbf{h}^q_k)$, i.e.
$$f^{q}_{\mathrm{RNN}}(\mathbf{w}^q_i, \boldsymbol{\theta}^q) = \mathbf{h}^q_i \tag{6.1}$$
where $\boldsymbol{\theta}^q$ denote the trainable parameters of the network. The bidirectional RNN consists of two RNNs. The forward RNN reads the question from the first word to the last and encodes it as a sequence of forward context vectors $(\overrightarrow{\mathbf{h}}^q_1, \overrightarrow{\mathbf{h}}^q_2, \dots, \overrightarrow{\mathbf{h}}^q_k)$. The reverse RNN encodes the question from the last word to the first: $(\overleftarrow{\mathbf{h}}^q_k, \overleftarrow{\mathbf{h}}^q_{k-1}, \dots, \overleftarrow{\mathbf{h}}^q_1)$. The resulting context vectors are concatenations of the forward and reverse context vectors at each step, i.e.

$$\mathbf{h}^q_i = [\overrightarrow{\mathbf{h}}^q_i, \overleftarrow{\mathbf{h}}^q_i].$$

¹ We initially experimented with a unidirectional RNN too, and the bidirectional was clearly
As the encoded vector representation of the question, we use the concatenation of the last context vector of the forward RNN, i.e. corresponding to the last word, and the last context vector of the backward RNN, i.e. corresponding to the first word, as is usually done in the encoder-decoder architecture (Bahdanau et al., 2014):
$$\mathbf{enc}^q = [\overrightarrow{\mathbf{h}}^q_k, \overleftarrow{\mathbf{h}}^q_1] \tag{6.2}$$
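Eq. 6.1, the per-step concatenation of forward and reverse states, and Eq. 6.2 can be traced in the minimal pure-Python sketch below. It rests on several assumptions that are not from the text. A GRU cell is used with the interpolation convention of Cho et al. (2014b), and biases are omitted. Weights are random and untrained, whereas in the thesis the encoder is trained jointly with the MLP. The names `GRUCell`, `run` and `BiRNNEncoder` are hypothetical.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


class GRUCell:
    """One GRU cell with random, untrained weights (biases omitted for brevity)."""

    def __init__(self, input_dim: int, hidden_dim: int, rng: random.Random):
        def mat(rows: int, cols: int) -> Matrix:
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.hidden_dim = hidden_dim
        # Input (W) and recurrent (U) weights for update gate z, reset gate r, candidate h.
        self.W = {g: mat(hidden_dim, input_dim) for g in ("z", "r", "h")}
        self.U = {g: mat(hidden_dim, hidden_dim) for g in ("z", "r", "h")}

    def step(self, x: Vector, h: Vector) -> Vector:
        z = [sigmoid(a + b) for a, b in zip(matvec(self.W["z"], x), matvec(self.U["z"], h))]
        r = [sigmoid(a + b) for a, b in zip(matvec(self.W["r"], x), matvec(self.U["r"], h))]
        rh = [ri * hi for ri, hi in zip(r, h)]
        cand = [math.tanh(a + b) for a, b in zip(matvec(self.W["h"], x), matvec(self.U["h"], rh))]
        # Cho et al. convention: z keeps the old state, (1 - z) takes the candidate.
        return [zi * hi + (1 - zi) * ci for zi, hi, ci in zip(z, h, cand)]


def run(cell: GRUCell, xs: List[Vector]) -> List[Vector]:
    """Hidden state after each input in xs, starting from a zero state."""
    h = [0.0] * cell.hidden_dim
    states = []
    for x in xs:
        h = cell.step(x, h)
        states.append(h)
    return states


class BiRNNEncoder:
    """Bidirectional encoder returning context vectors h_i and the encoding enc."""

    def __init__(self, input_dim: int, hidden_dim: int, seed: int = 0):
        rng = random.Random(seed)
        self.fwd = GRUCell(input_dim, hidden_dim, rng)
        self.bwd = GRUCell(input_dim, hidden_dim, rng)

    def encode(self, words: List[Vector]) -> Tuple[List[Vector], Vector]:
        fwd = run(self.fwd, words)              # forward states for words 1..k
        bwd = run(self.bwd, words[::-1])[::-1]  # reverse states, re-aligned to words 1..k
        context = [f + b for f, b in zip(fwd, bwd)]  # h_i = [fwd_i, bwd_i]
        enc = fwd[-1] + bwd[0]                       # Eq. 6.2: [fwd_k, bwd_1]
        return context, enc


# Toy "question": 5 random 4-dim word embeddings, 3 hidden units per direction.
rng = random.Random(1)
question = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(5)]
encoder = BiRNNEncoder(input_dim=4, hidden_dim=3)
context, enc_q = encoder.encode(question)
print(len(context), len(context[0]), len(enc_q))  # → 5 6 6
```

Re-aligning the reverse states makes `bwd[0]` the reverse RNN's final state, the one that has read the whole question. So the encoding takes the forward half of the last context vector and the reverse half of the first.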
The second bidirectional RNN encodes the answer in the same way:
$$f^{a}_{\mathrm{RNN}}(\mathbf{w}^a_i, \boldsymbol{\theta}^a) = \mathbf{h}^a_i \tag{6.3}$$
$$\mathbf{enc}^a = [\overrightarrow{\mathbf{h}}^a_p, \overleftarrow{\mathbf{h}}^a_1] \tag{6.4}$$

where $\boldsymbol{\theta}^a$ denote the trainable parameters of the network. Figure 6.1 illustrates the RNN-MLP system.