VALUES AND EDUCATIONAL POLICY.
2.2. Educational legislation and the school model 1 Values in education laws.
We model each word from each meeting using a Gaussian Mixture Model (GMM) with $K$ components, where each component $c_i$ represents a context word for the target word being modeled. We define a context word as a word that co-occurs with the target word in one or more sentences of the meeting. Thus, we obtain the following density function $f_w$:
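\[
f_w(\vec{x}) = \sum_{i=1}^{K} p_{w,i}\, \mathcal{N}(\vec{\mu}_{w,i}, \sigma_{w,i}) = \sum_{i=1}^{K} \frac{p_{w,i}}{\sqrt{2\pi\,|\sigma_{w,i}|}}\; e^{-\frac{1}{2}(\vec{x}-\vec{\mu}_{w,i})^{T}\, \Sigma_{w,i}^{-1}\, (\vec{x}-\vec{\mu}_{w,i})} \tag{7.1}
\]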
117 7.2 Methods
where $p_{w,i}$ is the probability of a component modeling a certain context, with $\sum_{i=1}^{K} p_{w,i} = 1$; $\vec{\mu}_{w,i}$ is the mean, that is, the position of the $i$th component; and $\sigma_{w,i}$, the covariance of the component (written $\Sigma_{w,i}$ in the exponent), models the uncertainty of the context. We learn the parameters of $GMM_{i,n}$ for word $i$ from meeting $n$ in order to compute the probability of each context of each word in each meeting. This yields $p_{w,i,n}$, the probability of word $w$ in its $i$th context in the $n$th meeting.
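As an illustration of Equation 7.1, the following minimal sketch evaluates a mixture density at a single point. It assumes a one-dimensional case in which each component has a scalar variance; the function name `gmm_density` and its parameters are hypothetical and are not taken from the original implementation.

```python
import math


def gmm_density(x, weights, means, variances):
    """Evaluate a one-dimensional Gaussian mixture density at x.

    weights   -- mixture weights p_{w,i}; they must sum to 1
    means     -- component means mu_{w,i}
    variances -- component variances (scalar stand-in for the covariance)
    All three names are illustrative assumptions.
    """
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("mixture weights must sum to 1")
    total = 0.0
    for p, mu, var in zip(weights, means, variances):
        # Normalizing constant of a univariate Gaussian.
        norm = 1.0 / math.sqrt(2.0 * math.pi * var)
        total += p * norm * math.exp(-0.5 * (x - mu) ** 2 / var)
    return total
```

With a single standard-normal component, `gmm_density(0.0, [1.0], [0.0], [1.0])` reduces to the familiar $1/\sqrt{2\pi}$.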
Finally, we formally define the CorrAv effect as follows: given the topics of the last $n$ consecutive meetings, we would like to predict which topics continue in the $(n+1)$th meeting. Let $v$ be the vocabulary of all the words occurring in the first $n$ meetings. We construct a word vector containing the average probability of presence of every word in $v$ in each of its contexts. We then use the following equation to compute the aggregate probability of word $w$ in a certain context $c$ under the CorrAv effect:
\[
P_{w,c} = \sum_{n=1}^{N} \sum_{w_i \in v} \mathrm{BM25}_{w,n} \cdot P_{(w_i,c,n)}(n) \tag{7.2}
\]
where $n$ is the meeting sequence number and $P_{(w_i,c,n)}$ is the probability of word $w$ under component $c$ (i.e., context $c$), derived from the $n$th meeting. The resulting word vector is an average representation of the probability of all words present in all $n$ meetings. Finally, $\mathrm{BM25}_{w,n}$ is the weight of word $w$ in meeting $n$, computed with the probabilistic ranking function BM25 [112] in each of the previous meetings.
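One possible reading of Equation 7.2 is sketched below: for every (word, context) pair, the per-meeting context probability is weighted by the BM25 weight of the word in that meeting and summed over the meetings. The data layout (one dictionary per meeting) and the name `build_reference_vector` are assumptions made for illustration. The sketch does not divide by the number of meetings; such a constant factor would not change a Pearson correlation computed on the result.

```python
from collections import defaultdict


def build_reference_vector(context_probs, bm25_weights):
    """Aggregate BM25-weighted context probabilities over meetings.

    context_probs -- list with one dict per meeting, mapping
                     (word, context) -> probability (hypothetical layout)
    bm25_weights  -- list with one dict per meeting, mapping
                     word -> BM25 weight (hypothetical layout)
    """
    reference = defaultdict(float)
    for n, probs in enumerate(context_probs):
        for (word, context), p in probs.items():
            # Words without a BM25 weight in meeting n contribute nothing.
            weight = bm25_weights[n].get(word, 0.0)
            reference[(word, context)] += weight * p
    return dict(reference)
```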
Using Equation 7.2, we learn a reference vector for each word in each context across all previous meetings. As explained in Section 7.3.1, our dataset contains LDA topics extracted from the first $n$ meetings, whose continuation in the $(n+1)$th meeting should be predicted. For each such topic, we construct a new word vector derived from the reference vector. This is necessary because the reference vector holds every word in all of its contexts, whereas words have different meanings or contexts, and an LDA topic model groups words that often reflect the same context. Therefore, we have to select the relevant contexts for each topic whose continuation is being predicted. We search through each given LDA topic and, for each target word, identify the context words that have the highest probability in the LDA topic. Subsequently, we add the probability of each word in its identified context from the reference vector to the new vector that we construct. We name this new vector $\mathrm{Vector}_t$, where $t$ refers to the LDA topic whose continuation should be predicted, meaning that for each topic $t$ we construct a unique vector
of contextual words. We emphasize again that $\mathrm{Vector}_t$ contains all the words in $v$, with the difference that if a word $w$ is present in topic $t$ we use the context probability of the word that best suits the context of $t$, and otherwise we assign it its highest context probability.
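The selection rule described above can be sketched as follows, under several assumptions: the reference vector is a dictionary keyed by (word, context), a topic is a dictionary from word to its probability in the topic, and the "best suited" context is taken to be the context word with the highest probability in the topic. The name `build_topic_vector` is hypothetical.

```python
def build_topic_vector(reference, topic_word_probs, vocabulary):
    """Build a per-topic vector with one entry per vocabulary word.

    reference        -- dict mapping (word, context) -> aggregate probability
    topic_word_probs -- dict mapping word -> probability in the LDA topic
    vocabulary       -- ordered list of all words in v
    """
    # Regroup the reference vector as word -> {context: value}.
    by_word = {}
    for (word, context), value in reference.items():
        by_word.setdefault(word, {})[context] = value

    vector = []
    for word in vocabulary:
        contexts = by_word.get(word, {})
        if not contexts:
            vector.append(0.0)  # assumption: unseen words get zero
            continue
        in_topic = [c for c in contexts if c in topic_word_probs]
        if word in topic_word_probs and in_topic:
            # Word is in the topic: take the context most probable in the topic.
            best = max(in_topic, key=lambda c: topic_word_probs[c])
            vector.append(contexts[best])
        else:
            # Otherwise fall back to the word's highest context probability.
            vector.append(max(contexts.values()))
    return vector
```

Keeping one entry per vocabulary word, in a fixed order, means the result can be compared position by position with a topic's word distribution over the same vocabulary.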
The last step is to compare $\mathrm{Vector}_t$ with a topic $t$ to make the prediction. Previous work [94] has used the element-wise dot product of word vector distributions as an energy function that shows how similar two vectors are and that can be used for predictive tasks. However, we take a different approach and compute the correlation between the two vectors as a measure of similarity. This is inspired by the underlying belief in topic models and word embeddings that words have certain meanings in certain contexts. Thus, looking at the covariance of the words can better capture the similarity or difference between contexts.
As a result, we compute the Pearson correlation of $\mathrm{Vector}_t$ with the topic $t$ to make the prediction. The result of this step is a list of all topics $t$ from the first $n$ meetings, ranked by this correlation value. According to the CorrAv effect, the topics that correlate highly with the average probability of all words are those most likely to continue in the $(n+1)$th meeting, hence the name. We use the Pearson correlation metric, defined as

\[
P_{X,Y} = \frac{\mathrm{COV}(X,Y)}{\sigma_X \sigma_Y},
\]

where $P_{X,Y}$ is the Pearson correlation of two populations $X$ and $Y$, COV is the covariance, and $\sigma$ is the standard deviation. Pearson correlation captures the linear dependence between two populations $X$ and $Y$ and returns a score ranging from -1 to 1. In particular, it returns 1 if the two populations are perfectly positively correlated (for example, identical), 0 if they have no linear correlation, and -1 if they are perfectly negatively correlated. In our use case, the Pearson correlation can capture the dependence between a topic $t$ and a learned $\mathrm{Vector}_t$.
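The correlation and ranking step can be sketched in a few lines. The function `pearson` follows the definition above; `rank_topics` and its dictionary inputs are hypothetical names, and each topic is assumed to be represented as a vector over the same ordered vocabulary as its $\mathrm{Vector}_t$.

```python
import math


def pearson(x, y):
    """Pearson correlation: covariance divided by the product of the
    standard deviations (the 1/n factors cancel out)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("inputs must have equal length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    dev_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    dev_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if dev_x == 0.0 or dev_y == 0.0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / (dev_x * dev_y)


def rank_topics(topic_vectors, topic_distributions):
    """Return (topic, score) pairs sorted by descending correlation.

    topic_vectors       -- dict: topic id -> Vector_t as a list of floats
    topic_distributions -- dict: topic id -> topic word probabilities
                           over the same ordered vocabulary
    """
    scores = {
        t: pearson(topic_vectors[t], topic_distributions[t])
        for t in topic_vectors
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

Topics at the head of the returned list are the candidates for continuation; where to cut the list is decided by the threshold discussed next.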
Finally, we require a threshold for accepting a topic as a continued topic that needs to be reviewed. To determine an effective threshold we use n-fold cross-validation. That is, we iteratively leave out each set of four meetings (for a single group) and compute the threshold that minimizes the Mean Squared Error (MSE) of prediction on the remaining folds. Then, using the computed threshold, we evaluate the left-out fold.
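A minimal sketch of this threshold selection is given below. It assumes that each fold is a list of (correlation score, label) pairs, where the label is 1 if the topic actually continued and 0 otherwise, that a topic is predicted to continue when its score reaches the threshold, and that thresholds are searched over a fixed candidate grid. All function names and the grid search are illustrative assumptions.

```python
def mse_at_threshold(samples, threshold):
    """MSE of the binary prediction 'score >= threshold' against labels."""
    errors = [
        ((1.0 if score >= threshold else 0.0) - label) ** 2
        for score, label in samples
    ]
    return sum(errors) / len(errors)


def best_threshold(samples, candidates):
    """Candidate threshold with the lowest MSE (first one wins ties)."""
    return min(candidates, key=lambda th: mse_at_threshold(samples, th))


def leave_one_fold_out(folds, candidates):
    """For each fold: fit the threshold on the other folds, then report
    (threshold, MSE on the held-out fold)."""
    results = []
    for i, held_out in enumerate(folds):
        training = [
            sample
            for j, fold in enumerate(folds)
            if j != i
            for sample in fold
        ]
        threshold = best_threshold(training, candidates)
        results.append((threshold, mse_at_threshold(held_out, threshold)))
    return results
```

In this sketch a fold would hold the topics of one left-out set of four meetings, so the held-out error is always measured on meetings that played no part in choosing the threshold.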