The skip-gram model is trained to predict the surrounding words given the current word. To understand how the skip-gram word2vec model works, consider the following example sentence:
I love green eggs and ham.
Assuming a window size of three (the current word plus one neighbor on each side), this sentence can be broken down into the following sets of (context, word) pairs:
([I, green], love) ([love, eggs], green) ([green, and], eggs) ...
Since the skip-gram model predicts a context word given the center word, we can convert the preceding dataset to one of (input, output) pairs. That is, given an input word, we expect the skip-gram model to predict the output word:
(love, I), (love, green), (green, love), (green, eggs), (eggs, green), (eggs, and), ...
We can also generate additional negative samples by pairing each input word with some random word in the vocabulary. For example:
(love, Sam), (love, zebra), (green, thing), ...
Finally, we generate positive and negative examples for our classifier:
((love, I), 1), ((love, green), 1), ..., ((love, Sam), 0), ((love, zebra), 0), ...
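The pair-generation steps above can be sketched in plain Python. This is a minimal illustration rather than the code Keras uses: the function names `skipgram_pairs` and `negative_pairs` are hypothetical, `window` is assumed to count the words taken on each side of the center word, and the extra vocabulary words (Sam, zebra, thing) are borrowed from the negative examples above.

```python
import random

def skipgram_pairs(tokens, window=1):
    """Return (input, output) pairs: each word paired with its neighbors."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a word is not its own context
                pairs.append((center, tokens[j]))
    return pairs

def negative_pairs(pairs, vocab, rng):
    """Pair each input word with one randomly chosen vocabulary word."""
    return [(center, rng.choice(vocab)) for center, _ in pairs]

tokens = "I love green eggs and ham".split()
# Illustrative vocabulary: the sentence words plus a few unrelated ones.
vocab = sorted(set(tokens) | {"Sam", "zebra", "thing"})
rng = random.Random(0)  # fixed seed so repeated runs give the same negatives

positives = skipgram_pairs(tokens, window=1)
negatives = negative_pairs(positives, vocab, rng)

# Label positives 1 and negatives 0 to form the classifier's training set.
dataset = [(p, 1) for p in positives] + [(n, 0) for n in negatives]

print(positives)
print(len(dataset))
```

Unlike the listing above, this sketch also emits pairs for the words at the edges of the sentence, such as (I, love). Because the negatives are drawn blindly, a random draw can occasionally reproduce a genuine pair; with a realistic vocabulary this becomes rare.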
We can now train a classifier that takes in a word vector and a context vector and learns to predict one or zero depending on whether it sees a positive or negative sample. The deliverables from this trained network are the weights of the word embedding layer (the gray box in the following figure):
The skip-gram model can be built in Keras as follows. Assume that the vocabulary size is set at 5000,
the output embedding size is 300, and the window size is 1. A window size of one means that the
context for a word is the words immediately to the left and right. We first take care of the imports and set our variables to their initial values:
# Note: Merge is a legacy layer that later Keras releases removed
from keras.layers import Merge
from keras.layers.core import Dense, Reshape
from keras.layers.embeddings import Embedding
from keras.models import Sequential

vocab_size = 5000
embed_size = 300
We then create a sequential model for the word. The input to this model is the word ID in the vocabulary. The embedding weights are initially set to small random values, and during training the model updates them using backpropagation. The next layer reshapes the embedding output from shape (1, embed_size) to a flat vector of size embed_size:
word_model = Sequential()
word_model.add(Embedding(vocab_size, embed_size,
                         embeddings_initializer="glorot_uniform",
                         input_length=1))
word_model.add(Reshape((embed_size,)))
The other model that we need is a sequential model for the context words. For each of our skip-gram pairs, we have a single context word corresponding to the target word, so this model is identical to the word model:
context_model = Sequential()
context_model.add(Embedding(vocab_size, embed_size,
                            embeddings_initializer="glorot_uniform",
                            input_length=1))
context_model.add(Reshape((embed_size,)))
The outputs of the two models are each a vector of size (embed_size). These outputs are merged into one using a dot product and fed into a dense layer, which has a single output wrapped in a sigmoid activation layer. You have seen the sigmoid activation function in Chapter 1: it squashes its input into the range between 0 and 1, so increasingly positive inputs tend rapidly to 1 and flatten out, and increasingly negative inputs tend rapidly to 0 and also flatten out:
model = Sequential()
model.add(Merge([word_model, context_model], mode="dot"))
model.add(Dense(1, init="glorot_uniform", activation="sigmoid"))
model.compile(loss="mean_squared_error", optimizer="adam")
The loss function used is mean_squared_error; the idea is to maximize the dot product for positive examples and minimize it for negative examples, so that the sigmoid output approaches the label 1 or 0 respectively. If you recall, the dot product multiplies corresponding elements of two vectors and sums up the result, so similar vectors have higher dot products than dissimilar vectors, since the former have more overlapping elements.
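To make the scoring step concrete, here is a small pure-Python sketch of a dot product followed by a sigmoid and a squared-error loss. The three-dimensional vectors are made-up values chosen for illustration, and the sketch omits the trainable weight and bias that the dense layer in the model applies to the dot product.

```python
import math

def dot(u, v):
    """Multiply corresponding elements and sum the results."""
    return sum(a * b for a, b in zip(u, v))

def sigmoid(x):
    """Squash any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical embeddings, not taken from a trained model.
word_vec = [0.9, 0.1, 0.8]
similar_ctx = [1.0, 0.0, 0.7]       # points in roughly the same direction
dissimilar_ctx = [-0.8, 0.2, -0.9]  # points in roughly the opposite direction

examples = [("similar", similar_ctx, 1), ("dissimilar", dissimilar_ctx, 0)]
for name, ctx, label in examples:
    score = dot(word_vec, ctx)
    pred = sigmoid(score)
    loss = (label - pred) ** 2  # squared error for a single example
    print(f"{name}: dot={score:.2f} pred={pred:.3f} loss={loss:.3f}")
```

The similar pair gets a positive dot product and a prediction above 0.5, while the dissimilar pair gets a negative dot product and a prediction below 0.5, so both already sit on the right side of their labels.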
Keras provides a convenience function to extract skip-grams for a text that has been converted to a list of word indices. Here is an example of using this function to extract the first 10 of 56 skip-grams generated (both positive and negative).
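The figure of 56 can be cross-checked with a short count. The sketch below assumes the function's default settings (a window of 4 words on each side and one negative sample per positive pair) and a six-word input such as the example sentence used earlier; `count_positive_pairs` is a hypothetical helper written for this check, not part of Keras.

```python
def count_positive_pairs(n_tokens, window_size):
    """Count (word, context) pairs when each word is paired with every
    other word at most window_size positions away."""
    total = 0
    for i in range(n_tokens):
        start = max(0, i - window_size)
        end = min(n_tokens, i + window_size + 1)
        total += (end - start) - 1  # exclude the center word itself
    return total

n_tokens = 6             # "I love green eggs and ham"
window_size = 4          # assumed default window
negative_ratio = 1.0     # assumed default: one negative per positive

positives = count_positive_pairs(n_tokens, window_size)
negatives = int(positives * negative_ratio)
print(positives, negatives, positives + negatives)  # 28 28 56
```

With a window this wide relative to the sentence, almost every word is in the context of every other word, which is why a six-word text already yields 28 positive pairs.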
We first declare the necessary imports and the text to be analyzed:
from keras.preprocessing.text import *
from keras.preprocessing.sequence import skipgrams

text = "I love green eggs and ham ."
The next step is to declare the tokenizer and fit it on the text. This splits the text into word tokens and builds the tokenizer's vocabulary from them:
tokenizer = Tokenizer()
tokenizer.fit_on_texts([text])
The tokenizer creates a dictionary mapping each unique word to an integer ID and makes it available in the word_index attribute. We extract this and create a two-way lookup table:
word2id = tokenizer.word_index
id2word = {v: k for k, v in word2id.items()}
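For readers who want to see what such a lookup table contains without installing Keras, the following pure-Python sketch builds a comparable word index. It is an approximation under stated assumptions, not the tokenizer's actual implementation: `build_word_index` is a hypothetical helper, IDs start at 1 and are assigned by descending frequency, and ties are broken alphabetically, so the specific IDs may differ from those Keras assigns.

```python
import string
from collections import Counter

def build_word_index(texts):
    """Map each unique lowercased word to an integer ID starting at 1."""
    strip_punct = str.maketrans("", "", string.punctuation)
    counts = Counter()
    for text in texts:
        counts.update(text.lower().translate(strip_punct).split())
    # Most frequent words get the smallest IDs; ties sort alphabetically.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return {word: i + 1 for i, (word, _) in enumerate(ranked)}

text = "I love green eggs and ham ."
word2id = build_word_index([text])
id2word = {v: k for k, v in word2id.items()}

print(word2id)
print(id2word)
```

Reserving ID 0 is a common convention because 0 is often used for padding; the inverse dictionary makes it easy to turn model output back into readable words.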
Finally, we convert our input list of words to a list of IDs and pass it to the skipgrams function. We then
print the first 10 of the 56 (pair, label) skip-gram tuples generated:
wids = [word2id[w] for w in text_to_word_sequence(text)]
pairs, labels = skipgrams(wids, len(word2id))
print(len(pairs), len(labels))
for i in range(10):
    print("({:s} ({:d}), {:s} ({:d})) -> {:d}".format(
        id2word[pairs[i][0]], pairs[i][0],
        id2word[pairs[i][1]], pairs[i][1],
        labels[i]))
The results from the code are shown below. Note that your results may differ, since the skipgrams function shuffles its results and draws the negative examples at random.
Additionally, negative sampling generates the negative examples by pairing tokens from the text with randomly chosen words. As the size of the input text increases, these random pairs are more likely to be genuinely unrelated. In our example, since our text is very short, there is a chance that a randomly generated pair is in fact a valid (word, context) pair even though it is labeled 0.
(and (1), ham (3)) -> 0
(green (6), i (4)) -> 0
(love (2), i (4)) -> 1
(and (1), love (2)) -> 0
(love (2), eggs (5)) -> 0
(ham (3), ham (3)) -> 0
(green (6), and (1)) -> 1
(eggs (5), love (2)) -> 1
(i (4), ham (3)) -> 0
(and (1), green (6)) -> 1
The code for this example can be found in skipgram_example.py in the source code download for the