

CHAPTER 2: EMPIRICAL REVIEW, MODEL AND RESEARCH METHODOLOGY

2.2 EMPIRICAL REVIEW, HYPOTHESES AND FORMULATION OF SUBMODEL 1

2.2.1 “Inbound” openness and INNOVATION outcomes

2.2.1.1 “Inbound” openness and product innovation

We generalise the skipgram model in two ways to learn tensor representations of arbitrary words. First, we change the notion of context from a linear context window to contexts that follow a dependency tree. This is reminiscent of the dependency-based word embeddings of Levy and Goldberg [LG14], which also use dependency relations to generalise skipgram. Unlike their embeddings, however, we do not incorporate the dependency labels themselves into our training model; instead, we let the architecture of the model depend on them. The second modification is precisely this change of architecture according to the dependency relations of a word: for a given word $w$, the words $d_1, \ldots, d_n$ that modify $w$ with the specified (fixed) dependency labels $l_1, \ldots, l_n$ determine the shape of the tensor that is trained for $w$, and they also provide the context words for training. In addition, we may impose a specific tensor shape on the model of a given word, which leads to three different types of models that we formally define below.

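Full Tensor Skipgram. For a given word $W$ with dependency arguments $d_1, d_2, \ldots, d_n$ (as described above), a full tensor model trains a rank $n + 1$ tensor representation with the following objective function:

$$\sum_{c \in C} \log \sigma(W d_1 \cdots d_n \cdot c) + \sum_{c \in \overline{C}} \log \sigma(-W d_1 \cdots d_n \cdot c) \tag{5.3}$$

where $W d_1 \cdots d_n$ denotes the contraction of the tensor $W$ with its argument vectors, $\sigma$ is the logistic sigmoid, $C$ is the set of observed contexts and $\overline{C}$ is a set of negatively sampled contexts.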

When $W$ is an adjective, the objective function of the full tensor skipgram model instantiates the adjective skipgram model of Maillard and Clark [MC15], since an adjective has only one dependency argument $d_1$:

$$\sum_{c \in C} \log \sigma(W d_1 \cdot c) + \sum_{c \in \overline{C}} \log \sigma(-W d_1 \cdot c) \tag{5.4}$$

When $W$ is a noun, the objective function of the full tensor skipgram model reduces to the regular skipgram model with negative sampling, since nouns have no dependency arguments:

$$\sum_{c \in C} \log \sigma(W \cdot c) + \sum_{c \in \overline{C}} \log \sigma(-W \cdot c) \tag{5.5}$$
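To make Equations 5.3 to 5.5 concrete, here is a minimal pure-Python sketch of the full tensor objective. It assumes tensors are stored as nested lists and that each argument $d_1, \ldots, d_n$ is contracted, in order, against the leading axis of the tensor; this axis convention and all function names are our own illustrative assumptions, not the original TensorFlow implementation. With no arguments the same function is plain skipgram (5.5); with a matrix and one argument it is the adjective case (5.4).

```python
import math

def log_sigmoid(x):
    """Numerically stable log(sigma(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))

def scale(t, a):
    """Multiply every entry of a nested-list tensor by the scalar a."""
    return [scale(x, a) for x in t] if isinstance(t, list) else a * t

def add(t, u):
    """Element-wise sum of two nested-list tensors of the same shape."""
    return [add(x, y) for x, y in zip(t, u)] if isinstance(t, list) else t + u

def contract(tensor, vec):
    """Contract the leading axis of `tensor` with `vec`: sum_i vec[i] * tensor[i].

    A rank-k tensor becomes rank k-1; contracting two vectors gives their dot product.
    """
    out = scale(tensor[0], vec[0])
    for sub, v in zip(tensor[1:], vec[1:]):
        out = add(out, scale(sub, v))
    return out

def full_tensor_objective(W, args, positives, negatives):
    """Eq. 5.3: sum over C of log s(W d1..dn . c) plus sum over C-bar of log s(-W d1..dn . c)."""
    x = W
    for d in args:           # contract with d1, ..., dn in turn
        x = contract(x, d)   # after all n contractions x is a vector
    return (sum(log_sigmoid(contract(x, c)) for c in positives)
            + sum(log_sigmoid(-contract(x, c)) for c in negatives))

# Adjective case (Eq. 5.4): a 2x2 matrix, one noun argument, one positive, one negative.
W_adj = [[1.0, 0.0], [0.0, 1.0]]
print(full_tensor_objective(W_adj, [[1.0, 2.0]], [[1.0, 0.0]], [[0.0, 1.0]]))
```

The same function covers the transitive-verb cube of Equation 5.8 by passing two argument vectors; the objective is maximised during training.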

Partial Tensor Skipgram. The full tensor skipgram model above assumes that an appropriate notion of context is available after contracting the target tensor with the vectors of all its dependency arguments. For adjectives this is indeed the case: the adjective-noun combination is again a noun, so the linear context window can be carried over from the definition of the original skipgram model. This does not hold for arbitrary words; for verbs, for example, the verb-subject-object combination is neither a noun nor a verb, but a sentence. To remedy this, we define a partial model in which one dependency is left out of the composition and used as context. For a word $W$ with dependency arguments $d_1, d_2, \ldots, d_n$, we now train a tensor one rank lower than the full tensor, which amounts to decomposing the tensor of $W$ into $n$ separate lower-rank tensors, one for each dependency. For an arbitrary dependency $d_i$, the objective function becomes:

$$\sum_{d_i \in C} \log \sigma(W d_1 \cdots d_{i-1} d_{i+1} \cdots d_n \cdot d_i) + \sum_{d_i \in \overline{C}} \log \sigma(-W d_1 \cdots d_{i-1} d_{i+1} \cdots d_n \cdot d_i) \tag{5.6}$$
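The decomposition into $n$ lower-rank tensors is easiest to see at the level of training data: each occurrence of a word with $n$ dependencies yields $n$ training instances, one per held-out dependency $d_i$. The sketch below assumes a parsed occurrence is given as a dict from dependency label to argument word; the function `partial_instances` and the labels `nsubj`/`dobj` are hypothetical, chosen only for illustration.

```python
def partial_instances(word, deps):
    """Yield one training instance per held-out dependency d_i (Eq. 5.6).

    deps maps dependency labels (l1..ln) to argument words (d1..dn).
    Each instance names the tensor being trained (keyed by the held-out label),
    the arguments it is contracted with, and the context it must predict.
    """
    labels = sorted(deps)  # fixed label order so each tensor has a well-defined shape
    for held_out in labels:
        fixed = [(label, deps[label]) for label in labels if label != held_out]
        yield {
            "tensor": (word, held_out),  # one lower-rank tensor per dependency
            "contract_with": fixed,      # d1..d_{i-1}, d_{i+1}..dn
            "context": deps[held_out],   # d_i plays the role of the context
        }

for inst in partial_instances("eat", {"nsubj": "cat", "dobj": "fish"}):
    print(inst)
```

For a transitive verb, the instance that holds out the object corresponds to the subject matrix of Equation 5.9, and the one that holds out the subject to the object matrix of Equation 5.10.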

Lower Rank Tensor Skipgram. If we decrease the rank of the tensor further, we parameterise over the contexts as well as over the dependency arguments, which leaves a choice of which dependencies to include in the context. Formally, we learn a tensor whose rank is $i$ lower than that of the full tensor, with the following objective function:

$$\sum_{d_1 \ldots d_i \in C} \log \sigma\big(W d_{i+1} \cdots d_n \cdot P^+\{d_1, \ldots, d_i\}\big) + \sum_{d_1 \ldots d_i \in \overline{C}} \log \sigma\big(-W d_{i+1} \cdots d_n \cdot P^+\{d_1, \ldots, d_i\}\big) \tag{5.7}$$

where $P^+\{d_1, \ldots, d_i\}$ means that we choose some non-empty subset of the available dependency arguments as context. These models are exemplified in the following paragraphs for the case of transitive verbs.
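As a sanity check on the ranks involved, the sketch below enumerates the lower-rank models for a word with $n$ dependencies. It assumes that $i$ counts the dependencies moved out of the composition, so the trained tensor has rank $n + 1 - i$ (the full tensor having rank $n + 1$), and that $P^+$ ranges over the non-empty subsets of those $i$ dependencies. The function name is hypothetical.

```python
from itertools import combinations

def lower_rank_models(labels, i):
    """Enumerate (tensor rank, context subset) pairs for Eq. 5.7.

    labels: dependency labels d1..dn of the word; the first i are moved
    out of the composition and become candidate context.
    """
    n = len(labels)
    rank = n + 1 - i              # full tensor has rank n + 1
    removed = labels[:i]
    models = []
    for size in range(1, i + 1):  # P+: non-empty subsets only
        for subset in combinations(removed, size):
            models.append((rank, subset))
    return models

# Transitive verb with both dependencies removed: three vector models.
print(lower_rank_models(["subj", "obj"], 2))
# → [(1, ('subj',)), (1, ('obj',)), (1, ('subj', 'obj'))]
```

Under this reading, $i = 1$ gives a single rank-$n$ model that predicts $d_1$, which is the partial model of Equation 5.6.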

Verb Skipgram. We instantiate our tensor skipgram model on transitive verbs, which have both a subject and an object as dependencies. We summarise all trained models by tensor rank and choice of context in Table 5.1, and give a detailed explanation below.

| Representation | Rank | Context |
|---|---|---|
| $v_a$ | vector | linear window |
| $v_s$ / $v_o$ / $v_b$ | vector | subjects / objects / both |
| $V^S_a$ / $V^O_a$ | matrix | full sentence |
| $V^S_o$ / $V^O_s$ | matrix | objects / subjects |
| $V_a$ | cube | full sentence |

Table 5.1: Our verb representations, ranging from vectors to cubes, with general or restricted context. $v_a$ denotes a standard skipgram vector, trained with subsampling and negative sampling. For the full sentence contexts, the arguments that are transformed by the representation are
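Since vectors in these experiments are 100-dimensional and matrices and cubes are shaped accordingly, the representations in Table 5.1 differ sharply in size. The arithmetic below assumes every tensor axis has length 100 and counts only the parameters of one verb's representation, ignoring any shared context matrix; the ASCII names are just labels for the table rows.

```python
DIM = 100  # vector dimensionality used in the experiments

# Tensor rank of each representation in Table 5.1.
RANKS = {"v_a": 1, "v_s": 1, "v_o": 1, "v_b": 1,
         "V^S_a": 2, "V^O_a": 2, "V^S_o": 2, "V^O_s": 2,
         "V_a": 3}

def shape_and_size(rank, dim=DIM):
    """Shape of a rank-`rank` tensor with all axes of length `dim`, and its parameter count."""
    shape = (dim,) * rank
    return shape, dim ** rank

for name, rank in RANKS.items():
    shape, size = shape_and_size(rank)
    print(f"{name:6} {str(shape):16} {size:>9,} parameters per verb")
```

A cube therefore carries a hundred times as many parameters per verb as a matrix, and ten thousand times as many as a vector.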

The full tensor model of Equation 5.3 for a transitive verb results in learning a rank 3 tensor, i.e. a cube. Given a verb $V$, we denote its first argument $d_1$, i.e. its subject, by $s$, and its second argument $d_2$, i.e. its object, by $o$. The full objective function of this model is as follows:

$$\sum_{c \in C} \log \sigma(V o s \cdot c) + \sum_{c \in \overline{C}} \log \sigma(-V o s \cdot c) \tag{5.8}$$

As all dependency arguments are transformed, the notion of context reduces to a context window, which we let range over the full sentence, since the arguments themselves may occur at any position in the sentence in which the verb occurs. This model yields the representation $V_a$ in Table 5.1.

Next, we instantiate the partial model of Equation 5.6 and train a subject matrix $V^S$ and an object matrix $V^O$. In the former case, the matrix tries to predict the object of the verb given a fixed subject vector, as in Equation 5.9, where $O$ is the set of contexts, which here are objects. In the latter case, we train a matrix that predicts the subject of the verb given a fixed object vector (Equation 5.10, where the contexts are the subjects in $S$).

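$$\sum_{o \in O} \log \sigma(V^S s \cdot o) + \sum_{o \in \overline{O}} \log \sigma(-V^S s \cdot o) \tag{5.9}$$

$$\sum_{s \in S} \log \sigma(s \cdot V^O o) + \sum_{s \in \overline{S}} \log \sigma(-s \cdot V^O o) \tag{5.10}$$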
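Equations 5.9 and 5.10 differ only in which argument is fixed and which one is predicted. A minimal sketch of both per-example objectives, assuming vectors are Python lists and matrices are stored row-major; the term $s \cdot V^O o$ is computed as the dot product of $s$ with the matrix-vector product $V^O o$. All function names are hypothetical.

```python
import math

def log_sigmoid(x):
    """Numerically stable log(sigma(x))."""
    return -math.log1p(math.exp(-x)) if x >= 0 else x - math.log1p(math.exp(x))

def matvec(M, v):
    """Row-major matrix times vector: (M v)_k = sum_j M[k][j] * v[j]."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def subject_matrix_objective(VS, s, objects, neg_objects):
    """Eq. 5.9: V^S s should score observed objects high and sampled objects low."""
    h = matvec(VS, s)
    return (sum(log_sigmoid(dot(h, o)) for o in objects)
            + sum(log_sigmoid(-dot(h, o)) for o in neg_objects))

def object_matrix_objective(VO, o, subjects, neg_subjects):
    """Eq. 5.10: s . (V^O o) scores each candidate subject against the fixed object."""
    h = matvec(VO, o)
    return (sum(log_sigmoid(dot(s, h)) for s in subjects)
            + sum(log_sigmoid(-dot(s, h)) for s in neg_subjects))

VS = [[0.0, 1.0], [1.0, 0.0]]
print(subject_matrix_objective(VS, [1.0, 0.0], [[0.0, 2.0]], [[3.0, 0.0]]))
```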

We refer to these models as $V^S_o$ and $V^O_s$ in Table 5.1, respectively. To check whether this decomposition, combined with a dependency-based context, produces a sensible model, we compare them with $V^S_a$ and $V^O_a$, which still transform the subject (resp. object) vector but predict a full sentential context rather than a single dependency.

Finally, we instantiate Equation 5.7 to produce vector representations for the verb. Since there are two dependencies, the different choices of context lead to three models, $v_s$, $v_o$ and $v_b$, which take only subjects, only objects, or both arguments as context, respectively. We compare these to $v_a$, the original skipgram model with a linear context window.
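The three vector models differ only in which arguments are paired with the verb as context. A minimal sketch of how (verb, context) training pairs could be extracted, assuming the parsed corpus has already been reduced to (subject, verb, object) triples; the triple format, `CONTEXTS` and `vector_training_pairs` are hypothetical.

```python
from collections import Counter

# Which arguments serve as context for each vector model.
CONTEXTS = {"v_s": ("subj",), "v_o": ("obj",), "v_b": ("subj", "obj")}

def vector_training_pairs(triples, model):
    """Turn (subject, verb, object) triples into (verb, context word) pairs."""
    roles = CONTEXTS[model]
    pairs = []
    for subj, verb, obj in triples:
        args = {"subj": subj, "obj": obj}
        pairs.extend((verb, args[r]) for r in roles)
    return pairs

triples = [("cat", "chase", "mouse"), ("dog", "chase", "cat")]
print(Counter(vector_training_pairs(triples, "v_b")))
```

Each pair is then treated like a (target, context) pair in ordinary skipgram, so $v_b$ sees twice as many positive examples per verb occurrence as $v_s$ or $v_o$.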

Implementation Details. We implemented all models in Python using the TensorFlow package [Aba+16].² Vectors were 100-dimensional; matrices and cubes were shaped accordingly. The dependency information was extracted from a dependency-parsed corpus³ containing approximately 130M sentences, on which the initial regular noun vectors were also trained. For the matrices and cubes with full-sentence context ($V^S_a$, $V^O_a$ and $V_a$), a pair of networks was trained separately for each verb, sharing the context matrix from the noun skipgram model. For the matrices with a single-dependency context ($V^S_o$ and $V^O_s$), we trained a pair of networks with an embedding layer encoding all verbs. In these networks, the context matrix consists of all possible object context vectors (for $V^S_o$) or all possible subject context vectors (for $V^O_s$). We considered both a fixed context matrix (taken from the noun skipgram model) and a trainable context matrix, and found that the trainable context matrix gave the best results,⁴ so we work with the latter. For the partial tensor models, negative samples were drawn from the distribution over possible objects (resp. subjects) of all verbs, with k = 10 negative samples per subject (resp. object).
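The negative sampling for the partial tensor models can be sketched as follows. This assumes negatives are drawn in proportion to the raw corpus frequency of each argument (the text does not say whether a smoothed distribution, such as the 3/4 power used in the original word2vec, was applied) and that the observed positive argument is not excluded from the draws. `make_negative_sampler` is a hypothetical helper, not a TensorFlow API.

```python
import random
from collections import Counter

def make_negative_sampler(observed_args, seed=0):
    """Sample negatives from the empirical distribution of arguments over all verbs.

    observed_args: every object (or subject) seen with any verb in the corpus.
    Assumption: raw frequencies, no smoothing exponent.
    """
    counts = Counter(observed_args)
    words = list(counts)
    weights = [counts[w] for w in words]
    rng = random.Random(seed)

    def sample(k=10):
        # k draws with replacement, proportional to corpus frequency
        return rng.choices(words, weights=weights, k=k)

    return sample

objects = ["mouse", "cat", "ball", "mouse", "bone", "mouse"]
sample_objects = make_negative_sampler(objects)
print(sample_objects())  # 10 negative objects for one positive (verb, subject) example
```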

² Code and models will be made available online.

³ We used the UKWaCkypedia corpus, available from wacky.sslmit.unibo.it.

⁴ We argue that this is because contexts in the noun skipgram model are more general as they serve