5.5.1 Introduction

The second ‘core’ conceptual distinction that we examine is the one between abstract and concrete concepts. Neuropsychological data show that the two are acquired and processed differently (e.g., Crutch & Warrington, 2005). Concretely, Warrington (1975) reports data from brain-damaged patients who exhibit a selective impairment for concrete concepts, whereas their semantic knowledge of abstract concepts remains relatively intact (see also Warrington & Shallice, 1984). Other studies have focused on the importance of concreteness as a psycholinguistic variable (Paivio, 1971) or on its significance for vocabulary development (Brown, 1957).

Hill, Korhonen & Bentz (2014) sought to test several claims regarding the organisation and representation of abstract and concrete concepts. More specifically, they tested claims by Paivio (1971) and Hopkins & Schwanenflugel (1993) that (a) abstract concepts have more but weaker connections to other concepts than concrete ones, (b) concrete concepts are organised in the mind according to similarity, whereas abstract concepts are organised according to association, and (c) concrete representations have a high degree of feature-based structure, whereas abstract representations do not. Using data from WordNet and the University of South Florida Association Norms (see §1.3.1), they found that abstract and concrete concepts differ along these three dimensions, supporting claims of a differential organisation. Echoing the discussion of animacy (§5.4), Brysbaert et al. (2013) show that the labels abstract and concrete are better thought of as the two ends of a concreteness continuum than as categorical distinctions.

The above results suggest that there are representational differences between abstract and concrete concepts, but they do not necessarily imply that these differences manifest themselves in the distributional patterns of the words. This problem was also noted above for animacy; there, however, we were able to show that, both in English and in other languages, animacy can determine word order or the choice of specific syntactic paraphrases. Given that Hill et al. (2014) find support for hypothesis (a) above, that abstract concepts have more but weaker connections to other concepts than concrete concepts, we expect concreteness to be reflected distributionally. This is because, in the neural network setting introduced in §2.3.1, abstract words will tend to activate more nodes in the output layer than concrete ones, albeit with weaker activations. For now, we will proceed under the assumption that abstract and concrete concepts are not only organised differently but also differ in their distributional patterns. We examine this hypothesis and its implications in §5.8.2.
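To make the reasoning about output activations concrete, the following minimal sketch assumes a softmax output layer (the exact architecture of §2.3.1 may differ) and uses invented output scores for a hypothetical concrete word and a hypothetical abstract word. A profile with many weak associates spreads its activation over more output nodes. This shows up as more nodes above an arbitrary threshold, a lower peak activation, and a higher entropy.

```python
import math
from typing import List

def softmax(logits: List[float]) -> List[float]:
    """Turn raw output-layer scores into activations that sum to 1."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs: List[float]) -> float:
    """Shannon entropy in bits: higher when activation is spread over more nodes."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def n_active(probs: List[float], threshold: float = 0.1) -> int:
    """Count output nodes whose activation exceeds an arbitrary threshold."""
    return sum(1 for p in probs if p > threshold)

# Hypothetical output-layer scores over six context words (invented values).
concrete_logits = [4.0, 1.0, 0.5, 0.0, 0.0, 0.0]  # few, strong associates
abstract_logits = [1.5, 1.4, 1.3, 1.2, 0.5, 0.0]  # many, weaker associates

if __name__ == "__main__":
    for label, logits in (("concrete", concrete_logits), ("abstract", abstract_logits)):
        p = softmax(logits)
        print(label, n_active(p), round(max(p), 2), round(entropy(p), 2))
```

With these toy scores, one output node exceeds the threshold for the concrete profile and four do for the abstract profile. The threshold is arbitrary; entropy captures the same contrast without one.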

The tasks

As noted in §1.5, the experiments by Paciorek & Williams (2015) focused on the acquisition of the semantic preferences of novel verbs. The introduction of this novel methodology (verbs instead of determiners) prompts us to examine whether the two tasks (those reported by Williams, 2005, and Paciorek & Williams, 2015) are equivalent and hence comparable. There are three possible, interrelated differences: (a) linguistic, (b) statistical, and (c) computational. From a linguistic point of view, the difference between the two experiments is transparent: instead of an NP, a VP is used as the critical phrase. Although this might seem like a minor manipulation, it has an interesting implication. In our survey of learning the arguments of determiner phrases (i.e., grammatical gender) in §5.2, we showed that learners need prolonged exposure to the system to achieve native-like processing. Learning the arguments of verbs, by contrast, should not be as hard, as these semantically driven collocations should persist across languages. For example, the fact that the verb eat requires a [±edible] feature to be checked in its arguments should be independent of the language spoken.

Relatedly, these differences are interesting in the context of the backwards probabilities discussed in §5.2. In the case of article→noun combinations, speakers of languages with articles should be more inclined to consider backwards probabilities as a potential source of information. In the case of verb→argument bigrams, on the other hand, any speaker would benefit from considering backwards probabilities. This dissociation makes an interesting cross-linguistic prediction: while participants who speak a language with articles should have an advantage when learning article→noun regularities, this advantage should fade for verb→argument regularities, which speakers of languages without articles should also be able to learn. Indeed, Paciorek & Williams (2015, Experiment 3) find that the implicit learning effect reported in Experiment 1 persists in Polish, a language without articles. §6.3, which examines the transitional probabilities of article→noun combinations across languages, presents a similar view.

Computationally, however, the changes introduced in Paciorek & Williams (2015) do not influence the structure of the learning model. The task of associating novel non-words with known nouns remains the same for the model in both cases. Moreover, although Paciorek & Williams (2015) use a false memory task during the testing phase (that is, participants are asked to generalise to completely novel phrases), computationally the task remains the same, as in both cases participants must choose between two alternatives. If we assume that participants abstract from the input some information that is common to the training and testing stimuli, rather than remembering the stimuli they have seen, the model should have no problem generalising to novel nouns.
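The point that both testing procedures reduce to a choice between two alternatives can be sketched as follows. This assumes, hypothetically, that each test trial pairs one noun with a grammatical and an ungrammatical novel verb, and that the model endorses whichever pairing receives the higher activation. The scores are invented for illustration and are not model outputs from this chapter.

```python
from typing import Dict, List, Tuple

# (noun, grammatical novel verb, ungrammatical novel verb) for one test trial
Trial = Tuple[str, str, str]

def two_afc_accuracy(trials: List[Trial], score: Dict[Tuple[str, str], float]) -> float:
    """Proportion of trials on which the grammatical <verb, noun> pair scores higher."""
    correct = 0
    for noun, good_verb, bad_verb in trials:
        # The model endorses whichever alternative receives the larger activation.
        if score[(good_verb, noun)] > score[(bad_verb, noun)]:
            correct += 1
    return correct / len(trials)

# Invented activations for novel (unseen) nouns; gouble/powter go with abstract
# nouns and conell/mouten with concrete nouns, as in Table 5.2.
scores = {
    ("gouble", "impact"): 0.7, ("conell", "impact"): 0.2,
    ("conell", "potassium"): 0.6, ("gouble", "potassium"): 0.4,
    ("mouten", "magnesium"): 0.3, ("powter", "magnesium"): 0.5,
}
trials = [
    ("impact", "gouble", "conell"),
    ("potassium", "conell", "gouble"),
    ("magnesium", "mouten", "powter"),
]

if __name__ == "__main__":
    print(two_afc_accuracy(trials, scores))
```

Because only the comparison between the two alternatives matters, the same scoring applies whether the test nouns were seen in training or are completely novel.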

Let us now look at a critical manipulation in Experiments 1 and 4 of Paciorek & Williams (2015). Specifically, the authors explored whether semantic implicit learning effects persist even when the similarity between training and testing items is curtailed. Table 5.2 shows the differences between the two experiments. While the training lists were very similar in the two experiments, the similarity of each testing item to those lists varied between experiments (see also Table 5.1). Paciorek & Williams (2015) find that the size of the learning effect drops from η² = 0.29 in the high-similarity condition to η² = 0.09 in the low-similarity condition.
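One way to quantify the train-test similarity manipulation is to take each test noun's similarity to the closest training noun and average these values. This assumes that each noun is represented by a distributional vector and that similarity is measured by cosine. The sketch below uses toy three-dimensional vectors; it is not the measure behind Table 5.1 or the one used by Paciorek & Williams (2015).

```python
import math
from typing import List, Sequence

def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def train_test_similarity(train: List[Sequence[float]], test: List[Sequence[float]]) -> float:
    """Average, over test items, of the cosine similarity to the closest training item."""
    return sum(max(cosine(t, tr) for tr in train) for t in test) / len(test)

# Toy vectors (invented): high-similarity test items lie near the training items,
# low-similarity ones point in a different direction.
train_items = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]]
high_test = [[0.95, 0.05, 0.0]]
low_test = [[0.0, 1.0, 0.0]]

if __name__ == "__main__":
    print(round(train_test_similarity(train_items, high_test), 3))
    print(round(train_test_similarity(train_items, low_test), 3))
```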

Table 5.2 Examples of high- and low-similarity stimuli from Paciorek & Williams (2015). In each experiment, participants saw four nouns with each verb (i.e., eight from each semantic category) and were asked to generalise to 32 novel verb→noun instances. The novel verbs gouble and powter are paired with abstract nouns, whereas conell and mouten are paired with concrete nouns.

| Experiment | Training phase | Testing phase |
|---|---|---|
| Experiment 1 (High similarity) | gouble force | gouble impact |
| | powter status | powter importance |
| | conell oxygen | conell potassium |
| | mouten calcium | mouten magnesium |
| Experiment 4 (Low similarity) | gouble force | gouble surprise |
| | powter prestige | powter pride |
| | conell oxygen | conell glass |
| | mouten furniture | mouten bread |

Note: During the training phase of each experiment, participants saw the items embedded in English sentences, but during testing they were only presented with two <verb, noun> alternatives (one grammatical and one ungrammatical).

The implication of these results is that participants do not consider abstract features such as concreteness to be relevant during the task; if they did, the effect would be similar regardless of the semantic distance between the two sets. This explanation does not, however, preclude the hypothesis that participants still base their decisions on abstract semantic features. In this case, participants could still be guided by semantics, but because of the specificity of the categories used (e.g., chemical elements), they would base their decisions on a more constrained feature than concreteness. We use the WordNet representations to explore this hypothesis further.

As in the case of animacy, we expect the neural embeddings not only to model the performance of the participants in the tasks but also to be similarly affected by the semantic distance manipulation.

5.5.2 Materials

Paciorek & Williams (2015, Experiments 1 & 4)

We construct two abstract-concrete datasets from the stimuli used by Paciorek & Williams (2015) using the same method as above (§5.4.2). Each dataset is split into non-overlapping sets of training and testing stimuli. For each semantic distinction (i.e., abstract and concrete), the participants see eight items during training (four with each novel verb) and are subsequently tested on 16 novel ones (eight with each novel verb). There is one difference between our design and the one used in the behavioural experiments. In the original experiments, the nouns in some of the sentences were preceded by a determiner (e.g., gouble the force). Again, the model is not designed to account for this; however, Paciorek & Williams (2015, Experiment 2) found significant effects even after the removal of the determiner, which indicates that participants might not use such syntagmatic cues during the experiment. A complete description of the stimuli used can be found in §C.2.
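The non-overlap constraint on the two sets can be checked mechanically. The minimal sketch below pairs each novel verb with its nouns and rejects any noun that appears in both the training and the testing set. The helper name build_split is hypothetical, and the example uses only the Experiment 1 items shown in Table 5.2 rather than the full lists in §C.2.

```python
from typing import Dict, List, Tuple

Item = Tuple[str, str]  # (novel verb, noun)

def build_split(train_nouns: Dict[str, List[str]],
                test_nouns: Dict[str, List[str]]) -> Tuple[List[Item], List[Item]]:
    """Pair each novel verb with its nouns; refuse nouns shared by both sets."""
    train = [(verb, noun) for verb, nouns in train_nouns.items() for noun in nouns]
    test = [(verb, noun) for verb, nouns in test_nouns.items() for noun in nouns]
    overlap = {noun for _, noun in train} & {noun for _, noun in test}
    if overlap:
        raise ValueError(f"nouns in both training and testing sets: {sorted(overlap)}")
    return train, test

# Experiment 1 examples from Table 5.2 (one noun per verb, not the full lists).
EXP1_TRAIN = {"gouble": ["force"], "powter": ["status"],
              "conell": ["oxygen"], "mouten": ["calcium"]}
EXP1_TEST = {"gouble": ["impact"], "powter": ["importance"],
             "conell": ["potassium"], "mouten": ["magnesium"]}

if __name__ == "__main__":
    train, test = build_split(EXP1_TRAIN, EXP1_TEST)
    print(len(train), len(test))
```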

5.5.3 Results and discussion

Figure 5.4 shows a two-dimensional projection of the stimuli used in Paciorek & Williams (2015). While the scales are meaningless in t-SNE, as the algorithm starts from a random initialisation, we see that in the high-similarity dataset (Fig. 5.4a) the datapoints (i.e., the words) are more concentrated around the cluster centroid (i.e., the mean value of the cluster). In the low-similarity dataset, we observe not only greater dispersion but also sub-groupings within the dataset. Using t-SNE instead of a variance-based dimensionality reduction method, we can discover both local and global clusterings within our data. In this case, the algorithm discovers two global clusters (abstract and concrete) but several more local ones. For example, the cream, honey, chocolate, bread, meat, and wheat cluster, which we can call types of food, is detached from the rest of the concrete stimuli, rendering any comparisons harder for the learner. We have already argued that the t-SNE solution does not necessarily predict the performance of participants during the testing phase, as the model is tied to its initial weights and the training phase. However, the topological characteristics of the stimuli in Fig. 5.4 suggest that we should expect lower performance during the testing phase in the low-similarity case.
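The notion of concentration around the cluster centroid can be made explicit as the mean distance of each point from its cluster's centroid. The sketch below is an illustrative assumption rather than an analysis reported here. Because t-SNE scales are not meaningful, such a measure is better computed on the original embedding vectors than on the projected coordinates. The toy points are invented.

```python
import math
from typing import List, Sequence

def centroid(points: Sequence[Sequence[float]]) -> List[float]:
    """Mean value of the cluster along each dimension."""
    return [sum(coord) / len(points) for coord in zip(*points)]

def dispersion(points: Sequence[Sequence[float]]) -> float:
    """Mean Euclidean distance of the points from their centroid."""
    c = centroid(points)
    return sum(math.dist(p, c) for p in points) / len(points)

# Invented 2-D coordinates: a tight cluster and a more dispersed one.
tight = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
spread = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0), (4.0, 4.0)]

if __name__ == "__main__":
    print(round(dispersion(tight), 3), round(dispersion(spread), 3))
```

The function works for vectors of any dimensionality, so the same comparison can be run on the full embeddings of the high- and low-similarity stimuli.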

Figure 5.4 Two-dimensional t-SNE projection of the stimuli used in Paciorek & Williams (2015), with words labelled and grouped as Abstract or Concrete. (a) Projection of the high-similarity dataset (Experiment 1); (b) projection of the low-similarity dataset (Experiment 4). The high-low similarity dissociation is apparent, as the variance is smaller in the high-similarity case, and t-SNE discovers more independent subgroupings in the low-similarity dataset (e.g., bread, meat, wheat) that might bias learners towards erroneous generalisations.

The generalisation gradients are not very informative, beyond showing that the model quickly learns the high-similarity system (peaking at 80%) while showing only a mild preference towards the grammatical alternatives in the low-similarity case (peaking at 60%). Instead, we provide estimates of the network activations at the epoch that maximises the fit to the reported behavioural data. Figure 5.5 presents the activations of the model, averaged by learner, at the corresponding epoch for each model (Epoch 15 for high similarity, Epoch 17 for low similarity). For comparison, we also plot the behavioural results reported in Paciorek (2013) on the same dataset. While all the effects appear to be significant in the model's estimates (which was not the case in the original human data), qualitatively the predictions of the model are quite close to human performance. Concretely, the model selects concrete words with more certainty in the high-similarity dataset, and the activations of the grammatical alternatives are significantly higher than those of the ungrammatical ones. In the low-similarity case, on the other hand, the effect vanishes for abstract concepts (as in the behavioural data), while some difference remains for concrete concepts.
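Selecting the epoch that maximises the fit to the behavioural data can be sketched as a simple search over epochs. The criterion used here, the sum of squared differences between per-condition model activations and reported endorsement rates, is an assumption (§5.3.2 describes the procedure actually used), and the numbers are invented.

```python
from typing import Dict, List

def best_epoch(model_by_epoch: List[Dict[str, float]], observed: Dict[str, float]) -> int:
    """Return the 1-based epoch whose per-condition activations best match the data.

    Fit is the sum of squared differences across conditions (an assumption;
    other criteria, such as correlation, could be substituted).
    """
    def sse(pred: Dict[str, float]) -> float:
        return sum((pred[cond] - observed[cond]) ** 2 for cond in observed)

    errors = [sse(pred) for pred in model_by_epoch]
    return errors.index(min(errors)) + 1

# Invented endorsement rates and per-epoch mean activations for two conditions.
observed = {"grammatical": 0.70, "ungrammatical": 0.30}
model_by_epoch = [
    {"grammatical": 0.50, "ungrammatical": 0.50},  # epoch 1
    {"grammatical": 0.68, "ungrammatical": 0.33},  # epoch 2
    {"grammatical": 0.90, "ungrammatical": 0.10},  # epoch 3
]

if __name__ == "__main__":
    print(best_epoch(model_by_epoch, observed))
```

In practice the conditions would cross grammaticality with semantic category (abstract, concrete), which only adds keys to the dictionaries.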

Looking at the activation of the hidden layer of the network when we present it with the stimuli from the training set sheds some light on the low-similarity results. Figure 5.6 shows the activations for the two datasets (high and low). In Fig. 5.6a, we see that the network quickly learns to implicitly distinguish between abstract and concrete concepts, enabling better generalisation. In Fig. 5.6b, on the other hand, we plot the activation of the hidden layer when given the vectors of the low-similarity training set. We see that even after 2000 epochs, the model cannot implicitly classify the training examples as either abstract or concrete.
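The claim that the hidden layer does or does not implicitly classify items as abstract or concrete could also be checked numerically, as a complement to the projections in Fig. 5.6. The sketch below is an illustration and not part of the reported analysis. It computes the share of items whose hidden-layer activation vector lies closer to its own category's centroid than to the other category's. Centroids are computed from the same items being scored, and the activation vectors are invented.

```python
import math
from typing import Sequence

def nearest_centroid_accuracy(acts: Sequence[Sequence[float]], labels: Sequence[str]) -> float:
    """Share of items whose activation vector is closest to its own category centroid."""
    cats = sorted(set(labels))
    cents = {}
    for cat in cats:
        members = [a for a, lab in zip(acts, labels) if lab == cat]
        cents[cat] = [sum(coord) / len(members) for coord in zip(*members)]
    hits = 0
    for a, lab in zip(acts, labels):
        predicted = min(cats, key=lambda cat: math.dist(a, cents[cat]))
        hits += predicted == lab
    return hits / len(labels)

# Invented hidden-layer activations for four training items.
labels = ["abstract", "abstract", "concrete", "concrete"]
separated = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
mixed = [(0.0, 0.0), (5.0, 5.0), (0.0, 1.0), (5.0, 6.0)]

if __name__ == "__main__":
    print(nearest_centroid_accuracy(separated, labels))
    print(nearest_centroid_accuracy(mixed, labels))
```

A score near 1 corresponds to the clean separation seen for the high-similarity set; a score near chance (0.5 for two balanced categories) corresponds to the mixing seen for the low-similarity set.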

Figure 5.5 Model predictions at the best epoch (high similarity: Epoch 15; low similarity: Epoch 17) for the stimuli used in Paciorek & Williams (2015). (a) Model predictions (activation, 0 to 1) and (b) behavioural results (endorsement rates, 0 to 1) for the grammatical and ungrammatical alternatives in the high- and low-similarity conditions, for abstract and concrete nouns. We obtain the point estimates by maximising the fit of the predictions to the reported data (see §5.3.2). The behavioural results are reproduced from Paciorek (2013).

Presumably, the distributional features contained in the vectors differ markedly between the training and testing sets, so the model learns a function that is irrelevant, yet mildly predictive from its point of view. This result suggests that human learners, when presented with a dataset with low train-test similarity, would be more prone to focus on irrelevant cues.

In §5.4.3 we noted that even though the hidden layer could not distinguish between animate and inanimate concepts, the network performed very well during training. In the high-similarity case, on the other hand, the network both distinguishes internally between abstract and concrete concepts and achieves high levels of generalisation. Looking at Fig. 5.6a once more, we first see that the network distinguishes between the two groups rather quickly. Second, the division between semantic groups is clearer here than in Fig. 5.3c. Based on these results, we argue that the network is more certain about the regions that predict which output unit is the correct one and has no problem placing larger weights there.

Figure 5.6 Two-dimensional projection of the activation of the hidden layer given the training stimuli used in Paciorek & Williams (2015), shown at Epochs 1, 100, 1000, and 2000 (points grouped as Abstract or Concrete). (a) Projection of the hidden layer given the high-similarity dataset (Experiment 1); (b) projection of the hidden layer given the low-similarity dataset (Experiment 4). The three words in the low-similarity dataset that cluster with the concrete concepts instead of the abstract ones were authority, force, and value.
