Let us take a step back and examine what exactly the input to the above systems is. Keeping in mind the idea that learning is a by-product of information processing, the primary input to these models should be a description of an entity that we come into contact with and that is found in the environment as, for example, sound or light. This ‘raw’ input description will, however, contain an immense amount of information, some of which will be predictive for the task while the rest will be environmental ‘noise’. The models introduced above can cope with that noise to a large extent and transform the input into an internal representation that is useful for carrying out the task. We explore two questions in the present section: (1) how can we distill environmental information into a format that the system can use to carry out a specific computation, and (2) how can we describe entities at different levels of abstraction (e.g., animal as opposed to dog)?
Take the human visual system, for example. Despite the complexity of the operations involved in detecting edges, hues, shapes and orientations, the initial ‘raw’ representation given by the physical world is quite straightforward. Briefly, the photoreceptors in the retina detect light intensity values.¹² Knowing, therefore, the light intensity value at each point in the visual field, we can construct a ‘map’ whose coordinates give us the intensity values. For convenience, let the matrix $M \in \mathbb{R}^{X \times Y}$ be that map, where $X$ and $Y$ are the dimensions of the visual field. Each element $M_{ij}$ is a light intensity value ranging from 0 (i.e., white) to 1 (i.e., black). While this representation might be cluttered with a lot of irrelevant noise, the human visual system is able to extract all the information needed to recognise objects, faces and so on.
¹²We make the simplifying assumption at this point that humans can see only one colour (i.e., that they see in grayscale), so as not to clutter the text with more complicated notation. In any case, the above logic extends trivially to multicolour recognition if we accept that there are different kinds of photoreceptors sensitive to different colours.
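To make the data structure concrete, here is a minimal NumPy sketch of the intensity map $M$ and of the flattened vector a network would actually receive. The grid size, the random background and the dark ‘object’ are assumptions chosen purely for illustration.

```python
import numpy as np

# Hypothetical visual field of X = 4 rows by Y = 5 columns, following the
# text's convention that 0 is white and 1 is black.
X, Y = 4, 5
rng = np.random.default_rng(seed=0)
M = rng.uniform(0.0, 1.0, size=(X, Y))  # M in R^{X x Y}: random background

# Draw a fully dark vertical bar in column 2 as a stand-in "object".
M[:, 2] = 1.0

# The input layer of a network receives the map flattened into a single
# vector of X * Y intensity values (row by row).
input_vector = M.reshape(-1)
print(M.shape, input_vector.shape)  # (4, 5) (20,)
```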
In short, these representations, which enter the system, define a systematic way of describing entities or types of information (Marr, 1982) available during processing. As above, visual representations can describe light intensity; auditory representations, on the other hand, can describe the wavelengths of vibrations in the environment. Two remarks need to be made at this point. (1) As we noted above, environmental input is ‘noisy’. To avoid cluttering the model with irrelevant information, which might lengthen training or introduce additional local minima, the input is typically pre-processed so as to reduce the amount of superfluous information it contains. (2) The word ‘systematic’ is important in the present context. Inevitably, the description we choose will highlight some features of the input while pushing others into the background. For example, if we train a network to distinguish dogs from cats, it is probably irrelevant to include in the input that some dog breeds are more susceptible to canine epilepsy than others.
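One common way of stripping superfluous detail from such a map, offered here as an illustrative assumption rather than the pre-processing used in this thesis, is to downsample it by averaging non-overlapping blocks of pixels. The helper `average_pool` and the toy map below are hypothetical.

```python
import numpy as np

def average_pool(M: np.ndarray, k: int) -> np.ndarray:
    """Downsample an intensity map by averaging non-overlapping k x k blocks.

    Trailing rows/columns that do not fill a complete block are dropped.
    """
    X, Y = M.shape
    Xc, Yc = (X // k) * k, (Y // k) * k  # crop to a multiple of k
    blocks = M[:Xc, :Yc].reshape(Xc // k, k, Yc // k, k)
    return blocks.mean(axis=(1, 3))      # average within each block

# Hypothetical 6 x 6 map: a dark 3 x 3 square on a white background,
# corrupted by small Gaussian noise and clipped back into [0, 1].
rng = np.random.default_rng(seed=1)
M = np.zeros((6, 6))
M[:3, :3] = 1.0
M_noisy = np.clip(M + rng.normal(0.0, 0.05, size=M.shape), 0.0, 1.0)

reduced = average_pool(M_noisy, k=3)
print(reduced.shape)  # (2, 2): 36 input values reduced to 4
```

Averaging both shrinks the input and smooths pixel-level noise, but it also discards fine spatial detail; whether that detail is dispensable depends on the task, in line with remark (2).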
While the input representations for visual and auditory recognition processes are quite straightforward, the matter of input representation is more complicated when we look at language. Do we use ‘raw’ visual input (i.e., printed text), or should we start with speech input? Do we pre-process grammatical constructions, or do we somehow let the system figure them out? Choosing among these alternatives implicitly commits us to a level of linguistic description which may or may not be appropriate in the present context. For example, generativists are not particularly concerned with the environmental input, as it is too variable, and focus instead on a higher level of explanation once this input has been mapped onto something more invariant (commonly called I-language; Chomsky, 1986). For a researcher subscribing to connectionism, on the other hand, the primary environmental input can be very informative, as seemingly complex rules such as the past-tense formation we saw above can be recovered solely at that level without appealing to additional mechanisms (Rumelhart & McClelland, 1986; although see Tyler et al., 2004).
Staying on the topic of linguistic representations, matters are complicated further when we consider semantics. Undoubtedly, the issue of semantic representations is one of the most formidable topics in cognitive psychology. Different approaches, including philosophical (Wittgenstein, 1922), psychological (Barsalou, 2008; Collins & Quillian, 1969; Rogers & McClelland, 2004) and computational (Landauer & Dumais, 1997) ones, have sought to explore how words relate to each other and to concepts, or how, in turn, concepts relate to each other or to percepts and actions. The multimodal nature of semantic representations makes it harder for the modeller to construct a description of concepts that captures what is evoked in the brain when a target concept is seen or heard. Since semantic representations are at the heart of the present thesis, we devote a few paragraphs to introducing different ways of representing semantics, which emphasise either the abstract conceptual structure or the associative relations between words. This is by no means a complete review of the ways in which semantic relations can be captured. Chapter 2 goes into detail on how to extract semantic representations by looking at word co-occurrences. Furthermore, in §3.3.2 we go into more detail on how to turn the representations below into an appropriate input for the networks.
Word Association Norms
Word Association Norms (ANs) focus on describing how words are associated with each other. This gives a shallow semantic representation, in that we do not necessarily take into account the nature of the relation between the words. Generally, such association norms are compiled by asking participants to produce the first word that comes to mind that is meaningfully related to a target. For instance, a participant might encounter
graduate . . . .
to which they might respond student, school or degree, in which case these would be considered associates of the target graduate. The semantic representation can then be constructed by looking at which words are related to the target. In this case, the input to the model can be a $|V|$-dimensional binary vector (where $|V|$ is the number of words used as targets) whose non-zero values indicate that a word is associated with the target. To get an even more accurate picture of the relationship between the target word and its associates, we might also want to weight the potential responses according to how many participants gave each answer. For example, if 24 out of 148 participants responded student in the above example, this gives a weight of 24/148 ≈ .162. The corresponding element in the vector representation is therefore .162 instead of 1, indicating a weak relationship between the two words.
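The two encodings described above, binary and frequency-weighted, can be sketched as follows. The six-word vocabulary, the helper `association_vector` and every response count except the 24 of 148 ‘student’ responses are hypothetical.

```python
import numpy as np

# Hypothetical vocabulary of target words (|V| = 6); in practice this would
# be the full list of cue words in a set of norms.
vocab = ["graduate", "student", "school", "degree", "dog", "cat"]
index = {w: i for i, w in enumerate(vocab)}

# Responses to the cue "graduate". Only the 24 "student" responses out of
# 148 participants come from the text; the other counts are invented.
n_participants = 148
responses = {"student": 24, "school": 30, "degree": 15}

def association_vector(responses, n_participants, weighted=True):
    """Return a |V|-dimensional vector of associates for one cue word."""
    v = np.zeros(len(vocab))
    for word, count in responses.items():
        if word in index:  # ignore responses outside the vocabulary
            v[index[word]] = count / n_participants if weighted else 1.0
    return v

binary = association_vector(responses, n_participants, weighted=False)
weighted = association_vector(responses, n_participants)
print(binary)                                 # [0. 1. 1. 1. 0. 0.]
print(round(weighted[index["student"]], 3))   # 0.162
```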
These association norms have been an indispensable tool for cognitive psychologists because of their coverage and the ease with which they can be generated. The University of South Florida Free Association Norms (Nelson, McEvoy & Schreiber, 2004), which we use later on, contain responses for 5,019 cue words and are commonly used in designing stimulus lists (e.g., Hutchison, Balota, Neely, Cortese, Cohen-Shikora, Tse, Yap, Bengson, Niemeyer & Buchanan, 2013) or as a benchmark for evaluating systems which automatically generate such representations (Kiela, Hill & Clark, 2015b). More recently, Deyne, Navarro, Perfors & Storms (2016), Deyne, Navarro & Storms (2012) and Deyne & Storms (2008) have extended this line of work by providing an extensive database of association norms for both English (Deyne et al., 2012) and Dutch (Deyne & Storms, 2008), which leads to better predictions of lexical access and semantic relatedness, particularly for weakly related words.
Semantic Feature Norms
Semantic feature norms focus mainly on the relations of concepts to percepts, actions or other concepts. Examples of such relations are that cats have tails (concept to concept) or that cats are independent. These representations are not concerned with the relations between words and the concepts they refer to. This indirect relationship to language enables them to go beyond the mere associations captured by word ANs. As with word ANs, semantic feature norms are collected by asking participants to list properties of a target word. Participants are instructed to include, for example, physical properties: how the concept referred to by the target word looks, sounds, smells, feels or tastes.
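Feature norms are naturally stored as a concept-by-feature matrix. The sketch below builds one from hypothetical production counts (not taken from any published norms) and compares concepts with cosine similarity, which is one common choice but an assumption here.

```python
import numpy as np

# Hypothetical feature-production data: for each concept, the number of
# participants who listed each feature.
norms = {
    "cat": {"has_tail": 20, "has_fur": 25, "is_independent": 12, "meows": 22},
    "dog": {"has_tail": 22, "has_fur": 24, "barks": 27, "is_loyal": 15},
    "car": {"has_wheels": 28, "has_engine": 20},
}
concepts = sorted(norms)
features = sorted({f for feats in norms.values() for f in feats})

# Concept x feature matrix weighted by production frequency.
F = np.zeros((len(concepts), len(features)))
for i, c in enumerate(concepts):
    for f, count in norms[c].items():
        F[i, features.index(f)] = count

def cosine(u, v):
    """Cosine of the angle between two feature vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

row = {c: F[i] for i, c in enumerate(concepts)}
print(round(cosine(row["cat"], row["dog"]), 2))  # positive: shared features
print(cosine(row["cat"], row["car"]))            # 0.0: no shared features
```

Weighting by production frequency lets salient features (here, has_fur) count for more than idiosyncratic ones; a purely binary matrix is the simpler alternative.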
Semantic features are commonly used, or assumed to exist, in several theories of categorisation and conceptual representation. For example, in exemplar theories of categorisation (Nosofsky, 1986), participants are assumed to attend to correlations of features and to how these predict the category a concept falls into. In formal modelling, MINERVA 2, a model of associative memory, assumes that memory is composed of empty slots which are filled with the features of the incoming probe. Incoming stimuli containing the relevant features strengthen the association of this element to the category.
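As an illustration of how an exemplar model exploits feature overlap, the sketch below implements a simplified version of Nosofsky's (1986) Generalized Context Model: similarity to each stored exemplar decays exponentially with an attention-weighted distance, and each category's probability is proportional to its summed similarity. The features, exemplars, parameter values and the helper `gcm_probabilities` are hypothetical, and response biases are omitted.

```python
import numpy as np

def gcm_probabilities(probe, exemplars, labels, attention, c=1.0, r=1):
    """Simplified Generalized Context Model (after Nosofsky, 1986).

    Distance: attention-weighted Minkowski metric (r=1 is city-block).
    Similarity: exp(-c * distance). Category probability: summed
    similarity to that category's exemplars over summed similarity to all.
    """
    diffs = np.abs(exemplars - probe) ** r
    d = (diffs @ attention) ** (1.0 / r)  # one distance per exemplar
    sim = np.exp(-c * d)
    cats = sorted(set(labels))
    totals = np.array([sim[np.array(labels) == k].sum() for k in cats])
    return dict(zip(cats, (totals / totals.sum()).tolist()))

# Hypothetical binary features: [has_fur, barks, meows, has_tail]
exemplars = np.array([
    [1, 1, 0, 1],  # dog exemplar
    [1, 1, 0, 0],  # dog exemplar (tail not listed)
    [1, 0, 1, 1],  # cat exemplar
])
labels = ["dog", "dog", "cat"]
attention = np.full(4, 0.25)  # equal attention to every feature

probe = np.array([1, 1, 0, 1])
print(gcm_probabilities(probe, exemplars, labels, attention, c=2.0))
```

Raising the attention weight on a diagnostic feature (e.g. barks) sharpens the categorisation, which is how the model captures participants attending to the features that predict category membership.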
In cognitive modelling, semantic feature norms are used to model a variety of psychological phenomena. Cree, McRae & McNorgan (1999) used the semantic feature norms compiled by McRae, de Sa & Seidenberg (1997) (an earlier version of McRae et al., 2005) as input to an attractor neural network, a special type of the networks introduced above in which the input pattern continues to drive the network for several time steps until the output settles into a stable pattern, and successfully modelled semantic priming effects. Moreover, Mirman & Magnuson (2008), looking at the effect of semantic neighbourhood density on word recognition, also used the McRae norms as input to an attractor network, measuring the time it took the model to settle into a pattern in the output layer. Finally, Rabovsky & McRae (2014) modelled seven N400 event-related potential (ERP) component effects reported in the literature, again using the McRae norms. Other commonly used feature norms include those gathered by Vinson & Vigliocco (2008), which describe objects and events, as well as those collected by Devereux, Tyler, Geertzen & Randall (2013).
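The notion of ‘settling’ can be illustrated with a classic Hopfield network, used here as a deliberately simplified stand-in: the attractor networks in the studies cited above are considerably more elaborate, and this sketch does not attempt to reproduce them. The two ±1 patterns play the role of hypothetical binarised feature vectors, and the number of updates before the state stops changing serves as a crude settling time. The helpers `train_hopfield` and `settle` are hypothetical.

```python
import numpy as np

def train_hopfield(patterns):
    """Hebbian weights for a Hopfield network storing +1/-1 patterns."""
    n_units = patterns.shape[1]
    W = patterns.T @ patterns / n_units
    np.fill_diagonal(W, 0.0)  # no self-connections
    return W

def settle(W, state, max_steps=100):
    """Synchronously update all units until the state stops changing.

    Returns the final state and the number of updates that changed it,
    a crude analogue of the settling time used as a model reaction time.
    """
    state = np.asarray(state).copy()
    for steps in range(max_steps):
        new_state = np.where(W @ state >= 0, 1, -1)
        if np.array_equal(new_state, state):
            return state, steps
        state = new_state
    return state, max_steps  # did not converge (e.g. a 2-cycle)

# Two hypothetical, orthogonal "concepts" coded as +1/-1 feature vectors.
patterns = np.array([
    [1, 1, 1, 1, -1, -1, -1, -1],
    [1, -1, 1, -1, 1, -1, 1, -1],
])
W = train_hopfield(patterns)

probe = patterns[0].copy()
probe[0] = -1  # corrupt one feature
final, steps = settle(W, probe)
print(np.array_equal(final, patterns[0]), steps)  # True 1
```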
WordNet
Similar to semantic feature norms, another commonly used semantic description capturing concept-to-concept relations is WordNet (Fellbaum, 1998). WordNet is a large database in which concepts are represented in terms of abstract propositions (to a great extent is-a relations), as, for example, dog is-a carnivore (see Fig. 1.4). This organisational scheme captures the hierarchical nature of semantic relations, as evidenced by developmental (Keil, 1979), reaction-time (Collins & Quillian, 1969) and brain-damage data (Warrington, 1975). As with the feature norms, language is only indirectly addressed, since all the words WordNet contains are normalised to their corresponding concepts; unlike the norms above, however, which are elicited from participants, WordNet is hand-coded. In this respect, WordNet looks more like a machine-readable dictionary/encyclopaedia than a model of semantic memory.
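The path-following procedure described in the caption of Fig. 1.4 can be sketched on a hand-coded fragment of the hierarchy. The dictionaries and the helpers `hypernym_paths` and `wordnet_features` below are hypothetical and are not read from the WordNet database; with the real database, NLTK's WordNet interface exposes comparable information through `Synset` methods such as `hypernym_paths()`, `part_meronyms()` and `member_holonyms()`.

```python
# Hand-coded fragment of the relations around dog.n.01, mirroring Fig. 1.4.
# Sense numbers are dropped for readability.
HYPERNYMS = {
    "dog": ["canine", "domestic animal"],
    "canine": ["carnivore"], "carnivore": ["placental"],
    "placental": ["mammal"], "mammal": ["vertebrate"],
    "vertebrate": ["chordate"], "chordate": ["animal"],
    "domestic animal": ["animal"], "animal": ["organism"],
    "organism": ["living thing"], "living thing": ["whole"],
    "whole": ["object"], "object": ["physical entity"],
    "physical entity": ["entity"], "entity": [],
}
HAS_PART = {"dog": ["flag"]}            # has-a: a dog has a flag (tail)
MEMBER_OF = {"dog": ["canis", "pack"]}  # is-member-of relations

def hypernym_paths(synset):
    """Every is-a path from `synset` up to the root, as lists of names."""
    parents = HYPERNYMS[synset]
    if not parents:
        return [[synset]]
    return [[synset] + path for p in parents for path in hypernym_paths(p)]

def wordnet_features(synset):
    """Ancestors on all paths (duplicates filtered), plus has-a and member-of."""
    features = []
    for path in hypernym_paths(synset):
        for node in path[1:]:            # skip the synset itself
            feature = "is-a:" + node
            if feature not in features:  # filter duplicates across paths
                features.append(feature)
    features += ["has-a:" + x for x in HAS_PART.get(synset, [])]
    features += ["member-of:" + x for x in MEMBER_OF.get(synset, [])]
    return features

print(len(hypernym_paths("dog")))  # 2: via canine and via domestic animal
print(len(wordnet_features("dog")))
```

Stacking such feature lists for many synsets, one column per distinct feature, yields a binary concept-by-feature matrix analogous to the one built from feature norms.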
Because its coverage and granularity extend beyond what is commonly captured by semantic feature norms, WordNet is a commonly used tool in both cognitive modelling (Miller & Fellbaum, 1992) and natural language processing tasks (Harabagiu, 1998). Budanitsky & Hirst (2006) use WordNet and various semantic similarity metrics to evaluate how closely WordNet representations fit human similarity judgements. Ó Séaghdha (2007) achieved state-of-the-art results on a compound noun learning task (e.g., steel knife) using WordNet representations. In Chapter 3 we examine whether WordNet representations are also suitable for modelling implicit learning tasks.
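Budanitsky & Hirst (2006) compare several WordNet-based measures; the simplest family scores two senses by the length of the shortest is-a path connecting them. The sketch below computes $1/(1 + d)$, where $d$ is that path length in edges (the same definition NLTK uses for `path_similarity`), on a small hand-made taxonomy that is an assumption and does not reproduce WordNet's actual edges. The helpers `ancestor_distances` and `path_similarity` are hypothetical.

```python
from collections import deque

# Simplified, hand-made is-a taxonomy (NOT WordNet's actual edges).
PARENTS = {
    "dog": ["canine"], "cat": ["feline"],
    "canine": ["carnivore"], "feline": ["carnivore"],
    "carnivore": ["animal"], "animal": ["entity"],
    "car": ["vehicle"], "vehicle": ["artifact"], "artifact": ["entity"],
    "entity": [],
}

def ancestor_distances(node):
    """Breadth-first search upward: edges from `node` to each ancestor (and itself)."""
    dist = {node: 0}
    queue = deque([node])
    while queue:
        cur = queue.popleft()
        for parent in PARENTS[cur]:
            if parent not in dist:
                dist[parent] = dist[cur] + 1
                queue.append(parent)
    return dist

def path_similarity(a, b):
    """1 / (1 + length of the shortest is-a path between a and b)."""
    da, db = ancestor_distances(a), ancestor_distances(b)
    shortest = min(da[x] + db[x] for x in set(da) & set(db))
    return 1.0 / (1.0 + shortest)

print(path_similarity("dog", "cat"))  # 0.2: dog-canine-carnivore-feline-cat
print(path_similarity("dog", "car"))  # 0.125: 7 edges via entity
```

Pure path length treats every edge as equally informative; measures of the kind Budanitsky & Hirst also evaluate refine this with taxonomy depth or corpus-based information content.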
We recognise the importance and appropriateness of all these paradigms for studying the organisation of semantic memory. Undoubtedly, they have different strengths, and each is likely to be more appropriate for some tasks than for others. This difference in appropriateness stems from the fact that the representation we choose is bound to highlight some aspects of the input while pushing others into the background. Semantic feature norms and WordNet focus on the relations that concepts establish with other concepts, whereas word association norms remain at the word-word level. Furthermore, the level of granularity is potentially another issue: the scope is much more constrained in Feature Norms (FNs) than in WordNet. This level of detail comes, however, at a computational cost, as it potentially introduces irrelevant noise.
In Chapter 2 we outline a more sophisticated method for extracting associative relations between words, which extends the scope from a few words to every other word in the English vocabulary. This can potentially be problematic for reasons similar to those that have led generativists to look at I-language instead of E-language. Looking at language usage instead of abstracting the underlying conceptual structure might lead us into spurious problems related to individual differences (e.g., linguistic knowledge, fatigue, dialect spoken, etc.). However, the models we present later on can cope with such differences and can be used as a proxy to gain insight into the structure of semantic memory. Furthermore, many theories of sentence and discourse understanding (Ericsson & Kintsch, 1995; Kintsch & Mangalath, 2011) identify this level of description as an important one in the early stages of language processing.
[Figure 1.4: graph of dog.n.01 and its related synsets, from canine and domestic animal up to entity, plus flag, pack and canis]
Figure 1.4 WordNet hierarchy for the lemma dog (synset: dog.n.01) showing three kinds of relations: (a) is-a relations in white (e.g., dog is a domestic animal), (b) has-a relations in dark grey (e.g., dog has a tail, called a ‘flag’ on some breeds), (c) is-member-of relations in light grey (e.g., dog is a member of a pack). We also make two remarks regarding this figure. (a) For some synsets there is more than one path from the synset to the root (always entity.n.01); where multiple paths co-exist, we follow each of them to the root, filtering out duplicates. (b) For readability, we omit the specific sense numbers, which might cause confusion given the infrequent use of flag to mean a dog's tail. In constructing the feature matrices, however, the full synset name was used (i.e., flag.n.07).