Elsi Kaiser

1 Introduction

This chapter provides an introduction to some of the key methods commonly used in psycholinguistic research. We will focus on three main types of methods: reaction-time-based methods, visual-attention-based methods, and brain-based methods, and also briefly mention other kinds of approaches. As will become clear over the course of this chapter, each of these categories consists of multiple experimental paradigms, and choosing the “right” one often comes down to which method is most appropriate for a particular research question. It would be inaccurate to characterize one method as better than the others, since each has its own strengths and weaknesses. In what follows, we will consider each method in some depth, and comment on the ease of implementation and data analysis.

For the most part, the discussion in this chapter will focus on language comprehension, but some discussion of production methods is also included (see also Bock 1996 for an in-depth review of production methods). This asymmetry is a reflection of the greater body of prior work that exists on language comprehension. In the past, production has received considerably less attention, mostly due to methodological challenges.

This chapter focuses largely on so-called on-line methods – that is, methods that tap into real-time aspects of language processing. On-line methods play an important role in psycholinguistic research, because many of the processes underlying human language processing are very rapid (on the order of milliseconds), transient, and not accessible to introspection. For example, data from eye-tracking has shown that during auditory language processing, listeners briefly activate a set of words that overlap acoustically with the word they are hearing (e.g., hearing “beaker” will result in “beetle” and “speaker” also being briefly activated due to the word-initial and word-final overlap respectively, Allopenna, Magnuson, and Tanenhaus 1998). These activations are very short-lived and we are not consciously aware of them, though they can be reliably detected by time-sensitive methods such as eye-tracking. In a different linguistic domain, research looking at Binding Theory using self-paced reading suggests that when people read reflexive pronouns (“himself,” “herself”), they not only activate the syntactically licensed antecedent (e.g., the local subject “John” in a sentence such as “Bill thought that John owed himself another chance to solve the problem”), but also briefly consider gender-matching entities that are not syntactically licensed as the antecedents (e.g., the matrix subject “Bill”) (Badecker and Straub 2002; but see Sturt 2003 for a different view). Similar intrusion effects have also been obtained for negative polarity items (e.g., expressions like “any,” “ever,” as in Vasishth et al. 2008). Similar to the transient activation of multiple lexical items, these effects are below the threshold of our conscious perception, but can be detected with the right kinds of experimental paradigms. Because information about these kinds of phenomena often plays a key role in the formulation and testing of theories and models of language processing, on-line methods can provide critical insights.

Another area where on-line methods have made important contributions has to do with the way in which the human language processing system accesses and makes use of different kinds of linguistic information, such as syntactic vs. semantic information. A sentence like “The witness examined by the lawyer turned out to be unreliable” is syntactically temporarily ambiguous, because when only the first few words are available (“The witness examined . . .”), a comprehender might be tempted to interpret “the witness” as the agentive subject of the verb “examined” – an interpretation that is subsequently shown to be false.

This kind of situation, where the parser builds a syntactic structure that is later shown to be incorrect, is called garden-pathing. However, if the comprehension system makes immediate use of semantic animacy cues, a sentence like “The evidence examined by the lawyer turned out to be unreliable” should not result in a garden path: since “the evidence” is inanimate, it cannot be the subject of “examine.” This brings up the question of modularity, a key theme in psycholinguistics: does the language processing system use both syntactic and semantic cues (as well as other cues) when parsing a sentence (an interactive system), or is the system modular – in particular, do early stages of processing only make use of syntactic information? A range of on-line methods have been used to investigate this question (e.g., Ferreira and Clifton 1986; Trueswell, Tanenhaus, and Garnsey 1994; Clifton et al. 2003), which has significant implications for our understanding of the architecture of the language processing system.

On-line methods have also made crucial contributions to our understanding of language production. For example, Griffin and Bock (2000) and Gleitman et al. (2007) recorded speakers’ eye-movement patterns in scene-description experiments, to explore the temporal relationship between scene apprehension and linguistic formulation: Do speakers first process the “gist” of the scene before starting to build a linguistic representation of the event shown in the scene, or can these processes overlap in time? If they are separate processes, then there is no reason to expect people’s eye movements upon first perceiving the scene to correlate with their subsequent linguistic choices. However, Gleitman et al. (2007) found that people’s eye-movement patterns during the first 200 ms of observing the scene predict what they end up saying moments later – in contrast to the findings of Griffin and Bock (2000), who found no such correlation. Research in this area is still ongoing (see, e.g., Kuchinsky 2009; Myachykov et al. 2011; Hwang 2012).

In sum, on-line methods allow us to gain insights into transient effects that are often not explicitly “noticed” by language users, and also make it possible to learn about the time-course of both language production and comprehension. Because many psycholinguistic theories make explicit claims about the relative timing and relations between different aspects of language processing, on-line methods often play a crucial role in allowing us to compare competing theories.

However, it is important not to disregard off-line methods. Off-line approaches, such as questionnaires and surveys, are widely used, and provide crucial information about final interpretations (i.e., the final outcome of language processing). In addition, because people engage in real-time (on-line) processing before reaching their final (off-line) interpretation, these final interpretations can yield insights into the nature of on-line processing as well. Experimental paradigms often combine both off-line and on-line measures to yield insights that would not be available from either method on its own (e.g., response-contingent analyses in eye-tracking studies, such as McMurray, Tanenhaus, and Aslin 2002; Runner, Sussman, and Tanenhaus 2003). Off-line methods are discussed elsewhere in this volume (Chapters 3 and 6).

It is worth emphasizing that there are many components to a successful experiment: in addition to selecting an appropriate method, the researcher also needs to keep in mind other key issues, such as research ethics and human subjects approval, experimental design, the construction of critical items and filler items, well-worded instructions, and the appropriate methods of data analysis. These topics are addressed in other chapters in this volume (Chapter 7 on experimental design, Chapter 3 on collecting judgments, and Chapter 2 on research ethics). It is also worth mentioning the benefits of combining insights gained from experimental work with other means of data collection, such as frequency patterns (and other kinds of information) computed from corpus analyses (Chapter 13), which has become increasingly common in recent years (e.g., Trueswell 1996; Gibson 2006; Levy 2008; Jaeger 2010).

2 Reaction-time methods

One of the most widely used approaches for investigating real-time language processing involves measuring reaction times – that is, how rapidly people perform different kinds of linguistic tasks. For example, researchers have measured how quickly people read sentences, how quickly people start to produce sentences, and how quickly people recognize strings of letters (or strings of phonemes) as being real words or nonsense words. Intuitively, the idea is that reaction times provide an indication of processing complexity. It is often assumed that longer reaction times are associated with increased processing load and processing difficulty. For example, in a task where participants are shown words and asked to indicate whether they are real words or non-words (a lexical decision task), reaction times are sensitive to a range of word-level properties, including word frequency: high-frequency words are recognized faster than low-frequency words (Whaley 1978), suggesting that retrieving lower-frequency words from memory carries a greater processing load. In the domain of production, when participants are shown a picture and asked to name the object, similar frequency effects arise: participants name pictures faster and more accurately when the name of the picture is a high-frequency word than when it is a low-frequency word (Oldfield and Wingfield 1965). On the syntactic level, it has been found that structurally more complex sentences – like sentences with relatively long syntactic dependencies – are read more slowly than sentences with a simpler structure (e.g., Grodner and Gibson 2005).

A range of different methods focus on measuring reaction times and the speed or duration of different processes, including lexical decision, self-paced reading, recording people’s eye movements during reading and, on the production side, tasks that measure speech-onset latencies. We discuss these below.

In lexical decision tasks, participants see or hear words and are asked to indicate whether they are real words of English (or whichever language is being tested), often by pressing one key/button to indicate “yes” and another one to indicate “no.” Normally, all critical words are real words (i.e., should trigger “yes” responses), but the experiment as a whole also contains a number of nonsense words (usually on filler trials), to prevent participants from developing a strategy of always responding “yes.” A wide range of linguistic issues have been investigated using lexical decision tasks, including lexical access and syntactic processing (see Goldinger 1996 for an overview of word-level research; Love and Swinney 1996 and Shapiro et al. 2003 for syntactic investigations of issues such as ellipsis and relative clauses). Many of these experiments use a method called cross-modal lexical decision, where the target words are shown in writing on the computer screen at the same time as participants hear words or sentences (hence the term “cross-modal”: both written and auditory modalities are used).

Alternatively, some experiments use a “unimodal” approach, where only one modality is involved (e.g., all stimuli are written; Gernsbacher 1990). These kinds of studies normally make use of the phenomenon of semantic priming (i.e., the fact that a target word is recognized faster if the comprehender has previously encountered a semantically associated word; Meyer and Schvaneveldt 1971). Thus, if a person has recently seen the word “nurse,” recognition of the semantically associated word “doctor” will be facilitated, relative to a situation where presentation of “doctor” is preceded by an unrelated word (e.g., “juice”). The underlying assumption is that presentation of a word activates the word’s representation and this results in activation spreading to related concepts/words, which in turn facilitates the subsequent recognition of those words.
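The priming logic described above is often summarized as a difference score: the mean reaction time after an unrelated prime minus the mean reaction time after a related prime. The following Python sketch illustrates that arithmetic only; it is not a procedure taken from the studies cited. The trial format, the field names (`condition`, `rt_ms`, `correct`), and the reaction times are all hypothetical, and a real analysis would typically also handle outliers and use inferential statistics.

```python
from statistics import mean


def priming_effect(trials):
    """Return mean RT (unrelated) minus mean RT (related), in ms.

    `trials` is a list of dicts with hypothetical keys:
      'condition': 'related' or 'unrelated' (type of prime)
      'rt_ms':     lexical decision time for the target, in ms
      'correct':   whether the participant correctly answered "yes"
    Only correct responses are included, a common (assumed) convention.
    """
    rts = {"related": [], "unrelated": []}
    for trial in trials:
        if trial["correct"]:
            rts[trial["condition"]].append(trial["rt_ms"])
    # A positive value means targets were recognized faster after
    # a related prime, i.e., a facilitation (priming) effect.
    return mean(rts["unrelated"]) - mean(rts["related"])


# Invented data: target "doctor" after "nurse" (related) or "juice" (unrelated).
example_trials = [
    {"condition": "related", "rt_ms": 520, "correct": True},
    {"condition": "related", "rt_ms": 540, "correct": True},
    {"condition": "unrelated", "rt_ms": 600, "correct": True},
    {"condition": "unrelated", "rt_ms": 580, "correct": True},
    {"condition": "unrelated", "rt_ms": 900, "correct": False},  # excluded
]

print(priming_effect(example_trials))  # 60 ms with the invented data above
```

The sign convention (unrelated minus related) is a choice made here so that facilitation shows up as a positive number.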

Some of the best-known examples of cross-modal tasks come from the early work of Swinney (1979) and Onifer and Swinney (1981) on the processing of homophones (words that sound the same but that have two different meanings, e.g., “ring,” “crane,” “coach”). In Swinney and Onifer’s experiment, participants listened to sentences like “The housewife’s face literally lit up as the plumber extracted her lost wedding ring from the sink trap,” which biased one meaning of the ambiguous word “ring.” As participants heard the sentence, they were asked to do a lexical decision task with target words shown on the screen. The targets included words like “bell” (related to the meaning of “ring” that is not supported by the contextual bias of the sentence), “finger” (related to the meaning of “ring” that is contextually appropriate), as well as control words, unrelated to the meaning of the sentence but matched in frequency to “bell” and “finger” respectively.

Swinney and Onifer manipulated whether the target word (e.g., “bell”) was shown on the screen right at the offset of the homophone in the auditorily presented sentence (“ring”), or 1.5 seconds after the offset of the homophone. Interestingly, their results show that, when probing right at the offset of the homophone, both meanings of the ambiguous word “ring” are initially activated (i.e., recognition of both “bell” and “finger” is facilitated), relative to the control words. Thus, even if the context biases one meaning, both meanings are briefly activated. (Swinney and Onifer show that this occurs even if the two meanings of the ambiguous words differ in frequency/dominance.) However, when the target word was shown 1.5 seconds later, only the contextually appropriate meaning was still activated. These early findings suggest that initial lexical access is relatively unconstrained, but that the contextual biases kick in rapidly and suppress the irrelevant meaning. By using the cross-modal paradigm, Swinney and Onifer were able to tap into an ephemeral, unconscious effect that would not have been detectable by off-line methods.

The lexical decision methodology has a number of advantages. Experiments are inexpensive to implement, there are a number of software options available, data analysis is fairly straightforward, and the methodology is technologically very portable – all that is needed is a computer and headphones/speakers. In addition, in the case of cross-modal lexical decision, the auditory nature of the stimuli means that this method can be used to investigate issues related to prosody and phonetics.

However, although this method has generated important insights regarding language processing, it also comes with some challenges. First, the task is arguably different from natural language processing: normally, when listening to sentences, we are not asked to simultaneously perform lexical decision tasks on words. Thus, the ecological validity of this method is not very high, and one could ask whether this could distort language processing. Second, one inherent limitation of this method is that on any one trial, only one point in time can be probed – for example, the target word can be presented 1,000 ms after the auditory presentation of the word of interest, or right at the offset of the word, but not both, at least not on the same trial. Thus, this methodology yields a “snapshot” of what is happening at a particular point in time (e.g., which meanings are activated at that point, and to what level), but it does not provide continuous information about how things change over time.

Another widely used method that relies on reaction time measurements is the self-paced reading paradigm. In essence, this method measures how much time people spend reading words or phrases. There are a number of variants of self-paced reading, especially in terms of how fine-grained the temporal measurements are. In one variant, participants read entire sentences and press a button when they are done, which allows for the measurement of whole-sentence reading times. In other variants, sentences are presented clause-by-clause or word-by-word, which allows for increasingly fine-grained measurements of how much time readers spend on each part of the sentence. The majority of current self-paced studies use a word-by-word moving window set-up, which means that words are displayed one by one, and each button press results in the previous word being covered (e.g., by dashes or Xs) and the next word being revealed (e.g., changing from dashes to a word: --- --- --- => The --- --- => --- cat --- => --- --- meowed). This allows researchers to record how much time a person spends on one word before moving on to the next word, which can help shed light on what points in a sentence are associated with increased processing load/processing difficulty.
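The word-by-word moving window display described above can be sketched in a few lines of Python. This is an illustration of the display logic only, not code from any particular experiment package: the function name `moving_window_frames` is hypothetical, and masking each word with as many dashes as it has letters is an assumption (the schematic example in the text uses three dashes per word). In an actual experiment, the time between successive button presses would be recorded as the reading time for the word currently revealed.

```python
def moving_window_frames(sentence, mask_char="-"):
    """Return the successive displays of a word-by-word moving window.

    The first frame masks every word; each later frame reveals exactly
    one word and masks all the others (including the previous word).
    Assumption: a mask is as long as the word it covers.
    """
    words = sentence.split()
    masks = [mask_char * len(word) for word in words]
    frames = [" ".join(masks)]  # initial display: everything masked
    for i, word in enumerate(words):
        # Reveal word i; words before and after it stay masked.
        frames.append(" ".join(masks[:i] + [word] + masks[i + 1:]))
    return frames


for frame in moving_window_frames("The cat meowed"):
    print(frame)
# --- --- ------
# The --- ------
# --- cat ------
# --- --- meowed
```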

For example, Stowe’s (1986) seminal work used self-paced reading to investigate whether encountering a wh-expression will create an expectation for an upcoming gap (trace) where that element would have originated. She tested sentences like those in example (1). In (1a), the verb “bring” is immediately followed by the gap, whereas in (1b), the gap occurs later in the sentence. In the control condition (1c), there is no wh-expression (i.e., no reason to posit a gap).

Self-paced reading showed that readers did indeed expect a gap at the earliest possible location: “us” in (1b) is read more slowly than “us” in (1c) – that is, it causes processing difficulty (“filled-gap effect”). This suggests that this is an active/forward-looking process (Frazier and Clifton 1989). When the processing system sees a “filler” (e.g., a wh-element that originated elsewhere in the sentence), it starts searching for a gap right away. The competing view, that gaps are posited only when there is no other possible parse available (Fodor 1978), is not supported by these results.

(1) My brother wanted to know. . .

a. . . . who Ruth will bring __ home to Mom at Christmas.

b. . . . who Ruth will bring us home to __ at Christmas.

c. . . . if Ruth will bring us home to Mom at Christmas.

This example illustrates how self-paced reading can be used to assess the validity of different theories of language processing, and also highlights a key property of this method: When it comes to the interpretation of reading times, everything is relative. To know whether a word causes a slowdown in reading time, it needs to be compared to another word (e.g., “us” in (1b) and (1c)). When designing self-paced reading studies, researchers need to ensure that they have the right kind of baseline/control conditions to compare with the experimental conditions.

Interestingly, recent findings suggest that equating slower reading times with processing difficulty is not as straightforward as has often been assumed. According to Hale (2003), a sudden drop in parsing uncertainty leads to a processing slowdown because the system has further work to do to specify the representation. Thus, one might observe a reading time slowdown not because the comprehender is struggling with a particularly syntactically difficult construction, but because the current word reduces uncertainty about the syntactic structure at hand (see also Levy 2008 on how existing results can be reanalyzed in terms of surprisal, i.e., how (un)predictable – and thus how informative – a particular word is in a given context). Many of the insights related to notions such as uncertainty, ambiguity, and information density – which have implications not only for ease of processing but also for patterns in language production – are closely tied to or generated from insights from corpus-based work (e.g., Aylett and Turk 2004; Jaeger 2010; Piantadosi, Tily, and Gibson 2012).
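To make the notion of surprisal concrete: the surprisal of a word is standardly defined as the negative log probability of that word given its context, so that less predictable words carry higher surprisal. The Python sketch below estimates this quantity from a toy corpus. It is a deliberately simplified illustration and not the model used in the work cited: conditioning only on the immediately preceding word (a bigram estimate), the absence of smoothing, the function name `bigram_surprisal`, and the three-word sentences are all assumptions made here for brevity.

```python
import math
from collections import Counter


def bigram_surprisal(corpus_sentences, prev_word, word):
    """Surprisal in bits: -log2 P(word | prev_word).

    The probability is a raw relative frequency from the given
    sentences (no smoothing), so unseen word pairs are rejected
    rather than assigned a probability.
    """
    bigram_counts = Counter()
    context_counts = Counter()
    for sentence in corpus_sentences:
        tokens = sentence.lower().split()
        for first, second in zip(tokens, tokens[1:]):
            bigram_counts[(first, second)] += 1
            context_counts[first] += 1
    if bigram_counts[(prev_word, word)] == 0:
        raise ValueError("unseen word pair; this sketch has no smoothing")
    probability = bigram_counts[(prev_word, word)] / context_counts[prev_word]
    return -math.log2(probability)


# Invented toy corpus: "meowed" follows "cat" twice, "slept" once.
toy_corpus = ["the cat meowed", "the cat slept",
              "the cat meowed", "the dog barked"]

print(bigram_surprisal(toy_corpus, "cat", "meowed"))  # P = 2/3, lower surprisal
print(bigram_surprisal(toy_corpus, "cat", "slept"))   # P = 1/3, log2(3) bits
```

In this toy corpus the less frequent continuation ("slept") receives the higher surprisal, which mirrors the idea that less predictable words are more informative.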

Self-paced reading has been used to investigate a range of issues, especially in the syntactic domain, such as the processing of temporarily ambiguous sentences (e.g., Garnsey et al. 1997), non-canonical word orders/scrambling (e.g., Kaan 2001; Kaiser and Trueswell 2004), and unambiguous but structurally complex sentences (e.g., Gibson 1998; Grodner and Gibson 2005). Researchers have also used self-paced reading to probe the interpretation of pronouns and reflexives (e.g., Badecker and Straub 2002; Dillon et al. 2009; He and Kaiser 2012), as well as effects of syntactic priming (e.g., Traxler and Tooley 2008). The widespread use of this method is at least partly due to the fact that, compared to various eye-tracking methods, self-paced reading is very inexpensive, and relatively easy to implement and analyze (see Baayen and Milin 2010 for a discussion of new data-analysis approaches). Self-paced reading is also highly portable – all one needs is a computer and a keyboard or, for more accurate timing, a button box. A button
