So far, the Sketch Engine has mainly been used to query the corpus for potential light verbs in order to obtain a list of predicate nouns, which typically occur as their collocates. As the next step, the predicate nouns yielded by the light verb queries are queried separately as collocation bases of verb collocates. Isolated lemmatization errors were observed, and some of them may well be systematic. For example, all words ending in –tecken that are tagged (sometimes incorrectly!) as indefinite neuter plurals, while they are mostly indefinite neuter singulars, will be incorrectly lemmatized as –tecke because of the rule concerning plural neuters ending in –en. The script relates them to the declension pattern of äpple, möte etc., in which –n is chopped off the indefinite plural to obtain the correct lemma form. Quite a few errors also arise from tagging errors, which the script cannot affect. Perhaps the most serious source of lemmatization errors is that the rules are case-sensitive: words starting with a capital letter are often lemmatized separately (although the outcome may be identical). Nevertheless, the 90.34% accuracy seems good enough, and no further alterations of the lemmatizing scripts are foreseen for this particular task to be performed by the Sketch Engine.
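The –tecken problem described above can be illustrated with a minimal Python sketch. It is not the lemmatizing script used in this work: the function name `lemmatize` and the two PAROLE-style tag strings are assumptions made purely for illustration, and the rule is reduced to the single case discussed in the text.

```python
# Hypothetical PAROLE-style tags, assumed here for illustration only.
NEUTER_PLURAL_INDEF = "NCNPN@IS"
NEUTER_SINGULAR_INDEF = "NCNSN@IS"


def lemmatize(word, tag):
    """Toy version of the rule for indefinite plural neuters ending in -en.

    The final -n is chopped off, following the pattern äpplen -> äpple.
    Every other word form is returned unchanged.
    """
    if tag == NEUTER_PLURAL_INDEF and word.endswith("en"):
        return word[:-1]
    return word


# Correct outcome for a noun of the äpple/möte declension.
print(lemmatize("äpplen", NEUTER_PLURAL_INDEF))         # äpple
# A -tecken word mis-tagged as plural loses its final -n.
print(lemmatize("frågetecken", NEUTER_PLURAL_INDEF))    # frågetecke
# With the singular tag the rule does not fire and the lemma is kept.
print(lemmatize("frågetecken", NEUTER_SINGULAR_INDEF))  # frågetecken
```

The sketch shows why such errors are systematic: the rule depends on the tag alone, so every mis-tagged –tecken word is affected in the same way.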

15.4.3 Computing Grammatical Relations

The Sketch Engine enables the user to write a ‘grammar’ of predefined queries. The queries take the form of functions with a defined number of variables. They could be very simplistically paraphrased as follows: “I am specifying the features of a token and labelling it as X. When I query the corpus for a token that matches these features, find and list its collocates, whose features I am specifying under the label(s) Y (and possibly Z)”. For instance, suppose the user wants to find typical direct objects of a verb. The user defines the verb by means of the part-of-speech tagging. The label makes it clear that this particular token is the one that will be typed into the query field when the query is performed. The user then provides the features of the direct object, e.g. its part of speech and its typical left or right distance from the verb to be queried. These predefined functions are called gramrels (i.e. ‘grammatical relations’). There are several types of relations that can be formulated by the gramrels³:

• *SYMMETRIC evaluates queries also with the labels ‘1’ and ‘2’ swapped. This directive is active up to the next gramrel line.

• *DUAL is similar to *SYMMETRIC but it affects gramrels. It defines two gramrels from the same set of gramrel queries. Gramrel names are separated by a slash (/). All queries are evaluated for the first gramrel and then for the second gramrel with the ‘1’ and ‘2’ labels swapped.

• *UNARY says that the following gramrel is a unary relation. Only one label is used for unary gramrel queries.

• *TRINARY is used for trinary relations. These are translated into regular binary relations with different names. The name of a trinary gramrel should contain ‘%s’ and the respective queries should contain a third label ‘3’. The value of the word sketch base attribute at the position labelled ‘3’ is then substituted for ‘%s’ in the gramrel name.
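The effect of the four directives can be sketched in Python as a toy model. This is not Sketch Engine code: the function `expand`, the gramrel names and the example lemmas are hypothetical, and each corpus match is assumed to be a dictionary mapping the labels ‘1’, ‘2’ and ‘3’ to a value of the base attribute.

```python
from collections import defaultdict


def expand(directive, name, matches):
    """Map labelled matches to (headword, collocate) pairs per gramrel name.

    directive: "SYMMETRIC", "DUAL", "UNARY", "TRINARY" or None (plain binary).
    name:      gramrel name as written after '=' in the grammar.
    matches:   list of dicts such as {"1": "besvär", "2": "knä", "3": "med"}.
    """
    relations = defaultdict(list)
    for m in matches:
        if directive == "SYMMETRIC":
            # Same gramrel, evaluated again with labels 1 and 2 swapped.
            relations[name].append((m["1"], m["2"]))
            relations[name].append((m["2"], m["1"]))
        elif directive == "DUAL":
            # Two names separated by a slash; the second gets swapped labels.
            first, second = name.split("/")
            relations[first].append((m["1"], m["2"]))
            relations[second].append((m["2"], m["1"]))
        elif directive == "UNARY":
            # Only label 1 is used, so there is no collocate.
            relations[name].append((m["1"], None))
        elif directive == "TRINARY":
            # The value at label 3 replaces %s in the gramrel name.
            relations[name.replace("%s", m["3"])].append((m["1"], m["2"]))
        else:
            relations[name].append((m["1"], m["2"]))
    return dict(relations)


print(expand("DUAL", "object/object_of", [{"1": "äta", "2": "äpple"}]))
# {'object': [('äta', 'äpple')], 'object_of': [('äpple', 'äta')]}
print(expand("TRINARY", "noun_prep_noun_%s",
             [{"1": "besvär", "2": "knä", "3": "med"}]))
# {'noun_prep_noun_med': [('besvär', 'knä')]}
```

The model only mirrors the descriptions in the list above. In particular, it shows how a single *TRINARY definition yields one binary relation per value found at position ‘3’.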

The definitions will be exemplified with a fragment of the gramrel for finding prepositional phrases that would typically modify a noun typed into the query line when searching the corpus. The first line of the example below specifies the type of the gramrel relation. The gramrel type determines the syntax of the gramrel. This particular gramrel is a TRINARY one.

    *TRINARY
    =noun_prep_noun_%s
    1:any_noun_nominative 3:any_prep [(tag="DH.*"|tag="DI.*"|tag=poss_pro|tag=number|tag=any_adv|tag=any_noun_genitive|tag=any_adj)]{0,3} 2:any_noun_nominative

The second line gives the name of this gramrel. The format of the name is obligatory for each respective gramrel type. The name of a *TRINARY gramrel must end with ‘%s’. The third line contains a regular expression. The labels 1:, 2: and 3: introduce the three variables in this function. This particular gramrel will be triggered by a search for a word that matches the definition of any_noun_nominative. It will list:

• all nouns that typically act as modifiers of the noun typed into the query, which will be further sorted according to the prepositions by which they are introduced

• all nouns that typically govern the noun typed into the query, which will be

The line actually presents the features of a potentially relevant sequence of tokens: a nominative (i.e. non-genitive) noun is followed by a preposition. The preposition can be followed by a determiner, a possessive pronoun, a numeral, an adverb, a noun in the genitive or an adjective, within an interval of zero to three tokens. This interval must in turn be followed by a non-genitive noun. Hence, the regular expression captures all the following examples (the query is ‘besvär’, and some of the examples are made up):

    besvär med den italienska kommunismen
    besvär med sina barn
    besvär med dem som vistas där
    besvär i ögon
    besvär med sociala relationer
    besvär med sitt sociala liv
    besvär med sitt dåligt opererade knä
    besvär med det dåligt opererade knäet
    till besvär
    till stort besvär
    till oerhört stort besvär

The regular expression above is created partly from the part-of-speech tags of the PAROLE corpus and partly from macros. For example, the element any_noun_nominative is itself a macro, defined earlier as define(`any_noun_nominative', `"N...N@.."'). The string "N...N@.." is part of the actual POS tag for non-genitive nouns and personal pronouns in the PAROLE corpus. Dots stand for ‘any character’ in the given position.
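How the macro and the token sequence work together can be sketched in Python. This is a simplified emulation, not the query evaluation performed by the Sketch Engine. Only the pattern "N...N@.." and the determiner patterns "DH.*" and "DI.*" come from the text. The noun tag NCNSN@IS is an assumed example, the labels PREP, POSS, NUM, ADV, NOUN_GEN and ADJ are placeholders for macros whose definitions are not shown here, and the word form stands in for the base attribute in the gramrel name.

```python
import re

# Macro from the text: each dot matches any single character of the tag.
MACROS = {"any_noun_nominative": "N...N@.."}

# Placeholder labels (assumptions) for any_prep, poss_pro, number, any_adv,
# any_noun_genitive and any_adj, whose definitions are not given in the text.
PREP = "PREP"
FILLER_LABELS = {"POSS", "NUM", "ADV", "NOUN_GEN", "ADJ"}


def is_noun_nominative(tag):
    return re.fullmatch(MACROS["any_noun_nominative"], tag) is not None


def is_filler(tag):
    # "DH.*" and "DI.*" are the determiner patterns quoted in the gramrel.
    return re.fullmatch(r"DH.*|DI.*", tag) is not None or tag in FILLER_LABELS


def noun_prep_noun(tokens):
    """Find 1:noun 3:prep filler{0,3} 2:noun in a list of (word, tag) pairs.

    Returns (gramrel_name, noun1, noun2) triples, with the preposition
    substituted for %s in the gramrel name.
    """
    hits = []
    for i in range(len(tokens) - 2):
        if not is_noun_nominative(tokens[i][1]) or tokens[i + 1][1] != PREP:
            continue
        j = i + 2
        # Skip at most three intervening filler tokens.
        while j < len(tokens) and j - (i + 2) < 3 and is_filler(tokens[j][1]):
            j += 1
        if j < len(tokens) and is_noun_nominative(tokens[j][1]):
            name = "noun_prep_noun_%s".replace("%s", tokens[i + 1][0])
            hits.append((name, tokens[i][0], tokens[j][0]))
    return hits


sentence = [
    ("besvär", "NCNSN@IS"), ("med", "PREP"), ("sitt", "POSS"),
    ("dåligt", "ADV"), ("opererade", "ADJ"), ("knä", "NCNSN@IS"),
]
print(noun_prep_noun(sentence))
# [('noun_prep_noun_med', 'besvär', 'knä')]
```

With a fourth filler between the preposition and the second noun, the sketch returns no match, which corresponds to the upper bound of the {0,3} interval.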