PARTIAL CONCLUSIONS: CHURCH OF STES CREUS.


In addition to the POS tags and the named entity (NE) annotations of words that appear in the objective sentences, some other important textual patterns, which occur in these objective sentences, must be specified as well. To extract these textual patterns from the objectives text, a set of rules written in JAPE grammar has been created to generate the annotations that match these patterns in text. In GATE, JAPE grammar is processed by a transducer called the JAPE transducer to perform tasks like chunking, named entity recognition, and matching a simple text string. Therefore, the JAPE transducer processing resource has been utilised to create some useful annotations that find textual patterns in the corpus of objectives. Before illustrating these annotations, a brief description of the JAPE grammar is presented below.

The JAPE transducer processing resource (Cunningham et al., 2013) runs handcrafted information extraction rules that identify patterns, written as regular expressions over annotations, in the text processed by GATE. A JAPE grammar consists of a set of phases, and each phase is composed of a set of pattern rules. Each rule has two sides: a left-hand side (LHS) and a right-hand side (RHS). The LHS contains an annotation pattern to be matched against the text annotated by GATE, while the RHS contains annotation manipulation statements that specify the action to be taken on the matched sequence of text. Accordingly, a rule first matches a sequence of text that carries the annotations specified in its LHS, and the matched text is bound to a label declared in the LHS. The RHS then refers to the matched text through that label.

For instance, consider the following grammar for the annotation called ‘Product’, which has been generated by applying the JAPE transducer processing resource to the text of the objective sentences. This annotation describes the type of product (e.g. PC or mobile) that appears in the provided objective sentences.

Figure 4 The Created JAPE Grammar for the ‘Product’ Annotation

As shown in Figure 4, the LHS of the rule is the part before the “-->”, whilst the RHS is the part after it. This grammar contains a single rule, named ‘ProductRule’. The rule matches text that carries the annotation pattern given in the LHS, namely a ‘Lookup’ annotation whose ‘majorType’ feature (discussed in the next paragraph) has the value ‘prod’. A list of product types called ‘product’ has been created and added to the gazetteer lists so that occurrences of specific strings can be found in the corpus text. This list contains keywords such as ‘PC’, ‘mobile’, ‘PCs’, ‘mobiles’, ‘personal computer’ and ‘personal computers’. The rule produces the ‘Product’ annotation by matching strings in the text against the keywords defined in this gazetteer list. The matched text is bound to the label ‘matchText’, which is declared in the LHS, and the RHS refers to the matched text through that label. As a result, the matched text is annotated with the ‘Product’ annotation specified in the RHS of the rule.
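The figure itself is not reproduced in this text. A minimal sketch of a grammar consistent with the description might look as follows. The phase name, rule name, label, feature value and control style are the ones quoted in the surrounding text; the layout and the ‘rule’ feature written onto the output annotation are assumptions, not the author's original figure.

```jape
// Sketch only: reconstructed from the prose description, not copied from Figure 4.
Phase: Product1
Input: Lookup
Options: control = appelt

Rule: ProductRule
(
  // LHS: any gazetteer Lookup whose majorType is "prod"
  {Lookup.majorType == "prod"}
):matchText
-->
  // RHS: annotate the span bound to matchText as a Product.
  // The "rule" feature is an assumed debugging aid.
  :matchText.Product = {rule = "ProductRule"}
```

Everything before “-->” is the LHS, and the single statement after it is the RHS, which creates the ‘Product’ annotation over the span labelled ‘matchText’.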

In general, each JAPE grammar starts by specifying a phase name for the grammar. In the above example, the phase name is ‘Product1’ (i.e. Phase: Product1). The ‘Input’ line lists the annotation types (e.g. Token, Lookup, SpaceToken) that are considered when attempting to match patterns. For instance, the grammar presented in the above figure uses only the ‘Lookup’ annotation type (i.e. Input: Lookup). When the Lookup annotation is used, the major type and minor type features are generally specified as well, since these features give access to all items stored in particular gazetteer lists or combinations of lists. The major type feature identifies the broad category of a gazetteer list, while the minor type feature is optional and identifies a more specific subcategory within that category.
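To illustrate how the two features can be combined, the sketch below restricts a match to one subcategory of the product list. It assumes, purely for illustration, that some gazetteer entries carry a ‘minorType’ of ‘mobile’; the source does not say that the ‘product’ list declares any minor types, and the phase name, rule name and ‘kind’ feature are likewise hypothetical.

```jape
// Hypothetical sketch: the minorType value "mobile" is an assumption.
Phase: ProductMinorTypeSketch
Input: Lookup
Options: control = appelt

Rule: MobileProductRule
(
  // Both constraints apply to one Lookup annotation:
  // broad category "prod", assumed subcategory "mobile".
  {Lookup.majorType == "prod", Lookup.minorType == "mobile"}
):matchText
-->
  :matchText.Product = {kind = "mobile", rule = "MobileProductRule"}
```

Because only ‘Lookup’ is listed in the ‘Input’ line, every other annotation type (Token, SpaceToken and so on) is ignored while this phase is matching.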

Moreover, several options can be set at the start of each grammar to govern how patterns are matched in the text, such as the control and debug options. These options are set in the ‘Options’ line of the JAPE grammar. The control option specifies the method of rule matching and has five different styles: ‘Brill’, ‘All’, ‘First’, ‘Once’ and ‘Appelt’. At most one control style can be specified at the beginning of each JAPE grammar. If no control style is assigned to the grammar, the default is Brill.

• The ‘Brill’ style indicates that if more than one rule matches the same part of the text, all of them are executed, so no priority ordering of the rules is needed. All matching rules are executed from a given position in the text, and matching then resumes from the position where the longest match ends.

• The ‘All’ style also executes all rules that match the same segment of text. However, in this style matching continues from the offset following the current one rather than from the end of the longest match.

• In the ‘First’ style, a rule executes for the first match that is detected.

• In the ‘Once’ style, the whole JAPE phase finishes after the first match, once a rule has fired.

• The ‘Appelt’ style means that if more than one rule matches the same segment of text, only one rule is executed for that segment, chosen according to a set of priority rules.

In the ‘Appelt’ style, the rule priority works in the following manner:

1. Length of the match. Of the rules that match from the same starting position, the one that matches the longest segment of text is executed.

2. Optional priority declaration. If more than one rule matches the same segment, the rule with the highest declared priority is executed.

3. Rule ordering. If more than one rule has the same priority, the one stated first in the grammar is executed.

In the above grammar example, the ‘Options’ line sets the control option, and the method of rule matching is set to the ‘Appelt’ style (i.e. Options: control = appelt).
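The priority ordering can be sketched with two rules that compete for the same span. The second rule, its name, its output annotation ‘GenericTerm’, the phase name and both priority values are hypothetical and serve only to show the mechanism.

```jape
// Hypothetical sketch of Appelt-style priority resolution.
Phase: PrioritySketch
Input: Lookup
Options: control = appelt

Rule: ProductRule
Priority: 20
(
  {Lookup.majorType == "prod"}
):matchText
-->
  :matchText.Product = {rule = "ProductRule"}

// Matches any Lookup at all, including the ones ProductRule matches.
Rule: GenericLookupRule
Priority: 10
(
  {Lookup}
):matchText
-->
  :matchText.GenericTerm = {rule = "GenericLookupRule"}
```

When both rules match the same Lookup, the matched segments have equal length, so the first criterion does not separate them; the second criterion then selects ‘ProductRule’ because it declares the higher priority. Under the Brill style, by contrast, both rules would fire on that segment.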

JAPE provides regular expression operators (e.g. *, +, ?, |) that can appear in the LHS of a rule. The ‘*’ operator matches zero or more occurrences of a pattern, while the ‘+’ operator matches one or more occurrences. The ‘?’ operator makes a pattern optional (zero or one occurrence), and the ‘|’ (OR) operator indicates alternatives.
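As an illustration of the four operators together, the sketch below matches a simple noun chunk over POS-tagged tokens. It is not part of the grammars described in this section; the phase name, rule name, label and output annotation ‘NounChunk’ are hypothetical, and it assumes the tokens carry Penn-style tags in their ‘category’ feature.

```jape
// Hypothetical sketch demonstrating ?, *, + and | on the LHS.
Phase: OperatorSketch
Input: Token
Options: control = appelt

Rule: NounChunkRule
(
  ({Token.category == "DT"})?                                // optional determiner
  ({Token.category == "JJ"})*                                // zero or more adjectives
  ({Token.category == "NN"} | {Token.category == "NNS"})+    // one or more nouns
):chunk
-->
  :chunk.NounChunk = {rule = "NounChunkRule"}
```

Under these assumptions, a tagged sequence such as DT JJ JJ NN would be covered by a single ‘NounChunk’ annotation, as would a lone NNS token.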

The JAPE transducer has also been used to generate the annotation called ‘PrepositionalPhrase’ from the corpus of objectives text. Figure 5 presents the created JAPE grammar for detecting the ‘PrepositionalPhrase’ annotation.

Figure 5 The Created JAPE Grammar for the ‘PrepositionalPhrase’ Annotation

In the grammar presented in the above figure, the rule called ‘ProportionalPhraseRule’ is created to match a sequence of tokens based on their category features (POS tag features). More specifically, the rule extracts prepositional phrases from the corpus text, such as an ‘IN’ (preposition) token followed by a ‘JJ’ (adjective) token (e.g. ‘as minimum’), an ‘IN’ token followed by a ‘JJS’ (superlative adjective) token (e.g. ‘at least’), or an ‘IN’ token followed by another ‘IN’ token and a ‘JJS’ token (e.g. ‘by at least’).
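The figure is not reproduced in this text. A minimal sketch of a rule that matches the three token sequences just described might look as follows. The rule name is the one quoted in the text; the phase name, the label, the ‘Input’ and ‘Options’ lines and the ordering of the alternatives are assumptions.

```jape
// Sketch only: reconstructed from the prose description, not copied from Figure 5.
Phase: PrepositionalPhraseSketch
Input: Token
Options: control = appelt

Rule: ProportionalPhraseRule
(
  // IN IN JJS, e.g. "by at least"
  ({Token.category == "IN"} {Token.category == "IN"} {Token.category == "JJS"}) |
  // IN JJS, e.g. "at least"
  ({Token.category == "IN"} {Token.category == "JJS"}) |
  // IN JJ, e.g. "as minimum"
  ({Token.category == "IN"} {Token.category == "JJ"})
):matchText
-->
  :matchText.PrepositionalPhrase = {rule = "ProportionalPhraseRule"}
```

Since ‘SpaceToken’ is not listed in the ‘Input’ line, the whitespace between words does not interrupt the token sequence.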

As mentioned before, the ANNIE NE transducer is applied to the corpus of objectives text and generates the ‘Date’ annotation, which matches any textual pattern that represents a temporal expression (e.g. ‘2008’, ‘9/8/2009’, ‘July 2009’, ‘June of next year’, ‘October next year’, ‘coming year’, ‘coming two years’, ‘next year’, ‘next two years’, ‘second quarter of fiscal year 2009’, etc.). However, some temporal expressions (e.g. ‘following year’, ‘upcoming year’, ‘following two years’, ‘upcoming two years’) also appear in the objective sentences and indicate dates, but are not captured by the ‘Date’ annotation. Thus, a JAPE grammar rule has been created to extract these patterns from the objectives.

Figure 6 presents the created grammar for matching some of the textual patterns of temporal expressions (e.g. ‘following year’, ‘upcoming year’, ‘following two years’, ‘upcoming two years’) that represent dates. The grammar matches a string token in the corpus text followed by one or two ‘Lookup’ annotations and annotates them with a ‘Date’ annotation, where the matched sequence of text should indicate a date.

Figure 6 The Created JAPE Grammar for Matching Some Patterns of Temporal Expressions in the Objectives

As illustrated in the above figure, the rule called ‘DateRule’ is created to match any of the following four sequences:

1. A ‘Token’ annotation that covers the string ‘following’, followed by a ‘Lookup’ annotation with a ‘majorType’ feature of ‘date_unit’.

2. A ‘Token’ annotation that covers the string ‘upcoming’, followed by a ‘Lookup’ annotation with a ‘majorType’ feature of ‘date_unit’.

3. A ‘Token’ annotation that covers the string ‘following’, followed by a ‘Lookup’ annotation with a ‘majorType’ feature of ‘number’ and a ‘Lookup’ annotation with a ‘majorType’ feature of ‘date_unit’.

4. A ‘Token’ annotation that covers the string ‘upcoming’, followed by a ‘Lookup’ annotation with a ‘majorType’ feature of ‘number’ and a ‘Lookup’ annotation with a ‘majorType’ feature of ‘date_unit’.
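The figure is not reproduced in this text. The sequences described for ‘DateRule’ can be written compactly with the ‘|’ and ‘?’ operators, as in the sketch below. The rule name, string values and ‘majorType’ values come from the text; the phase name, the label, the ‘Input’ and ‘Options’ lines and the compact form itself are assumptions, and the original figure may instead spell out each alternative separately.

```jape
// Sketch only: reconstructed from the prose description, not copied from Figure 6.
Phase: DateSketch
Input: Token Lookup
Options: control = appelt

Rule: DateRule
(
  // "following" or "upcoming"
  ({Token.string == "following"} | {Token.string == "upcoming"})
  // optional number, e.g. "two"
  ({Lookup.majorType == "number"})?
  // date unit, e.g. "year" or "years"
  {Lookup.majorType == "date_unit"}
):matchText
-->
  :matchText.Date = {rule = "DateRule"}
```

With this pattern, ‘following year’ and ‘upcoming two years’ would both receive a ‘Date’ annotation, provided the gazetteer produces the corresponding ‘number’ and ‘date_unit’ Lookup annotations.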

In document Estudio acústico de los monasterios cistercienses masculinos del camp de Tarragona (página 120-131)