The questionnaire for online communication managers and/or community managers

Online communication management at Spanish restaurants with three Michelin stars: a proposed study method

This thesis will handle PVs using automatic and manual methods in combination,

in order to answer Ball’s (1994:295) call for extracting data in both ways. PVs

have drawn many researchers’ attention in the field of natural language processing

owing to the difficulty of extraction (Baldwin & Villavicencio, 2002). In computational

studies, the targets are often termed VPCs (Verb-Particle Constructions) rather than

phrasal verbs, although the two terms refer to similar constructions. Baldwin and Villavicencio

(2002) point out that automatic extraction of VPCs is not easy, because the verb and the

simultaneously generate non-targets/noise when solely automatic extraction is adopted;

these difficulties will be discussed in the following sections. In order to avoid complete

reliance on automatic software, whose noise may distort the results, a

manual check of the data is required. Therefore, a semi-automatic procedure

that combines manual investigation with the help of computer programs will be

employed.

5.3.1.1 Criteria for selecting research targets

A more important issue is the identification of target PVs, i.e. what constructions

qualify for investigation. As seen in Chapter 2, there are many subtypes of verb-particle

constructions, and researchers have used different conditions to define them. They

consider different groups of these subtypes as PVs: some take a broader view and

admit more subtypes, while others restrict the definition of PVs.

First, the criteria that determine which groups of these verb-particle

constructions are to be studied in this thesis must be established. Deciding which

constructions to include in our research is not simple. Two questions need to be

considered. The first is the acceptance/rejection of constructions which contain a

In Chapter 2 we have seen that prepositional verbs and phrasal verbs are mostly

dissimilar in their characteristics apart from both being in the V+P form. Waibel

(2007:63) argues that ‘prepositional verbs’ should not be taken into account when

studying phrasal verbs, because learners face very different problems dealing with

these two groups. For phrasal verbs, they have difficulties in perceiving the

idiomaticity; for prepositional verbs, the problem lies in the correct selection of the

prepositions. It therefore seems better not to cover prepositional verbs in our analysis. However,

we also saw that dividing prepositional verbs from phrasal verbs is not an easy task,

since ambiguous instances can always be found (see Chapter 2). I have therefore had to

resort to POS-tagging (see the details of CLAWS below). As the extraction

procedure proceeds, prepositional verbs will be left out of the analysis naturally, so that

our discussion can focus on phrasal verbs which have an adverbial particle only.

Another group of constructions which may provoke dispute is the

phrasal-prepositional verbs, i.e. constructions of three elements like come up to, give

over to, do away with, etc. It can be argued that this group should be considered

separately from PVs because these three-word strings have some common properties,

such as always occurring in continuous sequence, and all the elements are compulsorily

as PVs. However, such a separation of PVs and phrasal-prepositional verbs is

meaningless if the phrasal-prepositional verbs are taken as extensions of PVs. Because

our method is to extract the two-word V+P constructions first, and then study other

co-occurring words in a span, the phrasal-prepositional verbs will be discussed as

variants of usage as well, and are regarded as extended phraseological units of the PV

they contain. The term ‘phrasal verbs’ in this thesis includes the traditional ‘phrasal

verbs’ and ‘phrasal-prepositional verbs’, but excludes ‘prepositional verbs’.

The second issue concerns the idiomaticity of phrasal verbs. Those verb-particle

combinations which are idiomatic or figurative are often accepted as phrasal verbs

without much controversy, but others which are transparent or literal are liable to

dispute. The complications of PV idiomaticity were also reviewed in Chapter

2. Problems will evidently always arise if PVs are classified by their degree of

idiomaticity. PVs can be ranked by how opaque they are, but they cannot be put into

discrete groups, because the degrees are relative. Waibel (2007:63) points out that free

combinations/literal PVs and idiomatic/figurative PVs cannot easily be separated,

and therefore chooses not to divide them. I agree with this conclusion, so no such

differentiation will be made in the present study. As a result, the free combinations

the verb is often an action verb, e.g. When I heard that sound I was afraid, and then I

ran out) will be included in my data, because firstly they are not easily separated from

others, and secondly they constitute a considerable portion of the data, especially in the

learner corpus. Similar identification criteria have also been adopted by Liu

(2011:663-664), who argues that a simple syntactic criterion (i.e. a lexical verb with

one adverbial particle) works better than an indirect and complex semantic one (i.e.

new idiomatic meaning instead of the straightforward meaning of a verb and a particle).

To conclude, the PVs examined in this research are those which have adverbial

particles (thus excluding prepositions), and the issue of idiomaticity is not considered

(thus including free combinations).

The criteria for defining PVs will certainly affect the quantitative results

when the overall numbers of PVs are counted, since these counts address the

research question regarding tokens and types of PVs. Excluding some verb-particle

combinations will not impair the qualitative analyses, as these examine selected

instances in detail. The criteria will only be applied to take out targets which are not

5.3.1.2 Cleaning non-targets

After deciding the target items to be included, we come to the methods of extraction.

First, the error tags which were annotated in the original CLEC have to be removed.

This is achieved by running the software mentioned in Section 5.2.2. At this point, it

seems that any two-word V+P construction in the corpora can be retrieved

simply by searching for the particle. For example, if we are to probe the Verb +

OUT construction, firstly all of the combinations of Verb + OUT have to be singled

out from CLEC and LOCNESS, in order to produce a frequency list of each verb type

of the Verb + OUT construction. To retrieve all of the instances of the Verb + OUT

construction, we can start by searching for the particle: querying OUT in the corpus.

The computer program Concord will return all of the cases of OUT, which are messy

and numerous. At this point, duplicated instances were found and deleted with the

assistance of WordSmith 4.0, and the results were skimmed to eliminate obvious

non-targets. Unfortunately, the data retrieved by this approach contain too many

cases which do not meet our definition of PVs. Thousands of examples can be returned

when a single particle is searched for. The number of instances is too large to allow

a detailed analysis; worse, most of the cases are not the true PVs we

Clearly it is more effective to screen the data with the assistance of POS-tagging. A

fully automatic method which uses POS tags is adopted by Gardner and Davies

(2007:341); their corpus study of PVs uses a simple but functional definition of PVs.

They search for all two-word items consisting of a word tagged as a lexical verb followed by an

adverbial particle, whether adjacent or not. They rely entirely on the validity of the corpus

and its tags, and no classification tests are done. A study which surveys a large body of data is

extremely time-consuming, so an automatic approach is often adopted when

the sheer volume of distributional results (i.e. raw frequencies) is considered.

However, such an approach may save time and effort at the expense of data

adequacy. Waibel (2007:67) doubts the reliability of PV tagging by an automatic

tagger. The two-word constructions collected may not all be PVs in the narrowest

sense: for example, the method could generate some accidental combinations of a verb

followed by an adverbial particle. At the same time, the results may suffer from some

data loss due to the error rates of the tagging system and computer programs. Moreover,

some instances may also not be culled from the corpus, particularly when dealing with

learner corpora, because presumably learners do not show as much consistency as

correct tagging. Therefore, the accuracy rate of the tagging may decrease and result in

incorrect figures.

Despite these shortcomings, abandoning the automatic approach would demand

unaffordable time and effort, rendering this research unachievable. The compromise is

to use the automatic and manual approaches at the points where each presents a

clear advantage, in the hope that each will offset the shortcomings of the other.

Because an automatic approach has known pitfalls, two measures are taken in this

study to reduce the effects of data inaccuracy and data loss.

First, because the qualitative analysis demands greater precision, the data

were not only filtered by the CLAWS annotation to remove the large number of

non-PVs; each example in the filtered results was also checked manually.

This addresses the problem of residual non-PVs, but the loss of data

remains. Second, to catch unwanted instances

that still slip through the tagging filter, when the individual

cases of PVs are put forward for qualitative study later, a number of syntactic tests will

5.3.1.3 The application of CLAWS

According to our definition of PVs in this thesis, the particle has to be an adverb. To

facilitate data capture, CLEC and LOCNESS were both POS-tagged by CLAWS,

a tagging system developed by Lancaster University from the early 1980s. In

the CLAWS tagging guide, the tag RP is assigned to the candidate items, which

are termed prepositional adverbs/particles. (This category is listed under both the

sections of adverbs and prepositions.) In this tagging guideline, the author explains:

We assign the tag RP to a preposition-type word which has no complement. Typical uses of RP are in phrasal verb constructions, or when it functions as a place adjunct.

e.g.

there’s a lot of it <w RP> about these days

Don't give <w RP> up on us just yet.

After this example, the author provides a full list of possible RP words: bout, about,

along, around, back, by, down, in, off, on, out, over, round, through, thru, to, under, up.

The author also points out that the most crucial problem in assigning the tag RP

correctly is the disambiguation of prepositions (tagged as II) and prepositional

adverbs/particles (tagged as RP). The demonstration of disambiguation is given by the

(a) She ran <w II>down the hill.

(b) She ran <w RP>down her best friends.

In (a), down is a preposition, because:

(1) An adverb could be inserted before it: She ran quickly down the hill. (But not: *She ran viciously down her best friends.)

(2) It can be moved (somewhat awkwardly) to the front of a wh-word:

This is the hill <w II>down which he ran.

<w II>Down which slopes do you like ski-ing?

In (b), down is an adverbial particle because:

(1) It can be placed before or after the noun phrase acting as the object of the verb:

She ran her best friends <w RP>down. (But not: *She ran the hill down.)

(2) If the noun phrase is replaced by a pronoun, the pronoun has to be placed in front of the particle:

She ran them <w RP>down. (= her best friends) (But not: *She ran down them.)

Similarly: The dentist took all my teeth <w RP>out. ~ The dentist took them out.

The examples and explanation above, provided by the CLAWS researchers, show

the principles at work behind the scenes. By using such a tagger, it is hoped that PVs

can be extracted more efficiently. The POS tag RP separates the prepositional

adverbs/particles from the general prepositions (tagged as _II), so that the unwanted

instances in our data (e.g. keep_VV0 pushing_VVG it_PPH1 up_II the_AT hill_NN1...)

can be filtered out. The purpose of this step is to discriminate and discard the

non-particles in the constructions such as those mentioned above.
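
The filtering step described above can be sketched in a few lines of Python. This is only a minimal illustration and not the procedure actually used in this study. It assumes that the tagged text is available as space-separated word_TAG tokens, as in the example above; the function names parse_tagged and extract_verb_particle and the window of four intervening tokens (max_gap) are hypothetical choices made for the illustration, and the tagged version of "She ran them down" is hand-constructed from the CLAWS guide example.

```python
from collections import Counter


def parse_tagged(line):
    """Split a line of word_TAG tokens into (word, tag) pairs.

    Tokens without an underscore (untagged residue) are skipped.
    """
    pairs = []
    for token in line.split():
        word, sep, tag = token.rpartition("_")
        if sep and word and tag:
            pairs.append((word, tag))
    return pairs


def extract_verb_particle(pairs, max_gap=4):
    """Pair each RP-tagged particle with the nearest preceding lexical verb.

    A lexical verb is any token whose tag starts with "VV". The search looks
    back over at most max_gap intervening tokens and stops at punctuation,
    which is used here as a rough clause boundary (an assumption).
    Particles tagged II (general preposition) are never considered.
    """
    hits = []
    for i, (word, tag) in enumerate(pairs):
        if tag != "RP":
            continue
        lowest = max(i - max_gap - 1, 0)
        for j in range(i - 1, lowest - 1, -1):
            prev_word, prev_tag = pairs[j]
            if not prev_tag.isalnum():  # punctuation tag such as "." or ","
                break
            if prev_tag.startswith("VV"):
                hits.append((prev_word.lower(), word.lower()))
                break
    return hits


# up_II is a preposition, so nothing is extracted from the first line.
sample = [
    "keep_VV0 pushing_VVG it_PPH1 up_II the_AT hill_NN1",
    "She_PPHS1 ran_VVD them_PPHO2 down_RP ._.",
]
frequency = Counter()
for line in sample:
    frequency.update(extract_verb_particle(parse_tagged(line)))
print(frequency.most_common())
```

Because the function requires a lexical verb within the window, an RP-tagged word with no verb nearby is simply not returned. The window size trades recall against accidental verb and particle pairings, so any value chosen would need to be checked manually against the data.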

How CLAWS tags a corpus automatically can be seen from the tagging

process of the BNC. The automatic tagging process of CLAWS runs through six stages:

tokenisation, initial tag assignment, tag selection (disambiguation), idiom-tagging,

template tagger and post-processing. The first stage, tokenisation, divides the text into word

tokens and orthographic sentences, using spaces and sentence boundaries. Then

the second stage, initial tag assignment, assigns one or more tags to the words

according to a reference lexicon and chooses the most probable tag. The next stage of

disambiguation also adopts a probabilistic method, Viterbi alignment, to estimate the

likelihoods of tag sequences, thereby resolving ambiguities. Some special cases such

as multi-words are better tagged as one unit, so some rules will be applied at the stage of

designed to supplement the insufficiency of the earlier stages. The final phase,

post-processing, aims to provide ambiguity tags which allow the presence of two

possible tags. Through these procedures, CLAWS is able to produce as accurate an

output as possible (for details, see the BNC2 POS-tagging Manual online).
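
To make the idea of probabilistic tag selection more concrete, the following sketch runs a generic first-order Viterbi search over a toy model. It is not CLAWS itself: the six tags are taken from the examples in this section, but every probability below is invented for the illustration, and the real lexicon and transition figures used by CLAWS are not reproduced here. The point is only to show how the likelihood of a whole tag sequence can decide between II and RP for the same word.

```python
import math

FLOOR = 1e-12  # stand-in probability for events the toy model does not list


def log_p(p):
    return math.log(max(p, FLOOR))


def viterbi(words, tags, start_p, trans_p, emit_p):
    """Return the most probable tag sequence for words under a first-order model."""
    # best[i][t] = (log-probability of the best path ending in tag t, previous tag)
    best = [{t: (log_p(start_p.get(t, 0)) + log_p(emit_p[t].get(words[0], 0)), None)
             for t in tags}]
    for i in range(1, len(words)):
        column = {}
        for t in tags:
            score, prev = max(
                (best[i - 1][p][0] + log_p(trans_p[p].get(t, 0)), p) for p in tags
            )
            column[t] = (score + log_p(emit_p[t].get(words[i], 0)), prev)
        best.append(column)
    # Follow the back-pointers from the best final tag.
    tag = max(tags, key=lambda t: best[-1][t][0])
    path = [tag]
    for i in range(len(words) - 1, 0, -1):
        tag = best[i][tag][1]
        path.append(tag)
    return path[::-1]


# Invented toy model: "down" is equally likely as II or RP in isolation,
# so only the neighbouring tags can separate the two readings.
TAGS = ["VVD", "PPHO2", "RP", "II", "AT", "NN1"]
START = {"VVD": 1.0}
TRANS = {
    "VVD": {"II": 0.4, "RP": 0.3, "PPHO2": 0.2, "AT": 0.1},
    "PPHO2": {"RP": 0.6, "II": 0.4},
    "RP": {"II": 0.5, "NN1": 0.4, "AT": 0.1},
    "II": {"AT": 0.7, "NN1": 0.2, "PPHO2": 0.1},
    "AT": {"NN1": 1.0},
    "NN1": {"II": 0.5, "VVD": 0.5},
}
EMIT = {
    "VVD": {"ran": 1.0},
    "PPHO2": {"them": 1.0},
    "RP": {"down": 0.5},
    "II": {"down": 0.5},
    "AT": {"the": 1.0},
    "NN1": {"hill": 1.0},
}

print(viterbi(["ran", "them", "down"], TAGS, START, TRANS, EMIT))
print(viterbi(["ran", "down", "the", "hill"], TAGS, START, TRANS, EMIT))
```

Under these invented figures, "down" comes out as RP after the pronoun and as II before "the hill", which mirrors the contrast between sentences (a) and (b) in the CLAWS guide. With learner text, unusual word sequences would receive low transition probabilities, which is one reason why the tagging accuracy may drop.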

However, no tagger achieves a zero error rate, and CLAWS is no

exception. It is claimed to have a 96-97% accuracy rate (see the website of

CLAWS). Unfortunately, this accuracy rate is measured for common words: it is not

clear how accurately CLAWS can deal with RP tags, especially in a learner corpus with

errors. Take the particle ON for example: a simple search for the word ON retrieved 6504

instances from the original data of CLEC, but when the data were tagged with

CLAWS, only 357 instances were retained. For the LOCNESS data, 1804 instances

were found in the raw data, but the CLAWS-tagged data returned only 152 instances

(cf. Chapter 8). Most of the instances eliminated from the non-tagged data are not V +

ON constructions (i.e. PVs): in other words, in these instances, the particles function as

prepositions rather than adverbs. In order to estimate the tagging accuracy rate, 100 random

instances were taken from the initial untreated corpus and the numbers of accurately

tagged instances were counted. The result shows that approximately 90% of the two

With the help of CLAWS, the program is believed to capture most of the probable

phrasal verb candidates. However, a small number of errors may

still occur in the automatically filtered data. For example, the two instances below both

contain particles tagged as RP by CLAWS, but the particles do not form V+P

constructions with verbs. Instances of this kind are removed, as no verb is

available.

[5-1] Then_RT the_AT elated_JJ man_NN1 march_NN1 in_II procession_NN1 with_IW nothing_PN1 on_RP ._.

[5-2] From_II then_RT on_RP ,_, I_PPIS1 became_VVD like_II to_II by_II air_NN1 ._.

In addition, a few general prepositions were sometimes not filtered out (see example [5-3]

below). Moreover, items tagged RP may sometimes belong to prepositional verbs, as

shown by examples [5-4] and [5-5] below, taken from LOCNESS, which contain

the prepositional verb ‘rely on’.

[5-3] They_PPHS2 play_VV0 on_RP the_AT street_NN1 with_IW all_DB kinds_NN2 of_IO colourful.

[5-4] I giving_VVG others_NN2 support_VV0 ,_, as_II31 well_II32 as_II33 a_AT1 shoulder_NN1 to_TO rely_VVI on_RP when_CS feeling_VVG weak_JJ .

[5-5] What_DDQ will_VM our_APPGE sociel_NN1 development_NN1 rely_VV0 on_RP if_CS the_AT market_NN1 is_VBZ full_JJ of_IO fake_JJ commodities_NN2 .

We can see that the majority of prepositional verbs can be screened out by CLAWS, but a

few of them may still escape the filtering procedure. These cases can be removed

by applying the syntactic tests proposed by Darwin and Gray (1999:77-81) (see Section

2.5.2.3 for details of the tests), but I decided not to perform a comprehensive check on

all the PVs. The reasons are twofold. Firstly, it is time-consuming to apply these five

tests to each example of the PVs, and scrutinising these cases would take us off track;

moreover, the multiple senses a PV may have would aggravate the problem. Secondly,

the numbers of occurrences are usually fairly small and will not significantly influence

the quantitative results. In consequence, they will be kept in the frequency lists

(Appendices A-C). However, the targets selected for detailed qualitative studies have to

be real phrasal verbs. Therefore, only those

selected for further analysis will be examined by the five syntactic tests, in order to

confirm their authenticity as true PVs.

Let us now turn to another problem: learners’ errors. These errors are of two types: the

first is grammatical errors, which have a mild influence on our analysis; the second

involves learners' creation of illegal combinations. The recognisable errors were

CLEC, although the original text produced by the Chinese learners is “we will be fresh

up”. Other errors or misuses which have nothing to do with PVs were not identified at

this point, but will be isolated later only if they affect our analysis. Those cases where a

PV cannot be easily recognised were removed from the list.

So far, the data has been screened but the remainder still contains the second type of

error, illegitimate combinations created by the Chinese learners. For example, the

Chinese learners invented examples such as affect on, wheel on, jump down, hit down.
