The study of MWS cuts across the fields of psycholinguistics, corpus linguistics, and language education, and many terms have been used in research on the formulaicity of language. The variation in terminology reflects differences in researchers’ focus and in the aspects of the phenomena investigated (Myles & Cordier, 2016). For example, the
term chunk is commonly used in psycholinguistic research, whereas the term clusters is widely preferred in corpus linguistics (Myles & Cordier, 2016). More problematically, the same term may be used by different researchers to refer to constructs that overlap but are essentially different. The term formulaic sequences, coined by Wray (2002), has been widely adopted by various researchers. On the one hand, some researchers have adopted formulaic sequences as an umbrella term (Weinert, 2010; Wood, 2015). On the other, some researchers have used the term to refer to idioms and idiomatic expressions that are assumed to be processed holistically (e.g. Underwood, Schmitt & Galpin, 2004). As Wray (2012) also noted, this terminological confusion is potentially problematic, especially when claims are made about formulaic sequences in general but the investigation focused on only one type of formulaic sequence. To avoid terminological confusion, I used multi-word sequences (MWS) and chunks as umbrella terms, which have also been used in various studies with a psycholinguistic focus (McCauley & Christiansen, 2017; Christiansen & Arnon, 2017). However, the current study specifically focuses on collocations, and the findings of both the corpus study (see chapter 4) and the experimental study (see chapter 6) concern collocations rather than other types of MWS. The next section introduces the target construction of this study (see 2.2.1).
2.2.1 Target construction.
Collocations are a prominent type of MWS that has received special attention in both corpus-based language learning and psycholinguistic studies in the last decade. This study looks at the processing of high- and low-frequency collocations in Turkish and English by L1 and L2 speakers. One of the main reasons why collocations have received particular attention is that they are lexical patterns shaped more by conventions within the language than by grammatical or semantic restrictions (Wolter & Yamashita, 2015). For
example, it would be natural to say strong tea, but any experienced speaker would notice the comparative novelty of dark tea in English. In Turkish, however, the exact opposite would be true. Another reason why collocations have received special attention is that efficient language processing and use are, to an important extent, contingent upon the formation of systematic and meaningful links between words in the lexicon, and knowledge of collocations helps L2 learners to develop an efficient lexical network (see part 2 chapter for a review of collocational properties). Collocational knowledge is therefore considered an important component of one’s overall language competence by many researchers (e.g. Bestgen & Granger, 2014; Ellis et al., 2008; González Fernandez & Schmitt, 2015; Hoey, 2005; Pawley & Syder, 1983; Sinclair, 1991).
Different approaches to operationalising MWS, and collocations specifically, have been noted in the literature (McEnery & Hardie, 2011, pp. 122-133). The two most widely known are the “phraseological” and “frequency-based” approaches. The phraseological approach focuses on establishing the semantic relationship between two or more words and the degree of non-compositionality of their meaning (Nesselhauf, 2004; Howarth, 1998). In this approach, collocations are not simply free combinations of semantically transparent words; they follow selectional restrictions (e.g. ‘slash’ one’s wrist rather than ‘cut’ one’s wrist). The empirical or frequency-based approach draws on quantitative evidence of word co-occurrence in corpora (Evert, 2008; Gablasova et al., 2017; Hoey, 1991; Gries, 2013; Sinclair, 1991). In traditional corpus linguistics, collocations have been described as the relationship between two words which occur near each other (see Sinclair, 1991, for example). With the development of new-generation corpus tools (e.g. CQPweb, #LancsBox), this approach has come to involve more sophisticated statistical measures, known as association measures (AMs), to identify the psychological association between words, which is evidenced by their
co-occurrence in corpora (see section 3.1 for a detailed discussion of association measures). Not surprisingly, there are advantages and disadvantages to both approaches. In the phraseological approach, the operationalisation of collocations - whether they are free or restricted combinations of words - is quite problematic because the criteria rely heavily on intuition. However, semantic transparency is a variable that is likely to affect the processing of collocations. We cannot expect that free combinations such as pay a bill, which are used in their literal meanings, would be processed in exactly the same way as more restricted word combinations such as pay a visit, in which one of the components is used in a figurative sense. It would therefore be very interesting to investigate the effects of semantic transparency on collocational processing. For the present study (see section 2.3), however, it was necessary to adopt an approach rooted in the frequency-based tradition, because this study is centrally concerned with the extent to which collocational frequency and single-word frequency counts affect the mental processing of collocations in two typologically different languages, English and Turkish, and for L1 and L2 speakers.