The emergence of corpora is extremely valuable both for sociolinguistics (Baker, 2010a) and for linguistic typology (Palfreyman, Sagara and Zeshan, in press). A language corpus is a body of text carefully sampled to be
maximally representative of the variety under examination, that is, which provides us with as accurate a picture as possible of the tendencies of that variety, including their proportions (McEnery & Wilson, 2001:30).48
In addition, a corpus must be of a finite size, and machine-readable (McEnery & Wilson, 2001), which allows for the quick and accurate identification of frequencies and of common and more unusual patterns in the data (Baker, 2010a:9). Sign language corpus linguistics aims
to empirically ground sign language description in usage in order to test previous language-specific or typological claims and generate new observations (Johnston, 2014:2).
Corpus linguistics is itself a relatively recent branch of linguistics, since its feasibility depends upon the personal computers that became widely available in the 1990s (Baker, 2010a:5). The corpus linguistics branch of sign language studies is even greener, though blossoming as scholars sample its fruits. The corpus-based documentation of sign languages has become possible in part due to major technological developments in the representation and maintenance of digital video data (Schembri et al., 2013). A key example is the availability of corpus tools through ELAN software, described in section 3.5.
Written language corpora may be very large indeed, comprising over 100 million words (Baker, 2010a). Comparatively, sign language corpus studies
tend to be based on much smaller datasets […] due to technological limitations and the lack of (for nearly all signed languages) a comprehensive lexical database (Schembri et al., 2013:8).
Currently, the largest sign language corpus is that of NGT (Crasborn, Zwitserlood & Ros, 2008) which has around 143,500 tokens. Others include the Auslan corpus (Johnston, 2008) and the BSL corpus (Schembri et al., 2011), and corpora for many other sign languages are now being compiled (see the Sign Linguistics Corpora Network at www.ru.nl/slcn for details). Once enough work has been conducted on different corpora, it will be possible to compare the frequency of features cross-linguistically. A very preliminary example of such work is presented in Table 5.1 for completive markers, although more work is needed on individual corpora and on calibrating
annotation practices across teams of corpus linguists before detailed and accurate comparisons can be made.
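The reason that raw counts cannot be compared directly across corpora of different sizes can be illustrated with a minimal sketch. It assumes that each corpus supplies a token count for the feature and a total corpus size; the corpus names and all figures below are invented for illustration and do not come from any of the corpora cited above.

```python
def per_thousand(count, corpus_size):
    """Return the frequency of a feature per 1,000 tokens."""
    if corpus_size <= 0:
        raise ValueError("corpus_size must be positive")
    return 1000 * count / corpus_size


# Hypothetical figures: (tokens of the feature, total tokens in the corpus).
corpora = {
    "corpus_A": (120, 40_000),
    "corpus_B": (90, 15_000),
}

# Normalising by corpus size makes the two rates comparable.
rates = {
    name: per_thousand(count, size)
    for name, (count, size) in corpora.items()
}

for name, rate in rates.items():
    print(f"{name}: {rate:.1f} per 1,000 tokens")
```

In this invented case the smaller corpus has fewer tokens of the feature but a higher rate, which is why normalisation, along with comparable annotation practices, would be needed before drawing cross-linguistic conclusions.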
Palfreyman, Sagara and Zeshan (in press) report that the vast majority of corpora are for urban sign languages in Western countries, with only a few in non-Western areas, including that of de Vos (2012a). This means that the corpus created for this investigation is an important resource for sign linguistic typology, as well as for the study of Indonesian varieties. While the precise meanings of terms such as ‘corpus-based’ and ‘corpus-driven’ are debated in the literature (Baker, 2010a), the current investigation is perhaps better described as ‘corpus-assisted’ (a term used by Partington, 2006). Although there is extensive quantitative and qualitative analysis of corpus data (see sections 3.1 and 3.3), other forms of data are used alongside this, as described in 3.1.2. The existence of an annotated corpus allows for several kinds of analysis, such as concordancing, collocations, keywords and dispersion (Baker, 2010a). These analyses will undoubtedly be important in future, but the use of the corpus is restricted here to answering the research questions stated in section 1.6.49 This entails looking at frequencies and, more generally, at which variants occur for the target variables.
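As a rough illustration of what looking at frequencies and variants involves, the sketch below tallies the variants that occur for each variable and their share of that variable's tokens. It assumes that annotations can be exported as (variable, variant) pairs; the labels are placeholders and the data are invented, so it should not be read as the procedure or the results of this investigation.

```python
from collections import Counter, defaultdict

# Hypothetical annotation records exported from an annotated corpus.
tokens = [
    ("completive", "variant_A"),
    ("completive", "variant_B"),
    ("completive", "variant_A"),
    ("other_variable", "variant_X"),
    ("completive", "variant_C"),
    ("other_variable", "variant_Y"),
    ("other_variable", "variant_X"),
]


def variant_distribution(records):
    """Map each variable to {variant: (count, share of that variable's tokens)}."""
    counts = defaultdict(Counter)
    for variable, variant in records:
        counts[variable][variant] += 1
    return {
        variable: {
            variant: (n, n / sum(counter.values()))
            for variant, n in counter.most_common()
        }
        for variable, counter in counts.items()
    }


distribution = variant_distribution(tokens)
for variable, variants in distribution.items():
    for variant, (n, share) in variants.items():
        print(f"{variable}\t{variant}\t{n}\t{share:.1%}")
```

A tally of this kind shows which variants are attested and in what proportions, but it says nothing about variants that happen not to occur in the sample.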
It is important to be aware of the limitations of a corpus. In particular, complete paradigms of target structures do not always occur, even in large corpora of spontaneous data (Palfreyman, forthcoming). As Johnston (2014b:2) notes:
a corpus can only disprove claims that a phenomenon is categorical, obligatory or even typical; it cannot prove that some phenomenon does not or cannot occur, if such a claim were to have been made.
Additionally, while the analysis of empirical data from real-life language use is essential for sociolinguistic studies (3.1.1), target structures in a sign language corpus are rarely straightforward, and the multiplicity of indeterminate contexts may be difficult to analyse (Johnston, 2014a). This is related to another point: the analysis of some contexts in a corpus may be complex enough to require introspection. This becomes especially relevant to the current investigation when dealing with the sub-functions that forms may express, and in section 5.6 it is noted that corpus methods must be supplemented by introspective judgements from deaf Indonesians who are proficient in the varieties in question before the analysis of completives can progress much further. That said, Johnston (2014b:23) suggests that the sociolinguistic situation of signers provides another reason to prioritise sign language corpora. Spoken languages tend to have ‘idealised native speakers’ who can provide judgements that reflect their linguistic competence with some reliability. However, the unique transmission patterns of sign languages, and the fact that the vast majority of signers were not exposed to the language from birth (1.3), may call into question the advisability of analyses that are based heavily on introspection alone. The insights afforded by corpora may therefore be even more crucial for sign languages than for spoken languages.
49 Indeed, a keyword analysis of datasets from the two urban varieties currently included in the Corpus of Indonesian Sign Language Varieties, to examine similarities between their relative frequencies, would be highly informative.