1.11 Grammar and lexicon
A language is made up of two independent but interlocking parts—grammar and lexicon. The grammar is a little like a city centre—well-traversed thoroughfares, feeding into each other, replete with signs and signals and short cuts. The lexicon is somewhat akin to a parking lot—full of vehicles which will leave as needed, to engage in traffic within the city.
The wherewithal of grammar consists in small systems, such as gender, case, tense, and types of complement clause. Each system is closed; that is, new members may not (save in exceptional circumstances) be added. The terms in a system may be exhaustively listed, each being fully defined by the exclusion of all others. In the three-term number system of Kayardild (see Table 1.1 in §1.4), ‘dual’ can be specified as ‘neither singular nor plural’. English has seven personal pronouns. Suppose that I am thinking of a pronoun in English. It is not (quoting subject forms) we or he, she or they, it or I. What is it? It must be the second person pronoun, you, which can be defined as being complementary to the other six.
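The point that a term in a closed system is fully defined by the exclusion of all the others can be sketched in a few lines of Python. This is an illustration only: the names `ENGLISH_SUBJECT_PRONOUNS` and `identify_by_exclusion` are invented here, and the set simply lists the seven subject forms quoted above.

```python
# A closed system: the seven English personal pronouns (subject forms)
# quoted in the text. Both names below are illustrative, not standard.
ENGLISH_SUBJECT_PRONOUNS = frozenset(
    {"I", "you", "he", "she", "it", "we", "they"}
)


def identify_by_exclusion(system, excluded):
    """Return the single term of `system` that is not in `excluded`.

    Raises ValueError unless exactly one term is left over.
    """
    remaining = set(system) - set(excluded)
    if len(remaining) != 1:
        raise ValueError(f"{len(remaining)} candidates remain; no unique answer")
    return next(iter(remaining))


# "It is not we or he, she or they, it or I. What is it?"
answer = identify_by_exclusion(
    ENGLISH_SUBJECT_PRONOUNS, ["we", "he", "she", "they", "it", "I"]
)
print(answer)
```

The function works only because the whole system can be listed in advance; that is what being closed amounts to in this sketch.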
One grammatical system may depend on others. Gender is found only for 3rd person singular in English pronouns. For Tucano (see §1.5), there is a system of five evidentiality choices in past tense, just three (omitting assumed and reported) in the present, and no evidentiality specification at all in future tense. In Amele (Gum family, Papuan region), there are three past tenses and two futures within positive polarity, but just one of each for a negative clause. (See §3.19.)
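One way to picture a dependency between systems is as a lookup from the term chosen in one system to the terms available in another. The sketch below encodes only the Tucano facts just stated, using the evidentiality labels given later in this section (seen, heard, inferred, assumed, reported). The names `EVIDENTIALS_BY_TENSE` and `is_valid_choice` are hypothetical, and treating the evidential as required wherever one is available is an assumption made for the illustration.

```python
# Evidentiality terms, using the labels given later in this section.
EVIDENTIALS = ("seen", "heard", "inferred", "assumed", "reported")

# Choices available in each tense, as stated in the text: five in past,
# three in present (omitting assumed and reported), none in future.
EVIDENTIALS_BY_TENSE = {
    "past": EVIDENTIALS,
    "present": tuple(e for e in EVIDENTIALS if e not in ("assumed", "reported")),
    "future": (),
}


def is_valid_choice(tense, evidential=None):
    """Check an evidentiality choice against the tense it depends on.

    Assumption for this sketch: where a tense offers evidentials, one
    must be chosen; where it offers none, none may be given.
    """
    allowed = EVIDENTIALS_BY_TENSE[tense]
    if not allowed:
        return evidential is None
    return evidential in allowed


print(is_valid_choice("past", "reported"))     # allowed in past tense
print(is_valid_choice("present", "reported"))  # not available in present
print(is_valid_choice("future"))               # no evidential in future
```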
Every grammatical system has limited size so that its terms can be—and should be—exhaustively listed within a statement of the grammar: all the pronouns, prepositions, articles, interrogatives, noun classes, every type of complement clause, each of the possessive constructions, ways of forming a causative, and so on.
Whereas a grammar involves closed systems, a lexicon consists of open classes—typically noun, verb, and adjective. We found it possible to specify the pronoun you by saying that it was not any of the other terms from the English pronominal system (not I, he, she, it, we, or they). This would not be possible for a lexeme. (I’m thinking of a noun and it is not aardvark, abacus, acacia, adenoids, . . . What is it? Can’t be done.) Lexical classes typically have large membership on which no upper limit may be placed. While I’m attempting to list all the nouns I know, new ones will be coming into being—created from within the language or borrowed from other tongues. The task would have no end.
A grammatical form will be fully specified within the grammar (for example, she is 3rd person singular feminine subject pronoun), a lexeme partially so. Dog, in English, is a countable noun (it can take plural -s). But exactly the same grammatical profile is provided for cat and horse and crocodile. It is the role of lexical entries to distinguish between these various count nouns. And similarly for adjectives such as red, blue, and yellow, for verbs such as ask, request, and demand, for adverbs such as mainly, mostly, and chiefly. (Some remarks on the ideal nature of a lexicon are in Chapter 8 below.)
The description of a language has two parts. The grammar deals—in as much detail as is considered necessary—with the underlying categories and structure (with a chapter on their phonological realizations). The lexicon, or dictionary, lists as many as possible of the lexical forms, which can slot into grammatical constructions. Grammar and lexicon are essentially separate components, with considerable cross-referencing between them.
A dictionary exists to deal with lexemes, those forms which are not uniquely specified by the grammar and require definitions to tell them apart. It is neither necessary nor appropriate to include in a dictionary grammatical forms, which are uniquely defined within the grammar. Some of the earliest dictionaries followed this practice, listing just lexemes and excluding fully grammatical items such as what or that or the or to. Then it became the custom to list in a dictionary every word, even those which are fully specified within a grammar of the language. Nowadays some dictionaries even include affixes, like un- (as in untie) and -th (as in truth). They list grammatical forms, but without providing information concerning the composition of the grammatical system to which the item belongs, although it is the contrast with other terms in its system which characterizes the grammatical form’s function and meaning.
Sadly, dictionary and grammar—at least for the major languages—tend to be compiled by separate groups of scholars, with different aims and methods. Ideally, dictionary and grammar should be produced in concert, with ample cross-referencing. If it is considered useful to include grammatical forms in the overall alphabetical list, all that is required is a reference to those sections of the grammar in which they are fully discussed.
Just as grammatical forms do not require ‘definitions’ in a dictionary, so a clear division should be made between lexemes and grammatical elements in, say, parsing a sentence. An unfortunate trend in modern studies of a language like English is to treat each orthographic word equally, taking no account of whether it is a lexeme or a fully grammatical unit.
For example, the phrase to the fat man (from the sentence He gave an apple to the fat man) is typically assigned a ‘tree structure’ something like:
(32) [tree diagram over the phrase: to the fat man]
What we have here is two lexemes, fat and man, marked by the definite article, the, and by the preposition to. Although written as separate words, the two fully grammatical items are pronounced as proclitics (‘=’ indicates a clitic boundary):
(33) /tə=ðə=fǽt mǽn/
The is not a constituent of the phrase; it is a grammatical form stating that fat man has definite reference. In similar fashion, to is a marker of the function of the fat man in the clause, indicating that it refers to the recipient of an act of giving.
In some languages, definiteness is shown by an affix, rather than by a clitic (written as a separate word) as in English. It would then be clear that the definite marker is not a lexical-type constituent of the phrase, in the way that fat and man are, but the realization of a grammatical category.
To the fat man is a noun phrase, marked by preposition to (which in this context indicates benefactive function). Some linguists call to the fat man a ‘prepositional phrase’, with a binary split into constituents to and the fat man (and some go further, and say that to is the ‘head’ of this ‘prepositional phrase’). But in Latin, for instance, ‘to the fat man’ would be vir-ō obēs-ō, where the -ō ending on both vir- ‘man’ and obēs- ‘fat’ marks masculine singular dative (Latin has no grammatical category of articles). One surely wouldn’t call vir-ō obēs-ō a ‘case phrase’ (although this would be the logical extension of calling to the fat man a ‘prepositional phrase’). And one surely wouldn’t pick out the repeated ending -ō as an immediate constituent of vir-ō obēs-ō (and as the head of this phrase!); neither should to in English be treated as a lexical-type constituent. The English phrase is most appropriately represented by something like:
(34) benefactive relator(to) [fat man] definite(the)
There is further discussion of prepositions in §5.4 and §5.6.
Similar comments apply to clause linkers such as and. Some linguists represent the phrase cats and dogs as having three constituents (treating and on a par with cats and dogs). Others prefer binary splits and require two constituents; there is then a problem—is it [cats] [and dogs] or [cats and] [dogs]? What we have here, in fact, is two lexemes, linked by and (which is not a constituent, or part of a constituent, but rather a grammatical marker).
Some concepts are always dealt with through the lexicon rather than in the grammar, in every language—the contrasts between cat and dog, between laugh and cry, between white and red, and so on. Other types of information are always the province of grammar—marking a sentence as interrogative or imperative, showing what is subject and what is object, and other things of this nature.
But there are concepts which may be coded within a grammar in one language but are shown only by lexemes in another. The Australian language Yidiñ—like many languages from Africa, and elsewhere—has verbal suffixes -ŋali- ‘go and do’ and -ŋada- ‘come and do’. They can be illustrated with verb wuna- ‘sleep’ and imperative inflection -n in:
(35) wuna-n     wuna-ŋali-n       wuna-ŋada-n
     ‘sleep!’   ‘go and sleep!’   ‘come and sleep!’
That is, Yidiñ does by choice from a grammatical system what English requires lexemes, go and come, to achieve. (The ‘go and do’—or ‘do while going’—suffix is further illustrated in §4.9.)
Cross-linguistically, a number of what can be called ‘secondary concepts’ may be recognized. These are items which are coded within the grammar in some languages but dealt with through lexemes in others. They include ‘try’, ‘start’, ‘continue’, ‘cease’, ‘finish’. English, a language with a rather small set of suffixes, has all of these as verbs. It was shown in (d) of §1.10 that in (22) John began to paint the wall, begin is the syntactic main verb, taking a complement clause in O function; but semantically the complement clause verb paint is the focus of attention, with begin providing ancillary information about the activity. As mentioned in §1.10, the secondary concept ‘begin’ is expressed through a verbal suffix -yarra- in Dyirbal; added to verb baŋga- ‘paint’, we get baŋga-yarra- ‘begin to paint’.
Other secondary concepts include ‘want’ and ‘make’, which are again dealt with through lexemes in English but by grammatical affixes in many languages. For example, Luiseño, a Uto-Aztecan language, has a desiderative suffix -viču- which can be added to lexical root ŋée- ‘leave’, giving ŋée-viču- ‘want to leave’. Causative suffix -ni- may also be used with this verb, yielding ŋée-ni- ‘make (someone) leave’. A verb may accept both suffixes—ŋée-viču-ni ‘make (someone) want to leave’. And, indeed, there can be two instances of the desiderative flanking the causative suffix, in ŋée-viču-ni-viču- ‘want to make (someone) want to leave’.
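The way these Luiseño suffixes stack can be sketched as function composition, with each suffix wrapping the meaning built up so far. The sketch assumes that a suffix always takes scope over everything to its left, which matches the four glosses given above but is not claimed as a general rule of Luiseño. The names `SUFFIXES` and `derive` are invented for the illustration, and the forms are written with ŋ for the velar nasal and č for the affricate.

```python
# Each suffix maps the gloss built so far to a new gloss.
# Only the two suffixes discussed in the text are included.
SUFFIXES = {
    "viču": lambda gloss: f"want to {gloss}",        # desiderative
    "ni": lambda gloss: f"make (someone) {gloss}",   # causative
}


def derive(root, root_gloss, suffixes):
    """Attach suffixes left to right; return (stem form, English gloss).

    Assumption: each suffix scopes over the whole stem to its left.
    """
    form, gloss = root, root_gloss
    for suffix in suffixes:
        form = f"{form}-{suffix}"
        gloss = SUFFIXES[suffix](gloss)
    return form + "-", gloss


for chain in (["viču"], ["ni"], ["viču", "ni"], ["viču", "ni", "viču"]):
    form, gloss = derive("ŋée", "leave", chain)
    print(form, "=", gloss)
```

Because the glosses are composed rather than listed, the same two entries account for all four forms cited. This mirrors the point that the suffixes belong to the grammar while only the root ŋée- needs a lexical entry.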
As a language develops over time, new elements will enter the grammar, often developments from lexemes. Yidiñ has lexemes gali- ‘go’ and gada- ‘come’. They would have been used with another verb, as in wuna-n gali-n (‘sleep-imperative go-imperative’) ‘go and sleep’. The two words merged, giving wunaŋgali-n, which reduced to the present-day wuna-ŋali-n, where -ŋali- is now one of the two terms in a grammatical system of derivational suffixes to a verb. Such grammaticalization of lexemes is a pervasive tendency as a language evolves over time. A lexical item, from an open class, may develop into a grammatical element; it is likely gradually to lose the old lexical meaning, and may take on a wholly relational role.
The noun side in English has a long history, originally meaning ‘the long part of a thing’. In Middle English there developed beside (a single word) as a preposition within the grammar, later becoming besides. The original meaning ‘by the side of’ took on a more abstract sense ‘in addition to’, as in Besides a gun, a soldier should also carry chocolate. Then besides also took on the role of clause linker (similar to moreover and however), as in I haven’t the time to see that film and, besides, I don’t like sloppy love stories.
The fact of lexemes being grammaticalized is indisputable. Some tense affixes developed from time lexemes, and case markers from things like body-part terms. However, linguists argue concerning the margin of what should be included under this label.
In order to produce an acceptable sentence, one must make a choice from a number of obligatory grammatical systems, depending on the language. The organization of a grammar determines what one must say. As mentioned in §1.5, in Tucano a speaker is obliged to state the evidence on which a statement is based by choosing one of the five terms from the evidentiality system (seen, heard, inferred, assumed, reported). In the Western Torres Strait language, a statement about the future must specify whether the time reference is to ‘later today’, ‘tomorrow’, or ‘beyond tomorrow’. In other languages, such types of information may be provided through lexemes, but as an optional matter.
To some extent, the types of categories in a grammar both reflect and motivate the way in which its speakers view the world about them. But they do not limit this. Estonian has no grammatical category of gender, yet speakers of Estonian are fully aware of differences between the sexes, and can—if they wish—add a noun ‘man’ or ‘woman’ to a basic sentence such as tema sööb ‘he/she eats’ in order to specify the physical gender of the person referred to. We can now briefly examine the recurrent classes of lexemes.