
A very common means of forming new words in Finnish is compounding. In this thesis, the term "compound" is used to refer to those words which are formed by concatenating two or more words without a space between them. Compounds are most often formed from nouns, but other parts of speech can also appear in compounds. By comparison, where Finnish uses compounds, English often uses MWEs, for example, eläin=laji[28] ("animal species" or "species of animal") and laki=kirja ("statute book").

Hakulinen et al. (2004, p. 388) differentiate between two main types of compounds. The most common type, determinative compounds, consists of constituents which have a semantically non-symmetrical relationship with each other. More precisely, the latter element is dominant and more significant for the meaning, whereas the former element modifies the latter (Hakulinen et al., 2004, p. 396). By way of illustration, ruoka=lusikka ("tablespoon") is a kind of spoon and kana=keitto ("chicken soup") is a type of soup. In the above examples, the first constituent of the compound is in the nominative, but other cases can appear as well, most often the genitive, which is indicated by the ending -n. This is the case, for instance, in the compounds ruoan=laitto ("cooking"; literally "food’s=making"), koiran=ilma ("bad weather"; literally "dog’s=weather"), and taivaan=sininen ("sky-blue"; literally "sky’s=blue"). It can also be the case that compound constituents differ from the basic form and never appear in the language in isolation or in an inflected form. This phenomenon is known as "casus componens" (Hakulinen et al., 2004, pp. 393–394, 402–404). For example, in the compounds hevos=jalostus ("horse breeding") and kolmi=loikka ("triple jump"), the first constituent never appears in isolation or inflected. Similarly, in the compounds kuusi=vuotias ("six-year-old") and vihreä=silmäinen ("green-eyed"), the second constituent does not appear in isolation or inflected. Moreover, in the compounds kansallis=mielinen ("nationalistic"; literally "national=minded") and seitsen=kertainen ("seven-fold"), neither the first nor the second constituent appears in isolation.

27 The 30th anniversary seminar of Lingsoft, a Finnish language technology company, was held on 25 November 2016, and, in fact, Kimmo Koskenniemi mentioned in his presentation that there are in all a quadrillion different word forms in Finnish.

The above examples comprise determinative compounds whose meanings are more or less the sum of the meanings of the compound constituents. However, there are also determinative compounds whose meanings cannot easily be deduced from the meanings of the constituents. Examples of such compounds are tieto=kone ("computer"; literally "knowledge=machine") and potku=housut ("playsuit" (for a baby); literally "kick=trousers"). Such items are referred to as "lexicalized compounds" in this thesis.

The second common compound type is that of copulative compounds. These consist of two or more compound constituents which are in a symmetrical relationship with each other. In other words, they represent the same part of speech and their relationship is semantically additive. A hyphen is often used to differentiate between the constituents (Hakulinen et al., 2004, p. 416). Examples of such compounds are the noun tutkija-opettaja ("researcher and teacher") and the adjective sini=vihreä ("blue and green"). Furthermore, numerals are also often written as one single word in Finnish and thus resemble copulative compounds, for instance, viisi=tuhatta=viisi=sataa=kuusi=kymmentä=kuusi ("five thousand five hundred sixty-six") (Hakulinen et al., 2004, p. 388).

Compounding is indeed a very productive means of word formation. For example, it is possible to form names for various soups by combining the names of the main ingredients with the noun keitto ("soup"). Thus, we get kala=keitto ("fish soup"), parsa=keitto ("asparagus soup"), etc. Similarly, an abundance of names for injuries can be produced by combining the names of different body parts with the noun vamma ("injury"): nilkka=vamma ("ankle injury"), kallo=vamma ("skull injury"), etc. Nor does the evident wealth of possibilities end here. By way of illustration, one can add the noun resepti ("recipe") at the end of all different types of soups, resulting in a multitude of new nouns, such as kala=keitto=resepti ("fish soup recipe"). Or one can add the noun spesialisti ("specialist") at the end of the above compounds indicating different types of injuries, again resulting in many new combinations, such as kallo=vamma=spesialisti ("skull injury specialist"). This type of productivity can eventually lead to very long words, such as kala=keitto=resepti=valikoima ("selection of fish soup recipes") and kallo=vamma=spesialisti=ryhmä ("group of skull injury specialists"). Complex compounds can even correspond to complete sentences. An example of such a case by Karlsson (1999, p. 242) is the compound prahassa=käymättömyys=kompleksi, which translates into English as "a complex about not having been to Prague". Clearly, the possible compounds are innumerable, and it would thus not be sensible or even possible to try to include all of them in a dictionary as entries; only the most commonly appearing ones are included.
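The combinatorial growth described above can be illustrated with a minimal Python sketch. The constituents are taken from the examples in this section, and the "=" sign marks the compound boundary as in the text. The helper name `build_compounds` is hypothetical, and the sketch only concatenates nominative base forms; it does not model genitive first constituents or casus componens forms.

```python
from itertools import product

# Constituents taken from the examples in this section.
INGREDIENTS = ["kala", "parsa", "kana", "liha", "kasvis"]
HEAD = ["keitto"]


def build_compounds(*slots):
    """Join one constituent from each slot into a compound (hypothetical helper)."""
    return ["=".join(parts) for parts in product(*slots)]


soups = build_compounds(INGREDIENTS, HEAD)
soup_recipes = build_compounds(INGREDIENTS, HEAD, ["resepti"])
recipe_selections = build_compounds(INGREDIENTS, HEAD, ["resepti"], ["valikoima"])

print(soups[0])              # kala=keitto
print(recipe_selections[0])  # kala=keitto=resepti=valikoima
print(len(soups) + len(soup_recipes) + len(recipe_selections))  # 15
```

Every new ingredient added to the first list yields a new compound at each level of extension, which is why a dictionary cannot realistically enumerate all such forms.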

As pointed out above, a compound usually functions as a single word in a clause. In an elliptic compound construction, however, one or more compound constituents, either at the beginning or at the end of the compound, can be omitted and replaced by a hyphen for abbreviation purposes in a list of compounds (Hakulinen et al., 2004, p. 420). Examples of such compounds are:

viini=pullo ja -lasi which is abbreviated from viini=pullo ja viini=lasi ("wine bottle and wine glass")

kana-, liha- tai kasvis=keitto which is abbreviated from kana=keitto, liha=keitto tai kasvis=keitto ("chicken soup, meat soup, or vegetable soup")

Elliptic compound constructions are very seldom found in dictionaries, but they appear relatively frequently in running text.
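The expansion of an elliptic compound construction into its full compounds can be sketched as follows. This is only an illustration under simplifying assumptions: the input uses the "=" boundary notation of this section, exactly one constituent has been omitted from each hyphenated fragment, and only the conjunctions ja ("and") and tai ("or") are recognized. The function name `expand_elliptic` is hypothetical.

```python
CONJUNCTIONS = {"ja", "tai"}  # "and", "or"; a deliberately incomplete list


def expand_elliptic(phrase):
    """Expand an elliptic compound construction into full compounds.

    Assumes "=" marks compound boundaries and that each hyphenated
    fragment is missing exactly one constituent.
    """
    items = [tok.strip(",") for tok in phrase.split()]
    items = [tok for tok in items if tok and tok not in CONJUNCTIONS]
    # Positions of complete compounds that can donate the missing constituent.
    full = [
        i for i, tok in enumerate(items)
        if "=" in tok and not tok.startswith("-") and not tok.endswith("-")
    ]
    expanded = []
    for i, tok in enumerate(items):
        if tok.endswith("-"):
            # Final constituent omitted: borrow it from the next full compound.
            donor = next(items[j] for j in full if j > i)
            expanded.append(tok[:-1] + "=" + donor.split("=")[-1])
        elif tok.startswith("-"):
            # Initial constituent omitted: borrow it from the previous full compound.
            donor = next(items[j] for j in reversed(full) if j < i)
            expanded.append(donor.split("=")[0] + "=" + tok[1:])
        else:
            expanded.append(tok)
    return expanded


print(expand_elliptic("viini=pullo ja -lasi"))
# ['viini=pullo', 'viini=lasi']
print(expand_elliptic("kana-, liha- tai kasvis=keitto"))
# ['kana=keitto', 'liha=keitto', 'kasvis=keitto']
```

In running text the boundary marker "=" is of course absent, so a real system would first have to locate the boundary inside the full compound before it could borrow the missing constituent.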

Understanding the meaning of a compound which consists of many constituents and is not included in a dictionary is not usually very difficult for a human reader, who can intuitively split such words and, if need be, look up the meanings of the constituents separately. However, this task is far more complicated for a computer, since a word that is not included in a dictionary or a lexicon remains unidentified. Thus, where there is a need to analyze Finnish text automatically, it is necessary to develop mechanisms which help the program to identify and process all possible instances of Finnish compounds. One such mechanism is the "compound engine" which we developed in the Benedict project. The compound engine will be described in section 3.3.2.