By nature the study is holistic, aiming at a wide-ranging perspective within one specific problem area. As already stated, at the general level the study is about translatability and equivalence in the context of multilingual and multicultural thesauri.
When studying translatability, there has to be a starting point – a translation unit with which to start.
The selection criterion is a theme. This yields a sample that is heterogeneous from the viewpoint of equivalence problems and translatability, yet homogeneous from the viewpoint of the sociological context. Within the theme, several different translation problems are presumed to be found, occurring on different levels: the concept level, the term level and/or the indexing term level. Family roles have been selected as the theme. It appears to represent gender-related terms and, more precisely, terms related to families and parenthood and to gender in working life.
In accordance with the idea of theoretical sampling, five indexing terms were selected for further examination: family roles, breadwinners, heads of household, homemakers, and housewives.
The above terms represent several different translation problems from the viewpoint of a multilingual thesaurus constructor. They are used as a starting point when examining the Finnish and British indexing practices and several thesauri. Translation problems are expected to occur because the division into different roles seems foreign and gendered from the Finnish perspective. The case is considered to represent a typical translation problem within abstract social science discourse.
Chapter 4.2, The social background, discusses part of the sociological background of the above-mentioned terms from the point of view of gendered participation in working life and of paternity leave arrangements. In order to understand the differences in indexing practices it is essential to be aware of the sociological context (Finnish and British practices) of the studied terms. In accordance with the research questions, the emphasis is on Finnish practices, and it is therefore these that are stressed.
6.3.2 Thesauri
To provide context and a broader perspective, a total of nine thesauri have been selected. The aim is to identify their similarities and differences, using as a sample the theme described in the previous sub-chapter, 6.3.1.
When selecting the thesauri, the aim was to find both general and specific indexing and information seeking tools that are commonly used and available online. In accordance with the aim of the study, which is to examine translatability and equivalence into Finnish, the selected thesauri focus mainly on the social sciences and are mainly in English. Most are designed for use with a specific database and collection, some for a much wider audience. All are well known in the Finnish university library context.
The nine thesauri used to examine and compare the representation of the studied terms (case family roles) are:
• multilingual UNESCO Thesaurus, Eurovoc and ELSST
• monolingual HASSET (Br), SOSIG (Br), CSA Thesaurus of Sociological Indexing Terms (En-Am/In), ICPSR Subject Thesaurus (En-In), ERIC (En-Am) and YSA (Fi).
In accordance with the research design, more emphasis is given to YSA. The thesauri studied are described in more detail in the following section, covering, for example, their content, purpose and constructor.
The empirical terminological case is intended to represent a problematic case for the informants. It is evaluated with tools adopted from translation science. Their application calls for special care, since thesauri as a text type are distinctive and differ from what is traditionally considered a translation object when equivalence is discussed within translation science. Although the typologies used are not directly transferable to the new area of application, multilingual thesaurus construction, they are considered to provide the necessary analytical tools and perspectives for the analysis, which cannot be found in traditional LIS literature. Finally, it is important to keep in mind that translation units represent not only different languages and cultures, but also different discourses and sub-cultures (see chapter 3.4 Translatability and equivalence).
YSA
The YSA Thesaurus (Yleinen suomalainen asiasanasto) is a general thesaurus in Finnish and it covers all fields of research. The Thesaurus is maintained by the National Bibliographic Services. YSA has been used for indexing Finnish publications since 1987 in public and scientific libraries and data archives. (Helsingin yliopiston kirjasto 1999)
YSA includes approximately 14 000 preferred terms and 3 000 non-preferred terms. It has been created following the SFS 5471 standard on the construction and maintenance of Finnish-language thesauri (Suomenkielisen tesauruksen laatimis- ja ylläpito-ohjeet). (Ibid.)
YSA is meant for the indexing and information retrieval of books, articles, electronic material and other material types. The vocabulary is stated to help information recorders and seekers to use a shared language. Its purpose is also to be a general source vocabulary when developing special vocabularies. (Ibid.)
YSA is included in VESA – Verkkosanasto (VESA-Webbthesaurus), which also includes a special thesaurus for music and a Swedish translation (the Allärs thesaurus). (Ibid.) In this study YSA is used online.
HASSET
The UK Data Archive has developed the Humanities and Social Science Electronic Thesaurus, HASSET, for use with its retrieval system BIRON. (BIRON, Bibliographic Information Retrieval Online, is a WWW interface providing access to a complete source of information about studies in the UK Data Archive's collection.)
The purpose of HASSET (see UKDA 2002) accords with the general aims of thesauri (cf. chapter 3.3).
HASSET includes over 4,000 preferred terms, 2,500 non-preferred terms, 260 standalone terms and 330 hierarchies, and it is based on the UNESCO thesaurus (ibid.).
It is also widely used outside the UKDA: for example, the material used in SOSIG's General Social Science Thesaurus (see footnote 50) is derived from HASSET.
ELSST
Language independent metadata browsing (LIMBER) across several European social science data archives uses the multilingual ELSST (European Language Social Science Thesaurus), which is derived and translated from the current UKDA HASSET. The aim is to reduce the present HASSET hierarchies and remove all cultural and institutional specificities. In addition, new areas such as methodology will be added. (Miller & Matthews 2001)
50 See URL: <http://sosig.ac.uk/help/thesaurus.html>
“--- The resulting broad-based social science thesaurus will be suitable for use by any resource in the social science domain. Due to time limitations, a target of 1500 preferred terms from a minimum of 20 hierarchies has been set. The thesaurus will also include all synonyms to these terms and all top terms of hierarchies in the existing HASSET that either map to existing thesauri or which, although not in the major 20 hierarchies, would have been present if resources were available. Each hierarchy will be sent to the CESSDA archives for evaluation of coverage and usefulness.
As each hierarchy is reduced it will be translated --- Although it is hoped that, at this broad level, one-to-one equivalence will be possible for the vast majority of terms, the format will allow for non-equivalence and different structures in each language.
Extensive use of scope notes will resolve ambiguities, translation assumptions and subject coverage of hierarchies. The translated hierarchies will be sent to the appropriate archives of CESSDA for evaluation and addition of language specific synonyms.” (Ibid.)
ELSST is divided into thematic parts, which (in the working version) are:
Addiction, Age groups, Attitudes, Disadvantaged groups, Discrimination, Ethnic groups, Equipments, Families, Family environment, Housing, Offences, Economics, Labour and employment, Political institutions, Political systems, Politics, Social problems, Social structure, Social welfare, Sociology, Analysis, Conflict, Data, Demography, Development, Emotional states, Environmental changes, Environmental sciences, Human behaviour, Identity, Life histories, Methodology, Nationality, Quality, Businesses, Consumption, Education, Educational environment, Health, Human behaviour, Human settlement, industries, membership, Population migration, Products, Property, Ownership and tenure, Religion and Resources.
In September 2001, the working version of ELSST included approximately 1,500 preferred terms, 860 non-preferred terms and 270 standalone terms divided into ten major hierarchies and thirteen additional hierarchies.
As of September 2009, ELSST is used in the MADIERA portal, which provides unified access to European data resources. The portal describes the thesaurus as follows:
“The ELSST is a multilingual social science thesaurus. It is available in German, Danish, Greek, English, Spanish, Finnish, French, Norwegian and Swedish. It includes more than 3000 terms.
There are two versions of ELSST in the portal. The ELSST version matches on keywords only. The ELSST Free Text version matches on a few key text fields e.g. title, abstract, keywords, variables and subject.” (MADIERA 2009)
SOSIG General Social Science Thesaurus
SOSIG is funded by the Electronic Libraries Programme and by the Economic and Social Research Council. It is based in the Institute for Learning and Research Technology at the University of Bristol, and has been used as a model for the creation of several UK based gateways in other subject areas. (Worsfold 1999)
The Social Sciences Information Gateway (SOSIG) aims at locating "high quality sites on the Internet, which are relevant to social science education and research." It provides three different thesauri to aid in searching. In this study, the General Social Science Thesaurus is used. This was developed to "provide alternative terms that will generate hits in the SOSIG Internet Catalogue." The SOSIG Thesaurus is derived from the HASSET Thesaurus of the UK Data Archive (SOSIG 2001) and was developed in co-operation with several institutions:
“SOSIG will be working with the UK Data Archive, IBSS [13], the Centre for Economic Performance at LSE [14] and Qualidata [15] to establish a social science Thesaurus based on HASSET. Keywords used by the contributing services which are not currently held in HASSET will be submitted as candidate terms which can then be included in updates of the Thesaurus, thereby increasing the value of the Thesaurus to all users of the various services and avoiding duplication of effort.” (Hooper 1997)
The SOSIG Thesaurus is constructed according to generally accepted principles of thesaurus construction.
Eurovoc
The Eurovoc thesaurus is published in the official languages of the European Community and it thus includes English and Finnish. All the languages have equal status. The Eurovoc Thesaurus covers the fields in which the European Communities are active, i.e. politics, international relations, European Communities, law, economics, trade, finance, social questions, education and communications, science, business and competition, employment and working conditions, transport, environment, agriculture, forestry and fisheries, agri-foodstuffs, production, technology and research, energy, industry, geography, international organisations. (EC 2001)
Eurovoc is divided into 21 fields and 127 microthesauri including altogether 6,075 descriptors (of which 508 are top terms). In English it also includes 516 scope notes and 5,672 non-descriptors. In Finnish it includes 628 scope notes and 4,817 non-descriptors. (EC 2001) It is continuously being updated, and in 2009 (V4.3, last visited 29.1.2009) Eurovoc comprised 6,645 descriptors (of which 519 are top terms). The English version had 759 scope notes and 6,769 non-descriptors, while the Finnish version had 859 scope notes and 5,445 non-descriptors. (Ibid.)
The languages have equal status, but as regards equivalence it is stated that “there is no equivalence between the non-descriptors in the various languages, as the richness of the vocabulary in each language varies from field to field”.
The equivalence relationship between descriptors and non-descriptors is shown by the commonly used abbreviations “UF” (Used For) and “USE”. It is also stated that the equivalence relationship covers relationships of several types, such as “genuine synonymity, or identical meanings; near-synonymity, or similar meanings; antonymy, or opposite meanings; inclusion, when a descriptor embraces one or more specific concepts which are given the status of non-descriptors because they are not often used”. Thus the numbers of non-descriptors and scope notes vary from language to language. (Ibid.)
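The UF/USE notation can be illustrated with a minimal sketch in Python. It is not taken from Eurovoc or any other thesaurus discussed here: the class name `ToyThesaurus`, its methods and the example term pair are hypothetical and chosen only for illustration. The sketch stores each equivalence relationship in both directions, so that a non-descriptor can be resolved to its descriptor (USE) and a descriptor can list the non-descriptors it replaces (UF).

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class ToyThesaurus:
    """Hypothetical monolingual thesaurus holding only equivalence relationships."""

    # descriptor -> non-descriptors it is "Used For" (UF)
    used_for: Dict[str, Set[str]] = field(default_factory=dict)
    # non-descriptor -> descriptor to "USE" instead
    use: Dict[str, str] = field(default_factory=dict)

    def add_equivalence(self, descriptor: str, non_descriptor: str) -> None:
        """Record one equivalence relationship in both directions."""
        self.used_for.setdefault(descriptor, set()).add(non_descriptor)
        self.use[non_descriptor] = descriptor

    def preferred(self, term: str) -> str:
        """Return the descriptor for a term; a descriptor maps to itself."""
        return self.use.get(term, term)


# Invented example pair; it does not reproduce any real thesaurus entry.
example = ToyThesaurus()
example.add_equivalence("homemakers", "housewives")
print(example.preferred("housewives"))
print(sorted(example.used_for["homemakers"]))
```

Because each language version would hold its own `use` mapping in such a model, the number of non-descriptors can differ between languages even when the descriptors correspond one to one.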
The Eurovoc thesaurus has been compiled in accordance with the standards of the International Organization for Standardization, namely ISO 2788-1986 - Guidelines for the establishment and development of monolingual thesauri; and ISO 5964-1985 - Guidelines for the establishment and development of multilingual thesauri. What is noteworthy, and contrary to common practice, is the preference for the singular form: “You can look for a descriptor or a non-descriptor using an expression, a term or part of a term. Enter the term(s) in the singular, then click on "search".” (Ibid.)
UNESCO Thesaurus
The UNESCO Thesaurus is developed by the United Nations Educational, Scientific and Cultural Organisation. It covers the subject fields of education, science, culture, social and human sciences, information and communication, and politics. It is constructed to facilitate subject indexing in libraries, archives and similar institutions. It was first published in 1977 and the second edition was issued in 1995. (UNESCO 2001)
The UNESCO Thesaurus is widely known and used, and it has also served as a basis for other thesauri, e.g. HASSET.
ERIC
ERIC, the Education Resources Information Center, is an online digital library of education research and information. It is sponsored by the Institute of Education Sciences (IES) of the U.S. Department of Education. “ERIC provides ready access to education literature to support the use of educational research and information to improve practice in learning, teaching, educational decision-making, and research.” (ERIC 1999, 2009)
The ERIC thesaurus is constructed according to the general thesaurus construction rules and is actively updated. ERIC is used world-wide.
“The Thesaurus of ERIC Descriptors (Thesaurus) is a controlled vocabulary - a carefully selected list of education-related words and phrases assigned to ERIC records to organize them by subject and make them easier to retrieve through a search.
Searching by Descriptors involves selecting relevant terms from this controlled vocabulary to locate information on your topic.”
“ERIC has an ongoing commitment to maintain the Thesaurus of ERIC Descriptors (Thesaurus). In addition to adding new terms, ERIC may modify the status of an existing Descriptor if it has been rarely used in indexing, overlaps with other terms, or becomes obsolete. ERIC may also reinstate terms if necessary. These changes, generally based on literary warrant, are considered routine maintenance, and are in accordance with standard practices of thesaurus development and maintenance as outlined and defined in ANSI/NISO Standard Z39.19-2005, Guidelines for the Construction, Format, and Management of Monolingual Controlled Vocabularies.
Other Thesaurus maintenance activities include updating hierarchical relationships and adding or changing scope notes.” (ERIC 2009)
CSA Sociological Abstracts Thesaurus
The Sociological Abstracts Thesaurus contains an alphabetical listing of Main Term descriptors used for indexing and searching the SA database and printed index, beginning with the April 1986 issue. (CSA 2001) It is produced by Cambridge Scientific Abstracts, CSA, which is a privately owned information company located in the United States with offices in Hong Kong, France and the UK.
The thesaurus is constructed according to the thesaurus construction standards.
Compared with thesauri in general, the Sociological Abstracts Thesaurus is exceptional in that it also includes detailed history notes, which “provide the range of years in which a term was in use, its former Descriptor Code, and the word form if it has changed. Often they provide search instructions. History Notes appear for both Main Terms and discontinued terms” (CSA 2001).
ICPSR
The Inter-University Consortium for Political and Social Research, ICPSR, was established in 1962. ICPSR is the world's largest archive of digital social science data and an active partner in social science research and instruction throughout the world. It acquires, preserves, and distributes original research data, and also provides training in its analysis. It also offers access to publications based on its data holdings. Physically it is a unit within the Institute for Social Research at the University of Michigan, but it is a multinational organisation. ICPSR is a membership-based organization, with over 640 member colleges and universities around the world (including Finnish universities via the Finnish national membership of the Finnish Social Science Data Archive, FSD). (ICPSR 2009a)
ICPSR provides a thesaurus, which is composed of three separate lists: Subject Thesaurus, Personal Names Authority List, and Geographic Names Thesaurus. In this study only the Subject Thesaurus is used.
“Subject Thesaurus is an alphabetical listing of social science subject terms. The scope of this thesaurus is multidisciplinary and is intended to reflect the subject range of the ICPSR archive. Social science disciplines represented include: political science, sociology, history, economics, education, criminal justice, gerontology, demography, public health, law, and international relations.” (ICPSR 2009b)
ICPSR has compiled a bibliography of the reference documents and thesauri that were used to prepare the ICPSR controlled vocabulary system, and links to these are provided on its website. Development of the ICPSR Thesaurus was supported by the National Science Foundation (SES-9977984). The structure and format conventions used to construct the Subject Thesaurus follow the recommendations outlined in the Guidelines for the Construction, Format, and Management of Monolingual Thesauri, Z39.19-1993 (NISO 1993). (Ibid.)
The sources consulted (reported in Thesaurus refs. Sullivan Feb. 7, 2002) were varied. A total of 25 were listed, and they include several classics in the field of thesaurus construction, such as Aitchison, Gilchrist and Bawden (2000): Thesaurus Construction and Use: A Practical Manual; Cleveland and Cleveland (2001): Introduction to Indexing and Abstracting; Hjørland (1997): Information Seeking and Subject Representation: an Activity-Theoretical Approach to Information Science; and Lancaster (1991): Indexing and Abstracting in Theory and Practice. The thesauri used were also varied and totalled 26, of which 24 are online. They are commonly known thesauri, mostly in the field of social sciences.
6.3.3 Dictionaries
The six English dictionaries selected were gathered from the link pages of Finnish translation science departments (most from the School of Modern Languages and Translation Studies at the University of Tampere) and university libraries. The purpose is to use general-language dictionaries that are also considered useful in academic contexts and are freely available in university networks. Links on university web pages are thus taken as a guarantee of sufficiently high quality and usability. In addition, one Finnish-English dictionary is studied, which is commonly used in Finnish universities and provided by university libraries. The emphasis is not on dictionaries, and they are neither evaluated nor compared, but used to provide information about how the studied terms are understood in dictionaries in general. The dictionaries were studied in 2002-2003 and checked for possible changes at the end of the study, in June 2009.
The aim was to include British, American and international English dictionaries. In practice many of the dictionaries linked on Finnish university web pages are American-based but aim at international content, which is also reflected in this study. The information that the dictionaries provide to users on their web pages varied greatly. All reported the name of the publisher, but most did not state how many keywords they include, how the dictionary is updated, or what corpus was used.
The studied online dictionaries, most freely available on the Internet and all commonly known and widely used, are:
1 Newbury House Dictionary of American English
<URL: http://nhd.heinle.com/>
Heinle's Newbury House Dictionary of American English contains over 40,000 entries. It presents short definitions, sample sentences and idioms. (Newbury 2003)
2 WordNet
<URL: http://wordnet.princeton.edu/>
WordNet is a large lexical database for the English language provided by Princeton University, Cognitive Science Laboratory. The most recent version is 3.0, published in 2006, in which the total of all unique noun, verb, adjective, and adverb strings is 147278. (WordNet 2003 & 2009)
3 OED Online
<URL: http://dictionary.oed.com/>
OED Online is the Oxford English Dictionary (online version), published by Oxford University Press. It claims to represent international English. It has been available from March 2000, and provides authoritative definitions of over 500,000 words, traces the usage of words from their first recorded occurrence to the modern period through 2.5 million quotations from a wide range of international language