Understanding indexing remains a challenge because of its cognitive complexity; it is claimed that the process “seems not to be susceptible to precise rules” (Lancaster, 2003, p. 35). There is a surprising lack of empirical research, and what little has been done deals almost exclusively with text indexing.
There is a variety of theoretical discussions of indexing. The process has been summarised as follows:
The general consensus among indexers and theoreticians is that human indexers perceive...a text, interpret the message encoded in the text as they understand it (influenced by previous experience and current personal knowledge, including their interpretations of any instructions given them), and then describe their version of the message, plus any important text or document features, in accordance to rules and patterns for the type of index they are working on. (Anderson & Perez-Caballo, 2001a, p. 233)
This describes the simplistic two-step model, in which subject analysis is followed by translation into the system vocabulary, and it is the prevailing view of the indexing process (see, for example, Lancaster, 2003; Mai, 2005). Other models elaborate on this. The three-step model divides the analysis stage into two steps, examining the item to establish its subject content and then identifying the principal concepts, followed by their translation into the indexing language (see ISO, 1985). The four-step model subdivides the translation of subject concepts into two steps, rendering into the vocabulary and formulating the entry (for example, Chowdhury, 2004, p. 74).
Mai explores indexing in more detail. He initially proposes a three-step interpretative process linked to four elements (document, subject, subject description, subject entry). These he argues can be viewed as a set of closely related interpretations which, as indexers move from novice to expert, may become almost simultaneous (Mai, 2000, pp. 294-295). Subsequently (2001) he applies Peircean semiotics to understanding indexing and the multiple interpretations he proposes in his model of semiotic indexing.
Mai’s model represents the complexity of indexing but it provides no practical direction for indexers. More recently (2005) he suggests a domain-centred approach as an alternative to document-centred indexing. This approach analyses the domain, then user needs, then the indexers' perspectives, and finally the document in the context of the domain and user needs (p. 607).
The few empirical investigations relate to text indexing. They provide useful evidence and, assuming similar cognitive processes operate, guidance for image indexing.
David et al. (1995), after an experiment with four experienced indexers, propose indexing as a problem solving activity with five stages related to specific knowledge areas: document scan (knowledge of procedures/librarianship); context analysis (domain knowledge); concept selection (domain knowledge); translation into descriptors (thesaurus or domain knowledge); and revision (knowledge of indexing policies, users, and databases).
Sauperl (2002), in a study of 12 cataloguers, identifies five stages: examine the book and identify the topic; identify the author’s intent; infer or anticipate readers' uses; translate and relate the topic to the existing collection; and verify the topic in the classification and subject heading list. The process is not linear but iterative. Subsequently, Sauperl (2004) introduces a more sophisticated discussion of interpretation using Beghtol's classification theory, which looks at meaning from the perspectives of author, cataloguer, and reader (Beghtol, 1986a). Although Sauperl considers that the cataloguers in her study were aware of potentially different meanings, they developed the cataloguer’s meaning. Her study reveals:
six sources of inspiration for generating subject headings: (1) the document, (2) the cataloger's previous experience, (3) the cataloging practice and the catalog of the cataloger's library, (4) the catalogs of other libraries, the Library of Congress being the most authoritative, (5) the subject headings list, and (6) reference sources. (p. 62)
Only one, the document, is shared with the author, and one, information resources, with users. Sauperl concludes that "this implies that catalogers are more oriented toward their professional community" (p. 62). She suggests the strategy of using existing cataloguing to contain semiosis when describing a new book is further evidence that cataloguers only build common ground with other cataloguers.
Fujita et al. (2003), in a study of reading for indexing, identify two different levels of comprehension: micro integration and macro understanding of the indexer's own comprehension at a metacognitive level. The indexers employ different strategies through a variety of stages during which they keep objectives in mind, make associations with the documentary language, and maintain thematic coherence and global comprehension of the text. The researchers conclude the reader-indexer is more proficient than normal readers and needs linguistic knowledge, textual structure knowledge, and world knowledge. Other expertise effects are shown in a study of 20 text indexers (Bertrand et al., 1996), where indexers less familiar with the content identify fewer concepts and base decisions on surface-level features in comparison to more expert indexers. Cuing and prior knowledge, including knowledge of the documentary language, influence some concept choices.
A major theme in the literature is inter-indexer consistency (Olson & Wolfram, 2008, p. 602). Consistency has been judged critical to retrieving relevant items, and studies show varying degrees of inter-indexer consistency (Chan, 1989). However, consistency is not necessarily the same as correctness or quality (Fugmann, 1999; Lancaster, 2003, p. 77; Soergel, 1994, p. 593ff.). More than forty years ago Cooper (1969) made the point that inconsistency is the rule and what matters is the effect on retrieval, what he terms "indexer-requester consistency" (p. 270), and precision (p. 272). There is some evidence that visual material may produce low levels of consistency (Enser, 1995; Markey, 1984) but other evidence points to greater consensus for objective subjects.
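To make the notion of inter-indexer consistency concrete, it can be expressed as the overlap between the sets of terms two indexers assign to the same item. The sketch below is illustrative only and is not drawn from the studies cited above: the function name, the sample terms, and the choice of measure (shared terms divided by all distinct terms assigned) are assumptions, and other consistency measures would give different values.

```python
def indexer_consistency(terms_a, terms_b):
    """Return the share of distinct terms that both indexers assigned (0.0 to 1.0).

    Assumption: terms are compared as exact strings after trimming
    whitespace and lower-casing; synonyms are NOT treated as matches.
    """
    a = {t.strip().lower() for t in terms_a}
    b = {t.strip().lower() for t in terms_b}
    union = a | b
    if not union:
        # Assumption: two empty term sets count as full agreement.
        return 1.0
    return len(a & b) / len(union)


# Hypothetical terms assigned by two indexers to the same image.
indexer_1 = ["harbour", "fishing boats", "dawn"]
indexer_2 = ["harbour", "boats", "sunrise", "fishing boats"]

score = indexer_consistency(indexer_1, indexer_2)
print(round(score, 2))  # 2 shared terms out of 5 distinct terms -> 0.4
```

In this hypothetical case "dawn" and "sunrise" count as a disagreement even though they describe the same thing, which illustrates why a low consistency score need not indicate poor indexing.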
Over a decade ago, in a review of practice in 30 US institutions, McRae (2000, p. 4) decried the lack of knowledge and practice to guide professional indexers. The continuing lack of evidence about image indexing represents a basic gap in our understanding.