Most guidelines (as developed in the US and the UK) advocate grouping metadata elements into four categories: Required, Required if Applicable, Recommended and Optional. The basic purpose of the categorization is to identify the elements a user needs in a shared metadata environment. The guidelines are not format-specific; rather, they identify those elements commonly needed across all formats. An analysis of existing suggestions and guidelines shows the following categorization of metadata elements:
Required
Date Created or Date Published (dc:date)
Identifier (dc:identifier)
Institution Name (dc:publisher)
Title (dc:title)
Type of Resource (dc:type)
Required if Applicable
Creator (dc:creator)
Extent (dc:format)
Language of Resource (dc:language)
Related Item (dc:relation)
Recommended
Description (dc:description)
Access or Use Restrictions (dc:rights)
Format of Resource (dc:format)
Place of Origin (dc:coverage)
Rights Information (dc:rights)
Subject (dc:subject)
Resource Description for OA Resources
Interoperability and Retrieval
Optional
Citation (dc:relation)
Collection Name
Contributor (dc:contributor)
Genre (dc:type)
Keywords or Tags (dc:subject)
Language of Metadata Record (no dc map)
Notes (dc:description)
Publisher (dc:publisher)
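The categorization above can be turned into a simple completeness check. The following is a minimal sketch, assuming a record is held as a Python dictionary that maps each Dublin Core element to a list of values; that layout, the function name `check_record` and the sample values are illustrative assumptions, not part of any guideline.

```python
# Element categories taken from the lists above; keys are the Dublin Core
# mappings given in the text. The record layout (a dict of lists) is an
# assumption made for this sketch.
REQUIRED = ["dc:date", "dc:identifier", "dc:publisher", "dc:title", "dc:type"]
RECOMMENDED = ["dc:description", "dc:rights", "dc:format", "dc:coverage", "dc:subject"]


def check_record(record):
    """Return (missing_required, missing_recommended) for one record."""
    def is_missing(element):
        # An element counts as missing if it is absent or holds only blanks.
        values = record.get(element, [])
        return not any(v.strip() for v in values)

    missing_required = [e for e in REQUIRED if is_missing(e)]
    missing_recommended = [e for e in RECOMMENDED if is_missing(e)]
    return missing_required, missing_recommended


# Made-up sample record: the institution name (dc:publisher) is left out.
sample = {
    "dc:title": ["Annual Report"],
    "dc:identifier": ["hdl:0000/example"],
    "dc:date": ["2013-03-01"],
    "dc:type": ["Text"],
    "dc:rights": ["Open access"],
}
missing_required, missing_recommended = check_record(sample)
print("Missing required:", missing_required)
print("Missing recommended:", missing_recommended)
```

"Required if Applicable" elements are left out of the check on purpose, because whether they apply depends on the resource and cannot be decided from the record alone.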
The application of metadata to describe OA resources is guided by four principles that are independent of the metadata schema: i) Content Standards for Metadata (to guide what information should be recorded when describing a particular type of resource and how that information should be recorded); ii) Data Value Standards for Metadata (to help normalize data element sets and ensure consistency between records); iii) Structural Standards for Metadata (to guide the selection of the fields or elements where the data resides); and iv) Syntax Standards for Metadata (to guide the encoding of data values so that they can be processed by different systems).
Content Standards for Metadata
Content standards improve the ability to share metadata records and the discoverability of OA resources. Consistent description in metadata records helps users to understand and analyze search results efficiently. Metadata that is formatted inconsistently (e.g. names recorded both as “Last name, First name” and “First name / Last name”) impairs indexing and sorting, and users are left with confusing or incomplete results. OA content management software has adopted different levels of content standards for describing OA resources. For example, Greenstone digital library software includes no content standards for encoding DC.Creator (Figure 5), whereas DSpace and EPrints allow the Last Name and First Name of a creator to be entered separately. EPrints (Figure 6) also provides a help button (? mark) to help submitters encode a particular metadata element or field. DSpace, apart from maintaining content standards, provides examples and links to a help file to support resource description (Figure 7).
Figure 5: Content Standards for Metadata DC.Creator in Greenstone (Source: Greenstone software)
Figure 6: Content Standards for Metadata DC.Creator in EPrints (Source: EPrints software)
Figure 7: Content Standards for Metadata DC.Creator in DSpace (Source: DSpace software)
Apart from the content standards provided in software, library professionals may follow standards such as the Anglo-American Cataloguing Rules (AACR2), which cover the description of different formats and the provision of access points; Resource Description and Access (RDA), which guides content management using FRBR principles (work/expression/manifestation/item); Cataloging Cultural Objects (CCO), which covers the encoding of cultural heritage resources; and Describing Archives: A Content Standard (DACS), for single-level and multi-level descriptions of archives, personal papers and manuscripts.
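The inconsistency described in this section (names recorded as “Last name, First name” in one record and “First name / Last name” in another) can be reduced at data entry or during clean-up. The sketch below is one possible approach, not a rule from AACR2, RDA or any repository software; the function name `normalize_name` is hypothetical, and the fallback that treats the last word as the surname is a simplifying assumption that fails for multi-word surnames.

```python
def normalize_name(raw):
    """Return a personal name in the form 'Last name, First name'.

    Handles three input shapes: 'Last, First' (kept as is),
    'First / Last', and 'First Last' (the last word is assumed to be
    the surname, which is only a rough heuristic).
    """
    name = " ".join(raw.split())              # collapse stray whitespace
    if "," in name:
        last, first = [p.strip() for p in name.split(",", 1)]
    elif "/" in name:
        first, last = [p.strip() for p in name.split("/", 1)]
    else:
        parts = name.rsplit(" ", 1)
        if len(parts) == 1:                   # single-word name: nothing to invert
            return name
        first, last = parts
    if not first or not last:                 # one half empty: return what exists
        return last or first
    return f"{last}, {first}"


for variant in ["Mishra, Sanjay", "Sanjay / Mishra", "Sanjay   Mishra"]:
    print(normalize_name(variant))
```

All three variants come out in the same form, so they sort and index together. In practice an authority file is more reliable than string handling of this kind.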
Data Value Standards for Metadata
Standardization of data values is important for the retrieval and sharing of OA content. These standards prescribe normalized lists of terms to be used for certain data elements. They advocate the use of controlled terms, applied through thesauri, controlled vocabularies and authority files, to ensure consistency and to collocate resources related to the same topic or person. The recommended tools for standardizing data entry are:
Getty Art and Architecture Thesaurus (AAT) is a structured vocabulary for terms used to describe art, architecture, decorative arts, material culture, and archival materials.
Getty Thesaurus of Geographic Names (TGN) is a structured vocabulary for names and other information about places.
Getty Union List of Artist Names (ULAN) is a structured vocabulary for names and other information about artists.
Library of Congress Subject Headings (LCSH) comprises a thesaurus of subject headings, maintained by the United States Library of Congress.
Library of Congress Name Authorities (LCNA) includes Corporate Names, Geographic Names, Conference Names, and Personal Names.
Thesaurus for Graphic Materials I: Subject Terms (TGM-I) consists of terms and numerous cross references for indexing topics shown or reflected in pictures.
Thesaurus for Graphic Materials II: Genre and Physical Characteristic Terms (TGM-II) is a thesaurus of terms describing the genre and physical characteristics of graphic materials.
Figure 8: Content Standards for Metadata DC.Creator in EPrints
Many OA repository software packages support data value standards. For example, EPrints includes the entire Library of Congress subject areas to support standard encoding of the DC.Subject field, and DSpace includes a research category list (which must be activated through the DSpace configuration file) to help populate the DC.Subject field (see Figure 8). These data value standards are available to both cataloguers/indexers and searchers.
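The effect of a controlled vocabulary on collocation can be illustrated with a small lookup. This is a sketch under stated assumptions: the vocabulary entries below are made up for the example and are not quoted from LCSH or any other published list, and the function names `build_index` and `normalize_subjects` are hypothetical.

```python
# Illustrative mini-vocabulary: preferred term -> variant forms.
# These entries are invented for the sketch and are not taken from LCSH
# or any other published vocabulary.
VOCABULARY = {
    "Open access publishing": ["open access", "oa publishing"],
    "Digital libraries": ["digital library", "electronic libraries"],
}


def build_index(vocabulary):
    """Map every preferred and variant form (case-folded) to its preferred term."""
    index = {}
    for preferred, variants in vocabulary.items():
        for term in [preferred, *variants]:
            index[term.casefold()] = preferred
    return index


def normalize_subjects(subjects, index):
    """Split free-text subjects into (controlled terms, unmatched terms)."""
    controlled, unmatched = [], []
    for subject in subjects:
        key = " ".join(subject.split()).casefold()
        preferred = index.get(key)
        if preferred is None:
            unmatched.append(subject)          # left for a cataloguer to review
        elif preferred not in controlled:
            controlled.append(preferred)       # variants collapse to one heading
    return controlled, unmatched


index = build_index(VOCABULARY)
entered = ["Open Access", "digital library", "OA publishing", "Bibliometrics"]
controlled, unmatched = normalize_subjects(entered, index)
print(controlled)
print(unmatched)
```

Two of the entered variants collapse to a single preferred heading, which is what brings related resources together in a search; the term with no match is reported rather than silently discarded.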
Structural Standards for Metadata
A metadata structure consists of elements for the description of data. Structural standards define the fields, the scope of the fields and the type of information that needs to be stored (see Table 3 for DCMES). As a rule, it is better to apply a metadata structure that has a high level of granularity. The reason is simple: it is always easier to transfer metadata from a granular structure to a simpler one than the reverse. In some cases structural standards mandate which syntax standards should be used (for example, the W3C encoding rules for dates and times [42], based on ISO 8601). Structural standards for generic and domain-specific schemas generally follow some broad principles, such as:
Fields/elements should be unambiguous;
Fields/elements may be required;
Some fields/elements may be repeatable;
Some fields/elements may be mandatory;
Some fields/elements may have a unique value to identify the record (e.g. use of a DOI in DC.Identifier); and
Some fields may have defined relationships with other fields, e.g. qualifiers or subfields.
The UK Metadata Guidelines for Open Access Repositories (2013), in the document entitled “Phase 1: Core Metadata (Version 0.9)” published in March 2013, prescribe the following minimum fields/elements as a structural standard for OA resources (M – Mandatory, R – Repeatable and O – Optional) (Figure 9):
Figure 9: Core Metadata Inclusion Types
42 http://www.w3.org/TR/NOTE-datetime
This standard mostly recommends simple DCMES for OA repositories, with Qualified DC in two instances (dcterms:issued and dcterms:relation). The recommendations also include two new elements specific to OA resources: project ID (a unique identifier normally provided by the funder) and funder name. Most of the elements have the namespace 'dc', and the two new elements have the ‘rioxxterms’ namespace. This UK-specific guideline is based on the DRIVER project, the OpenAIRE Guidelines (OpenAIRE project [43]) and UKETD_DC (the metadata core set recommended by the British Library’s Electronic Theses Online Service, EThOS [44]). Please see section 1.5.3 for structural standards in different domains.
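The point made in this section, that metadata moves more easily from a granular structure to a simpler one, can be sketched as a mapping from qualified Dublin Core terms to the simple elements they refine. The four refinements listed are from the DCMI Metadata Terms vocabulary; the record layout (a list of element/value pairs), the function name `dumb_down` and the sample values are assumptions made for this example.

```python
# Refinements from the DCMI Metadata Terms vocabulary, mapped to the simple
# Dublin Core element each one refines. Only a handful are listed here.
REFINES = {
    "dcterms:issued": "dc:date",
    "dcterms:created": "dc:date",
    "dcterms:abstract": "dc:description",
    "dcterms:alternative": "dc:title",
}


def dumb_down(record):
    """Collapse a qualified record into simple DC.

    `record` is a list of (element, value) pairs. Elements without an entry
    in REFINES are passed through unchanged.
    """
    simple = {}
    for element, value in record:
        target = REFINES.get(element, element)
        simple.setdefault(target, []).append(value)
    return simple


# Made-up qualified record for illustration.
qualified = [
    ("dc:title", "Metadata for Open Access"),
    ("dcterms:alternative", "OA Metadata"),
    ("dcterms:created", "2012-11-05"),
    ("dcterms:issued", "2013-03"),
]
print(dumb_down(qualified))
```

After the conversion both dates sit under dc:date, and nothing in the simple record says which one was the date of issue. That loss is why the reverse transfer, from a simple structure to a granular one, cannot be done automatically.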
Syntax Standards for Metadata
These standards aim to make metadata machine readable. Structural standards generally prescribe the syntax standard(s) for fields/elements. Where a structural standard does not specify a syntax standard, library professionals should follow a syntax that enables the sharing of OA resources. Generally HTML, XML (Extensible Markup Language) and SGML (Standard Generalized Markup Language) are used as syntax standards for OA resources. The UK Metadata Guidelines for Open Access Repositories (2013) recommend a syntax standard for each metadata element listed in the previous section. One example is given here:
element: dc:creator
status: mandatory
scope: The creator of a resource may be a person, organisation or service. Where there is more than one creator, use a separate dc:creator element for each one. Enter as many creators as required.
standard (data value): The dc:creator element should take an optional attribute called “id”. This will hold a machine-readable unique identifier, where available, for the creator. Ideally the element will include a machine-readable id and a text string in the body of the element.
syntax: <dc:creator id="http://identifier-for-this-creator-entity">name-of-this-creator-entity</dc:creator>
Where the creator is a person, the recommended format is Last Name, First Name(s), with an ORCID ID, if known, included in its HTTP URI form, such as:
<dc:creator id="http://orcid.org/0000-0002-1395-3092">Mishra, Sanjay</dc:creator>
Note: If the creator is a person and you wish to record that person’s affiliation, the affiliation should be recorded using the dc:contributor element.
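A dc:creator element of the kind shown in this example can be generated programmatically, so that the attribute value is always quoted and the output is well-formed XML. The following is a minimal sketch using Python's standard xml.etree.ElementTree module; the helper name `creator_element` is hypothetical, and the namespace URI used is the standard Dublin Core elements namespace rather than something stated in the guideline text quoted here.

```python
import xml.etree.ElementTree as ET

# Standard namespace for the 15 simple Dublin Core elements.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def creator_element(name, identifier=None):
    """Build one dc:creator element; the "id" attribute is optional."""
    element = ET.Element(f"{{{DC_NS}}}creator")
    if identifier:
        element.set("id", identifier)     # machine-readable identifier, e.g. an ORCID URI
    element.text = name                   # human-readable name, 'Last Name, First Name(s)'
    return element


creator = creator_element("Mishra, Sanjay", "http://orcid.org/0000-0002-1395-3092")
serialized = ET.tostring(creator, encoding="unicode")
print(serialized)
```

Where a resource has several creators, the function would be called once per creator, in line with the scope note that each creator gets a separate dc:creator element.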
43 http://www.openaire.eu
44
You may consult the UK Metadata Guidelines for Open Access Repositories (2013), Phase 1: Core Metadata (Version 0.9), at rioxx.net. Other related initiatives in this direction are listed below:
CrossMark [45]: An initiative by CrossRef to support a non-bibliographic metadata schema.
HowOpenIsIt?: An initiative of PLOS, SPARC and OASPA to set criteria for measuring the openness (extent of rights for different stakeholders) and quality of OA resources [46].
Vocabularies for OA (V4OA) [47]: An initiative of JISC/UKOLN to develop vocabulary control devices, category lists and authority files for OA resources.
RIOXX [48]: Developing Repository Metadata Guidelines: An initiative to define a standard set of bibliographic metadata for UK institutional repositories.
ONIX-PL [49]: An initiative to standardize the license expression information necessary for OA publishing.
Linked Content Coalition [50]: An initiative to develop rights management metadata for OA resources.
Open Discovery Initiative [51]: A NISO initiative to develop library discovery services for non-commercial and OA resources through indexed search.
Incentives, Integration, and Mediation: Sustainable Practices for Populating Repositories: An initiative of the Confederation of Open Access Repositories (COAR [52]) to develop guidelines for populating OA repositories, including guidance on metadata management.
NISO [53] Specification for Open Access Metadata and Indicators: A NISO initiative to develop a standard metadata set specifically meant for OA resources.
RSLP [54]: A UKOLN initiative for Collection Level Descriptions (CLDs) as a tool for providing an overview of the content and coverage of OA collections.
45 http://www.crossref.org/crossmark/
46 http://www.plos.org/wp-content/uploads/2012/10/OAS_English_web.pdf
47 http://www.jisc.ac.uk/whatwedo/topics/digitallibraries/pals-group/v4oa.aspx
48 http://rioxx.net/
49 http://www.editeur.org/21/ONIX-PL
50 http://www.linkedcontentcoalition.org/
51 http://www.niso.org/workrooms/odi/
52 http://coar-repositories.org
53 http://www.niso.org/home/
54 http://www.ukoln.ac.uk/metadata/rslp/schema/