
PREMIS (PREservation Metadata: Implementation Strategies) (PREMIS, 2005; PREMIS, 2008; PREMIS, 2011; PREMIS, 2012; PREMIS, nd) is one attempt at specifying the metadata (called semantic units in PREMIS) that is needed to support core preservation functions; in fact, it is the current de facto standard for doing so. Core preservation metadata is relevant to a wide range of digital preservation systems and contexts, and it is what “most working preservation repositories are likely to need to know” to preserve digital material over the long term. This includes administrative metadata, but also generic technical metadata that is shared by all content types.

It permits the specification of structural relationships between entities if this is relevant for preservation functions, but users may choose to instead use the structural relationships offered by a container metadata specification.

Figure 10: The PREMIS data model (PREMIS, 2011)

PREMIS defines a common data model (illustrated in Figure 10) to encourage a shared way of thinking about and organising preservation metadata. Its entities are Object, Event, Agent, Right and IntellectualEntity. The data dictionary permits relationships between these entities, as indicated by the arrows in Figure 10.

The semantic units that describe the entities in this data model are rigorously defined in PREMIS’s data dictionary. PREMIS supports specific implementations through guidelines for their management and use and puts an emphasis on enabling automated workflows. It makes, however, no assumptions about specific technology, architecture, content type, or preservation strategies. As a result, it is “technically neutral” and supports a wide range of implementation architectures. For example, metadata could be stored locally or in an external registry (such as a shared file format registry); it could be stored explicitly or known implicitly (e.g., all content in the repository consists of newspaper articles). It does not even specify whether a semantic unit has to be implemented through a single field or through more complex data structures. Nonetheless, the PREMIS Editorial Committee maintains optional XML and RDF schemas for the convenience of the community. While PREMIS is very flexible about possible repository-internal implementations, it is more restrictive on cross-repository information package exchange in order to improve interoperability. An example PREMIS data dictionary entry for the semantic unit size is depicted in Figure 11.

Given the wide range of institutional contexts, PREMIS cannot be an out-of-the-box solution. Users have to decide how to model their specific application, which business functions need to be supported, which semantic units need to be captured to support them, and how to implement them. In addition, they need to decide on all metadata that is necessary to manage the content but is not captured in the core preservation metadata.

Semantic unit: 1.5.3 size

Semantic components: None

Definition: The size in bytes of the file or bitstream stored in the repository.

Rationale: Size is useful for ensuring the correct number of bytes from storage has been retrieved and that an application has enough room to move or process files. It might also be used when billing for storage.

Data constraint: Integer

Object category    Representation    File              Bitstream
Applicability      Not applicable    Applicable        Applicable
Repeatability                        Not repeatable    Not repeatable
Obligation                           Optional          Optional

Examples: 2038937

Creation / Maintenance notes: Automatically obtained by the repository.

Usage notes: Defining this semantic unit as size in bytes makes it unnecessary to record a unit of measurement. However, for the purpose of data exchange the unit of measurement should be stated or understood by both partners.

Figure 11: Example PREMIS Semantic Unit (PREMIS, 2011)
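The rationale given for the size semantic unit can be illustrated with a minimal sketch. It assumes a hypothetical in-memory record called FileObject and a hypothetical helper retrieved_completely; neither name comes from PREMIS, which deliberately leaves the implementation open. Only the constraints of the entry itself (integer, size in bytes, optional, not repeatable) are taken from Figure 11.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FileObject:
    """Hypothetical record for a PREMIS File object (illustrative only)."""
    identifier: str
    # Size in bytes; a single optional value, mirroring "Optional" and
    # "Not repeatable" in the data dictionary entry.
    size: Optional[int] = None


def retrieved_completely(obj: FileObject, payload: bytes) -> Optional[bool]:
    """Compare the number of retrieved bytes with the recorded size.

    Returns None when no size was recorded, since the semantic unit is optional.
    """
    if obj.size is None:
        return None
    return len(payload) == obj.size


record = FileObject(identifier="file-0001", size=11)
print(retrieved_completely(record, b"hello world"))  # payload has 11 bytes
print(retrieved_completely(record, b"hello"))        # truncated retrieval
```

Because the value is defined in bytes, the check needs no unit conversion, which is the point made in the usage notes of the entry.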

Relationship to DePICT

As in DePICT, the PREMIS data model has Representations, Files and Bitstreams as sub-classes of Objects (DePICT:PreservationObjects). The PREMIS Bitstream is restricted to a bit-stream within one file. DePICT Bitstreams are kept general and can consist of sets of bit-streams which can span several files because the bits representing Characteristics of IntellectualEntities may not necessarily align with byte boundaries (e.g. when they are extracted from a compressed file directly or if Characteristics are represented as bitmaps).

They may span several files (e.g. large files may be split with a Unix "split" command, data may be streamed into containers of a fixed file-size, such as ARC, data may be split over several files to optimise access).

DePICT distinguishes between the logical file (RepresentationBitstream) and the actual file (Bitstream). This enables users to model the logical file, with Characteristics such as the file's ideal checksum, in contrast to the individual realisations of this file, which might have, for example a Characteristic "actual checksum" which can vary from the checksum of the logical file if there is a file corruption. There may be several actual files for one logical file, if there are multiple copies held. Actual files have a location, logical files do not.
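The distinction between the logical file and its actual realisations can be sketched as follows. This is an illustration under assumptions, not the DePICT implementation: the class names LogicalFile and ActualFile, the attribute names, the storage paths and the choice of SHA-256 are all hypothetical.

```python
import hashlib
from dataclasses import dataclass, field
from typing import List


def sha256_hex(data: bytes) -> str:
    """Checksum helper; SHA-256 is an arbitrary choice for this sketch."""
    return hashlib.sha256(data).hexdigest()


@dataclass
class ActualFile:
    """One stored copy: it has a location and a checksum computed from its bytes."""
    location: str
    content: bytes

    @property
    def actual_checksum(self) -> str:
        return sha256_hex(self.content)


@dataclass
class LogicalFile:
    """The logical file: it carries the ideal checksum but has no location."""
    name: str
    ideal_checksum: str
    copies: List[ActualFile] = field(default_factory=list)

    def corrupted_copies(self) -> List[ActualFile]:
        """Return the copies whose actual checksum differs from the ideal one."""
        return [c for c in self.copies if c.actual_checksum != self.ideal_checksum]


original = b"preserved content"
logical = LogicalFile(name="report.pdf", ideal_checksum=sha256_hex(original))
logical.copies.append(ActualFile("/store-a/report.pdf", original))
logical.copies.append(ActualFile("/store-b/report.pdf", b"preserved c0ntent"))

print([c.location for c in logical.corrupted_copies()])  # ['/store-b/report.pdf']
```

Keeping the ideal checksum on the logical file means that any number of copies can be verified against a single reference value.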

The PREMIS data model does not consider IntellectualEntities a sub-class of Objects. As a consequence, PREMIS depends on container metadata schemas for capturing, for example, the IntellectualEntity’s descriptive metadata; the data dictionary is not self-contained. It also means that the data model is not as compact and uniform as it could be, and it cannot directly specify Events or RightsStatements, or attach Agent information to IntellectualEntities.

IntellectualEntities in PREMIS are not yet fleshed out, and the PREMIS Editorial Committee is currently considering how this can be improved for version 3.0, based on the work presented in DePICT.

Significant Properties exist in the PREMIS model but are not as fully defined as SignificanceConstraints are in the DePICT model. There are, for example, no tolerance or importance factors. They can only specify individual Characteristics that must be preserved, rather than expressing Constraints, which might, for example, specify allowable modifications or post-conditions. They can only be attached to one Object at a time rather than to Environments or combinations of Environments and PreservationObjects. It is important to be able to specify the context in which a business rule (or Constraint) applies. If one stores it with an Object, which is currently the case in PREMIS, then that Object is the only, implicit context. SignificanceConstraints, and in fact Constraints in general, should be a primary entity in the data model rather than subordinate to Object.

How Environments are dealt with in PREMIS is discussed in depth in section 2.2.5.2.1. DePICT has greatly contributed towards improving this specification by being incorporated into the work of the PREMIS Environment Working Group in 2012 (Dappert, Peyrard, Delve, Chou, 2012).

While specific Properties are modelled in some depth within PREMIS, PREMIS does not have a generic, rich specification of Properties that takes account of ValueOrigins and does not offer a meta-level on which to describe the properties of Properties and their relationships to other Properties. This is not in scope for PREMIS.

PreservationActions and PreservationRisks are outside the scope of the PREMIS data model.

The Event, Agent and Right entities of PREMIS are adopted for DePICT. They are not modelled in detail in this thesis. Several DePICT entities form digital preservation specific sub-classes to them.

It is becoming apparent, however, that there is a need for a richer Event model in PREMIS. For example, an n:m migration, such as creating one PDF from multiple files or creating multiple spreadsheets from one database file, is currently very cumbersome and verbose to express in PREMIS. For succinctness’ sake, PREMIS does not always explicitly implement entities that actually are Events and Agents as such. For example, information about a creating application is modelled as Properties of an Object rather than as an Agent, and the information about the creating event is, similarly, modelled as a Property of an Object rather than as an Event. This provides a convenient shortcut and leads to less verbose XML implementations, but sacrifices the cleanness of the model and makes it harder to adapt the standard to new use cases or implementations. In the DePICT XML implementation such shortcuts are mostly avoided to maintain clean modelling principles. That is not to claim, however, that they will never have to be sacrificed in practical implementations.
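A minimal sketch of what an Event that natively supports n:m migrations might look like is given below. It reproduces neither PREMIS nor the DePICT XML implementation; the class names, field names and file names are hypothetical and chosen only to illustrate the modelling choice of treating the executing application as an explicit Agent and the migration as a single Event with several sources and outcomes.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Agent:
    """An explicit Agent, e.g. the converting application (hypothetical fields)."""
    name: str
    role: str


@dataclass(frozen=True)
class MigrationEvent:
    """One Event that links any number of source Objects to any number of outcomes."""
    event_id: str
    agents: Tuple[Agent, ...]
    sources: Tuple[str, ...]
    outcomes: Tuple[str, ...]

    def derivations(self) -> List[Tuple[str, str]]:
        """Expand the event into the pairwise (source, outcome) links it implies."""
        return [(s, o) for s in self.sources for o in self.outcomes]


many_to_one = MigrationEvent(
    event_id="event-001",
    agents=(Agent(name="converter", role="executing program"),),
    sources=("page-1.tiff", "page-2.tiff", "page-3.tiff"),
    outcomes=("document.pdf",),
)

print(len(many_to_one.derivations()))  # one event record implies three pairwise links
```

The number of pairwise links grows with the product of sources and outcomes, which gives an impression of why an encoding without a native n:m construct becomes verbose.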

Most of the differences listed in this section are due to the fact that PREMIS was conceived as a data dictionary to capture preservation metadata for digital information objects in OAIS (CCSDS, 2012) repositories. It was not conceived to support dynamic digital preservation actions or end-to-end life-cycles. Figure 12 summarises coarsely how the DePICT and PREMIS entities relate.

Entities depicted in blue are concepts that are largely shared, entities depicted in violet are shared concepts that have a different extent or take a different role in the model, and entities depicted in pink exist only in DePICT.

Figure 12: DePICT in relationship to the PREMIS model.

Blue: largely shared entities, violet: shared entities with different emphasis; pink: DePICT-only entities

2.2.4.2 PROV-DM

“The term 'provenance' refers to the sources of information, such as people, entities, and processes, involved in producing, influencing, or delivering a piece of data or a thing in the world. In particular, the provenance of information is crucial in deciding whether information is to be trusted, how it should be integrated with other diverse information sources, and how to give credit to its originators when reusing it. In an open and inclusive environment such as the Web, users find information that is often contradictory or questionable: provenance can help those users to make trust judgments.”

(W3C, 2011a)

Provenance metadata is an important category of digital preservation metadata, since its reliable application enables consumers of digital information objects to judge their degree of authenticity if they have undergone change over time. PROV-DM (the provenance data model) (W3C, 2011a) and PROV-O (the provenance ontology) (W3C, 2011b) are an effort by an official W3C Working Group to create a core data model for provenance that serves as an interchange model across systems. It is generic and domain agnostic. Individual systems can implement their own native, domain- and application-specific representations of provenance and translate them into the interchange model for information exchange. The model includes the entities depicted in Figure 13.

An Entity is a representation of a characterised thing. A ProcessExecution represents an activity that has an effect on Entities by generating or using them. An Agent represents a particular Entity that can be associated with activities and is capable of controlling ProcessExecutions. Qualifiers can be associated with relations, and Annotations are used to provide additional, "free-form" information regarding any identifiable construct of the model.

Figure 13: Entities in PROV-DM (W3C, 2011a)
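How Entities, ProcessExecutions and Agents combine into a provenance trail can be sketched in a few lines. The sketch below is an illustration under assumptions and not the PROV-DM vocabulary or any W3C serialisation: the attribute names used, generated_by and controlled_by, the lineage helper and the example file names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Agent:
    name: str


@dataclass
class Entity:
    name: str
    # The ProcessExecution that generated this Entity, if it is known.
    generated_by: Optional["ProcessExecution"] = None


@dataclass
class ProcessExecution:
    name: str
    controlled_by: Agent
    used: List[Entity] = field(default_factory=list)

    def generate(self, name: str) -> Entity:
        """Create a derivative Entity and record this execution as its origin."""
        return Entity(name=name, generated_by=self)


def lineage(entity: Entity) -> List[str]:
    """Walk back through the generating ProcessExecutions, depth-first."""
    steps: List[str] = []
    process = entity.generated_by
    if process is not None:
        steps.append(f"{entity.name} <- {process.name} ({process.controlled_by.name})")
        for source in process.used:
            steps.extend(lineage(source))
    return steps


scan = Entity("scan.tiff")
migrate = ProcessExecution("migrate-to-pdf", Agent("converter"), used=[scan])
pdf = migrate.generate("scan.pdf")
ocr = ProcessExecution("ocr", Agent("ocr-engine"), used=[pdf])
text = ocr.generate("scan.txt")

for step in lineage(text):
    print(step)
```

Walking the trail backwards from the latest derivative is what allows a consumer to judge how an object has changed over time and who controlled each change.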

Relationship to DePICT

As in DePICT, ProcessExecutions (corresponding to PreservationActions) are applied over time by Agents to a resource, called Entity (roughly corresponding to PreservationObject and/or Environment), resulting in derivative Entities (PreservationObjects and/or Environments). Unlike PREMIS, this covers the whole life-cycle of the digital information object. It models Agents, such as creating applications and message digest originators, explicitly as Agents rather than as Properties that are subordinate to Objects, and it similarly models all Events, such as information on the creation event or the granting of rights, as Activities rather than as Properties that are subordinate to Objects. This way of modelling Agents and Events is in

instantiate the DePICT model. Unlike PROV-DM, DePICT does not restrict Agents to entities that carry responsibilities, but allows them to take on any kind of role.

2.2.4.3 StratML

Strategy Markup Language (StratML, nd) is a basic conceptual model for describing the essential contents of a strategy document. It is envisioned as an ISO-standardised XML schema and vocabulary for US Federal agency strategic plans that is aligned with the Federal Enterprise Architecture and government policy and that leverages existing standards (StratML, 2006). The non-Constraint elements of Policies in DePICT can be reused from StratML.

Figure 14: An example snippet from http://xml.gov/stratml/BSAStratPlan.xml

<?xml version="1.0" encoding="UTF-8" ?>
<StrategicPlanCore StartDate="1/1/2006" EndDate="12/31/2010" Date="2007-11-27">
  <Submitter FirstName="Owen" LastName="Ambur" PhoneNumber="" EmailAddress="[email protected]"/>
  <Source>http://www.scouting.org/media/strategy/45-016.pdf</Source>
  <Organization>
    <Name>Boy Scouts of America</Name>
    <Acronym>BSA</Acronym>
  </Organization>
  <Vision>The Boy Scouts of America will prepare every eligible youth in America to become a responsible, participating citizen and leader who is guided by the Scout Oath and Law.</Vision>
  <Mission>The mission of the Boy Scouts of America is to prepare young people to make ethical and moral choices over their lifetimes by instilling in them the values of the Scout Oath and Law.</Mission>
  <Goal>
    <SequenceIndicator>1</SequenceIndicator>
    <Name>Opportunity for Involvement</Name>
    <Description>Every Eligible Youth Has an Opportunity to Be Involved in a Quality Scouting Experience</Description>
    <Stakeholder />
    <Objective>
      <SequenceIndicator>1.1</SequenceIndicator>
      <Name>Market Share</Name>
      <Description>Increase market share and/or growth.</Description>
      <Stakeholder />

Some top-level elements in StratML as of 2010 are as follows:

Submitter: The person submitting the policy.

Source: The Web address (URL) for the authoritative source of this document.

Organization: The legal or logical entity to which the policy applies.

Vision: Vision statements are distinguished from goals in that they are the focus of constant pursuit but can never be satisfied in the sense of being met or completed.

A concise and inspirational description of a state the organisation will strive to approach over a relatively long span of years but which can ultimately never be fully achieved.

Mission: Mission statement. A brief description of the basic purpose of the organisation.

An agency's goals should flow from the mission statement.

Value: A principle that is important and helps to define the essential character of the organisation.

Goal: General goal.

A relatively broad statement of intended results to be achieved over more than one resource allocation and performance measurement cycle.

Goals define a purpose and direction and take all stakeholders and perceived present and future needs into account. Goals must be capable of being effectively pursued with measurable results over more than one budgetary execution cycle but within the reasonably foreseeable future. Goals should be objective, quantifiable, measurable, and defined at the level to be achieved by a program activity.

Supports Mission.

Objective: Performance goal.

A target level of results expressed in units against which achievement is to be measured within a single resource allocation and performance execution cycle.

Supports Goal.

Objectives are measurable subsets of Goals to be achieved within a given time period with available resources. Objectives provide the day-to-day support for achieving Goals.

Submitter, Source, Organization, Vision, Mission and Value are adopted in DePICT from StratML.

Within DePICT, these concepts are used in the following way:

• StratML:Value, which expresses an (ethical) value of a stakeholder, is different from “DePICT:Value”, which expresses the Value of a Characteristic (an assigned or derived Value).

• A StratML:Objective is roughly equivalent to a Constraint in DePICT. In StratML, an Objective is represented as a string. In order to support automated preservation planning, however, a machine-interpretable definition of the Objective / Constraint is needed. This is developed below.

The other StratML elements provide values that can be simply looked up and used by preservation services. Figure 14 shows an example snippet of a StratML document for the Boy Scouts of America.
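How a preservation service might look up such values can be sketched with Python's standard xml.etree.ElementTree module. The embedded document is a shortened version of the Figure 14 snippet: the XML declaration, the Submitter, Source, Vision and Stakeholder elements are omitted, and the closing tags that the snippet leaves out are added here so that it parses. The dictionary keys are hypothetical names chosen for this illustration.

```python
import xml.etree.ElementTree as ET

# Shortened, closed version of the Figure 14 snippet (closing tags added here).
STRATML = """
<StrategicPlanCore StartDate="1/1/2006" EndDate="12/31/2010" Date="2007-11-27">
  <Organization>
    <Name>Boy Scouts of America</Name>
    <Acronym>BSA</Acronym>
  </Organization>
  <Mission>The mission of the Boy Scouts of America is to prepare young people to make ethical and moral choices over their lifetimes by instilling in them the values of the Scout Oath and Law.</Mission>
  <Goal>
    <SequenceIndicator>1</SequenceIndicator>
    <Name>Opportunity for Involvement</Name>
    <Objective>
      <SequenceIndicator>1.1</SequenceIndicator>
      <Name>Market Share</Name>
      <Description>Increase market share and/or growth.</Description>
    </Objective>
  </Goal>
</StrategicPlanCore>
"""

root = ET.fromstring(STRATML)

# Simple look-ups of values that can be used as they are.
plan = {
    "organization": root.findtext("Organization/Name"),
    "acronym": root.findtext("Organization/Acronym"),
    "mission": root.findtext("Mission"),
    # Objectives are plain strings in StratML, so they are only listed here.
    "objectives": [
        (o.findtext("SequenceIndicator"), o.findtext("Name"))
        for o in root.iter("Objective")
    ],
}

print(plan["organization"], plan["acronym"])
print(plan["objectives"])
```

The Objective is returned only as text, which illustrates why a machine-interpretable definition is needed before it can drive automated preservation planning.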
