AENOR Review Report - ABENGOA Annual Report 2013

The preceding section described the benefits of well-managed metadata and the challenges in implementing and managing metadata repositories for decision environments. In the last decade, the benefits of metadata have been recognized in three important domains: the Semantic Web, knowledge management and decision support. The notion of a Semantic Web is one in which different applications and Web sites can exchange information and fully exploit the data/information accessible within the Web. Achieving this requires that resources on the Web (such as a Web page, a whole Web site or any item within that has data/information in some form, including documents; Heery, 1998) be augmented with metadata. The Resource Description Framework (RDF), endorsed by W3C and first created in 1997, offers a vehicle for specifying metadata for such resources to enable interoperable Web applications (Candan et al., 2002).

Figure 1. Conceptual architecture for metadata repository
[Figure not reproduced. Component labels recoverable from the figure: Data Sources; Source Data Objects; Source Data Elements; Extraction and Cleansing Rules; Dependencies & Constraints Between Source Data Elements; Transformation Rules; Metadata on ETL Modules; Intermediate Data Elements; Warehouse Data Elements; IPMAP of dimensions and facts, from sources to staging (process and quality metadata); IPMAP, staged data to loading including transformations (process and quality metadata); IPMAP of outputs using data in a warehouse (process and quality metadata); User Preferences & Vocabulary; Reporting/Application Metadata; Administration Metadata; Infrastructure Metadata.]

Table 9. Metadata elements for TDQM in a data warehouse

| Metadata Entity | Metadata items |
| --- | --- |
| Warehouse Data Elements | Date loaded, Date updated, Currency (old/current) in the warehouse, associated data sources, associated extraction, cleansing, and transformation processes, whether (still) available in the data source, associated staged data elements, staged data sources |
| Data Sources | ID or Unique name, Format type, Frequency of update, Active Status |
| Source Data Objects (e.g. tables if source is relational) | Object name, Aliases, Business Entity name, Business rules associated, Owner |
| Source Data Elements | Element name, Units, Business rules, Computation method(s), business name/alias, data type, data length, Range-Max, Range-Min, Date/time when it was included, [Constraint and participating source elements] |
| Staged Intermediates/Target Objects (these are typically relational tables or object classes) | Object name, Aliases, Business Entity name, Business rules associated, Owner, Creation date, Object Status, Administrator, |
| Intermediate/Target Data Elements | Element name, Units, Business rules, Computation method(s), business name/alias, data type, data length, Range-Max, Range-Min, Date/time when it was included or became effective, [Constraint and participating source elements] |
| Source Element to Target Element Mappings & Constraints | Derivation and business rules, assumptions on default and missing values, associations between source and target data elements |
| ETL Process Modules | ID and/or Unique name, Creation date, Effective date, Owner, Role/Business Unit responsible, Modification date, Modified by, reason for modification, system/platform associated, location in file system, execution commands, Run Date, Error Codes/messages |
| Extraction Process | Applicable source data element(s), extraction rules, business restrictions/rules, Last Run Date, Error Codes/Messages, output data elements |
| Cleansing Process | Applicable source data element(s), sanitizing rules, business restrictions/rules, output data elements |
| Transformation Process | Input data element(s), transformation rules, business rules, output data elements |
| Load Process | Input data element(s), format/transformation rules, business rules, |
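To make the metadata entities of Table 9 more concrete, the following is a minimal sketch of how two of them (Source Data Elements and Source Element to Target Element Mappings & Constraints) might be represented in code. The class names, field names and example values are illustrative assumptions, not part of the chapter; only the metadata items themselves (element name, units, data type, Range-Min, Range-Max, business rules, derivation rule, default for missing values) come from the table.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class SourceDataElement:
    """Subset of the 'Source Data Elements' items listed in Table 9."""
    element_name: str
    data_type: str
    units: Optional[str] = None
    range_min: Optional[float] = None
    range_max: Optional[float] = None
    business_rules: List[str] = field(default_factory=list)
    included_at: Optional[datetime] = None

    def in_range(self, value: float) -> bool:
        """Check a value against Range-Min / Range-Max when they are set."""
        if self.range_min is not None and value < self.range_min:
            return False
        if self.range_max is not None and value > self.range_max:
            return False
        return True


@dataclass
class ElementMapping:
    """One 'Source Element to Target Element Mappings & Constraints' entry."""
    source: SourceDataElement
    target_element_name: str
    derivation_rule: str
    default_for_missing: Optional[float] = None


# Hypothetical example values, not taken from the chapter.
unit_price = SourceDataElement(
    element_name="unit_price",
    data_type="decimal",
    units="USD",
    range_min=0.0,
    range_max=10_000.0,
    business_rules=["must be non-negative"],
)
mapping = ElementMapping(
    source=unit_price,
    target_element_name="fact_sales.unit_price_usd",
    derivation_rule="copy as-is",
    default_for_missing=0.0,
)
```

Keeping the range limits on the element itself means a quality check such as `unit_price.in_range(value)` can be driven entirely by repository metadata rather than by rules hard-coded in an ETL module.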

To enable more efficient searching and retrieval of documents within the Semantic Web, the metadata specification has to be enriched to include semantic descriptors. Users typically annotate documents (by specifying semantic metadata) to help search and retrieve the information within them. Ontology-based annotation systems have been proposed to assist search and retrieval of such documents: SHOE (Luke et al., 1997), Ontobroker (Decker et al., 1999), WebKB (Martin and Eklund, 1999) and QuizRDF (Davies et al., 2002). Corby et al. (2000) propose a model that extends RDF to represent semantic metadata and uses concept graphs to facilitate querying and inference by exploiting the graphs' formalisms. Because maintaining semantic metadata for Semantic Web documents is not easy, techniques for extracting metadata from Web documents and using it to facilitate search and retrieval have also been proposed (Handschuh & Staab, 2003; Ding et al., 2003).
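The idea of ontology-based annotation can be illustrated with a small sketch. It is a toy example and does not reproduce SHOE, Ontobroker, WebKB or QuizRDF: documents are annotated with concepts as subject-predicate-object triples, and a query for a concept also returns documents annotated with its subconcepts. The `ex:` identifiers and the `ex:about` property are hypothetical; `rdfs:subClassOf` is the standard RDF Schema property name, used here only as a plain string.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]

SUBCLASS_OF = "rdfs:subClassOf"
ABOUT = "ex:about"  # hypothetical annotation property

# Hypothetical mini-ontology and document annotations.
triples: List[Triple] = [
    ("ex:DataWarehouse", SUBCLASS_OF, "ex:DecisionSupport"),
    ("ex:ETL", SUBCLASS_OF, "ex:DataWarehouse"),
    ("ex:doc1", ABOUT, "ex:ETL"),
    ("ex:doc2", ABOUT, "ex:DecisionSupport"),
    ("ex:doc3", ABOUT, "ex:SemanticWeb"),
]


def subconcepts(concept: str, graph: List[Triple]) -> Set[str]:
    """Return the concept plus all of its transitive subclasses."""
    found = {concept}
    frontier = [concept]
    while frontier:
        current = frontier.pop()
        for subj, pred, obj in graph:
            if pred == SUBCLASS_OF and obj == current and subj not in found:
                found.add(subj)
                frontier.append(subj)
    return found


def search(concept: str, graph: List[Triple]) -> List[str]:
    """Find documents annotated with the concept or any subconcept."""
    wanted = subconcepts(concept, graph)
    return sorted(s for s, p, o in graph if p == ABOUT and o in wanted)


print(search("ex:DecisionSupport", triples))  # ['ex:doc1', 'ex:doc2']
```

A plain keyword match on "DecisionSupport" would miss `ex:doc1`; the subclass traversal is what lets the semantic annotation improve recall.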

Creating metadata can be viewed as codifying data and creating a higher-level layer of knowledge for it. Similarly, codifying organizational knowledge within KM systems may be viewed as abstracting it into a metadata layer. Markus (2001) looks at three purposes for creating knowledge as elements that play a role in how knowledge is stored, processed and distributed:

1. Self: knowledge for self-use, where little or no attention is paid to interpretable formatting;

2. Similar others: knowledge for others with a similar skill set. Assuming the ability of other users to assimilate knowledge easily, knowledge reuse efforts focus on providing essential details, rather than shape and format; and

3. Dissimilar others: knowledge aimed at others without similar skill sets. Assuming limited ability of target users to interpret the knowledge in its raw form, more effort is needed to reconstruct and formalize it. It is this last category that benefits the most from metadata.

Sheth (2003) proposes the use of metadata for capturing knowledge. His study describes a methodology for creating layers of metadata that capture not only the basic business data entities but also structure them into business knowledge in the form of an ontology, a shared conceptualization of the world as seen by the enterprise. The ontology consists of high-level business schemas, interrelationships between entities, domain vocabulary and factual knowledge. Knowledge is stored using a structured document and not a relational model. Broekstra et al. (2001) propose a knowledge representation framework for the Web by extending RDF, specifically RDF Schema. This is accomplished by defining the Ontology Inference Layer (OIL) as an extension of RDF Schema. This representation framework permits sharing of metadata on the Web and can be extended to other knowledge representation schemes.

Although the applications of metadata to knowledge management and the Semantic Web are challenging and interesting, each is an extensive field in itself. In this chapter our focus is on metadata in complex decision environments, specifically on understanding and managing metadata for decision support. Recent research has explored metadata from the perspective of a decision support aid: can the provision of metadata improve decision making? Quality and process metadata have been shown to influence managerial decision making. In the remainder of this section, we first describe the current state of research in this area, then propose a theoretical model for evaluating the role of process and quality metadata in data-driven decision making.
