OLU47D1: Amnesia postraumática sufrida por Tornasol tras un TCE

GRAFICO 1: Distribución de los TCE con pérdida de

K. CONSUMO DE ALCOHOL

4- OLU47D1: Amnesia postraumática sufrida por Tornasol tras un TCE

Due to the subjective nature of quality, defining a generic dataset quality vocabulary is difficult. This is attributable to the fact that fitness for use means that different domains and use cases requires different quality measures to be assessed on the data. For instance, the W3C Data on the Web Best Practices working group defines different use cases [101] with different quality requirements.

The Dataset Quality Vocabulary4(prefix: daq), illustrated in Figure6.1, is an abstract metadata model that provides a pragmatic solution for easily extending the vocabulary with custom data quality concepts that will be interoperable with other daQ measures. Therefore, any linked dataset can have a quality metadata graph attached to it, with assessment results computed by different parties. daQ is based on two W3C standard vocabularies, the RDF Data Cube Vocabulary [40] and the Provenance Ontology (PROV-O) [100], and is a direct implementation of the quality indicators classification defined in the data quality survey in [160]. To keep the vocabulary lightweight, mainly RDF Schema was used, with a few OWL constructs. In terms of the ontology design patterns (ODP), the daQ vocabulary follows the Architectural Ontology Pattern5:

Architectural ODPs affect the overall shape of the ontology: their aim is to constrain how the ontology should look like.

In our case the constraints refer to the defined concrete quality measures based on daQ.

6.2.1 The Quality Graph

Quality metadata is intended to be represented as a daq:QualityGraph (Figure6.1– Box A) within the linked dataset itself. The semantics of a quality graph is defined in Listing6.16.

daq : Q u a l i t y G r a p h a r d f s : C l a s s , owl : C l a s s ; r d f s : s u b C l a s s O f r d f g : Graph , qb : D a t a S e t , [ r d f : t y p e owl : R e s t r i c t i o n ; owl : o n P r o p e r t y qb : s t r u c t u r e ; owl : h a s V a l u e daq : d s d ] ; r d f s : comment " D e f i n e s a q u a l i t y g r a p h w h i c h w i l l c o n t a i n a l l m e t a d a t a a b o u t q u a l i t y m e t r i c s on t h e d a t a s e t . " ; r d f s : l a b e l " Q u a l i t y Graph S t a t i s t i c s " .

Listing 6.1: The quality graph definition.

4_{Available at}_{http://purl.org/eis/vocab/daq}_{with content negotiation}

5_{http://ontologydesignpatterns.org/wiki/Category:ArchitecturalOP}_{. Date Accessed 3rd October}

2016

6.2 The Dataset Quality Vocabulary (daQ)

rdfg:Graph QualityGraph

qb:DataSet

Category Dimension Metric

rdfs:Resource hasDimension hasMetric expectedDataType requires value xsd:anySimpleType daq:ObservaCon hasObservaCon rdfs:Resource computedOn metric qb:dataSet rdfg:Graph Concept in an exisCng ontology Dataset Concept in proposed ontology Category Abstract concept in proposed ontology (Not for direct use)

Undeﬁned object: Concept or Literal Classes Subclass of Property in proposed ontology hasObservaCon ProperCes Property is of type qb:DimensionProperty computedOn Property is of type qb:MeasureProperty value Property in exisCng ontology qb:dataSet sdmx-dimension:CmePeriod xsd:boolean isEsCmate xsd:dateTime qb:ObservaCon prov:EnCty

Figure 6.1: The Dataset Quality Vocabulary (daQ).

daq:QualityGraphis a subclass of rdfg:Graph [36]. This means that quality metadata is aggreg- ated, stored and managed in a named graph (cf Section2.2.1) that is on the one hand separate from the dataset, but on the other hand embedded into the dataset itself. Furthermore, named graphs can be digitally signed, thus ensuring trust in the computed metrics and defined named graph instance [36].

The Quality Graph is also a special type of qb:DataSet, which allows us to conceive a collection of quality observations as a data cube complying with a defined dimensional structure (daq:dsd – see Listing6.2). This structure is defined as an OWL property restriction on the property qb:structure. Having a standard definition ensures that all Quality Graphs conform to a common data structure definition, and thus ensures that all datasets with attached quality metadata can be compared. Three dimensions are applied in this data structure: metric (via the daq:metric property), dataset (via the daq:computedOnproperty), and time (via the sdmx-dimension:timePeriod property). This enables data cube viewers such as CubeViz [109] or Payola [76] to visualise quality metadata according to different dimensions. The time dimension enables a view on how a dataset’s quality evolved over time. Similarly, the dataset dimension enables a view of how multiple datasets fare in one or more metrics. Finally, the metric dimension enables the view of selected metrics over the assessed datasets.

6.2.2 Quality Representation

Quality metadata comprises three levels of abstraction (as illustrated in Figure6.1– Box B): Categories, Dimensions, and Metrics. To formalise this three level abstraction, two inverse role expressions (that are

daq : d s d a qb : D a t a S t r u c t u r e D e f i n i t i o n ; # D i m e n s i o n s qb : c o m p o n e n t [ qb : d i m e n s i o n daq : m e t r i c ; qb : o r d e r 1 ; ] ; qb : c o m p o n e n t [ qb : d i m e n s i o n daq : computedOn ; qb : o r d e r 2 ; ] ; qb : c o m p o n e n t [ qb : d i m e n s i o n sdmx− d i m e n s i o n : t i m e P e r i o d ; qb : o r d e r 3 ; ] ; # M e a s u r e s qb : c o m p o n e n t [ qb : m e a s u r e daq : v a l u e ; ] .

Listing 6.2: The data structure definition.

not in the daQ schema7) are introduced for convenience:

inDimension ≡ hasMetric− (6.1)

inCategory ≡ hasDimension−

(6.2) Using these newly defined properties, we can define the three level abstraction as:

C v Category

D v Dimension u ∃ inCategory.C M v Metric u ∃ inDimension.D

(6.3)

where C is a possible quality measure category, D = {d1, d2, . . . , dy} is the set of all possible quality

measure dimensions, M = {m1, m2, . . . , mz} is the set of all possible quality measure metrics, and

x, y, z ∈ N.

Using the definition in Equation6.3, we describe the quality metadata in Equation6.4, where the graph g entails one or more instances of a category c, having one or more dimensions d, each of which defines one or more metrics m.

g : QualityGraph ; c : C ; d : D ; m : M ; c hasDimension d; d inCategory c ;

d hasMetric m; m inDimension d ;

g |= {c : C} (6.4)

Recommended Best Practice - Constraining the Grouping of Dimensions and Metrics Whilst we acknowledge that a metric might be perceived in one or more categories by different parties, the main goal of the daQ model is to create a generic schema that allows the semantification of quality measures in the abstract three level hierarchy. When defining these quality measures, a common under- standing is required (the unified view presented by the survey in [160] is a good example for this), such that these measures can be used by any framework without any prejudice. If such restrictions were not in place, doubt and ambiguity might be created between the consumers and the quality assessors (who could be a data enthusiast or the publisher himself). One fundamental aspect of assessing a dataset for its quality is to start eliminating the doubts consumers may have about the data. Therefore, if the same metric has multiple definitions and categorisations, incertitude is created for the consumer. On the other

7_{The owl:inverseOf properties were avoided in order to ensure only one standard way of defining the three level}

6.2 The Dataset Quality Vocabulary (daQ)

Term Description Usage8

Classes

daq:Category

The broadest class of quality observations; groups a number of dimensions related to each other.

:Accessibility a rdfs:Class ; rdfs:subClassOf daq:Category .

daq:Dimension

Each dimension belongs to exactly one category. Each dimension comprises a number of metrics. A dimension is linked with a category using the daq:hasDimension property.

:Availability a rdfs:Class ; rdfs:subClassOf daq:Dimension .

daq:Metric

The smallest unit of measuring quality; belongs to exactly one dimension. Each metric has one or more observations (daq:hasObservation), which records data quality assessment value resulting from a computation.

:RDFDataDump a rdfs:Class ; rdfs:subClassOf daq:Metric .

Table 6.1: Description and usage of daQ abstract classes.

hand, the quality assessor ends up in an ambiguous situation to try to identify the right quality measure structure. In Equation6.5we present these restrictions as a set of optional axioms in daQ.

C v Category

D v Dimension u ∃ inCategory.(= 1 C) M v Metric u ∃ inDimension.(= 1 D)

(6.5)

6.2.3 Abstract Classes and Properties

Abstraction– “hiding” the complex meaning of an object whilst retaining the most generic view of it – is one of the main principles in the object oriented paradigm. The rationale behind using abstraction in daQ is to have a lightweight meta-model that is the basis on which tangible concepts of categories, dimensions and metrics can be defined upon. Applications consuming quality metadata, such as Luzzu Web Interface (cf. Section4.4.5), can easily re-use these defined concepts without having to know any detail about new schemas, but by applying a number of simple SPARQL queries using the daQ schema (cf. Section6.3), to retrieve quality metadata information from assessed datasets. Abstract classes are not intended to be instantiated (rdf:type), but should only be extended by quality measure concepts using the appropriate rdfs:subClassOf properties. The idea behind this inheritance is that the semantically defined quality measures (in a schema) follows the daQ structure. Hence, the newly defined subclasses will inherit all properties. This does not inhibit the creation of new concepts and properties in the schema related to the defined quality measures. Unfortunately, this cannot be enforced since in RDF the semantics of abstraction is not defined. The daQ vocabulary defines three abstract concepts (daq:Category, daq:Dimension, daq:Metric). High-level description and usage examples of these terms are illustrated in Table 6.1. These three abstract concepts are arranged in a hierarchical manner using the properties daq:hasDimension and daq:hasMetric, described in Table6.2.

Term Description

Properties

daq:hasDimension The category concept classifies dimensions related to the measurement of quality for a specific criteria.

daq:hasMetric A dimension is an abstract concept which groups an number of more concrete metrics to measure quality of a dataset.

Table 6.2: Description of daQ main properties.

Term Description Usage Class daq:Observation

A quality observation comprises statistical and provenance information related to the assessment of the metric.

:obs1 a daq:Observation ;

Properties

daq:expectedDataType

Each metric should have an expected data type for its observed value (e.g. xsd:boolean, xsd:double, etc.).

Range: xsd:anySimpleType

:RDFDataDump

daq:expectedDataType xsd:boolean ;

daq:requires

A metric implementation might require external resources (e.g. a gold standard) to be able to measure quality. To cater for the most general requirements, this property links a metric to the required resource (e.g. a URI of the gold standard dataset used).

Range: rdfs:Resource

:RDFDataDump daq:requires <uri:SemanticFormatsFile> ;

daq:hasObservation

Each metric can have one or more observations attached to it. This property (whose inverse property is daq:metric) has the range of daq:Observation, defining quality metadata for a particular metric in a specific point in time. Range: daq:Observation

:inst1 a :RDFDataDump ; daq:hasObservation :obs1 .

Table 6.3: Description and usage of metric properties.

6.2.4 Metric Representation

The metric concept is in the domain of three properties: daq:expectedDataType, daq:requires, daq:hasObservation. The former two should be used by metric concepts extending the daq:Metric class. On the other hand, the daq:hasObservation property (described further in Section6.2.4) is used by instances of the quality measure concepts. This property has the range of daq:Observation, representing statistical (subclass of qb:Observation) and provenance information (subclass of prov:Entity) of the instantiated metric’s assessment activity. A clearer view of the properties usage, together with their description is provided in Table6.3

Metric Observations

A metric’s observation is unquestionably the most important part of a dataset’s quality metadata. Each observation resource holds values and properties related to the computed metric, with the main intent of providing an insight to possible data consumers on the use of a particular dataset. A quality assessment should be replicable and traceable. An observation can have more meta-information through a provenance chain attachment. We recommend that a daq:Observation instance should have information about the activity (e.g. the algorithm, implementation or workflow that generated the observation (using prov:wasGeneratedBy), and the agent that the observation is attributed to (using prov:wasAttributedTo). Moreover, such observations can have other provenance metadata, such as parameter settings for probabilistic techniques employed to compute certain metrics efficiently, such as those discussed in Chapter7and Chapter8.

Further provenance aware properties include the date and time of the assessment (sdmx-dimension:timePeriod). This property is one of the dimensions described in the data structure definition (cf. Listing6.2) together with daq:value and daq:metric. An observation should also have an indication whether a quality metric was assessed (daq:isEstimate) using some estimation technique. In Table6.4, the properties whose domain is daq:Observation are described together with a usage example.

In document Hergé, Tintín y la medicina (página 129-134)