


Once the new cube schema is defined, we need to link the level members with their schema definitions, i.e., populate the dimension level instances. Moreover, we link the level members with their level attribute instances and copy the observations to the new QB4OLAP graph. The first input of the step is MapChild2Parent, which maps child level members to their parent level members. The next input is MapLInstance2L, which maps level members to their levels. The MapLInstance2LAInstance input maps level members to their level attribute instances. Then comes the set of cube instances $I_{qb} = DI_{qb} \cup O_{qb}$, where $DI_{qb}$ defines dimension instances and $O_{qb}$ defines observations. The last input is the complete new cube schema $S_{qb4o}$. The output of the step is the set of new cube instances $I_{qb4o} = LI_{qb4o} \cup O_{qb4o} \cup LAI_{qb4o}$. The details of the step are presented below.

Step 4. Annotation of the cube instances

INPUT: MapChild2Parent, MapLInstance2L, MapLInstance2LAInstance, $I_{qb}$, $S_{qb4o}$

OUTPUT: $I_{qb4o}$

Step 4.1. Copy dimension instances from $G_{qb}$ as base level instances in $G_{qb4o}$:

• $LI_{qb4o} \cup= \{\mathit{copyAsLevelInstance}(di) \mid di \in DI_{qb}\}$, where copyAsLevelInstance is a function that receives a triple di representing a dimension instance in QB and returns a triple li that defines the subject (i.e., an IRI) of di as a level instance in QB4OLAP, using the qb4o:LevelMember class.

Triple pattern added to $LI_{qb4o}$:

liIRI a qb4o:LevelMember, where liIRI is the IRI of the level instance obtained from di. For instance, a triple related to the running example:

    country:RS a qb4o:LevelMember .
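Step 4.1 can be sketched in Python as follows. This is a minimal illustration, not the thesis implementation: it assumes triples are plain (subject, predicate, object) tuples of prefixed names, writes the Turtle keyword a as rdf:type, and uses a hypothetical dimension-instance triple for country:RS; copy_as_level_instance is an illustrative counterpart of copyAsLevelInstance.

```python
# Sketch of Step 4.1. Assumption: triples are (s, p, o) tuples of prefixed names.
RDF_TYPE = "rdf:type"  # the Turtle keyword "a"
LEVEL_MEMBER = "qb4o:LevelMember"


def copy_as_level_instance(di):
    """Type the subject of a QB dimension-instance triple as a qb4o:LevelMember."""
    subject, _predicate, _obj = di
    return (subject, RDF_TYPE, LEVEL_MEMBER)


# Hypothetical dimension-instance triple in DI_qb (not taken from the source data).
DI_qb = {("country:RS", "skos:prefLabel", '"Serbia"@en')}

LI_qb4o = set()
# LI_qb4o ∪= {copyAsLevelInstance(di) | di ∈ DI_qb}
LI_qb4o |= {copy_as_level_instance(di) for di in DI_qb}
print(LI_qb4o)
```

Only the subject of di is reused; its predicate and object play no role in the new triple.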

Step 4.2. Add coarser granularity level instances:

• $LI_{qb4o} \cup= \{\mathit{MapChild2Parent}(li) \mid li \in LI_{qb4o}\}$, where MapChild2Parent is a mapping that, for a given dimension level instance, returns its corresponding parent level member. This parent member is then defined as a level instance using the qb4o:LevelMember class.

Triple pattern added to $LI_{qb4o}$:

liIRI a qb4o:LevelMember, where liIRI is the IRI of the new level instance, returned by MapChild2Parent. For instance, a triple related to the running example:

    region:ECS a qb4o:LevelMember .
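A possible Python sketch of Step 4.2, assuming MapChild2Parent is a plain dict from child member to parent member and reusing the tuple-triple representation; the dict content and the helper level_members are hypothetical illustrations for the running example.

```python
# Sketch of Step 4.2. Assumption: MapChild2Parent is a dict child -> parent.
RDF_TYPE = "rdf:type"
LEVEL_MEMBER = "qb4o:LevelMember"

# Hypothetical MapChild2Parent for the running example.
map_child_to_parent = {"country:RS": "region:ECS"}

# LI_qb4o as produced by Step 4.1 for the running example.
LI_qb4o = {("country:RS", RDF_TYPE, LEVEL_MEMBER)}


def level_members(triples):
    """Return the IRIs typed as qb4o:LevelMember in a set of triples."""
    return {s for (s, p, o) in triples if p == RDF_TYPE and o == LEVEL_MEMBER}


# LI_qb4o ∪= {MapChild2Parent(li) typed as qb4o:LevelMember | li ∈ LI_qb4o}.
# The right-hand side is built before the update; members without a parent are skipped.
LI_qb4o |= {
    (map_child_to_parent[li], RDF_TYPE, LEVEL_MEMBER)
    for li in level_members(LI_qb4o)
    if li in map_child_to_parent
}
```

This performs a single pass, as the formula does. For hierarchies with more than two levels, one option (our assumption, not stated in the step) is to repeat the pass until no new members appear.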

Step 4.3. Copy observations:

• $O_{qb4o} \cup= \{o \mid o \in O_{qb}\}$, where o is an observation from $O_{qb}$ that is added to $O_{qb4o}$.

Triple pattern added to $O_{qb4o}$:

oIRI a qb:Observation, oIRI qb:dataSet dsIRI, oIRI lIRI liIRI, and oIRI mIRI mvalue, where oIRI, dsIRI, lIRI, liIRI, and mIRI are the IRIs of the observation, data set, level, level instance, and measure, respectively, while mvalue is a literal representing the measure value. For instance, triples related to the running example:

    <http://worldbank.270a.info/dataset/world-bank-indicators/CM.MKT.LCAP.CD/RS/2012>
        a qb:Observation ;
        qb:dataSet dataset:CM.MKT.LCAP.CD ;
        property:indicator indicator:CM.MKT.LCAP.CD ;
        sdmx-dimension:refArea country:RS ;
        sdmx-measure:obsValue 7450560827.04874 .
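The copy in Step 4.3 is a plain set union. The Python sketch below assumes the observation triples of the running example are stored as tuples (literals kept as their lexical form) and adds a hypothetical helper, pattern_values, that reads the dsIRI, liIRI, and mvalue parts of the triple pattern back out of the copied triples.

```python
# Sketch of Step 4.3. Assumption: O_qb is a set of (s, p, o) tuples.
OBS = "<http://worldbank.270a.info/dataset/world-bank-indicators/CM.MKT.LCAP.CD/RS/2012>"

O_qb = {
    (OBS, "rdf:type", "qb:Observation"),
    (OBS, "qb:dataSet", "dataset:CM.MKT.LCAP.CD"),
    (OBS, "property:indicator", "indicator:CM.MKT.LCAP.CD"),
    (OBS, "sdmx-dimension:refArea", "country:RS"),
    (OBS, "sdmx-measure:obsValue", "7450560827.04874"),
}

# O_qb4o ∪= {o | o ∈ O_qb}: observations are copied unchanged.
O_qb4o = set()
O_qb4o |= {o for o in O_qb}


def pattern_values(triples, o_iri, level_prop, measure_prop):
    """Return (dsIRI, liIRI, mvalue) for one observation (one value per predicate assumed)."""
    po = {p: obj for (s, p, obj) in triples if s == o_iri}
    return po["qb:dataSet"], po[level_prop], po[measure_prop]


values = pattern_values(O_qb4o, OBS, "sdmx-dimension:refArea", "sdmx-measure:obsValue")
```

The level property (here sdmx-dimension:refArea) is reused as is, which is why the observations need no rewriting in this step.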

Step 4.4. Specify the level for each level instance:

• $LI_{qb4o} \cup= \{\mathit{linkToLevel}(li, \mathit{MapLInstance2L}(li)) \mid li \in LI_{qb4o}\}$, where linkToLevel is a function that receives a pair (li, l), where li is an instance of qb4o:LevelMember and l is an instance of qb4o:LevelProperty, and produces a triple stating that the level member li belongs to the level l. MapLInstance2L is a mapping that, given a level instance, returns the level it belongs to.

Triple pattern added to $LI_{qb4o}$:

liIRI qb4o:memberOf lIRI, where liIRI and lIRI are the IRIs of the level member and level, respectively. For instance, a triple related to the running example:

    country:RS qb4o:memberOf sdmx-dimension:refArea .
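A minimal Python sketch of Step 4.4, assuming MapLInstance2L is a dict from level member to level IRI that covers every member. The level IRI ex:region for region:ECS is a placeholder, since the running example does not show it; link_to_level mirrors linkToLevel for illustration only.

```python
# Sketch of Step 4.4. Assumption: MapLInstance2L is a dict member -> level IRI.
RDF_TYPE = "rdf:type"
LEVEL_MEMBER = "qb4o:LevelMember"

# Level members after Steps 4.1 and 4.2 of the running example.
LI_qb4o = {
    ("country:RS", RDF_TYPE, LEVEL_MEMBER),
    ("region:ECS", RDF_TYPE, LEVEL_MEMBER),
}

# Hypothetical MapLInstance2L; "ex:region" is a placeholder level IRI.
map_linstance_to_level = {
    "country:RS": "sdmx-dimension:refArea",
    "region:ECS": "ex:region",
}


def link_to_level(li, level):
    """State that level member li belongs to the given level."""
    return (li, "qb4o:memberOf", level)


members = {s for (s, p, o) in LI_qb4o if p == RDF_TYPE and o == LEVEL_MEMBER}
# LI_qb4o ∪= {linkToLevel(li, MapLInstance2L(li)) | li ∈ LI_qb4o}
LI_qb4o |= {link_to_level(li, map_linstance_to_level[li]) for li in members}
```

A member missing from the mapping raises KeyError here, which surfaces incomplete level assignments early.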

Step 4.5. Specify rollup (i.e., parent–child) relationships between level instances:

• $LI_{qb4o} \cup= \{\mathit{linkRollUps}(\mathit{MapChild2Parent}(li), li) \mid li \in LI_{qb4o}\}$, where linkRollUps is a function that receives a pair (pli, cli), where pli and cli are instances of qb4o:LevelMember, and produces a triple stating that cli rolls up to pli using the skos:broader property.

Triple pattern added to $LI_{qb4o}$:

cliIRI skos:broader pliIRI, where cliIRI and pliIRI are the IRIs of the child and parent level instances, respectively. For instance, a triple related to the running example:

    country:RS skos:broader region:ECS .
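Step 4.5 can be sketched in Python under the same assumptions (tuple triples, MapChild2Parent as a dict). The argument order of the illustrative link_roll_ups follows the pair (pli, cli) from the text, while the emitted triple has the child as subject; members without a parent, such as region:ECS, are assumed to produce no triple.

```python
# Sketch of Step 4.5. Assumption: MapChild2Parent is a dict child -> parent.
RDF_TYPE = "rdf:type"
LEVEL_MEMBER = "qb4o:LevelMember"

LI_qb4o = {
    ("country:RS", RDF_TYPE, LEVEL_MEMBER),
    ("region:ECS", RDF_TYPE, LEVEL_MEMBER),
}
map_child_to_parent = {"country:RS": "region:ECS"}


def link_roll_ups(pli, cli):
    """State that child member cli rolls up to parent member pli."""
    return (cli, "skos:broader", pli)


members = {s for (s, p, o) in LI_qb4o if p == RDF_TYPE and o == LEVEL_MEMBER}
# LI_qb4o ∪= {linkRollUps(MapChild2Parent(li), li) | li ∈ LI_qb4o};
# top-level members (no parent) are skipped.
LI_qb4o |= {
    link_roll_ups(map_child_to_parent[li], li)
    for li in members
    if li in map_child_to_parent
}
```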

Step 4.6. Add level attribute instances:

• $LAI_{qb4o} \cup= \{\mathit{addLevelAttInstance}(li, \mathit{MapLInstance2LAInstance}(li)) \mid li \in LI_{qb4o}\}$, where addLevelAttInstance is a function that receives li and a pair (la, lai), where li is an instance of qb4o:LevelMember, la is an instance of qb4o:LevelAttribute, and lai is a level attribute value (IRI or literal), and produces a triple stating that li has an attribute la with the value lai. MapLInstance2LAInstance is a mapping that, for a given dimension level instance, returns its (level attribute, level attribute value) pair(s).

Triple pattern added to $LAI_{qb4o}$:

liIRI laIRI laiIRI or liIRI laIRI laiLiteral, where liIRI, laIRI, and laiIRI are the IRIs of the level instance, level attribute, and level attribute value, respectively, and laiLiteral is a literal representing the level attribute value. For instance, a triple related to the running example:

    country:RS schema:capital "Belgrade"^^xsd:string .
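A hedged Python sketch of Step 4.6. Because MapLInstance2LAInstance may return several pairs, it is assumed here to be a dict from member to a list of (la, lai) pairs, and one triple is produced per pair; literal values are kept as their Turtle text. add_level_att_instance is an illustrative counterpart of addLevelAttInstance.

```python
# Sketch of Step 4.6. Assumption: MapLInstance2LAInstance is a dict
# member -> list of (level attribute, value) pairs.
RDF_TYPE = "rdf:type"
LEVEL_MEMBER = "qb4o:LevelMember"

LI_qb4o = {
    ("country:RS", RDF_TYPE, LEVEL_MEMBER),
    ("region:ECS", RDF_TYPE, LEVEL_MEMBER),
}
# Literal values are kept in their Turtle form as plain strings.
map_linstance_to_lainstance = {
    "country:RS": [("schema:capital", '"Belgrade"^^xsd:string')],
}


def add_level_att_instance(li, pair):
    """State that member li has level attribute la with value lai."""
    la, lai = pair
    return (li, la, lai)


members = {s for (s, p, o) in LI_qb4o if p == RDF_TYPE and o == LEVEL_MEMBER}
LAI_qb4o = set()
# One triple per (la, lai) pair; members without attributes add nothing.
LAI_qb4o |= {
    add_level_att_instance(li, pair)
    for li in members
    for pair in map_linstance_to_lainstance.get(li, [])
}
```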

Step 4.7. Create new cube instances:

• $I_{qb4o} = LI_{qb4o} \cup O_{qb4o} \cup LAI_{qb4o}$. $I_{qb4o}$ is the union of $LI_{qb4o}$ (i.e., the level members), $O_{qb4o}$ (i.e., the observations), and $LAI_{qb4o}$ (i.e., the level attribute instances), with no additional triple pattern.

The output of this step is $I_{qb4o}$. Example triples of $I_{qb4o}$ are summarized in Example 28. This example follows our running example and extends the previous ones.

Example 28

Resulting triples of Step 4.

    1  country:RS a qb4o:LevelMember .
    2  region:ECS a qb4o:LevelMember .
    3  <http://worldbank.270a.info/dataset/world-bank-indicators/
    4      CM.MKT.LCAP.CD/RS/2012>
    5      a qb:Observation ;
    6      qb:dataSet dataset:CM.MKT.LCAP.CD ;
    7      property:indicator indicator:CM.MKT.LCAP.CD ;
    8      sdmx-dimension:refArea country:RS ;
    9      sdmx-measure:obsValue 7450560827.04874 .
    10
    11 country:RS qb4o:memberOf sdmx-dimension:refArea .
    12 country:RS skos:broader region:ECS .
    13 country:RS schema:capital "Belgrade"^^xsd:string .

Result examples of Step 4.1 and Step 4.2 are illustrated in lines 1 and 2, respectively. Lines 3–9 illustrate copying part of the observation from Example 2 as the result example of Step 4.3. Then, line 11 presents the result example of Step 4.4. Finally, line 12 illustrates the result example of Step 4.5, and line 13 the result example of Step 4.6.
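Putting Steps 4.1 to 4.7 together, the following Python sketch reproduces the level-member, roll-up, and attribute triples of Example 28 plus the copied observation. It is an illustration under stated assumptions, not the thesis tool: triples are tuples, the three mappings are dicts, the dimension-instance triple and the level IRI ex:region are hypothetical, and the schema input $S_{qb4o}$ is not used.

```python
# End-to-end sketch of Step 4 (Steps 4.1-4.7). Assumptions: triples are
# (s, p, o) tuples, mappings are dicts, "ex:region" is a placeholder level IRI.
RDF_TYPE = "rdf:type"
LEVEL_MEMBER = "qb4o:LevelMember"


def members_of(triples):
    """IRIs typed as qb4o:LevelMember."""
    return {s for (s, p, o) in triples if p == RDF_TYPE and o == LEVEL_MEMBER}


def annotate_cube_instances(child2parent, linst2level, linst2lainst, di_qb, o_qb):
    """Return I_qb4o = LI_qb4o ∪ O_qb4o ∪ LAI_qb4o (S_qb4o is not needed here)."""
    # Step 4.1: the subject of each dimension-instance triple becomes a level member.
    li = {(s, RDF_TYPE, LEVEL_MEMBER) for (s, _p, _o) in di_qb}
    # Step 4.2: add parent level members (one pass over the current members).
    li |= {(child2parent[m], RDF_TYPE, LEVEL_MEMBER)
           for m in members_of(li) if m in child2parent}
    # Step 4.3: copy the observations unchanged.
    obs = set(o_qb)
    members = members_of(li)
    # Step 4.4: link each member to its level.
    li |= {(m, "qb4o:memberOf", linst2level[m]) for m in members}
    # Step 4.5: link each child member to its parent.
    li |= {(m, "skos:broader", child2parent[m]) for m in members if m in child2parent}
    # Step 4.6: attach level attribute values.
    lai = {(m, la, v) for m in members for (la, v) in linst2lainst.get(m, [])}
    # Step 4.7: plain union, no additional triples.
    return li | obs | lai


OBS = "<http://worldbank.270a.info/dataset/world-bank-indicators/CM.MKT.LCAP.CD/RS/2012>"
I_qb4o = annotate_cube_instances(
    child2parent={"country:RS": "region:ECS"},
    linst2level={"country:RS": "sdmx-dimension:refArea", "region:ECS": "ex:region"},
    linst2lainst={"country:RS": [("schema:capital", '"Belgrade"^^xsd:string')]},
    di_qb={("country:RS", "skos:prefLabel", '"Serbia"@en')},  # hypothetical triple
    o_qb={
        (OBS, RDF_TYPE, "qb:Observation"),
        (OBS, "qb:dataSet", "dataset:CM.MKT.LCAP.CD"),
        (OBS, "property:indicator", "indicator:CM.MKT.LCAP.CD"),
        (OBS, "sdmx-dimension:refArea", "country:RS"),
        (OBS, "sdmx-measure:obsValue", "7450560827.04874"),
    },
)
```

Besides the triples of Example 28, the sketch also emits a qb4o:memberOf triple for region:ECS, which Example 28 does not show.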

Chapter 5

SM4MQ: A Semantic Model for Multidimensional Queries

The paper is to be submitted to a conference.

Abstract

On-Line Analytical Processing (OLAP) is a data analysis approach to support decision-making. On top of that, Exploratory OLAP is a novel initiative for the convergence of OLAP and the Semantic Web (SW). This convergence enables the use of OLAP techniques on external data, such as the SW, to analyze publicly available data in a user-friendly manner. Moreover, OLAP approaches exploit different metadata artifacts (e.g., queries) to assist the user with the analysis. However, the modeling and sharing of most of these artifacts are typically overlooked. Thus, in this paper we focus on the query metadata artifact in the Exploratory OLAP context. As OLAP is based on the underlying multidimensional (MD) data model, we denote such queries as MD queries and propose SM4MQ: A Semantic Model for Multidimensional Queries. SM4MQ is an RDF-based formalization of MD queries that captures the semantics of the related OLAP operations at the conceptual level. Thus, it enables sharing and reuse of these queries on the SW. Furthermore, we propose a method to automate the exploitation of queries by means of SPARQL (the standard query language for RDF). We apply our method to a use case of transforming queries from SM4MQ into a vector representation that enables computing their similarities (e.g., using cosine similarity). For this use case, we also developed a prototype and used a set of MD queries to perform an evaluation. This way, we exemplify the practical benefits of using SM4MQ to automate the exploitation of MD queries. Overall, this paper provides foundations for the modeling and sharing of MD queries on the SW that also facilitate their further processing, e.g., for user assistance purposes.

1 Introduction

On-Line Analytical Processing (OLAP) is a well-established approach for data analysis to support decision-making [2]. Due to its wide acceptance and successful use by non-technical users, recent trends advocate broadening its use from solutions working with in-house data sources to analyses considering external and non-controlled data. A vision of such settings is presented as Exploratory OLAP [4], promoting the convergence of OLAP and the Semantic Web (SW). The SW provides a technology stack for publishing and sharing data together with their semantics, and many public institutions, such as Eurostat, already use it to make their data publicly available. The Resource Description Framework (RDF) [28] is the backbone of the SW, representing data as directed triples that form a graph where each triple has its semantics defined. Querying RDF data is supported by SPARQL [82], the standard query language for RDF.

To facilitate data analysis, OLAP systems typically exploit different metadata artifacts (e.g., queries) to assist the user with the analysis. However, although these artifacts are extensively used, little attention is devoted to them [104]. This originates from traditional settings where very few (meta)data are open and/or shared. Thus, [104] proposes the Analytical Metadata (AM) framework, which defines AM artifacts, such as schema and queries, that are used for user assistance in settings such as Exploratory OLAP. In this context, analysis should be collaborative, and therefore these metadata artifacts need to be open and shared among different systems. Thus, SW technologies are good candidates to model and capture these artifacts.

A first step for (meta)data sharing among different systems is to agree on the (meta)data representation, i.e., modeling. As RDF uses a generic triple representation, the structure of specific (meta)data models is defined via RDF vocabularies that provide the semantics to interpret the (meta)data. Accordingly, [105] models the AM artifacts by proposing the SM4AM metamodel. Due to the heterogeneity of systems, the metamodel abstraction level is used to capture the common semantics and organization of AM. Then, metadata models of specific systems are defined at the model level, instantiating one or more AM artifacts. For instance, the schema artifact for Exploratory OLAP can be represented using the QB4OLAP vocabulary to conform data to a multidimensional (MD) data model for OLAP on the SW [106]. QB4OLAP further enables running MD queries to perform OLAP on the SW [102]. However, a representation of these queries that supports their sharing, reuse, and more extensive exploitation on the SW is still missing. Thus, in the present paper we propose a model for MD queries and explain how it not only supports sharing and reuse but can also facilitate metadata processing, e.g., for user assistance tasks such as query recommendation.

In particular, the contributions of this paper are:

• We propose SM4MQ: A Semantic Model for MD Queries formalized as an RDF-based representation of typical OLAP operations. The model captures the semantics of common OLAP operations at the conceptual level and supports their sharing and reuse via the SW.

• We define a method to automate the exploitation of SM4MQ queries by means of SPARQL. The method is exemplified on a use case that transforms a query from SM4MQ into a vector representation. The use case shows how to generate vectors (forming a matrix) as analysis-ready data structures; such structures are typically used in recommender systems to compare different items [8], and existing approaches such as [25] use vectors for query recommendations.

• We developed a prototype and used a set of MD queries to evaluate our approach for the chosen use case. The evaluation shows that even non-technical users can conduct this task thanks to the automatically generated SPARQL queries based on the SM4MQ semantics.
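To make the vector use case concrete, the Python sketch below encodes queries as 0/1 vectors over a feature vocabulary and compares them with cosine similarity. The vocabulary entries and the two queries are hypothetical placeholders; the actual SM4MQ-to-vector transformation is the one defined in Section 4, and this binary encoding is only an assumed stand-in.

```python
# Sketch: compare MD queries as binary feature vectors with cosine similarity.
# The feature vocabulary and queries are hypothetical, not SM4MQ output.
import math


def to_vector(features, vocabulary):
    """Encode a set of query features as a 0/1 vector over a fixed vocabulary."""
    return [1.0 if f in features else 0.0 for f in vocabulary]


def cosine_similarity(u, v):
    """Cosine of the angle between u and v; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


vocabulary = ["level:country", "level:region", "measure:obsValue", "level:year"]
queries = {
    "q1": {"level:country", "measure:obsValue", "level:year"},
    "q2": {"level:region", "measure:obsValue", "level:year"},
}
# One row per query: the matrix form mentioned in the contributions.
matrix = {name: to_vector(f, vocabulary) for name, f in queries.items()}
similarity = cosine_similarity(matrix["q1"], matrix["q2"])
```

Here the two queries share two of three features each, so their similarity is 2/3; a recommender would rank candidate queries by such scores.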

The remainder of the paper is organized as follows. The next section explains the preliminaries of our approach. Then, Section 3 proposes the MD query model. Section 4 defines a method to automate the exploitation of SM4MQ queries and presents the use case of transforming these queries into a vector representation. Section 5 discusses the results of the performed evaluation. Finally, Section 6 discusses the related work and Section 7 concludes the paper.

2 Background

To help understand our approach, we introduce the necessary preliminaries and a running example used throughout the paper. First, we explain the MD model and the most popular OLAP operations. Then, we discuss the use of the SW and QB4OLAP for MD models. The formalization of QB4OLAP concepts and OLAP operations can be found in [34]; in the present paper, we provide the necessary intuition for understanding the proposed query model. The running example is introduced incrementally in each of the subsections.
