
From the DW example shown in Figures 7.1, 7.2, and 7.3, we define the corresponding data mapping diagram shown in Figure 7.9. The goal of this data mapping is to calculate the quarterly sales of the products belonging to the computer category. The result of this transformation is stored in ComputerSales from the DWCS. The transformation process has been divided into three parts: Dividing, Filtering, and Aggregating; moreover, two «Intermediate» classes, DividedOrders and FilteredOrders, have been defined.

Continuing with the data mapping example shown in Figure 7.9, the attribute prod_list of the Orders table contains the list of ordered products, with the product ID and the (parenthesized) quantity for each. Therefore, Dividing splits each input order according to its prod_list into multiple orders, each with a single ordered product (prod_id) and quantity (quantity), as shown in Figure 7.10. Note that in a data mapping diagram the designer does not specify the processes, only the data relationships. We use the one-to-many cardinality in the association relationships between Orders.prod_list and DividedOrders.prod_id and DividedOrders.quantity to indicate that one input order produces multiple output orders. We do not attach any note to this diagram because the data are not transformed, so the mapping is direct.

Figure 7.10: Dividing Mapping
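Although the diagram specifies only data relationships, a small sketch can make the one-to-many behaviour of Dividing concrete. It assumes, hypothetically, that each prod_list entry has the form prod_id(quantity) and that entries are comma-separated; the Order and DividedOrder record layouts, including an order_date field that a quarterly aggregation would need, are illustrative and not taken from Figure 7.10.

```python
import re
from typing import NamedTuple

# Hypothetical entry format: "<prod_id>(<quantity>)", comma-separated,
# e.g. "P1(2), P7(1)". The real format of prod_list is not given in the text.
ENTRY = re.compile(r"\s*([^,()\s]+)\s*\(\s*(\d+)\s*\)\s*")


class Order(NamedTuple):
    order_id: int
    order_date: str  # hypothetical field, ISO "YYYY-MM-DD"
    prod_list: str


class DividedOrder(NamedTuple):
    order_id: int
    order_date: str
    prod_id: str
    quantity: int


def divide(order: Order) -> list[DividedOrder]:
    """Split one input order into one output row per ordered product."""
    rows = []
    for entry in order.prod_list.split(","):
        match = ENTRY.fullmatch(entry)
        if match is None:
            raise ValueError(f"malformed prod_list entry: {entry!r}")
        prod_id, quantity = match.groups()
        rows.append(DividedOrder(order.order_id, order.order_date, prod_id, int(quantity)))
    return rows


# One order with two products yields two DividedOrders rows.
for row in divide(Order(1, "2024-02-10", "P1(2), P7(1)")):
    print(row)
```

No value is transformed: prod_id and quantity are copied as they appear in prod_list, which matches the absence of a note on this mapping.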

Filtering (Figure 7.11) filters out the products that do not belong to the computer category. We indicate this action with a UML note attached to the prod_id mapping, because this attribute is expected to be used in the filtering process.

Figure 7.11: Filtering Mapping
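The filter described by the note on the prod_id mapping can be sketched as follows. The category lookup (category_of), the category value "computer", and the assumption that FilteredOrders keeps the same attributes as DividedOrders are all hypothetical stand-ins for details the figure does not spell out.

```python
from typing import NamedTuple


class DividedOrder(NamedTuple):
    order_id: int
    order_date: str  # hypothetical field, ISO "YYYY-MM-DD"
    prod_id: str
    quantity: int


# Assumption: FilteredOrders has the same attributes; only rows are removed.
FilteredOrder = DividedOrder


def filter_orders(rows: list[DividedOrder], category_of: dict[str, str],
                  wanted: str = "computer") -> list[FilteredOrder]:
    """Keep the rows whose prod_id belongs to the wanted category.

    category_of is a hypothetical prod_id -> category lookup standing in for
    the product data consulted by the note on the prod_id mapping.
    Products missing from the lookup are dropped rather than guessed.
    """
    return [row for row in rows if category_of.get(row.prod_id) == wanted]


categories = {"P1": "computer", "P7": "furniture"}
rows = [DividedOrder(1, "2024-02-10", "P1", 2), DividedOrder(1, "2024-02-10", "P7", 1)]
print(filter_orders(rows, categories))  # only the P1 row survives
```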

Finally, Aggregating (Figure 7.12) computes the quarterly sales for each product. We use the many-to-one cardinality to indicate that many input items are needed to calculate a single output item. Moreover, a UML note indicates how the ComputerSales.sales attribute is calculated from FilteredOrders.quantity and Products.price. The cardinality of the association relationship between Products.price and ComputerSales.sales is one-to-many because the same price is used in different quarters, but to calculate the total sales of a particular product in a quarter we need only one price (we assume that the price of a product never changes over time).

Figure 7.12: Aggregating Mapping
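The many-to-one aggregation can be sketched as a group-by on (product, year, quarter) that sums quantity times price. The order_date field, the calendar-quarter definition, and the price_of lookup standing in for Products.price are illustrative assumptions; the exact formula in the note of Figure 7.12 is not reproduced in the text.

```python
import datetime
from collections import defaultdict
from typing import NamedTuple


class FilteredOrder(NamedTuple):
    order_id: int
    order_date: str  # hypothetical field, ISO "YYYY-MM-DD"
    prod_id: str
    quantity: int


def quarter_of(iso_date: str) -> tuple[int, int]:
    """Return (year, quarter) for an ISO date; quarter is 1..4 (calendar quarters)."""
    d = datetime.date.fromisoformat(iso_date)
    return d.year, (d.month - 1) // 3 + 1


def aggregate(rows: list[FilteredOrder], price_of: dict) -> dict:
    """Many FilteredOrders rows -> one sales value per (prod_id, year, quarter).

    sales = sum of quantity * price, where price_of is a hypothetical
    prod_id -> price lookup; as in the text, each product has one fixed price.
    """
    sales = defaultdict(int)
    for row in rows:
        year, quarter = quarter_of(row.order_date)
        sales[(row.prod_id, year, quarter)] += row.quantity * price_of[row.prod_id]
    return dict(sales)


rows = [
    FilteredOrder(1, "2024-02-10", "P1", 2),
    FilteredOrder(2, "2024-03-31", "P1", 3),
    FilteredOrder(3, "2024-04-01", "P1", 1),
]
print(aggregate(rows, {"P1": 100}))  # {('P1', 2024, 1): 500, ('P1', 2024, 2): 100}
```

The single price lookup per product reflects the one-to-many association between Products.price and ComputerSales.sales: the same price feeds every quarter of that product.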

At this point, we return to the claims made in the introductory section and briefly discuss what we gain by adopting attribute-level modeling.

• We can easily detect inconsistencies, either through a computational engine or simply by inspecting the diagram. For example, if attributes of a DW table are not populated by any mapping, an inconsistency occurs.

• We can treat the design artifact as a graph [135], where provider and consumer relationships are treated as incoming and outgoing edges. In this sense, we can even measure the properties of our model in a straightforward fashion. For example, we can measure the aforementioned inconsistencies or highlight hot-spots in our design: in Figure 7.10, we can observe that the attribute Orders.prod_list is responsible for populating more than one target attribute in the DW.

• Both the visualization and the measurement of the design properties can significantly aid the impact analysis that must be performed when the design changes. As an example, assume that a source attribute is to be deleted, or that the definition of a primary key is to be altered. Our data mapping diagrams can easily depict and measure the affected attributes. Hot-spots are particularly important in this respect.
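The three gains listed above can be illustrated with a small graph sketch: attributes are nodes and each provider-to-consumer mapping is an edge. This is our own minimal illustration, not the formalism of [135]. The edge list is a partial, partly assumed reading of Figures 7.10 to 7.12 (the DividedOrders-to-FilteredOrders edges are inferred), and ComputerSales.quarter is a hypothetical target attribute deliberately left unmapped so that an inconsistency shows up.

```python
from collections import deque


class MappingGraph:
    """Attributes as nodes; an edge provider -> consumer means that the
    provider attribute populates the consumer attribute."""

    def __init__(self, edges):
        self.out = {}  # provider -> set of consumers (outgoing edges)
        self.inc = {}  # consumer -> set of providers (incoming edges)
        for provider, consumer in edges:
            self.out.setdefault(provider, set()).add(consumer)
            self.inc.setdefault(consumer, set()).add(provider)

    def unpopulated(self, target_attributes):
        """Target (DW) attributes with no incoming edge: an inconsistency."""
        return sorted(a for a in target_attributes if not self.inc.get(a))

    def hot_spots(self, min_consumers=2):
        """Attributes that populate at least min_consumers other attributes."""
        return sorted(a for a, cs in self.out.items() if len(cs) >= min_consumers)

    def impacted_by(self, attribute):
        """All attributes downstream of attribute (breadth-first search)."""
        seen, queue = set(), deque([attribute])
        while queue:
            for consumer in self.out.get(queue.popleft(), ()):
                if consumer not in seen:
                    seen.add(consumer)
                    queue.append(consumer)
        return sorted(seen)


EDGES = [
    ("Orders.prod_list", "DividedOrders.prod_id"),
    ("Orders.prod_list", "DividedOrders.quantity"),
    ("DividedOrders.prod_id", "FilteredOrders.prod_id"),    # inferred
    ("DividedOrders.quantity", "FilteredOrders.quantity"),  # inferred
    ("FilteredOrders.quantity", "ComputerSales.sales"),
    ("Products.price", "ComputerSales.sales"),
]

graph = MappingGraph(EDGES)
print(graph.unpopulated(["ComputerSales.sales", "ComputerSales.quarter"]))
print(graph.hot_spots())
print(graph.impacted_by("Orders.prod_list"))
```

Deleting Orders.prod_list, for instance, would affect every attribute returned by impacted_by, which is why a hot-spot with many outgoing edges deserves attention during change management.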

7.5 Conclusions

In this chapter, we have presented a framework for the design of the DW back-stage (and the respective ETL processes) based on the key observation that this task fundamentally involves dealing with the specificities of information at very low levels of granularity. Specifically, we have presented a disciplined framework for modeling the relationships between sources and targets at different levels of granularity (i.e., from coarse mappings at the database level to detailed inter-attribute mappings at the attribute level). Unfortunately, standard modeling languages like the ER model or UML are fundamentally handicapped in treating low-granularity entities (i.e., attributes) as first-class modeling elements. Therefore, in order to formally accomplish the aforementioned goal, we have extended UML to model attributes as first-class citizens. In our attempt to provide complementary views of the design artifacts at different levels of detail, we have based our framework on a principled use of UML packages, which allows zooming in and out of the design of a scenario. Although we have developed the representation of attributes as first-class modeling elements in UML in the context of data warehousing, we believe that our solution can be applied in other application domains as well, e.g., the definition of indexes and materialized views in databases, the modeling of XML documents, the specification of web services, etc.

Part II

Logical Level

Chapter 8

Logical Modeling of Data Sources and Data Warehouses

In this chapter, we address the design of the Source Logical Schema, the Data Warehouse Logical Schema, and the Client Logical Schema. These diagrams can be defined independently, or they can be derived from the corresponding conceptual models (Source Conceptual Schema, Data Warehouse Conceptual Schema, and Client Conceptual Schema). We use the UML Profile for Database Design to model these diagrams, which define the database structures.

Contents

8.1 Introduction
8.2 The UML Profile for Database Design
8.3 Mapping Classes to Tables
8.3.1 Many-to-many Associations
8.3.2 Inheritance Hierarchy
8.4 Mapping Attributes to Columns
8.5 Mapping Types to Datatypes
8.6 Conclusions


8.1 Introduction

In the previous chapters, we have tackled the conceptual modeling of the data sources and of the DW itself. In this chapter, the modeling effort moves from conceptual analysis to the logical design of the database. In the following, we focus on relational databases [26], the most popular kind of DBMS today, and leave the study of other DBMS for future work.

The UML offers some advantages for logical database design that are not generally available in traditional notations. For example, the UML provides full support for modeling generalization and specialization relationships as well as stored procedures. Moreover, the UML provides the concept of packages, which logically group the elements of a model into different units.

To carry out the logical modeling of the data sources (Source Logical Schema), the DW (Data Warehouse Logical Schema), and the structures used by the end users (Client Logical Schema), we apply the UML Profile for Database Design [90].

There are two basic ways to proceed. One is to build the logical models from the conceptual models by means of a mapping between the different diagrams. The other is to build the logical models independently of the conceptual models; however, we advise against this second option, because the advantages of starting from the conceptual level and maintaining a coherent mapping between the two levels are lost. Therefore, we recommend building the logical models from the conceptual models.

There are multiple ways to map models. In our approach, classes are mapped to tables, attributes to columns, types to datatypes, and associations to relationships. In this mapping process, some situations have to be considered: not every element of a model is necessarily mapped, and some attributes of the conceptual model may not be represented in the logical model because they are not stored in the database. For example, an attribute called Total_Sales, which represents the sum of multiple columns in the database, is not stored because it is just a calculation performed in the application (it is a derived attribute).
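The class-to-table, attribute-to-column, and type-to-datatype correspondence, together with the rule that derived attributes are not stored, can be sketched as a tiny DDL generator. Everything here is hypothetical: the Sales class, its attributes, and the TYPE_MAP pairs are placeholders (the actual type correspondence is the subject of Section 8.5), and associations are left out of the sketch.

```python
from dataclasses import dataclass, field

# Hypothetical conceptual-type -> SQL-datatype pairs, for illustration only.
TYPE_MAP = {"String": "VARCHAR(255)", "Integer": "INTEGER",
            "Real": "DECIMAL(12,2)", "Date": "DATE"}


@dataclass
class Attribute:
    name: str
    type_name: str
    derived: bool = False  # derived attribute: computed, not stored


@dataclass
class ConceptualClass:
    name: str
    attributes: list = field(default_factory=list)


def to_create_table(cls: ConceptualClass) -> str:
    """Class -> table, attribute -> column, type -> datatype.

    Derived attributes are skipped because they are not stored."""
    columns = [f"  {a.name} {TYPE_MAP[a.type_name]}"
               for a in cls.attributes if not a.derived]
    return f"CREATE TABLE {cls.name} (\n" + ",\n".join(columns) + "\n);"


sales = ConceptualClass("Sales", [
    Attribute("Sale_ID", "Integer"),
    Attribute("Sale_Date", "Date"),
    Attribute("Amount", "Real"),
    Attribute("Total_Sales", "Real", derived=True),  # computed in the application
])
print(to_create_table(sales))
```

Total_Sales is omitted from the generated table, mirroring the derived-attribute example in the text.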

The remainder of this chapter is structured as follows: Section 8.2 introduces the UML Profile for Database Design; Section 8.3 focuses on mapping classes to tables; Section 8.4 moves on to mapping the attributes of a class to the columns of a table; and Section 8.5 discusses mapping types to datatypes. Finally, Section 8.6 presents some conclusions.

8.2 The UML Profile for Database Design