
The Dynamic Data Vault is an operational Data Vault whose structure adapts dynamically. In other words, the tables, columns, indexes, and keys are all subject to change, automatically. Achieving this state requires a constant, vigilant watch on the metadata, including but not limited to incoming structures. The incoming structures may include XSD, XML, staging tables, or other metadata (including queue-based or process metadata) that describe the structure of the incoming data set.

The dynamic nature of the Data Vault means that new attributes may be added to Satellites, and new Links and new Hubs may be formed on the fly. ETL/ELT loading code is adjusted automatically, and BI query views also inherit certain changes. Once all the automatic model changes are complete, emails describing the changes are sent to the IT staff for review in the morning.
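As a rough illustration of the "new attributes may be added to Satellites" step, the sketch below compares incoming column metadata against an existing Satellite and adds whatever is missing. It is a minimal sketch under stated assumptions, not the book's implementation: SQLite stands in for the warehouse database, and the table name `sat_customer`, the column names, and the helper `add_missing_satellite_columns` are all hypothetical. The returned list is the kind of change summary that could be emailed to the IT staff.

```python
import sqlite3


def add_missing_satellite_columns(conn, satellite, incoming_columns):
    """Add columns found in the incoming metadata but absent from the Satellite.

    `incoming_columns` maps column name -> SQL type. Identifiers are assumed
    to come from trusted metadata, since they are interpolated into DDL.
    Returns the names of the columns that were added.
    """
    # PRAGMA table_info yields one row per column; index 1 is the column name.
    existing = {row[1] for row in conn.execute(f"PRAGMA table_info({satellite})")}
    added = []
    for name, sql_type in incoming_columns.items():
        if name not in existing:
            conn.execute(f"ALTER TABLE {satellite} ADD COLUMN {name} {sql_type}")
            added.append(name)
    return added


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sat_customer ("
    "customer_sqn INTEGER, load_dts TEXT, rec_src TEXT, name TEXT)"
)

# Hypothetical metadata describing the incoming data set.
incoming = {"name": "TEXT", "email": "TEXT", "phone": "TEXT"}
changes = add_missing_satellite_columns(conn, "sat_customer", incoming)
```

Running the helper a second time with the same metadata adds nothing, which is what makes it safe to call on every load.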

[...] Dan Linstedt 2010-2011, all rights reserved

Common Attributes

Data Vault structures (tables) contain standard attributes that assist with [...]king, and querying. The common attributes in the Data Vault are defined [...] throughout the Hubs, Links and Satellites. The common attributes include: sequence [...] numbers (line item numbers), load dates, load end-dates, last seen [...] record creation dates, and record sources.

Most of these fields are EDW (enterprise data warehouse) system defined, and [...] generated/maintained; as a result, the data in these columns are “reference data” and are not auditable as they do not exist in the source system. However, record creation dates and line item numbers are two cases that are auditable particularly when they exist in the source system [...]

[...] Data Vault works on the principles similar to geological layering where data arriving in the warehouse (in a single batch) is stamped with a “geological time based layer” (a load date time stamp). The load dates enforce audit trails and record history based on the [...] controllable system date time available to the EDW loading routine [...] principle does not apply is during real-time feed processing.

Figure 3-1: Time Series Batch Loaded EDW system

The [...] assists with auditability and [...]ility by stamping all participating rows [...] single batch with the same load date time [...] the loading process fails mechanically [...]son) it is necessary to examine all rows loaded during that process; resulting in [...] removal, replacement, or augmentation to [...] the only mechanism available [...]eate the audit trail of the data for that [...] side note these mechanical [...] problems are not often discovered for weeks or months [...] have occurred.

Real-time data loads are treated differently. Real-time [...] are stamped based on [...]val time. Real-time latency is typically defined as [...]val in a data loading queue [...]ncy of arrival being less than one minute. Real-Time Loading is commonly defined in [...] transactions per second. An example image of real-time data stamps is shown in Figure 3-[...]

Figure 3-2 Real-Time Arrival, Data Geology

Real-time data arrival time-stamping appears similar to layers of pebbles on the beach. Data [...]gruent with time intervals or time-spans. It can be grouped together for analytic purposes [...] single time constant does not represent any fixed layer of information in the enterprise data warehouse. As data loads shift to incorporate real-time data feeds (also known as trickle feeds [...]) [...]s between constant time (batch loads) and continuous time (real-time loads) blurs.
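The difference between batch and real-time stamping can be sketched in a few lines. This is a minimal sketch, assuming rows are plain dictionaries; the field names `load_dts` and `rec_src` and all three helper functions are hypothetical. Every row of a batch shares one load date time stamp, so a failed load can be isolated by that stamp, while each real-time row carries its own arrival time.

```python
from datetime import datetime, timezone


def stamp_batch(rows, rec_src, load_dts=None):
    """Give every row of one batch the same load date time stamp."""
    load_dts = load_dts or datetime.now(timezone.utc)
    return [{**row, "load_dts": load_dts, "rec_src": rec_src} for row in rows]


def stamp_arrival(row, rec_src, arrival_dts=None):
    """Give one real-time row its own arrival time stamp."""
    arrival_dts = arrival_dts or datetime.now(timezone.utc)
    return {**row, "load_dts": arrival_dts, "rec_src": rec_src}


def rows_in_load(rows, load_dts):
    """Select every row stamped by one batch, e.g. to review a failed load."""
    return [row for row in rows if row["load_dts"] == load_dts]


batch_time = datetime(2011, 1, 1, 2, 0, 0, tzinfo=timezone.utc)
warehouse = stamp_batch(
    [{"customer_no": "C1"}, {"customer_no": "C2"}], "CRM", batch_time
)

# A trickle-fed row arrives seven seconds later and gets its own stamp.
arrival_time = datetime(2011, 1, 1, 2, 0, 7, tzinfo=timezone.utc)
warehouse.append(stamp_arrival({"customer_no": "C3"}, "WEB", arrival_time))

suspect_rows = rows_in_load(warehouse, batch_time)
```

Here `suspect_rows` contains only the two batch rows, because the real-time row does not share the batch stamp.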

Super Charge Your Data Warehouse Page 45 of 152

Sequence numbers are required by relational database management systems (RDBMS) in order to process joins quickly and efficiently. Without sequence numbers, joins across huge amounts of information would have to use character-based keys and would run comparatively slowly. The use of sequence numbers as primary keys for Hubs and Links also eliminates any possible issues maintaining multi-part cascading keys in Satellites or nested Link tables.

Staging area sequences are stored within the staging area. These sequences should be restarted and set to cycle for each load to a specific table. Staging sequence numbers are used only to identify loaded duplicates. They should never leave the staging area, and should not be moved forward into the Data Vault.

Duplicates are rows whose data is completely identical, from the keys, to the nulls, to the descriptive fields. When rows are 100% duplicates, there needs to be a way to delete the extra rows from the staging table so that only one unique copy is loaded to the target Data Vault.

Without a sequence number, there is no unique identifier on each row. With a sequence number, it is easy to “pick” the first or last row as the candidate to leave in place and delete the rest.

Before deleting the duplicates, the Metrics Vault should record a history of how many duplicates there are per staging table per business key. Counting the duplicates preserves auditability if the IT staff is ever asked to reproduce the source load: multiplying the single retained row by the recorded duplicate count gives the recreation an accurate picture. In other words, a Cartesian join product is applied in order to reproduce the original duplicate row set.
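The count-then-delete sequence described above can be sketched as follows. This is a minimal sketch using SQLite, not the book's code: the tables `stg_customer` and `metrics_duplicates`, their columns, and the function `reproduce_source_load` are hypothetical names. It also assumes each business key has a single distinct row image within one load, so that counts kept per business key are enough to rebuild the source. Grouping on every column treats NULLs as equal, which matches the "keys, nulls, descriptive fields" definition of a duplicate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE stg_customer (
    stg_sqn     INTEGER PRIMARY KEY,  -- staging-only sequence, never leaves staging
    customer_no TEXT,                 -- business key
    name        TEXT,
    city        TEXT
);
CREATE TABLE metrics_duplicates (     -- stand-in for a Metrics Vault table
    staging_table TEXT,
    business_key  TEXT,
    extra_copies  INTEGER
);
""")

source_rows = [
    ("C1", "Ann", "Oslo"), ("C1", "Ann", "Oslo"), ("C1", "Ann", "Oslo"),
    ("C2", "Bob", None), ("C2", "Bob", None),
    ("C3", "Cy", "Rome"),
]
conn.executemany(
    "INSERT INTO stg_customer (customer_no, name, city) VALUES (?, ?, ?)",
    source_rows,
)

# 1. Record how many extra copies exist, before anything is deleted.
conn.execute("""
INSERT INTO metrics_duplicates
SELECT 'stg_customer', customer_no, COUNT(*) - 1
FROM stg_customer
GROUP BY customer_no, name, city
HAVING COUNT(*) > 1
""")

# 2. Keep the first row of each duplicate group (lowest sequence), delete the rest.
conn.execute("""
DELETE FROM stg_customer
WHERE stg_sqn NOT IN (
    SELECT MIN(stg_sqn) FROM stg_customer GROUP BY customer_no, name, city
)
""")


def reproduce_source_load(conn):
    """Rebuild the original row set from the unique rows plus recorded counts."""
    query = """
    SELECT s.customer_no, s.name, s.city, 1 + COALESCE(m.extra_copies, 0)
    FROM stg_customer s
    LEFT JOIN metrics_duplicates m
      ON m.staging_table = 'stg_customer' AND m.business_key = s.customer_no
    ORDER BY s.stg_sqn
    """
    rebuilt = []
    for customer_no, name, city, copies in conn.execute(query):
        rebuilt.extend([(customer_no, name, city)] * copies)
    return rebuilt
```

Note that only the business key and the count go to the metrics table; the staging sequence itself stays in staging.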

Hub and Link sequence numbers are created one-for-one with each unique business key and unique association inserted into the respective table. Satellite sequence numbers are generally parent table sequence numbers; in other words, they are inherited from the Hub or Link parent table.
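A small sketch may make the one-for-one rule and the inheritance concrete. It assumes SQLite, with hypothetical tables `hub_customer` and `sat_customer` and hypothetical helpers `load_hub` and `load_satellite`. A new sequence number is issued only when a business key is seen for the first time, and the Satellite row simply reuses the parent Hub's number.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE hub_customer (
    customer_sqn INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate sequence
    customer_no  TEXT NOT NULL UNIQUE,               -- business key
    load_dts     TEXT,
    rec_src      TEXT
);
CREATE TABLE sat_customer (
    customer_sqn INTEGER NOT NULL REFERENCES hub_customer (customer_sqn),
    load_dts     TEXT NOT NULL,
    name         TEXT,
    PRIMARY KEY (customer_sqn, load_dts)
);
""")


def load_hub(conn, business_keys, load_dts, rec_src):
    """Insert only business keys not yet in the Hub; existing keys are skipped."""
    conn.executemany(
        "INSERT OR IGNORE INTO hub_customer (customer_no, load_dts, rec_src) "
        "VALUES (?, ?, ?)",
        [(key, load_dts, rec_src) for key in business_keys],
    )


def load_satellite(conn, customer_no, name, load_dts):
    """Insert a Satellite row that inherits the parent Hub's sequence number."""
    (sqn,) = conn.execute(
        "SELECT customer_sqn FROM hub_customer WHERE customer_no = ?",
        (customer_no,),
    ).fetchone()
    conn.execute("INSERT INTO sat_customer VALUES (?, ?, ?)", (sqn, load_dts, name))
    return sqn


load_hub(conn, ["C1", "C2", "C1"], "2011-01-01", "CRM")  # C1 appears twice
inherited_sqn = load_satellite(conn, "C2", "Bob", "2011-01-01")
```

After this load the Hub holds two rows, one per unique business key, even though three keys arrived.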

It is a recommended practice to set up sequence numbers as number(12). In Oracle there appears to be no byte-storage difference between a number(12) and a number(38). Most sequence numbers will fit within this length, and will not require double or floating point math to resolve at query time.
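The capacity of a 12-digit sequence is easy to check with a little arithmetic. The sketch below is illustrative only: the insert rate of 1,000 rows per second is an assumed figure, not one from the text, and the 64-bit comparison is a general observation about integer ranges rather than a statement about how any particular database stores the value.

```python
# Largest value a 12-digit decimal sequence can hold.
MAX_NUMBER_12 = 10**12 - 1            # 999,999,999,999

# Largest signed 64-bit integer, for comparison.
MAX_SIGNED_64 = 2**63 - 1

fits_in_64_bit_integer = MAX_NUMBER_12 <= MAX_SIGNED_64

SECONDS_PER_YEAR = 60 * 60 * 24 * 365


def years_until_exhausted(rows_per_second):
    """Years a 12-digit sequence lasts at a constant insert rate."""
    return MAX_NUMBER_12 / (rows_per_second * SECONDS_PER_YEAR)


# Assumed rate of 1,000 new keys per second: about 31.7 years of headroom.
years_at_1000_per_second = years_until_exhausted(1_000)
```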

Sequence numbers in the Data Vault should never be shown to business users, and must not leave the Data Vault going forward. First, sequence numbers are meaningless numbers which are there simply to provide uniqueness to the rows they represent. Second, the numbers are there merely for JOIN purposes at high rates of speed. Third, if I ask you: “please tell me what number 5 means to you?” Can you define it? Can you make sense of it? No. It’s a meaningless NUMBER. There is no context.

The sin of this is that once you expose the sequence number to the business, they will forever attach that “customer/product/employee/service” or whatever-it-is to the number you give them, meaning that they give it context; they force it to mean something to the business! Now, you (as IT) no longer have the right or the ability to “change/alter/destroy and rebuild” that number, nor are you allowed to attach different rows to that number.

This will cause problems for future re-loading, re-building, or even fixing the Data Warehouse, regardless of the data modeling technique you choose! DON’T DO THIS, DON’T EXPOSE SEQUENCE NUMBERS TO THE BUSINESS… EVER!
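One way to keep sequence numbers inside the Data Vault is to hand the business a view that joins on them but never selects them. The sketch below assumes SQLite and a hypothetical naming convention in which every sequence column ends in `_sqn`; the view name `bi_customer` and the helper `exposes_sequence_number` are likewise illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE hub_customer (customer_sqn INTEGER PRIMARY KEY, customer_no TEXT UNIQUE);
CREATE TABLE sat_customer (customer_sqn INTEGER, load_dts TEXT, name TEXT);
INSERT INTO hub_customer VALUES (5, 'C-1001');
INSERT INTO sat_customer VALUES (5, '2011-01-01', 'Ann');

-- The sequence number is used for the join only; the business key is exposed.
CREATE VIEW bi_customer AS
SELECT h.customer_no, s.load_dts, s.name
FROM hub_customer h
JOIN sat_customer s ON s.customer_sqn = h.customer_sqn;
""")


def exposes_sequence_number(conn, view_name):
    """Return True if any column of the view follows the *_sqn convention."""
    cursor = conn.execute(f"SELECT * FROM {view_name} LIMIT 0")
    return any(column[0].endswith("_sqn") for column in cursor.description)


exposed_columns = [
    column[0] for column in conn.execute("SELECT * FROM bi_customer").description
]
leaks = exposes_sequence_number(conn, "bi_customer")
```

A check like `exposes_sequence_number` could run against every BI view after a model change, so a sequence column never reaches a report by accident.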
