• No se han encontrado resultados

Now, on to the heart of darkness in normalization: dependencies. Explaining dependencies is best done by example, so I will manipulate some of the examples of this book into examples of dependencies. The ones you've seen so far have no unwanted dependencies.

First, some background. Codd developed the original approach to normalization in some very early relational theoretical articles [Codd 1970, 1972]. The basic idea those articles developed was to provide guidance for relation design in the presence of certain data manipulation anomalies caused by poor structure.

Note

The whole normalization approach starts with a complete database design in place. You need to have all your attributes defined somewhere to be able to normalize the tables in which they appear through analyzing the dependencies. While this is a very formal, consistent, and mathematical approach, it provides little guidance to the database designer creating a database. UML and the methods that use it, on the other hand, guide you the whole way through the process with design methods, prescriptions, patterns, and metrics (one of which is the degree of normalization).

When you determined that a dependency between data elements existed, you would break out the dependent data into a separate table, eliminating the dependency. Codd called this process decomposition and structured it as a series of normal forms. As the computer scientists and mathematicians began developing these early ideas, they grew into ever-more generalized forms. Ultimately, this results in the end of history: the proof that the "join" dependency is the most general form of dependency, and that therefore fifth normal form is the highest form of normalization [Fagin 1979]. Which is just as well, since people who could explain fourth normal form barely adequately began having real trouble with fifth normal form. Having admitted my limitations, let's forge ahead. A functional dependency occurs when a column value determines the value of another column in the same table. You've already seen the most common functional dependency: the key. A primary key determines the values of the other columns in its table, as does every candidate key. So every table you create with object identity, implicit or explicit, exhibits a functional dependency.

A slightly more subtle but very common functional dependency occurs when you represent a to-many relationship in a single table along with all the related data. For example, take the relationship between Person and Identification in

Figure 11-1. If you've been following along, you know that the standard way to represent this relationship is to create two tables, a Person table and an Identification table (and all the subclasses as separate tables as well). Instead, consider what a single Person table might look like if you included the information in the Driver License and Passport tables in it.

CREATE TABLE Person (

PersonID INTEGER PRIMARY KEY, Name VARCHAR2(100) NOT NULL,

Sex CHAR(1) NOT NULL CHECK (Sex IN ('M', 'F')), DateOfBirth DATE NOT NULL,

DateOfDeath DATE, Height NUMBER, Weight NUMBER,

Marital Status CHAR(1) CHECK(Marital Status in ('S', 'M', 'D', 'W')) Comment VARCHAR2(2000), DriverLicenseNumber NUMBER, LicenseIssueDate DATE, LicenseExpirationDate DATE, PassportNumber NUMBER, PassportIssueDate DATE, PassportExpirationDate DATE, IssuingOffice VARCHAR2(100))

Intuitively, you can see what's happened: you've combined three objects into a single table, the person, the driver's license, and the passport. Mathematically what's happened is that you've introduced five functional dependencies: LicenseIssueDate and LicenseExpirationDate both depend on DriverLicenseNumber. The three passport-related columns depend on PassportNumber, assuming that the two numbers are unique.

This is a subtle example of a functional dependency because the structure assumes that there is a one-to-one relationship between person and driver's license and between person and passport. Therefore, PersonID is still the key of the Person table, and you don't have any serious problems with updating or removing data. If a person gets a new passport, you just update those columns, for example. However, consider the more general model that Figure 11-1 represents, and what would happen if you tried to represent it directly. Since a person can have a varying number of multiple IDs, the best way to combine all this information in one table is to have one row for each combination of person and ID:

CREATE TABLE Person (

PersonID INTEGER NOT NULL, Name VARCHAR2(100) NOT NULL,

Sex CHAR(1) NOT NULL CHECK (Sex IN ('M', 'F')), DateOfBirth DATE NOT NULL,

DateOfDeath DATE, Height NUMBER, Weight NUMBER,

MaritalStatus CHAR(1) CHECK(MaritalStatus in ('S', 'M', 'D', 'W')) Comment VARCHAR2(2000), IDNumber NUMBER, LicenseIssueDate DATE, LicenseExpirationDate DATE, PassportIssueDate DATE, PassportExpirationDate DATE, IssuingOffice VARCHAR2(100)

CONSTRAINT Person_PK (PersonID, IDNumber))

I'm going to stop at this point because we're already far down a road that you don't want to travel. This table is a mess. You've mixed together all kinds of data at varying multiplicity. This is a terrible way to represent the world

you're trying to model. It results in, for example, two Person records for each person with a Passport and Driver's License. In the Passport-related row, all the Driver's License columns are null. In the Driver's License-related row, all the Passport columns are null. More importantly, you duplicate all the Person data in each of the two rows. When you add in identifications of different sorts along with their specific columns, it just gets worse. Now, when you update a Weight, for example, you have to update it in two different rows. This is Codd's update anomaly.

You solve this problem by representing the objects in a straightforward way in separate tables, which is third normal form or Boyce-Codd normal form (BCNF), depending on the exact structure. Normalization theory starts with one huge table and decomposes it into pieces using the functional dependencies, which represent the clusters of facts around object identity and relationships between objects. You can now see why OO and ER design, which represent objects and relationships directly, tend not to need much normalization. You directly represent your functional dependencies and structure your model from the start into decomposed relations because you represent associations directly.

But I digress. It gets worse.

A multivalued dependency occurs when a table is the join of two of its projections having a shared subset of columns [Fagin 1981, p. 390]. The functional dependency is a special case of the multivalued dependency. Again, let's explore the concept by example rather than focusing on the relational mathematics. Consider the

CriminalOrganization in Figure 11-2, and consider what might happen if you modeled the particular criminal operations that the people in the organization conducted (their "businesses"). The following code adds a Business Type table that contains the kinds of businesses a criminal organization might operate, and the Criminal Business table lists the people and the businesses they run.

CREATE TABLE CriminalOrganization (

OrganizationID INTEGER PRIMARY KEY REFERENCES Organization, LegalStatus CHAR(1) NOT NULL,

Stability CHAR(1) NOT NULL,

InvestigativePriority CHAR(1) NOT NULL, ProsecutionStatus CHAR(1) NOT NULL); CREATE TABLE BusinessType (

BusinessTypeID INTEGER PRIMARY KEY, TypeName VARCHAR2(100));

CREATE TABLE CriminalBusiness (

OrganizationID INTEGER REFERENCES CriminalOrganizaton,-the mob PersonID INTEGER REFERENCES Person,-the mob member

BusinessType INTEGER REFERENCES BusinessType,-the criminal business PRIMARY KEY (OrganizationID, PersonID, BusinessID));

There is nothing obviously wrong with this model until you understand the underlying semantics of the data. Businesses are run by organizations, not by individuals within the organization. The Criminal Business table

represents this, but it throws in the person as well. Table 11-1 presents the data for the Criminal Business table (I've replaced the implicit object-identifying integers with the relevant names for clarity; the names refer to several Holmes stories):

The multivalued dependency here stems from the fact that the PRIMARY KEY constraint on the Criminal Business table is the full set of primary keys from the three related tables Person, Criminal Organization, and Business Type. However, if you know the organization, you know the businesses it operates, and you know the people in the organization. But people and businesses are independent of one another. Thus, you have two columns that depend on one other column but are independent—a multivalued dependency that is not on the primary key of the table. The Business Type table is a join of two projections, the Organization-Person projection and the Organization-Business Type projection.

Table 11-1: The Criminal Business Table

Organization

Name Person Name

Business Type Name

Moriarty

Documento similar