• No se han encontrado resultados

As we normalize, we consider each user view as a relation. In other words, we con- ceptualize each view as if it is already implemented in a two-dimensional table. The first step in normalization is to choose a primary key from among the unique identifi- ers we find in the relation.

Recall that a unique identifier is a collection of one or more attributes that uniquely identifies each occurrence of a relation. In many cases, a single attribute can be found. In our example, the customer number on the invoice uniquely identi- fies the customer data within the invoice, but because a customer may have multiple invoices, it is inadequate as an identifier for the entire invoice.

When no single attribute can be found to use for a unique identifier, we can con- catenate several attributes to form the unique identifier. You will see this happen with our invoice example when we split the line items from the invoice as we nor- malize it. It is very important to understand that when a unique identifier is com- posed of multiple attributes, the attributes themselves are not combined—they still exist as independent attributes and will become individual columns in the table(s) created from our normalized relations.

In a few cases, there is no reasonable set of attributes in a relation that can be used as the unique identifier. When this occurs, we must invent a unique identifier, often with values assigned sequentially or randomly as we add entity occurrences to the database. This technique (some might say “act of desperation”) is the source of such unique identifiers as social security numbers, employee IDs, and vehicle identification num- bers. We call unique identifiers that have real-world meaningnaturalidentifiers, and those that do not (which of course includes the ones we must invent)surrogateorarti- ficialidentifiers. In our invoice example, there appears to be no natural unique identi- fier for the relation. We could try using customer number combined with order date, Color profile: Generic CMYK printer profile

but if a customer has two invoices on the same date, this would not be unique. There- fore, it would be much better to invent one, such as an invoice number.

Whenever we choose a unique identifier for a relation, we must becertainthat the identifier willalwaysbe unique. If there is onlyonecase where it is not unique, we cannot use it. People’s names, for example, make lousy unique identifiers. You may have never met someone with exactly your name, but there are people out there with completely identical names. As an example of the harm poorly chosen unique iden- tifiers cause, consider the case of the Brazilian government when it started register- ing voters in 1994 to reduce election fraud. Father’s name, mother’s name, and date of birth were chosen as the unique identifier. Unfortunately, this combination is only unique for siblings born ondifferentdates, so as a result, when siblings born on the same date (twins, triplets, and so on) tried to register to vote, the first one that showed up was allowed to register, and the rest were turned away. Sound impossible? It’s not—this really happened. And to make matters worse, citizens arerequiredto vote in Brazil and sometimes have to prove they voted in order to get a job. Someone should have spent more time thinking about the uniqueness of the chosen “unique” identifier.

Sometimes a relation will have more than one possible unique identifier. When this occurs, we call each possibility acandidate. Once we have identified all the pos- sible candidates for a relation, we must choose one of them to be the primary key for the relation. Choosing a primary key isessentialto the normalization process be- cause all the normalization rules reference the primary key. The criteria for choosing the primary key from among the candidates is as follows (in order of precedence, most important first):

If there is only one candidate, choose it.

Choose the candidate least likely to have its value change.Changing primary key values once we store the data in tables is a complicated matter because the primary key can appear as a foreign key in many other tables. Incidentally, surrogate keys are almost always less likely to change compared with natural keys.

Choose the simplest candidate.The one that is composed of the fewest number of attributes is considered the simplest.

Choose the shortest candidate.This is purely an efficiency consideration. However, when a primary key can appear in many tables as a foreign key, it is often worth it to save some space with each one.

For our invoice example, we have elected to add a surrogate primary identifier called Invoice Number. This gives us a simple primary key for the Acme Industries invoices that is guaranteed unique because we can have the database automatically assign sequential numbers to new invoices as they are generated. This will likely

152

Databases Demystified

Color profile: Generic CMYK printer profile Composite Default screen

make Acme’s accountants happy at the same time, because it gives them a simple tracking number for the invoices. There are many conventions for signifying the pri- mary key as we write the contents of relations. Using capital letters causes confusion because we tend to write acronyms such as DOB (date of birth) that way, and those attributes are not always the primary key. Likewise, underlining and bolding the at- tribute names can be troublesome because these may not always display in the same way. Therefore, we’ll settle on the use of a hash mark (#) preceding the attribute name(s) of the primary key. Rewriting our invoice relation in list form with the pri- mary key added, we get the following:

INVOICE: # Invoice Number, Customer Number, Customer Name, Customer Address, Customer City, Customer State, Customer Zip Code, Customer Phone, Terms,

Ship Via, Order Date, Product Number, Product Description, Quantity, Unit Price, Extended Amount, Total Order Amount

Documento similar