Regardless of whether we’re dealing with individual human beings or com-mercial entities, we typically capture our customers’ name and address attrib-utes. The operational handling of name and address information is usually too simplistic to be very useful in the data warehouse. Many designers feel that a liberal design of general-purpose columns for names and addresses, such as Name-1 through Name-3 and Address-1 through Address-6, can handle any situation. Unfortunately, these catchall columns are virtually worthless when it comes to better understanding and segmenting the customer base. Design-ing the name and location columns in a generic way actually can contribute to data quality problems. Consider the sample design in Table 6.1 with general-purpose columns.
In this design, the name column is far too limited. There is no consistent mechanism for handling salutations, titles, or suffixes. We can’t identify what the person’s first name is or how she should be addressed in a person-alized greeting. If we looked at additional sample data from this operational system, potentially we would find multiple customers listed in a single name field. We also might find additional descriptive information in the name field, such as “Confidential,” “Trustee,” or “UGMA” (Uniform Gift to Minors Act).
In our sample address fields, inconsistent abbreviations are used in various places. The address columns may contain enough room for any address, but there is no discipline imposed by the columns that will guarantee conformance with postal authority regulations or support address matching or latitude/
longitude identification.
Table 6.1 Sample Customer Dimension with Overly General Columns
DIMENSION ATTRIBUTE EXAMPLE VALUES
Name Ms. R. Jane Smith, Atty
Address-1 123 Main Rd, North West, Ste 100A
Address-2 P.O. Box 2348
City Kensington
State Ark.
ZIP Code 88887-2348
Phone Number 888-555-3333 x776 main, 555-4444 fax Customer Relationship Management 147
Instead of using a few general-purpose fields, the name and location attributes should be broken down into as many elemental parts as possible. The extract process needs to perform significant parsing on the original dirty names and addresses. Once the attributes have been parsed, then they can be standard-ized. For example, “Rd” would become “Road” and “Ste” would become
“Suite.” The attributes also can be verified, such as validating that the ZIP code and associated state combination is correct. Fortunately, name and address data cleansing and scrubbing tools are available on the market to assist with parsing, standardization, and verification.
A sample set of name and location attributes for individuals in the United States is shown in Table 6.2. We’ve filled in every attribute to make the design clearer, but no single real instance would look like this row.
Table 6.2 Sample Customer Dimension with Parsed Name and Address Elements
DIMENSION ATTRIBUTE EXAMPLE VALUES
Salutation Ms.
Informal Greeting Name Jane
Formal Greeting Name Ms. Smith
First and Middle Names R. Jane
Surname Smith
Suffix Jr.
Ethnicity English
Title Attorney
Street Number 123
Street Name Main
Street Type Road
Street Direction North West
Post Box 2348
Suite 100A
City Kensington
District Cornwall
Second District Berkeleyshire
State Arkansas
Region South
Country United States
(Continues) C H A P T E R 6
148
Table 6.2 Continued.
DIMENSION ATTRIBUTE EXAMPLE VALUES
Continent North America
Primary Postal ZIP Code 88887
Secondary Postal ZIP Code 2348
Postal Code Type United States
Office Telephone Country Code 1
Office Telephone Area Code 888
Office Telephone Number 5553333
Office Extension 776
FAX Telephone Country Code 1
FAX Telephone Area Code 888
FAX Telephone Number 5554444
E-mail address [email protected]
Web Site www.ABCGenIntl.com
Unique Customer ID 7346531
Commercial customers typically have multiple addresses, such as physical and shipping addresses; each of these addresses would follow much the same logic as the address structure we just developed.
Before leaving this topic, it is worth noting that some organizations maintain the complete set of name and address characteristics in their customer dimen-sion in order to produce mail-ready addresses, as well as support other com-munication channels such as telephone, fax, and electronic mail, directly from the data warehouse. Here the data warehouse customer dimension becomes a kind of operational system because it is the enterprise-wide authority for valid addresses. This is most likely to happen when no other operational system has taken responsibility for consolidating customer information across the enter-prise. In other cases, organizations already have decided to capture solicitation and communication touch points in an operational system. In these environ-ments, the customer dimension in the warehouse may consist of a more reduced subset of attributes meaningful to analysis, as opposed to the complete set of attributes necessary to generate the mailing labels or call list details.
International Name and Address Considerations
Customer geographic attributes become more complicated if we’re dealing with customers from multiple countries. Even if you don’t have international
Customer Relationship Management 149
customers, you may need to contend with international names and addresses somewhere in your data warehouse for international suppliers or human resources personnel records.
When devising a solution for international names and addresses, we need to keep the following in mind, in addition to the name and address parsing requirements we discussed earlier:
Universal representation.The design should be consistent from country to country so that similar data elements appear in predictable, similar places in the customer dimension table.
Cultural correctness.This includes the appropriate salutation and personal-ization for a letter, electronic mail, or telephone greeting.
Differences in addresses.Different addresses may be required whether they’re foreign mailings from the country of origin to the destination coun-try (including idiosyncrasies such as presenting the destination city and country in capital letters), domestic mailings within the destination coun-try, and package delivery services (which don’t accept post office boxes).
The attributes we described earlier are still applicable for international names and addresses. In addition, we should include an address block attribute with a complete valid postal address including line breaks rendered in the proper order according to regulations of the destination country. Creating this attribute once in the staging process, based on the correct country-by-country address formation rules, simplifies downstream usage.
Similar to international addresses, telephone numbers must be presented dif-ferently depending on where the phone call is originated. We need to provide attributes to represent the complete foreign dialing sequence, complete domestic dialing sequence, and local dialing sequence. Unfortunately, the complete foreign dialing sequence will vary by country of origin.
We have barely scratched the surface concerning the intricacies of interna-tional names and addresses. For more detailed coverage, we recommend Toby Atkinson’s book on the subject, Merriam-Webster’s Guide to International Busi-ness Communications (Merriam-Webster, 1999).