Oracle introduced its MDM products ten years ago, starting first with programs for managing customer and product data, and ending up with solutions for data management, called Customer and Product Data Hubs. The whole idea for developing applications in MDM area started internally when Oracle’s E-Business Suite was dealing with customer data quality issues. They first developed program to manage customer data model, called Oracle Customers Online, and shortly after its release they built Oracle Advanced Product Catalogue, another program for the same suite to manage product data. Adding data quality, source system management, and application integration capabilities, these two products grew into the Oracle Customer Data Hub and the Oracle Product Hub. Major breakthrough on the MDM market happened when Oracle acquired Siebel and Hyperion Data Relationship Management (DRM). After releasing Customer and Product hub, Oracle expanded its MDM line of products to Finance, Site and Supplier Hubs (Butler, 2011).
Oracle is currently focused on developing fusion versions for its existing Hubs. These fusion applications are combination of SOA and MDM. They provide integration, management and distribution of master data among applications from external systems. So far, Customer, Product and Accounting Fusion Hubs are available on the market.
45
MDM solutions from this vendor will be discussed through several products from the Oracle MDM Suite. Below is a list of applications that belong to the Oracle MDM portfolio (Oracle
Master Data Management. Retrieved March 5, 2012, from
http://www.oracle.com/us/products/applications/master-data-management/master-data- management-ds-075053.pdf, 2010, p. 1)
Enterprise Data Quality; Oracle Customer Hub; Oracle Product Hub; Oracle Supplier Hub; Oracle Site Hub;
Oracle Higher Education Constituent Hub; Hyperion Data Relationship Management.
Data modeling Oracle’s MDM products come with already predefined data models for each entity. Users don’t have the ability to start from blank database, but what they can do is update already existing tables in the master repository with new columns. Data models that Oracle uses are based on the Trading Community Architecture (TCA) model. Oracle Trading Community Architecture (TCA) is a data model that allows users to manage complex information about customers, organizations and customer’s accounts. The base of this model is used and readjusted when designing models for other types of domains such as product, site etc. Tables in the master repository have standardized names; each starting with HZ prefix followed by the name of entity which attributes are stored. For example, HZ_PARTIES stores data for Parties, HZ_CONTACT_POINTS for party’s contact points etc. This database is of relational type, organized in tables (entity), columns (attributes) and relationships (hierarchies) (Oracle Trading Community Architecture, 2006, p. 1).
Figure 22: List of predefined tables for Customer entity
46
Data import Since Oracle provides predefined data model, data is imported in the HZ tables discussed earlier. There are several different ways for data import: (1) SQL/ETL Load, (2) D&B Load and (3) File Load (Oracle Trading Community Architecture, 2006, p. 8 - 18):
(1) SQL/ETL Load: data is first extracted with scripts or tools, values are transformed to meet the data requirements of the interface tables, and afterwards data is loaded; (2) D&B Load: data is prepared by D&B sent in standard D&B bulk file which is next
run through the D&B Import Adapter and automatically mapped and loaded into the interface tables;
(3) File Load: data is loaded from a comma-separated value (CSV) file, or file delimited by another allowed character with Oracle Customers Online (OCO) or Oracle Customer Data Librarian (CDL).
Before loading data in the master repository, data is first imported into staging tables, matched and cleansed and afterwards imported in the interface tables. The staging tables are copies of the existing tables and are temporary storage for the external data that is being imported in the repository. Even after importing data, TCA runs post import processes for data standardization. There are various data transformations such as: name conversions to meet database standards, replacement of letters in phone numbers, removing of NULLs etc. Data export Data for certain entity can be exported in Excel spreadsheet. However, for data distribution to external applications Oracle uses cross referencing. Cross referencing is approach that assigns unique key IDs for each record in the central repository and maps it to the appropriate record from the external systems (similar to the key mapping discussed in the SAP solution earlier)
Figure 23: Example of cross reference between PARTIES (master table) and SYS_REFERENCES (external systems)
Source: Better Information through Master Data Management – MDM as a Foundation for BI, 2011, p. 9 With the help of Application Integration Architecture (AIA), data can be shared with other application through web services. This enables external applications from different platform to receive managed data from Oracle MDM Hubs.
47
Since every hub that Oracle MDM offers has its own domain of concern and different architecture, there is also difference in the cross referencing processes. There are two possibilities for cross reference: one way and two way cross reference. In the first approach data flow occurs one way, from Hub to external applications. This means that data is managed and updated only in the Hub and afterwards sent out to the other systems. This approach is used in the Product Hub. The two way cross reference is implemented in the Customer Hub and data flow is managed in two directions: from hub to external systems and vice versa. Data is managed in the hub, but can be also updated in the external systems and sent to the master repository for import. Changed data that is sent from external systems needs to pass predefined validations before is loaded in the central database. This type of data sharing gives freedom to external systems to use managed data from Oracle Hubs without major changes made to their legacy applications (Cross-Referencing for Master Data Management with Oracle Application Integration Architecture Foundation Pack, 2008, p. 5). Data validation As mentioned earlier, data is checked for errors right after being imported in the repository. There are several techniques that Oracle uses to validate, cleanse and manage data. Most of them are similar to the ones discussed in the previous MDM solutions. Data validation techniques are based on transformation, matching and merging and are part of the Data Quality Management (DQM), mechanism for managing data found in the TCA model.
Figure 24: Example of data validation workflow
Source: Data Quality Management, 2012, p. 9
Oracle MDM examines data through several steps (Data Quality Management, 2002, p. 2 - 25):
(1) Step one - Transformation functions. These functions include characters or blank space replacement, removing double letter, or any other data changes that will achieve certain standards throughout the database. Also, Oracle uses word replacement which replaces similar word variations with one standard word. Often times user enter
48
different data for the same item. In some cases they use item’s full name, in others the shortcut. To avoid such irregularities MDM gives one name per item. If before Slovenia was entered in the system as Slovenia, SLO or SI, MDM can replace all these variations with SLO. Word replacement and transformation functions can show some duplicate data throughout the system which was hard to be detected before; (2) Step two - Match rules. Matching is done in the similar manner as discussed in the
earlier solutions. Because user cannot be familiar with each and every record, the best way to detect match is when user defines some matching points that need to be accomplished. Oracle names these values as thresholds, and based on such limits user can define if two records are match or no match. Thresholds should be defined as some average value, between 40 and 60, because small value for threshold may return results that are not a match, whereas high value may exclude a lot of data and some possible match among;
(3) Step three - Duplicate identification and merging. Once data is transformed and certain match rules define, subsets of data can be prepared for duplicate identification. Oracle DQM provides batch jobs that compare records from different groups, looking for duplicates. Each record from one dataset is matched to all records from the other datasets. Batch jobs can run for a long time if there are a lot of records for comparison. That’s why often times is better for user to define subset of records and apply batch job to these subsets. Once duplicates are done, then merging of same records follows. The old record that is merged is deleted because it is already merged to the new one.
An extra feature that Oracle MDM provides in the validation process is monitoring and managing data decays. Data repositories contain big loads of obsolete data that accumulates over some period. However, an enterprise cannot always delete this data because it may still use it for analysis of historical transactions. Oracle MDM supports this data lifecycle by monitoring data decays and flagging the active and passive data. What this tracking allows is marking the current active data and making it available for the live applications. Data that is not used anymore is flagged as passive and stored in remote locations. It’s not accessible by external applications and it’s usually used for reporting.
Data security Oracle provides robust and precise security model that gives user rights to work with certain data or hierarchies. It is based on roles and authentication and as in any other security model, administrators have all the rights. The security is set on a granular level that event controls user access for different versions of data.
Advantages and disadvantages of Oracle MDM are shown in table 7. Even though there are several disadvantages regarding Oracle domain specific solution as well as the inability to support collaborative MDM still there are workarounds that can complement these deficiencies.
49
Advantages Disadvantages
Supports several domains: Customer, Product, Account, Site.
Doesn’t cover all domains. Prebuilt data models so users don’t have to
start from blank database
Data models can be modified, but not built from scratch for completely new entity
Keeping copy of the data in staging tables prevents errors on import.
Hubs are domain dependable, for each new domain Oracle launches new Hub.
Versioning data saves archives of data changes. Oracle focuses more on data cleansing and deduplication, but doesn’t offer great support for setting up rules to govern data.
Different ways to import data. Collaboration is excluded; only operational and analytical implementation styles are supported. (Collaborative implementation style may be implemented in the new line of advanced MDM hubs, called Fusion Hubs)
Cross reference data sharing which makes Oracle compatible with different kinds of external systems (include non Oracle based). Monitoring data decays, and decreasing the data load based on active and passive data. Automatic batch processes for faster and more efficient duplicate identification.
Based on the various features discussed earlier, Oracle MDM suite can join to IBM and SAP MDM solutions with its complex architecture and the various functionalities that offer data management. Organizing data management in different hubs, based on the type of entity, is of great help for users because they don’t have to purchase the whole suite but only those applications that are needed for managing their data. Also, with the use of Application Integration Architecture and web services, these “parts” of the suite can be easily integrated with other applications from different platforms and vendors. Prebuilt data models are also of great help, so they can give users more time for data validations instead of creating data model, something that comes with the Oracle software. Another advantage that is worth for mentioning are the plenty transformation rules and matches that help in data management. Also, the organization of the data in versions and comparing it among different versions, allow users to keep track of changes over time, but still don’t make these versions as separate data sets that don’t have any connections between themselves.
However, it seems that Oracle tries to facilitate the job for users by giving them everything prebuilt. Constraints can appear because of this flexibility. First, not all domains are supported. Database structure is already given and users may have to make changes in their own systems before loading data in the MDM repository. Defining Hubs for each domain is different approach than IBM and SAP. It seems that each Hub functions as independent applications. Also, data governance is on lower level, in some cases Oracle Hub seems like passive registry that servers cleansed data to external systems but doesn’t do match to keep it clean and managed.
50
Overall, main domains are covered. For special lines of business these Hubs can be readjusted. And with the new Fusion hubs that are already launched on the market, Oracle strengthens the collaborative methods and evolves to multi-dimensional MDM suite on a single least cost of ownership.
4. ANALYSIS OF SELECTED MASTER DATA MANAGEMENT
ARCHITECTURES
There are several aspects of comparison that can be considered when analyzing the discussed MDM architectures. I’ve chosen three approaches: (1) data quality dimensions, (2) three dimensional model and (3) the four data management phases which cover: consolidate, cleanse, govern and share. These three approaches give an overview of the problem, solution model and management processes described through the selected MDM solutions. The first approach covers data quality issue which is common problem in every enterprise data. The second approach is based on the three dimensional model that gives general view on MDM solution. And the last approach summarizes different management techniques that are present in each of the selected products.