Figure 1.1: Ontology evolution scenario
1.2 Information integration
Information integration is one of the oldest classes of applications where matching is viewed as a plausible solution. Under the information integration heading, we group problems such as schema integration [11, 212, 219, 192], data warehousing [26], data integration (also known as enterprise information integration, EII) [44, 233, 65, 114], and catalog integration [1, 121, 31, 99].
A general information integration scenario is presented in Figure 1.2. Given a set of local information sources (Local Ontology 1, Local Ontology 2), potentially storing their data in different formats, e.g., SQL DDL, XML, or RDF, the goal is to provide users with a uniform query interface to all the local information sources via the mediated (or global) ontology, Common Ontology. This spares users from querying the local information sources one by one: they obtain a result from all of them just by querying the common ontology.
For example, if a user poses the query “find a book about ontology matching” to the common ontology, then an information integration system communicates with local information sources, e.g., www.amazon.com, www.bn.com,
1.2. INFORMATION INTEGRATION CHAPTER 1. APPLICATIONS
Figure 1.2: A general (centralized) information integration scenario
and returns a reconciled result to the user based on the input provided by those sources. In general, there are a number of macro steps that the information integration system has to perform. These include:
• interpret (rewrite) the query in terms of the common ontology;
• identify the correspondences between semantically related entities of the local information sources and the common ontology;
• translate the relevant data instances of the local information sources (those involved in handling the user’s request) into the knowledge representation formalism of the information integration system;
• reconcile the results obtained from multiple local information sources, namely detecting and eliminating, e.g., redundancies and duplications, before returning the final answer.
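The macro steps above can be illustrated with a toy pipeline. This is a minimal sketch under strong assumptions: the two local sources, their attribute names and the correspondence table are all hypothetical, queries are plain attribute/value equality filters over the common ontology, and reconciliation is reduced to dropping records that share an identifier. The correspondences are written by hand here; producing them is exactly the matching step.

```python
from typing import Dict, List

Record = Dict[str, str]

# Hypothetical local sources, each using its own attribute names.
SOURCES: Dict[str, List[Record]] = {
    "source_a": [
        {"title": "Ontology Matching", "subject": "ontology matching", "isbn": "111"},
        {"title": "Databases", "subject": "databases", "isbn": "222"},
    ],
    "source_b": [
        {"name": "Ontology Matching", "topic": "ontology matching", "code": "111"},
        {"name": "Schema Mapping", "topic": "ontology matching", "code": "333"},
    ],
}

# Hypothetical correspondences: common-ontology attribute -> local attribute.
CORRESPONDENCES: Dict[str, Dict[str, str]] = {
    "source_a": {"title": "title", "topic": "subject", "id": "isbn"},
    "source_b": {"title": "name", "topic": "topic", "id": "code"},
}


def rewrite(query: Record, source: str) -> Record:
    """Rewrite a query over the common ontology into the vocabulary of one source."""
    mapping = CORRESPONDENCES[source]
    return {mapping[attr]: value for attr, value in query.items()}


def translate(record: Record, source: str) -> Record:
    """Translate a local instance back into the common ontology's vocabulary."""
    inverse = {local: common for common, local in CORRESPONDENCES[source].items()}
    return {inverse[k]: v for k, v in record.items() if k in inverse}


def answer(query: Record) -> List[Record]:
    """Query every source, translate the hits, then reconcile duplicates by id."""
    results: List[Record] = []
    for source, records in SOURCES.items():
        local_query = rewrite(query, source)
        for rec in records:
            if all(rec.get(k) == v for k, v in local_query.items()):
                results.append(translate(rec, source))
    seen, reconciled = set(), []
    for rec in results:
        if rec["id"] not in seen:  # same book reported by two sources
            seen.add(rec["id"])
            reconciled.append(rec)
    return reconciled


print(answer({"topic": "ontology matching"}))
```

The book with identifier "111" is returned by both sources but appears only once in the answer, which is the reconciliation step in its simplest form.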
The step of identifying the correspondences between semantically related entities of the local information sources and the common ontology is most often referred to as matching. For the moment, let us restrict our view of matching to this description; we will broaden it somewhat in the following sections.
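To make the notion of a correspondence concrete, here is a deliberately naive matcher that compares entity labels only. It is a sketch, not a representative matching method: the entity names and the 0.8 threshold are hypothetical, and the string similarity comes from Python's standard `difflib.SequenceMatcher`. Each correspondence is returned as a (source entity, target entity, score) triple.

```python
from difflib import SequenceMatcher
from typing import List, Tuple


def _normalize(label: str) -> str:
    """Lower-case a label and turn '_' and '-' separators into spaces."""
    return label.lower().replace("_", " ").replace("-", " ")


def name_similarity(a: str, b: str) -> float:
    """String similarity in [0, 1] between two entity labels."""
    return SequenceMatcher(None, _normalize(a), _normalize(b)).ratio()


def match(source: List[str], target: List[str],
          threshold: float = 0.8) -> List[Tuple[str, str, float]]:
    """For each source entity, keep its most similar target entity if the score reaches threshold."""
    alignment = []
    for s in source:
        best = max(target, key=lambda t: name_similarity(s, t))
        score = name_similarity(s, best)
        if score >= threshold:
            alignment.append((s, best, round(score, 2)))
    return alignment


local_entities = ["Book_Title", "Author", "ISBN", "PublicationDate"]
common_entities = ["title", "author", "isbn", "price"]
print(match(local_entities, common_entities))
```

With this threshold, "Book_Title" scores only about 0.67 against "title" and is missed, which hints at why practical matchers combine label comparison with other techniques.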
Depending on the concrete information integration scenario, the common ontology can either physically exist or be virtual. Below, we discuss these scenarios in some detail.
1.2.1 Schema integration
Schema integration is the oldest scenario [11, 212, 221, 220, 192]. Suppose that two (or more) enterprises undergo a merger or an acquisition. Eventually, these enterprises have to integrate their databases into a single one. Usually, the first technical step is to identify correspondences between semantically related entities of the schemas. This step is known as matching. Then, using the identified correspondences, the databases are merged. The matching step is required even if the databases to be integrated come from the same domain of interest, e.g., book selling or car rentals, because the schemas have been designed and developed independently. In fact, humans follow diverse modeling principles and patterns, even when they encode the same real-world object. Finally, the schemas to be integrated might have been developed according to different business goals, which makes the matching problem even harder.
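The merging step that follows matching can be sketched as below. This is an illustration under simplifying assumptions: schemas are reduced to entities with attribute sets, the two company schemas are hypothetical, and the correspondences are given as plain renaming maps from the second schema into the first. Without those correspondences, a naive union would keep "Customer" and "Client" as two unrelated entities.

```python
from typing import Dict, Set

Schema = Dict[str, Set[str]]


def merge(a: Schema, b: Schema,
          entity_map: Dict[str, str], attr_map: Dict[str, str]) -> Schema:
    """Merge schema b into schema a using correspondences produced by matching.

    entity_map and attr_map send names of b to the corresponding names of a.
    Simplification: attr_map is global rather than per entity.
    Unmatched entities and attributes of b are copied over unchanged.
    """
    merged = {entity: set(attrs) for entity, attrs in a.items()}
    for entity, attrs in b.items():
        target = entity_map.get(entity, entity)
        renamed = {attr_map.get(x, x) for x in attrs}
        merged.setdefault(target, set()).update(renamed)
    return merged


# Hypothetical schemas of two merging enterprises.
company_a = {"Customer": {"name", "email"}, "Order": {"id", "total"}}
company_b = {"Client": {"full_name", "email", "phone"}, "Invoice": {"number", "amount"}}

merged = merge(company_a, company_b, {"Client": "Customer"}, {"full_name": "name"})
print(merged)
```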
Some other scenarios can be classified under the schema integration heading, for example, (tightly-coupled) federated databases [212]. These typically have one global schema providing unified access to the federation of component databases. Component databases, in turn, are autonomous. Thus, in this application, when, for example, one component schema of the federated database is changed, the federated (global) schema has to be reconsidered accordingly. Matching can help in identifying those changes.
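A first approximation of how such a change propagates can be sketched as follows. The attribute names and the component-to-global correspondences are hypothetical, and the comparison is purely by name: a renamed attribute therefore shows up as one removal plus one addition. The added attributes are the ones that would be handed to a matcher to restore the broken correspondences.

```python
from typing import Dict, Set, Tuple


def schema_change_impact(old: Set[str], new: Set[str],
                         to_global: Dict[str, str]) -> Tuple[Set[str], Set[str]]:
    """Compare two versions of a component schema.

    Returns (global-schema elements whose source attribute disappeared,
             new component attributes that still have to be matched).
    """
    removed = old - new
    added = new - old
    broken = {to_global[attr] for attr in removed if attr in to_global}
    return broken, added


# Hypothetical component schema before and after a change.
old_v = {"cust_name", "cust_email", "zip"}
new_v = {"customer_name", "cust_email", "zip"}
to_global = {"cust_name": "Customer.name",
             "cust_email": "Customer.email",
             "zip": "Address.postcode"}

broken, added = schema_change_impact(old_v, new_v, to_global)
print(broken, added)
```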
Finally, it is worth mentioning applications that we do not discuss here, e.g., distributed database systems [185]. These are usually designed in a centralized way, e.g., by a database administrator, and therefore semantic heterogeneity does not arise there by construction [70].
1.2.2 Catalog integration
In Business-to-Business (B2B) applications, trade partners store information about their products in electronic catalogs. Typical examples of catalogs are the product directories of electronic sales portals, such as Amazon or eBay. In order for a merchant to participate in a marketplace, e.g., eBay, it has to determine correspondences between the entries of its catalogs and those of the single catalog of the marketplace. This process of finding correspondences among entries of the catalogs is referred to as the catalog matching problem [31]. Notice that, from a merchant's viewpoint, matching has to be performed for each marketplace in which it would like to participate. Once identified, the correspondences between the entries of the catalogs are further analyzed in order to generate query expressions that automatically translate data instances between the catalogs. Finally, having matched the catalogs, users of a marketplace have unified access to the products that are on sale. The scenario described above, involving interactions between marketplaces and merchants, can be viewed as a typical example of integrating local data sources into a data warehouse; see also [26].
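The translation of data instances between catalogs can be sketched as a function derived from the correspondences. In this illustration the merchant's field names, its category paths and the marketplace categories are all hypothetical; real marketplaces define their own schemas. Items whose category has no correspondence are set aside rather than guessed, since they signal that matching is incomplete.

```python
from typing import Dict, List, Tuple

Item = Dict[str, object]

# Hypothetical correspondences between a merchant catalog and a marketplace catalog.
CATEGORY_MAP: Dict[str, str] = {
    "Books/Computing": "Books > Computers & Technology",
    "Music/CD": "Music > CDs & Vinyl",
}
FIELD_MAP: Dict[str, str] = {"label": "title", "cost": "price", "cat": "category"}


def translate_catalog(items: List[Item]) -> Tuple[List[Item], List[Item]]:
    """Translate merchant items into the marketplace vocabulary.

    Returns (translated items, items whose category has no correspondence).
    """
    translated, unmatched = [], []
    for item in items:
        category = item.get("cat")
        if category not in CATEGORY_MAP:
            unmatched.append(item)
            continue
        new_item = {FIELD_MAP[k]: v for k, v in item.items() if k in FIELD_MAP}
        new_item["category"] = CATEGORY_MAP[category]
        translated.append(new_item)
    return translated, unmatched


merchant_items = [
    {"label": "Intro to Schemas", "cost": 30, "cat": "Books/Computing"},
    {"label": "Garden Hose", "cost": 15, "cat": "Home/Garden"},
]
ok, pending = translate_catalog(merchant_items)
print(ok, pending)
```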
product classifications, such as UNSPSC³ (The United Nations Standard Products and Services Code) and eCl@ss⁴ (Standardized Material and Service Classification). In a sense, this scenario enables interoperability among multiple B2B marketplaces, thus facilitating product data exchange between enterprises that subscribe to different product classifications [207]. This is achieved by establishing correspondences between semantically related entities of the standardized product classifications, which is a matching operation as well.
1.2.3 Data integration
Data integration is an approach where information coming from multiple local sources is integrated without first loading their data into a central warehouse [114]. This allows interoperation across multiple local sources while accessing up-to-date data. Notice that in the catalog integration scenario considered above, it is the merchants who have to update the central warehouse of the marketplace; in the data integration scenario, this functionality is provided by the data integration system itself.
The scenario is as follows. First, the local information sources participating in the application, e.g., bookstores or cultural heritage collections, are identified. Then, a virtual common ontology is built. Queries are posed over the virtual common ontology and are then reformulated into queries over the local information sources; in the cultural heritage application, for example, these might be the catalogs of museums. In order to enable semantics-preserving query answering, correspondences between semantically related entities of the local information sources and the virtual ontology have to be established. Establishing these correspondences is known as the matching step.
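Query reformulation over a virtual ontology can be sketched with two in-memory SQLite databases standing in for museum catalogs. Everything here is hypothetical: the table and column names, the artworks, and the virtual concept with attributes title, creator and year. Nothing is copied into a central store; each query over the virtual concept is rewritten into one SQL query per source using the correspondences, and the answers are returned in the vocabulary of the virtual ontology.

```python
import sqlite3
from typing import Dict, List, Tuple

# Two hypothetical museum catalogs with different local schemas.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE museum1_objects (obj_title TEXT, maker TEXT, yr INTEGER);
INSERT INTO museum1_objects VALUES ('Harbour at Dawn', 'A. Painter', 1890),
                                   ('Quiet Field', 'B. Artist', 1901);
CREATE TABLE museum2_works (work_name TEXT, artist_name TEXT, created INTEGER);
INSERT INTO museum2_works VALUES ('Evening Bridge', 'A. Painter', 1895);
""")

# Hypothetical correspondences: virtual attribute -> (local table, local column).
MAPPINGS: Dict[str, Tuple[str, Dict[str, str]]] = {
    "museum1": ("museum1_objects", {"title": "obj_title", "creator": "maker", "year": "yr"}),
    "museum2": ("museum2_works", {"title": "work_name", "creator": "artist_name", "year": "created"}),
}


def reformulate(attrs: List[str], where_attr: str, value: object,
                source: str) -> Tuple[str, Tuple[object, ...]]:
    """Rewrite 'select attrs where where_attr = value' over the virtual concept into SQL for one source."""
    table, cols = MAPPINGS[source]
    # Identifiers come from the trusted mapping table; the value is a bound parameter.
    select = ", ".join(f"{cols[a]} AS {a}" for a in attrs)
    return f"SELECT {select} FROM {table} WHERE {cols[where_attr]} = ?", (value,)


def answer(attrs: List[str], where_attr: str, value: object) -> List[Dict[str, object]]:
    """Run the reformulated query on every source and return rows in the virtual vocabulary."""
    rows = []
    for source in MAPPINGS:
        sql, params = reformulate(attrs, where_attr, value, source)
        for row in db.execute(sql, params):
            rows.append(dict(zip(attrs, row)))
    return rows


print(answer(["title", "year"], "creator", "A. Painter"))
```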
Query answering is then performed by using these correspondences (map-
³ http://www.unspsc.org
⁴ http://www.eclass.de