When information systems (computer applications) need to collaborate and exchange information, semantic integration at the application level should be considered to support the task.
The key concern in semantic integration is how to make different applications understand, communicate with, and cooperate with each other. From the architectural perspective, three kinds of methods can be employed to achieve this goal.
(1) Pre-designed interface and information flow
This is fairly common in traditional software development. A complete concept system (possibly implicit) is established first, giving the different components of the architecture a common understanding of the domain of discourse. Based on these shared concepts, the interfaces and information flows of the components are fully determined in advance. Each component therefore knows exactly what information it will receive, who will send it, what the received information means, what information it should send out after processing the input according to its internal business logic, and to whom the result should be sent.
The following figure shows an architecture example, where each component is an executable unit (with the necessary supporting environment) such as a class, subprocedure, program package, Web Service, or even an independent application.
In such an architecture, considerable human intervention is required when the underlying knowledge or business changes: data structures may be redefined, interfaces and information flows may be modified, and programs may be rewritten or new programs added.
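As a rough illustration of this first approach, the following minimal Python sketch hard-wires both the shared concept and the information flow at design time. The names `SpeedReading`, `Sensor`, and `Monitor`, as well as the 100 km/h limit, are hypothetical and chosen only for illustration; they do not come from any particular system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpeedReading:
    """Shared concept fixed at design time (hypothetical example)."""
    vehicle_id: str
    speed_kmh: float


class Monitor:
    """Receiving component: assumes every reading is in km/h."""
    LIMIT_KMH = 100.0  # assumed threshold, for illustration only

    def receive(self, reading: SpeedReading) -> str:
        return "over limit" if reading.speed_kmh > self.LIMIT_KMH else "ok"


class Sensor:
    """Sending component: its receiver is decided by the developer."""

    def __init__(self, monitor: Monitor) -> None:
        self.monitor = monitor

    def emit(self, vehicle_id: str, speed_kmh: float) -> str:
        # The information flow Sensor -> Monitor is fixed in the code.
        return self.monitor.receive(SpeedReading(vehicle_id, speed_kmh))


result = Sensor(Monitor()).emit("car-1", 120.0)
print(result)
```

In a design like this, changing the unit or adding a field to `SpeedReading` means touching every component that produces or consumes it, which is the kind of manual rework described above.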
(2) Interact with standard interfaces
This is a popular method in today’s software development; a typical example is a Web Service. In such an architecture, a “central” component provides specific services through standardized interfaces. The service is designed according to predefined rules and requirements, and it does not care who uses it or how. Other components know exactly what the services mean, the semantics of the exchanged information, and the definitions of the interfaces, so they can access the services via standard calls and obtain the information they need. In some cases, other components may need to consult a registry to discover the characteristics of the services (like looking up telephone numbers in the yellow pages). The following is an example of this architecture:
[Figure: architecture diagram; recoverable labels: A, B, C, D, E]
In such an architecture, components can join or leave freely without affecting the functionality of the entire system, as long as the services keep working. Human intervention is reduced significantly: only configuration specifications on the service side (server) and parameter settings on the accessing side (client) are required (we do not consider here the workload of developing the client components themselves). A change in one client component has no impact on the others, so the flexibility and extensibility of the whole system are well supported.
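The second approach can be sketched in a few lines of Python. This is a simplified, in-process stand-in for a real service registry, not an actual Web Service stack; the `Registry` class, the service name `"price-quote"`, and the price table are all assumptions made for illustration.

```python
from typing import Callable, Dict

# Standardized interface (assumed): a service maps a car model to a price.
PriceService = Callable[[str], float]


class Registry:
    """Toy 'yellow pages' where services are published and looked up by name."""

    def __init__(self) -> None:
        self._services: Dict[str, PriceService] = {}

    def publish(self, name: str, service: PriceService) -> None:
        self._services[name] = service

    def lookup(self, name: str) -> PriceService:
        return self._services[name]


def quote_price(model: str) -> float:
    """The service: it neither knows nor cares who calls it."""
    prices = {"sedan": 20000.0, "hatchback": 15000.0}  # made-up data
    return prices[model]


registry = Registry()
registry.publish("price-quote", quote_price)


def client_a(reg: Registry) -> float:
    return reg.lookup("price-quote")("sedan")


def client_b(reg: Registry) -> float:
    return reg.lookup("price-quote")("hatchback")


print(client_a(registry), client_b(registry))
```

The two clients depend only on the registry name and the agreed call signature, so either one can be added, removed, or rewritten without touching the other. Note that the meaning of the argument and of the returned number is still fixed by human agreement.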
(3) Establish interaction between anonymous components
This is the ideal situation. In this architecture, no predefined interfaces or information flows are required. The system works because its member components automatically find other services, understand them, and make use of them. In the following sample architecture, there is no central role, and the curved arrows represent automatic interactions among components without human intervention. For example, suppose an application needs to find, over the Internet, the lowest price for a specific type of car for a customer. It will contact websites that offer price information (the set of websites keeps changing, with new ones appearing and old ones shutting down), gather the information, sort it, determine the result, and return it to the customer. This scenario depends heavily on the semantic descriptions provided for each system’s information. The interactions occur in an arbitrary manner.
[Figure: architecture diagram; recoverable labels: Service, A, B, C, D]
This looks like a kind of peer-to-peer system, but it is not the same. In a typical peer-to-peer system, the interfaces and the semantics of the information exchanged among peers are strictly determined before the system starts working. What makes such systems flexible is that any peer can join to provide a service, or leave at any time, without crashing the system. In a semantic integration problem, by contrast, the emphasis is that there are no predefined interfaces or information semantics.
In practice, some initial human intervention is still required to make such systems work, but it can be minimized. For example, if A needs to interact with B, developers or users need to provide only very basic information, such as the IP address and port number of B. A will then intelligently discover the semantics of the services provided by B, learn how to communicate with B, and cooperate with B to carry out tasks. Note that some common agreements are still necessary for the components to understand each other, such as basic definitions of the concepts and business logic in a specific domain.
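A minimal sketch of this idea, using the car-price scenario, is given below. It assumes that each site publishes a semantic description mapping a small shared vocabulary onto its own field names; the concept names (`car:model`, `car:priceUSD`), the `Site` class, and the sample offers are hypothetical and stand in for whatever common agreement a real domain would define.

```python
from typing import Dict, List, Optional, Tuple

# Shared vocabulary: the minimal "common agreement" (hypothetical names).
CONCEPT_MODEL = "car:model"
CONCEPT_PRICE = "car:priceUSD"


class Site:
    """A price site with its own field names and a semantic description."""

    def __init__(self, name: str, field_map: Dict[str, str],
                 offers: List[dict]) -> None:
        self.name = name
        self._field_map = field_map  # shared concept -> local field name
        self._offers = offers

    def describe(self) -> Dict[str, str]:
        return dict(self._field_map)

    def query(self) -> List[dict]:
        return list(self._offers)


def lowest_price(sites: List[Site], model: str) -> Optional[Tuple[float, str]]:
    """Return (price, site name) of the cheapest matching offer, or None."""
    best: Optional[Tuple[float, str]] = None
    for site in sites:
        desc = site.describe()
        if CONCEPT_MODEL not in desc or CONCEPT_PRICE not in desc:
            continue  # this site's data cannot be interpreted; skip it
        model_field, price_field = desc[CONCEPT_MODEL], desc[CONCEPT_PRICE]
        for offer in site.query():
            if offer[model_field] == model:
                candidate = (offer[price_field], site.name)
                if best is None or candidate < best:
                    best = candidate
    return best


site_b = Site("B", {CONCEPT_MODEL: "type", CONCEPT_PRICE: "cost"},
              [{"type": "sedan", "cost": 21000}, {"type": "suv", "cost": 30000}])
site_c = Site("C", {CONCEPT_MODEL: "modelName", CONCEPT_PRICE: "usd"},
              [{"modelName": "sedan", "usd": 19500}])
site_d = Site("D", {CONCEPT_MODEL: "m"}, [{"m": "sedan", "p": 1}])

print(lowest_price([site_b, site_c, site_d], "sedan"))
```

The application never hard-codes a site's field names. It relies only on the shared concepts, so a new site can be used as soon as it describes itself in those terms, and a site whose description is incomplete (D here) is simply skipped.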
The mechanism discussed above resembles UDDI [14], but there are differences. Traditional UDDI technology focuses on standard interface definitions, from which applications can learn only how to invoke a service. The semantics of the service itself, its invocation parameters, and its return values remain unknown to the applications. Human intervention is required to interpret the service and to develop applications that really “understand” the semantics.
[Figure: architecture diagram; recoverable labels: A, B, C, D, E]
[14] http://www.uddi.org/pubs/uddi_v3.htm
The interactions between applications require a supporting environment that tries to eliminate semantic conflicts, performs the semantic conversion of information outside the applications, and minimizes the modifications needed to the applications themselves. From the implementation viewpoint, we have to develop a semantic integration mechanism that is accessible to all applications, as shown in the following figure:
The rectangle between A and B acts as a translator that performs the necessary conversion of the inputs and outputs of A and B based on their semantics. In the simplest case, if A outputs speed data in miles per hour and B can only accept speed data in kilometers per hour, the translator performs the calculation on the exchanged data to reconcile the semantics of A and B. Neither A nor B needs any modification.
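The speed example can be sketched as a small translator that sits between the two components. The unit labels (`"mi/h"`, `"km/h"`), the message layout, and the function names are assumptions made for this illustration; the conversion factor follows from the definition of the international mile (1 mile = 1.609344 km).

```python
KM_PER_MILE = 1.609344  # exact, by definition of the international mile

# Conversion table keyed by (source unit, target unit).
CONVERSIONS = {
    ("mi/h", "km/h"): lambda v: v * KM_PER_MILE,
    ("km/h", "mi/h"): lambda v: v / KM_PER_MILE,
}


def translate(value: float, source_unit: str, target_unit: str) -> float:
    """Convert a speed value between the units declared by two components."""
    if source_unit == target_unit:
        return value
    try:
        convert = CONVERSIONS[(source_unit, target_unit)]
    except KeyError:
        raise ValueError(f"no conversion from {source_unit} to {target_unit}")
    return convert(value)


def component_a() -> dict:
    """A emits speed in miles per hour and declares the unit it uses."""
    return {"speed": 60.0, "unit": "mi/h"}


def component_b(speed_kmh: float) -> str:
    """B only understands kilometers per hour."""
    return f"{speed_kmh:.1f} km/h"


message = component_a()
print(component_b(translate(message["speed"], message["unit"], "km/h")))
# prints "96.6 km/h"
```

Both components stay unchanged; only the translator knows about the mismatch. Supporting another unit would mean adding an entry to the conversion table rather than modifying A or B.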