
To implement the matchers proposed in this work, Liberate extends the Open Information Integration (OpenII) Harmony framework [112]. OpenII Harmony is an open source schema-matching framework that was implemented in collaboration between the MITRE Corporation and several industrial (e.g., Google, Microsoft, IBM, Yahoo) and academic collaborators (e.g., University of Wisconsin, University of Pennsylvania, University of California, Berkeley).

OpenII Harmony is a mature and extensible project. It supports multiple schema formats (e.g., XSD) and multiple output formats (e.g., spreadsheets and CSV) for matching results. In addition, it provides a full-featured GUI that allows users to refine the suggested mappings, confirm or reject matches, add annotations, and specify transformation functions. These features make it well suited to complex cloud schemas, which come in different formats, are spread across multiple files, and may require non-trivial transformations.

Our implementation uses the OpenII loaders, mappers, and code generator. It also extends the matcher module by adding four extra matchers: three element-based matchers (i.e., web-semantic-matcher, gloss-based-matcher, and domain-knowledge-matcher) and one structural matcher (i.e., the modified similarity flooding matcher). All the implemented element-based matchers extend the matcher class. They have been implemented as composite matchers that can work together and with the other matchers implemented within the OpenII framework. This facilitates combining and comparing the results of different matchers.

Figure 6.6: Configuring a Composite Matcher

Figure 6.6 shows how the implemented matchers can be used as composite matchers together and with other matchers within OpenII.

The WSM has been implemented to use several search engines. One of the main issues that we had to deal with was reducing the number of calls to the search engines’ APIs, because calling a search engine API and waiting for a response is the most time-consuming operation in the WSM process. Moreover, most search engines limit the number of calls per second and the maximum number of calls per day or month for each user. Assuming that the two schemas S1 and S2 have the same number of elements (n), the number of search requests required by the WSM process is given by:

\[ \text{TotalSearchRequests}_{\text{without cache}} = 3 \times n^{2} \]

To reduce this number, we implemented a caching strategy. Using caching, we were able to reduce the number of calls by a factor that approaches three as n grows, as shown in the following equation:

\[ \text{TotalSearchRequests}_{\text{with cache}} = n \times (n + 2) \]

Although, from a theoretical point of view, the number of calls still grows polynomially (quadratically) with the number of elements in the schemas, the gain achieved in practice is worth the effort.
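The two request counts can be checked with a small simulation. The sketch below is illustrative only: it assumes that each element pair triggers three lookups (each term alone and both terms together), which is consistent with the two formulas but is not spelled out in the text. The client classes and the dummy return value are hypothetical stand-ins for a real search-engine API.

```python
from itertools import product


class CountingSearchClient:
    """Hypothetical stand-in for a search-engine API that counts requests."""

    def __init__(self):
        self.requests = 0

    def hit_count(self, query):
        self.requests += 1
        return len(query)  # dummy value; a real client would return a hit count


class CachedSearchClient(CountingSearchClient):
    """Same interface, but each distinct query is sent only once."""

    def __init__(self):
        super().__init__()
        self._cache = {}

    def hit_count(self, query):
        if query not in self._cache:
            self._cache[query] = super().hit_count(query)
        return self._cache[query]


def count_requests(client, schema1, schema2):
    """Issue the assumed three lookups per element pair; return total requests."""
    for a, b in product(schema1, schema2):
        client.hit_count(a)           # hits for the first term alone
        client.hit_count(b)           # hits for the second term alone
        client.hit_count(f"{a} {b}")  # hits for both terms together
    return client.requests


n = 10
s1 = [f"s1_elem{i}" for i in range(n)]
s2 = [f"s2_elem{i}" for i in range(n)]
uncached = count_requests(CountingSearchClient(), s1, s2)
cached = count_requests(CachedSearchClient(), s1, s2)
print(uncached, cached)  # 300 120
```

With n = 10 the uncached run issues 3 × 10² = 300 requests, while the cached run issues 10 × 12 = 120: the n² pair queries remain, but the 2n single-term queries are each sent only once.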

The Gloss-Based Matcher (GBM) has been implemented by extending the sentence-to-sentence similarity from the SEMantic simILARity (SEMILAR) toolkit and then integrating it with the OpenII framework. The SEMILAR toolkit includes libraries that facilitate textual preprocessing (e.g., collocation identification, part-of-speech tagging, phrase or dependency parsing) and semantic similarity computation at both the word level and the sentence level [141]. In our implementation of the GBM, we used these libraries to implement a modified version of the Mihalcea, Corley, and Strapparava (MCS) method for sentence-to-sentence similarity.
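For orientation, the standard (unmodified) MCS measure averages two directed scores: each word in one text is paired with its most similar word in the other text, and these best-match scores are weighted by inverse document frequency (idf). The sketch below illustrates that baseline only. It does not reproduce the modifications made in the GBM, it does not use the SEMILAR API, and it omits the part-of-speech handling of the original method. The word-similarity table and the constant idf are toy assumptions.

```python
def mcs_similarity(t1, t2, word_sim, idf):
    """Baseline MCS text-to-text similarity over two token lists."""

    def directed(src, dst):
        # Each source word contributes its best match in the other text,
        # weighted by that word's idf.
        num = sum(max(word_sim(w, v) for v in dst) * idf(w) for w in src)
        den = sum(idf(w) for w in src)
        return num / den if den else 0.0

    if not t1 or not t2:
        return 0.0
    return 0.5 * (directed(t1, t2) + directed(t2, t1))


# Toy, hypothetical word-level similarities (symmetric lookup).
TOY_SIM = {frozenset(("role", "task")): 0.5}


def toy_word_sim(a, b):
    if a == b:
        return 1.0
    return TOY_SIM.get(frozenset((a, b)), 0.0)


def uniform_idf(word):
    return 1.0  # assumption: every word is equally informative


score = mcs_similarity(["worker", "role"], ["worker", "task"],
                       toy_word_sim, uniform_idf)
print(score)  # 0.75
```

In this toy run, "worker" matches exactly (1.0) and "role"/"task" match at 0.5 in both directions, so each directed score is 1.5 / 2 = 0.75 and so is their average.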

Finally, the modified similarity flooding matcher has been implemented in a separate module and combined with the other matchers in a pipeline architecture.

Figure 6.7: Applying the GBM on AZURE Service Definition and StratusML Core Schemas

Figure 6.7 is a snapshot that shows the results of applying the gloss-based matcher to two cloud schemas: a public schema that belongs to the Microsoft Azure Service Definition and a private schema that corresponds to the StratusML core meta-model. The figure shows three of the filtering methods that have been used to reduce the noise caused by false-positive matches:

(i) Set focus: This filter refines the results by showing only the relationships between the concept selected from the first schema, including its sub-tree elements, and all the corresponding matches from the other schema. For example, Figure 6.7 shows how the set focus filter has been used to refine the matching results of the GBM to show only the “Task” component and its corresponding matching elements in the Azure Service Definition.

(ii) Depth: This filter is used to show only schema elements that appear at a particular nesting depth. The filter appears at the bottom of Figure 6.7, where we used it to show the matching results at depth one in the Azure Service Definition and depth five in the StratusML schema. Filtering based on depth can help alleviate some of the structural differences between the schemas.

(iii) Evidence threshold: Shown on the right side of Figure 6.7, this filter hides matching results whose evidence falls below a chosen threshold. In the figure, this filter has been set to show only the matching results with evidence greater than or equal to 0.45.
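The combined effect of the three filters can be sketched as a single predicate over candidate matches. This is a minimal illustration, not the OpenII Harmony implementation: the `Match` record, the path representation, the convention that the root has depth one, and all sample paths and scores other than the 0.45 threshold and the 0.48 evidence value are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Match:
    source_path: Tuple[str, ...]  # path from the root of the first schema
    target_path: Tuple[str, ...]  # path from the root of the second schema
    evidence: float


def filter_matches(matches, focus=None, source_depth=None,
                   target_depth=None, min_evidence=0.0):
    """Keep only the matches that pass every active filter."""
    kept = []
    for m in matches:
        # Set focus: the source element must lie in the focused sub-tree.
        if focus is not None and m.source_path[:len(focus)] != focus:
            continue
        # Depth: the element must sit at the requested nesting level.
        if source_depth is not None and len(m.source_path) != source_depth:
            continue
        if target_depth is not None and len(m.target_path) != target_depth:
            continue
        # Evidence threshold: drop matches below the cut-off.
        if m.evidence < min_evidence:
            continue
        kept.append(m)
    return kept


# Hypothetical candidate matches; the paths are invented for illustration.
candidates = [
    Match(("Core", "Task", "WorkerTask"), ("ServiceDefinition", "WorkerRole"), 0.48),
    Match(("Core", "Task", "WorkerTask"), ("ServiceDefinition", "WebRole"), 0.30),
    Match(("Core", "Group"), ("ServiceDefinition", "WorkerRole"), 0.50),
]

visible = filter_matches(candidates, focus=("Core", "Task"), min_evidence=0.45)
for m in visible:
    print(m.source_path[-1], "->", m.target_path[-1], m.evidence)
```

Here the focus filter removes the match outside the "Task" sub-tree and the threshold removes the 0.30 candidate, leaving only the WorkerTask to WorkerRole match.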

Figure 6.7 shows that, using the GBM and the selected filters, it was possible to reveal the similarity between the Azure “WorkerRole” and the StratusML “WorkerTask” concepts. The figure shows this mapping with an evidence value of 0.48.

The next section discusses the tests performed to evaluate the proposed system. To make this research reproducible, our modified OpenII framework implementation, together with the datasets used in this chapter and the resulting matching results, has been made publicly available online at the Liberate webpage (footnote 3).