sI irEüAS DE TIERRA - EgE fif;H - CAtI. EtrruDIO DE CARGAS EI,ECTRICAS PARA. DIVISION DE INS


6.4. sI irEüAS DE TIERRA

In the previous section we described a variety of translation techniques that can be used to localize an ontology at the linguistic level. We also showed that each of these approaches has advantages and disadvantages with regard to discovering the most appropriate translation of an ontology element. With such a wide range of term translation approaches, it would be beneficial to have an effective strategy for combining these models into a localization system that retains many of the advantages of the individual techniques and suffers from few of their disadvantages.

From a technical point of view, the different translation models can be seen as the building blocks on which an ontology localization solution is built. In particular, the following aspects of building a working localization system are considered in this section:

• organizing the composition of various translation algorithms (section 5.8.1).

• combining the results of the basic translation algorithms to discover the most appropriate translations for each ontology element (section 5.8.2).

As the different translation methods focus on the same objective, there are several dependencies between them. Nevertheless, certain combinations are not reasonable, basically because the output of one translation algorithm cannot always be used as input for other methods.

5.8.1 Translation Composition

In this section we present, at the strategic level, some natural ways to combine different translation algorithms. We choose to use multiple translation techniques to localize an ontology based on the following assumptions:

• Using more than one translation approach or resource gives us a wider range of word translation candidates to choose from, and the correct translation is more likely to appear in multiple translation resources than in a single one. Translating the ontology elements with several different translation resources therefore makes it possible to minimize the effect of pronounced outliers.

• Using multiple translation approaches or resources makes it possible to maximize the use of all available resources, allowing broad applicability to a range of operational settings. In fact, few resources cover special terminology and fashionable terms, so an ontology localization system should be ready to use whatever is available.

The translation composition proposed in this thesis is inspired by the empirical studies described in the literature on multi-engine MT (MEMT) architectures, which operate by combining outputs from different translation engines. To classify the translation strategies, a distinction can be made as to whether the translation paradigms are triggered in parallel or sequentially.

Parallel composition

In a parallel strategy, each translation algorithm is fed with the source ontology element and generates an independent translation. The translations are then collected from their output and (manually or automatically) recombined. While there is an element of redundancy in such approaches, given that more than one algorithm may produce the correct translation [Way, 2001], one might also treat the various outputs as comparative evidence in favor of the best overall translation.

The parallel combination of translation algorithms to localize an ontology is illustrated in Figure 5.3. In the figure, the ovals represent the context extracted from the lexicon and the core ontology, and the translation results are represented as concentric circles. Notice that the term context used to disambiguate the candidate translations depends on the techniques used to obtain the translations.

Figure 5.3: Parallel combination of translation algorithms.
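The parallel strategy can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the three toy English-to-Spanish resources, the function name parallel_compose, and the simple vote count used as "comparative evidence" are hypothetical and are not taken from the thesis.

```python
from collections import Counter
from typing import Callable, Dict, List

# A translator maps a source label to a list of candidate translations.
Translator = Callable[[str], List[str]]

# Hypothetical toy resources (English -> Spanish), for illustration only.
DICTIONARY = {"bank": ["banco", "orilla"]}
THESAURUS = {"bank": ["banco", "ribera"]}
GLOSSARY = {"bank": ["banco"]}


def parallel_compose(label: str, translators: Dict[str, Translator]) -> Counter:
    """Feed the same label to every translator and tally the candidates."""
    votes: Counter = Counter()
    for translate in translators.values():
        # Each algorithm sees the original label and works independently.
        for candidate in set(translate(label)):
            votes[candidate] += 1
    return votes


translators: Dict[str, Translator] = {
    "dictionary": lambda label: DICTIONARY.get(label, []),
    "thesaurus": lambda label: THESAURUS.get(label, []),
    "glossary": lambda label: GLOSSARY.get(label, []),
}

votes = parallel_compose("bank", translators)
best, support = votes.most_common(1)[0]
print(best, support)
```

In this toy setting the candidate proposed by the most resources wins; a real system would replace the vote count with one of the combination methods discussed in section 5.8.2.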

Sequential composition

In this approach, two or more translation algorithms are triggered on different sections of the same source ontology element. The output of the different techniques is then concatenated without the need for further processing. For instance, one may first use a dictionary-based translation (section 5.4) to discover candidate translations, and then run a corpus-based (section 5.5) or ontology-based (section 5.6) translation to select the final translations. The reasoning behind this approach is that, if the properties of the translation algorithms involved are known, reliable translations can be produced with fewer resources than in a parallel approach. Integration of knowledge-based techniques with corpus-based techniques is a common strategy in commercial translation.

The sequential combination of translation algorithms for localizing an ontology is illustrated in Figure 5.4. Note that this sequential process can be used to eliminate the need for multilingual resources in the final stages of the localization process. In this setting, the final translation decision benefits from the candidate translations obtained by the first algorithms. Indeed, the second translation algorithm (translation′) can use only a monolingual resource to select the most appropriate translations. As in the parallel composition of translation algorithms, the term context (context) extracted from the ontology depends on the technique used in each step of the sequential translation process.
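The two-step process described above can be sketched as follows. This is a minimal sketch, not the thesis implementation: the bilingual dictionary, the three-sentence target corpus, the co-occurrence score, and all function names are hypothetical, and the context words are assumed to be already available in the target language.

```python
from typing import Dict, List, Optional

# Step 1 resource: hypothetical bilingual dictionary (English -> Spanish).
BILINGUAL_DICT: Dict[str, List[str]] = {"bank": ["banco", "orilla"]}

# Step 2 resource: hypothetical monolingual Spanish corpus.
TARGET_CORPUS: List[str] = [
    "el banco aprobó el préstamo",
    "abrió una cuenta en el banco",
    "el río desbordó la orilla",
]


def generate_candidates(label: str) -> List[str]:
    """First algorithm: dictionary lookup produces candidate translations."""
    return BILINGUAL_DICT.get(label, [])


def select_translation(candidates: List[str], context: List[str]) -> Optional[str]:
    """Second algorithm: pick the candidate that co-occurs most often
    with the context words in the monolingual corpus."""
    def score(candidate: str) -> int:
        hits = 0
        for sentence in TARGET_CORPUS:
            tokens = sentence.split()
            if candidate in tokens and any(w in tokens for w in context):
                hits += 1
        return hits

    return max(candidates, key=score) if candidates else None


def sequential_compose(label: str, context: List[str]) -> Optional[str]:
    """Chain the two steps: the output of step 1 is the input of step 2."""
    return select_translation(generate_candidates(label), context)


financial = sequential_compose("bank", ["préstamo", "cuenta"])
geographic = sequential_compose("bank", ["río"])
print(financial, geographic)
```

Only the first step needs a bilingual resource; the selection step works on target-language text alone, which is the property the paragraph above points out.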

Figure 5.4: Sequential composition of translation algorithms.

5.8.2 Translation Combination

When several translation algorithms or resources are combined, a crucial problem is choosing one translation among the multiple translations produced by each algorithm. Note that this problem must be considered even when the combined algorithms follow the same translation process. Translation algorithms adopting the same paradigm usually produce different translations for the same input, owing to differences in their training data or preprocessing strategies. The question we want to address in this section is therefore: how do we choose among the outputs of the translation algorithms so that we end up with the best one?

Traditionally, translation combination has been conducted in two ways: black-box combination and glass-box combination [Huang and Papineni, 2007].

To choose a specific translation, the black-box combination method uses only information external to each translation approach. This information can be drawn from the general confidence that we have in the method, the input text to be translated, or the output produced by the translation method. This approach is particularly useful when the internal features of the translation approaches are not accessible (e.g., online MT systems).
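A black-box combination based on method-level confidence can be sketched as a simple weighted sum. This is a hedged illustration only: the method names, the confidence weights, and the function blackbox_combine are assumptions made for the example, and the weighting scheme is just one possible instance of a linear combination.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def blackbox_combine(
    outputs: Dict[str, List[str]], confidence: Dict[str, float]
) -> Tuple[str, Dict[str, float]]:
    """Score each candidate by summing the confidence of every method
    that proposed it; no internal decoder information is used."""
    scores: Dict[str, float] = defaultdict(float)
    for method, candidates in outputs.items():
        for candidate in set(candidates):
            scores[candidate] += confidence[method]
    best = max(scores, key=lambda c: scores[c])
    return best, dict(scores)


# Hypothetical outputs of three translation methods for the label "bank".
outputs = {
    "dictionary": ["banco", "orilla"],
    "online_mt": ["banco"],
    "corpus": ["orilla"],
}

# Hypothetical general confidence assigned to each method.
confidence = {"dictionary": 0.5, "online_mt": 0.3, "corpus": 0.1}

best, scores = blackbox_combine(outputs, confidence)
print(best, scores)
```

Because only the outputs and a per-method weight are needed, the same function works for systems whose internals are hidden, such as online MT services.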

In the glass-box combination, each translation algorithm provides detailed decoding information, such as translation model score, phrase and word probabilities, segmentation lattices, or alternative translations per source word [Huang and Papineni, 2007]. This information is used to recombine the best parts from multiple candidate translations into a new utterance that will be better than the best of the given candidates. The main advantage of these approaches is that a possibly new translation can be generated that includes "good" partial translations from each of the involved algorithms.
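The idea of recombining partial translations can be illustrated with a deliberately simplified sketch. It assumes that each algorithm exposes one scored translation per source word, picks the highest-scoring option word by word, and ignores reordering and fluency; the data, the scores, and the function glassbox_combine are hypothetical, and real glass-box methods such as confusion networks are considerably more elaborate.

```python
from typing import Dict, List, Tuple

# Word-level decoding output: source word -> (translation, probability).
WordScores = Dict[str, Tuple[str, float]]


def glassbox_combine(label: str, decodings: List[WordScores]) -> str:
    """Build a new translation by taking, for each source word, the
    highest-probability option offered by any algorithm."""
    combined: List[str] = []
    for word in label.split():
        options = [d[word] for d in decodings if word in d]
        if not options:
            combined.append(word)  # no algorithm covers it: keep the source word
            continue
        best_translation, _ = max(options, key=lambda option: option[1])
        combined.append(best_translation)
    return " ".join(combined)


# Hypothetical word-level outputs of two algorithms for "river bank".
algo_a: WordScores = {"river": ("río", 0.9), "bank": ("banco", 0.4)}
algo_b: WordScores = {"river": ("arroyo", 0.3), "bank": ("orilla", 0.7)}

new_translation = glassbox_combine("river bank", [algo_a, algo_b])
print(new_translation)
```

The result takes one part from each algorithm, so it is a translation that neither algorithm produced on its own; the target-language word order is left unchanged here, which a real system would have to handle.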

Some of the well-known combination methods used in MT, such as linear combination, hypothesis selection, noisy channel models, confusion networks, and lattice combination, can be used in ontology localization. The first two methods use a black-box combination, while the other approaches use a glass-box combination. A description of all the combination methods used in the field of MT is beyond the scope of this thesis. For more details, see [Nie et al., 2001], [Callison-Burch and Flournoy, 2001], [Nomoto, 2004], [Paul et al., 2005], [Brown et al., 1990], [Park, 2001], [Matusov et al., 2006], [Sim et al., 2007], and [Rosti et al., 2007b, Rosti et al., 2007a, Rosti et al., 2008].

5.9 Classiﬁcation Guidelines for Ontology Local-
