Localization is a very effort-intensive activity and requires a systematic approach covering the entire life-cycle of the localized product [Mudur and Sharma, 2002]. However, based on our investigation of existing academic projects and commercial systems [Esselink, 2000, Müller, 2009, Jevsikova, 2009], we have identified that current R&D efforts on localization (especially in the software area) suffer from the lack of a comprehensive life-cycle model. We consider that ontology localization is not a once-in-a-lifetime activity. It should be viewed as a continuous, iterative activity in which the outcomes of current and past localizations can and should affect the future choice of localization policies and strategies and, thus, the behavior of an automated localization system. A comprehensive localization life-cycle model is needed to clearly define the different phases of a localization process and to show:
• what information and knowledge should be specified or defined at different phases, and
• how the results of the ontology element translations provide feedback to other phases of the life cycle.
In this section we present an ontology localization model, which identifies the key concepts and elements needed to build an automated ontology localization system. One of the elements in the model is the translation phase, which is not automated in many analogous software localization systems. In the previous chapters of this thesis, we studied the key elements of the translation phase, with the dual aim of reducing the localization effort and identifying the steps needed to produce a general ontology localization model. In fact, the translation phase used to localize an ontology is the core of our ontology localization life-cycle model.
6.1.1 The Automated Ontology Localization Model
The ontology localization life-cycle model is presented in Figure 6.1. This generic model depicts the major issues involved in automating the ontology localization activity. Our approach is inspired by different software life-cycle models [Sheu, 1997, Rajlich and Bennett, 2000, Ruparelia, 2010, Wright, 2011], which are used to illustrate the significant phases or activities of a software project from conception until retirement.
Although the order of steps presented in the model is logical, we believe that different ontology localization systems may use a different order, may group two or more steps into a single step or may not implement certain steps at all. The model is also independent of who actually performs the work. For example, if the ontology developer is using a distributed and collaborative team for localizing an ontology, many steps will be performed by the developer and others by the localization team. The value of the model is that it covers the major issues involved in this activity and provides a vocabulary to discuss these issues.
In the figure, the main phases are represented by rectangles, whereas the sub-phases are represented by ellipses. The thick line represents the main process flow; the secondary process flow is represented by a solid line. Data access is shown as a dotted line.
Figure 6.1: The Automated Ontology Localization Life-Cycle Model.
The proposed ontology localization life-cycle model is concerned mainly with managing the translation and localization of the ontology content into any number of target languages. In the following sections we describe the main components involved in the model.
6.1.2 Automated Localization Cycle
The ontology localization cycle describes the phases of the localization activity and the order in which those phases are executed. Each phase produces deliverables required by the next phase in the life cycle:
• Change Detection. This phase monitors the source ontology content and is responsible for detecting changes and initiating actions. We believe that change monitoring may operate continuously or at regular intervals. In addition, this phase starts the Workflow Management process, which is responsible for distributing the work to one or more translators/reviewers in one or more localizations.
• Extraction. Each ontology term requires its own extraction method from the ontology. The extraction method is responsible for extracting the ontology labels (representing any ontology element) and their context from the ontology.
• Segmentation. Once the label of the ontology element is extracted, it must be segmented into individual short phrases or multiword units¹ (MWU) in order to be translated appropriately.
• Leveraging. The Leveraging phase tries to translate all source labels using the translations stored during previous ontology localizations. It may use one or more translation memories that store previously translated ontology labels. This phase can be performed only when the ontologies to be translated belong to a domain similar to that of previously translated ontologies.
• Work Distribution. Once the ontology localization activity has been initiated, the work must be distributed to one or more translators/reviewers in one or more localizations. This process is carried out by the Workflow Management process. The system should provide some form of database that stores a list of translators and reviewers along with the language pairs they can handle. We consider that when the ontology to be localized is small and the target languages are known, the ontology editor may perform all tasks personally.
• Translation. In this phase, the translator actually translates the ontology labels received, using the localization resources provided by the system or the translator's own tools if the system can interface with them. This is likely the most important step, since the main cost of localization is translation and the cost of translation is largely determined by the efficiency of the environment provided to the translator. The translator may work online with a browser-based tool or offline on a desktop PC. However, the offline method requires some mechanism to synchronize the completed work with the system.
• Review. The translation work is then routed for review (editing and proofing). The work is checked for translation accuracy and for overall term correctness. The system should provide some means of measuring translation quality.
¹ A multiword unit (MWU) is a connected collocation: a sequence of neighboring words “whose exact and unambiguous meaning or connotation cannot be derived from the meaning or connotation of its components” [Choueka, 1988].
• Linguistic/Cultural Updating. The goal of this task is to update the ontology with the linguistic information obtained for each ontology term in the target language. The result of this process is a multilingual ontology, which expresses the correspondences between entities belonging to the source ontology and the multilingual terms pertaining to a natural language. This phase may require only the adaptation of the ontology to a particular language, or it may require an ontology re-engineering process that transforms the conceptual model of an existing, implemented ontology into a new, more correct and more complete conceptual model, which is then re-implemented. It is at this time that the localization resources (translation memories, glossaries, etc.) are updated and that Localization Resource Maintenance is best performed.
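As an illustration of how the Segmentation and Leveraging phases might interact, the following minimal Python sketch splits extracted labels into word segments and looks them up in a translation memory. The function names (segment_label, leverage), the label conventions handled (camelCase, underscores, hyphens), the exact-match lookup and the sample English-to-Spanish memory are assumptions made for this example only and are not prescribed by the model. Word-level splitting is used here merely as a stand-in for genuine multiword unit detection.

```python
import re


def segment_label(label: str) -> list[str]:
    """Split an ontology label into lowercase word segments.

    Assumes labels follow camelCase, underscore, hyphen or whitespace
    conventions; other naming schemes would need additional rules.
    """
    # Insert a space at lower-to-upper boundaries: "hasAuthor" -> "has Author"
    spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", label)
    return [tok.lower() for tok in re.split(r"[\s_\-]+", spaced) if tok]


def leverage(labels, memory):
    """Reuse translations stored in a translation memory.

    Returns a pair (translated, pending): labels with an exact match in
    the memory, and labels that still need a human or machine translation.
    """
    translated, pending = {}, []
    for label in labels:
        key = " ".join(segment_label(label))
        if key in memory:
            translated[label] = memory[key]
        else:
            pending.append(label)
    return translated, pending


# Hypothetical translation memory built during an earlier localization.
memory = {
    "has author": "tiene autor",
    "journal article": "artículo de revista",
}

translated, pending = leverage(
    ["hasAuthor", "Journal_Article", "peerReviewed"], memory
)
```

In this sketch, the labels left in pending are the ones that would be passed on to the Work Distribution and Translation phases.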
6.1.3 Data Structures
All steps shown in the model revolve around two major data structures: Workflow and Localization Resources. The aim of the Workflow repository is to help manage, monitor and control the localization activity, while the Localization Resources repository helps to reduce the cost, increase the quality and increase the consistency of the translation work. They store the basic objects of the ontology localization activity: the participants, and the tools and resources, respectively. These objects require management and maintenance with the appropriate activities:
• Workflow Management. This activity refers to the process of defining and maintaining the workflow templates that specify which steps are to be processed by users or by the system, and the conditions under which they are processed. Some ontology localization systems will have wizards with only a few questions to answer; others will require several pages of options to be set; still others will have graphical interfaces that allow a process to be defined as a flowchart.
• Translation Resources Maintenance. The more work that is routed through the ontology localization system, the more translation knowledge is accumulated, promoting more re-use. But as more and more data is accumulated, the system will also accumulate different translations for the same ontology elements. As translation knowledge grows, it becomes less precise and contains more “noise”. Therefore, translation resources maintenance is required to avoid the chaotic growth of translation knowledge and to ensure that the captured data can be leveraged in a meaningful way.
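The accumulation of competing translations described above can be made concrete with a small Python sketch. It flags source segments that have more than one stored translation and, as one possible clean-up policy, keeps the most frequent one. The function names (find_conflicts, consolidate), the representation of the translation memory as a list of (source, target) pairs and the majority-vote policy are all assumptions for illustration; the model does not prescribe a maintenance strategy, and in practice a reviewer would probably confirm each decision.

```python
from collections import Counter, defaultdict


def find_conflicts(entries):
    """Return the source segments that have more than one distinct translation.

    `entries` is an iterable of (source, target) pairs, an assumed
    simplification of a translation memory.
    """
    by_source = defaultdict(Counter)
    for source, target in entries:
        by_source[source][target] += 1
    return {src: counts for src, counts in by_source.items() if len(counts) > 1}


def consolidate(entries):
    """Keep one translation per source segment: the most frequent one.

    Ties are resolved in favor of the translation encountered first,
    because Counter.most_common keeps first-encountered order for
    equal counts.
    """
    by_source = defaultdict(Counter)
    for source, target in entries:
        by_source[source][target] += 1
    return {src: counts.most_common(1)[0][0] for src, counts in by_source.items()}


# Hypothetical entries gathered over several localization cycles.
entries = [
    ("river bank", "orilla del río"),
    ("river bank", "ribera"),
    ("river bank", "orilla del río"),
    ("author", "autor"),
]

conflicts = find_conflicts(entries)
cleaned = consolidate(entries)
```

Here only "river bank" is reported as a conflict, and the consolidated memory retains a single translation for each source segment.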
The steps described above form the basis of our generic architecture for localizing ontologies in distributed and collaborative environments. The details of our approach are described in Section 6.3.