The methodology based on machine translation of sense-disambiguated corpora achieves values of precision and numbers of synsets comparable to methodologies based on bilingual dictionaries for Spanish. For a sound comparison we need to further analyse our results, grouping variants according to their degree of polysemy. For Spanish, our best algorithm (Berkeley Aligner for p > 0.9) performs better than all the criteria presented in , except the monosemic-1 criterion. Nevertheless, our proposal performs worse than their combination of criteria: they obtain 7,131 variants with a precision higher than 85%, whereas we only obtain 5,562 variants under the same conditions.
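The thresholding and precision computation described above can be sketched as follows. This is a minimal illustration only: the function name, the toy candidate triples and the gold pairs are assumptions for the sketch, not the authors' actual algorithm or data.

```python
# Minimal sketch: keep candidate (synset, variant) pairs whose alignment
# probability exceeds a threshold, then measure precision against a gold set.
def filter_and_score(candidates, gold, threshold=0.9):
    """candidates: iterable of (synset, variant, probability) triples."""
    kept = [(s, v) for s, v, p in candidates if p > threshold]
    correct = sum(1 for pair in kept if pair in gold)
    precision = correct / len(kept) if kept else 0.0
    return kept, precision

# Hypothetical toy data.
candidates = [
    ("car.n.01", "coche", 0.95),
    ("car.n.01", "tren", 0.40),   # discarded: low alignment probability
    ("dog.n.01", "perro", 0.97),
]
gold = {("car.n.01", "coche"), ("dog.n.01", "perro")}
kept, precision = filter_and_score(candidates, gold)
print(len(kept), precision)  # 2 1.0
```

Raising the threshold trades coverage (fewer variants) for precision, which is the trade-off the comparison above hinges on.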
Automatic evaluation methods play, as discussed before, a very important role in the context of MT system development. Indeed, evaluation methods are not only important but they also set an upper bound on the attainable success of the development process itself. In other words, improvements may take place only as long as developers count on mechanisms to measure them; otherwise, the development cycle is blind. A paradigmatic case of blind development occurred in the Johns Hopkins University 2003 Summer Workshop on “Syntax for Statistical Machine Translation” (Och et al., 2003). A team of leading researchers and motivated students devoted 6 weeks to improving a phrase-based SMT system through the incorporation of syntactic knowledge. Although they proposed a rich smorgasbord of syntax-based features, only a moderate improvement (from 31.6% to 33.2% according to BLEU) was attained, which, indeed, came almost exclusively from using the IBM model 1 word alignment probabilities to compute a lexical weighting feature function. They offered two main reasons for this result. First, they observed that syntactic parsers introduce many errors. Second, and most important, they noted that the BLEU metric, which they used for development and testing, was not able to capture improvements due to a better syntactic sentence structure.
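BLEU's blindness to sentence structure follows from how it is computed: it only counts local n-gram overlap with a reference. A minimal, unsmoothed single-reference sketch (not any particular toolkit's implementation) makes this concrete:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def sentence_bleu(candidate, reference, max_n=4):
    """Plain sentence-level BLEU: geometric mean of modified n-gram
    precisions (n = 1..max_n) times a brevity penalty. No smoothing."""
    cand, ref = candidate.split(), reference.split()
    precisions = []
    for n in range(1, max_n + 1):
        cand_counts = Counter(ngrams(cand, n))
        ref_counts = Counter(ngrams(ref, n))
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = max(sum(cand_counts.values()), 1)
        precisions.append(overlap / total)
    if min(precisions) == 0:  # any empty n-gram match zeroes the score
        return 0.0
    log_avg = sum(math.log(p) for p in precisions) / max_n
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / max(len(cand), 1))
    return bp * math.exp(log_avg)

print(sentence_bleu("the cat sat on the mat", "the cat sat on the mat"))  # 1.0
print(sentence_bleu("on the mat the cat sat", "the cat sat on the mat"))  # 0.0
```

The second call shows the flip side of the workshop's complaint: the candidate contains exactly the reference's words, yet scores 0 because no 4-gram matches; conversely, a syntactically improved reordering that preserves most n-grams barely moves the score.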
Machine translation is now a mature technology in the translation industry. It is broadly used in the translation sector, with a market share forecast of $638 million in 2020 (Van der Meer & Ruopp 2014; Massardo et al. 2016). Developments in neural technology in the field of artificial intelligence have outperformed competing approaches (both rule-based and statistical engines) (Moorkens 2018), to the point that for some authors neural MT is “bridging the gap between human and machine translation” (Wu et al. 2016). Beyond the discussion of whether machine translation has already reached human parity (Läubli, Sennrich & Volk 2018), it is sound to explore professional translators' real perceptions of this technology. Its advantages in increasing productivity and reducing costs have already been proven (Massardo et al., 2016), but now that machine translation is increasingly being incorporated into translation processes, the voice of the translator needs to be heard. In the present study, we address this issue from a specific perspective: that of translators working in the migratory context.
automatic acquisition procedures are more feasible. The aim of this paper is to add automatically acquired NEs to the dictionaries of a Rule-Based Machine Translation (RBMT) system. Specifically, we consider the Apertium (Tyers et al., 2010) English–Spanish engine. Around one third of the entries (8,000) in its bilingual dictionary are proper nouns. However, they cover less than 10% of the NEs that appear in the English version of Europarl.
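The coverage figure quoted above is a simple token-coverage measure. A sketch of one way to compute it, with toy data standing in for the real dictionary and the Europarl NE occurrences (all names here are illustrative assumptions):

```python
def ne_coverage(dictionary_nes, corpus_nes):
    """Fraction of corpus named-entity occurrences covered by the dictionary."""
    covered = sum(1 for ne in corpus_nes if ne in dictionary_nes)
    return covered / len(corpus_nes) if corpus_nes else 0.0

# Hypothetical toy data (not the real Apertium dictionary or Europarl NEs).
dictionary_nes = {"London", "Paris", "Strasbourg"}
corpus_nes = ["Strasbourg", "Schengen", "London", "Lisbon", "Maastricht"]
print(ne_coverage(dictionary_nes, corpus_nes))  # 0.4
```

Counting occurrences rather than unique NEs weights frequent entities more heavily, which matches how coverage of a running corpus like Europarl is usually reported.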
that works in both translation directions. Regarding the press, three newspapers use machine translation engines. El Periódico was the first newspaper to simultaneously publish both a Catalan and a Spanish edition. As stated by Fité (2006), language coordinator for the newspaper, machine translation has made it possible to produce two identical versions. It is important to highlight that there are not two different editions depending on the target language of the user, but that the contents are equivalent, and the language (be it Spanish or Catalan) is the only thing that varies. News is produced in Spanish and is machine translated into Catalan in just a few seconds, with a subsequent human post-editing stage being conducted by expert Catalan language editors (Fité, 2006). A similar approach has been adopted by La Vanguardia, one of the most widely read newspapers in Catalonia, and by Segre, a local newspaper edited in Lleida.
Besides, we have decided to raise awareness of machine translation and post-editing because they have become key tools for translation service providers and freelance translators. As shown in the ProjecTA report (Torres Hostench et al., 2016: 24), 47.3% of the translation service providers surveyed stated that they use machine translation as a working tool. However, they also state that post-editing represents less than 10% of their workload, so it seems that companies do not see post-editing as a service equivalent to normal human revision. Throughout the Degree in Translation and Interpreting we have been able to develop new knowledge thanks to subjects such as Information Technology Applied to Translating, Computer-Assisted Translation (CAT) and ICT for Translation and Localisation. In addition, we have received complementary training thanks to the course organized by CITTAC, entitled Advanced Seminar for Trainers and Junior Researchers on Machine Translation and Post-editing, with the collaboration of Prof. Miriam Seghiri and Prof. Pilar Sánchez Gijón. Finally, since we have been awarded a grant from the Ministry of Education, we are running a project parallel to our final degree work called “An approach to machine translation (MT) and post-editing (PE) tools from the translator's perspective”, supervised by Dr. Mª Teresa Ortego Antón.
This paper presents a linguistic analysis of a corpus of messages written in Catalan and Spanish, which come from several informal newsgroups on the Universitat Oberta de Catalunya (Open University of Catalonia; henceforth, UOC) Virtual Campus. The surrounding environment is one of extensive bilingualism and contact between Spanish and Catalan. The study was carried out as part of the INTERLINGUA project conducted by the UOC's Internet Interdisciplinary Institute (IN3). Its main goal is to ascertain the linguistic characteristics of the e-mail register in the newsgroups in order to assess their implications for the creation of an online machine translation environment. The results shed empirical light on the relevance of characteristics of the e-mail register, the impact of language contact and interference, and their implications for the use of machine translation for CMC data in order to facilitate cross-linguistic communication on the Internet.
Research on Machine Translation (MT) and post-editing (PE) has attracted great interest over the last decade, not only among Translation Studies scholars, but also among translation industry stakeholders. The TAUS (Joscelyne 2009) market study indicates that 92.23% of the language service providers included in its study already use or intend to use MT and PE as part of their translation process. However, in the Audiovisual Translation (AVT) market, professional experience with MT and PE is limited (Volk et al. 2010) and industry voices in favour of MT are just beginning to be heard (Georgakopoulou 2010). Interest in academia has increased in recent years, focussing on the implementation of MT and PE in subtitling, in part due to EU-financed projects such as eTITLE (Melero et al. 2006), EU-Bridge (Waibel 2012) and SUMAT (Del Pozo et al. 2012). The promising results of these studies (Fishel 2012; Bywood et al. 2013; Freitag et al. 2013) have encouraged other researchers to study the inclusion of MT in other AVT modes such as audio description (Ortiz-Boix 2012; Fernández et al. 2013).
In 1993 the European Commission became concerned with standardising linguistic technology production in order to speed up the creation of new products and their transfer to other projects. The Expert Advisory Group on Language Engineering Standards, known as EAGLES, was created, and an evaluation working group designed a common method to evaluate natural language processing systems and products. For MT technology, evaluations assess systems according to HTC quality standards (consistency, fidelity, well-formedness, etc.), and also guide potential consumers in deciding which system to use. Assessment and guiding criteria focus on the suitability of systems for their specific purpose, following the trend of previous approaches such as JEIDA's and Van Slype's, and especially the ISO 9126 standards (ISO, 1991). Even bad and crummy machine translations are considered acceptable if front-end users prefer to post-edit them rather than translating from scratch (Church and Hovy, 1993), as Wagner already noted in 1985 when considering the post-editing costs of Systran systems (Wagner, 1985). The evaluation framework for machine translation systems is FEMTI, which guides evaluation according to the context of use (Hovy et al., 2002a; Hovy et al., 2002b). For instance, in (Bruckner and Plitt, 2001) the evaluation is set in an environment where translation memories are used.
Machine Translation (MT) is directly linked to MT evaluation, since evaluation plays a key role in the MT development cycle, both to improve existing MT systems and to develop new MT strategies. In addition, MT evaluation can also be crucial for MT users, since it may help them find the MT system that best fulfills their needs. As stated in Chapter 1, MT is a very complex task since it implies understanding and producing natural language. Similarly, evaluating MT output also implies performing a complex process: understanding a sentence and deciding whether it has been correctly translated. Throughout the history of MT evaluation, several approaches and methodologies have been proposed, developed and used. This chapter aims at providing an overview of MT evaluation, focusing on its different types, as well as discussing their weak and strong points. MT evaluation has been classified into two main types: non-automatic evaluation (section 2.1) and its subtypes (sections 2.1.1 and 2.1.2), and automatic evaluation (section 2.2). Special emphasis is placed on the latter and its two main approaches: MT evaluation without reference translations (section 2.2.1) and MT evaluation using reference translations (section 2.2.2). Since MT evaluation using references is the framework for the research presented in this thesis, this type of automatic evaluation will be analysed in detail, presenting the different MT metrics available nowadays and the information they use. Finally, section 2.3 draws some conclusions.
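One long-standing family of reference-based metrics compares the MT output to a reference translation by word-level edit distance, as in WER (and, with shifts, TER). A minimal word error rate sketch, not tied to any particular toolkit:

```python
def word_error_rate(hypothesis, reference):
    """Word-level Levenshtein distance divided by the reference length."""
    hyp, ref = hypothesis.split(), reference.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution / match
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

print(word_error_rate("the cat sat", "the cat sat on the mat"))  # 0.5
```

Unlike BLEU, lower is better here (0.0 means the hypothesis matches the reference exactly), which is worth keeping in mind when comparing the metrics surveyed in this chapter.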
In the current improved system, the grammatical category was provided by the FreeLing tool (Carreras, Chao et al. 2004), and this information was then used to solve some of the problems found in the baseline system. The grammatical category can be used either in pre- or post-processing rules, or in the translation model as a decoder-based solution. While the former includes solutions for apostrophes, clitics, capital letters at the beginning of sentences, the relative pronoun cuyo and polysemy disambiguation, the latter is useful for homonymy disambiguation and the lack of gender concordance. A detailed description of the problems solved by this grammatical-category-based approach is presented below.
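To make the idea of such pre-/post-processing rules concrete, here are two illustrative string-rewriting sketches for the apostrophe and sentence-capitalization cases. These are hypothetical rules written for this example; they are not the actual rules of the system described, which operate on FreeLing's POS output:

```python
import re

def apostrophize_catalan(text):
    """Hypothetical post-processing rule: contract the Catalan articles
    'el'/'la' before a word starting with a vowel or h ('el avió' -> "l'avió")."""
    return re.sub(r"\b(?:el|la)\s+(?=[haeiouàèéíòóú])", "l'", text,
                  flags=re.IGNORECASE)

def capitalize_sentences(text):
    """Hypothetical rule: uppercase the first letter of the text and of each
    word following sentence-final punctuation."""
    return re.sub(r"(^|[.!?]\s+)([a-zà-ú])",
                  lambda m: m.group(1) + m.group(2).upper(), text)

print(apostrophize_catalan("el avió surt de la estació"))  # l'avió surt de l'estació
print(capitalize_sentences("hola. que tal"))               # Hola. Que tal
```

In the real system a POS tagger lets such rules fire only on the right categories (e.g. articles, sentence-initial tokens) instead of relying on surface patterns alone.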
However, Apertium is not ready to be used by many concurrent clients out-of-the-box. For instance, it cannot run in coordination on many computers and, for each translation to be performed, spends a relatively high amount of CPU time loading resources. Fortunately, there are some free/open-source applications which mitigate these problems: ScaleMT (Sánchez-Cartagena and Pérez-Ortiz, 2010) and Apertium-service (Minervini, 2009). The first one was chosen because its architecture allows easily running the service on multiple servers and it provides a lower response time when processing many concurrent requests (Sánchez-Cartagena and Pérez-Ortiz, 2010). Apertium and ScaleMT are described in detail below.
In spite of the considerable efforts and resources directed over recent decades toward making MT automatic, the most commercially successful systems still rely on human intervention, usually at the pre- and post-MT stages. Pre-editing is especially appropriate for texts of a technical or purely information-conveying nature, where style considerations are limited to clarity, as opposed to literary and even journalistic texts, in which the simplification inevitably resulting from pre-editing would often entail a loss of crucial stylistic features vital to an interpretation that goes beyond the bare-bones message. Krings's (2001) psycholinguistic study of PE was carried out without controlled language, i.e. without pre-editing, but Krings argues that even if the latter improves PE, it adds significant time and effort to the overall translation task. Post-editing, on the other hand, means that stylistic features are retained and therefore amenable to translation, depending on the time available and the wishes of the person who ordered the translation.
One of the most common criticisms of Rule-Based Machine Translation (RBMT) regards the amount of work necessary to build a system for a new language pair (Somers, 2003). In fact, in a traditional scenario, linguists with expertise in the source and target languages need to manually build all the dictionary entries and transfer rules. Conversely, in the Statistical Machine Translation (SMT) approach (Koehn, 2010), no such effort is required, as the system can be built automatically from parallel corpora. However, this approach is only applicable to those language pairs for which large amounts of parallel text are available.
Additionally, although ISO 17100:2015 appears to be more receptive to technological advances than EN 15038:2006, some may not consider it to be the definitive solution. ISO 17100:2015 provides definitions for machine translation (MT) and post-editing, and includes the former as one of the possible technologies and the latter in the list of added-value services. However, the standard clearly states that “[t]he use of raw output from machine translation plus post-editing is outside the scope of this International Standard” (ISO, 2015: 1). A more recent ISO standard, ISO 18587:2017 Translation services – Post-editing of machine translation output – Requirements (ISO, 2017), is intended to undertake the task of standardizing MT usage and, more specifically, post-editing, an activity that will be addressed at greater length in section five of this paper. On reading the new standard it immediately becomes apparent that it shares a number of similarities with ISO 17100, as can be seen, for example, in the phases of the process (pre-production, production and post-production, albeit with some substantial changes; ISO, 2017: 5-7), or in the competences the post-editor is presumed to have (ISO, 2017: 7), which show only slight modifications in the way they are defined. Nevertheless, it can be said that the new standard is an advance in terms of regulating post-editing as a possible (and necessary) element in the translation workflow: as reflected in the standard, the use of post-editing makes it possible to save costs, speed up delivery times and, in short, translate documents that would otherwise be impossible to translate (ISO, 2017: v).
In today's society, translation is needed in many different contexts. Translation engines are the perfect solution for those who do not have the necessary resources to translate from one language to another. However, these tools are also part of professional translators' equipment and allow them to achieve higher productivity. The purpose of this research is to verify whether four different machine translation engines offer an acceptable product when translating a descriptive-promotional beauty text. In order to reach our objective, we analyse the results by means of a classification of errors adapted from MQM (Multidimensional Quality Metrics). The final results will allow us to detect the main translation errors.
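The kind of error classification described above can be tallied very simply once annotations exist. The sketch below is purely illustrative: the dimension and issue names are placeholder MQM-style labels, not the adapted typology actually used in this research:

```python
from collections import Counter

# Hypothetical error annotations for one machine-translated text,
# as (dimension, issue) pairs in an MQM-style typology.
annotations = [
    ("accuracy", "mistranslation"),
    ("fluency", "grammar"),
    ("accuracy", "omission"),
    ("fluency", "grammar"),
    ("style", "awkward"),
]

by_dimension = Counter(dim for dim, _ in annotations)  # errors per dimension
by_issue = Counter(annotations)                        # errors per issue type
print(by_dimension)
print(by_issue.most_common(1))
```

Aggregating the same annotation scheme across the four engines is what allows the main error types per engine to be compared.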
However, we can understand each other. Forcada advocates applying the European directive called “Responsible Research and Innovation” (RRI) (1). The directive recommends that all actors in the research and innovation chain, including users, be involved. The EU proposal known as MT@EC (Machine Translation at European Commission) is a step in that direction. MT@EC, also known as eTranslation, is presented by the EU as a free service designed to offer secure translation for public administrations in the EU and its Member States, as well as EU institutions and agencies. For the time being, it works best with texts about issues related to the EU. It is, without a doubt, a way of reinforcing and bringing to the forefront, once again, one of the fundamental objectives of the EU: to protect multilingualism.
According to the records of the Biblioteca Nacional de España and the Catálogo colectivo del patrimonio bibliográfico español, De la Dehesa's translation was published in Alcalá, Oficina de la Real Universidad, in 1807, this first edition also being the only one. However, there are 29 copies registered in the Catálogo colectivo del patrimonio bibliográfico español, spread throughout several libraries and universities, which shows that it was well received. De la Dehesa begins his translation with a brief introduction to the ideas of the beautiful and the sublime, giving information taken from the French Encyclopédie and stating that he found at the University of Alcalá an anonymous French translation of the Enquiry written in 1763. Right after that he compares his translation with the French one: apparently the other translator made his translation too scientific and strained, whereas De la Dehesa's achieves a better approach and successfully reaches a balance between correctness and naturalness, at least in his opinion.
Kaggle is a web platform dedicated to Data Science and Machine Learning that hosts competitions and datasets, among other things. The chosen problem is the one proposed by a competition called “Ghouls, Goblins, and Ghosts... Boo!”, which asks participants to train a Machine Learning model to classify a dataset describing three classes of haunted creatures: ghouls, goblins and ghosts. The competition provides three CSV files:
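Once the training data is loaded, the three-class task can be sketched with a tiny nearest-centroid classifier. Everything below is an illustrative stand-in: the rows are synthetic, and the feature names merely mimic the competition's numeric columns rather than reproducing its actual data:

```python
# Minimal nearest-centroid classifier over synthetic rows standing in for
# the competition's training CSV (features and values here are made up).
def centroids(rows):
    """rows: list of (features, label); returns label -> mean feature vector."""
    sums, counts = {}, {}
    for feats, label in rows:
        acc = sums.setdefault(label, [0.0] * len(feats))
        for i, x in enumerate(feats):
            acc[i] += x
        counts[label] = counts.get(label, 0) + 1
    return {lbl: [s / counts[lbl] for s in acc] for lbl, acc in sums.items()}

def predict(model, feats):
    """Assign the label whose centroid is closest in squared Euclidean distance."""
    return min(model, key=lambda lbl: sum((a - b) ** 2
                                          for a, b in zip(model[lbl], feats)))

train = [  # (bone_length, hair_length, has_soul) -- illustrative values
    ((0.8, 0.2, 0.9), "Ghoul"), ((0.7, 0.3, 0.8), "Ghoul"),
    ((0.4, 0.8, 0.5), "Goblin"), ((0.5, 0.7, 0.4), "Goblin"),
    ((0.1, 0.1, 0.1), "Ghost"), ((0.2, 0.2, 0.2), "Ghost"),
]
model = centroids(train)
print(predict(model, (0.75, 0.25, 0.85)))  # Ghoul
```

A real submission would instead read the competition's CSV files, train a stronger model, and write predicted labels for the test rows, but the train/predict split shown here is the same.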