(1)Can raw machine translation ensure inclusion

(1)

Can raw machine translation ensure inclusion? The case of public-health information in Catalonia

Anthony Pym (Universitat Rovira i Virgili) Nune Ayvazyan (Universitat Rovira i Virgili) Jonathan Prioleau (Universitat Rovira i Virgili) Version 3.2. August 2022

Abstract: As Catalonia becomes increasingly multicultural as a result of immigration on many levels, official government communications are received by speakers of numerous different languages. For some languages, this is achieved through the employment of qualified translators.

For the non-official languages, however, there is increasing reliance on the use of free online machine translation, explicitly without human correction (“post-editing”). Here we survey the use of Google Translate on the official website of the Catalan health service, focusing on Covid- 19 vaccination information in 2021-22. We identify the strategic advantages of machine

translation and then compare them with main communication problems in terms of

comprehensibility and actionability. It is concluded that most of the communication problems can be solved by a better understanding of machine-translation literacy, by the judicious use of translation-memory software and efficient post-editing, and most especially by writing texts in such a way that the machine-translation problems are solved before they appear (“pre-editing”).

In the relative absence of policies for non-official languages, serious strategic planning is required in order to ensure that the benefits of machine translation work towards a more inclusive society, rather than alienate users who merely see their language being abused.

Keywords: machine translation, post-editing, pre-editing, social inclusion Introduction

The Covid-19 pandemic presented new challenges for many social practices, among them basic language policies. Traditional territorial policies would seek to ensure the rights of long-term social groups to official-language status over a temporal dimension measured in generations: in the case of Catalonia, this fundamentally concerns relations between Spanish, Catalan and Aranese. In time of a pandemic, however, communication is needed with all the people actually living in a territory, no matter what their language, since collective well-being depends on actions such as limiting mobility, wearing masks, and accepting vaccination. Territorial policy thus has to address complex issues of mobility and inclusion (cf. Grin 2022). Only when all the people in the territory change their behavior can collective well-being be enhanced to the fullest extent. So what happens when traditionally territorial policy meets an urgent communication challenge that involves mobility and inclusion? Here we look at the communication practice of the Catalan government health service, particularly its public website, where the prime solution to this particular problem was to use raw machine-translation output.

Within the field of healthcare communication, vaccination information is of particular interest in that it requires a high degree of trust on the part of the receiver of the communication (Pym 2020b; Pym & Hu 2022). The complexity of the raw medical information is such that the

(2)

general public does not have direct access to it in terms of interpretative skills. The actual risks are thus very difficult for the individual to assess, and in this case the issue was further

complicated by the circulation of conspiracy theories for all tastes. There are few kinds of communication that are thus so dependent on the perceived trustworthiness of not only the message, but more particularly the sender of the message. We are thus particularly interested in the perceived trustworthiness of texts that are visibly mediated by unedited machine translation.

How do users actually process vaccination information?

Previous research

Our own interest in these questions stems from data collected within the large research project Mobility and Inclusion in Multilingual Europe (2014-18), where we were looking at the

mediation strategies used by speakers of marginalized or minoritized languages. Asylum seekers were found to be using machine translation in order to access official information in detention centers in both Leipzig (Fiedler & Wohlfarth 2018, 279-280) and Ljubljana (Pokorn & Čibej 2018, 297-298). This typically involved short-term communication solutions such as looking up key terms prior to a visit to the doctor, thus enhancing communication in the host-language, and as a tool for language learning. In our interview study of the Russian community in Tarragona- Salou (Ayvazyan & Pym 2018), we found that 78% of our 50 respondents reported using machine translation, especially the younger community members, even though they were generally aware of the possible errors. Such widespread use of machine translation is sprinkled with occasional comments that the resource could be used to check on human mediators who were not entirely trusted (Ayvazyan & Pym 2018, 350; Pym 2018, 261). Machine translation might make mistakes, but it offers relatively independence (the user is in control of the input) and confidentiality (what you say might not be reported to an authority). Many of the asylum seekers come from countries where public officials are quite likely not to be unbiased and might indeed operate as spies (Allaby 2018). User-initiated machine translation thus offers clear advantages over mediation via interpreters, for example, with respect to speed, cost, user independence and ostensible confidentiality (Pym 2018, 260-261). These virtues are to be counterbalanced by a lack of translation quality, which the user may or may not perceive adequately.

This pragmatic use of machine translation would appear to have become a general social phenomenon. It has been calculated that human translation accounts for less than 0.69% of the words translated in the world (Pym and Torres-Simón 2021, working from Wood 2018),

although little is known about how well or badly all those users actually interact with the output.

The need for some basic training in the area is underscored by growing attention to machine- translation literacy, which would include knowing when and where not to use machine

translation and how to negotiate clear translation errors (cf. Bowker 2009a, 2019b; Bowker and Buitrago Ciro 2019).

There is some evidence on the way machine translation is used in provider-initiated healthcare communication. A literature review by Al Shamsi et al. (2020) compares 14 studies on communication solutions in healthcare and finds that cost and time delays are major factors in medical consultations. They further conclude that online translation tools present viable solutions in some situations and can be combined with the provision of interpreting services. Flores (2005) reviews 36 studies on English-language health services and concludes that “multiple studies document the positive impact that both trained, professional interpreters and bilingual providers

(3)

have on LEP [Limited English Proficiency] patients’ quality of care” (2005, 255). A large study by Lindholm et al. (2012) suggests that the savings of professional mediation are most

pronounced when interpreters are present at admission only or at both admission and discharge;

there may be no savings at all when there is a less targeted use of human mediation (cf.

Wallbrecht et al. 2014). This means that in the more run-of-the-mill low-stakes medical encounters, there are situations when provider-initiated machine translation could provide workable solutions, alongside intercomprehension, medical staff who speak something of an L2, and volunteer mediators such as family members. As Al Shamsi et al. (2020) conclude, machine translation is being indeed being used in low-stakes exchanges, although there is still little empirical evidence of its actual effects.

A similar literature review by O’mara and Carey (2019) looks at seven recent studies on government information for culturally and linguistically diverse communities in Australia.

Although the survey generally finds that effective strategies mix translation and interpreting services with other communication strategies, the reviewed studies that included machine translation were almost exclusively restricted to the education field. There was very little empirical evidence on the actual effects of machine translation: “At present, it is not clear

whether information technology is effective for translating government information” (2019, 19).

These studies are to be placed against a background of industry claims that parity has been reached between neural machine translation and human translation (Hassan et al. 2018).

That assessment is nevertheless based on assessment of the content (not form) of isolated sentence pairs, where non-professional users could not distinguish between the two with significant frequency. Real-life usage tends to concern texts rather than isolated sentences (Läubli, Sennrich and Volk 2018), and harm can come from error rather than an incapacity to judge.

Considerable academic attention has been paid to how translators correct raw machine- translation output (“post-editing”) (see, for example, the special issue of the Journal of

Specialised Translation in 2019), which would constitute a mode of machine-translation literacy that requires specialized training to be carried out effectively (Nitzke, Hansen-Schirra and

Canfora 2019). With respect to the quality achieved by post-editing, Fiederer and O’Brien (2009:

62-63) found that post-edited texts were slightly better than fully human translations for clarity and accuracy but not for style. García (2010) found that post-edited machine translation was rated slightly higher than fully human translations when assessed by an experienced evaluator.

Temizöz (2013) reports that much depends on who does the post-editing. In a review of empirical studies, Koponen (2016) concludes that post-editing can give results similar to fully human translation. More recent research, however, tends to find that the inclusion of machine translation in the work process has mixed but generally negative effects on the final translation quality, although it tends to offer some time gains (cf. the reports in Moorkens et al. 2018;

Macken, Prou and Tezcan, 2020).

In comparison, there are relatively few studies on the way texts can be written especially for machine translation (“pre-editing”). This involves removing “negative translatability

indicators”, in other words elements that are likely to be problematic for machine translation (Pym 2020). The practice is in some respects an application of controlled language, although Marzouk and Hansen-Schirra (2019) report that their use of controlled language made no

significant improvement to neural machine-translation output. Ideally, specific indicators should be identified for specific domains and perhaps for particular language pairs or language families (cf. Nyberg, Mitamura and Huijsen 2003, O’Brien and Roturier 2007, Fiederer and O’Brien

(4)

2009). The cost-effectiveness of pre-editing with respect to post-editing generally only comes into its own when a given start text is to be translated into more than a few target languages.

Despite these various studies, there is widespread belief that machine translation should not be used for high-stakes texts. Even the most innocuous errors can leave speakers of a language feeling relegated to an inferior status (cf. Angermeyer 2017). Hale and Liddicoat (2015), writing prior to the use of neural machine translation, claimed that machine translation was basically unsuited to situations where accuracy and cultural values were important,

especially in healthcare. When press reports made it known that the Australian government used machine translation in the early stages of its Covid-19 messaging (Dalzell 2020), there was considerable outrage across the community of professional translators (cf. American Translators Association 2020). In some cases, the assumed translation errors were by default attributed to machine translation (Pym et al. 2022). It became general Australian government practice to avoid machine translation in healthcare messaging.

Little did they know that precisely the opposite communication solution was being adopted in Catalonia at the same time.

The GenCat website as an application of non-policy

Language policy in Catalonia is squarely focused on the defense of Catalan in terms of historical territorial rights. The co-official status of Catalan is established in Article 3 of the Spanish Constitution of 1978 and has been developed in Catalan legislation on linguistic normalization in 1983 and on language policy in 1998. Public education adopts a basic policy of immersion in Catalan, with ongoing debates and legal decisions about the relative use of Spanish. There appears to be no one policy dealing with the provision of government services in non-official languages, although there is indirect mention in some provisions for bilingual education (Ali &

Ready 2021). There is, however, a general policy trend towards the use of artificial intelligence in government services, and particularly towards online solutions in healthcare. The Digital Spain 2025 initiative (Ministerio de Asuntos Económicos y Transformación Digital 2022), embedded in wider European agenda for digital transformation, includes goals such as

“empowering patients with telemedicine tools, self-diagnosis and greater accessibility”

(emphasis ours). The use of automated language services is part of that vision.

Here we focus on the official website of the health services of the Catalan government, the Generalitat de Catalunya (GenCat). This is the site to which most municipal websites refer in Catalonia; it gives the official updates on all aspects of health services, from how to deal with mosquito bites through to how to survive Covid. The Covid-19 information is updated in Catalan as the start text. A drop-down menu at the top of the page invites speakers of Spanish, English and French to select their language and then see the page as translated by Google Translate.

Speakers of other languages can do the same if they know how to go to the Google Translate site and insert the URL (133 languages are currently provided for) but the GenCat page limits its menu to Spanish, English, often French and sometimes Aranese.

It is difficult to estimate the number of people who could potentially consult these

machine translations. Since the start texts (STs) for the translations are indicated as being always in Catalan, the users could be anyone in Catalonia who does not understand that language.

According to official statistics for 2018, that number would be 512,068 people, some 6.6 percent of the population (Idescat 2018a). We note, however, that the survey asked respondents whether they were “able to understand a conversation on an everyday topic” in Catalan (Idescat 2018a),

(5)

which would be the operative definition of “understand” here. One doubts that official Covid information really counts as an everyday conversion, so the potential population could be considerably greater – enough to compromise attempts at universal vaccination.

Here we are not focusing on Spanish because the quality of the machine translation between Catalan and Spanish tends to be high: the two languages are cognate and the paired databases are very extensive (daily newspapers are machine-translated from Catalan to Spanish).

We focus instead on the English version of the Covid-19 information provided by GenCat. This is the version that seems most likely to be consulted by people in Catalonia who say they do not speak Spanish or Catalan. According to the 2018 language survey, this would be only 0.2 percent of the total population (Idescat 2018b), perhaps small enough for a policymaker somewhere to have dismissed it as not worth including in a developed communication strategy. That small percentage is nevertheless a still sizeable population of 15,507. Once again, though, this only represents the percentage of people who say they understand an everyday conversation in Spanish. There are reasons to suppose that the number is much greater, and not just because healthcare directives are not everyday communication. For example, in 2021 there were 30,270 interpreting services rendered in the Catalan courts, some 43 percent of which were actually for Arabic (Suport judicial, 2021: 2, 8), for which the lingua franca would tend to be French. Many speakers of languages other than Spanish and Catalan will nevertheless turn to English as a second or third language: some 52 percent of non-Spanish citizens in Catalonia say that they understand English, as opposed to 44 percent for Spanish citizens (Idescat 2018b). English is also reported to be a lingua franca among groups such as retirement immigrants in Spain (Gustafson and Laksfoss Cardoso 2017) and accounted for 2,340 interpreting services in the Catalan courts in 2021 (Suport judicial, 2021: 2). Since these are relatively transitory social groups – based on wealthy mobility from the North and economic migration from the South –, they likely not to be fully reflected in the official language surveys. This is another reason to suspect that the number of potential users of the English-language website is much greater than 0.2 percent of the population.

Methodology

The GenCat website for Covid-19 information has been studied by our research students Marc Anguiano Musons in 2021 and Jonathan Prioleau in 2022. Both surveyed the entire website, although here we focused only on vaccination information. The 12-month difference between the two studies allows us to track the changes made to the site, especially with respect to the

correction (or mostly non-correction) of errors. Prioleau (2022), who was employed at the time as an interpreter at a clinic in the United States where Covid-19 vaccinations were given, also undertook a translation of the site from Catalan to English, which allows the machine translation to be compared with a human translation.

The information provided by these studies affords a general overview of the use of raw machine translation. In broad terms, the translations are surprisingly readable, to the extent that almost all the basic information can be understood correctly. The advances made in neural machine translation since 2016 are palpable. Our first methodological concern, though, is that this baseline readability might create a degree of trust in the text that renders the user insensitive to the translation errors that occasionally occur. At the same time, the marked presence of relatively inconsequential errors could make the text seem less trustworthy to the reader, thus diminishing the actionability of the message: it is one thing to understand a message; it is another

(6)

to act upon it (cf. the PEMAT guidelines, Shoemaker et al. 2014). Hence our particular interest in what happens to trust in the reception process. This led to the formulation of two hypotheses:

1) some machine translation errors correlate with a loss of comprehension, and 2) some machine- translation errors correlate with a loss of trust-based actionability.

Results

Here we present an overview of the translation errors, which range from the potentially disastrous to the merely comical. Prioleau (2022) offers a bottom-up categorization of the translation errors, which we adopt here. Since our interest is on how the use of machine

translation might interact with a certain kind of machine-translation literacy, we offer illustrative examples rather than a quantitative analysis. Here the presentation goes from the most potentially serious to the more inconsequential.

Untranslated images

One of the guidelines for effective healthcare information is that images be used to lead readers through the text (Shoemaker et al., 2014). Unfortunately, machine translation will usually not render text that is embedded in images. In some cases, this can lead to a loss of actionability. In 2022, the GenCat website included an image with a text announcing information on who was eligible to receive a booster shot of the Covid-19 vaccine. Here we present the image with the accompanying machine-translated text in English:

Clearly, the translated text below the image did not make mention of the key linguistic

information: that this announcement concerns the booster shot. For the English-language user, this instruction is entirely readable (a group has opened, somewhere) but wholly unactionable (what is it open for?). The recommendation for website managers is clear and simple enough: do not embed text in images. At best, the user will wonder what they are missing.

Failure to select healthcare terminology

Since Google Translate is designed for very general use, it can fail to identify the specific terms used in a given field. The most egregious example of this is the rendering of “dosi de record” as

“record dose” and occasionally “memory dose”, rather than the accepted English term “booster

(7)

shot”. The machine-translated names are simply not recognized by users who do not speak Catalan or Spanish.

Similar examples include the recommendation to wash one’s hands with “alcoholic ice”, sine the Catalan word “gel” can refer to both ice and gel. At a comparable level of entertainment, on one occasion the Catalan term “quarantena” (quarantine) was rendered as “forties”, which it can also mean sometimes but not in this context: “If you are in your forties because you have become in close contact with someone with COVID-19 . . .”. Another example concerns the word “convocatòria”, which can elsewhere be translated as “call” but not here:

ST: Si ja tens administrada una primera dosi de la vacuna contra la COVID-19, descobreix com es duu a terme la convocatòria de la segona dosi.

MT: If you have already received a first dose of the COVID-19 vaccine, find out how the second dose is called.

Human TT: If you already have your first COVID-19 shot, find out how to get your second COVID-19 shot.

A more innocent example of the same problem is found in the subheading “Salut A-Z”, where the healthcare information is presented in alphabetical sections, from A to Z. The word “salut”

has multiple meanings, including not just “health” but also “cheers” or even “hello”. The machine translation unfortunately went for the last-mentioned option: “Hi AZ.” The website is apparently greeting an unknown interlocutor by the name of AZ.

In these examples, the use of machine translation leads to merely useless pieces of

language. However, although comprehension is blocked, there is little risk of a false action being taken. A moderately intelligent reader is unlikely to wash their hands with ice, assume that contact with Covid changes their age, fret over a secret name for a second vaccination, or wonder who AZ is. We would hope that basic machine-translation literacy enables users to filter out these infelicities.

In all these cases, the problems could be avoided by preparing a field-specific glossary then setting it so that it overrides the general machine-translation feed. The same GenCat website lists several such glossaries under “recursos” (resources).

Failure to adopt context-specific readings

There are also cases where the potential polysemy of the ST creates problems. This one has already been mentioned:

ST: Dispensadors de gel hidroalcohòlic MT: Hydroalcoholic ice dispensers

This is a simple mistranslation that appeared in 2021 (Anguiano-Musons 2021, 16). The Catalan term “gel” can mean both “gel” and “ice”, and the machine translation preferred the latter. In 2022, the term was replaced with “solucions hidroalcohòlics” or “preparats hidroalcohòlic”, both of which successfully avoided the suggestion that people should be washing their hands with ice.

But the damage was done. If you search for “hydroalcoholic ice dispensers” in July 2022, you find some 67 hotels and public institutions that have repeated the same error, perhaps because

(8)

they copied the official terminology. We nevertheless have no reports of people washing their hands with ice.

Problems of actionability

As is well known, neural machine translation can make minor omissions in order to smooth a text and enhance its readability. The omissions are often inconsequential, but not always. In the following case, an important negation goes missing:

ST: Fes gestions de forma no presencial amb el sistema sanitari.

MT: Make arrangements in person with the healthcare system.

Human TT: Access your healthcare system online.

In other words, the machine-translation omission of “no” to qualify “presencial” effectively produced the opposite meaning, in this case with consequences for actionability.

A similar reversal occurs with the Catalan preposition “a”, which can mean “to” but also

“at”:

ST: Descarrega el teu certificat COVID a La Meva Salut MT: Download your COVID certificate to My Health.

Human TT: Download your COVID certificate at My Health (La Meva Salut).

These are examples of cases where some degree of post-editing is essential, if only to have a pair of human eyes check the text and authorize it, as might a notary in the legal field.

Complex grammar

A general rule of thumb in machine-translation assessment is that the longer the sentence, the more likely the errors of syntactic reference. An example:

ST: Totes les persones, incloent-hi els infants, si són un cas positiu de COVID-19 o si estan realitzant una quarantena per contacte estret no es poden vacunar.

Machine translation: Not all people, including children, can be vaccinated if they have a positive case of COVID-19 or if they are undergoing close contact quarantine.

The problem here is that the negative particle “no” comes near the end of the sentence and the machine-translation algorithms have a tough time knowing what it applies to – but then so do many readers of Catalan. In principle, “Not all people can be vaccinated if they have a positive case” implies that some people who have been exposed to COVID-19 can, in fact, be vaccinated.

This is not the case, as can be made clear in a human translation that applies considerable syntactic simplification:

If you are COVID-19 positive or in quarantine due to close contact, you cannot be vaccinated. This includes children.

(9)

The remedy here is clear and well-known: avoid long sentences.

Disclaimer and workflow management

Perhaps the most high-stakes translation error comes in the webpage where the use of machine translation is explained. Here is a screen shot from June 2022:

This disclaimer appropriately informs the user that the translation “may contain errors”, which might offer some degree of legal protection in the case of harm ensuing from misinterpretations.

The text also identifies the legitimate advantages of the strategy: a basic understanding of the information and real-time updating of the contents. The translated text, however, refers to “the Catalan versions of this website”, whereas the Catalan ST makes it clear it should refer to all the versions except the one in Catalan. One doubts the disclaimer will offer much legal protection.

The discrete presence of human translations

Despite the overriding reliance on machine translations in this website, there are a few fully human translations to be found. Our search conducted in June 2022 required quite a few clicks to locate a well-translated brochure The Covid-19 vaccination guide. Here the medical terms are generally correct (“vaccination shot” instead of “dose”, for example). But when we looked for the term “booster shot”, it was nowhere to be found. The brochure was published in October 2021, so when we were looking for information in June 2022, it was woefully out of date. This serves to illustrate the major theoretical advantage of machine translation: updates are made to the Catalan text, and all other versions are updated automatically.

A second human translation was found in quite a different part of the website. Under information for “people from Ukraine with temporary protection”, we find a PDF with the basic Catalan information, an English translation that might be human (it correctly refers to the

“booster shot”), and a human translation into Ukrainian. All foreigners are equal, but some are more equal than others, it seems.

Previous research

(10)

The intriguing thing about the texts here is that the last sentence is fairly nonsensical in English:

“you will be able to get […] the booster shot if you have already received it”. The same apparent contradiction is in the Catalan. But the Ukrainian here uses explication to make the meaning clear: it back-translates as “or the booster shot, if you are already vaccinated”. Here, as in many of the examples above, the basic cause of the problem is the way the Catalan start text is written.

Discussion: Ways to solve problems

The above examples include the most common pitfalls of machine translation: domain- inappropriate terminology, lack of contextualization, grammatical confusion, pronoun misattribution and unwarranted smoothing.

One could also argue, however, that the problems lie not with machine translation as such, but with the way it has been used as a once-and-for-all solution. This concerns more than the kind of poor workflow management that results in untranslated images. It also has to do with not testing the translated website before publishing it, and not considering any of the many ways in which relatively simple technologies and workflows could have removed most of the errors we have just seen.

Most of the solutions are within easy reach. The most obvious and perhaps expensive kind of improvement involves translators regularly checking the machine-translation output, doing light post-editing as a check before mistranslations are released to the population. A more cost-effective solution is to use translation-memory software into which one has not only

whatever machine-translation feeds one wants (DeepL is performing better than Google Translate for the Spanish-English pair) but also authenticated propositions from all previous translations, within which the new updates stand out and can be focused on immediately. The software also includes domain-specific glossaries that can be set to override the proposals coming from raw machine translations, thus solving the problem of indiscriminate terminology.

The translation automation association TAUS provided free translation-memory databases at the beginning of Covid-19, including for Spanish-English, but there is no sign of them here. The political decision to use Catalan as a start language, understandable enough, would have curtailed some of the solutions available.

The most viable solution is nevertheless to write the ST in such a way that the MT problems are avoided before they appear. This basically involves simplifying text. It is

technically called “pre-editing”, as opposed to “post-editing”, which is an intervention after the passage through machine translation. Pre-editing typically takes more time than post-editing and might thus appear less cost-effective. Its benefits, however, automatically appear in all the languages into which machine translation is carried out. If one is going into three languages, as is the case here, pre-editing is very likely to be more cost-effective than post-editing. And if it is done well, then there is no reason why machine translation should not be provided in many more languages as well, especially those that are relatively cognate.

To illustrate the virtues of pre-editing, here we use a little reverse engineering. In the examples below, we take the problematic cases we have seen above (ST in Catalan), we give the raw machine translation with the problems indicated in italics (MT in English), then we revise the Catalan input so as the avoid the machine-translation problems (Revised ST in Catalan), and finally we present the raw Google Translate version of that revised text as it was rendered in July 2022, without applying any special glossary or translation memory (which one should do in a

(11)

real-world context). Here we also add the raw French MT of the original ST and the revised ST, to indicate that the problem is solved not just for English but for other languages as well (and bearing in mind the number of speakers likely to use French as a lingua franca). It is important to note here that this activity has been carried out bottom-up, working from the errors in order to remove their causes, rather than as a top-down application of an abstractly defined controlled language – bearing in mind that Marzouk and Hansen-Schirra (2019) report that controlled language had no significant effect in a similar situation. That is, we are looking for the kinds of solutions that can work in this particular domain, for this kind of text, and for more than one target language. After each case below, we propose a guideline that might apply more generally when healthcare messaging is being written:

ST: Descobreix com es duu a terme la convocatòria de la segona dosi MT-EN: Find out how the second dose is called

MT-FR: Découvrez comment s’effectue l’appel pour la deuxième dose Revised ST: Descobreix quan i com pots rebre la segona dosi

New MT-EN: Find out when and how you can receive the second dose

New MT-FR: Découvrez quand et comment vous pouvez recevoir la deuxième dose

Proposed guideline: Since the term “convocatòria” is problematically ambiguous, spell out what it means in this context. (Note that if we want “dosi” to be rendered as “shot”, we would have to add a specialized glossary or translation memory.)

ST: Salut A-Z MT-EN: Hi A-Z

MT-FR: Santé de A à Z

Revised ST: Temes de salut de A a Z New MT-EN: Health issues from A to Z New MT-FR: Les thématiques santé de A à Z

Same principle: Explicitate the potentially ambiguous terms. Note that the first French MT was good in this case but was not excessively harmed by the pre-editing.

ST: Fes gestions de forma no presencial amb el sistema sanitari MT-EN: Make arrangements in person with the healthcare system MR-FR: Prendre des dispositions en personne avec le système de santé Revised ST: Fes gestions en línia amb el sistema sanitari

New MT-EN: Make arrangements online with the healthcare system New MT-FR: Prendre des dispositions en ligne avec le système de santé

Proposed guideline: Opt for a more clearly distinguished term – admittedly at the risk of an Anglicism in this case, but that might be considered a valid trade-off.

ST: Totes les persones, incloent-hi els infants, si són un cas positiu de COVID-19 o si estan realitzant una quarantena per contacte estret no es poden vacunar.

MT-EN: All people, including children, if they are a positive case of COVID-19 or if they are undergoing close contact quarantine cannot be vaccinated.

(12)

MT-FR: Toutes les personnes, y compris les enfants, si elles sont un cas positif de COVID- 19 ou si elles sont en quarantaine en raison d’un contact étroit ne peuvent pas être

vaccinées.

Revised ST: Si una persona té COVID-19 o fa una quarantena per contacte estret, no pot vacunar-se. Això també s’aplica als nens.

New MT-EN: If a person has COVID-19 or is in quarantine due to close contact, they cannot get vaccinated. This also applies to children.

New MT-FR: Si une personne a le COVID-19 ou est en quarantaine en raison d'un contact étroit, elle ne peut pas se faire vacciner. Cela s’applique également aux enfants.

Proposed guideline: Write short sentences. Here the syntactic obscurity is reduced by moving the embedded clause away from the verb and by writing in the singular.

ST: Descarrega el teu certificat COVID a La Meva Salut.

MT-EN: Download your COVID certificate to My Health.

MT-FR: Téléchargez votre attestation COVID à La Meva Salut

Revised ST: Descarrega el teu certificat COVID des de la web de La Meva Salut New MT-EN: Download your COVID certificate from the My Health website New MT-FR: Téléchargez votre certificat COVID sur le site Ma Santé

Same guideline: The ambiguous preposition has been replaced by one that is not so ambiguous.

ST: Si el teu fill o filla és un cas positiu de COVID-19 no es pot vacunar. Tampoc si està realitzant una quarantena per contacte estret.

MT-EN: If your son or daughter is a positive case of COVID-19 you cannot be vaccinated.

Nor if you are performing a quarantine by close contact.

MT-FR: Si votre fils ou votre fille est un cas positif de COVID-19, ils ne peuvent pas être vaccinés. Ni si vous êtes mis en quarantaine pour contact étroit.

Revised ST: Un nen no es pot vacunar si té COVID-19 o si fa una quarantena per contacte estret.

New MT-EN: A child cannot be vaccinated if they have COVID-19 or if they are in quarantine due to close contact.

New MT-FR: Un enfant ne peut pas être vacciné s’il a le COVID-19 ou s’il est en quarantaine en raison d'un contact étroit.

Guidelines: Position the subject first and as close as possible to the verb (once again, one would have to weigh up the extent to which this is an interference from English); avoid pronouns.

ST: Tindràs accés a la vacuna contra la COVID-19, o la dosi record, si ja la tens administrada.

MT-EN: You will have access to the vaccine against COVID-19, or the record dose, if you already have it administered.

MT-FR: Vous aurez accès au vaccin contre le COVID-19, ou à la dose de rappel, si vous l’avez déjà reçu.

Revised ST: Tindràs accés a la vacuna contra la COVID-19. Si ja ets vacunat, tindràs accés a la dosi record.

(13)

New MT-EN: You will have access to the vaccine against COVID-19. If you are already vaccinated, you will have access to the booster dose.

New MT-FR: Vous aurez accès au vaccin contre le COVID-19. Si vous êtes déjà vacciné, vous aurez accès à la dose de rappel.

Guidelines: Split long sentences; avoid pronouns.

From such bottom-up reverse engineering, one can extract some quite logical desiderata:

short sentences, no grammatical complexity, no pronouns, and explicitation in cases of potential ambiguity. Other guidelines appear in each field, as one experiments to see what works. As noted, one legitimate objection is that the results tend to sound like they come from English. At the end of the day, though, the aim of this messaging is to save lives, perhaps at the expense of a few items of deceptively autochthonous syntactic complexity. When you write the original Catalan messaging in such a way that it is easy to understand and act upon, that is good for Catalan users as well, in addition to the benefits it brings for machine translations that are of use to end users in other languages.

Conclusions

The use of machine translation for time-sensitive healthcare information seems not to correspond to any major policy decision in this case: territorial policies simply look the other way. The practice is nevertheless not without some justification. Its advantages include immediate updating of information, reduced costs, general understandability of basic information, and perhaps a reasonable level of acceptance among users who have basic machine-translation literacy, particularity in situations where the translation connects with further checking processes and spoken exchanges.

If one wanted to isolate only the translation errors, especially the more comical ones, it would be easy to condemn machine translation out of hand. One could be quite reasonably outraged and then declare that machine translation should never be used for official

communication, especially in the case of supposedly actionable healthcare messaging. One might further claim that, even if machine translation does not actually infringe on language rights, in this case it certainly risks alienating a considerable number of residents in the territory, thereby losing trust, reducing behavior change, and effectively compromising public health. It

nevertheless seems more judicious to view these linguistic problems as hazards that, first, stand in a trade-off relationship to the several clear advantages of machine translation, and second, appear in machine translation as just one element in a more complex and ongoing

communication practice.

Our proposal here is not to exclude translation technologies, but to work with them in order to find solutions to their current shortcomings. Basic translation memory software can override terminological and phraseological errors, highlighting updates and allowing reasonably quick post-editing. In this particular case, however, with translation into three or more languages, the most cost-effective solution is certainly pre-editing, or just a start text written in clear, simple language, produced by professionals who are properly trained to write it.

If there is one policy recommendation to be made in this area, it would be to have a policy. Territorial language debates, not just in Catalonia but generally across Europe, tend to sideline non-official languages as a problem for the speakers of those languages, who need to

(14)

learn the official languages. In the case of urgent healthcare information, however, those priorities no longer apply: there is no time for learning; behavior changes are needed now. It is there, under time pressure, that policy is needed to address situations where technology comes to the fore precisely because it can save time. Such a policy must be designed to protect lives, not just languages.

References

Ali, Farah, and Carol Ready. 2021. “Integration or Assimilation? A Comparative Intertextual Analysis of Language Policy in Madrid and Catalonia”. International Journal of Language and Law, 10: 24-47.

Al Shamsi, Hilal, Abdullah G. Almutairi, Sulaiman Al Mashrafi and Talib Al Kalbani. 2020.

“Implications of Language Barriers for Healthcare: A Systematic Review”. Oman Medical Journal, 35(2): article e122.

Allaby, Elaine. 2018. “Are Eritrea government spies posing as refugee interpreters?”. Al Jazeera 26 February 2018. Accessed 14 July 2022.

https://www.aljazeera.com/features/2018/2/26/are-eritrea-government-spies-posing-as- refugee-interpreters

American Translators Association. 2020. “Australian Government Used Google Translate for COVID-19 Messaging”. Accessed 14 July 2022.https://www.atanet.org/industry- news/australian-government-used-google-translate-for-covid-19-messaging/

Angermeyer, Philipp Sebastian. 2017. “Controlling Roma refugees with ‘Google-Hungarian’:

Indexing deviance, contempt, and belonging in Toronto's linguistic landscape”. Language in Society, 46(2):1-25.

Anguiano-Musons, Marc. 2021. Translations of Covid-19 health information in Catalonia.

Treball de fi de grau. Tarragona: Universitat Rovira i Virgili.

Ayvazyan, Nune, and Anthony Pym. 2018. “Mediation choice in immigrant groups. A study of Russian speakers in southern Catalonia”. Language Problems and Language Planning, 42(3): 344-364.

Ayvazyan, Nune, and Anthony Pym. 2022. “Portraying linguistic exclusion. Cases of Russian speakers in the province of Tarragona, Spain.”. In Advances in Interdisciplinary Language Policy, edited by François Grin, László Marácz & Nike Pokorn, 237-256. Amsterdam:

John Benjamins.

Bowker, Lynne. 2009a. “Can Machine Translation meet the needs of official language minority communities in Canada? A recipient evaluation”. Linguistica Antverpiensia, 8. 123-155.

Bowker, Lynne. 2019b. “Machine Translation Literacy as a Social Responsibility”. In

Proceedings of the UNESCO International Conference on Language Technologies for All.

Paris, December. https://lt4all.org/media/papers/O7/145.html

Bowker, Lynne. 2021. “Translation technology and ethics”. In The Routledge Handbook of Translation and Ethics, edited by Kaisa Koskinen & Nike K. Pokorn, 262-278. Abingdon:

Routledge.

Bowker, Lynne, & Jairo Buitrago Ciro. 2019. Machine Translation and Global Research:

Towards Improved Machine Translation Literacy in the Scholarly Community. Bingley:

Emerald Publishing.

(15)

Dalzell, Stephanie. 2020. “Federal Government used Google Translate for COVID-19 messaging aimed at multicultural communities”. ABC News. https://www.abc.net.au/news/2020-11- 19/government-used-google-translate-for-nonsensical-covid-19-tweet/12897200.

Fiedler, Sabine, and Agnes Wohlfarth. 2018. “Language choices and practices of migrants in Germany. An interview study”. Language Problems and Language Planning, 42(3): 266- 285.

Flores, Glenn. 2005. “The Impact of Medical Interpreter Services on the Quality of Health Care:

A Systematic Review”. Medical Care Research and Review 62(3): 255-299.

DOI: 10.1177/1077558705275416

Grin, François. 2022. “Principles of integrated language policy”. In Advances in

Interdisciplinary Language Policy, edited by François Grin, László Marácz & Nike Pokorn, 24-52. John Benjamins.

Grin, François. 2022. “Principles of integrated language policy”. In Advances in

Interdisciplinary Language Policy, edited by François Grin, Laszlo Marácz & Nike Pokorn, 24-52. John Benjamins.

Gustafson, Per, and Ann Elisabeth Laksfoss Cardozo. 2017. “Language Use and Social Inclusion in International Retirement Migration”. Social Inclusion 5(4): 67-77.

Hajek, John, Maria Karidakis, Riccardo Amorati, Yu Hao, Mehda Sengupta, Anthony Pym, and Robyn Woodward-Kron. 2022. Understanding the experiences and communication needs of culturally and linguistically diverse communities during the COVID-19 pandemic.

Melbourne: The University of Melbourne, report prepared for the Victorian Department of Families, Fairness and Housing.

Hale, Sandra, and Anthony Liddicoat. 2015. “The meaning of accuracy and culture, and the rise of the machine in interpreting and translation”. Cultus: The Journal of Intercultural Mediation and Communication, 8: 14–26.

Hansen, Silvia. 2022. “Evaluating the Quality of Easy and Plain German from an Addressee oriented Perspective”. Accessed 14 July 2022.

https://www.youtube.com/watch?v=vQlMYFZnxdM.

Hassan, H., et al. 2018. “Achieving Human Parity on Automatic Chinese to English News Translation”. Accessed 14 July 2022. https://www.microsoft.com/en-

us/research/uploads/prod/2018/03/final-achieving-human.pdf.

Idescat. 2018a. “Coneixement de llengües”. Enquesta d’usos lingüístics de la població. Accessed July 2022. https://www.idescat.cat/indicadors/?id=anuals&n=10367

Idescat. 2018b. “Població segons coneixement de l’anglès i nacionalitat”. Enquesta d'usos lingüístics de la població. Accessed 14 July 2022.

https://www.idescat.cat/pub/?id=eulp&n=4658

Läubi, Samuel, Rico Sennrich and Martin Volk. 2018. “Has Machine Translation Achieved Human Parity? A Case for Document-level Evaluation”. Accessed 14 July 2022.

https://arxiv.org/pdf/1808.07048.pdf

Lindholm, Mary, J. Lee Hargraves, Warren J. Ferguson, and George Reed. 2012. “Professional Language Interpretation and Inpatient Length of Stay and Readmission Rates”. Journal of General Internal Medicine 27(10): 1294-1299.

Macken, Lieve, Daniel Prou, and Arda Tezcan. 2020. “Quantifying the Effect of Machine Translation in a High-Quality Human Translation Production Process”. Informatics 7(2), 12 (1-19); https://doi.org/10.3390/informatics7020012

(16)

Marzouk, Shaimaa, and Silvia Hansen-Schirra 2019. “Evaluation of the Impact of Controlled Language on Neural Machine Translation Compared to Other MT Architectures”. Machine Translation 33, 179–203. https://doi.org/10.1007/s10590-019-09233-w

Ministerio de Asuntos Económicos y Transformación Digital. 2022. “Digital Spain 2025”.

Accessed July 2022.

https://portal.mineco.gob.es/RecursosArticulo/mineco/prensa/ficheros/noticias/2018/Agend a_Digital_2025.pdf.

Moorkens, Joss, Sheila Castilho, Federico Gaspari, Stephen Doherty, eds. 2018. Translation Quality Assessment. Cham: Springer.

Nitzke, Jean, Silvia Hansen-Schirra and Carmen Canfora. 2019. “Risk management and post- editing competence”. Journal of Specialised Translation, 31: 239-259.

O’mara, Ben, and Gemma Carey. 2019. “Do multilingual androids dream of a better life in Australia? Effectiveness of information technology for government translation to support refugees and migrants in Australia”. Australian Journal of Public Administration 2019: 1- 23. DOI: 10.1111/1467-8500.12388

Pokorn, Nike K., and Jaka Čibej. 2018. “’It’s so vital to learn Slovene’. Mediation choices by asylum seekers in Slovenia”. Language Problems and Language Planning, 42(3): 286-304.

Prioleau, Jonathan. 2022. Machine translation in healthcare. Master’s dissertation. Universitat Rovira i Virgili.

Pym, Anthony, and Bei Hu. 2022. “Trust and Cooperation through Social Media. Covid-19 translations for Chinese communities in Melbourne”. In Translation and Social Media Communication in the Age of the Pandemic, edited by Tong King Lee and Dingkun Wang, 44-61. Routledge.

Pym, Anthony, and Ester Torres-Simón. 2021. “Is automation changing the translation profession?”. International Journal of the Sociology of Language 270: 39-57.

Pym, Anthony. 2020a. “Quality”. In Minako O’Hagan (ed.) The Routledge Handbook of Translation and Technology, pp. 437-452. Abingdon and New York: Routledge.

Pym, Anthony. 2020b. “When trust matters more than translation”. Pursuit (29 July), University of Melbourne, https://pursuit.unimelb.edu.au/articles/when-trust-matters-more-than- translation

Shoemaker, Sarah J., Michael S. Wolf, and Cindy Brach. 2014. The Patient Education Materials Assessment Tool (PEMAT) and User’s Guide. Updated edition. U.S. Department of Health and Human Services. https://www.ahrq.gov/health-literacy/patient-education/pemat.html Suport judicial. 2021. Serveis d’interpretació i traducció judicial. Barcelona: Subdirecció

General de Suport Judicial i Coordinació Tècnica.

Temizöz, Özlem. 2013. Postediting machine-translation output and its revision. Subject-matter experts versus professional translators. PhD thesis. Universitat Rovira i Virgili.

Wallbrecht Joshua, Linda Hodes-Villamar, Steven J. Weiss and Amy A. Ernst. 2014. “No difference in emergency department length of stay for patients with limited proficiency in English”. Southern Medical Journal 107(1): 1-5.

Wood, Emily. 2018. “Twenty years of building for everyone”. Google Company

Announcements. https://blog.google/intl/en-ca/company-news/inside-google/twenty-years- of-building-for-everyone/. Accessed 14 July 2022.