Many factors contribute to the difficulty of machine translation, including words with multiple meanings, sentences with multiple grammatical structures, uncertainty about what a pronoun refers to, and other problems of grammar. But two common misunderstandings make translation seem altogether simpler than it is. First, translation is not primarily a linguistic operation; second, translation is not an operation that preserves meaning.
Machine translation is not easy. Several well-known problems of machine translation are so fundamental that they pose difficulties for human translators as well.
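• Syntactic ambiguities:
The structure of a sentence often depends on its meaning, not only on the word classes of its words. In ’I saw the man with the telescope’, only the meaning decides whether the telescope was used for seeing or belongs to the man.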
• Polysemy:
Polysemes are words that have several related meanings. They are difficult to translate because a word has to be found in the target language that fits the intended meaning.
• Homonymy:
Homonyms are unrelated words that ’share’ the same word form (spelling or pronunciation). They are difficult to translate because their meaning often depends on the context.
• Referential ambiguity:
Pronouns refer to other words, but it is often unclear to which. References may cross sentence boundaries and depend heavily on context and semantics. In addition, grammatical gender differs from language to language, so pronouns have to be adjusted accordingly.
• Metaphors and symbols:
Both metaphors and symbols depend on the underlying culture and sometimes cannot be translated. Equivalent expressions may exist in the target language, and these can be located using idiom dictionaries.
• Synonyms:
There are often several words with almost the same meaning, which makes it very difficult to choose the right translation, since the choice depends on context, style and semantics. The differences are often very subtle.
• Fuzzy hedges:
Vague words, terms and expressions such as ’in a sense’ and the German ’irgendwie’ (roughly ’somehow’) are called fuzzy hedges. Such expressions are language-dependent and difficult to translate.
• New developments:
As society and technology progress, new words, terms and expressions are introduced. Existing words may be used in new contexts, new slang may appear, or marketing may give simple phrases completely new meanings.
Communication of meaning is only one of many functions of language. Language is a social phenomenon. Computers rarely ’know’ about society, and they will therefore have problems translating utterances used for:
• Demonstrating one’s class to the person one is speaking or writing to;
• Simply venting one’s emotions, with no real communication intended;
• Establishing non-hostile intent with strangers, or simply passing time with them;
• Telling jokes;
• Engaging in non-communication by intentional or accidental ambiguity, sometimes also called ’telling lies’.
There are even more problems that sound simple but are extremely complicated to solve in MT. They are all connected to context: it is virtually impossible to separate the formulation of even the simplest sentence in any language from the audience to whom it is addressed [19]. At first glance this is surprising; on second thought it becomes obvious. Further, some translation tasks are very complicated even for human translators, e.g. stage plays, lyrics, advertising, book titles, newspaper headlines and poetry. A joke in language A must also become a joke in language B, even if a literal rendering would not be funny there [19].
Apart from MT problems, there are real-world problems that have not yet been solved. Some cross-cultural concepts can be translated only with great difficulty, if at all. Chinese medicine, for example, has several branches that are completely unknown to Western people. The branches themselves are perfectly logical and consistent in their own terms and have their own explanations and methods for observation, measurement and diagnosis. But the specialized vocabulary is very hard to explain to non-specialists, let alone to translate into a foreign language. Imagine a fully automatic MT system. It would need to know for whom the text to be translated is intended. If the text were intended for general readers, the system would have to add an explanation of Chinese medicine in general. Conversely, if special English vocabulary existed for that particular branch of Chinese medicine, the MT system would have to figure out what the English text was about and then use the specialized Chinese vocabulary.
Machine translation has never matched the quality of human translation. One of the primary reasons is a lack of knowledge. The Transformation, Linguistic Knowledge Transfer and Linguistic Knowledge Interlingua Machine Translation engines all rely on dictionaries and grammars to bridge the language barrier. The problem is that all dictionaries are incomplete and all grammars leak. The best translators own numerous dictionaries and still find, on a daily basis, terms and expressions that their dictionaries do not contain. Real language does not always follow the rules of grammar, so all attempts to teach a computer to decipher language using dictionaries and grammars fail, because computers cannot fill in the blanks or break the rules.
Example-Based Machine Translation engines accept up front that computers will not be able to translate perfectly; instead, they try to find close matches of similar sentences that might help someone understand another language. They use large bilingual databases of sentences that have already been translated between two languages. This is how most translation memory systems work. Of course, if a sentence has been translated before and is a perfect match, these systems get it right. The problem is that sentences proliferate faster than rabbits. Every day, billions of people create countless new sentences in thousands of languages. Very few people on this planet wake up and say or write exactly the same sentences they said or wrote on any previous day. These systems therefore also lack knowledge and can never gather enough of it because of this ’sentence proliferation’.
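The close-match lookup described above can be sketched as a fuzzy search over a bilingual sentence database. This is a minimal sketch under stated assumptions: the three-entry memory is hypothetical toy data, and the similarity score is Python's `difflib.SequenceMatcher.ratio()` with an arbitrary 0.75 threshold, not the scoring used by any particular translation memory product.

```python
from difflib import SequenceMatcher

# Hypothetical toy translation memory (English source -> German target).
TRANSLATION_MEMORY = {
    "The printer is out of paper.": "Der Drucker hat kein Papier mehr.",
    "Please restart the computer.": "Bitte starten Sie den Computer neu.",
    "The meeting starts at nine.": "Die Besprechung beginnt um neun.",
}

def lookup(sentence, memory, threshold=0.75):
    """Return (stored_source, stored_translation, score) for the most similar
    stored sentence, or None if no score reaches the threshold.

    The score is difflib's character-based similarity ratio in [0, 1];
    this scoring and the 0.75 threshold are illustrative assumptions.
    """
    best_source, best_score = None, 0.0
    for source in memory:
        score = SequenceMatcher(None, sentence.lower(), source.lower()).ratio()
        if score > best_score:
            best_source, best_score = source, score
    if best_source is None or best_score < threshold:
        return None
    return best_source, memory[best_source], best_score

if __name__ == "__main__":
    for query in ["The printer is out of paper.",
                  "Please restart the computer now.",
                  "Where is the nearest train station?"]:
        print(query, "->", lookup(query, TRANSLATION_MEMORY))
```

An exact match returns the stored translation with a score of 1.0. The near-miss ’Please restart the computer now.’ returns the closest stored translation, which helps understanding but omits ’now’, and the unseen question returns None: the memory simply has nothing for it.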
Statistical Machine Translation engines do not use grammars or dictionaries. They are built on the premise that words are defined by the company they keep: “Context is everything.” They use mathematical models to analyze large databases of bilingual text and, using probability, try to create an equivalent translation. They rely on the notion that words are used in predictable patterns, i.e. that phrases or patterns of words inside sentences are extremely redundant. To check this claim, try to recall a conversation you may have had with someone who spoke your language as a second language. They may have used words native to your language and followed its grammar correctly, but more often than not the way they put the words together was a little awkward and unnatural. This is because the patterns of word usage in their native tongue usually differ, and these patterns carry over into their second language even when the grammar is correct. If you analyze the groups of words inside sentences, you will find that they are repeated quite often.
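The claim that groups of words repeat can be checked by counting word n-grams (contiguous sequences of n words). The sketch below is illustrative only: the three-sentence corpus is hypothetical, and tokenization is assumed to be a naive lowercase whitespace split rather than anything a real SMT system would use.

```python
from collections import Counter

# Hypothetical toy corpus; real SMT training data runs to millions of sentences.
CORPUS = [
    "I would like to book a room",
    "I would like to pay by card",
    "We would like to book a table",
]

def ngrams(tokens, n):
    """Return all contiguous n-word sequences of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def repeated_ngrams(sentences, n=2):
    """Count word n-grams over all sentences and keep those seen more than once.
    Tokenization is a naive lowercase whitespace split (an assumption)."""
    counts = Counter()
    for sentence in sentences:
        counts.update(ngrams(sentence.lower().split(), n))
    return {gram: count for gram, count in counts.items() if count > 1}

if __name__ == "__main__":
    for gram, count in sorted(repeated_ngrams(CORPUS).items(), key=lambda kv: -kv[1]):
        print(" ".join(gram), count)
```

Even in three short sentences, ’would like’ and ’like to’ occur three times each, and ’i would’, ’to book’ and ’book a’ twice. Repetition of this kind is what a statistical model exploits.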
So if Statistical Machine Translation systems hold so much promise, why is their quality not better? Again, the reason is a lack of knowledge. These systems are fed large databases of bilingual text produced by professional translators. The problem is that these databases are never large or comprehensive enough to cover all the patterns that exist in a language. The systems are further hampered by the fact that human translators do not translate the same phrases the same way every time: translators do not have “photographic memories” and cannot remember exactly how they translated a phrase the last time they saw it.
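The effect of inconsistent human translations can be illustrated with relative-frequency estimation, a common way of estimating phrase translation probabilities in phrase-based SMT: p(target | source) = count(source, target) / count(source). This sketch assumes the phrase pairs have already been extracted and aligned, which is a separate step in real systems; the pairs below are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical aligned phrase pairs (English source, German target), as if four
# translators had each rendered the same English phrase once.
ALIGNED_PAIRS = [
    ("thank you very much", "vielen Dank"),
    ("thank you very much", "vielen Dank"),
    ("thank you very much", "herzlichen Dank"),
    ("thank you very much", "danke sehr"),
]

def phrase_table(aligned_pairs):
    """Estimate p(target | source) by relative frequency:
    count(source, target) / count(source)."""
    pair_counts = Counter(aligned_pairs)
    source_counts = Counter(src for src, _ in aligned_pairs)
    table = defaultdict(dict)
    for (src, tgt), count in pair_counts.items():
        table[src][tgt] = count / source_counts[src]
    return dict(table)

if __name__ == "__main__":
    for tgt, p in phrase_table(ALIGNED_PAIRS)["thank you very much"].items():
        print(f"{tgt}: {p:.2f}")
```

Because the translators disagreed, the probability mass is split 0.50 / 0.25 / 0.25 across three renderings. Only much more data would show whether such a split reflects a real preference or just noise.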
So you may ask: if all these systems need is more knowledge, why don't we collect more knowledge and make them better? One of the biggest challenges is that top-quality translation is very expensive. Translating documents with multiple reviews costs anywhere from $0.20 to $0.30 per word. Building bilingual databases of the size these systems would need would therefore cost hundreds of millions to billions of dollars. The time required for those translations is also a challenge.
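The cost estimate follows from simple multiplication of the $0.20 to $0.30 per-word rates quoted above by a corpus size. The corpus sizes of one and ten billion words used below are assumptions chosen only to show the scale; they are not figures from the text.

```python
# Per-word rates for professionally reviewed translation, as quoted in the text.
RATE_LOW = 0.20   # dollars per word
RATE_HIGH = 0.30  # dollars per word

def translation_cost(words):
    """Return the (low, high) cost in dollars of translating `words` words."""
    return words * RATE_LOW, words * RATE_HIGH

if __name__ == "__main__":
    # Hypothetical corpus sizes, chosen only to illustrate the order of magnitude.
    for words in (1_000_000_000, 10_000_000_000):
        low, high = translation_cost(words)
        print(f"{words:,} words: ${low:,.0f} to ${high:,.0f}")
```

One billion words comes to $200 million to $300 million, and ten billion words to $2 billion to $3 billion, which matches the range of hundreds of millions to billions of dollars stated above.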