The Advent of Translation Software
In theory, machine translation (MT) denotes a procedure whereby a computer programme manages the translation process on its own, without human intervention. This process consists of decoding the meaning of the source text and subsequently re-encoding this meaning in the target language. Ostensibly simple, the translation process in fact requires complex cognitive operations. To decode the meaning of the source text in its entirety, the translator must interpret and analyse all the features of the text, and this requires in-depth knowledge of the grammar, syntax, semantics, and idioms of the source language, and even of the culture of its speakers. The translator then needs the same detailed knowledge of the target language to be able to re-encode the message. It is here that the fundamental challenge lies for machine translation: how is it possible to programme a computer to make sense of a sentence in the way a person does and then to phrase it in the target language so that it sounds fluent and appealing?
One of the specific reasons why it is so difficult for a computer to make sense of a linguistic text is the problem of semantics: the same word has a range of different meanings depending on the context in which it is used. If the word “education”, for example, were to be translated into German, it would have to be rendered as “Bildung”, “Aufklärung”, “Einweisung”, “Ausbildung”, “Erziehung”, “Betreuung”, or “Unterricht” depending on whether it is used in the sense of “general education”, “vocational training”, “instruction”, “qualification”, “upbringing” or “teaching” etc. in English. When human translators read the original sentence in English, they use their entire life experience as a frame of reference. Unless a computer is endowed with a knowledge base as vast as the human brain’s, it simply cannot recreate the context required to grasp the meaning even of a relatively simple sentence.
Given these great challenges, it is not surprising that human intervention is not only still needed in machine translation but actually makes up the bulk of the translation process: extensive pre- and post-editing of machine-generated translations is required. In trying to improve the capabilities of such programmes, three basic approaches have been tried and are constantly being developed further. The first of these is interlingual machine translation, whereby each sentence of the text to be translated is broken down into its component words, which are then further analysed for their base forms and their grammatical and functional structures. As a next step, they are transferred into an “interlingua”, i.e. an artificial language or universal interface. Finally, the target text is generated out of the interlingua. Although research into this system began a long time ago, a truly viable interlingual machine translation programme has yet to be developed.
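The three steps of the interlingual approach (analysis, transfer into an interlingua, generation) can be illustrated with a deliberately tiny sketch. Everything in it is an assumption made for illustration: the two lexicons, the concept labels such as DOG and SLEEP, and the function names are hypothetical, and a real system would need far richer grammatical analysis than a word-by-word lookup.

```python
# Hypothetical English analysis lexicon: surface word -> (concept, number).
ENGLISH_ANALYSIS = {
    "cat": ("CAT", "SG"), "cats": ("CAT", "PL"),
    "dog": ("DOG", "SG"), "dogs": ("DOG", "PL"),
    "sleeps": ("SLEEP", "SG"), "sleep": ("SLEEP", "PL"),
}

# Hypothetical German generation lexicon: (concept, number) -> surface word.
GERMAN_GENERATION = {
    ("CAT", "SG"): "Katze", ("CAT", "PL"): "Katzen",
    ("DOG", "SG"): "Hund", ("DOG", "PL"): "Hunde",
    ("SLEEP", "SG"): "schläft", ("SLEEP", "PL"): "schlafen",
}


def analyse(text):
    """Break the source sentence into words and map each to a concept."""
    units = []
    for word in text.lower().split():
        # Words missing from the lexicon are kept as UNKNOWN units.
        units.append(ENGLISH_ANALYSIS.get(word, ("UNKNOWN", word)))
    return units


def generate(interlingua):
    """Produce target-language words from the language-neutral units."""
    words = []
    for unit in interlingua:
        # Unknown units are passed through in brackets, untranslated.
        words.append(GERMAN_GENERATION.get(unit, f"[{unit[1]}]"))
    return " ".join(words)


def translate(text):
    """Source text -> interlingua -> target text."""
    return generate(analyse(text))


print(analyse("dogs sleep"))
print(translate("dogs sleep"))
```

The intermediate list returned by analyse() plays the role of the interlingua: it contains no English or German words, so in principle a generation lexicon for any other language could be attached to it. The sketch also shows where the difficulty lies, since nothing in it can decide between competing senses of an ambiguous word.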
The second system takes a statistical approach: programmes such as Language Weaver calculate the probability of linguistic correspondence by examining bilingual text corpora such as the French-English record of the Canadian parliament or the records of the European Parliament. They thereby learn translation patterns and use them to translate similar constructs. To the extent that such corpora are available, the results attained are promising. However, the availability of such corpora is still the exception rather than the rule. This statistical approach is used in the Google language tools, for example for Arabic-English and Chinese-English. As any quick Google test shows, these work up to a point and are sufficient to get the gist of the source text, but claims of extraordinary accuracy are generally not borne out in tests conducted by professional human translators. That said, such systems are useful for very specific texts with limited vocabulary and linguistic structures, such as product descriptions and weather reports.
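How correspondences can be learned from a bilingual corpus may be easier to see in a small sketch. The one below is not the method used by Language Weaver or Google; it substitutes a much simpler co-occurrence measure (the Dice coefficient) for a real statistical translation model. The four-sentence English-French corpus is made up, and the function names are hypothetical.

```python
from collections import Counter
from itertools import product

# Tiny made-up English-French parallel corpus (illustrative only).
CORPUS = [
    ("the house", "la maison"),
    ("the blue house", "la maison bleue"),
    ("the flower", "la fleur"),
    ("the blue flower", "la fleur bleue"),
]


def dice_scores(corpus):
    """Score every (source word, target word) pair by the Dice coefficient."""
    src_count, tgt_count, pair_count = Counter(), Counter(), Counter()
    for src, tgt in corpus:
        src_words, tgt_words = set(src.split()), set(tgt.split())
        src_count.update(src_words)   # sentences containing each source word
        tgt_count.update(tgt_words)   # sentences containing each target word
        # Sentence pairs in which both words occur together.
        pair_count.update(product(src_words, tgt_words))
    return {
        (s, t): 2 * n / (src_count[s] + tgt_count[t])
        for (s, t), n in pair_count.items()
    }


def best_translation(word, scores):
    """Return the target word with the highest score, or None if unseen."""
    candidates = {t: sc for (s, t), sc in scores.items() if s == word}
    return max(candidates, key=candidates.get) if candidates else None


scores = dice_scores(CORPUS)
for word in ("the", "house", "blue", "flower"):
    print(word, "->", best_translation(word, scores))
```

With this corpus, "house" co-occurs with "maison" in every sentence pair where either appears, so that pair scores 1.0, whereas "house" and "la" score lower because "la" also appears without "house". The sketch also makes the dependence on data visible: a word that never occurs in the corpus simply has no candidates.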
The third approach to machine translation is the memory-based system. Memory-based systems are by far the most useful of the three approaches, and the only one regularly used by most professional translators, precisely because they do not actually translate. Instead, such systems draw on a broad database of exact or similar matches from sentences or phrases that have already been translated in the past. They are very useful in areas with a large proportion of standard phrasing, such as business letters, boilerplate templates, medical diagnostics, and annual management reports, and for translating treaty names and provisions of international law, which have direct and standardised equivalents in other languages. Examples of such memory-based CAT tools (CAT stands for computer-assisted translation) include Trados and Wordfast.
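The lookup of exact or similar matches can be sketched in a few lines. This is only an illustration under stated assumptions and does not reflect how Trados or Wordfast work internally: the stored English-German segment pairs are made up, the lookup() function and its 0.75 threshold are hypothetical, and similarity is measured with SequenceMatcher from Python's standard difflib module.

```python
from difflib import SequenceMatcher

# Hypothetical translation memory: previously translated segment pairs.
TRANSLATION_MEMORY = {
    "Thank you for your letter.": "Vielen Dank für Ihr Schreiben.",
    "Please find the invoice attached.": "Anbei erhalten Sie die Rechnung.",
    "We look forward to your reply.": "Wir freuen uns auf Ihre Antwort.",
}


def lookup(segment, memory=TRANSLATION_MEMORY, threshold=0.75):
    """Return (stored source, stored target, score) for the closest stored
    segment, or None if nothing is at least `threshold` similar."""
    best = None
    for source, target in memory.items():
        # ratio() is 1.0 for identical strings and falls towards 0.0
        # as the two strings share less text.
        score = SequenceMatcher(None, segment.lower(), source.lower()).ratio()
        if best is None or score > best[2]:
            best = (source, target, score)
    if best is not None and best[2] >= threshold:
        return best
    return None


print(lookup("Thank you for your letter."))       # exact match
print(lookup("Thank you for your email."))        # similar ("fuzzy") match
print(lookup("Quarterly revenue rose sharply."))  # no usable match
```

For a similar but not identical segment, the system only proposes the stored translation; the translator still has to adapt it, which is why such tools assist rather than replace the human.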
In conclusion, machine translation and, especially, memory-based CAT tools can be useful in certain specific applications such as product descriptions written in a controlled language. For the most part, however, they tend to create unreliable and even nonsensical translations due to their inability to read context. One classic example is the phrase “The spirit is willing, but the flesh is weak” which, when translated into Russian and back, comes out as “The vodka is good, but the steak is lousy.” Claude Piron, a long-time translator for the United Nations and the WHO, estimated that only about 25% of a professional translator’s job can currently be automated, leaving the more difficult 75% to be dealt with by a human translator, most notably the task of doing extensive research to resolve ambiguities in the source text. So far, then, computer applications can only produce rough translations that, with luck, give the gist of the source text. This means that for a long time yet, even the best translation software cannot hope to replace human translators.
