The Push Button Translation
Machine Translation is a branch of Computational Linguistics and is loosely categorized under Artificial Intelligence.
It has an interesting history. The first public demonstration of machine translation took place in 1954 in what has been called the Georgetown-IBM experiment. This experiment involved the completely automatic Russian-to-English translation of some sixty sentences in the field of organic chemistry. Non-Russian speakers typed Russian sentences in Romanized Russian (Russian written in the Latin alphabet rather than Cyrillic) into a computer keyboard, and sensible English came out of the printer. The Georgetown-IBM experiment, although limited to 6 grammar rules and a vocabulary of 250 words, showcased the possibility that automatic translation could be achieved.
This was just the beginning and much more research was required, but the thinking and hope were that contextually accurate, fully automatic translations could be produced within five years. As a means of attracting funding, the experiment was wildly successful and, as it coincided with an intensifying Cold War, the military and intelligence communities invested heavily in the necessary computer equipment. Similar experiments, translating from English into Russian, were also carried out in the Soviet Union. It seemed that the prize of Fully Automatic High-Quality Translation of Unrestricted Text was not far off.1
The initial hopes of the participants in this field were cut short by the publication of the ALPAC (Automatic Language Processing Advisory Committee) Report in 1966, which noted that the problems encountered in machine translation were much greater than anticipated, in spite of its supposed early success. The report was so negative, in fact, that funding was drastically reduced.2
Why this was the case and why there still is not a workable, fully automatic translation program in existence today is an interesting study, both in the nature of language and in the mind’s ability to anchor concepts, words, and reality. Traditional computer programming, due to its formulaic and formalistic approach, is hard-pressed to deliver contextually and culturally accurate translations.
In its simplest form, computer translation maps words in the source language directly onto words in the target language. The problem is that this approach misses the underlying complexity that seems to pervade most of life’s processes, including language. In language, that complexity shows up as nuance, idiom, and tonality. Ignoring it results in strange translations such as “I want a pear” when the original sentence had to do with shoes.
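The word-for-word approach described above can be illustrated with a minimal Python sketch. The glossary, its entries, and the function name `translate_word_for_word` are hypothetical and chosen only for illustration; no real translation system is this small. The sketch assumes one fixed English word per French word, which is exactly the simplification that loses idiom and ambiguity.

```python
# Toy French-to-English glossary: exactly one English word per French word.
# These entries are illustrative assumptions, not a real lexicon.
GLOSSARY = {
    "il": "it",
    "pleut": "rains",
    "des": "some",
    "cordes": "ropes",
    "mon": "my",
    "avocat": "avocado",  # "avocat" also means "lawyer"; a one-to-one table must pick one
    "est": "is",
    "en": "in",
    "retard": "delay",
}


def translate_word_for_word(sentence: str) -> str:
    """Replace each source word with its single glossary entry, keeping word order."""
    words = sentence.lower().split()
    # Words missing from the glossary are passed through in brackets.
    return " ".join(GLOSSARY.get(word, f"[{word}]") for word in words)


if __name__ == "__main__":
    # Idiom: a French speaker means "it is pouring".
    print(translate_word_for_word("Il pleut des cordes"))
    # Ambiguity: the intended meaning is "my lawyer is late".
    print(translate_word_for_word("Mon avocat est en retard"))
```

Run as a script, this prints "it rains some ropes" and "my avocado is in delay". Each word has been translated "correctly" in isolation, yet neither sentence says what the speaker meant, because the table has no access to the context a human reader supplies without effort.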
Part of that difficulty has to do with the necessity of trading complexity for simplicity. Put another way, a human draws on a range of relational and structural experiences, grounded in years of observation and living, that are extremely textured and complex. A machine translation system, particularly one using a formalistic (rule-based) approach, has no such grounding on which to base its choices, even when augmented with specialized glossaries and database searches of past translations.
Formalism involves an analytic, building-block approach. In translation, formalism takes a sentence, decodes its meaning, and then re-encodes it into the target language. There is both strength and weakness in this method. Its strength, as an overall solution, is that the method works most of the time. Analysis simplifies the whole into bite-size chunks. These chunks are translated and then reassembled. The weakness is that the new structures have lost some of the complexity and nuance of the original, which was created with fuller context.3
Science and technology have always emphasized function over form. It is only in the last century and a half that structure has begun to assume its appropriate importance, starting with the rise of organic chemistry and the creation of pharmaceuticals.
Structure, too, has a place in the world of information manipulation and language. It is demonstrated in three ways:
- Rules and laws: The whole idea of a rule, or a natural law, is that one can take a bunch of information about a subject, or the world, and generalize it into a simpler form. The rule that a plate drops when released holds for all objects on Earth. One does not have to think of different objects. It is a simplification. Language, too, is a kind of simplification. A sunset can be condensed into a single word. A number sequence can be reduced to a simple rule such as “start with 1 and double it and then keep doing so”. Regardless of the fact that it is an infinite sequence, the simple rule can derive all of its elements, even an infinite number. Put another way, one is able to take life and, by applying rules, predict and control it. This then is the ultimate in formalism and the holy grail of science: the theory that everything can ultimately be reduced to, or simplified into, a short compact formula. Laws compress information and data. They reduce structure and complexity.
- What about randomness? There are arguments that randomness in and of itself is a manmade concept and that true randomness can only be found in lotteries and the strings of numbers generated by them. Regardless, by definition there is no rule that could or should be able to describe the sequence generated. The information generated is incompressible. The data in the set describes the data in the set, and the structure that exists is the minimal structure required to express it. Structure is maintained at a steady state.
- A newer entry, from complexity theory, is called emergence, whereby new data and structure appear to be manufactured or created from scratch. For example, if one combines sodium (a metal that burns in water) with chlorine (a poisonous gas), sodium chloride is produced. Sodium chloride (also known as table salt) is a solid, which not only dissolves in water, but is sprinkled on eggs to make them taste better.
This also works with hydrogen and oxygen in the making of water. Knowing all there is to know about these two gases individually, it is impossible to predict that when combined, they will produce a liquid at room temperature that boils at 100 degrees centigrade and, when frozen, expands in such a way that it floats on its own liquid state. Emergence has the peculiar aspect of making new structure and creating new data that did not exist before. It is associated with novelty. In it is the idea of surprise.4
When all three of these points (laws, randomness and emergence) are added together, one seems to get an inkling of the complexity that life encompasses quite easily. One can also begin to understand why life seems to defy categorization.
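The contrast between the first two points, a rule that compresses and randomness that does not, can be made concrete with a short Python sketch. The function names `doubling_sequence` and `compressed_ratio` are hypothetical, and using `zlib` as a measure of compressibility is an assumption made here for illustration: a general-purpose compressor is only a rough stand-in for "the shortest rule that describes the data". Likewise, `os.urandom` is used as a convenient source of unpredictable bytes, not as a claim about true randomness.

```python
import os
import zlib


def doubling_sequence(count: int) -> list[int]:
    """Apply the rule 'start with 1 and keep doubling' for count terms."""
    values = []
    current = 1
    for _ in range(count):
        values.append(current)
        current *= 2
    return values


def compressed_ratio(data: bytes) -> float:
    """Compressed size divided by original size (lower means more compressible)."""
    return len(zlib.compress(data, 9)) / len(data)


if __name__ == "__main__":
    # The rule itself is the compression: a few lines yield as many terms as requested.
    print(doubling_sequence(8))

    # A highly regular byte string: one short pattern repeated many times.
    regular = b"0123456789" * 1000
    # Bytes from the operating system's random source, a stand-in for randomness.
    random_bytes = os.urandom(len(regular))

    print(f"regular data ratio: {compressed_ratio(regular):.3f}")
    print(f"random data ratio:  {compressed_ratio(random_bytes):.3f}")
```

The regular data shrinks to a small fraction of its original size, while the random bytes do not shrink at all and typically grow slightly once the compressor adds its own overhead. Emergence, the third point, is deliberately left out of the sketch: the essay's claim is precisely that it cannot be derived from the parts.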
There is no doubt that formalism and its resultant technology have yielded results. They have remarkably sped up the pace of society, but at the expense of downplaying the complexity that remains regardless, ready to manifest in the form of storms, floods, market crashes, social upheavals, and the unpredicted event.
So where does this leave machine translations that have, to the present day, relied heavily on a formalistic approach?
There will come a time when computers will be able to achieve the ideal of automatic translation. This will be driven partly by sheer computing power and partly by contextual elements built into the translation process through the computer’s programming.5
Until then, humans must supply the context and structure, so translators will remain a necessity.
1 Hutchins, J. W. & Somers, H. L. (1992). An Introduction to Machine Translation. Retrieved September 8, 2011, from John Hutchins: Publications on machine translation, computer-based translation technologies, linguistics and other topics: http://www.hutchinsweb.me.uk/IntroMT-TOC.htm
3 Hall, J. S. (2007). Beyond AI: Creating the Conscience of the Machine. Amherst, NY: Prometheus Books.
4 Georgescu-Roegen, N. (1971). The Entropy Law and the Economic Process. Cambridge, MA: Harvard University Press.
© 2011 Ivan Obolensky. All rights reserved. No part of this publication can be reproduced without the written permission from the author.