Using Semantics to Improve Machine Translation

By Kathleen Dahlgren, Ph.D., CTO
Cognition Technologies, Inc.
www.cognition.com
Machine translation technologies currently use one of several statistical algorithms to guess a translation based on its similarity to known translations. This is a graded process that produces a set of proposed translations, ordered by the algorithm’s estimate of how likely each one is to be correct. For example, in translating the Spanish sentence “Los obreros trataban de terminar el edificio a tiempo” into English, a ranking of guessed translations might be:
(1) “The workers tried of finish the building in good weather.”
(2) “The workers treated the finish the building at time.”
(3) “The workers tried to terminate the building on time.”
(4) “The workers tried to finish the building with a steady pace.”
(5) “The workers tried to finish the building on time.”
By employing a Semantic Natural Language Processing (NLP) technology, such as the one developed by Cognition Technologies, the translation process can use the technology’s deep “understanding” of language to significantly reduce the number of bad guesses coming from statistical machine translation algorithms. The process can then select the most structurally and semantically plausible translations from among the suggested alternatives.
Semantic NLP technology can rule out some of the translations because they cannot be parsed, as in examples (1) and (2) above; an ungrammatical sentence is unlikely to be a good translation. Cognition’s Semantic NLP (which includes a complete semantic map of the English language) also discovers word and phrase meanings, so it can rule out other translations as semantically implausible, as in (3) above (i.e., buildings aren’t “terminated”, either in the sense of being fired or in the sense of completing an electrical circuit). Finally, by ranking the semantic plausibility of the remaining translations, Cognition’s deep Semantic NLP can decide that (4) and (5) are good translations, but that (5) is more likely to be accurate.
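The three steps just described (reject what cannot be parsed, reject what is semantically implausible, rank the rest by plausibility) can be sketched as a filter-and-rerank pass over the statistical engine's candidate list. This is a toy illustration under loud assumptions, not Cognition's implementation: the names `UNPARSEABLE`, `IMPLAUSIBLE`, `PHRASE_SCORES`, `parses`, `plausible`, `score` and `rerank` are all hypothetical, and simple substring rules stand in for a real parser and semantic lexicon.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    text: str
    mt_rank: int  # position in the statistical engine's list (1 = its top guess)


# HYPOTHETICAL stand-in for a parser: word sequences no English grammar accepts.
UNPARSEABLE = ("tried of", "the finish the")

# HYPOTHETICAL selectional restriction: buildings are not "terminated"
# (neither fired nor used to complete an electrical circuit).
IMPLAUSIBLE = {("terminate", "building")}

# HYPOTHETICAL plausibility weights for the phrase that follows "the building".
PHRASE_SCORES = {"on time": 0.9, "with a steady pace": 0.4}


def parses(text: str) -> bool:
    """Step 1: reject candidates that (by the toy rules) cannot be parsed."""
    lowered = text.lower()
    return not any(seq in lowered for seq in UNPARSEABLE)


def plausible(text: str) -> bool:
    """Step 2: reject verb-object pairs that violate a selectional restriction."""
    lowered = text.lower()
    return not any(f"{verb} the {obj}" in lowered for verb, obj in IMPLAUSIBLE)


def score(text: str) -> float:
    """Step 3: score plausibility; unknown phrasings get a neutral 0.5."""
    lowered = text.lower()
    return max((s for phrase, s in PHRASE_SCORES.items() if phrase in lowered), default=0.5)


def rerank(candidates: list[Candidate]) -> list[Candidate]:
    survivors = [c for c in candidates if parses(c.text) and plausible(c.text)]
    # Most plausible first; ties keep the statistical engine's original order.
    return sorted(survivors, key=lambda c: (-score(c.text), c.mt_rank))


n_best = [Candidate(t, i) for i, t in enumerate([
    "The workers tried of finish the building in good weather.",
    "The workers treated the finish the building at time.",
    "The workers tried to terminate the building on time.",
    "The workers tried to finish the building with a steady pace.",
    "The workers tried to finish the building on time.",
], start=1)]

for c in rerank(n_best):
    print(c.mt_rank, c.text)
# prints:
# 5 The workers tried to finish the building on time.
# 4 The workers tried to finish the building with a steady pace.
```

Keeping the engine's original rank as a tie-breaker means the semantic pass only overrides the statistical ordering when it has a reason to.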
In a prototype implementation within a commercial automated translation engine, Cognition’s Semantic NLP was able to eliminate 80% of the 1,000 suggested parses for each of 50 sentences. This result demonstrates the power and value of deep NLP for improving statistical machine translation.
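For scale, assuming the 80% rate applies to each sentence's list of 1,000 candidates (rather than being an average across sentences), the prototype figures work out as follows:

```latex
\begin{aligned}
\text{eliminated per sentence} &= 0.80 \times 1000 = 800 \\
\text{remaining per sentence} &= 1000 - 800 = 200 \\
\text{eliminated across 50 sentences} &= 50 \times 800 = 40{,}000 \text{ of } 50 \times 1000 = 50{,}000
\end{aligned}
```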
Conclusion
The accuracy of automated machine translation technology depends on understanding language, yet the technology lacks the resources to achieve a high rate of understanding. Cognition’s Semantic NLP™ can give automated machine translation an understanding of word and sentence meaning that no other technology can.
Dr. Kathleen Dahlgren is the Founder and Chief Technology Officer of Cognition Technologies. She began her career as a professor of computational linguistics at Pitzer College of the Claremont Colleges and then worked for IBM at their Los Angeles Scientific Center, focusing on building a “natural language understanding system.” Dr. Dahlgren has a Ph.D. in Linguistics and a post-doctorate in Computer Science from the University of California, Los Angeles. She has published a number of scholarly articles on the subjects of linguistics and computer science, and is the author of Naive Semantics for Natural Language Understanding. She is the co-author of Cognition’s seminal patent (1998), and she received the Small Business Innovation Award from the U.S. Army in 1995. Currently, she is also an adjunct professor of Linguistics at the University of California, Los Angeles.

June 11th, 2008 at 7:43 am
Good to see machine translation is improving.
I believe the best way to improve machine translation is to use a global translation memory where users (human translators) can enter what they think is the best translation.
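A global translation memory of the kind this comment proposes could, in its simplest form, store every human-submitted translation of a source segment and return the one submitted most often. The sketch below is hypothetical throughout (the `TranslationMemory` class and its `submit`/`best` methods are illustrative names, and voting by frequency is an assumed policy); it matches segments exactly after case and whitespace normalization only, whereas real translation memories also retrieve fuzzy matches.

```python
from collections import Counter, defaultdict


class TranslationMemory:
    """HYPOTHETICAL shared memory: human translators submit translations
    for a source segment, and the most frequently submitted one wins."""

    def __init__(self):
        # normalized source segment -> Counter of submitted translations
        self._votes = defaultdict(Counter)

    @staticmethod
    def _normalize(segment):
        # Exact matching after case/whitespace normalization only; production
        # translation memories also retrieve fuzzy (partial) matches.
        return " ".join(segment.lower().split())

    def submit(self, source, translation):
        self._votes[self._normalize(source)][translation.strip()] += 1

    def best(self, source):
        counts = self._votes.get(self._normalize(source))
        if not counts:
            return None  # no human translation yet: fall back to machine translation
        return counts.most_common(1)[0][0]


tm = TranslationMemory()
src = "Los obreros trataban de terminar el edificio a tiempo."
tm.submit(src, "The workers tried to finish the building on time.")
tm.submit("los obreros  trataban de terminar el edificio a tiempo.",
          "The workers tried to finish the building on time.")
tm.submit(src, "The workers tried to complete the building on time.")
print(tm.best(src))      # The workers tried to finish the building on time.
print(tm.best("Hola."))  # None
```

Returning `None` for unseen segments leaves room to combine the memory with a statistical engine rather than replace it.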
June 12th, 2008 at 4:23 am
There are so many technologies and theories that come into play in machine translation. I started with machine translation, slowly found myself doing information retrieval (because of a theory of mine in which natural language generation (NLG) could play a part), and that in turn drew me into NLG itself. I found that my mind always came back to using these things for machine translation. There are so many different dimensions, such as construction grammar, mental spaces, and areas of cognitive psychology, that I believe could play an important role. Ontologies are a problem because they take ages to compile manually (WordNet isn’t very complete for specific domains), but they do work, especially if you use Semantic Web technologies like OWL. I think we can build the resources to achieve a high rate of understanding. I don’t think it’s all down to word and sentence meaning but giving the machine the opportunity and the means to understand the world around it. That means going beyond the language alone and working with entire constructions and how they interact.
It’s a really interesting area of research, and pretty exciting as well. Can it really work? Well, there’s no reason why not. Your work is very interesting.
Marie-Claire
November 7th, 2008 at 5:49 am
Kathleen wrote, “The accuracy of automated machine translation technology depends on understanding language.”
Marie-Claire wrote, “I don’t think it’s all down to word and sentence meaning but giving the machine the opportunity and the means to understand the world around it.”
I agree with both statements. However, some texts can be both structurally correct and ambiguous. For example, “Find the man with a dog” can mean either of the following:
* Use a dog to find the man.
* Find the man who has a dog.
Possibly, the context lets readers know which meaning is correct. If so, a machine might also be able to determine which meaning is correct.
However, in many cases, context does not let readers know which meaning is correct. The ONLY way to know with 100% certainty is to ask the person who wrote the text. (Lawyers make huge amounts of money by arguing about the meaning of text.)
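The point about context can be made concrete with a toy disambiguator for “Find the man with a dog”: it picks a reading only when the surrounding text clearly favors one, and otherwise declines, leaving the question for the writer. The cue-word lists and the `disambiguate` function are hypothetical illustrations, not a real technique for resolving attachment ambiguity, which needs far richer world knowledge.

```python
import re

# The two readings of "Find the man with a dog" listed above.
READINGS = {
    "instrument": "Use a dog to find the man.",
    "attribute": "Find the man who has a dog.",
}

# HYPOTHETICAL context cues; a real system would need much richer world knowledge.
CUES = {
    "instrument": {"search", "rescue", "sniffer", "tracking", "scent"},
    "attribute": {"owner", "owns", "walking", "leash", "pet"},
}


def disambiguate(context):
    """Return the reading the context supports, or None if it cannot decide."""
    words = set(re.findall(r"[a-z]+", context.lower()))
    hits = {reading: len(words & cues) for reading, cues in CUES.items()}
    best = max(hits.values())
    winners = [reading for reading, n in hits.items() if n == best]
    if best == 0 or len(winners) > 1:
        return None  # context is silent or split: only the writer can say
    return READINGS[winners[0]]


print(disambiguate("The rescue team has a sniffer dog."))  # Use a dog to find the man.
print(disambiguate("He was walking it on a leash."))       # Find the man who has a dog.
print(disambiguate("He is tall."))                         # None
```

The `None` result is the important design choice: when context does not settle the meaning, guessing would only hide the ambiguity.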
For machine translation to be 100% accurate, it is necessary (but not sufficient) to use a controlled language for the source text. (A controlled language is a natural language that has restrictions on the grammar and the vocabulary that can be used. Ideally, each term has only one specific meaning.)
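A controlled-language checker enforces restrictions like these before text reaches the translation engine. In the minimal sketch below, the approved lexicon (each term mapped to exactly one meaning) and the rule banning “with” because of its ambiguous attachment are invented for illustration; they are not taken from any real controlled language such as ASD Simplified Technical English, and `check` is a hypothetical name.

```python
import re

# HYPOTHETICAL approved lexicon: each permitted term has exactly one meaning.
APPROVED = {
    "find": "locate",
    "the": "determiner",
    "man": "adult male person",
    "who": "relative pronoun",
    "has": "possess",
    "a": "determiner",
    "dog": "domestic canine",
    "use": "employ as a tool",
    "to": "infinitive marker",
}

# HYPOTHETICAL rule: "with" is banned because its attachment is ambiguous.
BANNED = {"with": "ambiguous attachment; write 'who has' or 'use ... to' instead"}


def check(sentence):
    """Return a list of controlled-language violations (empty if none)."""
    problems = []
    for word in re.findall(r"[a-z]+", sentence.lower()):
        if word in BANNED:
            problems.append(f"'{word}': {BANNED[word]}")
        elif word not in APPROVED:
            problems.append(f"'{word}': not in the approved vocabulary")
    return problems


print(check("Find the man with a dog."))       # flags 'with'
print(check("Find the man who has a dog."))    # []
print(check("Use a dog to find the man."))     # []
```

Rejecting the ambiguous sentence forces the writer to choose one of the two unambiguous rewrites, which is exactly the information a machine cannot recover on its own.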
By the way, I am not belittling the achievements of Cognition Technologies, Inc. and other players in the market. Machine translation has a great future.