Stan Jou's lips were moving, but no sound was coming out.
(Photo: John Beale, Post-Gazette) Stan Jou takes a water break during a demonstration yesterday of a new voice translation device at Carnegie Mellon University. The electrodes pick up electrical signals from muscle movement during speech as part of the translation.
The result boomed out of a loudspeaker a few seconds later:
"Let me introduce our new prototype," a synthesized voice announced. "You can speak in Mandarin and it translates into English or Spanish."
"This is a bit of science fiction," said Alex Waibel, director of the International Center for Advanced Communications Technologies, "but it is a vision that we think is very exciting." And where it once seemed a distant dream, it now is being actively developed thanks to recent advances in machine translation.
This particular gadget, when fully developed, might allow anyone to speak in any number of languages or, as Dr. Waibel put it, "to switch your mouth to a foreign language."
It was one of several translation devices his research group demonstrated publicly for the first time yesterday in a videoconference with reporters in Pittsburgh and at the University of Karlsruhe in Germany.
"We want to make language translation transparent," explained Dr. Waibel, a computer scientist who holds joint appointments at Carnegie Mellon and Karlsruhe.
The true centerpiece of the demonstration was the videoconference itself. As Dr. Waibel spoke, computer software translated his speech into Spanish and German.
Previous computer systems have translated the spoken word in limited contexts, or "domains," such as travel or medical information. But yesterday's demonstration was of so-called "open domain" speech-to-speech translation, a technically difficult feat to pull off because the spoken word is often ungrammatical and filled with colloquialisms.
"This is definitely a new frontier," said Kevin Knight, director of the University of Southern California's Information Sciences Institute. "If you look in the scientific literature, you couldn't find too much today on open domain speech translation."
What has made this possible has been a dramatic change in how computer translation programs are written. In the past, most translation software has been based on sets of rules -- dictionary definitions, grammatical rules and such. In other words, programmers tried to make a computer think like a human.
But increasingly, the trend in artificial intelligence is to allow the computers to think like computers, using statistical methods to draw meaning out of masses of information, said Randall E. Bryant, dean of Carnegie Mellon's School of Computer Science.
Speech recognition programs began using these statistical methods 15 years ago, Dr. Knight said. Only recently have they been applied to speech translation, "and that's why things have been improving a lot lately."
The availability on the Internet of large amounts of translated text has been a major boon, said Dr. Waibel.
The results aren't perfect. When Dr. Waibel announced he would take questions from reporters in Germany and America, the computer heard it as "so we glycogen it alternating questions between Germany and America." And the systems don't really understand what they are translating, so they may sometimes have trouble when a speaker tries to be humorous or ironic.
But Dr. Waibel predicted open domain systems could be ready for use within five years.
"As we make contact, people will be more likely to learn other languages," Dr. Waibel said. U.S. soldiers in Iraq, for instance, who have handheld devices that repeat foreign phrases, ultimately have learned to speak those phrases themselves and discard the machines.